
Jeremy Huggett

2 Digital Haystacks: Open Data and the Transformation of Archaeological Knowledge

“There is a great need for theorization precisely when emerging configurations of data might seem to make concepts superfluous to underscore that there is no Archimedean point of pure data outside conceptual worlds. Data always has theoretical enframings that are its condition of making...” (Boellstorff, 2013).

Introduction

Since the mid-1990s the development of online access to archaeological information has been revolutionary. Easy availability of data has changed the starting point for archaeological enquiry, and the openness, quantity, range and scope of online digital data long ago passed the tipping point at which online access became useful, even essential. However, this transformative access to archaeological data has not itself been examined critically. Access is good, exploitation is an essential component of preservation, openness is desirable, and comparability is a requirement, but what are the implications for archaeological research of this flow (some would say deluge) of information? Lucas has recently pointed to the way archaeological reality can change as a consequence of intervention: as archaeologists change their mode of intervention, so reality shifts and interpretations change (Lucas, 2012, p. 216). If this is true of archaeological practice, to what extent might the change in our relationship to data, the move from traditional modes of creation and access to digitally enhanced methods, represent a potential paradigm shift in our archaeological reality, or place limits on future changes? As more data are ‘born digital’, with access to them open to an increasingly wide audience, is it realistic to assume that archaeological knowledge itself remains unchanged in the process? How does our relationship with archaeological data change as the observations, measurements, uncertainties, ambiguities, interpretations and values encapsulated within our datasets are increasingly subject to scrutiny, comparison, and re-use? What are the implications of increasing access to ever greater quantities of data drawn from different sources which are more or less open, more or less standardised, and increasingly reliant on search tools with greater degrees of automation and linkage?
Given the fundamental, and frequently contested, nature of archaeological data, it is surprising that the implications of open access to those data remain largely uncontested. Instead, archaeology’s digital haystack represents a largely unexplored set of practices mixing old and new in the creation of new infrastructures which transform the packaging, presentation, and analysis of the past.

Jeremy Huggett: University of Glasgow, Glasgow, UK. © 2015 Jeremy Huggett. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 License.

Examining this entails revisiting the notion of the ‘archaeological record’ within the context of the new technological frameworks, and considering the consequences of this digital data intervention.

Openness and Access

Open archaeology is a concept that has received increasing attention in recent years, most evidently in an issue of World Archaeology which sought to extend awareness of the implications of open approaches to a wider archaeological audience (Lake, 2012, p. 471). As Lake observes, and as reflected in that issue and this volume, openness can cover the use and reuse of software, publications, creative works, and data, although within the archaeological debate attention has until recently focussed largely, though not exclusively, on publication.

The most common starting point for considering ‘openness’ is the Open Definition: “A piece of data or content is open if anyone is free to use, reuse, and redistribute it – subject only, at most, to the requirement to attribute and/or share-alike.” (Open Definition, 2014). Archaeology may seem well served by free access to archaeological data through organisations such as the Archaeology Data Service (UK), tDAR and Open Context (USA), and DANS (Netherlands), as well as national heritage organisations (for example, the Royal Commission on the Ancient and Historical Monuments of Scotland and English Heritage) and regional Historic Environment Records. However, with some exceptions, much of this data is only partially ‘open’, leading Kansa to suggest that openness remains largely at the margins of archaeological practice (2012, p. 499). In part, this is a consequence of the distinction between different levels of ‘open access data’ and ‘open data’. For example, a hierarchy can be defined in increasing order of ‘openness’:

1. Open access data which provides online access to view datasets, limited only by a presumption of Internet access and the requirement for a modern web browser. Use of the data beyond viewing and searching online is restricted (as is common with most Historic Environment Records and National Monuments data, and with commercial organisations such as CyArk). A variant of this approach enables a map to be created on demand within desktop GIS software. This generally entails access to Web Mapping Services (WMS), which provide a graphical image as output with limited functionality beyond the image itself. These are typically available for National Monuments data accessed via open government websites such as data.gov.uk.

2. Open access data which returns summary geographical information as a downloadable output of a search query or via Web Feature Services (WFS). These data can then be further analysed using GIS software as if they were held locally. For example, the Archaeology Data Service’s ArchSearch has download functionality for registered users, and Historic Scotland/RCAHMS’s PastMap similarly enables summary location data to be accessed as downloadable comma-separated values (CSV) files. Currently most WFS feeds in archaeology are used internally within organisations, or to create interoperable services from multiple feeds (resources such as PastMap itself and Scotland’s Places), but are not more widely accessible (see, for example, McKeague et al., 2012). Leaving technical issues aside, this seems to arise in part from a concern to limit bulk downloads of data: hence downloads from ArchSearch or PastMap are restricted to one or two hundred records at a time.

3. Open access data consisting of entire datasets which can be downloaded but where restrictions apply to the use and reuse of the data, which are therefore not truly open data in the technical sense. For example, the Archaeology Data Service Common Access Agreement (Archaeological Data Service, n.d.) specifies that the data should only be used for teaching, learning, and research purposes, although ‘research’ is defined so broadly that it includes commercially funded research, and the primary condition is that the results are placed in the public domain. In other cases, the restriction is more of a ‘health warning’: for instance, the PastMap terms and conditions specify that the data provided are intended for information only and that professional advice should be sought to interpret them properly, emphasising the need to understand their limitations (PastMap, 2013). By contrast, English Heritage’s Heritage Gateway applies strict copyright restrictions to data accessed and downloaded from the site (Heritage Gateway, 2007).

4. Open data which has no exclusions or restrictions on use, and conforms to the Open Definition or the most permissive Creative Commons licences. In general these datasets relate to specific projects, sites, or collections. For example, in the United States both Open Context and tDAR use the Creative Commons CC-BY licence, which enables the data to be shared and reworked, simply requiring attribution or citation of the original work. As Kansa points out, certain datasets within the Archaeology Data Service collections are now also governed by the CC-BY licence rather than the standard terms and conditions (Kansa, 2012, p. 507).

Much archaeological data therefore is not truly ‘open’, and recent papers on open data in archaeology tend to focus on the desirability of increasing openness and on the restrictions and impediments to achieving it (for example, Beale 2012; Beck and Neylon 2012; Bevan 2012b; Kansa 2012). These are not new issues: in a discussion of copyright and archaeological data, for example, Carson asked: “Who owns the right to reproduce raw data? Who owns the right to publish a manipulated version of that data? And who owns the right to produce second-generation items, such as models, from that data?” (Carson, 1996, p. 291). The ethical responsibility of archaeologists to make their data available is frequently cited; Carson, for example, argues that:

“Archaeologists, like other scientists, have an ethical obligation to publish, and to allow others to critique, their findings. Publishing data sets in machine-readable form is the ultimate expression of this obligation, in that others are free to analyze the basis of an archaeologist’s findings and come to their own conclusions.” (Carson, 1996, p. 316).

Kansa puts the case more strongly, arguing that “the discipline should not continue to tolerate the personal, self-aggrandizing appropriation of cultural heritage that comes with data hoarding” (2012, p. 507), and goes on to say:

“Failure to incentivize greater data transparency would demonstrate an egregious failure of leadership and utter dysfunction in a discipline supposedly devoted toward building and preserving knowledge of the past.” (2012, p. 507).

Most professional archaeological codes of practice emphasise this link between stewardship of the past and the requirement to report, to publish, and to preserve the records made, including computer data. For example, the Institute for Archaeologists in the UK specifies that the results of archaeological work should be made available with reasonable dispatch (Institute for Archaeologists, 2013, Principle 4) and establishes that this includes the analysis and publication of data (Institute for Archaeologists, 2013, 4.4). In the light of this, it is tempting to ask why more open data are not available. One reason may be that the ethical codes emphasise that rights of primacy exist: in the case of both the IfA and the European Association of Archaeologists these persist for up to ten years (Institute for Archaeologists 2013, 4.4; European Association of Archaeologists 1997, 2.7). By contrast, the Archaeological Institute of America, the Society for American Archaeology, and the Canadian Archaeological Association, for example, specify only the need to make results available in a timely fashion and to make evidence available to others within a reasonable time (Archaeological Institute of America 2008, I.4; Society for American Archaeology 1996, 5; Canadian Archaeological Association n.d.). Consequently, rights of primacy may restrict access to data and, without enforcement, the specified timescales may be stretched: indeed, there is a long and unfortunate history of archaeological archive data being retained by individuals for a lifetime. In such a context, Kansa’s expostulation is understandable.

One issue regularly raised in relation to open archaeological data is that they frequently include spatial information which might facilitate looting (for example, Bevan 2012b, p. 7–8; Kansa 2012, p. 508–509). Degrading the quality of spatial data and making full-resolution data available only to ‘approved’ users are approaches that have been adopted, but restricting access in this way flies in the face of open data requirements. Other common arguments about the limits to open data relate to authority and the risk of reducing confidence as a consequence of revealing discrepancies and errors in the data. With some datasets consisting of millions of records, it would be surprising if errors did not creep in, especially as the data are increasingly manipulated by automated means. Whether this damages the authority of the data is open to question: arguably, issues such as differing levels of precision in locational information are likely to be more problematic for would-be users than the occasional rogue item.

Openness and Reuse

In the light of the pressures for access to open data, it is perhaps worth emphasising that there has been no empirical study of the demand for open data in archaeology. This means that, to a large extent, the level of demand remains undemonstrated and unquantified. However, a recent study of the Archaeology Data Service sought to evaluate and quantify the ‘value’ of online access to data (Beagrie and Houghton, 2013).

It employs a range of approaches to assessing value: for example, investment value (the amount invested in the service), use value (the amount users spend to access the service), and contingent value (for instance, how much people would be willing to pay).

