This article is available at the URI http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/kansa as part of the NYU Library's Ancient World Digital Library in partnership with the Institute for the Study of the Ancient World (ISAW). More information about ISAW Papers is available on the ISAW website.

Except where noted, ©2014 Eric Kansa; distributed under the terms of the Creative Commons Attribution License

This article can be downloaded as a single file

ISAW Papers 7.10 (2014)

Open Context and Linked Data

Eric C. Kansa

Introduction

Archaeologists have long grappled with the challenges inherent in data sharing. They have traditionally relied on monographs and site reports to communicate, in detail, the results of excavations and surveys. However, growing dependence on digital technologies has eroded the utility of these traditional dissemination strategies. Archaeologists now collect far more (digital) documentation than can be feasibly and cost-effectively shared in print. There is also more to digital data than sheer quantity. Archaeologists routinely organize data into structures (usually tables or relational databases) in order to use software to search, query, analyze, summarize, and visualize data. As interest in structured data grows, archaeologists need new venues to access and share structured data.

“Data sharing” usually means sharing structured data in formats that can be easily loaded into data management software (ranging from Excel, to a GIS, to something more specialized), queried, visualized, and analyzed. New rules imposed by granting agencies, especially “data management plans”, as well as changing professional expectations are converging to make data dissemination a regular aspect of archaeologists’ scholarly communications. Archaeologists increasingly recognize the need to preserve the documented archaeological record by accessioning data into preservation repositories. At the same time, more researchers regard data sharing as an aspect of good professional practice, so that data underlying interpretations and narratives of the past are available for independent reinterpretation.

The following discussion outlines Open Context’s current approach to publishing archaeological data. The discussion explores ways Open Context attempts to situate data dissemination in professional practice, particularly with respect to Linked Data approaches toward making data easier to understand and use.

Why a “Publishing” Metaphor for Data?

While we currently see increasing interest in the management, preservation and sharing of structured data, we still do not have well-established venues and processes to support these activities (Faniel et al. 2013). Many researchers focus on the need to preserve these data, especially because of the destructive nature of many archaeological field methods. Though data archiving is of critical importance, data management needs extend well beyond preservation for the sake of preservation. To be understood and useful in the future, and to be comparable to other datasets, data usually need rich documentation and alignment to standards and vocabularies used by other data sources. Though researchers often see integration as a desirable goal in data sharing, the challenges inherent in documenting and describing data for reuse, especially reuse that involves integrating data from multiple projects, need to be better understood.

Preparing data for reuse, especially integration with other data, can involve significant effort and special skills and expertise. Most archaeologists are not familiar with RDF, ontologies, controlled vocabularies, SPARQL or a whole host of other Web-related technologies and standards. While wider appreciation of and fluency in these technologies will be most welcome, not every archaeologist needs to become an expert Web technologist. Just as we do not expect every archaeologist to personally develop all of the expertise needed to run a print publication venue, a neutron activation analysis lab, or other specialization, we should not expect every archaeologist to become a Web technology guru. In other words, data dissemination can often benefit from collaboration with specialists who dedicate themselves to exploring informatics issues.

Collaborating with “informatics specialists” can take multiple forms. With Open Context, an open access data dissemination venue for archaeology, we are adapting a “publishing” model to help set expectations about what is involved in meaningful data dissemination, including the support of people specializing in data issues (Kansa and Kansa 2013). The phrase “data sharing as publication” helps to encapsulate and communicate the investment and skills needed to make data easier to reuse. It conveys the idea that data dissemination can be a collaborative undertaking, where data “authors” and specialized “editors” work together, contributing different elements of expertise and taking on different responsibilities. A publishing metaphor also communicates the effort and expertise involved in data sharing in terms that are widely understood by the research community. It helps to convey the idea that data publishing implies efforts and outcomes similar to conventional publishing. Ideally, offering a more formalized approach to data sharing can also promote professional recognition, helping to create the reward structures that make data reuse less costly and more rewarding, both in terms of career benefits and in terms of opening new research opportunities in reusing shared data.

Publishing Linkable and Linked Data

We initially launched Open Context in 2007, and the site has gone through a number of iterations reflecting both our growing understanding of researcher needs and larger changes in how scholars use the Web. Over the past few years, we have moved to a model of “data sharing as publication” in order to publish higher-quality and more usable data. Similar to the services conventional journals provide to improve the quality of papers, we provide data editing and annotation services to improve the quality of the data researchers share. Part of our shift toward greater formalism in sharing data centers on increasing our participation in the world of “Linked Open Data”.

Linked Open Data represents an approach to publishing data on the Web in a manner that makes it easier to combine data from different sources. It is an inherently distributed approach that promotes the wider interoperability and integration of structured (meaning easily computable) data. Open Context contributes to the larger body of Linked Open Data resources in two main ways (see also Kansa 2012):