This article is available at the URI http://dlib.nyu.edu/awdl/isaw/isaw-papers/20-2/ as part of the NYU Library's Ancient World Digital Library in partnership with the Institute for the Study of the Ancient World (ISAW). More information about ISAW Papers is available on the ISAW website.

©2021 Ryan Horne; text and images distributed under the terms of the Creative Commons Attribution 4.0 International (CC-BY) license.

This article can be downloaded as a single file

ISAW Papers 20.2 (2021)

Applying Linked Open Data Standards

Ryan Horne, University of California Los Angeles

In: Sarah E. Bond, Paul Dilley, and Ryan Horne, eds. 2021. Linked Open Data for the Ancient Mediterranean: Structures, Practices, Prospects. ISAW Papers 20.

URI: http://hdl.handle.net/2333.1/79cnpgk2

Abstract: This chapter serves as an introduction to the essential concepts of Linked Open Data (LOD). Key to many digital humanities initiatives and research, LOD is a collaborative endeavor built upon open access principles; the discovery, reuse, and expansion of a vast array of data; and new avenues of review and publication which can differ greatly from traditional scholarly methodologies. LOD has already revolutionized the field of classical studies, with an ever-growing network of interlinked projects that offers exciting new opportunities for exploring spatial, textual, and material culture. In order for a project to take full advantage of the LOD ecosystem, it is necessary to align data with other LOD participants. This chapter provides practical examples and best practices for preparing project data, reconciling that data with other LOD initiatives, and publishing data as a LOD resource. It discusses new tools and techniques that make this process accessible to a wide audience of scholars and students, and provides a useful reference work for the other chapters in this collection.

Library of Congress Subjects: Data curation; Linked data--Standards.

Introduction: LOD - Terror and Promise

Linked open data (LOD) can be a frightening concept. Far from the conventional model of research, where a monograph or article is a well-formed, clean, and polished culmination of inquiry conducted in private, LOD is public, often messy, and rarely seen as “finished.” Somewhat equivalent to electronically providing notes or a card-index, LOD is a collaborative enterprise which stresses open-access, connectivity, and new ways of thinking about publication and research which are often at odds with traditional models of recognition and reward. LOD also pushes many practitioners out of their traditional areas of expertise, with unfamiliar and often intimidating terms, technical jargon, and challenging infrastructure requirements.1

Despite these supposed obstacles, there is a remarkably robust community of practitioners and projects that embrace LOD methodology in the humanities, and in particular in the field of ancient studies. In the last decade there has been a fundamental shift in the way that digital projects of the ancient world are presented and created.2 Unlike some monolithic digital projects of the past, this new digital ecosystem is built upon shared data, connectivity between different initiatives, and a thriving community that is redefining interdisciplinary approaches and collaborative work.3 Although the number of ancient world studies projects that use LOD as a core component of their development is rapidly expanding, the intricacies of creating, curating, and using LOD can still seem complex and overwhelming. What follows is an overview of structuring and preparing data to get the best use of LOD resources.

LOD: A (Quick!) Overview

First, it is important to understand the concepts underlying linked data and LOD. Linked data has essentially four components: it is data that is on the web; it has a stable address (a URI) that provides information about a resource; that information is in a standard format; and there are links to other URIs about the same or related subjects.4 For example, a project that centers on the Roman empire could have a page on the city of Rome with a URI of exampleproject.org/Rome, which fulfills the first two conditions. This same project could have representations of Rome in standard data formats (explained below) at exampleproject.org/Rome/json or exampleproject.org/Rome/rdf, and within this data there could be links to other resources about Rome (such as https://pleiades.stoa.org/places/423025) which are themselves URIs. For a project to practice LOD, all of the above has to be true and the data needs to be released under an open license.5 
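The four components can be illustrated with a minimal Python sketch. It assumes the hypothetical project address from the example above (the https scheme is added here as an assumption), and the field names `uri`, `title`, and `links` are illustrative choices, not part of any standard vocabulary.

```python
import json

# Hypothetical project URI from the example in the text; the scheme is assumed.
ROME_URI = "https://exampleproject.org/Rome"


def build_record(uri, title, related_uris):
    """Return a plain dictionary describing one resource and its outbound links."""
    return {
        "uri": uri,                   # stable address for the resource
        "title": title,               # human-readable label
        "links": list(related_uris),  # URIs of related resources in other projects
    }


rome = build_record(
    ROME_URI,
    "Rome",
    ["https://pleiades.stoa.org/places/423025"],  # Pleiades URI quoted in the text
)

# Serializing to JSON gives the record a standard, machine-readable format.
rome_json = json.dumps(rome, indent=2, sort_keys=True)
print(rome_json)
```

A real project would normally serve this representation at an address such as the /json path mentioned above and would choose property names from a shared vocabulary. The sketch shows only the general shape of a record that carries its own URI and points to another.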

The use of an open license is essential for the success of LOD. Although individual licenses and terms vary, any restriction on access beyond a citation requirement could create pockets of inaccessible, yet still referenced, data. Such a situation is not only opposed to the spirit of the LOD movement, but would also prevent the automatic retrieval and use of data, rendering the LOD approach moot. The essential benefit of LOD is that data about the same concept (Rome for instance) can be accessed and shared across different projects. Perhaps one project, such as Pleiades, focuses on the location of ancient places, while another project, SNAGG: Social Networks and Ancient Greek Garrisons, catalogs and studies different garrisons in the Greek world. If Pleiades offers its geospatial information about ancient locations in a LOD format, then the SNAGG project can link its records about garrisons to corresponding Pleiades records about places and gain all of the geospatial data and information in Pleiades. At the same time, SNAGG can link to other projects that also use Pleiades IDs; if SNAGG became interested in the minting activity associated with garrisons, it could automatically access all of the numismatic data (mints, coin hoards, production, and even individual coins) that is offered by the Nomisma project, which associates Pleiades IDs with ancient mints.6
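The way a shared identifier connects otherwise independent projects can be sketched in a few lines of Python. Every row below is an invented placeholder, not actual SNAGG or Nomisma data, and the field names `id` and `place_uri` are assumptions made for illustration. Only the Rome URI comes from the text.

```python
from collections import defaultdict

ROME = "https://pleiades.stoa.org/places/423025"  # URI quoted in the text
OTHER = "https://example.org/places/placeholder"  # placeholder, not a real gazetteer URI

# Invented rows standing in for the records of two independent projects.
garrison_rows = [
    {"id": "garrison-001", "place_uri": ROME},
    {"id": "garrison-002", "place_uri": OTHER},
]
mint_rows = [
    {"id": "mint-001", "place_uri": ROME},
]


def index_by_place(rows):
    """Group record IDs under the place URI that each record cites."""
    index = defaultdict(list)
    for row in rows:
        index[row["place_uri"]].append(row["id"])
    return index


def link_records(left_rows, right_rows):
    """Pair every left record with the right records citing the same place URI."""
    right_index = index_by_place(right_rows)
    return {row["id"]: right_index.get(row["place_uri"], []) for row in left_rows}


links = link_records(garrison_rows, mint_rows)
print(links)
```

Neither dataset needs to know anything about the other in advance. The link exists only because both cite the same place URI, which is why agreement on identifiers matters more than agreement on internal database design.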

This series of linked projects, all using recognized identifiers, LOD principles, and common data formats, collectively forms a LOD ecosystem or LOD cloud. There is no central controlling organization or committee that determines who or what can participate; the LOD ancient studies community is constantly evolving as more projects link their data to one another. From discovering geospatial data and quantifying coin production to the distant reading of texts, creating prosopographies, and studying Syriac, the LOD ancient studies ecosystem offers ever-expanding ways for a project to discover and contribute insights into the ancient world.

The Basics: Getting Your Data Ready for LOD

The first step for any potential LOD project is understanding the nature of your data. Is the project focused on textual analysis? Is there a geospatial component? Does the research study individuals, collectively or individually, real or mythical? Are there any objects, whether they be coins, vases, or ships, artworks or tools, that feature in the study? What is the relationship between these people, places, objects, and concepts?

All of these questions are related to the entities under consideration. Even if they are not traditionally enumerated outside of a print index, each individual, place, object, or concept in a study can be treated as a discrete data record. What this means in practical terms is that each entity could be given its own row in a spreadsheet or database, with relevant information entered into each column or field. For instance, a person record should address certain common metadata categories: a common name, birthplace, date of birth, date of death, political affiliation, or any number of other characteristics. A place record could include geographic location, population size, references in primary sources, etc. Whatever the content of a record, each should have a unique ID (this can be any arbitrary but unique sequence of numbers, characters, or both) that differentiates it from other records. This is somewhat analogous to a social security number for each data element; although people and places can (and often do!) change and/or share names and affiliations, an individual ID will have a one-to-one, unchanging relationship with its record.7
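As a rough illustration of the one-row-per-entity approach, the Python sketch below assigns arbitrary sequential IDs to two placeholder person records that share a name and writes them out in spreadsheet (CSV) form. The ID scheme, the column names, and the record contents are all assumptions chosen for the example.

```python
import csv
import io
import itertools

# A simple counter is enough: an ID only has to be unique and never reused.
_counter = itertools.count(1)


def new_id(prefix):
    """Return the next arbitrary but unique ID, e.g. 'person-0001'."""
    return f"{prefix}-{next(_counter):04d}"


# Two placeholder individuals who share a name but remain distinct records.
people = [
    {"id": new_id("person"), "name": "Placeholder Name", "birthplace": "",
     "notes": "first individual"},
    {"id": new_id("person"), "name": "Placeholder Name", "birthplace": "",
     "notes": "second individual, same name"},
]

# Write the records as rows of a table, one entity per row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "name", "birthplace", "notes"])
writer.writeheader()
writer.writerows(people)
table = buffer.getvalue()
print(table)
```

Because the ID column, not the name, distinguishes the rows, either record can later be renamed or corrected without disturbing anything that refers to it.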

Such work is an essential element for modeling and understanding the complex data of humanities research. As mentioned above, people and places could be known by any number of names in any number of languages at any number of different times. For example, The Eternal City, Roma, and Caput Mundi are different names which nevertheless refer to the same conceptual entity of Rome. At the same time, the exact meaning of that entity to different audiences can change; for some, Rome is home, for others it is a workplace, and for another group it is a vacation destination. A place, concept, or person may also have different emotions attached to it; Rome could be a place of celebration, indifference, trauma, or hatred. By creating a system where each entity has a unique ID, any number of different meanings, names, and values can be related to the same conceptual subject, allowing any number of computational and semantic queries to be performed on the data. Much as a human knows that various names refer to the same conceptual entity of Rome, modeling data in this way allows a computer to understand the same connections between different definitions, names, and any number of subjects.
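A small Python sketch can make the idea concrete. The names and meanings are those given in the paragraph above; the ID `place-0001` and the helper functions are hypothetical.

```python
ROME_ID = "place-0001"  # hypothetical ID for the conceptual entity of Rome

# Any number of names and meanings can hang off the same ID.
names = {ROME_ID: ["Rome", "Roma", "The Eternal City", "Caput Mundi"]}
meanings = {ROME_ID: ["home", "workplace", "vacation destination"]}


def build_name_index(names_by_id):
    """Map each case-folded name to the ID of the entity it refers to."""
    index = {}
    for entity_id, labels in names_by_id.items():
        for label in labels:
            index[label.casefold()] = entity_id
    return index


name_index = build_name_index(names)


def resolve(label):
    """Return the entity ID for a name, or None if the name is unknown."""
    return name_index.get(label.casefold())


# Different names lead to the same record.
print(resolve("Caput Mundi") == resolve("roma"))
```

This sketch assumes that each name points to exactly one entity. Since names are often shared between places or people, a fuller model would let a name map to several candidate IDs and leave the choice to context or to the researcher.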

The essential methodology for creating such a system is already embedded in many research practices. For instance, a study of Cicero’s letters may wish to know how many letters there are, who the intended recipient of each letter was, what people and places were mentioned in the letters, and where the letters were written. Similarly, a project on Greek garrisons may wish to identify garrisons, their locations, commanders, and establishing authorities. Simply listing and assigning a unique ID to the various answers to these research questions creates the basic elements of a record. Using our examples above, each one of Cicero’s letters should be a new row in a spreadsheet, as should each establishing authority of a Greek garrison. Additional information, such as references to ancient sources, categories for the place/person, or extended notes for each record, should be entered into new columns.

Illustration 1: An example of a database with each instance of a Greek garrison in literature, papyri, or inscriptions assigned a unique ID. In addition, each entry is associated with a particular Pleiades ID and has additional columns describing different aspects of the garrison.

After establishing an entity and its various data fields in a database, the next task is to describe its relationship to other records. For projects that wish to use a form of social network analysis (SNA), this often takes the form of another spreadsheet/table which focuses on the edges, or connections between different entities (nodes in SNA terminology). In its most basic form such a table has a source column (the node that the connection is about) and a target column (what the source is connected to). For example, a project may wish to model the Seleukid kingdom as a sum of its component communities, with each edge taking the form of “community x is part of political entity y”. Such a relationship can be modeled as Abydos → subordinate → The Seleukid Kingdom (see Illustration 2). This spreadsheet/table can have additional columns further specifying the type of connection, source references, and human-readable titles in addition to the source and target IDs. In addition, each connection should have its own unique ID (which can simply be a sequence of numbers or some other arbitrary scheme) so that connections can be disambiguated and offered as LOD in their own right.
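The edge table described above might be sketched in Python as follows. The Abydos example comes from the text; the node and edge IDs, the `Edge` class, and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One row of the edge table: a connection with its own unique ID."""
    edge_id: str
    source: str          # ID of the node the connection is about
    target: str          # ID of the node the source is connected to
    relation: str        # type of connection
    reference: str = ""  # optional source reference


# Hypothetical node IDs with human-readable titles.
nodes = {"node-1": "Abydos", "node-2": "The Seleukid Kingdom"}

edges = [Edge("edge-1", "node-1", "node-2", "subordinate")]


def describe(edge, labels):
    """Render an edge in the source → relation → target form used in the text."""
    return f"{labels[edge.source]} → {edge.relation} → {labels[edge.target]}"


def targets_of(source_id, edge_list):
    """List the IDs of every node that a given source connects to."""
    return [edge.target for edge in edge_list if edge.source == source_id]


print(describe(edges[0], nodes))
```

Storing IDs rather than names in the source and target columns keeps the edge table valid if a community is later renamed. Giving each edge its own ID means that the connection itself can be cited, corrected, or linked to by other projects.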