This article is available at the URI http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/hafford/ as part of the NYU Library's Ancient World Digital Library in partnership with the Institute for the Study of the Ancient World (ISAW). More information about ISAW Papers is available on the ISAW website.

Except where noted, ©2014 William B. Hafford; distributed under the terms of the Creative Commons Attribution License


ISAW Papers 7.7 (2014)

Linked Open Data and the Ur of the Chaldees Project

William B. Hafford

Research, at its core, is the act of making connections among data – building up systematically to a supportable idea or conclusion. The basic components, therefore, are the data themselves, the individual points that demonstrate the concept that unites them. To restate the matter: Researchers are dependent on their data.

Yet, data are not always easy to acquire, and many different researchers may end up gathering the same or similar data many different times, slowly building toward their own conclusions. If the data were readily accessible, already linked to similar instances or searchable in such a way as to complete the larger scale grouping for analysis, research into that data would be faster, easier, and would allow for less duplication of effort.

Such is one major idea behind linked data on the web. Embedded hyperlinks in online documents have long led us to other documents that might be of use in finding more information, but data points within those documents have not been quickly extractable, and digital data repositories have long existed in relative isolation. If computers can find and access similar data across many data stores, research becomes far more powerful.

As researchers of the ancient world, archaeologists face the problems of any researcher: often the act of gathering data from various reports and repositories, physical or virtual, takes far longer than the process of connecting those data in order to come to some understanding of the ancient concept or practice being investigated. One person studying, for example, figurines from an archaeological site may comb through field notes for occurrences of the objects, spending days, weeks, or months locating every one. Another person may later go through the same notes for occurrences of amulets or statuettes, covering many of the same items and spending their own days, weeks, or months. Perhaps the work of the earlier researcher has guided them somewhat to make their search more effective, but they would likely still have to go through every field note to see if items that meet the new criteria were missed. In a digital age, this sort of collecting of data can be done very quickly--if the data are arranged in a machine-readable way.

Such is the beauty of linked data on the web. They are published in their own self-defining schemas, related to other schemas wherever possible (Heath and Bizer 2011: p85-86, 99). This allows the computer to make connections across data and across different data stores. It would thus be possible to search not only figurines, statuettes, and amulets from one archaeological site, but from all sites published as linked data on the web.
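The cross-site search described above can be illustrated with a minimal sketch. It assumes two independent data stores, each with its own vocabulary, whose terms have been related to a shared concept. Every URI, field name, and grouping below is hypothetical and invented for illustration; none comes from an actual published dataset.

```python
# Hypothetical records from two independent data stores. Each store uses
# its own schema terms; all URIs here are invented for illustration.
SITE_A = [
    {"id": "http://example.org/site-a/object/1",
     "type": "http://example.org/site-a/vocab#Figurine"},
    {"id": "http://example.org/site-a/object/2",
     "type": "http://example.org/site-a/vocab#Bowl"},
]
SITE_B = [
    {"id": "http://example.org/site-b/find/17",
     "type": "http://example.org/site-b/terms/statuette"},
    {"id": "http://example.org/site-b/find/18",
     "type": "http://example.org/site-b/terms/amulet"},
]

# Assumed mapping that relates each local term to a shared concept.
# The grouping itself is an illustrative choice, not a scholarly claim.
SHARED = "http://example.org/shared/concept/"
TERM_TO_CONCEPT = {
    "http://example.org/site-a/vocab#Figurine": SHARED + "figural-object",
    "http://example.org/site-b/terms/statuette": SHARED + "figural-object",
    "http://example.org/site-b/terms/amulet": SHARED + "figural-object",
    "http://example.org/site-a/vocab#Bowl": SHARED + "vessel",
}


def find_by_concept(concept, *stores):
    """Return ids of records in any store whose local type maps to the concept."""
    return [
        record["id"]
        for store in stores
        for record in store
        if TERM_TO_CONCEPT.get(record["type"]) == concept
    ]


matches = find_by_concept(SHARED + "figural-object", SITE_A, SITE_B)
print(matches)
```

Because the query is phrased against the shared concept rather than either site's local term, one search can gather figurines, statuettes, and amulets from both stores without anyone rereading the underlying notes.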

To facilitate such research, archaeological data--especially those from early excavations that are at the foundation of our understanding of major portions of the ancient world--should be made accessible and machine-readable in a linked (and openly available) manner. This is one of the driving concepts behind the Linked Ancient World Data Initiative, and behind the project being conducted for a particular site entitled Ur of the Chaldees, a Virtual Vision of Woolley's Excavation. This project is jointly conducted by the University of Pennsylvania Museum and the British Museum with lead funding from the Leon Levy Foundation. Sir Leonard Woolley excavated at the ancient city of Ur from 1922 to 1934, uncovering huge amounts of private housing and public buildings as well as religious, industrial, and funerary areas from at least the Ubaid through the Persian periods--some 5,000 years of occupation. In so doing, he created an enormous amount of data, far more than just the tens of thousands of artifacts he recorded and sent to Baghdad, London, and Philadelphia.

During the excavation, more than 15,000 field catalogue cards were produced, covering more than 25,000 artifacts. Field photographs from the twelve years number 2,350 and at least 4,500 hand-written field notecards were also produced, used later to aid in publication and then stored in the British Museum. The actual publication record was good, with yearly reports appearing in the Antiquaries Journal, and eventually ten volumes on the excavations and nine volumes on the cuneiform texts, though the full series took some fifty years to produce.

These publications have long been held to be the definitive record of Ur and they do indeed hold much vital material. But the volumes, as extensive as they seem, say far from everything that can be said about the site. Moreover, they contain interpretations and only part of the data on which those interpretations were based. Woolley was a good archaeologist, but he could not publish every object nor completely explain every decision when limited to the slow process of paper publication. With the advent of linked open data on the web, it is possible to present all of the data in machine-readable formats. Furthermore, in this format any grouping made by Woolley or anyone else can be quickly deconstructed and new ones created to allow for still newer interpretations or reanalysis of the old.

The excavation of Ur led to much of our current understanding of the ancient Near East and as such is clearly important to analyze and reinvestigate. But there are many other reasons to aggregate the data from it and others like it. Not only will this allow for new visions, but it will also unite and protect the information.

First, unification: The current information is physically divided. Laws of the early 20th century typically allowed for finds to be split between host nation and excavating institutions. This meant that from Ur, half of the artifacts went to Baghdad and the other half was split between London and Philadelphia. Even the archival documents are relatively dispersed. Gathering the data from all of the institutions will reunite the excavated portions of the city in a virtual space.

Second, protection: In the wake of the second Gulf War the Iraq National Museum was looted, demonstrating the fragility of our hold on physical data even in the modern age. The loss of cultural heritage is tragic, but at least if the information were recorded in a virtual space, there would remain researchable data for the future. Much of the material looted from the museum was eventually returned, but even in the case of some items protected in the Rafidan Bank vaults or moved to other secret locations before the war, there was some damage due to the impromptu storage conditions (chiefly the Nimrud Ivories, see McCauley 7/2/2010).

Even in the western museums, some artifacts have been lost to environmental conditions over decades of storage and a few items are now listed as ‘Not Accounted For’ with no clear understanding of how they went missing. Misplacement, loss, damage, or theft can occur anywhere. This is not to say that such occurrences should be overlooked, but there will likely be a small percentage of loss no matter what actions are taken to prevent it. Thus, every object in the care of museums must be carefully recorded to mitigate loss and to demonstrate the importance of each piece. The data must then be made available to researchers for continued understanding of these objects, individually and in the aggregate.

After the looting of the Iraq National Museum in 2003, the British Museum and the Penn Museum began to look at their collections in hopes of assisting Baghdad with understanding what may have gone missing. Ur was a site that provided many of the first entries into the Iraq Museum, since the modern country itself was being formed at the time those excavations began. The records of artifacts in the IM were not as complete as they might have been, but if Philadelphia and London could show what they had from this most important site, Baghdad could better assess their own collection. As it turned out, the recording in the two western cities was not as complete as might be hoped either, and thus began a long project to upgrade records with the goal of helping all three museums reunite Ur in a virtual space.

The project began by looking to a list of artifacts from the excavations but quickly expanded with the realization that there was much more information that should be digitally shared. Archives, photos, and field notes all held information on how and why Woolley had come to the conclusions he had. Furthermore, the artifacts were often not connected back to their field data, having lost their field numbers. The potential was there to reconnect them and to put them all online for everyone’s use. But in 2003 digital scholarship had not advanced to the level where such data could easily be published. It was possible, but it would take a great deal of money and time. Funds were slow in coming and small in amount. Thus the earliest project years managed to gather only sporadic material such as medium-resolution scans of field notes in the British Museum and field photos in the Penn Museum.

Finally, in 2011, with the increased capabilities of and interest in digital humanities, the Leon Levy Foundation graciously granted funds to conduct an exploratory year during which an assessment of work to be done could be made. The project set about determining how much Ur material was outside of Iraq, how scattered it was, and how long it would take to make it all digital. The exploratory year began in 2012, putting all of Woolley's 15,000 artifact cards, many of which covered multiple objects, into a database and separating them to create a list of and primary data on every object excavated and written up at Ur during the excavations. This assessment showed that around 40% of the objects in each museum had no clear connection to their original field data. It recommended that every artifact from Ur in both museums be examined and reconnected to field records wherever possible. Each artifact must also be assessed for its condition and for any need for conservation or repair. Publication of any item must be confirmed and any mistakes recorded in one easily accessible place.
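The card-splitting step described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the project's actual database design: the card identifiers, field names, and sample descriptions are all hypothetical, and the small `unlinked_share` helper simply shows how a figure such as the roughly 40% of unconnected objects could be computed once records are in machine-readable form.

```python
from dataclasses import dataclass


@dataclass
class CatalogueCard:
    card_id: str          # hypothetical card identifier
    descriptions: list    # one entry per object written up on the card


@dataclass
class ObjectRecord:
    object_id: str        # hypothetical per-object identifier
    card_id: str          # the card the object was separated from
    description: str


def split_cards(cards):
    """Create one record per object, keeping a link back to its source card."""
    records = []
    for card in cards:
        for index, description in enumerate(card.descriptions, start=1):
            object_id = f"{card.card_id}/{index}"
            records.append(ObjectRecord(object_id, card.card_id, description))
    return records


def unlinked_share(museum_objects):
    """Fraction of museum objects with no field reference recorded."""
    if not museum_objects:
        return 0.0
    missing = sum(1 for obj in museum_objects if obj.get("field_ref") is None)
    return missing / len(museum_objects)


# Invented sample data: one single-object card and one multi-object card.
cards = [
    CatalogueCard("card-0001", ["copper pin"]),
    CatalogueCard("card-0002", ["clay figurine", "shell bead", "stone amulet"]),
]
records = split_cards(cards)

# Invented museum inventory in which two of five objects lack a field link.
museum_objects = [
    {"accession": "A1", "field_ref": "card-0001/1"},
    {"accession": "A2", "field_ref": None},
    {"accession": "A3", "field_ref": "card-0002/2"},
    {"accession": "A4", "field_ref": None},
    {"accession": "A5", "field_ref": "card-0002/3"},
]
share = unlinked_share(museum_objects)
print(len(records), share)
```

Keeping the source card identifier on every object record means the separation can always be traced back to the original field documentation.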

After the initial work, the Leon Levy Foundation and Hagop Kevorkian Fund provided continuation grants to begin the individual examinations of artifacts and to continue scanning and transcribing all field notes, missives from the field, and ancient texts found at the site. This work is currently underway in both London and Philadelphia. Baghdad is conducting its own inventories but will hopefully join when its work allows.

As soon as possible, and beginning with portions of the data so as not to delay unduly, all of the information will be published with stable URIs and RDF/XML, JSON and/or other machine-readable references, connected to some version of ArchaeoML and/or CIDOC-CRM wherever possible (see Open Context for the format we are currently hoping to emulate). The site created will thus be a record of everything from Ur, essentially a modern publication, but also a research tool that provides ways of interlinking and presenting core data so that more can continuously be learned and published about the site and its history. In this way it will be a growing record, a continually expanding work. It will be searchable textually, visually, and spatially (though this last aspect will not be possible within the first two years). In other words, any keyword, artifact number, or other indicator can be entered into the site and all occurrences found throughout field notes, transliterated and translated cuneiform texts, catalogues, and publications; photos can be browsed and similar objects called up; and maps can be searched by area, room or tomb with the site populating these spaces with artifacts in context wherever possible.
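To make the idea of stable URIs and machine-readable references more concrete, the following is a minimal sketch of how a single object might be serialized as JSON with linked identifiers. The base URI, the property names, and the vocabulary type are all hypothetical; an actual publication would map them to ArchaeoML, CIDOC-CRM, or another established schema rather than to the invented terms used here.

```python
import json

# Hypothetical base URI; a real project would choose and maintain its own.
BASE_URI = "http://example.org/ur/"


def object_uri(object_id):
    """Build a stable, predictable URI from an object identifier."""
    return f"{BASE_URI}object/{object_id}"


def to_linked_record(object_id, label, context_id, same_as=()):
    """Describe one object as a JSON-ready dict whose links are URIs."""
    return {
        "@id": object_uri(object_id),
        # Invented type term standing in for a real schema class.
        "@type": BASE_URI + "vocab/Artifact",
        "label": label,
        # Link to the find context (area, room, or tomb) by URI.
        "foundIn": {"@id": f"{BASE_URI}context/{context_id}"},
        # Links to the same object in other catalogues.
        "sameAs": [{"@id": uri} for uri in same_as],
    }


record = to_linked_record(
    "0001",
    "clay figurine",
    "room-3",
    same_as=["http://example.org/museum/catalogue/123"],
)
serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
```

Because every reference is a URI rather than a free-text label, another data store can point at the same object or context without ambiguity, which is what allows the record to take part in the wider web of linked data.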

The site will be open to all and linked to related sites, such as the online catalogues of relevant museums and the Cuneiform Digital Library Initiative. Most importantly, the RDF that describes the data will be part of the overall linked data on the web so that other connections can be made with any other information also published in linked open format. This means that new ways of envisioning the data, new ways of assembling it with similar information from other sites, will be possible, leading to more encompassing and more complete understandings of the ancient Near East as a whole.

Truly this will be a virtual vision of Woolley’s excavation–and so much more.