This article is available at the URI http://dlib.nyu.edu/awdl/isaw/isaw-papers/20-12/ as part of the NYU Library's Ancient World Digital Library in partnership with the Institute for the Study of the Ancient World (ISAW). More information about ISAW Papers is available on the ISAW website.
©2021 Anne Hunnell Chen and Jamie Folsom; text and images distributed under the terms of the Creative Commons Attribution 4.0 International (CC-BY) license.
This article can be downloaded as a single file
ISAW Papers 20.12 (2021)
Origins and Antidotes of Omission: Southeastern European Archaeology, Linked Open Data, and the Possibilities for Archaeological Integration
Anne Hunnell Chen, Hofstra University, and Jamie Folsom, Hofstra University
In: Sarah E. Bond, Paul Dilley, and Ryan Horne, eds. 2021. Linked Open Data for the Ancient Mediterranean: Structures, Practices, Prospects. ISAW Papers 20.
URI: http://hdl.handle.net/2333.1/xsj3v7j6
Abstract: Focusing on the region of Southeast Europe as a case study, this article first details the intersection of historical causes that have led to that region’s marginalization in traditional Western European and Anglophone archaeological scholarship. It is suggested that the historical factors that led to the marginalization of the region have also prevented local scholars from participating in recent international efforts among archaeologists to use Linked Open Data (LOD) to solve some of the challenges presented by traditional print publication practices. We then argue that adoption of LOD publication practices may be a key remedy to the marginalization of areas like Southeastern Europe, if concerted efforts are made to ensure the integration of data and/or participation from geographic areas not currently well-represented by LOD initiatives. We propose that legacy data that is currently available in stand-alone digital databases be consolidated and published as LOD to “jumpstart” the integration of data from such regions, and that additional steps be taken to help ease the barriers to entry for willing local partners into the Linked Open Data ecosystem. As an example of the proposed “jumpstart” approach, the article concludes with the introduction of the Southeast Europe Digital Documentation (SEEDD) project and shares practical observations pertaining to the organization and undertaking of such an endeavor.
Library of Congress Subjects: Communication in archaeology--Balkan Peninsula; Linked data.
Introduction
Accelerating and facilitating the adoption of the set of practices and technologies referred to as “Linked Open Data” presents a distinct opportunity for archaeological scholarship. Archaeology has long been defined by a publication tradition that, due to the nature of fieldwork, presents features and finds across multiple small publications, often segregated by material type and by modern political boundaries irrelevant to the original contexts. Linked data technologies provide some promising solutions to challenges that have plagued traditional archaeological publication practices.
In particular, the vision of Linked Open Data (“LOD”) holds great promise for opening up data silos by allowing otherwise isolated resources to be interlinked, and by enabling the reuse of authoritative data. But while such advances present distinctly positive possibilities for both research and pedagogy, there is a danger that historiographic conditions with deep roots may result in differential uptake of these promising advances and inadvertently intensify the marginalization of historically marginalized data.
Archaeology has been relatively slow to embrace the potential of Linked Open Data, and in the meantime datasets continue to be developed, especially outside of Western Europe and Anglophone countries, that do not adhere to the emerging best practices and data standards that support interoperability. How do we deal with the fact that important datasets exist, both in print format and, increasingly, in difficult-to-access, proprietary, and closed digital formats? How can we begin to bring data “into the fold” from parts of the world that historically have been marginalized, and are therefore not currently participating in important international conversations about the future of research in archaeology, either due to a lack of awareness of these Linked Open Data standards and practices, or to a lack of available personnel to devote to the problem? Essentially, how can we use technology to reduce, rather than to exacerbate, the marginalization of historically marginalized regions?
Focusing on the region of Southeast Europe as a case study, this article first details the intersection of historical causes that have led to that region’s marginalization in traditional archaeological scholarship. We suggest that the historical factors that led to the marginalization of the region have also prevented local scholars from participating in recent international efforts among archaeologists to use LOD to solve some of the challenges presented by traditional print publication practices.
We then argue that adoption of Linked Open Data publication practices may be a key remedy to that marginalization, if concerted efforts are made to ensure the integration of data and/or participation from geographic areas not currently well-represented by LOD initiatives. We propose that legacy data from the region—that is, data already compiled and currently freely available in stand-alone digital databases—be consolidated and published as LOD to “jumpstart” the integration of data from such regions, and that additional steps be taken to help ease the barriers to entry for willing local partners into the Linked Open Data ecosystem.
Finally, as a practical example of the “jumpstart” approach we propose, we present the Southeast Europe Digital Documentation (SEEDD) project, an initiative currently in development that is aimed at harmoniously integrating a handful of extant digital datasets that all pertain to Roman archaeological material from Southeastern Europe and publishing the results as LOD. As these datasets are composed in different languages, conceived at different times, and defined by distinct organizational apparatuses, we enumerate the challenges we have encountered in the development of this project, and propose initial solutions to the difficulties confronted thus far. It is our hope that this project and the lessons learned in its organization may produce a sort of blueprint that will reduce the start-up time for others interested in designing similar interventions.
Origins of Omission
Among the international scholarly community, the 1980s and 90s saw the development of two intertwined research agendas among Roman archaeologists that were especially relevant to Southeastern Europe: the rise of Late Antique and Provincial Studies. Due to the intersectionality and theoretical prominence of these new research directions, the 80s and 90s should have been decisive for the international recognition of the archaeological importance of Southeastern Europe. Alas, this missed opportunity is in part what the title of this section is about.
With the publication of Peter Brown’s seminal World of Late Antiquity in the 1970s, the period between the ancient and medieval that was once widely derided as an era of decline was positively recast as vibrant and creative (Brown 1971). This recalibration opened up the subfield of Late Antique studies that has steadily gathered steam ever since. Meanwhile, the application of post-colonial theory to Roman and Native interactions in the provinces throughout the 80s and 90s stimulated a greater appreciation for and interest in the diversity of Roman material culture beyond, as well as within, the Italian peninsula (e.g. Versluys 2014; Mattingly 1997; Webster and Cooper 1996). Because the territories of Southeastern Europe were both provincial and a hotbed of Late Antique activity, this should have been a prime moment for them to rise to prominence on the international scholarly radar. However, this was in fact not the case, and the omission requires explication. Like most issues of this sort, the explanation for the marginalization of Southeastern Europe from international archaeological conversations lies in a confluence of many factors. These factors can be grouped into two broader categories: one pre- and one post-circa 1980.
1. Pre-1980s
1.1. The Linguistic Barrier
The first of the pre-1980s phenomena that have contributed to contemporary marginalization of this region in international archaeological discourse is, unsurprisingly, linguistic. With their Slavic roots, Bulgarian, Serbo-Croatian, Macedonian, and Slovenian (the principal languages used in previous generations for the publication of research conducted within the national boundaries to which they pertain) are languages that few in Western European and Anglophone communities have command of (Simons and Fennig 2018). As a Romance language, Romanian is perhaps less of a barrier, but still not a language with the global reach of, say, French, German, or English. This is not to say that no effort has been made on the part of Southeastern European scholars to publish their research in non-native languages in order to increase the reach of their work: there are summaries, and increasingly, whole articles in Western European languages, especially in more recent decades. For instance, today, the flagship journals Starinar and Archaeologia Bulgarica, published by the Archaeological Institute in Belgrade and Bulgaria’s National Archaeological Institute, respectively, as well as many of the catalogues and monographs published under the auspices of the same national entities, provide full texts or substantive summaries in both the local language and a widely-read second language (often English). However, for many decades the vast majority of discourse pertaining to the Roman archaeology of the region required engagement with resources in unfamiliar languages, and this requirement alone set the bar for entry rather high for non-speakers.
1.2. Circulation Barriers
In the pre-internet, pre-globalization-boom age, hand-in-hand with the linguistic barrier was the more limited circulation of journals and other publications where the majority of new research from the region was presented. Although an imprecise measure, a WorldCat search to determine which worldwide libraries today contain an uninterrupted series of archaeological periodicals from the region supports this hypothesis: selected volumes, not necessarily the uninterrupted series (in fact, holdings are often limited to more recent volumes that are more commonly available via e-Journal), are available worldwide for Starinar in c. 250 libraries; Situla in c. 67 libraries; Dela in 15 libraries; Arheološki Pregled in eight libraries; Razvitak in 11 libraries, and Archaeologia Jugoslavica in 10 libraries. Notably, the German library circuit is particularly well-furnished with regard to periodical publications from Southeast Europe. For comparison’s sake, according to WorldCat, 1,307 worldwide libraries contain the print volumes of the American Journal of Archaeology and 906 have access to the same periodical’s e-Journal. If circulation of periodicals from Southeastern Europe is thin in our own era of e-journals and increased ease of international contact, one can imagine that circulation in previous decades was no better, and most likely much more limited.
As further elaborated below, Bulgaria, Romania, and Albania’s place behind the Iron Curtain likely complicated both the acquisition of print publications and the face-to-face exchange of ideas and research with scholars in the west. And while from the late 1940s Tito’s communist Yugoslavia no longer aligned with the Soviet Union and maintained cordial relationships with Western democracies, its government only opened its borders to all foreign visitors and relaxed visa requirements in the late 1960s (Glenny 2012: 545-633; Lane 2004). The Cold War engendered the isolation of people and ideas from nations with opposing politics (Nygård, Strang, and Jalava 2018), thus inadvertently helping to shape the Roman historical narrative as perceived by Western powers.
1.3. Historical Connections
But the region’s marginalized position in Western European and Anglo-American archaeological awareness is not solely the fault of fallout from Cold War politics. It has roots that go back much further to the eras of the Grand Tour and European imperialism (Nygård, Strang, and Jalava 2018; Wolff 1994). Much of the received canon of touchstone Roman monuments was codified with the evolution of Grand Tour itineraries from the late 16th to the 19th centuries (Haskell and Penny 1981). Routes were more or less standardized, taking Tourists generally from London, through France to Paris, and throughout Italy, often with stops in the Netherlands, Germany, and Switzerland. Only a very few adventurers went further afield to Spain, Greece, or Turkey (Buzard 2002). Ottoman-controlled territories, including Southeastern Europe, were never part of the standard itineraries. Rocky, mountainous terrain, as well as poorly maintained roads and limited railways made travel in the region particularly difficult. Intermittent war and Muslim-Christian tension did nothing to encourage Western travelers to stray from the standardized itineraries into Eastern Europe (Sugar 1977). Despite such deterrents, the region was not completely off the map for Western travelers. Famed English architect and designer Robert Adam visited Diocletian’s coastal palace at Split in the 18th century, as did a handful of other Western European artists and travelers (Belamaric and Šverko 2017; Pelc 2018; Živić 2003: 17); Split’s coastal location, along with the fact that much of the Adriatic coast (including Split) never fell to Ottoman occupation (Sugar 1977), made the palace more accessible than other impressive landlocked sites in the region. It should come as no surprise, then, that it is this monument alone among Roman commissions in Southeastern Europe that has a place in the introductory texts and surveys used in Anglo-American classrooms (e.g. Ramage and Ramage 2015; Ward-Perkins 1994; Sear 1982).
The imperialism of Western European nations probably also has a role to play in Southeastern Europe’s marginalization. The Roman provinces that did not intersect with Grand Tour itineraries, but where modern Western European and Anglo-American knowledge of archaeology is more pronounced than it is for Southeast Europe, are—not coincidentally—territories where Western powers held colonies, protectorates, or client states at the height of European imperialism in the 19th and early 20th centuries. The French, for instance, were active in North Africa, Syria, and Lebanon, while the British held Egypt and Palestine. It makes good sense that the early scholars, travelers, and adventurers upon whose knowledge today’s traditional Western narrative of Roman history is based, would choose to travel to and work in places where colonial connections could facilitate communication and access (Ouahes 2018; Fraser 2017; Picarella 2007; Munzi 2004; Mattingly 1996). The territories of Southeastern Europe, meanwhile, were largely under Ottoman control from the 14th century until the latter part of the 19th century (Sugar 1977). There was no earlier nor subsequent long-term occupation of these territories by the imperialist states of Western Europe, and thus no colonial ties on which Western antiquarians could draw.
2. Unrealized Opportunities for Integration: Post-c. 1980
By 1980, then, there were already a host of circumstances conspiring to place the archaeology of Southeast Europe outside the intellectual milieu of most English-speaking and Western European scholars. But an additional two factors meant that the region faced a particularly steep barrier to popular attention. First, provincial topics of any geographic locality were marginalized within Roman studies prior to the pioneering work of the 1980s and 90s. And second, Southeast Europe’s strength, as judged both by the importance and density of its archaeological sites, including, among other gems, significant remains of at least four imperial palaces, a critical military limes, and an archbishopric built to monumentalize the birthplace of Justinian, lay particularly in Late Antiquity, which was itself a period long understudied due to its pejorative designation as an age of decline. Taken together with the legacies of linguistic, circulation, and political factors discussed previously, the region was triply damned.
With the rise of Late Antique and Provincial Roman studies from the end of the 20th century (two intersecting growth areas of scholarly concern that were particularly relevant for the region of Southeast Europe), one would expect international engagement with the archaeological material of the region to have undergone a dramatic transformation in this era, one that should have served to counter pre-1980 trends in scholarship. While there was indeed a limited uptick in international interest in the archaeology of the region at this time and in the following years (Oltean 2010; Gudea and Lobüscher 2006; Wilkes 2005; Wilkes 1992; Lengel and Radan 1980), the region has failed to take up the prominent position, particularly in the consciousness of scholars of the Roman provinces, that the importance of the archaeological remains warrants. Why hasn’t the archaeology of Southeastern Europe risen to eminence in recent decades, especially (at least) among scholars of the Roman period? The answer probably comes down to politics.
The precise period of increased scholarly interest in Late Antique and provincial topics is simultaneously a period wherein Bulgaria, Romania, and Albania remained behind the Iron Curtain (until 1989 and 1990), and Yugoslavia descended into the ethnic conflict that would ultimately result in war and crippling international sanctions throughout the 1990s (Glenny 2012; Lane 2004). Archaeological possibilities—both in work on the ground for foreign and local archaeologists, and opportunities for local archaeologists to present their research to an international community—are shaped by political realities. While excavation did continue behind the Iron Curtain and in Yugoslavia during its period of tumult in the 90s, incorporation of the ideas and material generated by such excavations into international conversations is precipitated by opportunities for scholars to meet at conferences and/or read each other’s work in published form—both factors that were crippled by the lasting effects of Cold War isolationism and the crumbling Yugoslavia’s dire economic circumstances. Journal subscriptions, conference registration fees, and travel are all politically- and economically-dependent luxuries that have been limited by Cold War and Yugoslav tensions and the economic aftermath that continues into the present. The instability of the area in the 80s and 90s also made it a difficult, if not impossible, region for foreign scholars to travel in for the purposes of making professional connections and visiting archaeological sites at the crescendo moment for Late Antique and Provincial Studies. Political and economic factors no doubt shaped the potential for international collaborations and the circulation of scholarly literature in both directions.
Although the wars, both cold and hot, are finished and sanctions lifted, the economic impact of international sanctions has had a lasting effect on the former Yugoslav portion of the region. Extreme financial hardship (Kowalczyk 2017) has continued to curtail the ability of local scholars to travel frequently to conferences where the majority of Western European and Anglophone scholars congregate. Meanwhile, the fracturing of national borders with the break-up of Yugoslavia caused an even greater disjunction of archaeological scholarship. As scholars in the region continue their efforts to distribute their research more widely and in languages with larger readership, it is today common for analog publications (e.g. Migotti 2012) as well as digital database projects to take modern national borders as convenient delimiters for the material they consider. Such a trend can mean that sites and objects that should be considered together are sometimes held in artificial isolation. While the nationally-delimited digital databases that are being developed and maintained in the region, as for instance the admirable National Archaeological Record of Romania (http://ran.cimec.ro), go a long way toward increasing the accessibility of some archaeological data from the region in an effort to mitigate the historical factors discussed previously, linguistic factors are still a limitation on circulation. Information in such databases is frequently only available in the native language, and such resources are not currently well networked with other similar international resources. The absence of updated, synthesizing publications or digital resources that highlight the important work accomplished by local colleagues has made the region hard for outsiders to get command of.
This, coupled with the fact that the region has not traditionally featured prominently (if at all) in the classroom education of the current generation of Roman archaeologists outside of Southeast Europe and immediately adjacent areas (most notably Germany), makes the region a difficult one to integrate into the contemporary Western classroom. This in turn further entrenches the marginalization of the region in international consciousness for yet another generation.
In Search of Antidotes to Omission
1. Digital Databases: An Important, but Limited Beginning
Digital databases and the development of an LOD digital ecosystem hold great promise for overcoming the confluence of factors, including the inherent limitations of print media, that have led to the marginalization of regions like Southeastern Europe in the international archaeological conversation. Open access digital databases are key to addressing circumstantially-shaped data circulation issues, and an important first step in increasing the likelihood that such regions will begin to be better integrated into international classrooms in the future. As alluded to above, however, digital databases are not without problems that still present challenges to the international circulation and usability of the data they contain. While important, open access digital databases cannot alone stand as the whole solution to better integrating regions long peripheralized in international archaeological awareness. Unfortunately, for the linguistic and data-structuring reasons detailed below, simply converting analog data into a digital format and making that data available online does not necessarily make it as widely discoverable or as useful for sharing across projects as one might initially expect.
In the case of Southeastern Europe, even open-access digital database efforts centered on data from the region, drawbacks and all, are limited. Perhaps this is tied to economic fall-out from Cold War isolation and economic sanctions in the wake of Yugoslavia’s dissolution. With financial resources tight, and ever-present conservation issues and rescue projects understandably taking precedence in state-run archaeological and cultural heritage institutes, increasing international access to legacy data has not been the place financial or personnel resources have been focused.
There are several reasons why stand-alone digital databases provide only a limited improvement over traditional ‘analog’ data presentation. In the first place, with digital projects related to the ancient world quickly multiplying and based in countries across the globe, and no authoritative directory or other aid to assist researchers in discovering all relevant resources, it is difficult to have knowledge of and keep tabs on all available academic digital content. Even with such knowledge, the current majority of digital resources and databases relevant to archaeology are not well-integrated with one another and require that a researcher perform stand-alone searches on each individual platform and manually assemble results (Bagnall and Heath 2018).
A limiting factor with regard to scholars’ knowledge of available digital resources ties back to language. User-interfaces centered around a single language are still prevalent, and Southeastern Europe’s diverse array of local languages (which has expanded even further with the break-up of Yugoslavia), none of them with a large linguistic base outside of their native area, surely limits the ready discovery of digital databases and other online resources produced in the region. This is again exacerbated by the financial limitations for countries in the region since researchers are often limited in their ability to travel widely to international conferences to network and “market” the merits of locally-developed resources. While the design of digital databases or projects originating in the region sometimes makes it possible to toggle between a primary (local) language and a secondary language with wider circulation, often this is only the case for navigation features and metadata fields (as for example between Romanian and English on the National Archaeological Record of Romania database). This limits the data to the primary language and therefore does little to reduce the language barrier that has long impacted international engagement with materials from Southeast Europe, and secondarily, assuming linguistic differences among commonly-searched keywords, limits serendipitous discovery of such resources.
Related, but potentially even more critical, is the fact that as it currently stands in terms of digital and traditional print publication, there is a wide array of terminology worldwide (and in multiple languages) that refers to the same concept. Take the concept of the loom weight, an object common to a variety of archaeological sites. In English alone, it is possible to write this as loom weight, loom-weight, and loomweight. Add to this the fact that every other language in the world has its own way of expressing the same concept (peso de telar in Spanish, Webgewicht in German, and so on) and it becomes clear why it would be difficult for either a human or a machine-based search to gather all of the relevant material from either print publications or digital resources (Harpring 2018; Harpring 2010; May, Binding, and Tudhope 2014).
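As a minimal sketch of how a search tool might reconcile such variants, the Python below maps the spellings and translations mentioned above onto one shared concept identifier. The identifier `http://example.org/concept/loom-weight` and the function names are hypothetical, introduced only for illustration; a real project would draw its identifiers from a published controlled vocabulary.

```python
import unicodedata
from typing import Optional

# Hypothetical concept identifier, used here only for illustration.
LOOM_WEIGHT = "http://example.org/concept/loom-weight"

# Variant terms taken from the examples in the text, stored in normalized form.
VARIANTS = {
    "loom weight": LOOM_WEIGHT,
    "peso de telar": LOOM_WEIGHT,
    "webgewicht": LOOM_WEIGHT,
}


def normalize(term: str) -> str:
    """Case-fold a term and treat hyphens and runs of spaces as one space."""
    folded = unicodedata.normalize("NFKC", term).casefold().replace("-", " ")
    return " ".join(folded.split())


def concept_for(term: str) -> Optional[str]:
    """Return the shared concept identifier for a term, or None if unknown."""
    key = normalize(term)
    if key in VARIANTS:
        return VARIANTS[key]
    # Also match closed-up spellings such as "loomweight".
    squeezed = key.replace(" ", "")
    for variant, concept in VARIANTS.items():
        if variant.replace(" ", "") == squeezed:
            return concept
    return None
```

With this sketch, `concept_for("Loom-Weight")`, `concept_for("loomweight")`, and `concept_for("Webgewicht")` all resolve to the same identifier, while an unlisted term returns `None`. The variant list still has to be compiled by hand, which is the labor that shared vocabularies aim to pool.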
There is an analogous problem regarding the naming traditions that refer to a single place (whether a settlement or a particular feature within a site). English archaeological literature alone often has variations in the names used to refer to a single building. Think of, for example, specific interpretationally-based names that have a long tradition in the literature, like the “Temple of Jupiter” versus a more conservative and banal “Tetrarchic period temple,” both referring to the same structure but the latter acknowledging the insecurity of the conventional identification. Add to this different linguistic or chronological variants and the problem multiplies (Jenstad, de Beer, and Vitale 2018; Harpring 2018; Harpring 2010). All of this reinforces data silos.
In addition to these language-related challenges, there are a number of data modeling issues that can complicate the use of independently-designed databases. We review a few of those here, with a focus on problems likely to be encountered by a researcher without specific database training or help.
As a simple example, relational database modeling, commonly used in research data management, collects records about each conceptually distinct real-world “kind of thing” into their own distinct table, with a row per object, and as many columns as needed to capture that kind of thing’s particular metadata. Records in separate tables can then be reconnected to one another by asserting links, or relationships, between them in a third “association” table.
It would be logical to store archaeological sites and objects in their own respective database tables, for instance, and to create a relationship between a record in the “sites” table and a record in the “objects” table, to assert that an object came from a site. That same site record can be related to many objects, and need never be re-entered. If more detail about a site, or newer data, becomes available later, that detail can be added to the existing site record, and all of its related objects get the latest information with no further effort. While that kind of data structuring can be very helpful in accelerating work and reducing error, beyond simple instances like the one above, which map neatly to real-world objects and concepts, it requires training not necessarily possessed by scholars trained primarily as archaeologists (Baker 2014).
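The sites-and-objects arrangement described above can be sketched with Python's built-in `sqlite3` module. The table names, column names, and sample values below are placeholders chosen for illustration and are not drawn from any actual project database.

```python
import sqlite3

# An in-memory database with one table per "kind of thing" and a third
# association table that links objects to the site they came from.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sites (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE objects (id INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE site_objects (
        site_id INTEGER REFERENCES sites(id),
        object_id INTEGER REFERENCES objects(id)
    );
""")

# Placeholder data: one site entered once, related to two objects.
conn.execute("INSERT INTO sites VALUES (1, 'Example Site', NULL)")
conn.executemany(
    "INSERT INTO objects VALUES (?, ?)",
    [(1, "loom weight"), (2, "oil lamp")],
)
conn.executemany("INSERT INTO site_objects VALUES (?, ?)", [(1, 1), (1, 2)])

# Newer detail about the site arrives later and is recorded in one place.
conn.execute("UPDATE sites SET region = 'Example Region' WHERE id = 1")

# Every related object now reports the updated site detail.
rows = conn.execute("""
    SELECT o.description, s.name, s.region
    FROM objects AS o
    JOIN site_objects AS so ON so.object_id = o.id
    JOIN sites AS s ON s.id = so.site_id
    ORDER BY o.id
""").fetchall()
```

The single `UPDATE` is the point of the design: the site's region is stored once, so both object rows returned by the query carry the new value without being edited themselves.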
Perhaps more importantly, even when a researcher does have either skill or help in structuring data, decisions about the structure of those data are certain to be driven by the specific concerns of the project at hand, and the preferences, skills and knowledge of the primary members of the research team. For these reasons among others, it would be surprising for any two databases to be structured exactly the same way, let alone more than two. Do “site” records contain the full-text name of the region they are found in, for instance, or latitude and longitude values (or both), or perhaps a reference to another database table, containing those details? While it is sometimes possible to extract equivalent results from two differently-structured databases, using correctly crafted queries, it is often not so simple to combine data from those same two databases into one.
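To make the difficulty concrete, the sketch below imagines two hypothetical projects that record the same site differently: "Project A" stores the region as free text, while "Project B" stores coordinates plus a reference into a separate regions table. All field names and values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    """Common target shape; None marks detail that a source does not record."""
    name: str
    region: Optional[str]
    latitude: Optional[float]
    longitude: Optional[float]


def from_project_a(row: dict) -> Site:
    # Hypothetical Project A: region as full-text name, no coordinates.
    return Site(row["site_name"], row["region"], None, None)


def from_project_b(row: dict, regions: dict) -> Site:
    # Hypothetical Project B: coordinates, plus a key into a regions table.
    return Site(row["name"], regions.get(row["region_id"]), row["lat"], row["lon"])


# Placeholder records describing the same site in each project's structure.
row_a = {"site_name": "Example Site", "region": "Example Region"}
row_b = {"name": "Example Site", "region_id": 3, "lat": 44.0, "lon": 21.0}
regions_b = {3: "Example Region"}

site_a = from_project_a(row_a)
site_b = from_project_b(row_b, regions_b)
```

Even in this small case, each source needs its own hand-written adapter, and the two harmonized records still disagree (`site_a` has no coordinates), so a further decision is required before they can be treated as one record.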
Even when such significant data modeling challenges are met at the technical level, having a digital dataset and putting it online does not necessarily ease collaboration. Different projects may make different choices as to database software, which may present an additional technical barrier to interoperation between projects. One project may select FileMaker Pro, another Microsoft Access, another MySQL, each of which stores, structures, and queries data differently from the others. While that issue may be mitigated by establishing a process for exporting raw data to a common format to be shared across projects (such as comma-separated values in a text file), such a process adds work, slows collaboration, and means that changes to the source dataset render the exported copy out of date.
While putting data online for others to find, search, use, and copy is an important step in increasing access to archaeological data, it unfortunately does little if anything to address the problems of data structure and format described above. It also does not guarantee that those data will be discovered by others to whom they would be useful, nor that those who do discover them will not simply take a copy, which will fall out of date when the source data are updated.
2. Archaeological Linked Open Data: Pros and Cons
The concept of Linked Open Data could go a long way toward capitalizing on the positives of digital datasets while helping to solve some of the difficulties associated with them, such as those outlined above. In addition, joining the move toward making archaeological data available digitally with an uptake of LOD principles holds the potential to mitigate several long-standing problems stemming from trends in traditional print publication.
2.1. What is Linked Open Data and Why is it Useful?
The aforementioned challenges with regard to making data structures and formats interoperable and discoverable are not concerns limited to archaeology. Researchers in any data-intensive field are faced with similar challenges of maintaining and sharing datasets, and of creating connections between differently-developed and separately-maintained datasets.
A practical set of solutions to those common challenges, collectively referred to as “Linked Data,” has been under discussion and development, and increasingly widely adopted for government, academic, and commercial applications, among others, for over 10 years. Tim Berners-Lee, the author of some of the technological underpinnings of the World Wide Web, articulates the idea in a way that can be summarized as follows.
Just as content published on a website, by a person, for other people’s consumption, can contain a link to content published on another website, data should be able to refer to other data, using the same basic infrastructure. A Uniform Resource Identifier (a URI, like a web address) can identify an object unambiguously; Hypertext Transfer Protocol (HTTP) can be used to retrieve it; and a standard data format (in this case Resource Description Framework) can be used to describe it in ways that the data consumer can understand. Taken together, and when used to link data to other data, this practice is called “Linked Data.” When the data in question is made available under a license that permits its reuse, it’s called “Linked Open Data” (“LOD”). (Berners-Lee 2006)
Perhaps the most important feature of LOD, beyond making it possible to describe your own data in a standard way, using the Resource Description Framework (RDF), and making it easy to share with others, is that it makes it easy to refer to, or link to, others’ datasets from within your own. This is of particular interest in academic and research contexts, and highly relevant to the challenges of most concern within the archaeology of Southeastern Europe, where it is clear that scholarship is highly dispersed and fragmented along a number of important axes.
If a researcher can assert in her data record about an object, for example, that it was found at a given Roman site in Southeastern Europe about which another researcher’s data is authoritative, that may add significant new value to the collective knowledge, both about the object and about the site, while not fragmenting the knowledge about either. LOD makes those kinds of assertions not only possible, but practical.
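The kind of assertion described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any project's actual implementation: the object and site URIs are hypothetical placeholders on the reserved example.org domain, and the choice of `rdfs:label` and `dcterms:spatial` as predicates is our assumption for the sake of the example. The sketch writes two statements in the N-Triples serialization of RDF: one labelling the object, and one linking it to a site record maintained elsewhere.

```python
# Hypothetical identifiers: example.org is a reserved placeholder domain,
# not a real repository or gazetteer.
OBJECT_URI = "http://example.org/seedd/object/1"
SITE_URI = "http://example.org/gazetteer/place/42"

# Predicates from widely used vocabularies (chosen here only for illustration).
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCT_SPATIAL = "http://purl.org/dc/terms/spatial"


def uri(value):
    """Format a URI as an N-Triples IRI term."""
    return "<" + value + ">"


def literal(text, lang):
    """Format a language-tagged string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"@' + lang


def to_ntriples(triples):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(s + " " + p + " " + o + " ." for s, p, o in triples)


triples = [
    # The object's own description, held in the researcher's dataset.
    (uri(OBJECT_URI), uri(RDFS_LABEL), literal("Portrait head of a man", "en")),
    # A link out to another researcher's authoritative record for the site.
    (uri(OBJECT_URI), uri(DCT_SPATIAL), uri(SITE_URI)),
]

print(to_ntriples(triples))
```

The second statement is the "link" in Linked Data: the object record does not copy the site's description, it simply points to the URI where that description is maintained.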
In cases in which researchers compile and structure data separately and differently (along the lines described in the previous section), the features offered by LOD have great potential. Researchers can publish records in a standard format, attach unambiguous terms for the concepts they contain, and refer to similarly published records in other repositories. This reduces or eliminates some or all of the barriers posed by differences in the underlying schemata, systems, and naming conventions of the systems in which those data are collected.
2.2. Archaeology and Linked Open Data
The analog publication process presents a number of limitations for archaeological data. In the traditional print medium, due to the nature of fieldwork that often unfolds over a number of field-seasons, archaeological features and finds from a single site are frequently presented in multiple small publications (see for instance, the plentiful, but thus far unintegrated, publications on recent archaeological work at Felix Romuliana/Gamzigrad). Furthermore, in both print and digital resources, data is often segregated by material type (epigraphy with epigraphy, glass with glass, churches with churches, etc.) and by modern political boundaries irrelevant to the original contexts. Together, these conventions and trends have sometimes made it difficult to appreciate and fully understand the visual and archaeological environments to which excavated structures/objects belong.
As discussed above, Linked Open Data has the potential to join up data sets across national lines, languages, and metadata schemata, in ways that may make it easier to appreciate how archaeological objects and sites were contextually situated. Global adherence to an LOD ecosystem would reduce the burden on scholars to keep tabs on all multilingual print publications and stand-alone digital resources in development across the world. Because LOD does not require wholesale re-engineering of existing tools and practices, nor wholesale aggregation, cleanup, and translation of data before it begins to show results and benefits, it offers significant promise, especially in areas and for historical periods where scholarship has been fragmented for a range of reasons. However, it has been noted that the field of archaeology has been slow to fully embrace the potential of LOD (Geser 2016; May, Binding, and Tudhope 2014). Given that a relative consensus about “best practices” in LOD only emerged a decade ago (and as far as practical details are concerned, is still emerging), it is not surprising that such practices have been adopted somewhat unevenly in archaeology.
But the field, especially ancient-world archaeology, is gradually warming to the idea of Linked Open Data and finding solutions to the initial hindrances to its application among archaeologists (concisely articulated in Geser 2016, 35-42). In an effort to solve some of the linguistic, terminological, and data-structuring difficulties inherent to independent digital datasets (and to some extent print publications as well) alluded to above, and to advance the use of LOD principles in the field of archaeology, significant efforts have been made to establish standard terms for both common and archaeology-specific concepts. The W3C has developed the Simple Knowledge Organization System, or SKOS, as a general organizing conceptual framework (W3C, 2009). The Getty Trust has created the Art and Architecture Thesaurus (AAT) and other thesauri relevant to cultural heritage (The Getty Research Institute, 2017); the CIDOC-CRM and its extensions focus on cultural heritage documentation (CIDOC-CRM, 2015), and the ARIADNE project builds its archaeology-specific conceptual model on the foundation of the CIDOC-CRM (ARIADNE, 2012). The SENESCHAL project, along with the STAR and STELLAR projects on which it builds, is focused specifically on making vocabularies for archaeology available as Linked Open Data (May, Binding, and Tudhope 2014).
These standards form a foundation on which digital archaeology and archaeological LOD publication efforts can build, by offering terms to be used to refer unambiguously to real-world concepts, and frameworks whereby to build domain-specific vocabularies. In addition, there have been efforts at incorporating standard ontologies into software tools to support archaeology research (Kansa, 2014). Especially in historically neglected regions and periods, where a good deal of basic practical data collection and cleaning work remains to be done, these standard ontologies may be useful in disambiguating domain-specific concepts and terms.
As for the prevalence of archaeological LOD, the density maps in the ARIADNE portal (http://www.ariadne-infrastructure.eu/), a major initiative designed to integrate existing digital archaeological datasets using Linked Open Data principles, show that Western Europe has made great strides in joining up fragmented datasets, particularly those related to material in Austria, Britain, the Czech Republic, France, Ireland, Italy, and the Netherlands. Within Southeastern Europe, however, the situation is much different. At present, the ARIADNE density map shows much of Southeastern Europe unrepresented in archaeological Linked Open Data. Slovenia is a notable exception, while Bulgaria and Romania indicate the beginnings of LOD uptake. Among areas where the Roman empire once ruled, Albania, Bosnia and Herzegovina, Croatia, North Macedonia (FYROM), Hungary, Montenegro, and Serbia currently show virtually no Linked Open Data adherence.
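To illustrate how shared vocabularies can disambiguate terms across languages, the following sketch resolves locally used labels to a common concept identifier. It is only an illustration under stated assumptions: the concept URIs are hypothetical placeholders on example.org (a real project would substitute identifiers from a published thesaurus), and the labels are invented for the example. The `build_index` and `resolve` helpers are likewise our own names, not part of any vocabulary service.

```python
# Hypothetical concept URIs on a placeholder domain; a real project would
# use identifiers from a published vocabulary instead.
CONCEPTS = {
    "http://example.org/vocab/portrait-head": {
        "en": ["portrait head"],
        "de": ["Porträtkopf"],
    },
    "http://example.org/vocab/inscription": {
        "en": ["inscription"],
        "de": ["Inschrift"],
    },
}


def build_index(concepts):
    """Map every (language, normalized label) pair to its concept URI."""
    index = {}
    for concept_uri, labels in concepts.items():
        for lang, terms in labels.items():
            for term in terms:
                index[(lang, term.casefold())] = concept_uri
    return index


def resolve(index, term, lang):
    """Return the concept URI for a local term, or None if it is unknown."""
    return index.get((lang, term.strip().casefold()))


index = build_index(CONCEPTS)

# Two records in different languages resolve to the same concept.
english = resolve(index, "Portrait Head", "en")
german = resolve(index, " Porträtkopf ", "de")
print(english == german)
```

Once both records carry the same concept URI, a search for one label can retrieve records catalogued under the other, without either dataset having to change its local terminology.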
This situation clearly indicates that closer ties to Western Europe, that is, integration within the European Union, and presumably access to the economic and intellectual connections and resources that EU membership engenders, is impacting the likelihood of LOD uptake among contemporary archaeologists in Southeast Europe. In the meantime, datasets continue to be developed, especially outside of Western Europe and the EU--whether analog or digital--that do not adhere to developing best-practice standards that allow for LOD interoperability. Rather than providing a way to correct past oversights by furnishing a path to the integration of Southeastern European archaeological data into international consciousness, the current development trajectory of LOD among archaeologists is shaping up to further entrench the marginal status of the region’s data long ago established due to historiographical factors and the limitations of the print medium.
With Linked Open Data initiatives among archaeologists on the rise, it is essential that international LOD adherence patterns, together with intellectual circulation networks more generally, be critically examined in order to identify--and if possible address--areas of non-adherence. Such concerted attention may help to head off inadvertent continued exclusion of areas already marginalized in international archaeological narratives. One possible solution to such a challenge is for archaeological scholars in areas where Linked Open Data is on the rise (i.e., Western Europe and Anglophone countries) to invest in partnering with local institutions to bring legacy data resources relevant to traditionally marginalized areas into Linked Open Data compliance, thereby establishing a baseline of archaeological data for areas not currently well-represented.
Apart from generating more useable, linkable data for the region of Southeast Europe (to the benefit, presumably, of both local and international scholars), such projects should also make it a priority to seek out working relationships with local colleagues in order to identify and work toward addressing the specific factors that each local partner faces in publishing legacy data as LOD. Easy-to-follow, step-by-step guidelines, including how to work with specific legacy data formats, may lower the barrier to entry for local partners who wish to publish datasets as LOD but lack necessary resources or know-how. It is therefore a core objective of the Southeast Europe Digital Documentation project (SEEDD), described in the next section, to facilitate that “onboarding” process.
SEEDD is an initiative designed to facilitate the integration of Southeastern Europe into the growing archaeological LOD ecosystem, and catalyze the further uptake of LOD standards in the region and elsewhere. Below we detail the general design considerations of the project, as well as some of the challenges and proposed initial solutions for undertaking such an initiative. It is our hope that the lessons learned in the conception of this project may help to jumpstart similar initiatives for analogous regions in the future.
SEEDD as a Sample Project
The Southeast Europe Digital Documentation project (SEEDD) is focused on aggregating and publishing archaeological data from Southeastern Europe currently housed in stand-alone digital databases, as LOD. This intervention is aimed at establishing, with relative speed, a substantial baseline for a region currently represented at distinct disadvantage relative to Western European countries. Additionally, it is our hope that such an endeavor may serve as a demonstration of the reciprocal benefits of LOD to local and international audiences alike, and with time inspire the continued uptake of LOD principles within the region.
The region today designated as Southeastern Europe was once entirely within the Roman Empire, but is now subdivided into several modern nation states. As a result of the superimposition of modern political borders, and the concomitant cultural and linguistic divergence, there exists a certain amount of fragmentation among modern research efforts in the region.
SEEDD is attempting to reconcile several specific datasets, and to establish a corpus of LOD for Southeastern European archaeological data relatively quickly. Preliminary partners for the SEEDD project include the Last Statues of Antiquity Database (Oxford, http://laststatues.classics.ox.ac.uk/), the National Archaeological Record of Romania (RAN; http://ran.cimec.ro/sel.asp?Lang=EN), and Die Bilddatenbank Ubi Erat Lupa (http://lupa.at/). In coming to grips with the challenges inherent in that process, and by documenting and sharing our experience, we may help others avoid the time-consuming trial-and-error of our own work. Below, we share some insights, and flag other areas where our forthcoming conclusions may interest researchers aiming at similar ends. In sharing the process we are developing, and the toolset we have assembled, including considerations of both data cleanup and LOD publishing, we intend not only to provide insight into our project’s specific process, but also a template for projects aimed at solving analogous problems in the field. We plan to:
- Describe problems encountered and initial solutions to integrating data from independently conceived digital databases.
- Identify a range of entry points so that projects like SEEDD, which seek to break down data silos, can make use of SEEDD’s process.
- Describe and document a process for adoption of open data practices by projects from each entry point.
- Collect and document existing services and tools for implementing those processes.
We are using the following process to develop those tools.
Process
Overview
For projects like SEEDD, there are at least three main areas of effort. The first two are concerned simply with collecting the data in a relational database system; the third is concerned with moving beyond the database “silo” to publish LOD.
- To compile an initial dataset, and refine the domain-specific metadata schemata. In our view, it is natural for a project to model and capture its data however is most appropriate and efficient at this stage, although we will try to offer some guidance drawn from our experience and from the standards efforts we have encountered.
- To integrate data from other relevant sources if any, modifying and expanding data schemata as necessary to support the combined datasets, and to support multiple languages.
- To publish data using an appropriate RDF serialization format for Linked Open Data, and either to make it discoverable on a stable domain, and/or to share it with an aggregator like ARIADNE to support research across repositories.
Those steps advance the project’s data collection goals, while providing opportunities both to evaluate appropriate process and tools, and also to document challenges, particularly regarding aggregating data from several source languages, and contending with different data schemata.
Compile
In collecting an initial dataset, a project will inevitably replicate some of the “siloing” that we are ultimately seeking to address. However, such a step has two advantages. First, it is very practical for capturing data at the highest resolution and fidelity possible, including coming to grips with the questions of divergent schemata, different languages, and so on. Indeed, that advantage explains, to a great extent, why archaeology suffers from such pervasive data fragmentation and struggles to share data effectively. But since we are setting out to address those issues, taking this as a first step has a second advantage. It allows a project to take a “first person” perspective, from which to make recommendations to owners of other existing datasets about how they can move their data from silos to a more open relationship with the larger research community.
Example
As noted, projects that partner with SEEDD, whether they collaborate by sharing data to be aggregated, or make their collections available to be indexed, may store object and site metadata in divergent schemata and different languages from one another. As an example, we take a particular portrait head of a late Roman male figure, for which records exist in (at least) two different stand-alone digital databases, in two different languages, and with differing identifying data (perhaps most conspicuously, in the object’s attributed “name”).
At Last Statues of Antiquity (LSA Database, from the University of Oxford), there is a record for the object in English. The database authors have notably opted for a descriptive rather than interpretive “name” for the object:
At Lupa, there is a record for the same object in German, but here the database author has favored an interpretive label definitively identifying the portrait as one of emperor Carus:
To compile these records, SEEDD envisions a tool to allow scholars to assert and record the presence of these two records, and the fact that the referent of both records is the same. That activity doesn’t compile the metadata from these two records, or make assertions about the correctness or the completeness of either record; it simply allows an expert scholar to record their existence, and their relationship to the object and to one another. The data model of the records created by this compilation would capture the identity of the referent object using the OWL sameAs property, to express that these two records refer to the same object.
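A minimal sketch of such a co-reference assertion follows. It should be read as an illustration under stated assumptions rather than as SEEDD's tool: the two record URIs are placeholders on example.org (the actual LSA and Lupa record identifiers are not reproduced here), the `assertedBy` field and the scholar's name are hypothetical, and the grouping helper is our own addition to show how such assertions could be consumed. Only the `owl:sameAs` property itself comes from the OWL vocabulary.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Placeholder record URIs; the real record identifiers are not reproduced here.
LSA_RECORD = "http://example.org/lsa/record/A"
LUPA_RECORD = "http://example.org/lupa/record/B"


def same_as(record_a, record_b, asserted_by):
    """Record that two records describe the same object, and who said so."""
    return {
        "subject": record_a,
        "predicate": OWL_SAME_AS,
        "object": record_b,
        "assertedBy": asserted_by,  # hypothetical provenance field
    }


def co_referent_groups(assertions):
    """Group record URIs linked by sameAs, treated as symmetric and transitive."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    for assertion in assertions:
        root_s = find(assertion["subject"])
        root_o = find(assertion["object"])
        if root_s != root_o:
            parent[root_o] = root_s

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


assertions = [same_as(LSA_RECORD, LUPA_RECORD, "A. Scholar")]
groups = co_referent_groups(assertions)
print(groups)
```

Note that the assertion says nothing about which record's "name" for the portrait is correct; it records only that both describe the same object, leaving each source's metadata untouched.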
Integrate
If and when a project (which has compiled a “primary dataset”) seeks to incorporate already-compiled data from another project (which has compiled an “external dataset”), it is likely they will encounter several challenges, and they will have decisions to make about whether to maintain them separately, and create links between them, or to modify one or both of them to allow them to be integrated. There are several kinds of mismatches to be considered (Kansa and Kansa, 2010).
- Columns present in the primary dataset, but absent from the external dataset
- Where the primary dataset contains a field or column for which the external dataset does not capture data at all, data import should not be complicated; there would be null fields in the destination system where the imported data had no column or value.
- Columns present in the external dataset, but absent from the primary dataset
- When other databases contain fields the primary schema does not, new columns must be created, or data must be modified and stored in existing columns to avoid a loss of resolution.
- Data present in both, but labelled, structured, or populated, differently
- When both capture similar data in different ways, a custom migration plan must be crafted. Options for migration strategies include:
- Maintain the dataset separately, and create links as appropriate;
- Copy matching data into the primary dataset’s schema, leave non-matching data out (or keep it separate).
- Copy matching data into the primary dataset’s schema, creating new columns for non-matching data.
- Restructure external data to match the primary schema.
- Data in different languages
- When relevant datasets exist in several languages, there are both technical and non-technical challenges to consider. The primary database schema should be made to accommodate field values in an arbitrary number of languages. We presume that:
- Many if not most values will not be translated into any other language at first.
- It will be desirable for a maximum of values to be translated eventually.
- Translators will be non-technical people, who speak a range of first languages.
A worthwhile objective, therefore, is to support the storage of multiple values for each column (May, Binding, and Tudhope 2014: 179), and to support an internationalized translation interface, to ease the work of translation.
- About identical objects
- If there appear to be significant data about the same sites from two or more sources, in two or more languages, a project should anticipate a) providing for storage of all values, and b) providing ways to create, retrieve, update, and delete values for each language on a single record.
- About different objects
- In instances where the primary and external datasets contain records in different languages about different objects, we believe the standard established for traditional print publication is reasonable; that the consumer be responsible for translation.
Most of the work in a project like SEEDD will likely be related to this integration step. If done well, as described by Kansa and Kansa (2010), the subsequent publication step will rest on a solid foundation.
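Several of the mismatches listed above can be illustrated together in a short sketch. It assumes, purely for illustration, that the primary dataset stores each field as a mapping from language code to value, and that a hand-written field map translates external column names to primary ones. The field names and values below are invented placeholders, not data from any partner database, and `integrate` is a hypothetical helper rather than an existing tool.

```python
# Hypothetical mapping from an external schema's column names to the
# primary schema's field names.
FIELD_MAP = {"Bezeichnung": "name", "Fundort": "findspot"}


def integrate(primary, external, field_map, external_lang):
    """Merge an external record into a copy of a primary record.

    - Fields only in the primary record are kept unchanged.
    - Mapped external columns are stored under the primary field name.
    - Unmapped external columns become new fields, so no data is lost.
    - Values are stored per language; an existing value is never overwritten.
    """
    merged = {field: dict(values) for field, values in primary.items()}
    for ext_field, value in external.items():
        if value is None:
            continue  # nothing captured in the external dataset
        field = field_map.get(ext_field, ext_field)
        merged.setdefault(field, {}).setdefault(external_lang, value)
    return merged


# Invented placeholder records for illustration only.
primary = {
    "name": {"en": "Portrait head of a man"},
    "material": {"en": "marble"},
}
external = {
    "Bezeichnung": "Porträtkopf",
    "Datierung": "3. Jh.",
    "Fundort": None,
}

merged = integrate(primary, external, FIELD_MAP, "de")
print(merged)
```

Here the merged record holds the name in both languages, keeps the untranslated material field as it was, and gains a new column for the external dating value. Translation of the remaining single-language values can then proceed gradually.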
Example
To take the same example of the two records about the late Roman portrait head, the integration step would be to collect and collate the metadata from those two records about that object in standard formats.
For this, SEEDD envisions a tool that would build upon the work being done by the World Historical Gazetteer project, which in turn builds upon the earlier work by the Pelagios Network, to adopt standard JSON-LD data formats for expressing Linked Places (LOD records about places), and Linked Traces (LOD representations about objects, events, people, or phenomena which can be related to those linked places).
The SEEDD integration tool would allow the annotation of the records like those seen above, asserting that the objects (or sites, places, people, or periods) mentioned, are the same ones mentioned in existing gazetteers or other LOD authorities or sources, or if there are no existing sources, compiling our own, in Linked Places and Traces formats.
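What a place record of this kind might look like can be sketched as follows. The structure is loosely modelled on our understanding of the GeoJSON-based Linked Places format and includes only a small subset of fields; the exact field names and requirements should be checked against the format's specification. The place and gazetteer URIs are hypothetical placeholders, the language tags are our assumption, and `linked_place` is an illustrative helper, not part of any published toolkit.

```python
import json


def linked_place(place_id, title, names, close_matches):
    """Build a simplified, Linked Places-style feature for one place.

    names: mapping of language code to toponym.
    close_matches: URIs of records for the same place in other gazetteers.
    """
    return {
        "@id": place_id,
        "type": "Feature",
        "properties": {"title": title},
        "names": [
            {"toponym": toponym, "lang": lang}
            for lang, toponym in sorted(names.items())
        ],
        "links": [
            {"type": "closeMatch", "identifier": match}
            for match in close_matches
        ],
    }


# Placeholder identifiers on example.org; not real gazetteer entries.
place = linked_place(
    place_id="http://example.org/seedd/place/1",
    title="Felix Romuliana",
    names={"la": "Felix Romuliana", "sr": "Gamzigrad"},
    close_matches=["http://example.org/gazetteer/place/42"],
)

print(json.dumps(place, ensure_ascii=False, indent=2))
```

Where an existing gazetteer already describes the place, the `links` entry is the annotation that matters; a project compiles its own place record only when no such authority exists.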
Publish
Blaney (2017), Graham et al. (2018), and Lincoln (2015) describe publishing collected data as LOD, and documenting how to access it, including discussion of the use of both RDF/XML and JSON-LD as formats for LOD and use of both Web APIs and SPARQL endpoints for access and discovery. Some other best practices which emerge in that literature include: using existing vocabularies wherever possible; linking to other LOD sources wherever appropriate and possible; and uploading the dataset to appropriate clearinghouses such as ARIADNE.
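As a rough illustration of the JSON-LD option, the sketch below serializes one compiled record using only Python's standard library. It is a sketch under stated assumptions, not a prescribed publication workflow: the record URIs are placeholders on example.org, the labels are invented, and the small context mapping short keys to `rdfs:label` and `owl:sameAs` is our own choice of existing vocabulary terms.

```python
import json

# A small JSON-LD context that reuses existing vocabulary terms.
CONTEXT = {
    "label": "http://www.w3.org/2000/01/rdf-schema#label",
    "sameAs": {
        "@id": "http://www.w3.org/2002/07/owl#sameAs",
        "@type": "@id",  # values of sameAs are URIs, not plain strings
    },
}


def to_jsonld(record_uri, labels, same_as):
    """Build a JSON-LD document for one record.

    labels: mapping of language code to label text.
    same_as: URIs of records elsewhere that describe the same object.
    """
    return {
        "@context": CONTEXT,
        "@id": record_uri,
        "label": [
            {"@value": text, "@language": lang}
            for lang, text in sorted(labels.items())
        ],
        "sameAs": list(same_as),
    }


# Placeholder URIs and invented labels for illustration only.
document = to_jsonld(
    record_uri="http://example.org/seedd/object/1",
    labels={"en": "Portrait head of a man", "de": "Porträtkopf"},
    same_as=["http://example.org/lsa/record/A", "http://example.org/lupa/record/B"],
)

serialized = json.dumps(document, ensure_ascii=False, indent=2)
print(serialized)
```

A file like this can be served at the record's own URI on a stable domain, or handed to an aggregator, which is the point at which the data leaves the database "silo".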
Example
Compiled and integrated data would be published as Linked Open Data, and made discoverable and searchable.
SEEDD aims to build upon the work done by the Pelagios Network, which has identified the publication of linked data about places, via registries, as a key goal, and we intend to follow the model they have articulated.
Conclusion
It seems clear that the developing international Linked Open Data ecosystem has the potential to go a long way toward mitigating challenges that archaeologists have encountered for generations regarding the circulation, discoverability, and international integration of their data. Traditional print publication practices and stand-alone digital resources, while important efforts to make archaeological data available to the field at large, both contribute to data siloing that in concert with other historical, economic, and political factors can result in the inadvertent marginalization of data from particular areas, as it has in the case of Southeastern Europe. While LOD holds great promise to ease the sharing and inter-communicability of multilingual, variously structured data, and improve data discoverability, thus helping to break down data silos, these benefits are dependent upon buy-in from data owners. The same barriers to data circulation that challenge the reuse of information contained in print publications and stand-alone digital datasets may also impact the uptake of LOD principles. As efforts are made to facilitate LOD adoption among archaeologists in order to partake of its benefits, it is essential that international LOD adherence patterns and general intellectual circulation networks be examined critically in order to identify barriers to adoption, and if possible, to design intervention efforts to address them, so that the adoption of LOD does not inadvertently reinforce pre-existing marginalizations. 
It is our hope that the example of SEEDD’s intervention strategy aimed at creating a baseline of LOD for a region currently not well-integrated into the emerging network of archaeological Linked Open Data, along with our efforts to share practical observations pertaining to the organization and undertaking of such an endeavor (both those articulated above, and others that we anticipate generating in the project’s later stages), may prove useful for projects aimed at solving analogous problems in the field.
References
Alcock, S. E., M. Egri, and J. F. D. Frakes (eds.) 2016. Beyond Boundaries: Connecting Visual Cultures in the Provinces of Ancient Rome. Los Angeles: Getty Publications.
ARIADNE. 2012. Available at: http://www.ariadne-infrastructure.eu/ [Last accessed 17 August 2020].
Bagnall, R. S. and S. Heath. 2018. “Roman Studies and Digital Resources.” Journal of Roman Studies, 108: 171-189. DOI: 10.1017/S0075435818000874.
Baker, J. 2014. Preserving Your Research Data. The Programming Historian 3. Available at: https://programminghistorian.org/en/lessons/preserving-your-research-data [Last accessed 17 August 2020]
Belamarić, J. and A. Šverko (eds.) 2017. Robert Adam and Diocletian’s Palace in Split. Zagreb: Školska knjiga.
Berners-Lee, T. 2006. Linked Data. Available at: https://www.w3.org/DesignIssues/LinkedData.html [Last accessed 17 August 2020].
Blaney, J. 2017. Introduction to the Principles of Linked Open Data. The Programming Historian 6. Available at: https://programminghistorian.org/en/lessons/intro-to-linked-data. [Last accessed 17 August 2020]
Brown, Peter. 1971. World of Late Antiquity. New York: Harcourt Brace Jovanovich.
Buzard, J. 2002. “The Grand Tour and After (1660–1840).” In Hulme and Youngs (eds.) The Cambridge Companion to Travel Writing. Cambridge: Cambridge University Press, pp. 37-52.
CIDOC-CRM. 2015. Version 6.2.1 Available at: http://83.212.168.219/CIDOC-CRM/Version/version-6.2.1 [Last accessed 17 August 2020].
Fraser, E. 2017. Mediterranean Encounters: Artists Between Europe and the Ottoman Empire, 1774–1839. University Park: Pennsylvania State University Press.
The Getty Research Institute. 2017. Getty Vocabularies as Linked Open Data. Available at: http://www.getty.edu/research/tools/vocabularies/lod [Last accessed 17 August 2020].
Geser, G. 2016. “WP15 Study: Towards a Web of Archaeological Linked Open Data.” ARIADNE. Available at: http://legacy.ariadne-infrastructure.eu/wp-content/uploads/2019/01/ARIADNE_archaeological_LOD_study_10-2016-1.pdf [Last accessed 29 September 2020].
Graham, S. et al. 2018. The Open Digital Archaeology Textbook. Available at: https://o-date.github.io/draft/book [Last accessed 17 August 2020].
Gudea, N. and T. Lobüscher. 2006. Dacia: Eine Römische Provinz Zwischen Karpaten und Schwarzen Meer. Mainz am Rhein: Philip von Zabern.
Harpring, P. 2018. LOD and the Getty Vocabs. Los Angeles: J. Paul Getty Trust. Available at: http://www.getty.edu/research/tools/vocabularies/Linked_Data_Getty_Vocabularies.pdf [Last accessed 17 August 2020].
Harpring, P. 2010. Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and Other Cultural Works. Los Angeles: Getty Publications.
Haskell, F. and N. Penny. 1981. Taste and the Antique: The Lure of Classical Sculpture. New Haven: Yale University Press.
Kansa, E. and Kansa, S. 2010. “Publishing Data in Open Context: Methods and Perspectives.” The CSA Newsletter, Vol. XXIII, No. 2. Available at: http://csanet.org/newsletter/fall10/nlf1001.html [Last accessed 17 August 2020].
Kansa, E. 2014. “Open Context and Linked Data.” ISAW Papers 7.10. [online access at: http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/kansa/, last accessed 17 August 2020].
Kowalczyk, M. 2017. “Between Wealth and Poverty: Former Yugoslavia 25 Years After the Breakup.” Central European Financial Observer. 5 June [online access at: https://financialobserver.eu/cse-and-cis/between-wealth-and-poverty-former-yugoslavia-25-years-after-the-breakup/, last accessed 17 August 2020].
Lane, A. 2004. Yugoslavia: When Ideals Collide. New York: Palgrave MacMillan.
Lengyel, A. and G. T. Radan. 1980. The Archaeology of Roman Pannonia. Lexington: University of Kentucky Press.
Lincoln, M. 2015. Using SPARQL to Access Linked Open Data. The Programming Historian 4. Available at: https://programminghistorian.org/en/lessons/graph-databases-and-SPARQL [Last accessed 17 August 2020].
Mattingly, D.J. 1996. “From One Colonialism to Another: Imperialism and the Maghreb.” In J. Webster and N. Cooper (eds.), Roman Imperialism: Post-Colonial Perspectives. Leicester: Leicester University Press, pp. 49–70.
Mattingly, D.J. (ed.) 1997. Dialogues in Roman Imperialism. Power, Discourse, and Discrepant Experience in the Roman Empire. Portsmouth: JRA.
May, K., C. Binding, and D. Tudhope. 2014. “Barriers and Opportunities for Linked Open Data Use in Archaeology and Cultural Heritage.” Archäologische Informationen 38. Available at: http://nbn-resolving.de/urn:nbn:de:bsz:16-ai-261628 [Last accessed 17 August 2020].
Migotti, B. 2012. The Archaeology of Roman Southern Pannonia: The State of Research and Selected Problems in the Croatian Part of the Roman Province of Pannonia. Oxford: British Archaeological Reports.
Munzi, M. 2004. “Italian Archaeology in Libya: From Colonial Romanità to Decolonization of the Past.” In M. L. Galaty and C. Watkinson (eds.), Archaeology Under Dictatorship. New York: Springer, pp. 73–108.
Nygård, S., J. Strang, and M. Jalava. 2018. “At the Periphery of European Intellectual Space.” In S. Nygård, J. Strang, and M. Jalava (eds.), Decentering European Intellectual Space. Leiden: Brill, pp. 1-18.
Oltean, I.A. 2010. Dacia: Landscape, Colonisation and Romanisation. New York: Routledge.
Ouahes, I. 2018. Syria and Lebanon Under the French Mandate: Cultural Imperialism and the Workings of Empire. London: I.B. Tauris.
Pelc, M. 2018. Grand Tour Dalmatia. Available at http://grandtourdalmatia.org/chrono-geographical-database/database [Last accessed 17 August 2020].
Picarella, G. 2007. “Alla scoperta delle Villes d’Or: archeologia, storia e identità coloniale nel Nordafrica di Louis Bertrand.” ArcoJournal, pp. 1-22.
Sear, F. 1982. Roman Architecture. Ithaca: Cornell University Press.
Simons, G. F. and C. D. Fennig (eds.) 2018. Ethnologue: Languages of the World. Twenty-first edition. Dallas: SIL International. Available at: http://www.ethnologue.com [Last accessed 17 August 2020].
Sugar, P. F. 1977. Southeastern Europe under Ottoman Rule, 1354-1804. Seattle: University of Washington Press.
Versluys, M. J. 2014. “Understanding Objects in Motion. An Archaeological Dialogue on Romanization.” Archaeological Dialogues, 21(1), pp. 1–20. DOI:10.1017/S1380203814000038.
Ward-Perkins, J. B. 1994. Roman Imperial Architecture. New Haven: Yale University Press.
Webster, J., and N. Cooper (eds.) 1996. Roman Imperialism: Post-Colonial Perspectives. Leicester: University of Leicester.
Wilkes, J. J. 2005. “The Roman Danube: An Archaeological Survey.” Journal of Roman Studies 95, pp. 124-225.
Wilkes, J. J. 1992. The Illyrians. Oxford: Blackwell.
Wolff, L. 1994. Inventing Eastern Europe: The Map of Civilization on the Mind of the Enlightenment. Stanford: Stanford University Press.
The World Wide Web Consortium (W3C). 2009. SKOS Simple Knowledge Organization System Reference. Available at: https://www.w3.org/TR/skos-reference/ [Last accessed 17 August 2020].
Živić, M. 2003. Felix Romuliana: Fifty Years of Solving. Zaječar: Narodni Musei Zaječar.