This article is available at the URI http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/vankeer/ as part of the NYU Library's Ancient World Digital Library in partnership with the Institute for the Study of the Ancient World (ISAW). More information about ISAW Papers is available on the ISAW website.

Except where noted, ©2014 Ellen Van Keer; distributed under the terms of the Creative Commons Attribution License

ISAW Papers 7.30 (2014)

Moving from cross-collection integration to explorations of Linked Data practices in the library of antiquity at the Royal Museums of Art and History, Brussels

Ellen Van Keer

Introduction

The Royal Museums of Art and History rank among the largest cultural heritage research institutes in Belgium. Thousands of artifacts and historical objects from around the globe, dating from prehistoric to modern times, are on display in the galleries or kept in the storerooms.1 In addition, the library holds thousands of journal articles, books and other scholarly publications related to the museum's collections. In the current infrastructure, the museum and library databases are two separate systems. However, the materials they contain overlap not only at the thematic level (e.g. "ancient Egypt") but also at the entity level (e.g. entries on objects in exhibition catalogs). As part of the project "Bridging Knowledge Collections", we will integrate the two collections across domains.2

As a first objective, we will make the two datasets cross-searchable in a single online user interface. This is a major challenge, because the museum and library sectors traditionally use entirely different sets of standards (Prescott & Erway 2011). As a second objective, at the request of the curators, we are researching a workflow that will link objects and documents at the record level and support the input of object bibliographies in the museum database. To avoid manually creating duplicate datasets, we will reuse existing bibliographic information from the library catalog in the museum database and link back to the full references by adding a system identifier. Rather than the library's internal database primary keys, we will use the permalinks produced by the library OPAC as identifiers, because they are open to everyone and give direct hyperlinked access from within the museum database to the online library application and its range of functionalities and user services.
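The record-level link described above can be sketched as a small data model, assuming that a museum object stores nothing but the OPAC permalink of each citing publication and looks up the display citation in the library catalog when needed. The class names, field names, citation and permalink below are hypothetical; they do not describe the actual museum or library software.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class LibraryRecord:
    """A bibliographic record exported from the library catalog (hypothetical shape)."""
    permalink: str  # OPAC permalink, used as the system identifier
    citation: str   # short display citation, reused as-is from the catalog


@dataclass
class MuseumObject:
    """A museum object whose bibliography holds only library permalinks."""
    object_id: str
    bibliography: List[str] = field(default_factory=list)

    def cite(self, record: LibraryRecord) -> None:
        # Store the identifier only; the full reference stays in the library catalog.
        if record.permalink not in self.bibliography:
            self.bibliography.append(record.permalink)


def render_bibliography(obj: MuseumObject, catalog: Dict[str, LibraryRecord]) -> List[str]:
    """Resolve each stored permalink against the catalog when the object is displayed."""
    return [f"{catalog[p].citation} <{p}>" for p in obj.bibliography]


# Usage with a hypothetical permalink and citation
rec = LibraryRecord("http://opac.example.org/record/123", "Author 2010, p. 12")
obj = MuseumObject("84853")
obj.cite(rec)
obj.cite(rec)  # citing the same record twice adds no duplicate
catalog = {rec.permalink: rec}
print(render_bibliography(obj, catalog))
```

Keeping only the identifier on the museum side is what avoids the duplicate dataset: a correction made in the library catalog shows up automatically the next time the bibliography is rendered.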

Furthermore, we are considering a Linked Data implementation. As the museum system also generates permalinks for our objects, machine-actionable links identifying both citing publications and cited artifacts are already in place. Although we are not (yet) producing our own RDF, we can rely on our partnership in the larger Europeana community for initiatives in this direction. Last year, Europeana published its API and released 20 million objects from its providers as an RDF dump under an open license (CC0), including our entire collection of Egyptian objects.3 In practice, we exported our SPECTRUM-compliant museum data as LIDO XML and mapped this to ESE, the earliest Europeana data model, which was based on Dublin Core.4 ESE is currently being replaced by the RDF-based EDM, and Europeana has now also transformed the content ingested in the older format into this new model.5 Because of this two-step procedure, the resulting RDF remains rather crude and lacks contextualizing links to other sources. In newer Europeana projects, however, we are transforming our LIDO export directly to EDM, and a semantic layer is being implemented in the ingestion process.6

Important for our present purpose is that the core semantic layer of EDM includes the “related works” element:

This is an adequate predicate for linking cited resources in the museum database to citing resources in the library system, and thus for producing RDF triples. At least, that is the general idea; there are, of course, practical obstacles.
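Assuming the predicate in question is dcterms:isReferencedBy (the element that the transformation discussed under Museum Linked Data targets as <IsReferencedBy>), the resulting triples can be sketched by hand as N-Triples, without any RDF library. Both the object URI and the library permalink are hypothetical placeholders.

```python
# Assumption: the "related works" predicate is dcterms:isReferencedBy.
IS_REFERENCED_BY = "http://purl.org/dc/terms/isReferencedBy"

# Characters that N-Triples does not allow inside <...> IRIs (besides controls and space).
_FORBIDDEN = '<>"{}|^`\\'


def iri(value: str) -> str:
    """Wrap an absolute IRI for N-Triples, rejecting characters the syntax forbids."""
    if any(ord(ch) <= 0x20 or ch in _FORBIDDEN for ch in value):
        raise ValueError(f"not a safe IRI: {value!r}")
    return f"<{value}>"


def reference_triples(object_uri: str, citing_permalinks: list) -> str:
    """One triple per citing publication: <object> dcterms:isReferencedBy <publication>."""
    return "\n".join(
        f"{iri(object_uri)} {iri(IS_REFERENCED_BY)} {iri(link)} ."
        for link in citing_permalinks
    )


print(reference_triples(
    "http://data.example.org/rmah/object/84853",
    ["http://opac.example.org/record/123"],
))
```

The direction of the triple (cited object as subject, citing publication as object) mirrors the museum-side element; a library-side export would state the inverse relation.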

Museum Linked Data

On the museum side, we will have to make additions to our LIDO XML export, as this currently remains the basis for all further mappings, including the mapping to EDM. More specifically, we want to add a new relatedWorksWrap to the LIDO,7 and map this element to <IsReferencedBy> in the ESE/EDM transformation. Unfortunately, any change to our export depends on the museum system vendor (and on the resources to commission it). In this regard too, linking to bibliographic records with permalinks is more efficient than describing publications at the lower levels of author, title, year, etc.: a new LIDO event type for "Publication" would be a complex element requiring more modifications. However, in the Europeana context, tools are being developed to give content providers more control over the export of their application data in the future.8
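A sketch of such an addition, built with Python's standard ElementTree. The element names follow our reading of the LIDO 1.0 schema listing (note 7) and should be checked against it; where the wrap sits inside a full LIDO record is left out, and the relation-type term and the permalink are hypothetical.

```python
import xml.etree.ElementTree as ET

LIDO_NS = "http://www.lido-schema.org"
ET.register_namespace("lido", LIDO_NS)


def lido(tag: str) -> str:
    """Qualify a tag name with the LIDO namespace (ElementTree's {ns}tag form)."""
    return f"{{{LIDO_NS}}}{tag}"


def related_works_wrap(permalinks, rel_type="is referenced by"):
    """Build a relatedWorksWrap with one relatedWorkSet per citing publication.

    rel_type is a hypothetical free-text term, not a controlled LIDO value.
    """
    wrap = ET.Element(lido("relatedWorksWrap"))
    for link in permalinks:
        work_set = ET.SubElement(wrap, lido("relatedWorkSet"))
        related_work = ET.SubElement(work_set, lido("relatedWork"))
        obj = ET.SubElement(related_work, lido("object"))
        # The library permalink identifies the citing publication.
        ET.SubElement(obj, lido("objectWebResource")).text = link
        rel = ET.SubElement(work_set, lido("relatedWorkRelType"))
        ET.SubElement(rel, lido("term")).text = rel_type
    return wrap


wrap = related_works_wrap(["http://opac.example.org/record/123"])
print(ET.tostring(wrap, encoding="unicode"))
```

Because each publication is carried as a single web resource, the vendor only has to emit one repeatable set per citation rather than a structured author, title and year description.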

As a side note, but one essential for the adoption of Linked Data, we would also like to improve the “permalinks” that the museum system produces for objects, e.g.

http://carmentis.kmkg-mrah.be/eMuseumPlus?service=ExternalInterface&module=collection&objectId=84853&viewType=detailView

These permalinks should instead be application- and vendor-independent HTTP URIs composed of the institutional namespace and the object ID (Heath & Bizer 2011), e.g.

Likewise, it is relevant that Europeana has established its own identifier for this object:

Of course, changing “permalinks” and managing multiple identities are very delicate matters, with implications far beyond the limited framework of this library-based project.
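The rule of composing a URI from the institutional namespace and the object ID can be sketched for the eMuseumPlus permalink quoted above. The namespace used here is a hypothetical placeholder: the museum has not defined one.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical institutional namespace; no such namespace exists yet.
NAMESPACE = "http://data.example.org/rmah/object/"


def mint_object_uri(vendor_permalink: str, namespace: str = NAMESPACE) -> str:
    """Derive an application-independent URI from an eMuseumPlus-style permalink."""
    params = parse_qs(urlsplit(vendor_permalink).query)
    object_ids = params.get("objectId")
    if not object_ids:
        raise ValueError("permalink has no objectId parameter")
    # Keep only the stable object ID; drop service, module and view parameters.
    return namespace + object_ids[0]


permalink = ("http://carmentis.kmkg-mrah.be/eMuseumPlus?service=ExternalInterface"
             "&module=collection&objectId=84853&viewType=detailView")
print(mint_object_uri(permalink))  # http://data.example.org/rmah/object/84853
```

Minting the URI is the easy part; making it resolve (for instance by redirecting to the current vendor page) is what would keep it stable when the museum system changes.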

Library Linked Data

On the library side, identifiers pose the additional difficulty of deciding which one(s) to use, both locally and globally. As we are part of a large library network, and the “permalinks” produced by our library OPAC are actually queries into the shared database, they change with every library “view”. A publication therefore has a different permalink depending on whether it is accessed through the general catalog of the network or through our local library “view”, e.g.

Still, we can easily derive a better URI automatically by stripping everything after the last ampersand, which gives us

In addition, the newly implemented library discovery tool LIMO produces yet another set of “permalinks” for the same publication, which we cannot refine manually, e.g.

However, this last link leads our users directly to our local LIMO “view”, with new customized features such as user recommendations and access to locally licensed e-content.9 For this practical reason, we will deploy it as our primary linking source. Nevertheless, we would also like users in the rest of the network and, ideally, the entire world to be able to discover objects from our collections through citing publications they have access to. LIMO should be able to achieve this through server-side enrichment of related records. Moreover, this new tool is based on Ex Libris' Primo service, and the company is working on persistent, LOD-friendly URIs that will return RDF for all Primo PNX records (Koster & Harper 2013).
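The ampersand rule for OPAC permalinks is simple enough to automate. A minimal sketch, assuming (as described for our network) that the view-specific part is always the final query parameter, and assuming the trailing ampersand itself can be dropped as well; the example URL is hypothetical.

```python
def strip_view_suffix(permalink: str) -> str:
    """Remove the last '&' and everything after it; return the link unchanged if it has no '&'."""
    head, sep, _tail = permalink.rpartition("&")
    return head if sep else permalink


print(strip_view_suffix("http://opac.example.org/query?doc=123&view=LOCAL"))
# http://opac.example.org/query?doc=123
```

The function should be applied exactly once per link: it is not idempotent, so a second pass over a link with several parameters would remove more than the view part.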

The global context also poses direct challenges. To start with, co-reference, the problem of multiple identifiers pointing to the same entity, is inherent in the global library context. While museum pieces are usually unique, libraries all over the world can hold (multiple) copies of the same publication. As a result, a publication acquires numerous URIs through different library systems and aggregators, e.g.

Moreover, in producing a scholarly bibliography of our museum objects, linking to a specific edition of a publication in another library is simply not an option. Library systems have traditionally served primarily as tools for locating physical copies (FRBR "items") of publications, whereas object citations disclose content and operate at a more general level. However, larger-scale projects are increasingly trying to serve more globally scoped, content-oriented (FRBR "manifestation/expression") identifiers, which suit our purpose better (Gatenby e.a. 2012).
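To make the level argument concrete: a citation should point at the manifestation (or expression) rather than at one library's item. The sketch below models FRBR Group 1 as a simple parent chain, which is a simplification (FRBR relationships are many-to-many in general); all URIs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

FRBR_LEVELS = ("work", "expression", "manifestation", "item")  # general -> specific


@dataclass(frozen=True)
class Entity:
    level: str
    uri: str
    parent: Optional["Entity"] = None  # the next more general FRBR entity


def lift(entity: Entity, target: str) -> Entity:
    """Walk from a specific entity (e.g. an item) up to the requested, more general level."""
    if FRBR_LEVELS.index(target) > FRBR_LEVELS.index(entity.level):
        raise ValueError(f"{target!r} is more specific than {entity.level!r}")
    node = entity
    while node is not None and node.level != target:
        node = node.parent
    if node is None:
        raise LookupError(f"no {target!r} recorded above {entity.uri}")
    return node


work = Entity("work", "http://example.org/work/1")
expr = Entity("expression", "http://example.org/expression/1", work)
manif = Entity("manifestation", "http://example.org/manifestation/1", expr)
held_copy = Entity("item", "http://library.example.org/holding/42", manif)
print(lift(held_copy, "manifestation").uri)  # http://example.org/manifestation/1
```

Two libraries holding copies of the same book would produce different items but lift to the same manifestation, which is the identifier a museum bibliography can share.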

Ultimately, the development and implementation of RDA is intended as a remedy, since it (ideally) enables referencing entities at all four levels of FRBR Group 1.10 Digital libraries also have the advantage of being independent of the physical item, but they further proliferate the identity crisis (Hull e.a. 2008). Through publisher DOIs, commercial services such as JSTOR, Open Access repositories (as registered in OpenDOAR), and bibliographic or subject-related databases such as Zotero, APh, OEB, papyri.info, etc., many additional sets and types of permalinks and URIs (often closed, but nevertheless useful for researchers) can be assigned to the same publication or some part of it, e.g.
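Among these identifiers, DOIs are at least easy to compare once their different spellings are normalized. A small sketch, covering only the prefixes listed in the code (other variants exist) and relying on the DOI system's rule that DOI names are case-insensitive; the example DOI is the one cited below for Hull e.a. 2008.

```python
import re

# Common DOI spellings: "doi:", "doi: ", "https://doi.org/", "http://dx.doi.org/"
_DOI_PREFIX = re.compile(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", re.IGNORECASE)


def normalize_doi(value: str) -> str:
    """Reduce a DOI written in any of the listed forms to a bare, lower-cased DOI name."""
    doi = _DOI_PREFIX.sub("", value.strip())
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {value!r}")
    # DOI names are case-insensitive, so lower-casing makes equal DOIs compare equal.
    return doi.lower()


forms = [
    "doi:10.1371/journal.pcbi.1000204",
    "https://doi.org/10.1371/journal.pcbi.1000204",
    "http://dx.doi.org/10.1371/JOURNAL.PCBI.1000204",
]
print({normalize_doi(f) for f in forms})  # {'10.1371/journal.pcbi.1000204'}
```

Closed identifiers such as JSTOR stable URLs have no comparable normalization rule, which is why they must be matched through other means.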

It is clearly impossible to trace, add and check all co-references of citing publications in all these systems manually. Implementing web-based systems and semantic technologies should make it possible to automate this procedure in the future and to address the issue on a larger scale.
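Once co-reference assertions have been harvested (from whatever source), grouping them is a standard transitive-closure problem. As a sketch, and not a description of an existing tool in our workflow, a union-find structure over hypothetical identifier strings could do this:

```python
from typing import Dict, Iterable, List, Set, Tuple


def cluster_coreferences(pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group identifiers into clusters, given pairs asserted to denote the same publication.

    Union-find computes the transitive closure: if A~B and B~C, then A, B and C share a cluster.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters: Dict[str, Set[str]] = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    return list(clusters.values())


pairs = [("opac:123", "limo:abc"), ("limo:abc", "worldcat:9"), ("doi:10.1/x", "jstor:55")]
for cluster in cluster_coreferences(pairs):
    print(sorted(cluster))
```

The same transitivity is also the weak point: a single wrong pair merges two otherwise distinct clusters, one way the identity problems analysed by Halpin e.a. (2010) arise.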

Semantic enrichment and alignment of our datasets with external sources at a lower level of description can also be framed within the broader library and museum communities. In the Europeana context, for instance, we are transforming our thesauri to SKOS and aligning them with other reference terminologies in the Galleries, Libraries, Archives, and Museums (GLAM) sector, such as the Getty Thesauri.11 Moreover, new tools and workflows are being developed to support Linked Data production and enrichment locally, before mapping to EDM (e.g. de Boer e.a. 2012). Direct mappings of LIDO aggregations as Linked Data sets are also being investigated, to achieve rich and interoperable resource descriptions (Tsalapati e.a. 2012). Ideally, implementing a fully fledged top-level ontology such as CIDOC-CRM would allow us to describe, relate and align our heterogeneous, distributed local datasets at the lowest and fullest level of detail, resolving the issue of cross-collection integration at the record level that we are (still) dealing with now. However, it would not immediately solve all problems. URI disambiguation is a more general challenge in the consumption of Linked Data (Jaffri e.a. 2008), and the ubiquitous use of <owl:sameAs> has even provoked an identity crisis at the lowest level of description (Halpin e.a. 2010). Adopting Linked Data means embracing the global knowledge space in all its complexity.
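For the thesaurus alignment, SKOS provides mapping properties that are weaker than owl:sameAs, which is one way to sidestep part of the identity problem just described. A minimal sketch, assuming each alignment has already been judged exact or merely close by an editor; both concept URIs are hypothetical placeholders, not real local or Getty identifiers.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"


def alignment_triples(alignments) -> str:
    """Emit N-Triples for (local concept, external concept, exact?) alignments.

    skos:exactMatch is used for confident equivalences and skos:closeMatch otherwise;
    neither asserts the strict identity that owl:sameAs does.
    """
    lines = []
    for local, external, exact in alignments:
        predicate = SKOS + ("exactMatch" if exact else "closeMatch")
        lines.append(f"<{local}> <{predicate}> <{external}> .")
    return "\n".join(lines)


print(alignment_triples([
    ("http://data.example.org/rmah/thesaurus/scarab",
     "http://vocab.example.org/concept/scarab", True),
]))
```

Recording the weaker closeMatch where an editor is unsure keeps a doubtful alignment from propagating, since SKOS defines exactMatch as transitive but not closeMatch.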

Notes

1 http://www.kmkg-mrah.be.

2 http://www.belspo.be/belspo/fedra/proj.asp?l=en&COD=AG/LL/167. The project is coordinated by Wouter Claes, chief librarian of the RMAH. Its partner is LIBIS, http://www.libis.be, the central IT service for libraries, museums and archives of KU Leuven.

3 http://pro.europeana.eu/datasets.

4 http://pro.europeana.eu/ese-documentation.

5 http://pro.europeana.eu/edm-documentation.

6 http://www.europeanafashion.eu and http://www.partage-plus.eu.

7 http://www.lido-schema.org/schema/v1.0/lido-v1.0-schema-listing.html

8 http://www.europeana-inside.eu and http://dm2e.eu.

9 Provided, at least, that the supplier agrees to have its data indexed in the system, which cannot be taken for granted, as a recent debate between EBSCO and Ex Libris illustrated (Pohl 2013).

10 http://www.loc.gov/aba/rda/

11 http://www.athenaplus.eu. We lead Work Package 4 on “Terminologies and Semantic enrichment”.

Works Cited

De Boer e.a. 2012: de Boer, V., Wielemaker, J., van Gent, J., Hildebrand, M., Isaac, A., van Ossenbruggen, J., & Schreiber, G. "Supporting Linked Data Production for Cultural Heritage Institutes: The Amsterdam Museum Case Study". In: E. Simperl, P. Cimiano, A. Polleres, O. Corcho, & V. Presutti (Eds.), The Semantic Web: Research and Applications. Lecture Notes in Computer Science 7295 (2012), p. 733–747, doi:10.1007/978-3-642-30284-8_56.

Gatenby e.a. 2012: Gatenby, J., Greene, R. O., Oskins, M. W., Thornburg, G. "GLIMIR: Manifestation and Content Clustering within WorldCat". Code{4}lib Journal 17 (2012), http://journal.code4lib.org/articles/6812.

Halpin e.a. 2010: Halpin, H., Herman, I., Hayes, P. J., McGuinness, D. L., Thompson, H. S. "When owl:sameAs isn’t the Same: An Analysis of Identity Links on the Semantic Web". In: The Semantic Web – ISWC 2010. Lecture Notes in Computer Science 6496 (2010), p. 305-320, doi:10.1007/978-3-642-17746-0_20, http://iswc2010.semanticweb.org/pdf/261.pdf.

Heath & Bizer 2011: Heath, T. & Bizer, Ch. Linked Data: Evolving the Web into a Global Data Space, Morgan & Claypool: 2011, http://linkeddatabook.com/editions/1.0/

Hull e.a. 2008: Hull, D., Pettifer, S. R., & Kell, D. B. "Defrosting the digital library: bibliographic tools for the next generation web". PLoS Computational Biology 4/10 (2008), doi:10.1371/journal.pcbi.1000204.

Jaffri e.a. 2008: Jaffri, A., Glaser, H., Millard, I. "URI disambiguation in the context of linked data". In: Linked Data on the Web, Beijing, April 2008, http://eprints.soton.ac.uk/265181/

Koster & Harper 2013: Koster, L. & Harper, C. "Linked Open Data at the IGelU conference in Berlin 2013". http://igelu.org/special-interests/lod/meetings/igelu-2013

Pohl 2013: Pohl, A. "Discovery silos versus the open web". Blogpost http://openbiblio.net/2013/06/23/discovery-silos-vs-the-open-web/

Prescott & Erway 2011: Prescott, L. & Erway, R. Single Search: the quest for the holy grail, OCLC Research 2011, http://www.oclc.org/research/publications/library/2011/2011-17.pdf

Tsalapati e.a. 2012: Tsalapati, E., Simou, N., Drosopoulos, N., Stein, R. "Evolving LIDO based aggregations into Linked Data", In: CIDOC2012 - Enriching Cultural Heritage, Helsinki, Finland, June 2012, http://www.image.ntua.gr/php/pub_details.php?code=767