This article is available at the URI http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/almas-babeu-krohn/ as part of the NYU Library's Ancient World Digital Library in partnership with the Institute for the Study of the Ancient World (ISAW). More information about ISAW Papers is available on the ISAW website.
Except where noted, ©2014 Bridget Almas, Alison Babeu, and Anna Krohn; distributed under the terms of the Creative Commons Attribution License
ISAW Papers 7.3 (2014)
Linked Data in the Perseus Digital Library
Bridget Almas, Alison Babeu, and Anna Krohn
Overview
The Perseus Digital Library is currently working towards making all of its data available according to the best practices outlined by Heath and Bizer (2011).
We started by thinking carefully about the URIs that we use to name and address the Perseus texts, catalog metadata, and other data objects in the Perseus Digital Library, so that we could be reasonably confident that these URIs will be stable and properly dereferenceable. We solicited and took into account feedback from members of the digital classics community on our approach to defining our URIs.
Once we had defined our URI schemes, our next priority was to publish stable URIs for the various pre-existing resources in the library. When we have completed this step for all of the major resource types in the library, we will begin to alter the way in which the resource content is represented so that it advertises its linkable features via RDFa.
We decided upon this incremental approach because, given the large volume of resources in the Perseus Library, and the limited amount of manpower to get the work done, we felt it would be most beneficial to our own work, and our community of users, to publish our URIs as we go, and not wait until delivery of all the underlying resources could also be made compliant.
URIs in the Perseus Digital Library
As of this writing, we have released URIs for texts, citations and bibliographic catalog records. Work is in progress on authors, names and place entities, Greek and Latin lexical entities, and artifacts and images. Future efforts will include a variety of annotation types.
All Perseus data URIs are published under the http://data.perseus.org URI prefix, followed by one or more path components indicating the resource type, then a unique identifier for the resource, and an optional path component identifying a specific output format for the resource.
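The scheme can be illustrated by splitting a data URI back into its parts. The sketch below is a minimal Python illustration, not Perseus code. It assumes, hypothetically, that the identifier is always a CTS URN, so that it is the first path segment beginning with `urn:` and contains no `/`; the names `parse_data_uri` and `DataUri` are invented for this example.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

DATA_HOST = "data.perseus.org"


class DataUri(NamedTuple):
    resource_type: str    # one or more path components, e.g. "citations"
    identifier: str       # assumed (hypothetically) to be a CTS URN
    fmt: Optional[str]    # optional trailing format component, e.g. "xml"


def parse_data_uri(uri: str) -> DataUri:
    """Split a Perseus data URI into resource type, identifier and format."""
    parts = urlsplit(uri)
    if parts.netloc != DATA_HOST:
        raise ValueError(f"not a Perseus data URI: {uri!r}")
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: the identifier is the first segment that starts with "urn:".
    idx = next((i for i, s in enumerate(segments) if s.startswith("urn:")), None)
    if idx is None or idx == 0:
        raise ValueError(f"expected <type>/<urn>[/format] in {uri!r}")
    trailing = segments[idx + 1:]
    if len(trailing) > 1:
        raise ValueError(f"unexpected path after the format in {uri!r}")
    fmt = trailing[0] if trailing else None
    return DataUri("/".join(segments[:idx]), segments[idx], fmt)
```

Everything before the URN is treated as the resource type, which leaves room for resource types that span more than one path component.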
Citations and Texts
The Perseus stable URIs for citations and texts leverage Canonical Text Services (CTS) URNs (Blackwell and Smith 2012), enabling us to take advantage of the CTS data model while still supporting Linked Data standards.
Individual passage citations can be retrieved at URIs which adhere to the following syntax:
http://data.perseus.org/citations/<CTS PASSAGE URN>[/format]
Currently supported data formats for citations are HTML and XML. In the future, RDF/XML and JSON-LD may also be supported.
The URI syntax for an entire text, without a passage citation is:
http://data.perseus.org/texts/<CTS TEXT URN>[/format]
HTML is the only currently supported format for full text URIs, but the XML format will be available soon.
Combining CTS and URI standards
By combining the CTS and URI standards we produce semantically meaningful URIs for texts and citations. This is perhaps best illustrated by example.
The following URN identifies the notional work, Homer's Iliad, without reference to a specific edition or translation of that work:
urn:cts:greekLit:tlg0012.tlg001
We can append an edition identifier to the above URN to specify the unique resource which is Perseus' TEI XML version of Homer's Iliad that is identified in the Perseus CTS inventory as 'perseus-grc1'.
urn:cts:greekLit:tlg0012.tlg001.perseus-grc1
The next URN identifies Book 1, line 100 of the notional work, the Iliad:
urn:cts:greekLit:tlg0012.tlg001:1.100
And this one identifies that line in the specific Perseus edition:
urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.100
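Each of these URNs splits on colons into a namespace, a dotted work component (textgroup, work, version and, optionally, an exemplar), and an optional passage. A minimal Python parser, written for illustration only and not taken from any Perseus or CTS library (`CtsUrn` and `parse_cts_urn` are hypothetical names), might look like this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CtsUrn:
    namespace: str                   # e.g. "greekLit"
    textgroup: str                   # e.g. "tlg0012" (Homer)
    work: Optional[str] = None       # e.g. "tlg001" (Iliad)
    version: Optional[str] = None    # edition or translation, e.g. "perseus-grc1"
    exemplar: Optional[str] = None   # optional fourth level of the work component
    passage: Optional[str] = None    # citation, e.g. "1.100"


def parse_cts_urn(urn: str) -> CtsUrn:
    """Parse urn:cts:<namespace>:<work component>[:<passage>]."""
    fields = urn.split(":")
    if len(fields) not in (4, 5) or fields[0] != "urn" or fields[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work_component = fields[2], fields[3]
    passage = fields[4] if len(fields) == 5 and fields[4] else None
    levels = work_component.split(".")
    if not levels[0] or len(levels) > 4:
        raise ValueError(f"bad work component in {urn!r}")
    # Pad missing levels with None so the four-way unpacking always succeeds.
    padded = levels + [None] * (4 - len(levels))
    textgroup, work, version, exemplar = padded
    return CtsUrn(namespace, textgroup, work, version, exemplar, passage)
```

A notional-work URN such as `urn:cts:greekLit:tlg0012.tlg001:1.100` parses with `version` left as `None`, which is exactly the distinction between the third and fourth examples.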
Any of these URNs can be resolved at a stable URI by prefixing it with the http://data.perseus.org URI prefix and the path that identifies the resource type (in this case, 'texts' or 'citations'):
http://data.perseus.org/texts/urn:cts:greekLit:tlg0012.tlg001
http://data.perseus.org/texts/urn:cts:greekLit:tlg0012.tlg001.perseus-grc1
http://data.perseus.org/citations/urn:cts:greekLit:tlg0012.tlg001:1.100
http://data.perseus.org/citations/urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.100
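Because a URN that cites a passage has a fifth colon-delimited field, a client can choose between the `texts` and `citations` paths mechanically. A minimal sketch, assuming that rule holds for all Perseus URNs (`uri_for_urn` is a hypothetical helper):

```python
from typing import Optional

PREFIX = "http://data.perseus.org"


def uri_for_urn(urn: str, fmt: Optional[str] = None) -> str:
    """Build a Perseus data URI for a CTS text or passage URN."""
    fields = urn.split(":")
    if len(fields) not in (4, 5) or fields[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    # A non-empty fifth field is a passage, so the URN names a citation.
    cites_passage = len(fields) == 5 and fields[4] != ""
    resource_type = "citations" if cites_passage else "texts"
    uri = f"{PREFIX}/{resource_type}/{urn}"
    # Optional format path, e.g. "xml"; omitting it means the default HTML display.
    return f"{uri}/{fmt}" if fmt else uri
```

The helper does not check whether a requested format is actually offered for that resource type.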
Note that, per the CTS standard, if you request a URN for a notional work without including an edition or translation specifier, then we return the default edition for that work in our repository, which in this case happens to be the perseus-grc1 edition.
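The default-edition rule amounts to a lookup keyed on the notional work. The sketch below assumes a hypothetical in-memory `DEFAULT_EDITIONS` table standing in for the Perseus CTS inventory; only the Iliad entry reflects what is stated above, and `resolve_default` is an invented name.

```python
from typing import Dict

# Hypothetical stand-in for the Perseus CTS inventory: notional work -> default edition.
DEFAULT_EDITIONS: Dict[str, str] = {
    "urn:cts:greekLit:tlg0012.tlg001": "urn:cts:greekLit:tlg0012.tlg001.perseus-grc1",
}


def resolve_default(urn: str) -> str:
    """Replace a notional-work URN with its default edition, keeping any passage."""
    fields = urn.split(":")
    if len(fields) not in (4, 5) or fields[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    text_urn = ":".join(fields[:4])
    passage = fields[4] if len(fields) == 5 else ""
    # Exactly two dotted levels (textgroup.work) means no edition or translation was given.
    if len(fields[3].split(".")) == 2:
        if text_urn not in DEFAULT_EDITIONS:
            raise LookupError(f"no default edition for {text_urn!r}")
        text_urn = DEFAULT_EDITIONS[text_urn]
    return f"{text_urn}:{passage}" if passage else text_urn
```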
Although not yet implemented, in the future we will take advantage of the subreference feature of CTS URNs to support URIs for every word or contiguous sequence of words in a text, for example:
http://data.perseus.org/citations/urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.1@μῆνιν[1]
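In such a URN the subreference follows the passage after an `@`, with an optional bracketed index saying which occurrence of the token is meant. A minimal parser for the single-token form, with hypothetical names, is sketched below; it assumes that an omitted index means the first occurrence (an assumption, not something stated here) and deliberately rejects ranges.

```python
import re
from typing import NamedTuple

# <passage>@<token>[<n>], e.g. "1.1@μῆνιν[1]"; ranges ("a-b") are rejected, not parsed.
SUBREF = re.compile(r"^(?P<passage>[^@]+)@(?P<token>[^\[\]@]+)(?:\[(?P<n>\d+)\])?$")


class Subreference(NamedTuple):
    passage: str      # canonical citation, e.g. "1.1"
    token: str        # the word or string being pointed at
    occurrence: int   # which occurrence of the token within the passage


def parse_subreference(passage_component: str) -> Subreference:
    match = SUBREF.match(passage_component)
    if match is None:
        raise ValueError(f"no single-token subreference in {passage_component!r}")
    # Assumption: an omitted index refers to the first occurrence.
    n = int(match.group("n")) if match.group("n") else 1
    return Subreference(match.group("passage"), match.group("token"), n)
```

The passage component is the last colon-delimited field of the URN, so `urn.rsplit(":", 1)[1]` extracts it before parsing.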
You can explicitly link to the TEI XML format for the citation, rather than the default HTML display, by appending the optional format path to the URI:
http://data.perseus.org/citations/urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.100/xml
Catalog Records
We currently publish URIs for work-level and edition/translation-level records, as well as for CTS textgroups. URIs for authors, editors and translators are forthcoming. We leverage the CTS URNs for the texts in the catalog record URIs. The URI for a catalog record can be distinguished from that of the text itself by the 'catalog' path element:
http://data.perseus.org/catalog/<textgroup urn>[/format]
http://data.perseus.org/catalog/<work urn>[/format]
http://data.perseus.org/catalog/<edition urn>[/format]
http://data.perseus.org/catalog/<translation urn>[/format]
So for example, the following are the canonical URIs for objects in the CTS hierarchy for Homer's Iliad:
http://data.perseus.org/catalog/urn:cts:greekLit:tlg0012
http://data.perseus.org/catalog/urn:cts:greekLit:tlg0012.tlg001
http://data.perseus.org/catalog/urn:cts:greekLit:tlg0012.tlg001.perseus-grc1
http://data.perseus.org/catalog/urn:cts:greekLit:tlg0012.tlg001.perseus-eng1
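Since each catalog URI is a truncation of the edition or translation URN at a dot boundary, the whole hierarchy can be derived from a single URN. A minimal sketch (the helper name `catalog_hierarchy` is hypothetical); any passage component is dropped because catalog records describe texts rather than passages:

```python
from typing import List

PREFIX = "http://data.perseus.org"


def catalog_hierarchy(urn: str) -> List[str]:
    """Catalog URIs for every CTS level of urn, from textgroup down."""
    fields = urn.split(":")
    if len(fields) < 4 or fields[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace_prefix = ":".join(fields[:3])   # "urn:cts:<namespace>"
    levels = fields[3].split(".")             # textgroup, work, version, ...
    # fields[4:] (a passage, if any) is ignored: catalog records describe texts.
    return [
        f"{PREFIX}/catalog/{namespace_prefix}:{'.'.join(levels[:depth])}"
        for depth in range(1, len(levels) + 1)
    ]
```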
By appending the optional format path to the URI, you can explicitly link to an Atom feed of the catalog metadata for a textgroup, work or edition/translation, rather than to the default HTML interface:
http://data.perseus.org/catalog/urn:cts:greekLit:tlg0012.tlg001/atom
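The format paths described so far can be collected in one table. The sketch below records only what is described here as available at the time of writing (xml for citations, atom for catalog records, HTML as the unsuffixed default) and would need updating as further formats arrive; `FORMAT_PATHS` and `format_uri` are hypothetical names.

```python
from typing import Dict, Optional, Set

PREFIX = "http://data.perseus.org"

# Format paths available per resource type, as described in this article (2014).
# HTML is the default and needs no format component.
FORMAT_PATHS: Dict[str, Set[str]] = {
    "citations": {"xml"},
    "texts": set(),
    "catalog": {"atom"},
}


def format_uri(resource_type: str, urn: str, fmt: Optional[str] = None) -> str:
    """Build a data URI, rejecting format paths not listed for the resource type."""
    if resource_type not in FORMAT_PATHS:
        raise ValueError(f"unknown resource type: {resource_type!r}")
    base = f"{PREFIX}/{resource_type}/{urn}"
    if fmt is None:
        return base  # default HTML display
    if fmt not in FORMAT_PATHS[resource_type]:
        raise ValueError(f"format {fmt!r} not listed for {resource_type}")
    return f"{base}/{fmt}"
```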
Support for RDF and JSON formats for catalog records is forthcoming.
Publication
The URIs themselves are currently published in the user interfaces of the Perseus Digital Library. For texts and citations, they are included in the "Data Identifiers" widget that appears on every text display. For catalog records, they appear at the top of each catalog display.
VoID files for all URIs are forthcoming.
HTTP Responses
The default response to any URI request whose HTTP headers indicate that the calling client accepts text/html is an HTTP 302 redirect to the HTML display of the requested resource in the corresponding Perseus user interface. As discussed above, specific supported formats (currently xml for citations and atom for catalog records) can be requested by appending a format path element to the resource URI. The response to such a request will typically be the resource contents in the requested format, with an HTTP 200 response code.
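To observe the redirect itself rather than the page it points to, a client has to stop its HTTP library from following 3xx responses automatically. A minimal sketch using Python's standard `urllib.request` module follows; the `NoRedirect` and `probe` names are our own, and the live service may no longer respond exactly as described here.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError carrying the 3xx status.
        return None


def probe(uri: str, accept: str = "text/html", timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Request uri without following redirects; return (status, Location header)."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(uri, headers={"Accept": accept})
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status, None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")
```

Going by the behaviour described above, probing a citation URI such as http://data.perseus.org/citations/urn:cts:greekLit:tlg0012.tlg001.perseus-grc1:1.100 would be expected to yield a 302 status with the HTML display address in the `Location` header, while the same URI with `/xml` appended would be expected to yield 200.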
To enable people to cite textual resources that have CTS URNs assigned but are not currently digitized in the Perseus Digital Library, we redirect URIs referencing these resources to the Perseus Catalog, with an HTTP 303 (See Other) response code. If we have a catalog record for the requested resource, the target of the redirect is that catalog record, which may contain links to other locations where the actual text can be found (such as the Internet Archive or Google Books). In the future, if a resource that is not in the Catalog is requested, we plan to redirect to an interface through which data for the resource can be submitted for inclusion in the catalog; this is not yet implemented.
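The response rules for digitized, catalogued-only, and unknown resources can be summarized as a small routing function. This is a hypothetical server-side sketch, not the Perseus implementation: the sets `digitized` and `catalogued` stand in for the repository and the catalog, the HTML display address is a placeholder, and the 404 for unknown resources is an assumption, since only a planned submission interface is described for that case.

```python
from dataclasses import dataclass
from typing import Optional, Set

PREFIX = "http://data.perseus.org"
# Placeholder for the HTML reading interface; not a real Perseus address.
HTML_DISPLAY = "https://perseus.example/display?urn="


@dataclass
class Response:
    status: int
    location: Optional[str] = None   # redirect target, if any


def respond(urn: str, fmt: Optional[str], digitized: Set[str], catalogued: Set[str]) -> Response:
    """Decide the HTTP response for a request for urn with optional format path fmt."""
    if urn not in digitized:
        if urn in catalogued:
            # Cited but not digitized: 303 See Other pointing at the catalog record.
            return Response(303, f"{PREFIX}/catalog/{urn}")
        # Assumption: a plain 404 until the planned submission interface exists.
        return Response(404)
    if fmt is None:
        # No format path: treated as a text/html request and redirected to the display.
        return Response(302, HTML_DISPLAY + urn)
    # Explicit format path (e.g. "xml"): serve the content directly.
    return Response(200)
```

Using 303 rather than 302 for the catalog case signals that the catalog record is a different resource describing the requested text, not the text itself.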
Resource Contents
As mentioned previously, the resources served at the Perseus URIs do not currently advertise the linked or linkable data they contain via RDF. This is essential for full compliance with Linked Data best practices and is on our roadmap for future releases of the Perseus interfaces to the data.
Data Sharing Initiatives
In our efforts to connect with other groups in the scholarly and library communities, the Perseus Digital Library has made its authority record data for classical authors available to the Virtual International Authority File (VIAF). The contribution of names from our author records will expand the VIAF name clusters by adding different forms of a given author's name, and will assist in VIAF's goal of building truly international authorities that are useful to libraries and scholars. This relationship will also allow the catalog to link to the VIAF clusters, making as much information about each author available as possible and furthering the development of the Semantic Web.
Works Cited
Blackwell, Christopher and Neel Smith (2012). An overview of the CTS URN notation. The Homer Multitext. Available at http://www.homermultitext.org/hmt-doc/cite/cts-urn-overview.html.
Heath, Tom and Christian Bizer (2011). Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. San Rafael: Morgan & Claypool.