This article is available at the URI http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/heath/ as part of the NYU Library's Ancient World Digital Library in partnership with the Institute for the Study of the Ancient World (ISAW). More information about ISAW Papers is available on the ISAW website.

Except where noted, ©2014 Sebastian Heath; distributed under the terms of the Creative Commons Attribution License

ISAW Papers 7.8 (2014)

ISAW Papers: Towards a Journal as Linked Open Data

Sebastian Heath

Introduction

The present contribution to the set of essays published under the rubric of “ISAW Papers 7” is necessarily self-referential. ISAW Papers, the Institute for the Study of the Ancient World’s digital scholarly journal, is both its topic and its venue. These overlapping roles will prove useful by allowing direct illustration of the progress ISAW has made in implementing the goals with which the journal was initiated. By way of high-level overview, those goals are to publish article-length scholarship that (1) is available at no cost to readers, (2) can be reused and redistributed under a Creative Commons license, and (3) is stored in formats that are very likely to be readable into the far future. Additionally, articles in ISAW Papers should link to stable resources similarly available on the public Internet. This last goal is intended to increase the discoverability and utility of any individual article as well as of the growing network of digital resources available for investigating the ancient world.

In describing progress to date, the following paragraphs will not shy away from raising technical issues. They do not, however, offer complete instructions for deploying Linked Open Data in a journal context nor detailed introductions to the technologies described. The discussion is practice oriented and so makes reference to the articles published to date. This approach and the movement from overview to specifics is intended to introduce readers to some of the opportunities ISAW Papers has recognized and also to the challenges it faces.

To start broadly, the editorial scope of ISAW Papers is as wide as ISAW’s intellectual mission, which itself embraces “the development of cultures and civilizations around the Mediterranean basin, and across central Asia to the Pacific Ocean.” (ISAW n.d.) Temporally, ISAW is mainly concerned with complex cultures before the advent of early modern globalization, though it is important to note that ISAW does not try to impose strict limits on what falls within its intellectual purview. Indeed, the origins, development and reception of all phases of the Ancient World are fair game at ISAW.

Review and Licensing

Two additional concerns of a scholarly journal - review and licensing - can also be addressed efficiently. ISAW Papers publishes anonymously peer-reviewed articles as well as articles read and forwarded for publication by members of the ISAW faculty. This aspect of the editorial process is made clear for each article. The goal here is to provide a balance between the many benefits that peer review can provide to an author while similarly ensuring that it is neither a barrier to new work nor an impediment to timely publication. In terms of licensing, ISAW asks authors to agree to distribution of their text under a Creative Commons Attribution (CC-BY) license. The same applies to images authors have created on their own or which ISAW creates during the editorial process. We consider such open distribution to be an important component of a robust approach to future accessibility. It is, however, the case that authors have needed to include images whose copyright is held by others. This situation remains a fact of public scholarly discourse. Accordingly, we ask that authors obtain permission for ISAW to publish such images in digital form but do not require explicit agreement to a CC license. As with peer-review, a reasonable balance of current realities and future possibilities is the goal.

Partnership with the NYU Library

Initial public availability takes place in partnership with the New York University (NYU) Library. So for example, the text you are reading now will be accessible via the URI “http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/heath/”. While ISAW has complete responsibility for the editorial process, that is for shepherding an author’s intellectual content into a form that enables both long-term accessibility and immediate distribution, we rely on the Library to provide the infrastructure for that long-term preservation. Each party in this relationship brings its institutional strengths to the endeavor. In particular, it is very useful that the library assigns a Handle to each article (CNRI n.d.). For example, the URL “http://hdl.handle.net/2333.1/k98sf96r” will redirect to whichever URL the NYU Library is using to host ISAW Papers’ first article (Jones and Steele 2011). If a reader follows that link within a few years of the publication of this current discussion, it is likely she or he will be redirected to “http://dlib.nyu.edu/awdl/isaw/isaw-papers/1/.” Further out into the future, the handle may resolve to a different address. But we at ISAW are confident that an institution such as the NYU Library offers a very strong likelihood of ongoing availability. And it is of course the case that we encourage readers and other institutions to download and re-distribute any and all ISAW Papers articles. Such third-party use and archiving, enabled through initial distribution by the Library, will also contribute to the long-term preservation of this content.

An additional result of collaboration with NYU Library staff, particularly my colleagues in the ISAW library, is the creation of individual records in the NYU Bobcat library catalog for each article. This local initiative leads in turn and automatically to the creation of a Worldcat record for each article. Accordingly, “http://www.worldcat.org/oclc/811756919” is the Worldcat “permalink” for the record describing C. Lorber and A. Meadows’ 2012 review of Ptolemaic numismatics. The journal itself has a Library of Congress issued International Standard Serial Number (2164-1471) as well as its own Worldcat record at “http://www.worldcat.org/oclc/756047783”.

Broad Strokes and Specific Citations

There is a future point at which the following short list will describe the main components of a born-digital article published in ISAW Papers:

The two new abbreviations in the above list - XHTML and RDFa - can bear further explanation. As is probably well-known to many readers, HTML, specifically its 5th version HTML5, is the standard published by the World Wide Web Consortium (W3C) that specifies the format of text intended for transmission from Internet servers to web browsers. As a simple description, HTML allows content creators to specify the visible aspects of a text: e.g., that titles and headings are in bold, that paragraphs are visually distinct by indentation or spacing, and other aspects such as italic or bold spans. For its part, the W3C has quickly become a standards-setting body with global impact. At this moment, HTML5 documents can be directly read - that is rendered into human readable form on screen - by many applications running on many different forms of computing devices ranging from desktops and notebook computers to tablets and phones. It is likely that this easy readability of HTML documents will continue far into the future, and ISAW believes that some degree of readability for such content is assured for as long as can reasonably be foreseen.

XHTML is the variant of HTML that adheres strictly to the requirements of the Extensible Markup Language (XML). XML is in turn a standard that provides more explicit indications of the structure of a text than does HTML. For example, an item in a list in HTML can be indicated by “<li>An item in a list”, whereas XHTML requires that the markup be “<li>An item in a list</li>”. Note the terminating “</li>”, which is required in XML. While a full discussion of XML and XHTML would take up excessive room here, it is fair to say that their added requirements are geared towards enabling more reliable processing by automated agents, meaning the manipulation of the text and rendering of results by computer programs.
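The difference between the two list-item forms can be checked mechanically. The following is a minimal sketch in Python, using only the standard library's XML parser; the helper name `is_well_formed` is an illustrative assumption and is not part of any ISAW Papers tooling.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# The two forms quoted in the text above.
html_style = "<li>An item in a list"
xhtml_style = "<li>An item in a list</li>"

print(is_well_formed(html_style))   # the unterminated form is rejected
print(is_well_formed(xhtml_style))  # the terminated form is accepted
```

An XML parser refuses the unterminated form outright, which is the kind of strictness that makes automated processing more predictable.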

At this point in the discussion it is worth highlighting one particular aspect of XHTML that ISAW Papers utilizes extensively. On the public internet, the presence of a “pound sign” or “#” in a web address often indicates a reference to a particular part of a document. When used in this way, the exact part referenced is indicated in the HTML document itself by the presence of an ‘id’ attribute. For example, HTML’s ‘p’ element, which is used to mark paragraphs, can be identified by markup of the form ‘<p id="p10"> … </p>’. In ISAW Papers, all paragraphs in the main body of an article have such an id and can therefore be directly referenced via URLs. For example “http://dlib.nyu.edu/awdl/isaw/isaw-papers/6/#p3” is a direct link to the third paragraph of M. Zarmakoupi’s (2013) article on urban development in Hellenistic Delos.
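How a program might follow such a link can be sketched briefly. In the Python below, the XHTML in `SAMPLE_XHTML` is placeholder text invented for illustration (it is not the content of any ISAW Papers article), and `find_by_fragment` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Placeholder document: three paragraphs carrying ids in the p1, p2, p3 pattern.
SAMPLE_XHTML = """<div>
  <p id="p1">First paragraph.</p>
  <p id="p2">Second paragraph.</p>
  <p id="p3">Third paragraph.</p>
</div>"""


def find_by_fragment(url: str, xhtml: str):
    """Return the element whose id matches the URL fragment, or None."""
    _, fragment = urldefrag(url)  # splits off the part after '#'
    if not fragment:
        return None
    root = ET.fromstring(xhtml)
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    return None


target = find_by_fragment(
    "http://dlib.nyu.edu/awdl/isaw/isaw-papers/6/#p3", SAMPLE_XHTML
)
print(target.text)
```

A browser does essentially the same lookup when it scrolls to the referenced paragraph.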

Towards Linked Open Data

Most of the discussion so far should be considered as preliminary to a focus on ISAW Papers’ implementation of the principles of Linked Open Data (LOD), principles that were summarized in the Introduction to this set of articles. With that description in mind, ISAW Papers can make some claim to being “5 Star” linked data as defined in Berners-Lee’s fundamental note of 2006. Its articles are available at stable URLs that can be considered URI-based identifiers and XHTML is a machine readable and non-proprietary format. Furthermore, and as only suggested by the short list of “main components,” ISAW Papers does provide RDF. That is, each article has embedded within it statements in the form of ‘triples’ that describe particular aspects of that article’s content. An example will make this aspect of the journal clearer.

The URL “http://dlib.nyu.edu/awdl/isaw/isaw-papers/3/#p70” is a link to the 70th paragraph of G. Bransbourg’s article on market integration in the Roman economy during the imperial period. Looking at the source of that text shows the following XHTML markup:

… the presence of a Tyrian colony in <span class="reference" rel="dcterms:references" typeof="dcterms:Location"><a rel="rdfs:isDefinedBy" href="https://pleiades.stoa.org/places/432815" property="rdfs:label">Puteoli</a></span> …

The ‘<a href="...">’ component of that markup is “plain old” HTML that allows the text of Puteoli to be highlighted in a browser so that a user can follow the link to the Pleiades page. That is standard functionality on the world-wide web. It is the additional markup that makes the meaning of that link machine readable. In English, the semantics indicated here can be stated as, “ISAW Papers 3 makes reference to a location. That location is, in turn, defined at the webpage “https://pleiades.stoa.org/places/432815”, and has the label “Puteoli” in the context of this article.” Similar markup for references appears in this article and in other ISAW Papers articles. For example, the first paragraph of A. McCollum’s note on Syriac geographic knowledge - accessible via “http://dlib.nyu.edu/awdl/isaw/isaw-papers/5/#p1” - contains a reference to the scholar Gregory bar ‘Ebrāyā, with a definition of that individual provided by a link to Wikipedia.
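The English reading given above can be approximated in a few lines of Python. This is only a hand-written sketch of this one markup pattern: it is not an RDFa processor and does not implement the W3C processing rules. The function name `describe_reference` and the placeholder node label `_:location` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The markup quoted above from paragraph 70 of ISAW Papers 3.
SNIPPET = (
    '<span class="reference" rel="dcterms:references" typeof="dcterms:Location">'
    '<a rel="rdfs:isDefinedBy" href="https://pleiades.stoa.org/places/432815" '
    'property="rdfs:label">Puteoli</a></span>'
)


def describe_reference(markup: str, article: str) -> list:
    """Read the span/a pattern into (subject, predicate, object) tuples."""
    span = ET.fromstring(markup)
    link = span.find("a")
    node = "_:location"  # hypothetical label for the unnamed thing referenced
    return [
        (article, span.get("rel"), node),               # article references a thing
        (node, "rdf:type", span.get("typeof")),         # the thing is a Location
        (node, link.get("rel"), link.get("href")),      # defined at the Pleiades page
        (node, link.get("property"), link.text),        # labelled "Puteoli"
    ]


statements = describe_reference(
    SNIPPET, "http://dlib.nyu.edu/awdl/isaw/isaw-papers/3/"
)
for statement in statements:
    print(statement)
```

Each printed tuple corresponds to one clause of the English sentence.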

And while the following is beyond the scope of this discussion, it should be noted that the link between Puteoli in this text and the Pleiades URI is entered by hand. It is hoped, even assumed, that such named entity recognition will become more automated in the future.

RDFa and Triples

Fundamental to the design principles of ISAW Papers is that the markup used here conforms to an existing W3C standard, specifically “HTML+RDFa 1.1” (Sporny 2013), which is itself part of the RDFa 1.1 group of standards (Adida et al. 2013). For its part, “RDFa” is the second abbreviation given in the brief list of “main components” above. It stands for “Resource Description Framework in Attributes.” As a very short description, RDFa allows discrete machine-readable statements to be embedded in XHTML. These statements are called “triples” and take the form of a subject, a predicate, and an object.

To summarize and repeat, the triples indicated by the markup drawn from Bransbourg (2012) read “ISAW Papers 3 references a Location” and further specify that “The Location is defined at ‘https://pleiades.stoa.org/places/432815’”. Furthermore, these triples use publicly defined vocabularies. In the snippet above, “dcterms:references” indicates that ISAW Papers uses the vocabulary published by the “Dublin Core Metadata Initiative”. For a definition of the term “references” see “http://dublincore.org/documents/dcmi-terms/#terms-references”.
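A prefixed term such as “dcterms:references” is shorthand for a full URI in the relevant vocabulary. The Python sketch below shows that expansion; the namespace URIs are those declared in the Turtle rendering given later in this article, while the `PREFIXES` table and the `expand` function are illustrative names only.

```python
# Namespace URIs as declared in the Turtle excerpt later in this article.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(term: str) -> str:
    """Expand a prefixed term such as 'dcterms:references' to a full URI."""
    prefix, _, local_name = term.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local_name


print(expand("dcterms:references"))
print(expand("rdfs:isDefinedBy"))
```

Because the expanded forms are ordinary URIs, any agent that knows the Dublin Core or RDF Schema vocabularies can interpret the predicates without reference to ISAW Papers itself.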

It is not the goal of this discussion to provide a full explanation of RDFa or triples. But it is worth stressing the strategic goal that the use of RDFa forwards. To state that goal simply, ISAW Papers articles intend to represent links to stable resources in such a way that the meaning of those links can be read and used by automated agents. That progress towards this goal is actually being made is indicated by the ability of current tools to read and query the data inherent in the articles published to date. For example, the W3C tool titled “RDFa 1.1 Distiller and Parser”, available at the time of writing at “http://www.w3.org/2012/pyRdfa/Overview.html”, will recognize the triples in “http://dlib.nyu.edu/awdl/isaw/isaw-papers/3/”. Readers can try this themselves by pasting the article’s URI into the W3C Distiller’s “URI:” field. Doing so will show a large number of links to stable resources. In particular, using any such tool to list the triples in that article will reveal machine readable information related to authorship and subject in addition to clear specification of links to geographic entities beyond Puteoli. Additionally, bibliographic information is shown to be specified using the BibliographicResource and bibliographicCitation terms of the Dublin Core.

Issues in the Implementation of Linked Open Data

The word "towards" in the title of this contribution is intended to communicate to readers that the process of defining how ISAW Papers will implement LOD is not yet finished. Articles are available at stable URIs and do provide machine-readable links to other URIs. Nonetheless, this “Linked Open Data” has not reached a final form.

To keep to the markup surrounding the reference to Puteoli in G. Bransbourg’s article: it was given as RDFa above, with the semantics of that RDFa “translated” into admittedly stilted English. Rendering that RDFa as Turtle - another common format for communicating triples - gives the following excerpted sequence:

@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<http://dlib.nyu.edu/awdl/isaw/isaw-papers/3/> dcterms:references [
     a dcterms:Location;
     rdfs:isDefinedBy <https://pleiades.stoa.org/places/432815>;
     rdfs:label "Puteoli"@en
   ] .

It is possible that no deep expertise in reading Turtle is necessary for readers to see that this is an alternate rendering of the English, “ISAW Papers 3 makes reference to a location. That location is, in turn, defined at the webpage “https://pleiades.stoa.org/places/432815”, and has the label “Puteoli” in the context of this article.” There are strengths here. No suggestion is made that the webpage https://pleiades.stoa.org/places/432815 is in fact the site of Puteoli. When you go to the page, you get information about the site, a “definition” as it were, which is why the predicate rdfs:isDefinedBy is used. And again, that a W3C sponsored tool can render the information in an ISAW Papers article is a demonstration of progress towards interoperability.
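The bracketed group in the excerpt stands for an unnamed (“blank”) node. To make that explicit, the Python sketch below writes the same four statements out one per line and then follows the chain from the article to the page that defines the referenced location. The blank node label `_:b0` and the names `TRIPLES`, `to_lines` and `defining_pages` are assumptions made for illustration; no RDF library is used.

```python
ARTICLE = "http://dlib.nyu.edu/awdl/isaw/isaw-papers/3/"
DCTERMS = "http://purl.org/dc/terms/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# The four statements of the excerpt, with the blank node given the label _:b0.
TRIPLES = [
    (f"<{ARTICLE}>", f"<{DCTERMS}references>", "_:b0"),
    ("_:b0", f"<{RDF}type>", f"<{DCTERMS}Location>"),
    ("_:b0", f"<{RDFS}isDefinedBy>", "<https://pleiades.stoa.org/places/432815>"),
    ("_:b0", f"<{RDFS}label>", '"Puteoli"@en'),
]


def to_lines(triples) -> str:
    """Write one 'subject predicate object .' statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def defining_pages(triples) -> list:
    """Follow article -> references -> isDefinedBy and return the page URIs."""
    referenced = {
        o for s, p, o in triples
        if s == f"<{ARTICLE}>" and p == f"<{DCTERMS}references>"
    }
    return [
        o.strip("<>") for s, p, o in triples
        if s in referenced and p == f"<{RDFS}isDefinedBy>"
    ]


print(to_lines(TRIPLES))
print(defining_pages(TRIPLES))
```

The second function is a small instance of the kind of query an automated agent could run over an article's triples.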

But it is very important to note that this specific use of the Dublin Core vocabulary in combination with the RDF Schema vocabulary published at http://www.w3.org/2000/01/rdf-schema# is idiosyncratic. And it is idiosyncratic because there is no universally accepted standard for deploying public vocabularies to describe relationships between documents. There are a number of vocabularies that could be used - some readers will be familiar with Bibo and Cito terms - but their use is not fully settled.

There is also room for progress on the creation of stable URIs for named entities, although many solutions are appearing. It is clear that VIAF (OCLC 2010-13) will be the publisher of identifiers for authors. Pleiades provides URIs for geographic entities. The Perseus Catalog (Perseus Digital Library n.d.) will provide URIs for many ancient texts, particularly those drawn from Greco-Roman culture. Likewise, ISAW Papers will continue to link to identifiers for numismatic concepts established by Nomisma.org (Meadows and Gruber 2014) and welcomes progress being made by disciplines such as Syriac studies with its developing portal at http://syriaca.org.

The exact form of references to all such resources should be standardized across projects, or rather, variation in form should be reduced. It is certainly the case that many of the papers in this collection show excellent progress towards that goal. And it is hoped that ISAW Papers can contribute to the development of such standards by highlighting the need for them with usable data. From the particular perspective of this one born-digital journal, agreement on basic issues such as how to specify the semantics of links to well known resources will be a large step towards enabling the deposition of archival versions of all ISAW Papers articles into NYU’s Digital Archive.

Conclusion

That last point can stand as a conclusion to this discussion, intended as it is to capture a particular moment in an ongoing process. ISAW Papers is achieving its motivating goal of distributing high-quality scholarship relevant to the Ancient World. While there is much more work to do, particularly on the experience of reading an article online, it is fundamental that readers have current access to this scholarship at no cost and that it is made available in such a way that ongoing access is likely. Those aspects of the journal themselves adhere to the principles of Linked Open Data. To the extent that articles provide machine-readable data, the specific patterns used should be considered models and suggestions. Their utility will become clear as the data is consumed and as the data conforms more fully to best practices developed by the wider Linked Open Data community, particularly those parts of that community focused on the Ancient World.

Works Cited

Adida, B., M. Birkbeck, S. McCarron and I. Herman (2013). RDFa Core 1.1 - Second Edition. <http://www.w3.org/TR/rdfa-syntax/>

Bransbourg, G. (2012). Rome and the Economic Integration of Empire. ISAW Papers, 3. <http://dlib.nyu.edu/awdl/isaw/isaw-papers/3/>

CNRI (n.d.). Handle System. <http://handle.net>.

ISAW (n.d.). Institute for the Study of the Ancient World. <http://isaw.nyu.edu>.

Jones, A. and Steele, J. (2011). A New Discovery of a Component of Greek Astrology in Babylonian Tablets: The “Terms.” ISAW Papers, 1. <http://dlib.nyu.edu/awdl/isaw/isaw-papers/1/>

McCollum, A. (2012). A Syriac Fragment from The Cause of All Causes on the Pillars of Hercules. ISAW Papers, 5. <http://dlib.nyu.edu/awdl/isaw/isaw-papers/5/>

Meadows, A. and E. Gruber (2014). Coinage and Numismatic Methods. A Case Study of Linking a Discipline. ISAW Papers, 7. <http://dlib.nyu.edu/awdl/isaw/isaw-papers/7/meadows-gruber/>

OCLC (2010-2013). VIAF: Virtual International Authority File. <http://viaf.org>.

Perseus Digital Library (n.d.). The Perseus Catalog. <http://catalog.perseus.org>.

Sporny, M., Ed. (2013). HTML+RDFa 1.1. <http://www.w3.org/TR/rdfa-in-html/>.

Zarmakoupi, M. (2013). The Quartier du Stade on late Hellenistic Delos: a case study of rapid urbanization (fieldwork seasons 2009-2010). ISAW Papers, 6. <http://dlib.nyu.edu/awdl/isaw/isaw-papers/6/>