This article is available at the URI http://dlib.nyu.edu/awdl/isaw/isaw-papers/20-3/ as part of the NYU Library's Ancient World Digital Library in partnership with the Institute for the Study of the Ancient World (ISAW). More information about ISAW Papers is available on the ISAW website.
©2021 Adam Rabinowitz; text and images distributed under the terms of the Creative Commons Attribution 4.0 International (CC-BY) license.
ISAW Papers 20.3 (2021)
Time for Linked Open Data
Adam Rabinowitz, University of Texas at Austin
In: Sarah E. Bond, Paul Dilley, and Ryan Horne, eds. 2021. Linked Open Data for the Ancient Mediterranean: Structures, Practices, Prospects. ISAW Papers 20.
URI: http://hdl.handle.net/2333.1/3j9kdg35
Abstract: Descriptions of time and the temporal attributes of people, events, and things are fundamental to the discussion of the past. When scholars attempt to represent information about the past as structured data, however, the inconsistencies in the ways we describe and annotate time become major stumbling blocks to the aggregation and reuse of historical datasets. This chapter presents some practical steps scholars can take to standardize chronological information in their data, and to take advantage of Linked Data resources to enrich their own datasets and make their data more useful to others. Using a hypothetical list of papyri as an example, the chapter discusses calendrical systems, standardized date formats for absolute time in digital environments, and the definition of relative expressions of time-divisions like periods with reference to external Linked Data authorities. It concludes with an overview of the process for adding new user-generated periods to the PeriodO Linked Data gazetteer of period definitions.
Library of Congress Subjects: Linked data; History--Periodization.
Introduction
The human measurement of time lends itself to computation, but it also presents particular problems for computers. Anyone who remembers the short-lived panic over “Y2K”, the idea that the turn of the millennium would break computer systems that had been built to understand the date “00” only as “1900”, can confirm this. On the one hand, humans have for a very long time measured time by observable, predictable, and quantifiable natural cycles, especially the movement of the celestial bodies. The most obvious of these cycles are the alternation of light and dark as the Earth spins on its axis, the waxing and waning of the moon, and the shifting position of the sun in the sky as the Earth progresses through the full circuit of its orbit. These are the same for everyone on the surface of the planet.
On the other hand, some of these phenomena are not in sync with others (for example, the lunar versus the solar cycle) – and more importantly, the conceptual division and verbal expression of time has varied widely across cultures from the beginning of written history to the present. Human beings have kept time in terms of more or less accurate solar years; lunar months, which do not coincide with solar years; the movement of the stars across the sky over a solar year; the count of days or years or cycles from a culturally-significant starting-point; repeated or extraordinary events; and so on. And they have created larger subdivisions of time on the basis of perceptions of associated attributes, with more or less specific boundaries: these have had many names (genos, saeculum, “epoch”, “age”, “era”), but outside of the specific system of geological description, they are now most commonly referred to in English by the term “period”.
This tension between the quantifiable and the conceptual offers both opportunities and challenges to producers and users of temporal digital data. The opportunities are clear in the “Big Data” revolution in information science, which makes extensive use of time-stamped information: in fact, the vast quantities of spatial data being harvested from cell-phones and social-media use would be much less valuable without associated times and dates (it is not as useful to know where something is happening if you don’t know when it is happening). These time-stamps are invariably expressed in terms of divisions used in the time-keeping system of the modern “West” – that is, years of 365 days with an extra day every four years, counted up from the birth of Christ and modified by Pope Gregory XIII in 1582 to bring the length of the year into harmony with the solar equinoxes (the Gregorian calendar); twelve months, each composed of a number of days between 28 and 31; and a night-and-day cycle composed of 24 hours, each of which is divided into 60 minutes, each of which can be divided in turn into 60 seconds. The consistency and interoperability of this convention means that millions of records can be visualized in time as well as space, revealing patterns in human activity that would be impossible to reconstruct at this scale without digital means (Deville et al. 2014).
But as fears of a Y2K bug showed, interoperability is vulnerable to simple differences in notation, which emerge as soon as cultural differences and changing conventions are taken into account. In US notation, 6/10 is June 10th; in most European countries, it is October 6th. Over the last several decades, the notations “B.C.” and “A.D.” (“Before Christ” and “Anno Domini”) have been replaced in some contexts with the notations “BCE” and “CE” (“Before the Common Era” and “Common Era”) for socio-cultural reasons. As one moves back in time or further afield in space, things become even more complicated. The traditional Islamic calendar, used for the calculation of festivals, is lunar, and is reckoned from the flight of Muhammed to Medina; the Hebrew and Chinese calendars are both lunisolar but begin their counts at different moments, and all three calendars change from the old to the new year in different months, none of which coincide with the Gregorian new year. The Gregorian calendar replaced the Julian calendar, whose slightly longer year had drifted over time with respect to the solar year; to correct the drift, the reform advanced the date by several days. The new calendar was only gradually adopted outside the Catholic world, however, and in some places the Julian calendar was still used for civic date-keeping into the 20th century (this is why the “October Revolution” in Russia is dated in current terms to the beginning of November). By the time we get back to antiquity, the variety of dating systems is dizzying, as Thucydides’ famous expression of the date of the Theban raid on Plataea shows:
In the fifteenth year [of the truce], in the forty-eighth year that Chrysis was priestess at Argos, with Ainesios as ephor at Sparta and with Pythodorus still being archon at Athens for another two months, six months after the battle at Potidaia and at the beginning of spring… (Thuc. 2.2)
At least the dates in all of these systems are quantifiable, and calculations can be performed to reconcile them, as I will discuss below. The situation becomes even more complicated when one is dealing with historical periods, which have more and more imprecise dates the further one moves back in time – and the boundaries and names of which can differ not only from culture to culture, but also from scholar to scholar. Both boundaries and terminology can also change over time as new evidence is collected or new historical paradigms are adopted. Periods are not real, quantifiable historical phenomena; they are conceptual divisions created, usually after the fact, to organize information for particular discursive communities, and they generally have spatial boundaries as well as temporal ones. As a result, the reconciliation of different period expressions in different datasets requires an act not only of calculation, but of translation. To make matters even more complex, periods are often bounded by events, which themselves have both names and duration in time and thus echo the structure of the period for which they provide start and end dates.
This complexity must be taken into account when one attempts to convey temporal information as structured data, and the representation of time in datasets modeled as Linked Data presents additional challenges and considerations. This chapter addresses time as structured data and as Linked Data, beginning with the simplest modeling and expression of Gregorian calendar dates before moving on to the modeling and expression of periods and other temporal expressions. It assumes a basic familiarity with the idea of Linked Data and text markup as conveyed in other chapters in this work (those on Nomisma and SNAP, for example), but it will attempt to frame the material as “recipes” accessible to those with little experience in this area. Before we turn to technical details, however, I would like to note that the first question to ask is not “how do I make my data into Linked Data?” but “what kinds of data am I modeling, for whom and for what purpose?”. Time-stamped data – social media posts, phone-calls, digital images or computer files – contain information about absolute, continuous time represented with a high degree of precision, sometimes down to the second. Relative expressions of time of the sort used in natural language (“after last week”, “during the Hellenistic period”), on the other hand, cannot easily be converted into years, days, hours and minutes. Absolute and relative temporal data therefore require different modeling and offer different possibilities for aggregation and analysis. While absolute numeric values only need to be expressed in a consistent format to be legible for a computer, relative designations of time in a dataset require both normalization of human-readable terms (choosing to use either “Late Roman” or “Late Antique”, but not both) and additional parsing to enable absolute dating (stating the start and end dates of “Late Roman” in the context of a given dataset). 
Linked Data approaches make it possible to connect one’s own terms with standard “defined vocabularies” – “dictionaries” of appropriate terms – like the Getty Art & Architecture Thesaurus (AAT), and they also permit connections with resources that specify absolute temporal boundaries for periods, like PeriodO or chronOntology. Projects seeking simply to normalize their temporal vocabularies might only need an external thesaurus, but projects interested in facilitating chronological search or visualization may also have to address the parsing of relative temporal terms in absolute time.
It is therefore important to consider, at the beginning of a project, what kinds of temporal information will be included, how that data will be used by the project itself, and how it might be reused by others in the future. This chapter will begin by discussing standards, formats, and issues in the representation of absolute time, and then addresses the ways in which Linked Data approaches can enrich and connect relative temporal expressions on a semantic level as well. The first sections will deal with calendrical systems and machine-readable date formats, while the following sections introduce the semantic representation of both absolute dates and the relative chronological concept of the “period”. The final section provides a brief guide to the creation of new referenceable period definitions in the PeriodO Linked Data gazetteer.
Absolute Time: Calendar Dates
Of course all expressions of time are relative to some extent, in that dates are expressed in relation to a particular point of reference. Different points of reference account for different calendar systems. For the purposes of this chapter, however, “absolute” time refers to numeric dates expressed in any given calendar system, while “relative” temporal expressions are those that lack any inherent numeric dates. It is easier to standardize datasets in which chronological information is expressed in terms of absolute calendar dates, to ensure that this information can be consistently read and understood by both humans and computers. Let us take as an example a spreadsheet listing Greco-Roman papyri that includes a column labeled “date”, containing heterogeneous absolute date values such as “April 21, AD 565”, “155 BC”, “1st c. BCE”, “beginning of the 7th c.”, or “3rd-5th century AC”. These values represent the way most historians express absolute chronological dates of varying resolution, but they present several problems for the representation of dates as structured data. First, some of them are not fully quantified (what does “beginning of the 7th c.” mean in terms of calendar dates?). Second, the calendar being used is not specified. And third, the value in each cell is not represented in a standardized fashion – some values include days and months, others only years, others only centuries.
To illustrate this lack of standardization and the steps that can be taken to make dates machine-readable and thus suitable for visualization or reuse, we will use the hypothetical dataset of papyri below, which is structured to reflect the way many of us keep track of research data.
| papyrus_number | subject | location | date | notes |
|---|---|---|---|---|
| p. Example 25 | tax list | Karanis | the first of the month of Tubi in the fifth year of the reign of Trajan | |
| p. Example 32 | literary fragment | Tebtunis | 4th-3rd century BCE | |
| p. Example 45 | letter | Oxyrhynchus | uncertain | Late Antique? |
The first step necessary for standardization is the specification of a calendar. If the dates recorded in the dataset are in a non-Gregorian calendrical system, that system should be specified in the metadata that describes the spreadsheet columns, or in the date notation itself (e.g. AH for Anno Hegirae, the Hijri year in the Islamic calendar), or both. For an unambiguous reference, it is even better to include in the metadata a reference to an external defined vocabulary – for example, the Getty AAT, which includes persistent unique identifiers for the major calendrical systems.1 Using one of these identifiers – also known as Uniform Resource Identifiers, or URIs – ensures that a future user of the data will know exactly what version of what calendrical system the database designer had in mind. This is especially important when a dataset’s chronological attributes refer not to a calendrical system in common use, but to another system of reckoning – for instance, the Julian Day system favored by astronomers, which involves a continuous count of days starting at noon from January 1, 4713 BCE, set by Joseph Scaliger as the starting-point of the Julian Period (Grafton 1975), or more distant dates expressed in kilo-years (ky, ka) or mega-years (my, myr, mya) (that is, in thousands or millions of years before the present). And it is even more important when conventional meanings have been adjusted in idiosyncratic ways in a particular dataset: for example, BP (“before the present”) always uses a “present” of 1950 when it is used in the context of radiocarbon dating, but in some other contexts the same notation has begun to assume a present of 2000.2
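The effect of an unstated reference point can be illustrated with a few lines of arithmetic. The sketch below is only an illustration: the function names and the sample value of 3000 BP are hypothetical, years are counted with a year zero (so that 1 BCE is year 0), and the calculation ignores the calibration that real radiocarbon ages require.

```python
def bp_to_astronomical_year(years_bp: int, present: int = 1950) -> int:
    """Convert a "before present" age to a year on a number line with a year 0.

    `present` defaults to 1950, the radiocarbon convention; other contexts
    may assume a different reference year, which is the point of the example.
    """
    return present - years_bp


def describe(year: int) -> str:
    """Render a year-zero-based year as a BCE/CE label (hypothetical helper)."""
    return f"{year} CE" if year > 0 else f"{1 - year} BCE"


# The same notation, "3000 BP", read against two different assumed presents.
radiocarbon = bp_to_astronomical_year(3000)            # present = 1950
alternative = bp_to_astronomical_year(3000, present=2000)
print(describe(radiocarbon), "vs", describe(alternative))
```

A fifty-year discrepancy of this kind is invisible in the data itself, which is why recording the convention (ideally by URI) in the metadata matters.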
The use of URIs does not only facilitate transparency; it can also enrich one’s dataset by directing the user to additional information. For calendrical systems, this might come in the form of explanatory resources. At the level of individual dates, however, such additional information might also include other documents or objects that share that date. The “Graph of Dated Objects and Texts” (GODOT) project, for example, currently provides persistent identifiers for specific Roman consular and tribunician years, and will eventually provide URIs for a wide range of ancient date attestations.3 It is possible to look up year 5 of the reign of Trajan, in our hypothetical spreadsheet, in GODOT, and find four other documents related to this year. If we include the URI for that year (https://godot.date/id/7aLvi4Xn9QbhdTJFYu5Sdm) in our spreadsheet and later share the data with an aggregation platform like Trismegistos (https://www.trismegistos.org), our document will be added to these four, providing a new attested date (the first of the month of Tubi). GODOT also provides a useful tool for conversion of ancient dates to the proleptic Julian and Gregorian calendars, allowing us to parse the date in our spreadsheet as December 27 (Julian) or December 26 (Gregorian), 101 CE.
| papyrus_number | subject | location | date | notes | godot_uri_year | Gregorian_date |
|---|---|---|---|---|---|---|
| p. Example 25 | tax list | Karanis | the first of the month of Tubi in the fifth year of the reign of Trajan | | https://godot.date/id/7aLvi4Xn9QbhdTJFYu5Sdm | December 26, 101 CE |
| p. Example 32 | literary fragment | Tebtunis | 4th-3rd century BCE | | | |
| p. Example 45 | letter | Oxyrhynchus | uncertain | Late Antique? | | |
The proleptic Julian calendar (that is, the application of the Julian calendar to dates before the stabilization of the leap-year system in the early 1st c. CE) works well for ancient dates, but for digital purposes, it is generally preferable to express dates using the Gregorian calendar. Date interoperability in digital environments depends heavily on the use of this calendar, which is now widely used for computing even in cultures with their own traditional calendars. The Gregorian calendar becomes proleptic when applied to dates before October 1582, or before its adoption in a particular region. The second step is thus the translation of dates in the original calendrical system to dates in the Gregorian calendar. For individual dates, there are now a number of useful online date converters,4 and while individual websites may become obsolete, it is very likely that such calculators will be available online for the foreseeable future. If dates after 1582 are involved, it is important to know when the Gregorian calendar was adopted in the place where the date in question was assigned (a document from the British colonies in North America dated to October 31, 1751, for example, would use the Julian calendar and would be dated to November 11, 1751 in the Gregorian calendar, since Britain did not adopt the Gregorian calendar until 1752). It is also important to note the date of the new year in a given calendrical system, since many non-Gregorian calendars place this in the spring, summer, or fall, not at the beginning of January.5
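For readers who prefer to script the conversion rather than rely on an online calculator, the two examples given in this chapter can be reproduced with a short routine. This is a minimal sketch, not a replacement for a tested calendar library: the function names are my own, the Julian Day Number is used only as an intermediate count of days, and because Python's `datetime.date` covers only the years 1 to 9999 CE, the code cannot handle BCE dates.

```python
from datetime import date

# The Julian Day Number of 0001-01-01 (proleptic Gregorian) is 1721426 and
# Python numbers that day as ordinal 1, so JDN - 1721425 gives the ordinal.
_ORDINAL_OFFSET = 1721425


def julian_calendar_to_jdn(year: int, month: int, day: int) -> int:
    """Julian Day Number of a date expressed in the Julian calendar (CE)."""
    a = (14 - month) // 12          # 1 for January and February, else 0
    y = year + 4800 - a
    m = month + 12 * a - 3          # March-based month index
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Proleptic Gregorian equivalent of a Julian-calendar date (CE only)."""
    jdn = julian_calendar_to_jdn(year, month, day)
    return date.fromordinal(jdn - _ORDINAL_OFFSET)


# The papyrus date and the colonial document discussed in the text.
print(julian_to_gregorian(101, 12, 27).isoformat())
print(julian_to_gregorian(1751, 10, 31).isoformat())
```

The routine says nothing about whether a given place had adopted the Gregorian calendar at the time a document was written; that judgment still has to be made by the person preparing the data.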
Absolute Time: Formatting
Once the dating system has been specified and Gregorian dates have been added, the next step toward interoperability is the standardization of the format for those dates. This is easier, because there is a commonly agreed-upon standard for representation. The standard is ISO8601, established in 1988 by the International Organization for Standardization (ISO) specifically for the numerical representation of dates and times, and most recently revised in 2004.6 The basic components of the ISO8601 standard are a four-digit Gregorian year, with BCE dates indicated by a leading minus-sign, followed by a two-digit Gregorian month, followed by a two-digit Gregorian day, with hyphens as separators (YYYY-MM-DD). The minimum element is the four-digit year, which the standard does not allow to be truncated, in order to avoid the Y2K problem. Notation may also be extended beyond the date to include 24-hour clock time in relation to UTC time. Two revisions to the standard after 1988 are especially significant for those who work with ancient-world data. First, while the original ISO8601 standard followed the Gregorian calendar in moving directly from 1 BCE to 1 CE, with no zero year, revised versions after 2000 have permitted a zero year to create a continuous number line. As a result, 1 BCE becomes 0000 in ISO8601 terms, 2 BCE becomes -0001, and so on. Second, while the insistence that years are represented by at least four digits is unchanged, it is now permissible to express past or future years with more than four digits. The ISO8601 standard also provides a standard notation for temporal intervals, which are perhaps more common than single dates in ancient-world datasets. This notation takes the form <start date>/<end date>, where start and end dates are both expressed according to the standard.
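The one-year offset introduced by the zero year is easy to get wrong by hand, so it may help to see it as a small function. The following is a sketch under simple assumptions: the names `iso_year` and `iso_interval` are hypothetical, only year-level values are handled, and the era labels accepted are limited to the four mentioned earlier in this chapter.

```python
def iso_year(year: int, era: str = "CE") -> str:
    """Format a historical year (1 or greater) as an ISO 8601 year string.

    BCE years are shifted by one because the year-zero number line
    places 1 BCE at 0000, 2 BCE at -0001, and so on.
    """
    era = era.upper()
    if era not in ("CE", "AD", "BCE", "BC"):
        raise ValueError(f"unrecognized era: {era}")
    if year < 1:
        raise ValueError("historical years start at 1 in both eras")
    number = year if era in ("CE", "AD") else 1 - year
    sign = "-" if number < 0 else ""
    return f"{sign}{abs(number):04d}"   # pad to at least four digits


def iso_interval(start: str, end: str) -> str:
    """Join two ISO 8601 values as <start date>/<end date>."""
    return f"{start}/{end}"


print(iso_year(1, "BCE"), iso_year(2, "BCE"), iso_year(101, "CE"))
print(iso_interval(iso_year(399, "BCE"), iso_year(200, "BCE")))
```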
| papyrus_number | subject | location | date | notes | godot_uri_year | Gregorian_date | iso8601 |
|---|---|---|---|---|---|---|---|
| p. Example 25 | tax list | Karanis | the first of the month of Tubi in the fifth year of the reign of Trajan | | https://godot.date/id/7aLvi4Xn9QbhdTJFYu5Sdm | December 26, 101 CE | 0101-12-26 |
| p. Example 32 | literary fragment | Tebtunis | 4th-3rd century BCE | | | | -0398/-0199 |
| p. Example 45 | letter | Oxyrhynchus | uncertain | Late Antique? | | | |
So in our hypothetical spreadsheet of papyri, we might take the value “4th-3rd c. BCE” and represent it as “-0398/-0199”, where “-0398” is 399 BCE (the first year in the 4th century) and “-0199” is 200 BCE (the last year in the 3rd). The date of an Egyptian papyrus signed on “the first of the month of Tubi in the fifth year of the reign of Trajan” can be represented as “0101-12-26” (that is, December 26, 101 CE, the proleptic Gregorian equivalent of that Egyptian month and Roman regnal year).7 The use of this standard representation of dates increases the potential for visualization, reuse, and aggregation of the dataset, even without any further Linked Data measures. Tools are also emerging to facilitate the automated rendering of natural-language temporal expressions in ISO8601-compatible formats.8 On the other hand, the first representation implies a precision and a degree of certainty that may not have been present in the original statement: “around the 4th or 3rd century BCE” is not the same as “between the precise dates of 399 and 200 BCE”. The Library of Congress has recently addressed this problem with an extension to the ISO8601 standard, the Extended Date-Time Format (EDTF), which includes notations to express approximation, uncertainty, and missing information.9 It also permits the addition of the letter “Y” at the start of a year with more than four digits, ahead of any minus sign, to avoid confusion (so the date “10001 B.C.” would become “Y-10000”), and specifies ways to indicate precision with the annotation of significant digits.
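To give a sense of what such notations look like in practice, the sketch below appends EDTF qualifiers to values that are already in ISO8601 form. It reflects my reading of the EDTF specification, in which “?” marks an uncertain date, “~” an approximate one, and “%” a date that is both; the function name is hypothetical, and the output is not validated against any EDTF parser.

```python
# Qualifier characters as I understand them from the EDTF specification.
EDTF_QUALIFIERS = {
    "uncertain": "?",
    "approximate": "~",
    "uncertain_and_approximate": "%",
}


def qualify(iso_value: str, kind: str) -> str:
    """Append an EDTF qualifier to a complete ISO 8601 date or year string."""
    return iso_value + EDTF_QUALIFIERS[kind]


# "Around the 4th or 3rd century BCE" rather than a precise span of years.
start = qualify("-0398", "approximate")
end = qualify("-0199", "approximate")
approximate_interval = f"{start}/{end}"
print(approximate_interval)
```

Whether qualified endpoints inside an interval are accepted depends on the conformance level that a given tool supports, so the result should be checked against the specification before it is shared.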
If the dataset was not created with a set format for dates, it may also be necessary to clean and standardize the existing date values before converting them to the ISO8601 standard. There are a number of tools available for database cleaning, but it is worth noting here the OpenRefine platform,10 for two reasons. First, it is a well-documented open source project that is both easy to learn and widely adopted, so it is likely to continue to be available for some time; and second, it allows the user to install reconciliation services that permit the matching of database values with values in external vocabularies. This is especially useful for the transformation of a temporal dataset into Linked Data: the URIs for the consular and tribunician dates in GODOT, for example, can be reconciled with a dataset containing such dates through an OpenRefine reconciliation service11 – as can the period definitions in PeriodO, as we will see below.
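Outside a tool like OpenRefine, a first pass at cleaning can also be scripted. The sketch below is deliberately narrow and is offered only as an illustration: it recognizes plain year expressions with an explicit era, such as “155 BC” or “AD 565”, and returns nothing for everything else, so that values like “1st c. BCE” or “uncertain” are left for human review. The pattern and the function name are my own assumptions, not part of any standard.

```python
import re
from typing import Optional

# An optional leading "AD", a year of up to four digits, an optional trailing era.
_YEAR_PATTERN = re.compile(
    r"^\s*(?:(?P<pre>AD|A\.D\.)\s*)?(?P<num>\d{1,4})"
    r"\s*(?P<post>BCE|BC|B\.C\.|CE|AD|A\.D\.)?\s*$",
    re.IGNORECASE,
)


def parse_simple_year(value: str) -> Optional[int]:
    """Return a year on the year-zero number line, or None if unsure."""
    match = _YEAR_PATTERN.match(value)
    if match is None:
        return None
    era = (match.group("pre") or match.group("post") or "").upper().replace(".", "")
    year = int(match.group("num"))
    if not era or year == 0:
        return None                     # no era stated: do not guess
    return 1 - year if era in ("BC", "BCE") else year


for raw in ["155 BC", "AD 565", "1st c. BCE", "uncertain"]:
    print(repr(raw), "->", parse_simple_year(raw))
```

Refusing to guess is a design choice: an empty cell that a person later fills in is less harmful than a confident but wrong machine-readable date.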
Absolute Time: Semantic Representation
The inclusion in a dataset of Gregorian dates represented according to the ISO8601 standard, perhaps using the conventions of the Library of Congress EDTF, makes the use of time more transparent, and makes it easier to combine that dataset with others that use the same standards. It is not, however, sufficient for the text-encoded markup representation of the dataset. “Text-encoded markup” means here that the dataset can be represented as a single plain-text file in which the structure of the dataset is explained in a human- and computer-readable format with reference to one or more ontologies and/or metadata schemata, and in which the value in every cell of an original table is annotated with the metadata of the column that defined that value. Since this representation does not encode information in tabular terms, with columns and rows, but treats each row as a complete item in itself, containing both attributes and their values, it is often referred to as a serialization. Serialization is a step toward the transformation of a dataset into a Linked Data format, but it is not the only step: at the most basic level, markup defines values in a dataset, not the relationships between them. For the latter, expression as subject-predicate-object triples is necessary. Since the metadata are defined relationally in a valid markup document, however, and since some of the same structures are used in Linked Data representations, it will be useful to begin with the representation of temporal data in Extensible Markup Language (XML). This is the language used by the Text Encoding Initiative, which has extensive participation from the Classics community (Dee, Foradi, and Šarić 2016) and by related ancient-world initiatives like EpiDoc (Cayless et al. 2009), so it is a particularly appropriate place to start in this context.12
XML, like other metadata schemata, is based on a series of elements, conceptual terms that unambiguously identify different data types. An XML document generally includes a header element that specifies the schema it uses – for example, the W3C recommended schema at <http://www.w3.org/2001/XMLSchema>. That schema has elements that describe particular kinds of temporal values: “date” for ISO8601 dates precise to the day, “dateTime” for the addition of clock time, “gYear” for Gregorian calendar year.13 So xs:gYear, a tag identifying an XML element, could be used to mark up the string “-0398” to explain that the numbers refer to a Gregorian year according to the standards of the XML Schema. The TEI standard extends this further with elements and attributes that can provide additional information about dates that are not expressed in the ISO8601 format, such as non-Gregorian calendar systems, termini post and ante quem (“notBefore” and “notAfter”), and named periods, to which we will return below.14 If our example dataset were being serialized in TEI-XML, for example, the value “December 27, 101 CE” in the proleptic Julian calendar could be expressed as <origDate when-custom="0101-12-27" datingMethod="#julian">December 27, 101 CE</origDate>, where the element “origDate” denotes the marked-up information as the date of origin of a document or object, the attribute “when-custom” specifies a date in a non-Gregorian calendar system, and the attribute “datingMethod” specifies the calendar system for that date by referring to an element defined elsewhere in the TEI-XML file.15 Further markup could be added to indicate additional attributes like degree of certainty.
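Markup of this kind does not have to be typed by hand. As a minimal sketch, the standard-library XML module in Python can generate the same origDate element from spreadsheet values; the helper name is hypothetical, the TEI namespace is omitted for brevity, and the code assumes that a calendar with the identifier “julian” is declared elsewhere in the TEI file.

```python
import xml.etree.ElementTree as ET


def make_orig_date(text: str, when_custom: str, dating_method: str) -> ET.Element:
    """Build an origDate element carrying a date in a non-Gregorian calendar."""
    element = ET.Element(
        "origDate",
        {"when-custom": when_custom, "datingMethod": dating_method},
    )
    element.text = text
    return element


element = make_orig_date("December 27, 101 CE", "0101-12-27", "#julian")
print(ET.tostring(element, encoding="unicode"))
```

Generating the markup from the cleaned columns keeps the human-readable text and the machine-readable attributes in step with one another.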
TEI-XML is a metadata schema specifically designed for texts, but ancient world datasets reflect material culture and art as well, and different types of data have different metadata requirements. Other widely-used metadata schemata or ontologies also have elements that facilitate the structured representation of absolute time, and several of these are commonly used across other information fields, which can make it easier to connect data across disciplines. The most important are the Dublin Core metadata terms and the W3C OWL-Time Ontology.16 Both of these standards can be used to make explicit the treatment of temporal information in a document, but they serve different purposes. There are several Dublin Core metadata elements that deal with time, but the one of greatest relevance here is dcterms:temporal, which indicates that a value refers to the “temporal characteristics of the resource” it qualifies (rather than, for example, the time of a record’s creation or a time-period that is the subject of a work). The Dublin Core specification is thus flexible but fairly non-specific. The OWL-Time Ontology, on the other hand, provides a rich set of elements that allow complex temporal logic (as described by Allen 1984) to be expressed through the metadata associated with the values in a dataset. Where the Dublin Core “temporal” element might be a calendar date or a range of dates or a verbal expression of time, the OWL-Time Ontology uses specific elements to deal with different temporal datatypes (intervals vs. instants, for example) and structures. Although it is helpful for working with Linked Data to have some knowledge of metadata and ontologies, for the purposes of this cookbook it may be enough simply to know what element should be used to name a given column in a spreadsheet. 
Using “dcterms:temporal” as a column header for object dates is a clearer expression of what the values in that column mean than just the term “date”, and it is easier to adapt to a Linked Data representation of the dataset.
| papyrus_number | subject | location | dcterms:temporal | notes | godot_uri_year | xs:date |
|---|---|---|---|---|---|---|
| p. Example 25 | tax list | Karanis | the first of the month of Tubi in the fifth year of the reign of Trajan | | https://godot.date/id/7aLvi4Xn9QbhdTJFYu5Sdm | 0101-12-26 |
| p. Example 32 | literary fragment | Tebtunis | 4th-3rd century BCE | | | -0398/-0199 |
| p. Example 45 | letter | Oxyrhynchus | uncertain | Late Antique? | | |
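Once the columns are named in this way, a row can be restated as subject-predicate-object statements with very little code. The sketch below writes two such statements in the N-Triples syntax. Several things in it are assumptions made for illustration: the base address https://example.org/papyri/ and the row identifier are invented, and reusing dcterms:temporal for the typed date value is a simplification rather than a modeling recommendation.

```python
from typing import Optional

DCTERMS_TEMPORAL = "http://purl.org/dc/terms/temporal"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"
BASE = "https://example.org/papyri/"    # hypothetical namespace for the rows


def literal(value: str, datatype: Optional[str] = None) -> str:
    """Quote a value as an N-Triples literal, optionally with a datatype."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Assemble one N-Triples statement."""
    return f"<{subject}> <{predicate}> {obj} ."


row = {
    "id": "example-25",                 # hypothetical identifier
    "dcterms:temporal": "the first of the month of Tubi in the fifth year "
                        "of the reign of Trajan",
    "xs:date": "0101-12-26",
}

subject = BASE + row["id"]
statements = [
    triple(subject, DCTERMS_TEMPORAL, literal(row["dcterms:temporal"])),
    triple(subject, DCTERMS_TEMPORAL, literal(row["xs:date"], XSD_DATE)),
]
print("\n".join(statements))
```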
Relative Time: Periods
Our spreadsheet of papyri now has its original “date” column, with the original natural-language date expressions and the new label “dcterms:temporal”, and its additional column for standardized ISO8601 dates, now identified as “xs:date” values. A spreadsheet with this information could be fairly easily mapped to the EpiDoc standard and serialized for inclusion in an aggregation platform like papyri.info.17 But if it is like most spreadsheets related to the ancient world, a number of rows will have no absolute date values in any of these columns. Instead, the “date” column – or maybe a separate column called “period” or “notes”, as here – might contain a relative chronological expression like “Late Antique”. This presents a greater challenge for interoperability and linking: what does this particular spreadsheet mean by “Late Antique”? Is it referring to a named date range, a stylistic quality, or something else altogether? And if it is referring to a named date range, what chronological and geographic boundaries are implied?
One way to make this clear is by mapping a column of period terms to an ontology like OWL-Time (Cox and Richard 2015) or the simpler Dublin Core period datamodel,18 both of which make it possible to clearly explain the nature and boundaries of the temporal entities one is describing. There are other options more adapted to heritage disciplines: for example, the CIDOC Conceptual Reference Model, or CRM, which has an extension for archaeology and which has paid particular attention to the modeling of temporal entities like periods (Doerr, Kritsotaki, and Stead 2010; Niccolucci and Hermon 2015b; 2015a).19 Ontological mapping may be beyond the capabilities of many of us who would nevertheless like to share our data, however, and in this case there is a simpler solution: the meaning of a term like “Late Antique” can be made explicit by connecting it to a definition in an external authority, like a thesaurus, a defined vocabulary, or a gazetteer, just as other temporal entities like calendar systems or dates can. Other contributions to this cookbook will explain how to link other “named entities” like people or places to external references; the next part of this chapter will explain how to do so with periods.
Relative Time: Alignment
Let us return to our spreadsheet. With absolute dates, preparation for a Linked Data representation was a matter of standardization of calendar and format according to shared conventions. With period terms, however, there is no internationally agreed-on, language-independent format or standard: while the underlying concept of a period term like “Late Antiquity” may be recognizable, each country, each discipline, each school of thought within a discipline understands its temporal boundaries somewhat differently, and some use other terms altogether to describe the same conceptual entity. One way to deal with this lack of standardization is to express clearly within a dataset what is meant by a specific term. After ontological mapping that makes it clear that the entity involved is a temporal period, an internal defined vocabulary with specified dates may be used to provide valid values for the “period” element. This is the way the Pleiades project, for example, organizes its references to periods, which appear as follows in the RDF+XML representation of the Late Antique period in a specific place record:
<pleiades:during>
  <skos:Concept rdf:about="https://pleiades.stoa.org/vocabularies/time-periods/late-antique">
    <skos:scopeNote xml:lang="en">
      The Late Antique period in Greek and Roman history. For the purposes of Pleiades, this period is
      said to begin in the year 300 and to end in the year 640 after the birth of Christ. [[300, 640]]
    </skos:scopeNote>
    <skos:prefLabel xml:lang="en">Late Antique (AD 300-AD 640)</skos:prefLabel>
    <owl:sameAs rdf:resource="http://n2t.net/ark:/99152/p03wskd6psm"/>
    <owl:sameAs rdf:resource="http://pleiades.stoa.org/vocabularies/time-periods/late-antique"/>
    <skos:inScheme rdf:resource="https://pleiades.stoa.org/vocabularies/time-periods"/>
  </skos:Concept>
</pleiades:during>
The pleiades:during element indicates that in the Pleiades datamodel, the place in question is attested in the historical record within a named timespan; the skos:Concept element refers to the Simple Knowledge Organization System datamodel, explaining that the named timespan is a conceptual object;20 and the rdf:about attribute points to a local defined vocabulary or thesaurus.21 In the Pleiades gazetteer, the definition of the period in time (and in some cases, though not here, in space) is contained in the period’s name (“AD 300-AD 640”), and explained in more detail in the scope note.
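To show how a record like this can be read by a program, the following is a minimal sketch using only the Python standard library. It assumes that the bracketed pair at the end of the scope note is a machine-readable start and end year, and it binds the pleiades prefix to a placeholder namespace, since the excerpt does not show the real declaration; the function name describe_period is likewise an illustration, not part of any Pleiades tool.

```python
import re
import xml.etree.ElementTree as ET

# Namespace bindings. The rdf, skos and owl URIs are the standard ones; the
# pleiades URI is a placeholder, because the excerpt does not declare it.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "pleiades": "https://example.org/pleiades-placeholder#",
}

FRAGMENT = """
<pleiades:during>
  <skos:Concept rdf:about="https://pleiades.stoa.org/vocabularies/time-periods/late-antique">
    <skos:scopeNote xml:lang="en">
      The Late Antique period in Greek and Roman history. For the purposes of Pleiades, this period is
      said to begin in the year 300 and to end in the year 640 after the birth of Christ. [[300, 640]]
    </skos:scopeNote>
    <skos:prefLabel xml:lang="en">Late Antique (AD 300-AD 640)</skos:prefLabel>
    <owl:sameAs rdf:resource="http://n2t.net/ark:/99152/p03wskd6psm"/>
    <owl:sameAs rdf:resource="http://pleiades.stoa.org/vocabularies/time-periods/late-antique"/>
    <skos:inScheme rdf:resource="https://pleiades.stoa.org/vocabularies/time-periods"/>
  </skos:Concept>
</pleiades:during>
"""


def wrap(fragment):
    """Wrap the fragment in a root element that declares the prefixes it uses."""
    decls = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    return f"<rdf:RDF {decls}>{fragment}</rdf:RDF>"


def describe_period(fragment):
    """Return the URI, label, year range and owl:sameAs links of the period."""
    root = ET.fromstring(wrap(fragment))
    concept = root.find(".//skos:Concept", NS)
    note = concept.find("skos:scopeNote", NS).text
    # Assumption: "[[start, stop]]" in the scope note encodes the year range.
    match = re.search(r"\[\[(-?\d+),\s*(-?\d+)\]\]", note)
    start, stop = (int(match.group(1)), int(match.group(2))) if match else (None, None)
    rdf_about = f"{{{NS['rdf']}}}about"
    rdf_resource = f"{{{NS['rdf']}}}resource"
    return {
        "uri": concept.get(rdf_about),
        "label": concept.find("skos:prefLabel", NS).text.strip(),
        "start": start,
        "stop": stop,
        "same_as": [e.get(rdf_resource) for e in concept.findall("owl:sameAs", NS)],
    }


period = describe_period(FRAGMENT)
print(period["label"], period["start"], period["stop"])
```

The returned dictionary keeps the human-readable label and the numeric boundaries side by side, which is the combination that makes a period term usable in both display and search.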
There is an additional piece of information here, however. The owl:sameAs element indicates that the concept described by the value “Late Antique” in this record is the same concept as that described in another source – in this case, in the PeriodO gazetteer of period definitions. This will become important for the discussion of the Linked Data representation of periodized information below, but at this stage it is worth pausing at the notion of an external vocabulary, because this is the second way that the meaning of a period term can be made explicit in a dataset. Even if full ontological modeling is not possible for a given dataset or data manager, named entities can still be mapped to external vocabularies that have already carried out that descriptive work, and can serve as shared points of reference to help connect datasets that use different terms or concepts. Several of these external resources are available for temporal data.
The most well-known is the Getty AAT, which maintains an extensive period vocabulary.22 Getty AAT identifiers for periods offer the advantage of a single conceptual representation for named entities that appear in idiosyncratic forms in individual sources. This advantage comes at the cost, however, of specific temporal boundaries: if “Iron Age” is to be a valid concept on a global level, it cannot be defined by dates that apply only in one place, so the only structured references to calendar time in the semantic representation of “Iron Age” in the Getty AAT refer to the creation and modification of the record itself. Temporal information about the period boundaries is expressed only in the note, which explains that the Iron Age “developed at different times in various parts of the world, first appearing in the Middle East and southeastern Europe around 1,200 BCE, and in China around 600 BCE. In the Americas, it did not develop from the Bronze Age but was introduced to Stone Age cultures by European explorers”.23 This definition works well for the explicit definition of the concept to which a dataset refers, but less well to identify the specific dates of the Iron Age in a particular place, which is likely to be closer to the way an ancient-world dataset uses the term (the Iron Age in Greece, the Iron Age in Britain, etc.).
An emerging resource that resolves some of the tension between abstract concept and concrete boundaries is the chronOntology project of the German Archaeological Institute.24 This project uses the CIDOC-CRM to model periods as entities that are both conceptual and situated in time and space. Like the Getty AAT, chronOntology provides URIs for periods, but because these entities are associated with both temporal and spatial boundaries, they make it easier to compare their duration with that of similar conceptual entities in other places (this comparison is further assisted by a timeline visualization of related concepts in the current interface). A search for the term “Late Antique”, for example, produces three unique, dated matches from the chronOntology dataset, which are distinguished by type (in terms of the domain where the temporal entity is used: politics, culture, material culture, architecture), source, and/or date range.25 We might choose one of these as a definition for our own use of “Late Antique” in our spreadsheet, and we could add its URI to a new column in which the URI appears for every row where an item has already been classified as “Late Antique”. The chronOntology platform has the added virtue of a union search across other period vocabularies – specifically, the Getty AAT and the PeriodO gazetteer.
Where the Getty AAT and the chronOntology dataset have attempted to unify period definitions under the smallest number of conceptual headings possible – on the assumption that two different users are likely to have the same concept in mind when they apply the label “Bronze Age”, and different opinions on specific spatiotemporal bounds can be reconciled within a single concept – the PeriodO gazetteer of period definitions has taken a different approach (Rabinowitz et al. 2016).26 Like the Getty AAT and chronOntology, it presents a semantic model for named conceptual entities and provides URIs for instances of those entities; but unlike those resources, its basic unit of analysis is a named period as defined in time and space by an individual authoritative source. The Getty AAT provides a URI for “the Iron Age” in the most general sense of the term; chronOntology provides URIs for period concepts that seem to be broadly accepted in a particular region, and as a result includes far more information about spatial and temporal boundaries than the AAT. Both resources, however, focus on the abstract concept more than concrete usage, and therefore include period designations that lack specific coordinates in space or time or both. Because PeriodO seeks to capture spatiotemporal usage rather than unified concepts, it only includes period definitions with coordinates in both space and time – and because usage can vary from authority to authority even with reference to the same abstract concept in the same place, it sets no limit on the number of varying spatiotemporal definitions related to the same abstract concept.
PeriodO is thus the most granular of these three sources, and allows the most specific definition of the meaning of “Late Antique” in a given context: not the Getty AAT’s broadest “period in history and the style of art that developed after Severan rule in the Roman Empire”, nor chronOntology’s general definition for the Mediterranean as inferred from a particular publication, but “Late Antique” exactly as defined by the British Museum with respect to “Greece, Cyprus, eastern Balkans, parts of the Caucasus (Georgia, Armenia, etc.), Asia Minor, Levant, Egypt and parts of Italy”, in the original language and terms of the source (with the addition of an ISO8601 representation of the dates and an English translation of non-English labels).27 Like chronOntology, PeriodO uses four-part date ranges (earliest start, latest start, earliest stop, latest stop) to express temporal period boundaries that are themselves intervals, parsing the original natural-language temporal expressions where intent is clear.28
| papyrus_number | subject | location | dcterms:temporal | notes | godot_uri_year | xs:date | period_uri |
|---|---|---|---|---|---|---|---|
| p. Example 25 | tax list | Karanis | the first of the month of Tubi in the fifth year of the reign of Trajan | | https://godot.date/id/7aLvi4Xn9QbhdTJFYu5Sdm | 0101-12-26 | |
| p. Example 32 | literary fragment | Tebtunis | 4th-3rd century BCE | | | -0398/-0199 | |
| p. Example 45 | letter | Oxyrhynchus | uncertain | Late Antique? | | | http://n2t.net/ark:/99152/p08m57hmdw6 |
These period thesauri or gazetteers thus operate at different levels of specificity, and as a result have different uses. For datasets in which the primary concern for data-linking is to identify the use of a shared concept, rather than a definition used by any one scholar, national community, etc., the Getty AAT is probably the most valuable resource – and certainly the simplest to search, since it will give the user one and only one “Late Antique” period. For datasets that require a greater degree of specificity about the type and coverage of a period concept, but not necessarily both spatial and chronological information, chronOntology offers both a union search and an API that can be exploited programmatically to extract period values according to specific search parameters. Here, however, the parsing of the results requires more effort on the part of the user: a free-text search for “Late Antique” returns seven matches from the chronOntology thesaurus, three of which have different date-ranges. And in PeriodO, a free-text search for “late antiqu*” returns 23 different definitions of “Late Antique” and “Late Antiquity”, with 15 clearly distinct date-ranges across 16 distinct spatial areas, in nine different original languages from seven different authorities. This certainly helps the user develop a sense of the term’s range of meanings, but it can make it more difficult to choose a specific definition from among them.29 To ease the matching of period definitions in PeriodO with named entities in a dataset, the project provides a reconciliation service that works with OpenRefine.30 This service allows the user to specify values in the existing data – period labels, dates, spatial coverage, etc. – that can be used to identify best matches from the PeriodO dataset. Those matches are provided through the reconciliation service, and the user can add them to a new column in the data table. 
The process of matching concepts or terms in one dataset to those in another is generally referred to as “alignment”, and it is important for the representation of a given dataset as “five-star” Linked Open Data.
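For a small spreadsheet, alignment can be approximated without a reconciliation service. The sketch below, in standard-library Python, fills a period_uri column from a hand-made lookup table. The lookup table, the normalise helper, and the choice to read the label from the notes column are assumptions made for illustration; the URI is the PeriodO identifier used for “Late Antique” in this chapter’s examples.

```python
import csv
import io

# Hypothetical lookup: period label as it appears in the spreadsheet -> URI
# chosen from an external authority.
PERIOD_URIS = {
    "late antique": "http://n2t.net/ark:/99152/p08m57hmdw6",
}


def normalise(label):
    """Lower-case a label and drop a trailing '?' that marks uncertainty."""
    return label.strip().rstrip("?").strip().lower()


def align(rows, label_column="notes", uri_column="period_uri"):
    """Return copies of the rows with a URI added wherever the label is known."""
    aligned = []
    for row in rows:
        row = dict(row)
        key = normalise(row.get(label_column) or "")
        row[uri_column] = PERIOD_URIS.get(key, "")
        aligned.append(row)
    return aligned


SAMPLE = """papyrus_number,subject,location,notes
p. Example 32,literary fragment,Tebtunis,
p. Example 45,letter,Oxyrhynchus,Late Antique?
"""

rows = align(csv.DictReader(io.StringIO(SAMPLE)))
for row in rows:
    print(row["papyrus_number"], "->", row["period_uri"] or "(no period URI)")
```

Note that stripping the question mark discards the cataloguer’s expression of doubt; a fuller treatment would carry that uncertainty into a separate column rather than lose it.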
Relative Time: Semantic Representation
The stars in “five-star Linked Open Data” refer to a ranking system originally proposed by Sir Tim Berners-Lee, who has suggested that data on the semantic web could be assessed according to its degree of openness and interconnectivity (Berners-Lee 2010). In the same post, he provides a succinct explanation of the transformation of a dataset like our hypothetical papyrus spreadsheet into Linked Data: “The simplest way to make linked data is to use, in one file, a URI which points into another.” Aligning the period terms in our spreadsheet to URIs for those terms in the Getty AAT, chronOntology, or PeriodO makes our temporal assertions more transparent and gives us an authoritative source for a date-range expressed in ISO8601-formatted dates. But when the data in that spreadsheet are mapped to a semantic-web datamodel like that of the Resource Description Framework (RDF) (Heath and Bizer 2011) and expressed in that format or another serialization such as JSON-LD,31 the use of a URI from an external resource in connection with a value in the dataset makes it possible for that dataset to be connected to the broader web of Linked Data. To walk through Berners-Lee’s five stars with our spreadsheet, we can acquire the first star simply by publishing a screenshot of the spreadsheet to the web under an open license; by making the original Excel spreadsheet itself available under an open license, we’ve already expressed the data in machine-readable structured form, which gets us a second star; and if we make that spreadsheet available in a non-proprietary format like CSV, we reach three stars. The fourth star is awarded for the representation of the data as RDF or its equivalents, where the RDF datamodel and references to external, open, shared metadata schemata and ontologies explain what the data in the dataset mean. The final star is reached when the data themselves are connected through URI citations to other Linked Open Data datasets, creating the web of Linked Open Data.
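As a rough illustration of the step from the third star to the fourth and fifth, the sketch below turns one spreadsheet row into a small JSON-LD document whose dcterms:temporal value is a URI in another dataset. The base URI https://example.org/papyri/ and the function name row_to_jsonld are hypothetical, and a real project would choose its own context and properties.

```python
import json
from urllib.parse import quote


def row_to_jsonld(row, base="https://example.org/papyri/"):
    """Express one spreadsheet row as a minimal JSON-LD document."""
    doc = {
        # Bind the dcterms prefix to the Dublin Core terms namespace.
        "@context": {"dcterms": "http://purl.org/dc/terms/"},
        # Hypothetical identifier minted from the papyrus number.
        "@id": base + quote(row["papyrus_number"]),
        "dcterms:title": row["papyrus_number"],
    }
    if row.get("period_uri"):
        # The link into another dataset: a URI rather than a text label.
        doc["dcterms:temporal"] = {"@id": row["period_uri"]}
    return doc


doc = row_to_jsonld({
    "papyrus_number": "p. Example 45",
    "period_uri": "http://n2t.net/ark:/99152/p08m57hmdw6",
})
print(json.dumps(doc, indent=2))
```

The only difference between this and a plain JSON export of the row is that the period is given as an identifier that a machine can follow, which is the substance of Berners-Lee’s “URI which points into another”.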
The inclusion of a period URI from a Linked Data source like the Getty AAT, chronOntology, or PeriodO in a periodized dataset thus prepares that dataset for connection to a wider world of periodized information. As more datasets make those connections, it will become easier to discover information from the same period or date-range across multiple heterogeneous datasets. The proof of this concept has been provided by the Pelagios project, which has used references to spatial gazetteers shared across various datasets to link, aggregate, and visualize spatial data in those datasets (Isaksen et al. 2014; Simon, Barker, and Isaksen 2012).32 Several ancient-world data standards now include specifications for the inclusion of periods as named entities in a serialized dataset. These specifications can be taken as models for the semantic representation of periods as Linked Data. I will describe three of them here: EpiDoc TEI-XML, the Pelagios Gazetteer Interchange Format, and the Linked Pasts “Linked Places Format.”
The EpiDoc standard has already been cited for the expression of absolute dates. As authorities for periods as named entities have begun to appear, EpiDoc has been expanded to include this datatype. A period can be attached to an EpiDoc record according to the following basic format, using the @period attribute and an external URI:
<origDate notBefore="0501" notAfter="0700" period="http://n2t.net/ark:/99152/p08m57hmdw6">Late Antique</origDate>
In the origDate element, EpiDoc permits the addition of notations that specify the precision of the temporal boundaries with the @precision attribute (e.g. precision="medium") and the type of evidence on which the dating is based with the @evidence attribute (e.g. evidence="lettering"). The specification also states that the period referenced “need not correspond to the numerical date range offered”, so that a given inscription can be associated with both its own absolute dates and a broader cultural or stylistic period.33
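An element of this shape can be generated from tabular data with a few lines of standard-library Python. This is a sketch only: the function name orig_date is an illustration, the TEI namespace that a full EpiDoc file would declare is omitted, and the attribute values are passed through without validation against the EpiDoc schema.

```python
import xml.etree.ElementTree as ET


def iso_year(year):
    """Format an integer year as a four-digit string with an optional sign."""
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}"


def orig_date(text, not_before, not_after, period=None, precision=None, evidence=None):
    """Serialize an origDate element; optional attributes are added only if given."""
    attrs = {"notBefore": iso_year(not_before), "notAfter": iso_year(not_after)}
    for name, value in (("period", period), ("precision", precision), ("evidence", evidence)):
        if value is not None:
            attrs[name] = value
    element = ET.Element("origDate", attrs)
    element.text = text
    return ET.tostring(element, encoding="unicode")


xml = orig_date(
    "Late Antique",
    501,
    700,
    period="http://n2t.net/ark:/99152/p08m57hmdw6",
    evidence="lettering",
)
print(xml)
```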
The Pelagios Gazetteer Interchange Format (PGIF) was the format developed to connect places between different gazetteers in order to aggregate spatial data from the ancient world and to facilitate the creation of new Linked Data through semantic annotation platforms like Recogito.34 Because both the names and locations of places are time-dependent, this standard included ways to associate places with timespans. Gazetteers published to Pelagios had to be represented as single RDF files. Within those files, references to calendar dates and periods were formatted as follows (example from the TTL/N3 representation of the Epigraphic Database Heidelberg):
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix pelagios: <http://pelagios.github.io/vocab/terms#> .
@prefix relations: <http://pelagios.github.io/vocab/relations#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<https://edh-www.adw.uni-heidelberg.de/edh.inscriptions.n3#HD057724> a pelagios:AnnotatedThing ;
    dcterms:title "identification inscription of Sparta (modern Spárti, findspot: Theater) HD057724" ;
    dcterms:identifier <https://edh-www.adw.uni-heidelberg.de/edh/inschrift/HD057724> ;
    foaf:homepage <https://edh-www.adw.uni-heidelberg.de/edh/inschrift/HD057724> ;
    dcterms:temporal "201/400" ;
    dcterms:temporal <http://n2t.net/ark:/99152/p0jrrjb67vs> ;
    dcterms:language "la" .
The @prefix values represent the prefix-namespace bindings that provide the metadata elements used in the representation, so @prefix dcterms: <http://purl.org/dc/terms/> points to the Dublin Core namespace, which identifies the meaning of the temporal attribute. That attribute is used here both for an absolute date range and for a reference to an external URI: in this case, a URI for a representation of the Epigraphic Database Heidelberg’s “Late Antiquity” period in the PeriodO dataset. Here the external identifier corresponds directly to the period vocabulary used internally, but unlike a simple textual label, it provides a stable reference and structured-data representation that makes it easier to align those temporal expressions with other datasets. Note that the date-range for the inscription is not identical to the date-range for the period: as in EpiDoc, an entity can have its own specific dates but also be described with a more general period reference.
PGIF has now officially been superseded by the Linked Places Format developed by the Linked Pasts working group of Pelagios Commons, in close collaboration with the World Historical Gazetteer (the conceptual foundation for this format is described in Grossner, Janowicz, and Kessler 2016).35 Like the PGIF, the Linked Places standard focuses on interoperability between spatial datasets, but it makes time a much more central element; it relies, for example, on GeoJSON-T, a temporal extension to GeoJSON, which is a JSON standard used to describe space. The Linked Places standard, expressed as JSON-LD, includes elements that describe the general properties of a temporal interval – @start, @end, @earliest, @latest, @duration – as well as @periods, an element that can be used to describe one or more named intervals associated with a place. The RDF ontology that underlies this standard makes extensive use of the OWL-Time ontology, which makes it possible to bound named intervals with other named intervals and to apply Allen’s temporal logic to chronological relationships.36 The @earliest and @latest elements can be combined with the @start and @end elements, so that an interval may have an earliest and latest start and/or an earliest and latest end, as in the four-part dates of the PeriodO and chronOntology datamodels. This structure is due in part to the interest of this ontology in events, which, like periods, are named entities with durations, but offer a finer degree of precision for temporal reasoning (Shaw, Troncy, and Hardman 2009). Temporal attributes in this ontology can be represented as follows, according to the JSON-LD example available in the Linked Pasts GitHub repository; here I adapt the temporal information for the inscription from Sparta in the Epigraphic Database Heidelberg used in the previous example.37
"when": {
"timespans": [
{
"start": {"in":"0210"},"end": {"in":"0400"}
}
],
"periods": [
{
"name": "Late Antiquity",
"@id": "periodo:p0jrrjb67vs"
}
],
"label": "Date range for creation of HD057724",
"duration": "P190Y"
},
This describes an object with temporal attributes including a start in 210 CE, an end in 400 CE, and a duration of 190 years, as well as an association with the PeriodO URI http://n2t.net/ark:/99152/p0jrrjb67vs, which refers to the definition of “Late Antiquity” used by the Epigraphic Database Heidelberg.
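A block of this kind can be assembled programmatically, with the duration derived from the boundary years rather than typed by hand. The following Python sketch reproduces the keys shown in the example; the function name when_block is an illustration, and the simple subtraction used for the duration assumes that both years are given in the same year-numbering scheme.

```python
import json


def iso_year(year):
    """Format an integer year as a four-digit string with an optional sign."""
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}"


def when_block(start, end, period_name, period_id, label):
    """Build a "when" object with one timespan and one named period."""
    if end < start:
        raise ValueError("end year precedes start year")
    return {
        "timespans": [
            {"start": {"in": iso_year(start)}, "end": {"in": iso_year(end)}}
        ],
        "periods": [{"name": period_name, "@id": period_id}],
        "label": label,
        # Duration in whole years, as an ISO8601 duration string.
        "duration": f"P{end - start}Y",
    }


when = when_block(
    210,
    400,
    "Late Antiquity",
    "periodo:p0jrrjb67vs",
    "Date range for creation of HD057724",
)
print(json.dumps({"when": when}, indent=2))
```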
The World Historical Gazetteer uses the Linked Pasts standard to describe the spatio-temporal, place-centered datasets that it integrates. It also offers a reconciliation service that allows users with relatively simple gazetteer data in the LP-TSV format to upload their datasets, validate the format of the dataset against the LP-TSV standard, and reconcile placenames with the Getty Thesaurus of Geographic Names and Wikidata, at which point the dataset can be accessioned by the World Historical Gazetteer for online publication.38 The LP-TSV format requires a start date in the ISO8601 format (with no zero year) for each place entry, and it allows an optional end date in the same format. This is a concrete demonstration of the value of date normalization and an easy way for scholars who wish to publish place-based datasets to transform their spreadsheets into Linked Data. To represent more temporally complex datasets, however, and especially datasets that contain both absolute dates and period terms, it is still necessary to use the full Linked Pasts JSON-LD specification. Files in this format, with the extension “lpf”, can also be uploaded, reconciled, and submitted through the data portal for incorporation in the World Historical Gazetteer. Perhaps even more exciting is the development of a “Linked Traces” format that will use the W3C Web Annotation model to connect objects, people, events, concepts, and other phenomena that move across time and space.39
Relative Time: Making New Periods
At this point, we have walked through the standardization and Linked Data representation of absolute calendar dates; we have discussed the alignment of relative dates expressed as named entities (periods) with Linked Data vocabularies, and the addition of URIs to a hypothetical dataset; and we have considered the serialized Linked Data representation of both absolute and relative temporal attributes in TEI-XML (the EpiDoc standard), RDF/TTL (the Epigraphic Database Heidelberg), and JSON-LD (the Linked Places standard). These steps toward the Linked Open Data publication of a dataset move along a scale of increasing complexity, at the most complex end of which is ontological mapping and the creation of new Linked Data vocabularies and schemata to represent the idiosyncrasies of a particular dataset. For some projects, it will be sufficient to format absolute time according to ISO8601 standards and map named temporal expressions to an external vocabulary like the Getty AAT, chronOntology, or PeriodO. Other projects will have the resources to express their own defined period vocabulary as fully-formed Linked Data entities with their own URIs. But in some cases, a project will have a period vocabulary that does not align easily with any of the existing sources for period identifiers, but will lack the resources to create its own URIs and Linked Data representation for that vocabulary. These cases are a particular concern of the PeriodO project, which aims to become a repository for user-contributed period definitions as well as a curated dataset. The project therefore incorporates a specific workflow for the addition of data by the user community.
The PeriodO datamodel is built around two basic datatypes: statements that identify temporal and spatial boundaries for a period term according to a particular authoritative source (these were called “definitions” in the first version of the dataset; now they are labeled “periods”); and sets of definitions expressed by the same authoritative source (“collections” in the first version, now “authorities”). Users may contribute new periods to existing authorities (for example, if an ongoing digital project adds new period terms to its vocabulary), or they may contribute new authorities (a new set of period definitions from an authoritative source not already represented in the dataset, preferably differing from definitions already in the dataset in time, space, or name). The PeriodO data interface has recently been redesigned to incorporate better spatial and temporal visualizations, improve searching and browsing, and facilitate the creation of new authorities and periods by users.40 Instructions for the addition of new authorities and periods are available on the PeriodO project website41 and have been discussed in several publications (Rabinowitz et al. 2016; Rabinowitz, Shaw, and Golden 2018), but the following section will provide a brief overview of the process.
To create new authorities and definitions, the user must first add a new editable data source in the data-source selection page, by default the splash page of the new browser client.42 To do so, the user selects “In-browser (editable)” in the “Add data source” section of the page. This will create a new dataset stored as a local Indexed Database in the browser cache. In some browsers, the user will be prompted to allow the client to use persistent storage, but in general, if the browser cache is cleared, the dataset will disappear, so it is advisable to navigate to the “Settings” page in the “Data source” menu to create a backup copy, in the form of a downloadable JSON file. When this new in-browser dataset is selected as a data source, it is blank by default; existing authorities can be added from the menu item “Import changes”, which will copy selected authorities or periods from another data source (e.g., the “Canonical” data source). The import of the relevant information from the Canonical dataset is necessary if the user intends to make additions or changes to authorities or periods that have already been published in PeriodO, or to define new periods that are derived from existing definitions.
To add new periods that are not associated with an existing authority in the Canonical dataset, the user must first create the authority in which those periods are defined. This can be done with the “Add authority” command in the “Data source” menu. An authority will usually be either a published source (a book, article, etc.) or an online project with some claim to expertise in a particular chronological context. If the source is a publication with a citation represented as Linked Data through WorldCat or CrossRef, entering a URL or DOI in the “Most published sources” tab will populate citation data automatically; otherwise, the user can manually enter citation data by clicking on the “Other sources” tab. Publication information populated automatically in the first tab can also be edited or expanded in the second.
Once the authority is created, the user can add period definitions by selecting the “Add period” item in the “Authority” menu. These definitions are added through a form that requires certain fields: an original label (in any language and script), a natural-language expression of spatial coverage as stated by the authority, and start and stop dates. The original label, spatial coverage description, and start/stop labels should all be expressed exactly as they appear in the source, in language, script, and wording, without any translation, editing, or parsing by the user. This is of central importance to the PeriodO model, which seeks to record definitions in their original form. The only required parsing is the expression of the start and stop dates as Gregorian years in ISO8601 format, in the “Start year” and “Stop year” fields. If one or both of these dates is a single year in a standard format (e.g. “200 B.C.” or “AD 10” or “250 CE”), the interface will parse them automatically. If the dates are in non-standard formats or involve ranges, the user can deselect “Parse dates automatically” and carry out conversions to ISO8601 manually (note that PeriodO follows the ISO8601 convention that includes a year zero, so every BCE year is offset by one: e.g. “200 BC” becomes “-0199”). In manual mode, checking the “Year range (not a single year)” box will also make it possible to express start, stop, or both in terms of two-part earliest/latest dates, so that a period that begins in the “first quarter of the third century” and ends in “AD 550/600” can be expressed as a four-part date range with earliest start: 0200, latest start: 0225, earliest stop: 0550, latest stop: 0600.
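The year-zero offset is easy to get wrong by hand, so it may help to see the conversion written out. The Python sketch below is not the PeriodO client’s own parser; it is a small stand-in that handles only single years with an optional era label, and the names parse_year, iso_year, and FourPartRange are illustrative.

```python
from dataclasses import dataclass

ERAS_BEFORE = {"BC", "BCE"}
ERAS_AFTER = {"AD", "CE"}


def parse_year(label):
    """Convert a label such as '200 B.C.' or 'AD 10' to a year with a year zero.

    A bare number with no era label is treated as CE (an assumption).
    """
    tokens = label.replace(".", "").upper().split()
    digits = [t for t in tokens if t.isdigit()]
    eras = [t for t in tokens if t in ERAS_BEFORE | ERAS_AFTER]
    if len(digits) != 1 or len(eras) > 1 or len(digits) + len(eras) != len(tokens):
        raise ValueError(f"cannot parse a single year from {label!r}")
    year = int(digits[0])
    if year == 0:
        raise ValueError("BC/AD reckoning has no year 0")
    if eras and eras[0] in ERAS_BEFORE:
        return 1 - year  # 1 BC becomes 0, 200 BC becomes -199
    return year


def iso_year(year):
    """Format an integer year as a four-digit string with an optional sign."""
    sign = "-" if year < 0 else ""
    return f"{sign}{abs(year):04d}"


@dataclass(frozen=True)
class FourPartRange:
    """Period boundaries that are themselves intervals."""
    earliest_start: int
    latest_start: int
    earliest_stop: int
    latest_stop: int

    def __post_init__(self):
        ordered = (
            self.earliest_start <= self.latest_start
            and self.earliest_stop <= self.latest_stop
            and self.earliest_start <= self.latest_stop
        )
        if not ordered:
            raise ValueError("period boundaries are out of order")

    def as_iso(self):
        years = (self.earliest_start, self.latest_start,
                 self.earliest_stop, self.latest_stop)
        return tuple(iso_year(y) for y in years)


print(iso_year(parse_year("200 BC")))           # -0199
print(FourPartRange(200, 225, 550, 600).as_iso())
```

Labels that contain a range, such as “AD 550/600”, deliberately raise an error here, mirroring the point at which the interface asks the user to switch to manual entry.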
The user may also provide additional optional parsing or information, including translations of the period label into other languages, mapping of the description of spatial coverage to named geographic entities represented in Wikidata and other preselected gazetteers, and locator information (the page on which the definition appeared in a print source, the URL of an online source). The current interface also makes it possible to provide additional semantic information about related periods: a period can be identified as a part of a broader period (e.g. “Late Bronze Age” is part of “Bronze Age”), or as a derivative of a period in another authority, to help chart chains of scholarly influence. Additional fields are available to record explanations of the definition offered by the source (“Note”) and any explanations related to the user’s creation of the record (“Editorial Note”). Once a definition is saved, it appears in the local data source just as it would in the Canonical dataset.
Currently, local data sources can be downloaded as JSON files through the “Back up data source” button on the data-source “Settings” page. These files can be loaded back into the client using the “Restore from backup” button on the data-source selection page; this allows a user not only to back up a local dataset to prevent data loss, but also to share it with other users, who can load the backup file as an editable local data source in their own browsers using the “Restore from backup” button. But user-generated authorities and periods are identified in local data sources with temporary identifiers, not URIs. Persistent URIs are only assigned when those authorities or periods are published in the curated Canonical dataset. For inclusion in that dataset, new data must be submitted by users with ORCIDs, in the form of patches. This process maintains a changelog and a provenance history. Patch submission takes place through the “Submit changes” item in the “Data source” menu. After choosing the data source to which the changes should be submitted (to add to the public PeriodO dataset, the source should be “Canonical”), the user is prompted to select which new or modified authorities or periods should be included in the patch. Submitted patches are reviewed and accepted or rejected by the PeriodO editors. When a patch is accepted, new authorities and/or periods are added to the Canonical dataset and receive URIs in the form of ARK IDs. These IDs include the PeriodO “shoulder”, minted and guaranteed by the EZID system of the California Digital Library, followed by a five-character alphanumeric suffix that identifies the authority and a further four-character alphanumeric suffix that identifies the specific period, as follows:
http://n2t.net/ark:/99152/p0
http://n2t.net/ark:/99152/p0jrrjb
http://n2t.net/ark:/99152/p0jrrjb67vs
Here the ARK ID provides a unique identifier for the PeriodO dataset as a whole; the first suffix identifies the Epigraphic Database Heidelberg authority; and the second suffix identifies the “Late Antiquity” period. The HTTP URIs resolve to the individual authorities and periods using the suffix pass-through protocol of the ARK ID system (Kunze and Rodgers 2013).
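The structure of these identifiers can be made concrete with a short Python sketch that splits a PeriodO URI into its parts. It uses the URI for the Epigraphic Database Heidelberg’s “Late Antiquity” period cited earlier in this chapter. The two-character shoulder “p0” is inferred from the URIs quoted in the chapter and from the suffix lengths described above, and the function name split_periodo_id is hypothetical.

```python
ARK_MARKER = "ark:/99152/"  # name-assigning authority number seen in the chapter's URIs
SHOULDER = "p0"             # inferred PeriodO shoulder (assumption, see lead-in)


def split_periodo_id(uri):
    """Split a PeriodO ARK into shoulder, authority suffix and period suffix."""
    if ARK_MARKER not in uri:
        raise ValueError("not a PeriodO ARK")
    ark = uri[uri.index(ARK_MARKER) + len(ARK_MARKER):]
    # Valid lengths: dataset (2), authority (2 + 5), period (2 + 5 + 4).
    if not ark.startswith(SHOULDER) or len(ark) not in (2, 7, 11):
        raise ValueError(f"unexpected identifier shape: {ark!r}")
    return {
        "shoulder": SHOULDER,
        "authority": ark[2:7] or None,
        "period": ark[7:11] or None,
    }


parts = split_periodo_id("http://n2t.net/ark:/99152/p0jrrjb67vs")
print(parts)
```

Because the authority suffix is embedded in every period identifier, the authority to which a period belongs can be recovered from the period URI alone.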
There are several benefits to adding new periods to the PeriodO dataset. Simply creating authorities and periods using a local in-browser data source provides a structured Linked Data representation of that information that can be reused in a larger Linked Data dataset. Submitting a patch with new authorities and periods to the main dataset furnishes those authorities and periods with persistent URIs, which further enhances the expression of a dataset as Linked Data. And contributing to the PeriodO gazetteer increases the general scholarly understanding of the ways different fields and traditions deploy periodization to organize information. In the future, it may also make a user’s dataset easier to aggregate with others and more discoverable in chronological searches.
Conclusion: Why Take the Time?
All of the steps described in this recipe for the expression of temporal information as Linked Data are difficult and time-consuming. Even the conversion of non-Gregorian dates to the ISO8601 standard is burdensome, especially when the question of a year zero comes up. Meanwhile, the aggregated time-based search and visualization tools that would demonstrate the usefulness of Linked Data representations of time are still in the prototype stage, and integration work like that done by the Pelagios project with data from ancient-world spatial gazetteers has not yet been carried out. While there are some examples of visualizations that take advantage of periodized data,43 and while there are a number of current initiatives focused on developing the standards and platforms that will facilitate future aggregation and visualization of chronological information,44 there is still no temporal equivalent to the Peripleo spatial search engine. So why go to the trouble of adapting the dates and periods in your spreadsheet or relational database table to Linked Data datamodels, or contributing your period definitions to a gazetteer like PeriodO?
The representation of temporal information as structured data offers short-, medium-, and long-term dividends, and these dividends are multiplied in a dataset represented as Linked Data. In the short term, clear indications of calendrical system, the standardization of dates and date formats, and the explanation of period terms with reference to internal or external vocabularies simply make it easier for others to understand and reuse a given dataset. As data publication becomes more prevalent, it is worth thinking about that hypothetical spreadsheet of papyri not just in terms of the research carried out by its creator, but in terms of its benefit to a larger community. The more the creator can be transparent about the meanings of the datatypes a dataset contains, the more useful it will be to a broader audience. This is especially true when the creator of a dataset wishes to develop alternative or non-canonical vocabularies. Because of its focus on semantic information, the Linked Data ecosystem is inherently multivocal: if I have a simple way to explain exactly what I mean by a period term, I can be understood without having to use a conventional term from a defined vocabulary that may not fit my use case well.
In the medium term, the semantic representation of period terms and the association of those terms with URIs in other datasets will make it easier to search across heterogeneous datasets by time. We can now use tools like the Peripleo browser to find information associated with a given ancient place across a variety of databases, even if they do not all call that place by the same name. If a similar Linked Data approach is adopted for periodized data, we will soon be able to search across datasets for items related to the Late Antique period, even when definitions of that period vary from database to database – or, conversely, we will be able to search for items dated to the range 400-500 CE and find records like the papyrus in our hypothetical spreadsheet marked only as “Late Antique”. Thus, not only will this approach to periodization make an individual dataset more transparent, but it will also make the records in that dataset more discoverable and easier to align and aggregate with other datasets. As an added benefit, it will also facilitate the digital visualization of the temporal distribution of items in a dataset: a URI that associates a period label like “Late Antique” with dates will allow records with that label to appear on a digital timeline.
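The kind of range search described here can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions, not an existing tool: the class and function names are hypothetical, the period URIs are placeholders on example.org, and the start and stop years are invented for the demonstration rather than taken from any authority. The point is only that once a period label resolves to a URI with dates attached, a record that carries nothing but that label can be found by a numeric query.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PeriodDefinition:
    uri: str
    label: str
    start: int  # ISO 8601 year (1 BCE = 0, 2 BCE = -1, and so on)
    stop: int


@dataclass(frozen=True)
class Record:
    identifier: str
    period_uri: str  # the record has a period link but no explicit dates


def records_in_range(records: Iterable[Record],
                     periods: Iterable[PeriodDefinition],
                     query_start: int,
                     query_stop: int) -> List[Record]:
    """Return records whose linked period overlaps the query range."""
    by_uri = {p.uri: p for p in periods}
    hits = []
    for record in records:
        period = by_uri.get(record.period_uri)
        if period is None:
            continue  # no definition available for this period URI
        # Two closed intervals overlap if each starts before the other ends.
        if period.start <= query_stop and query_start <= period.stop:
            hits.append(record)
    return hits


# Placeholder definitions: URIs and dates are illustrative assumptions only.
PERIODS = [
    PeriodDefinition("http://example.org/period/late-antique",
                     "Late Antique", 300, 700),
    PeriodDefinition("http://example.org/period/ptolemaic",
                     "Ptolemaic", -304, -29),
]
RECORDS = [
    Record("papyrus-1", "http://example.org/period/late-antique"),
    Record("papyrus-2", "http://example.org/period/ptolemaic"),
]

hits = records_in_range(RECORDS, PERIODS, 400, 500)
print([r.identifier for r in hits])  # ['papyrus-1']
```

The same lookup would work unchanged if two datasets attached different dates to their own "Late Antique" URIs, since each record is tested against the definition it actually cites.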
In the long term, the structured representation of periods in datasets related to the past will help us to understand our own scholarly disciplines. Transparent references to the meaning of period terms will highlight the way that usage changes over time, as periodizations are revised or new periodizations are proposed. They will also highlight differences of opinion across scholarship and culture. These genealogical and discursive trends are a particular focus of PeriodO, which tracks derivative relationships across definitions in different authorities, and which is developing visualization tools to compare the spatiotemporal extents assigned to the same term by different scholars at different times. Even more interesting is the possibility that structured temporal information expressed as Linked Open Data will provide training data for natural-language-processing approaches to text corpora. In this scenario, the transparent period definitions and date expressions in Linked Data datasets will be used to help machine-learning algorithms recognize and parse unstructured, opaque temporal references in texts. The first steps in this direction have already been taken (de Boer, van Someren, and Wielinga 2010; Mouroutsou, Markantonatou, and Papavasiliou 2014; Binding, Tudhope, and Vlachidis 2018), and as the quantity of both structured period definitions and structured periodized data grows, this approach will become even more robust. But the potential of linked temporal data to transform future research will only be realized if the historical disciplines take the time in the present to explain and align the ways we talk about time in the past – and share their data openly.
Bibliography
Allen, James F. 1984. “Towards a General Theory of Action and Time.” Artificial Intelligence 23 (2): 123–54. https://doi.org/10.1016/0004-3702(84)90008-0.
Berners-Lee, Tim. 2010. “Linked Data - Design Issues.” W3C. 2010. https://www.w3.org/DesignIssues/LinkedData.html.
Binding, Ceri, Douglas Tudhope, and Andreas Vlachidis. 2018. “A Study of Semantic Integration across Archaeological Data and Reports in Different Languages.” Journal of Information Science. https://doi.org/10.1177/0165551518789874.
Boer, Viktor de, Maarten van Someren, and Bob J. Wielinga. 2010. “Extracting Historical Time Periods from the Web.” Journal of the American Society for Information Science and Technology 61 (9): 1888–1908. https://doi.org/10.1002/asi.21378.
Cayless, Hugh, Charlotte Roueché, Tom Elliott, and Gabriel Bodard. 2009. “Epigraphy in 2017.” Digital Humanities Quarterly 3 (1). http://www.digitalhumanities.org/dhq/vol/3/1/000030/000030.html.
Cox, S. J. D., and S. M. Richard. 2015. “A Geologic Timescale Ontology and Service.” Earth Science Informatics 8 (1): 5–19. https://doi.org/10.1007/s12145-014-0170-6.
Dee, Stella, Maryam Foradi, and Filip Šarić. 2016. “Learning by Doing: Learning to Implement the TEI Guidelines through Digital Classics Publication.” In Digital Classics Outside the Echo-Chamber: Teaching, Knowledge Exchange & Public Engagement, edited by Gabriel Bodard and Matteo Romanello, 15–32. London: Ubiquity Press. http://dx.doi.org/10.5334/bat.b.
Deville, Pierre, Catherine Linard, Samuel Martin, Marius Gilbert, Forrest R. Stevens, Andrea E. Gaughan, Vincent D. Blondel, and Andrew J. Tatem. 2014. “Dynamic Population Mapping Using Mobile Phone Data.” Proceedings of the National Academy of Sciences 111 (45): 15888–93. https://doi.org/10.1073/pnas.1408439111.
Doerr, Martin, Athina Kritsotaki, and Stephen Stead. 2010. “Which Period Is It? A Methodology to Create Thesauri of Historical Periods.” In Beyond the Artifact. Digital Interpretation of the Past. Proceedings of CAA2004, Prato 13-17 April 2004, edited by Franco Niccolucci and Sorin Hermon, 70–75. Budapest: Archaeolingua. http://proceedings.caaconference.org/files/2004/10_Doerr_et_al_CAA_2004.pdf.
Grafton, Anthony T. 1975. “Joseph Scaliger and Historical Chronology: The Rise and Fall of a Discipline.” History and Theory 14 (2): 156–85. https://doi.org/10.2307/2504611.
Grossner, Karl, Krzysztof Janowicz, and Carsten Kessler. 2016. “Place, Period, and Setting for Linked Data Gazetteers.” In Placing Names: Enriching and Integrating Gazetteers, edited by Merrick Lex Berman, Ruth Mostern, and Humphrey Southall, 80–96. Bloomington: Indiana University Press.
Heath, Tom, and Christian Bizer. 2011. Linked Data: Evolving the Web into a Global Data Space. 1st ed. Vol. 1. Morgan & Claypool. http://www.morganclaypool.com/doi/abs/10.2200/S00334ED1V01Y201102WBE001.
Isaksen, Leif, Rainer Simon, Elton T. E. Barker, and Pau de Soto Cañamares. 2014. “Pelagios and the Emerging Graph of Ancient World Data.” In Web Sci ’14. Proceedings of the 2014 ACM Conference on Web Science, 197–201. ACM. http://dl.acm.org/citation.cfm?id=2615569.2615693.
Kunze, John, and R. Rodgers. 2013. “The ARK Identifier Scheme.” https://tools.ietf.org/html/draft-kunze-ark-15.
Mouroutsou, Maria, Stella Markantonatou, and Vasilis Papavasiliou. 2014. “The Development of Vocabularies of Historical Period Names from Web Acquired Corpora.” Mediterranean Archaeology & Archaeometry 14 (4): 165–74.
Netser, Michael. 1998. “Population Growth and Decline in the Northern Part of Eretz-Israel during the Historical Period as Related to Climatic Changes.” In Water, Environment and Society in Times of Climatic Change: Contributions from an International Workshop within the Framework of International Hydrological Program (IHP) UNESCO, Held at Ben-Gurion University, Sede Boker, Israel from 7–12 July 1996, edited by Arie S. Issar and N. Brown, 129–45. Kluwer.
Niccolucci, Franco, and Sorin Hermon. 2015a. “Time, Chronology and Classification.” Mathematics and Archaeology, 257–70.
———. 2015b. “Representing Gazetteers and Period Thesauri in Four-Dimensional Space–Time.” International Journal on Digital Libraries, July, 1–7. https://doi.org/10.1007/s00799-015-0159-x.
Rabinowitz, Adam, Ryan Shaw, Sarah Buchanan, Patrick Golden, and Eric Kansa. 2016. “Making Sense of the Ways We Make Sense of the Past: The PeriodO Project.” Bulletin of the Institute of Classical Studies 59 (2): 42–55. https://doi.org/10.1111/j.2041-5370.2016.12037.x.
Rabinowitz, Adam, Ryan Shaw, and Patrick Golden. 2019. “Making up for Lost Time: Digital Epigraphy, Chronology, and the PeriodO Project.” In Crossing Experiences in Digital Epigraphy. From Practice to Discipline, edited by Annamaria De Santis and Irene Rossi, 200–213. Warsaw and Berlin: De Gruyter.
Rasmussen, S. O., K. K. Andersen, A. M. Svensson, J. P. Steffensen, B. M. Vinther, H. B. Clausen, M.-L. Siggaard Andersen, et al. 2006. “A New Greenland Ice Core Chronology for the Last Glacial Termination.” Journal of Geophysical Research: Atmospheres 111 (D6). https://doi.org/10.1029/2005JD006079.
Rees, G., and A. Keinan-Schoonbaert. 2019. “Introducing the WebMaps-T Working Group.” Medium. June 26. https://medium.com/pelagios/introducing-the-webmaps-t-working-group-7cff98021e42.
Shaw, Ryan, Raphaël Troncy, and Lynda Hardman. 2009. “LODE: Linking Open Descriptions of Events.” In The Semantic Web, edited by Asunción Gómez-Pérez, Yong Yu, and Ying Ding, 153–67. Lecture Notes in Computer Science. Springer Berlin Heidelberg.
Simon, Rainer, Elton Barker, and Leif Isaksen. 2012. “Exploring Pelagios: A Visual Browser for Geo-Tagged Datasets.” Workshop presented at the International Workshop on Supporting User’s Exploration of Digital Libraries. http://eprints.soton.ac.uk/343484.
Notes
1 http://www.getty.edu/research/tools/vocabularies/aat; an example of a Uniform Resource Identifier (URI) for a calendrical system is http://vocab.getty.edu/page/aat/300404479, which refers to the Maya calendar as a conceptual entity. Mapping concepts in a dataset to external ontologies and vocabularies is the first step toward the preparation of a Linked Data representation, as I will explain below, but it is also useful simply for transparency.
2 For example, the periods specified by the ARIADNE data platform use BP with a “present” of 2000 (http://n2t.net/ark:/99152/p0qhb66), as do some paleoclimate studies (e.g. Netser 1998, p. 135). The Centre for Ice and Climate at the University of Copenhagen has recently proposed the notation “b2k” to refer to BP dates counting back from 2000, with respect to a new ice-core chronology (Rasmussen et al. 2006).
3 https://godot.date/about. On a more expansive level, with significant implications for the aggregation of data with absolute dates expressed in different calendrical systems, the Austrian Centre for Digital Humanities and Cultural Heritage and DARIAH-EU have recently implemented "Linked Open Date Entities", a vocabulary with URIs for every absolute date between the third millennium BCE and the third millennium CE, from a given millennium to a given day-month-year (https://vocabs.acdh.oeaw.ac.at/date_entities/en/).
4 For example, the multi-system converter at https://www.fourmilab.ch/documents/calendar, which, when given a date in any one of a number of systems, will express it in all of the calendars on the page.
5 The British Julian date of February 28, 1750 would thus correspond to the Gregorian March 11, 1751, since for legal purposes the British new year fell on March 25th of the Julian calendar before the Gregorian reform.
6 https://www.iso.org/standard/40874.html. A new revision in two parts (ISO/FDIS 8601-1 and 8601-2:2019, basic rules and extensions) has recently been published, and is available for purchase at https://www.iso.org/standard/70907.html and https://www.iso.org/standard/70908.html. English-language previews are available at https://www.iso.org/obp/ui#iso:std:iso:8601:-1:ed-1:v1:en and https://www.iso.org/obp/ui#iso:std:iso:8601:-2:ed-1:v1:en, and a 2016 draft version of the full standard can be found on the Library of Congress website at http://www.loc.gov/standards/datetime/iso-tc154-wg5_n0038_iso_wd_8601-1_2016-02-16.pdf.
7 Dates for Egyptian documents can be converted using the very useful service provided by the Date Converter for Ancient Egypt, http://aegyptologie.online-resourcen.de/home.
8 The work of the Hypermedia Research Group at the University of South Wales is especially relevant here: http://hypermedia.research.southwales.ac.uk.
9 EDTF: http://www.loc.gov/standards/datetime/edtf.html. Especially relevant for historical data is the addition of the characters “?”, “~”, and “%” after the four-digit date to indicate uncertainty, approximation, and uncertain approximation, respectively.
11 https://godot.date/tools/openrefine.
12 TEI Consortium: http://www.tei-c.org. EpiDoc: https://sourceforge.net/p/epidoc/wiki/Home and https://github.com/EpiDoc, with guidelines at http://www.stoa.org/epidoc/gl/latest.
13 https://www.w3.org/TR/xmlschema-2/#built-in-primitive-datatypes.
14 http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html.
15 For the application of this schema in the EpiDoc standard, see http://www.stoa.org/epidoc/gl/latest/supp-historigdate.html.
16 Dublin Core Metadata Initiative metadata terms: http://dublincore.org/documents/dcmi-terms; OWL-Time Ontology: https://www.w3.org/TR/owl-time/#time:TRS.
17 Main site at http://papyri.info; Github repository at https://github.com/papyri.
18 http://dublincore.org/documents/dcmi-period.
19 CIDOC-CRM specification: http://www.cidoc-crm.org; CRMarchaeo extension: http://new.cidoc-crm.org/crmarchaeo/fm_releases. A streamlined version of the CIDOC-CRM forms the core ontology of the Linked Art data model currently under development by the Getty Institute (https://linked.art/model/index.html); the website for this project provides a good example for the implementation of timespans in a particular model, and it also offers a useful explanation of the JSON-LD serialization it uses (https://linked.art/model/jsonld).
20 SKOS reference: https://www.w3.org/TR/2009/REC-skos-reference-20090818.
21 The Pleiades period thesaurus can be found at https://pleiades.stoa.org/vocabularies/time-periods.
22 A search for “period” in the Getty AAT will return identifiers for hundreds of periods organized by space/culture and arranged in hierarchical facets with multilingual and semantic representations.
23 http://vocab.getty.edu/page/aat/300019279.
24 http://chronontology.dainst.org.
25 http://chronontology.dainst.org/period/sTAli5CHCKzd; http://chronontology.dainst.org/period/KyfgoiHYmbZo; http://chronontology.dainst.org/period/b6lAiLW6tfW4.
26 https://perio.do; http://n2t.net/ark:/99152/p0.
27 “Late Antique” in the Getty AAT: http://vocab.getty.edu/page/aat/300020666; “Late Antique” in the Mediterranean derived from Peter Brown, The World of Late Antiquity (1989), in chronOntology: http://chronontology.dainst.org/period/KyfgoiHYmbZo; “Late Antique” in time and space, exactly as expressed by the British Museum’s defined period vocabulary, in PeriodO: http://n2t.net/ark:/99152/p08m57hmdw6.
28 For this reason, dates qualified with “circa” or “around” are rendered in the ISO8601 representation as a single Gregorian year, since the extent of uncertainty is rarely clear from the source – but a start-date expressed as “first half of the 5th century” can be represented with confidence as “earliest start: 401, latest start: 450”.
29 Work is ongoing on an extension to the PeriodO datamodel and platform that will allow users to create custom aggregations of similar periods; these aggregations would be defined by the earliest start, latest stop, and broadest spatial coverage of all the periods in a set, and given their own dereferenceable URIs – so that a user could refer to a “Late Antique” period definition derived from the aggregation of a dozen of the 23 results from this search.
30 https://github.com/periodo/periodo-reconciler.
32 See, for example, the Pelagios Linked Data search engine and visualization interface, Peripleo, at https://peripleo.pelagios.org.
33 http://www.stoa.org/epidoc/gl/latest/supp-historigdate.html.
34 Pelagios Gazetteer Interconnection Format: https://github.com/pelagios/pelagios-cookbook/wiki/Pelagios-Gazetteer-Interconnection-Format; Recogito annotation platform: https://recogito.pelagios.org.
35 See the 2017 whitepaper of the working group at http://linkedpasts.org/assets/lp_whitepaper.pdf. The Linked Places JSON-LD context and a simplified version, designated LD-TSV, are available in the Linked Places repository of the Linked Pasts Github site (https://github.com/LinkedPasts/linked-places) at https://raw.githubusercontent.com/LinkedPasts/linked-places/master/linkedplaces-context-v1.1.jsonld and https://github.com/LinkedPasts/linked-places/blob/master/tsv_0.2.md.
36 The most current draft (version 1.1) of the Linked Places ontology in RDF is available at http://linkedpasts.org/ontology/lpo_latest.ttl. See also the OWL Time Ontology at https://www.w3.org/TR/owl-time.
37 Sample expression: https://raw.githubusercontent.com/LinkedPasts/linked-places/master/linkedplaces-sample-v1.json. The JSON-LD context for data represented in this format is provided by the data standard cited in n. 35.
38 A reconciliation tutorial is available at http://dev.whgazetteer.org/tutorials/walkthrough.
39 Linked Traces format draft v.0.2: https://github.com/LinkedPasts/linked-traces-format; Web Annotation Model: https://www.w3.org/TR/annotation-model; Linked Traces and the World Historical Gazetteer: http://dev.whgazetteer.org/tutorials/traces.
40 The update to the user interface has no effect on the HTTP resolution of the URIs in the dataset as URLs. The permanent PeriodO URI, an Archival Resource Key (ARK) ID in the form http://n2t.net/ark:/99152/p0, will always point either to the most stable version of the browser client or, in the event that the client ceases to be supported, to an archival copy of the most current version of the dataset as a JSON-LD serialization.
43 See, for example, the temporal browser in the ARIADNE data integration platform for archaeology at http://portal.ariadne-infrastructure.eu or the TopoTime datamodel and visualization platform developed by Karl Grossner and Elijah Meeks at http://dh.stanford.edu/topotime/demo_py.html, with ongoing documentation at https://github.com/kgeographer/Topotime.
44 The Timeline Consortium (https://timelineconsortium.org) is working to develop interoperable standards and tools for timeline visualizations; Karl Grossner has been developing GeoJSON-T, a temporal extension to the GeoJSON standard (https://github.com/kgeographer/geojson-t); and the WebMaps-T Working Group of the Pelagios Network now seeks to create a new JavaScript library for an integrated map and timeline to display data represented as GeoJSON-T (Rees and Keinan-Schoonbaert 2019).