This article is available at the URI http://dlib.nyu.edu/awdl/isaw/isaw-papers/20-13/ as part of the NYU Library's Ancient World Digital Library in partnership with the Institute for the Study of the Ancient World (ISAW). More information about ISAW Papers is available on the ISAW website.

ISAW Papers 20.13 (2021)

Applied Use of JSON, GeoJSON, JSON-LD, SPARQL, and IPython Notebooks for Representing and Interacting with Small Datasets

Sebastian Heath, Institute for the Study of the Ancient World, New York University

URI: http://hdl.handle.net/2333.1/t1g1k70v

In: Sarah E. Bond, Paul Dilley, and Ryan Horne, eds. 2021. Linked Open Data for the Ancient Mediterranean: Structures, Practices, Prospects. ISAW Papers 20.

Abstract: This paper describes the role of standards-based and open source file formats and tools in representing and interacting with small datasets. The example used is a database of Roman amphitheaters that is based on the GeoJSON variant of JSON; both formats are briefly defined and explained by example. It is stressed that the code sharing site GitHub can map the spatial information in GeoJSON files by default. Next, a series of IPython notebooks - all of which can be run interactively or downloaded for further development - show the implementation of a lightweight interface for exploring amphitheater seating capacity. In conclusion, the paper emphasizes that using existing tools can make it easier to maintain focus on the intellectual content of a dataset.

Library of Congress Subjects: Amphitheaters--Italy--Statistics; Linked data--Use studies.

Introduction

The main goal of this paper is to show that a selection of the standards, methods, and tools that fall under the rubric of Linked Open Data (LOD) can be the basis for creating flexible representations, as well as interactive presentations, of small datasets. As will become clear, by 'representation' I mean the specific instance of a file that conforms to particular standards and is therefore reusable in multiple computational contexts. By 'presentation' I mean the transformation of that file into more human-readable results, such as visualizations or interactive web pages. The specific use-case is a dataset providing brief information about Roman amphitheaters. There are approximately 260 of these structures, which occur throughout Roman territory, even if unevenly distributed.¹ All were built between the early first century BCE and the early fourth century CE, though most Roman amphitheaters are first or second century CE in date. These aspects of the data - a relatively small number of entities that show spatial and chronological variability within the set - make for an interesting test-case of the use of LOD methods and tools. They also allow the discussion here to be published in conjunction with all the associated data and with brief scripts that many readers will be able to run themselves, either after downloading or in a cloud-based environment.² There are links to the latter in the text that follows.

The discussion that follows will move from an overview of the specifics of an LOD-informed representation of the phenomenon of Roman amphitheaters, then to querying that data using the SPARQL query language, and finally to a limited implementation of a graphical and interactive user interface. My intent is that this interface is useful as a repeatable and reusable example of working code. LOD influences all parts of what follows, though more general tools will come into play, including the Python programming language and interactive IPython notebooks. These additions mean that there is no attempt to be "pure" or "strict" LOD. Discussion of actual practice will always be to the fore, and that practice will also suggest a path for using data in historical investigation, even if that is not the primary focus here. Although it is a gross generalization to say that computers only work with 1s and 0s and humans work with ideas, working to bridge the gap between those two perspectives remains a topic of discussion within the wider field of "Digital Humanities."³ By the end of this paper, a set of tools and data will have been assembled that offer an additional starting point in this ongoing effort.

There are other introductory topics to address early on. Firstly, "Roman amphitheaters" here usually means fully-enclosed, quite large, at least partially stone, oval, public structures used primarily for the staging of gladiatorial combats, fights involving animals, and public executions.⁴ These activities made them an important setting for social and political interaction in the Imperial period.⁵ Amphitheaters are distinct from theaters, which are generally half-round and primarily used for dramatic events. Even the succinctly stated criteria used here highlight that there are borderline cases, including the so-called Gallo-Roman amphitheaters that have seating only partially enclosing an oval arena. Those are included in the dataset, though it would be easy to exclude them from any analyses that would be improved by doing so. There is also dynamism in the number of amphitheaters in use at any one time. The form, or at least permanent stone versions of it, likely originated in southern Italy in the early first century BCE.⁶ Initial spread was slow, and then from the mid-first century CE to the mid-second century many were built. As new amphitheaters appeared, older ones went out of use. A compelling pairing of growth and loss is the destruction of the amphitheater at Pompeii in 79 CE, an event that buried 20,000 seats in ash, and the opening shortly thereafter of the Flavian Amphitheater in Rome, the so-called Colosseum, which was in use by 80 CE. With 50,000 seats, the Flavian Amphitheater was comparatively huge. Many Roman amphitheaters fell within the range of 10,000 to 25,000 seats. An interface for exploring amphitheater capacities that is built using open data and open tools appears towards the end of this paper (Fig. 14).

Another topic to consider is this paper's audience. I do not mean what follows as a ground-up introduction to using JSON, JSON-LD, GeoJSON, SPARQL, and IPython notebooks to publish data about the Roman Empire. I do offer brief definitions of those terms, but readers with no experience in the Linked Open Data digital ecosystem might not be satisfied with this discussion as an entry point to the topic. Nonetheless, I will stress throughout that representing data using well-documented file formats and then manipulating that data with open-source tools allows the focus to be on the intellectual content of a dataset and on how it can be queried and the results displayed. I will show "out of the box" functionality inherent in file formats, with mapping being the most visually compelling example. The combined application of all the third-party tools that I will use is tantamount to a test of whether or not I have usefully represented the phenomenon of Roman amphitheaters. To the extent readers think the answer is "yes," this paper is one more contribution that keeps the dialog between standards-compliance and the needs of individual research efforts at the center of discussion of the role of digital tools in Humanistic research.⁷

Representing the Data: JSON and GeoJSON

As of this writing, the dataset under discussion here is available in a GitHub repository published under a Public Domain dedication, meaning that it meets the expectations implied by the 'O' in LOD.⁸ While the current author is the main contributor, and is certainly responsible for any shortcomings and incompleteness, the commit history shows that early data collection was a shared effort. Versions of this repository are also published via Zenodo.org, which means there is a DOI for the collected resource.⁹

The main data appears in the file 'roman-amphitheaters.geojson'. By the end of this section, it will be clear that this file contains both structured data about each amphitheater - such as dimensions, an indication of chronology, and capacity - and spatial data in the form of a point giving the center - accurate to meters when possible - of the arena. After exploring a few specifics of this representation, I will show that the data can be queried using the SPARQL query language, which works with simple statements known as 'triples'. But before that, a direct look at the serialization - that is, the sequence of characters that allows both humans and computers to recognize the information content of a file - will be useful.

Some unpacking of file extensions and names of formats is necessary. The '.geojson' extension means the information in 'roman-amphitheaters.geojson' is represented using the JSON format as a starting point, with additional compliance to the GeoJSON standard for recording spatial data. For its part, JSON records information as "key-value pairs".¹⁰ An example of four key-value pairs adapted from the Roman amphitheater data is:


	 {
	   "id": "romeFlavianAmphitheater",
	   "title": "Flavian Amphitheater at Rome",
	   "chronogroup": "flavian",
	   "pleiades": "https://pleiades.stoa.org/places/423025"
	 }

Fig. 1: Simplified JSON Snippet.

As an isolated snippet of JSON, the above is quite readable, which is one advantage of the format. To the left of each ":" is a 'key', and to the right is the associated 'value'; these are surrounded by curly brackets, with the implication being that the key-value pairs describe a single entity. The information in Fig. 1 can be rephrased as "There is an amphitheater with the unique ID 'romeFlavianAmphitheater'; it has the more human-readable title 'Flavian Amphitheater at Rome'; it was built in the Flavian period; and - by the way - it's useful to associate this record with the Pleiades URI 'https://pleiades.stoa.org/places/423025'." At the end of that long sentence I am being somewhat wordy, particularly in comparison to the JSON itself. That is because, like many databases, this specific serialization obscures the nature of the connection being made between a vocabulary and the values indicated. In this case, there is a reference to Pleiades, which describes itself as a "gazetteer of past places."¹¹ Visiting the web address in the JSON snippet displays a page that has the title "Roma" and a further description reading "The capital of the Roman Republic and Empire." As used above, then, the link to Pleiades is imprecise. It is not suggesting a narrow equivalence as it is clear that the scope of the Pleiades identifier is far broader than the individual record in the amphitheater dataset. This use is instead an invocation of a well-recognized general identifier within a specific, even idiosyncratic, dataset. This is good Linked Open Data practice, and as will be seen below, one that comes with a good return on effort when this data is made available on the internet in an interactive setting.
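The key-value reading of Fig. 1 can also be checked programmatically. The following is a minimal sketch using only Python's standard `json` module; the variable names `snippet` and `record` are illustrative and do not come from the amphitheater repository.

```python
import json

# The four key-value pairs from Fig. 1, held as a JSON string.
snippet = """
{
  "id": "romeFlavianAmphitheater",
  "title": "Flavian Amphitheater at Rome",
  "chronogroup": "flavian",
  "pleiades": "https://pleiades.stoa.org/places/423025"
}
"""

# json.loads turns a JSON object into a Python dictionary.
record = json.loads(snippet)

# Each key to the left of a ":" maps to the value on its right.
for key, value in record.items():
    print(f"{key} -> {value}")
```

Because the curly brackets delimit a single entity, the whole snippet becomes one dictionary, and each value can be looked up by its key, for example `record["title"]`.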

Pleiades, however, does have an identifier for the Flavian Amphitheater itself (https://pleiades.stoa.org/places/285857974) and it will be useful to include that in the amphitheater data. This is easy to do, as shown by the following expanded JSON snippet that adds a key for 'pleiadesspecific' (Fig. 2):


	 {
	   "id": "romeFlavianAmphitheater",
	   "title": "Flavian Amphitheater at Rome",
	   "chronogroup": "flavian",
	   "pleiades": "https://pleiades.stoa.org/places/423025" ,
	   "pleiadesspecific": "https://pleiades.stoa.org/places/285857974"
	 }

Fig. 2: Slightly expanded JSON snippet.

This snippet remains readable. But it also allows me to introduce an important aspect of using JSON to represent structured data: when information is not known, there is no need to have a blank field. This can be seen by browsing the roman-amphitheaters.geojson file itself; many entries do not have a 'pleiadesspecific' key, either because there is no relevant identifier in Pleiades or because it has not yet been entered. Looking further inside that file reveals a number of 'fields' that are not present for every entry. These range from expected fields that are sometimes not available for poorly preserved structures, such as maximum length (see 'exteriormajor'), to more specialized aspects of amphitheater studies such as the presence or absence of below-ground tunnels in the arena (look for the key 'hypogeum').
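Code that reads such records has to allow for keys that are simply absent. A minimal sketch, assuming two records: the first is taken from Fig. 2, while the second, with the id 'hypotheticalAmphitheater', is invented here purely to illustrate a missing key. The helper name `specific_link` is also hypothetical.

```python
import json

records = json.loads("""
[
  {"id": "romeFlavianAmphitheater",
   "title": "Flavian Amphitheater at Rome",
   "pleiadesspecific": "https://pleiades.stoa.org/places/285857974"},
  {"id": "hypotheticalAmphitheater",
   "title": "Invented record with no specific Pleiades link"}
]
""")

def specific_link(record):
    """Return the 'pleiadesspecific' value, or None when the key is absent."""
    # dict.get avoids a KeyError for entries that lack the key.
    return record.get("pleiadesspecific")

for record in records:
    link = specific_link(record)
    status = link if link is not None else "(no specific identifier recorded)"
    print(record["id"], "->", status)
```

The absence of a key is itself information: it distinguishes "not recorded" from an empty string that a blank spreadsheet cell would otherwise force.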

Direct inspection of the data on GitHub will certainly reveal that the snippets appearing above are very simplified. The file itself has more structure. This is in part because, as noted, it conforms to the GeoJSON variant of JSON, which here supports directly recording the approximate centerpoints of amphitheaters. A still simplified snippet that indicates how these points are recorded in the data appears in Fig. 3:


	{
	 "type": "Feature",
	 "id": "romeFlavianAmphitheater",
	 "properties": {
	   "title": "Flavian Amphitheater at Rome",
	   "chronogroup": "flavian",
	   "pleiades": "https://pleiades.stoa.org/places/423025" ,
	   "pleiadesspecific": "https://pleiades.stoa.org/places/285857974"
	},
	 "geometry": {
	    "type": "Point",
	    "coordinates":[
	     12.492269,
	     41.890169,
	     22
	    ]
	  }
	}

Fig. 3: GeoJSON snippet.

GeoJSON is a formally published Internet Engineering Task Force (IETF) proposal, giving it the effective status of a usable standard.¹² Although GeoJSON does impose requirements on how information is represented, it remains quite readable. The above snippet builds on the brief information about the Flavian Amphitheater already introduced, but places all but the ID in a 'properties' block. There is also a 'geometry' block, which in this case defines a point in three dimensional space at longitude 12.492269, latitude 41.890169, and elevation of 22 meters. Again, this specific representation - one that establishes the identity of the Flavian Amphitheater at Rome, gives very brief descriptive information, and indicates the central point of the structure - has this precise form because it is valid GeoJSON. This conformance to a standard means that readers can copy-and-paste the text into a tool that renders GeoJSON as a map. At the time of this writing, the sites geojsonlint.com and geojson.io work well. Fig. 4 shows the GeoJSON snippet rendered by geojsonlint.com.

Fig. 4. GeoJSON representing the location of and brief information about the Flavian Colosseum in Rome displayed by the site http://geojsonlint.com alongside a map.
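The same structure that mapping sites rely on can be read directly in a few lines of code. This is a minimal sketch using only Python's standard `json` module and the Feature from Fig. 3; the variable names are illustrative. Note that GeoJSON lists longitude before latitude.

```python
import json

# The simplified GeoJSON Feature from Fig. 3.
feature = json.loads("""
{
  "type": "Feature",
  "id": "romeFlavianAmphitheater",
  "properties": {
    "title": "Flavian Amphitheater at Rome",
    "chronogroup": "flavian"
  },
  "geometry": {
    "type": "Point",
    "coordinates": [12.492269, 41.890169, 22]
  }
}
""")

# Coordinate order in GeoJSON is longitude, latitude, then optional elevation.
lon, lat, elevation = feature["geometry"]["coordinates"]
title = feature["properties"]["title"]

print(f"{title}: latitude {lat}, longitude {lon}, elevation {elevation} m")
```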

Fig. 5. GitHub map of https://github.com/sfsheath/roman-amphitheaters/blob/master/roman-amphitheaters.geojson.

The complete file lists more amphitheaters and for each one gives more information than in the snippets above. To visually present the dataset's full scope, Fig. 5 shows that GitHub, the website that hosts the data, defaults to displaying GeoJSON files as a map, in this instance as a set of points. By default, the basemap is modern, but it will be apparent to many readers that amphitheaters are spread around the Roman Empire, though with more in the Central Mediterranean and western provinces. And when one is on the live version of that page, clicking on any of the markers will show more information about the amphitheater it represents. Additionally, downloading the file will allow it to be opened directly in desktop GIS software such as QGIS, though exploring that avenue lies beyond the scope of this paper.¹³

Representing the Data: JSON-LD

The previous section showed that the GeoJSON variant of JSON can be used to represent both descriptive information about amphitheaters and their locations. Except for the licensing and the use of URLs from Pleiades, it did not speak directly to the topic of Linked Open Data to the extent LOD is a set of specific practices. This section will. Before doing so, however, some preliminaries again need to be in place.

Writing generally and echoing the other contributions to this collection, Linked Open Data can be considered a set of best practices that encourages the sharing of open-licensed data in formats that computers can read as well as usefully manipulate and render. That GitHub and GeoJSONLint can display the spatial information in the amphitheater data is a specific indication that the "read and usefully manipulate" aspect of LOD is being satisfied in this case. Digging deeper into the set of practices that make up LOD, a preferred representation of information within LOD relies on a concept known as the "triple". A triple, in turn, is a three part statement that has a subject, which is the entity being described, a predicate, which is the type of information, and an object, which is the value being recorded. Subject, predicate, and object are technical terms, though ones that are easily identified in simple natural-language sentences. For example, "The Flavian Colosseum in Rome has a seating capacity of fifty thousand." can be understood as a triple having the subject "The Flavian Colosseum in Rome," the predicate "seating capacity," and the object "fifty thousand".

Many forms of digitized information, particularly those which are at all understandable as databases, can be conceived of as triples. For example, a spreadsheet has rows and columns. It is a commonly seen convention to put column names in the first row at the top of a sheet and an identifier in the first column at the left. These are the equivalent of predicate and subject respectively. In such a spreadsheet, individual cells hold values at the intersection of rows and columns, with the values in those cells being the equivalent of objects. Triples can also be recognized in database display and entry screens or in their web-based equivalents. These interfaces will show an indication - often a unique id - of the entity being described by all the information displayed on a screen or webpage; this is the subject. Field names are predicates; and the values in those fields are objects. Subject-predicate-object is, therefore, a fundamental data structure identifiable in many contexts. This article has as a particular concern the identification of triples in JSON and GeoJSON files and also making them computationally actionable using LOD tools.

Referring back to Fig. 3 - the GeoJSON snippet indicating the location of the Flavian Amphitheater - one can find many triples in that brief example. To use the terminology of the code itself, all these implicit triples have the subject 'romeFlavianAmphitheater'. Predicates include 'title', which has the object "Flavian Amphitheater at Rome." The task at hand, then, is to turn these implicit triples that can be intuitively recognized by humans into explicit ones that can be manipulated by LOD-aware software tools. Just as conforming to the GeoJSON standard allowed for location data to be recorded in a way that was widely actionable, the JSON-LD standard, where "LD" is short for "Linked Data," allows a JSON file to indicate how triples can be found.¹⁴
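The intuitive reading of a record as subject, predicate, and object can be made concrete before any LOD tooling is involved. The sketch below is plain Python rather than JSON-LD processing; the function name `implicit_triples` is hypothetical, and the "triples" it returns are simple tuples of strings, not yet the URL-based statements that LOD-aware tools expect.

```python
# The descriptive part of the Feature from Fig. 3, as a Python dictionary.
feature = {
    "id": "romeFlavianAmphitheater",
    "properties": {
        "title": "Flavian Amphitheater at Rome",
        "chronogroup": "flavian",
        "pleiades": "https://pleiades.stoa.org/places/423025",
        "pleiadesspecific": "https://pleiades.stoa.org/places/285857974",
    },
}

def implicit_triples(feature):
    """List (subject, predicate, object) tuples implied by one record."""
    subject = feature["id"]  # the entity being described
    return [
        (subject, key, value)  # key acts as predicate, value as object
        for key, value in feature["properties"].items()
    ]

for triple in implicit_triples(feature):
    print(triple)
```

What this informal version lacks is any shared, unambiguous meaning for names such as 'title'; supplying that is the role of a JSON-LD context.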

Returning to showing examples, Fig. 6 is another simplified snippet, this time of JSON-LD.


	{
	 "@context": {
	  "id": "@id",
	  "@vocab": "http://purl.org/roman-amphitheaters/vocab/",
	  "@base": "http://purl.org/roman-amphitheaters/id/",
	  "dcterms": "http://purl.org/dc/terms/",
	  "title": {"@id": "dcterms:title"},
	  "chronogroup":{"@type":"@id"}
	 },
	 "id": "romeFlavianAmphitheater",
	 "title": "Flavian Amphitheater at Rome",
	 "chronogroup": "flavian",
	 "pleiades": "https://pleiades.stoa.org/places/423025",
	 "pleiadesspecific": "https://pleiades.stoa.org/places/285857974"
	}

Fig. 6. Brief amphitheater data with a JSON-LD context.

The only change to the snippet in Fig. 2 is the addition of an '@context' block. That can be thought of as a set of instructions to LOD-aware processors as to how to recognize triples in a JSON file. In it, the line reading '"id": "@id",' indicates that the 'id' key in the JSON sets the subject of the triple. The line reading '"title": {"@id": "dcterms:title"}' indicates that the titles in the JSON can be understood as Dublin Core Titles. For its part, the Dublin Core is a widely deployed vocabulary that many tools can recognize.¹⁵ The line beginning '@vocab' means that any JSON keys not specifically assigned to well-known vocabularies should be considered part of a set of terms identified by the URL "http://purl.org/roman-amphitheaters/vocab/".¹⁶

Rather than pile on more explanation or add more complexity to the snippet in Fig. 6, I want to show that adding the "@context" block worked. That is, it had a computationally actionable effect. Just as with the example of showing that GitHub can render the amphitheater GeoJSON as a map, Fig. 7 indicates that there are tools that can recognize and display the triples in the above JSON. Specifically, the site https://json-ld.org/ includes a "JSON-LD Playground." Readers can paste in JSON-LD here and confirm that the playground is able to recognize the embedded triples. The Playground's "Table" view of the data is selected in Fig. 7 because it uses the Subject, Predicate, Object terminology introduced above (the headings of the columns in the lower part of the page). It can be seen that equating the 'id' JSON key with the "@id" as specified by JSON-LD produced the intended result: all the triples have as their subject "http://purl.org/roman-amphitheaters/id/romeFlavianAmphitheater."

Fig. 7: Using the JSON-LD Playground to display triples in simplified form.
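To make the effect of the '@context' block less opaque, the following sketch imitates by hand what a JSON-LD processor does with the snippet in Fig. 6. It is emphatically not a JSON-LD implementation: the function `expand` is hypothetical and handles only the handful of context features used in that figure ('@base', '@vocab', an alias for '@id', one prefixed term, and one '@type': '@id' coercion). A real processor, such as the Playground or a library like rdflib, covers the full standard.

```python
from urllib.parse import urljoin

doc = {
    "@context": {
        "id": "@id",
        "@vocab": "http://purl.org/roman-amphitheaters/vocab/",
        "@base": "http://purl.org/roman-amphitheaters/id/",
        "dcterms": "http://purl.org/dc/terms/",
        "title": {"@id": "dcterms:title"},
        "chronogroup": {"@type": "@id"},
    },
    "id": "romeFlavianAmphitheater",
    "title": "Flavian Amphitheater at Rome",
    "chronogroup": "flavian",
    "pleiades": "https://pleiades.stoa.org/places/423025",
}

def expand(doc):
    """Hand-rolled, partial imitation of JSON-LD expansion for Fig. 6 only."""
    ctx = doc["@context"]
    base, vocab = ctx["@base"], ctx["@vocab"]
    # Find the JSON key that the context aliases to "@id" (here: "id").
    id_key = next(k for k, v in ctx.items() if v == "@id")
    subject = urljoin(base, doc[id_key])  # relative ids resolve against @base
    triples = []
    for key, value in doc.items():
        if key in ("@context", id_key):
            continue
        term = ctx.get(key)
        term = term if isinstance(term, dict) else {}
        mapped = term.get("@id")
        if mapped and ":" in mapped:
            prefix, local = mapped.split(":", 1)
            predicate = ctx[prefix] + local  # e.g. dcterms:title
        else:
            predicate = vocab + key  # fall back to @vocab
        if term.get("@type") == "@id":
            value = urljoin(base, value)  # the value is itself an identifier
        triples.append((subject, predicate, value))
    return triples

for s, p, o in expand(doc):
    print(s, p, o, sep="\n  ")
```

The subject printed for every triple is the same full URL that the Playground reports, which is the practical meaning of equating 'id' with '@id'.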

Querying the Data: SPARQL and Python

Confirming that triples are automatically identifiable in a small snippet of JSON is useful as a demonstration. As with the discussion of GeoJSON, it is again the case that the file roman-amphitheaters.geojson is a more complete example of adding an '@context' block to a JSON file so that a wide range of triples are recognizable. There is, however, no display of all the triples that is as visually compelling as displaying all the locations as a map in GitHub. Instead, I will bring in another fundamental tool of Linked Open Data: the SPARQL query language.¹⁷ Doing so will be a turning point in this paper. Up to this point, I've kept to examples that I hope most readers can easily repeat: following the link to the GitHub page for the file roman-amphitheaters.geojson will automatically display it as a map; cutting-and-pasting into geojsonlint.com or the playground at json-ld.org has an immediately observable effect. There is not an equivalent website for querying JSON-LD files using SPARQL. I will instead use the programming language Python to show that open-source tools can read and query JSON-LD.¹⁸ My goal is that a reader with intermediate Python skills, including the ability to run interactive Jupyter/IPython notebooks, can repeat the steps I show.¹⁹ I have also linked to a cloud-based tool, Binder, which should mean that many readers, even those with no Python experience, can run the code.²⁰

SPARQL is a query language that searches for patterns of triples. Cutting right to the chase, Fig. 8 is a brief example.


	PREFIX ramphs: <http://purl.org/roman-amphitheaters/id/>
	SELECT ?p ?o WHERE {
	  ramphs:romeFlavianAmphitheater ?p ?o
	 }

Fig. 8: SPARQL example. The PREFIX line allows the full URL to be shortened using the abbreviation "ramphs:".

In this query, the pattern can be worded as "all triples that begin with ramphs:romeFlavianAmphitheater", with "ramphs:" being a shorthand for the full URL. '?p' is a placeholder for the predicates that will match; '?o' is a placeholder for the objects. When applied to the snippet in Fig. 6, it will produce results that are analogous to those seen in the JSON-LD playground.

The parts of a triple that will be "filled in" during the query can be switched. In Fig. 9, the pattern being looked for can be phrased using plain language as "All amphitheaters assigned to the chronological group 'Flavian'." This is because the predicate and object have fixed values and the subject of triples that match the pattern they imply will be returned.


	PREFIX ramphs: <http://purl.org/roman-amphitheaters/id/>
	PREFIX ramphsv: <http://purl.org/roman-amphitheaters/vocab/>
	SELECT ?s WHERE {  
	  ?s ramphsv:chronogroup ramphs:flavian
	 }

Fig. 9: Simplified SPARQL example that finds all amphitheaters specifically assigned to the Flavian period.
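The idea that a query fixes some parts of a triple and leaves others open can be imitated in a few lines of plain Python. This is a sketch of the matching concept only, not a SPARQL engine: the function `match` is hypothetical, `None` plays the role of a variable such as '?p', and the three triples are those implied by the snippet in Fig. 6.

```python
ID = "http://purl.org/roman-amphitheaters/id/"
VOCAB = "http://purl.org/roman-amphitheaters/vocab/"
DCTERMS = "http://purl.org/dc/terms/"

triples = [
    (ID + "romeFlavianAmphitheater", DCTERMS + "title",
     "Flavian Amphitheater at Rome"),
    (ID + "romeFlavianAmphitheater", VOCAB + "chronogroup", ID + "flavian"),
    (ID + "romeFlavianAmphitheater", VOCAB + "pleiades",
     "https://pleiades.stoa.org/places/423025"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples agreeing with every fixed part; None means 'any'."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Analogue of Fig. 8: subject fixed, predicate and object left open.
about_rome = [(p, o) for _, p, o in match(triples, s=ID + "romeFlavianAmphitheater")]

# Analogue of Fig. 9: predicate and object fixed, subject left open.
flavian = [s for s, _, _ in match(triples, p=VOCAB + "chronogroup", o=ID + "flavian")]

print(about_rome)
print(flavian)
```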

In theory, applying the query in Fig. 9 to the Roman amphitheaters data will list all amphitheaters specifically assigned to the Flavian period, with the so-called Colosseum being among those. But this is a practice-oriented paper within a practice-oriented collection, so putting readers in a position to make this query actually work is a goal. This will require a slightly different query and a working tool to run it. Writing as I do in mid-2018, using the Python programming language and sharing the code as an interactive notebook is a good way forward.


	PREFIX ramphs: <http://purl.org/roman-amphitheaters/resource/>
	PREFIX gj: <https://purl.org/geojson/vocab#>
	PREFIX ramphs-p: <http://purl.org/roman-amphitheaters/properties#>
	PREFIX dcterms: <http://purl.org/dc/terms/>
	SELECT ?title WHERE {

	?s gj:properties/ramphs-p:chronogroup ramphs:flavian .
	?s gj:properties/dcterms:title ?title
	}

Fig. 10. SPARQL query to find 'flavian' amphitheaters in the roman-amphitheaters.geojson file.

The exact query that will allow 'flavian' amphitheaters to be discovered within the roman-amphitheaters.geojson file requires one major change from the query shown in Fig. 9. Because GeoJSON places descriptive data about the entities it describes into a 'properties' block, that structure needs to be taken into account in the query. Fig. 10 is a SPARQL query that does just that. I have included it directly in the text here so that readers can more easily test it in an environment that they create or adapt. Fig. 11 shows this query within the context of an IPython notebook.

Fig. 11. IPython notebook implementing SPARQL query to find 'flavian' amphitheaters.
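The '/' in a pattern such as 'gj:properties/ramphs-p:chronogroup' in Fig. 10 means "follow the first predicate, then the second." The sketch below imitates that two-step walk in plain Python. It rests on an assumption worth stating: that the 'properties' block becomes an intermediate, unnamed node in the triples, represented here by the stand-in identifier '_:b0'. The function `follow` is hypothetical, and the three triples are a hand-made miniature rather than output from the real file.

```python
GJ = "https://purl.org/geojson/vocab#"
RES = "http://purl.org/roman-amphitheaters/resource/"
PROP = "http://purl.org/roman-amphitheaters/properties#"
DCTERMS = "http://purl.org/dc/terms/"

# Hand-made miniature: the amphitheater points to its 'properties' node,
# and the descriptive values hang off that intermediate node.
triples = [
    (RES + "romeFlavianAmphitheater", GJ + "properties", "_:b0"),
    ("_:b0", PROP + "chronogroup", RES + "flavian"),
    ("_:b0", DCTERMS + "title", "Flavian Amphitheater at Rome"),
]

def follow(triples, start, *predicates):
    """Return every node reached from start by following predicates in order."""
    nodes = {start}
    for predicate in predicates:
        nodes = {o for (s, p, o) in triples if s in nodes and p == predicate}
    return nodes

titles = []
for subject in sorted({s for (s, _, _) in triples}):
    # First line of the WHERE clause in Fig. 10: is this subject 'flavian'?
    if RES + "flavian" in follow(triples, subject, GJ + "properties", PROP + "chronogroup"):
        # Second line: fetch the title along the same kind of path.
        titles.extend(sorted(follow(triples, subject, GJ + "properties", DCTERMS + "title")))

print(titles)
```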

At the time of this writing, a cloud-based version of the notebook using the free website Binder is online at https://mybinder.org/v2/gh/sfsheath/heath-lod-cookbook/master?filepath=jsonld-sparql.ipynb. For readers not familiar with Binder, it allows IPython notebooks to be run in a cloud-based environment that requires no installation of software by end-users. Clicking-through on the link should be sufficient to run the notebook in a modern browser. While a full introduction to IPython notebooks is beyond the scope of this article, readers with no familiarity can find many tutorials online. And at a minimum, choosing "Run All" from the "Cell" menu near the top of the page will cause a list of "Flavian" amphitheaters to appear below the third "cell" of code.

Readers who have more comfort with IPython notebooks - and by implication with at least simple Python programming - can make edits when running this notebook in Binder or locally after download. Simple changes that will have an immediate effect are to replace 'ramphs:flavian' with 'ramphs:republican' or with 'ramphs:hadrianic'. Either edit will cause the relevant amphitheaters to be listed when the third cell is run.

Being able to query the roman-amphitheaters.geojson file directly using SPARQL within the context of an IPython notebook provides great flexibility, far more than can be fully discussed here. I offer two further examples, one which creates a simple map using the Folium Python module, and one which creates a simple interactive data visualization using the Pandas Python module and embedded user-settable widgets.²¹ For each I will show the SPARQL query that pulls information from the amphitheaters data as well as a screenshot that shows the query in the context of an IPython notebook. And again I link to a version of the notebook running in Binder, though it is likely the case that at some point in the medium term that exact setup will stop working.

Fig. 12: IPython notebook rendering the results of SPARQL query as a map using the 'folium' Python module.


	PREFIX gj: <https://purl.org/geojson/vocab#>
	PREFIX ramphs: <http://purl.org/roman-amphitheaters/resource/>
	PREFIX ramphs-p: <http://purl.org/roman-amphitheaters/properties#>
	PREFIX dcterms: <http://purl.org/dc/terms/>
	PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
	SELECT * WHERE {

	?s gj:properties/ramphs-p:chronogroup ramphs:flavian .
	?s gj:properties/dcterms:title ?title .

	?s gj:geometry/gj:coordinates/rdf:first ?long ;
	   gj:geometry/gj:coordinates/rdf:rest/rdf:first ?lat .
	}

Fig. 13: SPARQL to find latitude and longitude of Flavian amphitheaters.

Figs. 12 and 13 show the IPython notebook that renders maps from SPARQL query results and the core SPARQL itself that can be copied into other environments. To run the notebook interactively, either download it or run it in Binder via the link https://mybinder.org/v2/gh/sfsheath/heath-lod-cookbook/master?filepath=jsonld-folium.ipynb. The SPARQL query has straightforward aspects. The lines that read in part 'gj:geometry/gj:coordinates' accommodate the fact that GeoJSON uses a 'geometry' construct that is similar to the 'properties' construct discussed above. The last elements of those lines deal with the fact that JSON-LD produces a complex structure for JSON arrays. While the details go beyond the scope of this paper, the point that SPARQL can query this data remains true and relevant.²² Again, simple changes can be made to this query, with mapping 'republican' amphitheaters a possibility.
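For readers curious about that complex structure, the sketch below shows the general shape of an ordered list in RDF, which is what the 'rdf:first' and 'rdf:rest' steps in Fig. 13 traverse. It assumes that the coordinates array is mapped to such a list, as those paths imply. The functions `as_rdf_list` and `one`, and the node names '_:n0', '_:n1', and so on, are hypothetical stand-ins written for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def as_rdf_list(values):
    """Encode a Python list as rdf:first / rdf:rest triples.

    Returns the head node and the triples. Each node holds one value
    (rdf:first) and a pointer to the next node (rdf:rest); the last
    node points to rdf:nil.
    """
    if not values:
        return RDF + "nil", []
    nodes = [f"_:n{i}" for i in range(len(values))]
    triples = []
    for i, value in enumerate(values):
        triples.append((nodes[i], RDF + "first", value))
        rest = nodes[i + 1] if i + 1 < len(values) else RDF + "nil"
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples

def one(triples, subject, predicate):
    """Return the single object for a subject and predicate."""
    return next(o for (s, p, o) in triples if s == subject and p == predicate)

# The coordinates array from Fig. 3: longitude, latitude, elevation.
head, triples = as_rdf_list([12.492269, 41.890169, 22])

# coordinates/rdf:first  ->  first item (longitude)
lon = one(triples, head, RDF + "first")
# coordinates/rdf:rest/rdf:first  ->  second item (latitude)
lat = one(triples, one(triples, head, RDF + "rest"), RDF + "first")

print(lon, lat)
```

Reaching the third item (elevation) would take one more 'rdf:rest' step, which is why list positions are spelled out in the query rather than indexed by number.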

Fig. 14: Interactive "widget" embedded within an IPython notebook.


	PREFIX ramphs: <http://purl.org/roman-amphitheaters/resource/>
	PREFIX gj: <https://purl.org/geojson/vocab#>
	PREFIX ramphs-p: <http://purl.org/roman-amphitheaters/properties#>
	PREFIX dcterms: <http://purl.org/dc/terms/>
	SELECT ?title ?capacity ?pleiades WHERE {

	?s gj:properties/dcterms:title ?title ;
	   gj:properties/ramphs-p:capacity/ramphs-p:quantity ?capacity ;
	   gj:properties/ramphs-p:pleiades ?pleiades .
	FILTER (?capacity > 3000)
	FILTER (?capacity < 55000)
	}

Fig. 15: SPARQL with addition of 'FILTER' statements to set upper and lower limits of capacity.

Figs. 14 and 15 are a similar pairing for a small interactive interface for listing amphitheaters by seating capacity and also showing a histogram of the selection.²³ The cloud-based interactive version is at https://mybinder.org/v2/gh/sfsheath/heath-lod-cookbook/master?filepath=jsonld-widgets.ipynb.

Because the code is longer than that in the other notebooks, the screenshot (Fig. 14) shows only the interactive widgets and the resulting output that is displayed when those are changed. The SPARQL query (Fig. 15) uses two FILTER statements to narrow the results. In the IPython notebook the values - here hard coded as 3000 and 55000 - are replaced by ones that the user sets with the "Upper limit" and "Lower limit" sliders. While none of the programming here rises to the level of being 'advanced,' it may still be the case that it is most useful for Python programmers familiar with concepts such as defining functions and setting them to handle events generated by users interacting with the interface. I have tried to keep the code relatively simple so that the implementation is straightforward, readable, and ready for adaptation in other contexts.
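The relationship between the hard-coded FILTER values and user-set limits can be sketched without any notebook machinery. The function below is a plain-Python stand-in for the two FILTER statements, not the notebook's actual code; its name `within_limits` is hypothetical. The two capacities for Pompeii and Rome are the round figures given earlier in this paper, the titles are written out here for illustration, and the third entry is invented solely to show a value falling outside the default limits.

```python
capacities = {
    "Amphitheater at Pompeii": 20000,        # figure cited in the Introduction
    "Flavian Amphitheater at Rome": 50000,   # figure cited in the Introduction
    "Hypothetical small amphitheater": 2500, # invented, to show exclusion
}

def within_limits(capacities, lower=3000, upper=55000):
    """Keep entries whose capacity lies strictly between the two limits.

    Mirrors FILTER (?capacity > lower) and FILTER (?capacity < upper).
    """
    return {
        title: capacity
        for title, capacity in capacities.items()
        if lower < capacity < upper
    }

# Default limits, matching the values hard coded in Fig. 15.
print(within_limits(capacities))

# A narrower upper limit, as if a user had moved the "Upper limit" slider.
print(within_limits(capacities, upper=25000))
```

In an interactive setting, a function of this shape is what a slider's change event would call, with the slider positions supplied as `lower` and `upper`.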

Looking at the output of this notebook, the initial settings mean that all amphitheaters are included in the histogram, which in turn indicates that the Flavian Amphitheater is very much an outlier in terms of capacity, so much so that it is represented by the single isolated bar at the right. It is clear from this visualization that the bulk of Roman amphitheaters had fewer than 20,000 seats. In an effort to add context to the graphic display, the table above the histogram allows the capacity values to be directly seen. Clicking on the Pleiades URI will open the appropriate page on that site, with the distinct advantage for this application being that a map is displayed. That is a light-weight, easily obtained benefit of using stable identifiers published on the internet in accordance with Linked Open Data principles. The combination of interactivity and linking can support preliminary exploration of this aspect of amphitheaters as they appear in the Roman Empire. Again, the predominant take-away can be that medium-size structures are usual and that the example in the imperial capital is actually an extreme outlier. This is not a new conclusion, so the advantage here is allowing users to directly explore the data for themselves. And it is of course the case that new interfaces can be created to allow exploration of other aspects of the data and to implement links to additional public resources. In that regard, this paper has suggested a pattern of implementation and usage as much as it has explored a specific example.

Conclusions

Although showing practical use of well-documented formats and open-source tools has been the main goal of the above discussion, there is a larger point. Even within the constraints implied by compliance, using existing standards and tools means that the focus can remain on the specific needs of any one research endeavor. I stress that in the example code above, steps such as loading the data or including the ability to search triples required very few lines of code. The most expressive sections are the SPARQL queries. That is where I am directly engaging with my own data. To put it another way, the ability to express queries of my data using a single well-documented and fully-supported standard is the payoff for choosing JSON-LD and GeoJSON as my underlying formats. And because I can implement SPARQL queries within IPython notebooks, that environment's ability to support quick creation of interfaces and visual outputs is a follow-on and substantial benefit to using standards and tools created by others. I can represent the data, I can present it, and I can share my work on the public Internet.⬈#p34

I will repeat the list of standards and tools I've used to set up the last point: readers can pick and choose what of the above is useful for them. GeoJSON + JSON is a powerful combination for representing data. SPARQL is a powerful tool for querying that data but also for querying any triple-based dataset. IPython widgets and Pandas visualization can work with any data that comes from similar workflows, and with many more. At no point are there proprietary dependencies, which is a pattern of practice and usage that is useful and rewarding to adopt in Linked Open Data and beyond.⬈#p35

References

Bond et al. 2018: S. Bond, H. Long, and T. Underwood. 2018. “‘Digital’ Is Not the Opposite of ‘Humanities’,” The Chronicle of Higher Education (Nov. 1, 2018). https://www.chronicle.com/article/Digital-Is-Not-the/241634.⬈#reference-1

Dodge 2009: H. Dodge. 2009. “Amphitheatres in the Roman East.” In T. Wilmott (ed.), Roman Amphitheatres and Spectacula: a 21st-Century Perspective, 29-45. Oxford: Archaeopress.⬈#reference-2

Dodge 2014: H. Dodge. 2014. “Building for an Audience: The Architecture of Roman Spectacle.” In R. Ulrich and C. Quenemoen (eds.), A Companion to Roman Architecture, 281-298. Chichester: Wiley Blackwell.⬈#reference-3

Evens 2012: A. Evens. 2012. “Web 2.0 and the Ontology of the Digital,” Digital Humanities Quarterly 6.2. http://www.digitalhumanities.org/dhqdev/vol/6/2/000120/000120.html.⬈#reference-4

Fagan 2011: G. Fagan. 2011. The Lure of the Arena. Cambridge: Cambridge University Press.⬈#reference-5

Fagan 2016: G. Fagan. 2016. “Manipulating Space at the Roman Arena.” In W. Riess and G. Fagan (eds.), The Topography of Violence in the Greco-Roman World, 349-379. Ann Arbor: University of Michigan Press.⬈#reference-6

Laurence et al. 2011: R. Laurence, S. Cleary, and G. Sears. 2011. The City in the Roman West: c. 250 BC - c. AD 250. Cambridge: Cambridge University Press.⬈#reference-7

Marwick 2016: B. Marwick. 2016. “Computational Reproducibility in Archaeological Research: Basic Principles and a Case Study of Their Implementation,” Journal of Archaeological Method and Theory 24(1), 424–50.⬈#reference-8

Tuck 2007: S. Tuck. 2007. “Spectacle and Ideology in the Relief Decorations of the Anfiteatro Campano at Capua,” Journal of Roman Archaeology 20, 255-272.⬈#reference-9

Welch 2007: K. Welch. 2007. The Roman Amphitheatre: From its Origin to the Colosseum. Cambridge: Cambridge University Press.⬈#reference-10

Notes

¹ Dodge 2009, Laurence et al. 2011.⬈#footnote-1

² Marwick 2016.⬈#footnote-2

³ Bond et al. 2018.⬈#footnote-3

⁴ Dodge 2014: 281.⬈#footnote-4

⁵ Fagan 2011, Fagan 2016, Tuck 2007.⬈#footnote-5

⁶ Welch 2007.⬈#footnote-6

⁷ Evens 2012.⬈#footnote-7

⁸ See http://github.com/sfsheath/roman-amphitheaters, specifically LICENSE.txt.⬈#footnote-8

⁹ The URL version of the DOI is http://doi.org/10.5281/zenodo.596149.⬈#footnote-9

¹⁰ https://www.json.org/.⬈#footnote-10

¹¹ https://pleiades.stoa.org.⬈#footnote-11

¹² https://www.rfc-editor.org/info/rfc7946.⬈#footnote-12

¹³ http://qgis.org/.⬈#footnote-13

¹⁴ https://www.w3.org/TR/json-ld/ and https://json-ld.org/.⬈#footnote-14

¹⁵ http://dublincore.org/.⬈#footnote-15

¹⁶ Over the course of publication of this article the purl.org infrastructure for permanent URLs changed so that the links here do not lead to individual items. All the processes described here use a locally cached version of the data, meaning that effect of this change is very limited.⬈#footnote-16

¹⁷ https://www.w3.org/TR/sparql11-overview/.⬈#footnote-17

¹⁸ RDFLib, https://github.com/RDFLib, implements the SPARQL searches.⬈#footnote-18

¹⁹ http://jupyter.org and https://ipython.org, with the former likely being the better starting point.⬈#footnote-19

²⁰ https://mybinder.org and https://mybinder.readthedocs.io/. Over the course of writing and editing this piece, Google launched and is now promoting "Google Colab," which is another cloud-based approach to running Python code. See https://colab.research.google.com/. Preliminary tests indicate the notebooks will run with minimal change. It is necessary to include "!pip install rdflib" before importing that module.⬈#footnote-20

²¹ https://github.com/python-visualization/folium and https://pandas.pydata.org/.⬈#footnote-21

²² I can offer some more explanation: GeoJSON specifies that coordinates are represented as a JSON array. RDF can represent an array-like structure by linking one resource to the next resource in a linked list via the 'rdf:rest' predicate. The item held at each position in the list is pointed to using the 'rdf:first' predicate. In this usage, 'rdf:rest' has a plain-language meaning akin to, "This is the resource that begins the rest of the list." It is effectively a 'next' predicate by a different name. There is widespread dissatisfaction with the complexity this imposes on the representation of a basic data structure. The SPARQL in Fig. 13 moves from the gj:coordinates predicate to the first element of the list of coordinates; it then repeats that path with further querying of the resource that is next in the sequence.⬈#footnote-22

²³ https://github.com/jupyter-widgets/ipywidgets and https://ipywidgets.readthedocs.io/en/stable/.⬈#footnote-22
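The linked-list structure described in note 22 can be sketched in plain Python. This is an illustration under stated assumptions, not the article's code: the node names (`_:b0`, `_:b1`) and the two numbers are placeholders, and the triples are held in an ordinary list rather than an RDF store. The sketch also relies on the standard RDF convention that a list ends when 'rdf:rest' points to 'rdf:nil'.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"

# A two-item list as (subject, predicate, object) triples.
# Node names and numeric values are placeholders for illustration.
triples = [
    ("_:b0", FIRST, 12.5),
    ("_:b0", REST, "_:b1"),
    ("_:b1", FIRST, 41.9),
    ("_:b1", REST, NIL),
]


def read_list(head, triples):
    """Follow rdf:rest from head, collecting each node's rdf:first item."""
    lookup = {(s, p): o for s, p, o in triples}
    items = []
    node = head
    while node != NIL:
        items.append(lookup[(node, FIRST)])  # the item at this position
        node = lookup[(node, REST)]          # step to the rest of the list
    return items


print(read_list("_:b0", triples))
```

Reading two items requires four triples and a traversal, which gives a sense of the overhead that the note mentions.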