Retrofitting an Existing API with JSON-LD
In my last blog post, I provided an introduction to the JSON-LD serialization, its flavors and their strengths and weaknesses. In this blog post, we'll talk about how JSON-LD can be used to retrofit an existing JSON API so that it produces linked data.
Crucial Elements to Creating JSON-LD
In my last post, I pointed out that JSON-LD can be created in such a way that it can be consumed like regular JSON. So what if you have an existing API that returns JSON? Can its output be extended to produce JSON-LD? Yes, it can. In fact, JSON can often be turned into JSON-LD with small changes that keep your existing JSON backward compatible. How is that possible? There are three key properties for turning JSON into JSON-LD: @context, @type and @id.
@context
The @context JSON property lets you map the existing elements in your JSON document to known vocabularies, such as Schema.org, without changing the element names.
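To make this concrete, here is a deliberately simplified Python sketch of what a JSON-LD processor does with such a context: it looks up each key in the @context and expands a compact IRI such as schema:name into a full IRI using the prefix defined for schema. The function names are hypothetical, and only plain string mappings are handled; a real processor implements the full JSON-LD expansion algorithm.

```python
# Simplified illustration of @context term expansion.
# NOT a full JSON-LD processor: only plain "term": "prefix:suffix" strings
# are handled. Function names are hypothetical.

def expand_iri(value, context):
    """Expand a compact IRI like "schema:name" using a prefix in the context."""
    if ":" in value:
        prefix, suffix = value.split(":", 1)
        base = context.get(prefix)
        if isinstance(base, str) and not suffix.startswith("//"):
            return base + suffix
    return value  # already a full IRI, or no matching prefix


def mapped_properties(doc, context):
    """Return {JSON key: full property IRI} for keys the context maps to a property."""
    result = {}
    for key in doc:
        target = context.get(key)
        # Skip unmapped keys and keyword aliases such as "id": "@id".
        if isinstance(target, str) and not target.startswith("@"):
            result[key] = expand_iri(target, context)
    return result


context = {"schema": "http://schema.org/", "title": "schema:name", "kb:issn": "schema:issn"}
doc = {"title": "A|Z ITU Journal of the Faculty of Architecture", "kb:issn": "1303-7005", "links": []}
print(mapped_properties(doc, context))
```

The JSON keys themselves (title, kb:issn) never change; only the way a linked data consumer interprets them does.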
@type
The @type JSON property specifies the RDF type or types of the data you're providing. Ideally, you would add this property to your JSON output so that you get the best semantic mapping. However, if you have an existing JSON field that indicates what the type is, it is also possible to use the @context to map that field to @type.
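The second option, mapping an existing field to @type through the context, can be sketched as follows. The field resource_type and its value journal are hypothetical (the WorldCat knowledge base JSON used later has no such field): the context aliases resource_type to @type and defines journal as a term for schema:Periodical, so the field's value is read as an RDF type. As before, this is a simplified illustration, not a complete JSON-LD processor.

```python
# Simplified illustration: an existing JSON field aliased to @type.
# "resource_type" and "journal" are HYPOTHETICAL names used only for this example.

CONTEXT = {
    "schema": "http://schema.org/",
    "resource_type": "@type",        # existing field now acts as @type
    "journal": "schema:Periodical",  # a value of that field, defined as a term
}


def expand_iri(value, context):
    """Resolve a term defined in the context, then expand any prefix:suffix form."""
    term = context.get(value)
    if isinstance(term, str) and not term.startswith("@"):
        value = term
    if ":" in value:
        prefix, suffix = value.split(":", 1)
        base = context.get(prefix)
        if isinstance(base, str):
            return base + suffix
    return value


def rdf_types(doc, context):
    """Collect types from "@type" and from any key the context aliases to "@type"."""
    keys = ["@type"] + sorted(k for k, v in context.items() if v == "@type")
    types = []
    for key in keys:
        value = doc.get(key)
        if value is None:
            continue
        for item in value if isinstance(value, list) else [value]:
            types.append(expand_iri(item, context))
    return types


doc = {"title": "A|Z ITU Journal of the Faculty of Architecture", "resource_type": "journal"}
print(rdf_types(doc, CONTEXT))
```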
@id
The @id JSON property is a unique identifier for a given resource within your data. It is best to map it to an existing property that already contains a unique URI. Alternatively, if you have a property with a unique ID, you can use that ID to create a unique URI. Even if you don't have a unique ID, all is not lost: a resource without an @id is still part of the graph as a blank node, and relative @id values are resolved against the URL you retrieved the data from. However, having a unique URI is the best option.
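When you only have a local ID, minting a URI can be as simple as filling in a URI template. The sketch below assumes the pattern visible in the sample WorldCat knowledge base entry used in this post, where the entry URI combines the collection ID and the entry ID. The template and the function name are assumptions, not a documented API.

```python
# Sketch: minting an @id URI from IDs already present in the JSON.
# ASSUMPTION: the template is inferred from the sample entry in this post
# (".../entries/DOAJ.Records,9230622"); it is not a documented rule.
from urllib.parse import quote

ENTRY_TEMPLATE = "http://worldcat.org/webservices/kb/rest/entries/{collection},{entry}"


def entry_uri(doc):
    """Build an entry URI from the kb:collection_uid and kb:entry_uid fields."""
    return ENTRY_TEMPLATE.format(
        # Percent-encode each ID so characters like spaces or "/" can't break the URI.
        collection=quote(doc["kb:collection_uid"], safe=""),
        entry=quote(doc["kb:entry_uid"], safe=""),
    )


print(entry_uri({"kb:collection_uid": "DOAJ.Records", "kb:entry_uid": "9230622"}))
```

For the sample record, this reproduces the id value the API already returns.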
Example: JSON-LD for the WorldCat Knowledge Base API
Purely for demonstration purposes, I'm going to use the output of one of OCLC’s existing APIs to show you how this works. Below is an example of the JSON returned by the WorldCat knowledge base API for a request for a single entry.
```json
{
  "title": "A|Z ITU Journal of the Faculty of Architecture",
  "id": "http://worldcat.org/webservices/kb/rest/entries/DOAJ.Records,9230622",
  "links": [
    { "href": "http://worldcat.org/webservices/kb/rest/entries/DOAJ.Records,9230622.html", "rel": "alternate" },
    { "href": "http://worldcat.org/webservices/kb/rest/entries/DOAJ.Records,9230622", "rel": "self" },
    { "href": "http://www.az.itu.edu.tr", "rel": "via" },
    { "href": "http://www.az.itu.edu.tr", "rel": "canonical" }
  ],
  "kb:entry_uid": "9230622",
  "kb:entry_status": "raw",
  "kb:collection_uid": "DOAJ.Records",
  "kb:collection_name": "Directory of Open Access Journals (All titles)",
  "kb:provider_uid": "DOAJ",
  "kb:provider_name": "Directory of Open Access Journals",
  "kb:oclcnum": "793811170",
  "kb:issn": "1303-7005",
  "kb:publisher": "Istanbul Technical University",
  "kb:coverage": "fulltext@2007",
  "kb:coverage_enum": "fulltext",
  "kb:coll_type": "openaccess browsable"
}
```
The first step in transforming this JSON into valid linked data is to add an @context property to the JSON. In the @context, I define prefixes for the vocabularies I'm going to use; in this case, that's Schema.org. I also need to map a property in my JSON to @id. Here, the existing id property already contains a unique URI, so I simply alias it to @id.
Next, I need to map a property in my JSON to @type. This turns out to be a challenge for this JSON: you need a property whose value represents the type of the resource you’re providing, so that the value can be mapped to an RDF type. Unfortunately, the WorldCat knowledge base API JSON has no such property. We'll come back to this issue a bit later.
Lastly, I need to map all the properties in my JSON that I want to appear in the graph to corresponding properties in Schema.org. You don't have to map every property. Anything you don't map won't be included in the graph.
The JSON now looks like this.
```json
{
  "@context": {
    "schema": "http://schema.org/",
    "id": "@id",
    "title": "schema:name",
    "kb:issn": "schema:issn",
    "kb:publisher": "schema:producer",
    "kb:provider_name": "schema:provider",
    "kb:collection_name": "schema:isPartOf",
    "kb:oclcnum": "schema:sameAs"
  },
  "title": "A|Z ITU Journal of the Faculty of Architecture",
  "id": "http://worldcat.org/webservices/kb/rest/entries/DOAJ.Records,9230622",
  "links": [
    { "href": "http://worldcat.org/webservices/kb/rest/entries/DOAJ.Records,9230622.html", "rel": "alternate" },
    { "href": "http://worldcat.org/webservices/kb/rest/entries/DOAJ.Records,9230622", "rel": "self" },
    { "href": "http://www.az.itu.edu.tr", "rel": "via" },
    { "href": "http://www.az.itu.edu.tr", "rel": "canonical" }
  ],
  "kb:entry_uid": "9230622",
  "kb:entry_status": "raw",
  "kb:collection_uid": "DOAJ.Records",
  "kb:collection_name": "Directory of Open Access Journals (All titles)",
  "kb:provider_uid": "DOAJ",
  "kb:provider_name": "Directory of Open Access Journals",
  "kb:oclcnum": "793811170",
  "kb:issn": "1303-7005",
  "kb:publisher": "Istanbul Technical University",
  "kb:coverage": "fulltext@2007",
  "kb:coverage_enum": "fulltext",
  "kb:coll_type": "openaccess browsable"
}
```
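On the server side, the steps above amount to adding a single new key to the response. Here is a minimal Python sketch of that retrofit using the context shown above; add_context is a hypothetical helper name. Because it only prepends @context and copies every existing key unchanged, clients that ignore unknown keys should keep working.

```python
# Sketch: retrofitting an existing JSON response with an @context.
# Only "@context" is added; existing keys and values are copied unchanged.
# add_context is a hypothetical helper name.
import json

KB_CONTEXT = {
    "schema": "http://schema.org/",
    "id": "@id",
    "title": "schema:name",
    "kb:issn": "schema:issn",
    "kb:publisher": "schema:producer",
    "kb:provider_name": "schema:provider",
    "kb:collection_name": "schema:isPartOf",
    "kb:oclcnum": "schema:sameAs",
}


def add_context(doc, context=KB_CONTEXT):
    """Return a new dict with "@context" first, followed by the original keys."""
    retrofitted = {"@context": context}
    for key, value in doc.items():
        if key != "@context":  # keep the new context if the input already had one
            retrofitted[key] = value
    return retrofitted


original = json.loads('{"title": "A|Z ITU Journal of the Faculty of Architecture", "kb:issn": "1303-7005"}')
print(json.dumps(add_context(original), indent=2))
```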
Improving the Semantics of the Output and Adding Links
Technically, the JSON shown above is valid linked data, but it isn't as semantically rich as it could be. This is because applying an @context to an existing JSON document can neither add data that isn't present in the original document nor change the structure of the JSON to add objects that aren't there. Since I don't have a field I can map to @type, the resource has no type, which is a significant gap. So, the first change I make is to add an @type property and set it to “schema:Periodical”, which makes the main resource a Periodical. There are other properties I want to add to improve the data quality as well: “schema:url” and “schema:startDate”. The “schema:url” property holds the actual URL for the content, and “schema:startDate” holds the start date of this periodical’s coverage. Now that I have dealt with the missing properties, I have to handle the fact that in some places I need to build relationships between two resources.
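Because these additions change the document rather than its interpretation, they have to happen in the code that builds the response. Here is a Python sketch. It rests on two assumptions drawn from the single sample record in this post: that the content URL is the href of the link whose rel is canonical, and that kb:coverage has the form kind@start-year, as in fulltext@2007. Following the 2007-01-01 value used in the final JSON, the start date is set to January 1 of that year. The helper names are hypothetical.

```python
# Sketch: adding data that an @context alone cannot supply.
# ASSUMPTIONS (from the one sample record in this post):
#   * the content URL is the href of the link with rel "canonical";
#   * kb:coverage looks like "fulltext@2007", i.e. "<kind>@<start year>".

def canonical_url(doc):
    """Return the href of the first link whose rel is "canonical", if any."""
    for link in doc.get("links", []):
        if link.get("rel") == "canonical":
            return link.get("href")
    return None


def coverage_start_date(coverage):
    """Turn "fulltext@2007" into "2007-01-01"; return None if no year follows "@"."""
    _, sep, rest = coverage.partition("@")
    year = rest[:4]
    if sep and len(year) == 4 and year.isdigit():
        return f"{year}-01-01"
    return None


def add_semantics(doc):
    """Return a copy of doc with @type, schema:url and schema:startDate added."""
    result = dict(doc)
    result["@type"] = "schema:Periodical"
    url = canonical_url(doc)
    if url:
        result["schema:url"] = url
    start = coverage_start_date(doc.get("kb:coverage", ""))
    if start:
        result["schema:startDate"] = start
    return result


entry = {
    "title": "A|Z ITU Journal of the Faculty of Architecture",
    "links": [{"href": "http://www.az.itu.edu.tr", "rel": "canonical"}],
    "kb:coverage": "fulltext@2007",
}
print(add_semantics(entry))
```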
According to Schema.org, the properties “schema:isPartOf”, “schema:provider” and “schema:sameAs” shouldn't point to strings; they should point to other resources, ideally identified by URIs. So, I need URIs for the collection, the provider and the bibliographic record. Luckily for me, the data contains an ID for each of these resources, and they all follow a standard URI pattern, so I can create their URIs easily.
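In code, this means replacing the flat string fields with nested objects that carry an @id. The sketch below uses the URI patterns that appear in the final JSON of this post (collections/{uid}, providers/{uid} and worldcat.org/oclc/{number}); link_resources is a hypothetical helper. Wrapping the OCLC URI in an object with @id matters: without type coercion in the context, a bare string value is read as a text literal rather than as a link to another resource.

```python
# Sketch: turning string fields into links to other resources.
# URI patterns follow the final JSON in this post; link_resources is hypothetical.
KB_BASE = "http://worldcat.org/webservices/kb/rest"


def link_resources(doc):
    """Return a copy of doc with collection, provider and WorldCat record linked by URI."""
    result = dict(doc)  # keep every original key for backward compatibility
    result["kb:collection"] = {
        "@id": f"{KB_BASE}/collections/{doc['kb:collection_uid']}",
        "@type": "schema:CreativeWork",
        "title": doc["kb:collection_name"],
    }
    result["kb:provider"] = {
        "@id": f"{KB_BASE}/providers/{doc['kb:provider_uid']}",
        "@type": "schema:Organization",
        "title": doc["kb:provider_name"],
    }
    # {"@id": ...} makes the value a link; a bare string would be a text literal.
    result["schema:sameAs"] = {"@id": f"http://worldcat.org/oclc/{doc['kb:oclcnum']}"}
    return result


entry = {
    "kb:collection_uid": "DOAJ.Records",
    "kb:collection_name": "Directory of Open Access Journals (All titles)",
    "kb:provider_uid": "DOAJ",
    "kb:provider_name": "Directory of Open Access Journals",
    "kb:oclcnum": "793811170",
}
linked = link_resources(entry)
print(linked["kb:collection"]["@id"])
```

For the nested objects to land on the intended Schema.org properties, the context also needs "kb:collection": "schema:isPartOf" and "kb:provider": "schema:provider", as in the final JSON.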
But what if I only had a string? How would I get a URI? It depends on what type of resource the string describes. Is it an author, organization or place that is controlled via an authority record? Then you could use VIAF to search for the resource and retrieve a URI. Is the string a FAST subject heading? FAST is also searchable and has URIs. Is the string the name of a library? The WorldCat Registry is searchable and has URIs for libraries and cultural institutions. Is the string an OCLC Number? That is actually fairly easy, since every bibliographic record in WorldCat has a URI based on the syntax http://worldcat.org/oclc/{oclcNumber}. As you can see, depending on the data you have, there are several options for adding links using OCLC's data and web services. If you have other types of data, you'll need to look at the URIs available on the web to see whether you can convert additional strings in your data set into URIs for things.
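The OCLC Number case is simple enough to sketch. The function below turns an OCLC Number string into a URI using the http://worldcat.org/oclc/{oclcNumber} syntax described above. Tolerating an (OCoLC) prefix, the ocm/ocn/on prefixes and leading zeros is an assumption about how OCLC Numbers often appear in catalog data, and oclc_uri is a hypothetical name.

```python
# Sketch: turning an OCLC Number string into a WorldCat URI.
# ASSUMPTION: inputs may carry an "(OCoLC)" prefix, an "ocm"/"ocn"/"on"
# prefix, or leading zeros, all of which are stripped.
import re

_OCLC_NUMBER = re.compile(r"^(?:\(OCoLC\))?\s*(?:ocm|ocn|on)?0*(\d+)$")


def oclc_uri(raw):
    """Return the WorldCat URI for an OCLC Number, or None if it doesn't look like one."""
    match = _OCLC_NUMBER.match(raw.strip())
    if match is None:
        return None
    return f"http://worldcat.org/oclc/{match.group(1)}"


for value in ["793811170", "(OCoLC)ocm00012345", "not a number"]:
    print(value, "->", oclc_uri(value))
```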
In my example, I've added as many links to the JSON as I currently can, resulting in the following output:
```json
{
  "@context": {
    "schema": "http://schema.org/",
    "id": "@id",
    "title": "schema:name",
    "kb:issn": "schema:issn",
    "kb:publisher": "schema:producer",
    "kb:provider": "schema:provider",
    "kb:collection": "schema:isPartOf"
  },
  "@type": "schema:Periodical",
  "title": "A|Z ITU Journal of the Faculty of Architecture",
  "id": "http://worldcat.org/webservices/kb/rest/entries/DOAJ.Records,9230622",
  "links": [
    { "href": "http://worldcat.org/webservices/kb/rest/entries/DOAJ.Records,9230622.html", "rel": "alternate" },
    { "href": "http://worldcat.org/webservices/kb/rest/entries/DOAJ.Records,9230622", "rel": "self" },
    { "href": "http://www.az.itu.edu.tr", "rel": "via" },
    { "href": "http://www.az.itu.edu.tr", "rel": "canonical" }
  ],
  "schema:url": "http://www.az.itu.edu.tr",
  "kb:entry_uid": "9230622",
  "kb:entry_status": "raw",
  "kb:collection": {
    "@id": "http://worldcat.org/webservices/kb/rest/collections/DOAJ.Records",
    "@type": "schema:CreativeWork",
    "title": "Directory of Open Access Journals (All titles)"
  },
  "kb:collection_uid": "DOAJ.Records",
  "kb:collection_name": "Directory of Open Access Journals (All titles)",
  "kb:provider": {
    "@id": "http://worldcat.org/webservices/kb/rest/providers/DOAJ",
    "@type": "schema:Organization",
    "title": "Directory of Open Access Journals"
  },
  "kb:provider_uid": "DOAJ",
  "kb:provider_name": "Directory of Open Access Journals",
  "kb:oclcnum": "793811170",
  "schema:sameAs": { "@id": "http://worldcat.org/oclc/793811170" },
  "kb:issn": "1303-7005",
  "kb:publisher": "Istanbul Technical University",
  "kb:coverage": "fulltext@2007",
  "schema:startDate": "2007-01-01",
  "kb:coverage_enum": "fulltext",
  "kb:coll_type": "openaccess browsable"
}
```
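Backward compatibility here means that every key in the original response still appears with the same value. A quick check like the one below can guard that during development or in tests; changed_keys is a hypothetical helper, and it compares top-level keys only.

```python
# Sketch: verifying that a retrofitted response is backward compatible.
# Only top-level keys are compared; changed_keys is a hypothetical helper.

def changed_keys(original, retrofitted):
    """Return original keys that are missing from, or differ in, the retrofitted document."""
    return sorted(
        key
        for key, value in original.items()
        if key not in retrofitted or retrofitted[key] != value
    )


original = {"title": "A|Z ITU Journal of the Faculty of Architecture", "kb:oclcnum": "793811170"}
retrofitted = {
    "@context": {"schema": "http://schema.org/", "title": "schema:name"},
    "@type": "schema:Periodical",
    **original,
    "schema:sameAs": {"@id": "http://worldcat.org/oclc/793811170"},
}
print(changed_keys(original, retrofitted))  # [] means nothing was removed or altered
```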
You'll notice that I've kept all the existing properties from the original JSON. This ensures that the data remains backward compatible. If I were creating a new API, or were going to make this serialization available under a separate MIME type, I would remove the duplicate elements.
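If the linked data version were offered under its own MIME type, the server could choose the representation from the request's Accept header. The sketch below uses application/ld+json, the registered JSON-LD media type. The set of legacy keys treated as duplicates is an illustrative assumption based on the example above, and the Accept parsing is simplified (it ignores q-values).

```python
# Sketch: content negotiation between plain JSON and JSON-LD.
# ASSUMPTIONS: LEGACY_DUPLICATES is illustrative, and q-values in the Accept
# header are ignored for simplicity.
JSON_LD = "application/ld+json"

LEGACY_DUPLICATES = {
    "kb:collection_name",  # repeated as the title of kb:collection
    "kb:provider_name",    # repeated as the title of kb:provider
    "kb:oclcnum",          # repeated as schema:sameAs
}


def choose_representation(accept_header, doc):
    """Return (media type, body) based on a simplified reading of the Accept header."""
    requested = {part.split(";")[0].strip().lower() for part in accept_header.split(",")}
    if JSON_LD in requested:
        body = {key: value for key, value in doc.items() if key not in LEGACY_DUPLICATES}
        return JSON_LD, body
    return "application/json", doc  # existing clients get the unchanged document


doc = {
    "title": "A|Z ITU Journal of the Faculty of Architecture",
    "kb:oclcnum": "793811170",
    "schema:sameAs": {"@id": "http://worldcat.org/oclc/793811170"},
}
print(choose_representation("application/ld+json", doc))
```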
You can take this JSON for a spin by putting it into the JSON-LD Playground or the RDF Translator. This will show you what the graph version of this looks like.
Resources
As you can see, this is a fairly reasonable way to incorporate linked data into an existing API without breaking existing clients or completely changing the technological infrastructure. However, it does require a working knowledge of how JSON-LD works. I'd suggest the following resources:
JSON-LD Specification - This has details about how JSON-LD should be written. It has a couple of nice examples of using context with a more object-oriented JSON document.
JSON-LD Playground - This is a super useful place to try out your JSON-LD document to make sure the semantics are correct and that your context behaves as you intend. Its N-Triples-style view of the graph is especially helpful for this.
Indexing Linked Bibliographic Data with JSON-LD, BibJSON and Elasticsearch - This article from the Code4Lib Journal isn't directly about retrofitting JSON into JSON-LD, but it has a good example of how @context works.
-
Karen Coombs
Senior Product Analyst