Embedded Linked Data in Webpages
Our last month of posts on linked data focused on how to query existing sets of linked data using SPARQL. This month, we're shifting focus from querying to producing data. One of the simplest ways to create linked data is to add it to existing webpages. This strategy makes the data readily available to both search engines crawling the Web for structured data and other linked data consumers.
General Principles for Creating Structured Data
If you are encoding data that doesn't already exist in graph form, creating linked data requires several key steps. First, decide who the audience or audiences for the data will be. This decision is crucial because it shapes the later decisions about encoding the data. Second, determine what entities exist within the data. Third, decide what statements need to be made about the entities and their relationships. Fourth, choose the vocabulary or vocabularies for those statements: you can use existing ones or create your own. Currently, one of the most widely adopted ontologies is Schema.org. The Schema.org site provides information about the ontology as well as examples marked up in microdata, RDFa and JSON-LD. After the ontologies are chosen, the next step is to decide which serialization(s) to publish. The last step is to properly encode the semantics in the chosen serialization(s). Two serializations significant to adding linked data to existing webpages are RDFa and JSON-LD.
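To make the "entities" and "statements" steps concrete, here is a minimal sketch in Python that models a few statements about the book used later in this post as plain subject-predicate-object tuples. The helper names (`entities`, `statements_about`) are hypothetical and only for illustration; a real project would more likely use an RDF library.

```python
# Illustrative sketch only: statements modeled as (subject, predicate, object) tuples.
# The helper names below are hypothetical, not part of any library.
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

book = "http://www.worldcat.org/oclc/7977212"
author = "http://viaf.org/viaf/56609282"

# Step two: the entities are the book and its author.
# Step three: the statements made about them, using Schema.org terms.
triples = [
    (book, RDF_TYPE, SCHEMA + "Book"),
    (book, SCHEMA + "name", "Old Possum's book of practical cats"),
    (book, SCHEMA + "author", author),
    (author, RDF_TYPE, SCHEMA + "Person"),
    (author, SCHEMA + "name", "Thomas Stearns Eliot, 1888-1965"),
]


def entities(statements):
    """Return the distinct subjects, i.e. the entities being described."""
    return sorted({subject for subject, _, _ in statements})


def statements_about(statements, subject):
    """Return the (predicate, object) pairs stated about one entity."""
    return [(p, o) for s, p, o in statements if s == subject]


for entity in entities(triples):
    print(entity)
    for predicate, obj in statements_about(triples, entity):
        print("   ", predicate, "->", obj)
```

This simplified model does not distinguish URIs from literal values; each serialization discussed below has its own way of making that distinction.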
RDFa
RDFa is a serialization that can be used to add structured linked data to existing webpages. It uses a set of attributes to add the markup to existing HTML elements. Some key attributes in RDFa are as follows:
- Prefix – used for the prefixes you’ll be using for the ontologies within the markup
- Resource – used for the URIs for entities within the markup
- Property – used for the names of the predicates being used
- Typeof – used for the rdf:type or rdf:types of the entity
While RDF libraries can output RDFa, this output will not match the look and feel of your UI. To embed the RDFa in your own UI, you'll have to mark up the HTML yourself. Let's look at a simple example for a book.
Example
<body prefix='schema: http://schema.org/'>
  <div resource='http://www.worldcat.org/oclc/7977212' typeof='http://schema.org/Book http://schema.org/CreativeWork'>
    <h1 property='schema:name'>Old Possum's book of practical cats</h1>
    <div property='schema:author' resource='http://viaf.org/viaf/56609282' typeof='http://schema.org/Person'>
      <h2 property='schema:name'>Thomas Stearns Eliot, 1888-1965</h2>
    </div>
    <h3>Contributors</h3>
    <ul>
      <li property='schema:contributor' resource='http://viaf.org/viaf/105372100' typeof='http://schema.org/Person'>
        <span property='schema:name'>Edward Gorey, 1925-2000</span>
      </li>
    </ul>
    <p property='schema:datePublished'>1982</p>
  </div>
</body>
On the surface, this might seem like a time-consuming but trivial task. However, producing good RDFa requires that the semantics expressed by the markup be correct. If you are creating your HTML from an existing graph, you can run the RDFa through a tool that converts it to another serialization in which the semantics are easier to read. I prefer Turtle for this and often use the “Converter” page of the EasyRDF PHP library.
Additionally, you can test your RDFa output using the Google Structured Data Testing Tool. You can provide a URL to a public webpage or copy and paste HTML into the tester. This will give you some idea of how Google and other search engines will “see” and extract your data.
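For a quick local sanity check before reaching for those tools, a rough sketch like the following lists which RDFa attributes appear on which elements. It uses only Python's standard `html.parser` module; the class and function names are hypothetical, and this is not a conformant RDFa processor (it does not resolve prefixes or produce triples).

```python
from html.parser import HTMLParser

# The RDFa attributes discussed in this post.
RDFA_ATTRS = ("prefix", "resource", "typeof", "property")


class RdfaAttributeLister(HTMLParser):
    """Collects RDFa attributes per element. Hypothetical helper, not an RDFa parser."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Keep only the RDFa-related attributes of this element.
        rdfa = {name: value for name, value in attrs if name in RDFA_ATTRS}
        if rdfa:
            self.found.append((tag, rdfa))


def list_rdfa_attributes(html):
    """Return a list of (tag, {attribute: value}) for elements carrying RDFa."""
    lister = RdfaAttributeLister()
    lister.feed(html)
    lister.close()
    return lister.found


sample = (
    "<div resource='http://www.worldcat.org/oclc/7977212' typeof='schema:Book'>"
    "<h1 property='schema:name'>Old Possum's book of practical cats</h1>"
    "</div>"
)

for tag, rdfa in list_rdfa_attributes(sample):
    print(tag, rdfa)
```

A listing like this only shows that the attributes are present where you expect them; checking that they add up to the right statements still calls for a real RDFa-aware tool.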
Embedded JSON-LD
Google's documentation indicates that it will read data embedded in webpages via microdata, microformats, RDFa or JSON-LD. This means embedded JSON-LD is a viable alternative to RDFa for search engines capable of harvesting it. Embedded JSON-LD uses a script tag to hold all the structured data serialized as JSON-LD. Let’s look at the same sample book information serialized as JSON-LD.
Example
<script type="application/ld+json">
{
  "@context": {
    "schema": "http://schema.org/"
  },
  "@graph": [
    {
      "@id": "http://www.worldcat.org/oclc/7977212",
      "@type": [
        "schema:Book",
        "schema:CreativeWork"
      ],
      "schema:bookFormat": {
        "@id": "bgn:PrintBook"
      },
      "schema:contributor": [
        {
          "@id": "http://viaf.org/viaf/105372100",
          "@type": "schema:Person",
          "schema:birthDate": "1925",
          "schema:deathDate": "2000",
          "schema:familyName": "Gorey",
          "schema:givenName": "Edward",
          "schema:name": "Edward Gorey"
        }
      ],
      "schema:creator": {
        "@id": "http://viaf.org/viaf/56609282",
        "@type": "schema:Person",
        "schema:birthDate": "1888",
        "schema:deathDate": "1965",
        "schema:familyName": "Eliot",
        "schema:givenName": [
          "T. S.",
          "Thomas Stearns"
        ],
        "schema:name": "Thomas Stearns Eliot"
      },
      "schema:datePublished": "1982"
    }
  ]
}
</script>
Embedded JSON-LD has several advantages. If there is an existing graph, serializing JSON-LD using a linked data library and embedding it is very simple. This serialization also has the potential to keep the HTML a little leaner and cleaner. Lastly, it can be easier for a human to read and verify the semantics in JSON-LD than in other linked data serializations such as RDFa or RDF/XML.
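As a minimal sketch of the embedding step, assuming the JSON-LD is already available as a Python dictionary (for instance, produced by a linked data library), the following wraps it in a script element using the registered JSON-LD media type, `application/ld+json`. The function name `jsonld_script_tag` is hypothetical.

```python
import json


def jsonld_script_tag(data):
    """Serialize a JSON-LD document (a dict) into an HTML script element.

    Hypothetical helper for illustration; assumes `data` is JSON-serializable.
    """
    body = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so that no string value can close the script element early.
    # "\/" is a valid JSON escape for "/", so the data itself is unchanged.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


book = {
    "@context": {"schema": "http://schema.org/"},
    "@id": "http://www.worldcat.org/oclc/7977212",
    "@type": "schema:Book",
    "schema:name": "Old Possum's book of practical cats",
    "schema:datePublished": "1982",
}

print(jsonld_script_tag(book))
```

The resulting string can be dropped into a page template as-is, which is what keeps the visible HTML free of extra attributes.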
There are downsides to embedded JSON-LD, though. While Google supports JSON-LD, the level of support it has within other harvesters is unclear. Additionally, linked data parsers that retrieve HTML are not currently capable of extracting the embedded JSON-LD. These parsers expect RDFa in documents that are returned with the HTML MIME types.
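If you control the consumer, pulling the embedded JSON-LD out of a page yourself is not much work. The following is a rough sketch using only Python's standard library; the names `JsonLdExtractor` and `extract_jsonld` are hypothetical, and it assumes the script element uses the `application/ld+json` media type and contains valid JSON.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of JSON-LD script elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        # Start collecting when a JSON-LD script element opens.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        # Parse the collected text once the script element closes.
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return a list of parsed JSON-LD documents found in the HTML string."""
    extractor = JsonLdExtractor()
    extractor.feed(html)
    extractor.close()
    return extractor.documents


page = (
    "<html><head>"
    '<script type="application/ld+json">'
    '{"@id": "http://www.worldcat.org/oclc/7977212", "schema:datePublished": "1982"}'
    "</script>"
    "</head><body><p>A page about a book.</p></body></html>"
)

for document in extract_jsonld(page):
    print(document["@id"])
```

The parsed dictionaries could then be handed to a JSON-LD library for expansion or conversion to another serialization.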
JSON-LD via Linked Script
In addition to reading JSON-LD embedded in a script tag, Google will also read JSON-LD linked to in a script tag. Depending on the application, this may be a more efficient and effective way to embed the linked data within the page. Fundamentally, this is the same serialization. However, the technique for adding it is slightly different. All that is necessary is a single script tag that links to the JSON-LD URL.
Example
<script type="application/ld+json" src="http://www.worldcat.org/oclc/7977212.jsonld"></script>
Link Tags
Another way to tell consumers of your page that linked data is available is to use HTML link elements that point to alternate versions of the page. Listing every available serialization in link tags lets clients see which ones they can request.
Example
<link rel="alternate" type="application/rdf+xml" href="http://www.worldcat.org/oclc/7977212.rdf" />
<link rel="alternate" type="application/ld+json" href="http://www.worldcat.org/oclc/7977212.jsonld" />
<link rel="alternate" type="text/turtle" href="http://www.worldcat.org/oclc/7977212.ttl" />
<link rel="alternate" type="text/plain" href="http://www.worldcat.org/oclc/7977212.nt" />
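If your pages are generated from templates, link tags like these can be produced from a small table of serializations. The sketch below assumes, as in the example above, that each serialization lives at the resource URL plus a file extension; the mapping and the function name `alternate_link_tags` are hypothetical.

```python
from html import escape

# Assumed mapping of file extension to media type, mirroring the example above.
SERIALIZATIONS = {
    "rdf": "application/rdf+xml",
    "jsonld": "application/ld+json",
    "ttl": "text/turtle",
    "nt": "text/plain",
}


def alternate_link_tags(resource_url, serializations=SERIALIZATIONS):
    """Build one <link rel="alternate"> tag per available serialization."""
    tags = []
    for extension, media_type in serializations.items():
        # Escape attribute values so unusual URLs cannot break the markup.
        href = escape(f"{resource_url}.{extension}", quote=True)
        type_attr = escape(media_type, quote=True)
        tags.append(f'<link rel="alternate" type="{type_attr}" href="{href}" />')
    return tags


for tag in alternate_link_tags("http://www.worldcat.org/oclc/7977212"):
    print(tag)
```

Keeping the list in one place makes it less likely that a page advertises a serialization the server does not actually provide.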
All of these techniques can be used to make your data readily accessible to search engines and other linked data consumers without massive infrastructure changes. In our next two posts, we’ll dive a little deeper into the JSON-LD serialization and how it can be used to retrofit an existing JSON API to be valid linked data.
Karen Coombs
Senior Product Analyst