Calling things by their true name
What’s cool about JSON-LD is that it takes your API and makes it interoperable with RDF. And what’s cool about RDF — if you’ll accept that there’s anything cool about RDF — is that it can assign everything a name, and that name is meaningful and globally unique.
Naming things is one of the traditional “hard problems of computer science”, so this actually matters. And the way RDF names things should be immediately understandable to every developer: names are URLs.
Following the fantasy trope, when you know the true name of something, you have power over it.
Having the URL for a term in RDF tells you whether it’s the same as something you already know about. Computationally, you know more about what “JSON” is if you know it’s the same as https://www.wikidata.org/wiki/Q2063 or http://dbpedia.org/resource/JSON.
And if you have the URL for something that you don’t already know about, you can usually go to that URL and find more information. For example, that’s how you’d confirm that Wikidata’s “Q2063” and DBpedia’s “JSON” are the same thing as each other. That’s what makes all of this information “Linked Data”, not just data.
When you say “URL”, you must actually mean “IRI”.
It’s good to talk to you again, Imaginary Interlocutor, but do you have to be such a web-standards pedant? Nobody knows what an IRI is. I’m going to keep calling these URLs, especially because I really do intend every one of them that I produce to locate a resource.
The names in ConceptNet may look like ad-hoc identifiers, for example "/c/en/knowledge_graph" and "cc:by-sa/4.0". The property names, such as "dataset", look pretty ad-hoc too. But these are just short nicknames, and via JSON-LD, we can find the true names of all of these:
The way to turn the strings in the API response into these true names is to use ConceptNet’s JSON-LD context. Don’t get too bogged down in it right now. One thing it provides is a set of prefixes that let us use shorter names for things. Here’s the prefix that lets "cc:by-sa/4.0" point to the Creative Commons URL above:
"cc": "http://creativecommons.org/licenses/",
It also has a base URL for interpreting relative URLs such as /c/en/knowledge_graph. The base URL happens to be the URL of the context itself, because why not:
"@base": "http://api.conceptnet.io/ld/conceptnet5.6/context.ld.json",
Some of the property names are things that we define. This line says that “weight” is a property that’s defined in ConceptNet’s context (cn: for short), and its value is a floating-point number:
"weight": {"@id": "cn:weight", "@type": "xsd:float"},
Some of the properties are already meaningfully defined elsewhere. For example, we can have “comment” fields in API responses, whose values are strings to be read by the API user. This notion of a comment already exists in RDF Schema:
"comment": {"@id": "rdfs:comment", "@type": "xsd:string"},
With this line, we specify that when we say “comment”, we mean “rdfs:comment”, which, when you expand the prefix, means “http://www.w3.org/2000/01/rdf-schema#comment”.
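To make the walkthrough above a bit more concrete, here is a rough Python sketch of the three things the context does to a response: expanding prefixed names, resolving relative URLs against @base, and mapping short property names to full ones with a datatype. It assumes a tiny hand-written context that copies the lines quoted above; the cn namespace URL is a made-up placeholder, and the function names are hypothetical. A real JSON-LD processor follows the much more thorough JSON-LD expansion algorithm (handling @vocab, nested contexts, wrapping every value in an array, and so on), so treat this as an illustration, not an implementation.

```python
from urllib.parse import urljoin

# A minimal sketch of what a JSON-LD context does to an API response.
# NOT a full JSON-LD processor. "@base", "cc", "rdfs", and the two term
# definitions copy the lines quoted in the text; "xsd" is the standard
# XML Schema namespace; the "cn" URL is a HYPOTHETICAL placeholder.
CONTEXT = {
    "@base": "http://api.conceptnet.io/ld/conceptnet5.6/context.ld.json",
    "cc": "http://creativecommons.org/licenses/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "cn": "http://example.org/conceptnet-context#",  # hypothetical
    "weight": {"@id": "cn:weight", "@type": "xsd:float"},
    "comment": {"@id": "rdfs:comment", "@type": "xsd:string"},
}


def expand_iri(value: str, context: dict = CONTEXT) -> str:
    """Expand a compact IRI like 'cc:by-sa/4.0', or resolve a relative URL against @base."""
    prefix, sep, suffix = value.partition(":")
    # Only plain-string entries are prefixes (dict entries are term definitions,
    # '@' keys are keywords). JSON-LD treats a suffix starting with '//' as an
    # absolute IRI rather than a compact one.
    is_prefix = (
        sep
        and not prefix.startswith("@")
        and isinstance(context.get(prefix), str)
        and not suffix.startswith("//")
    )
    if is_prefix:
        return context[prefix] + suffix
    # Everything else is resolved against @base; absolute URLs pass through unchanged.
    return urljoin(context["@base"], value)


def expand_response(obj: dict, context: dict = CONTEXT) -> dict:
    """Replace each defined short property name with its full IRI and tag the value's datatype."""
    expanded = {}
    for key, value in obj.items():
        term = context.get(key)
        if not isinstance(term, dict):
            continue  # properties with no term definition are dropped in this sketch
        expanded[expand_iri(term["@id"], context)] = {
            "@value": value,
            "@type": expand_iri(term["@type"], context),
        }
    return expanded


print(expand_iri("cc:by-sa/4.0"))
# → http://creativecommons.org/licenses/by-sa/4.0
print(expand_iri("/c/en/knowledge_graph"))
# → http://api.conceptnet.io/c/en/knowledge_graph
print(expand_response({"comment": "An example", "weight": 1.0}))
```

Resolving against @base works out neatly here because /c/en/knowledge_graph starts with a slash: only the scheme and host of the context URL survive, so the context file’s own path doesn’t matter.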
Let’s take a step back. What do you do with this kind of information?
I think the most likely user who cares about the linked data in ConceptNet is someone who’s building something larger out of ConceptNet and other resources. This matches my experience in building ConceptNet, where the inputs that are available in RDF are the ones I can be confident I’m handling correctly, even if they’re updated in the future.
Let’s talk about how things used to be with WordNet. If I want to refer to a particular item in WordNet, such as the synset {example, instance, illustration, representative}, there are a number of ways I could describe it, and most of them probably wouldn’t be consistent with anything else. I could give you synset names that you can look up, such as example.n.01 or illustration.n.03. These numbers might change with new versions of WordNet, and there’s no way to inherently know that they refer to the same thing.
I could also give you an internal ID such as 05828980-n, which at least is a single name for the synset, but all of these IDs would change with new releases of WordNet.
And this really got better because of RDF?
Yep. When using multiple data sources that are based on WordNet, you used to need a table that tells you which IDs are the same as which other IDs — basically a kind of Rosetta stone lining up names and numbers from different versions of WordNet. Hopefully some researcher somewhere has made the table you need.
But the fact that WordNet is in RDF now means that I know the global, true name that I can call this WordNet entry: http://wordnet-rdf.princeton.edu/id/05828980-n. I don’t need a Rosetta stone to know what this URL refers to. I can even go to that URL to find out more about it.
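In code, the difference is small but real. Here is a sketch assuming a hypothetical helper, synset_url, that wraps a WordNet internal ID in the wordnet-rdf URL pattern shown above. The ID format check is inferred from that single example, not from any WordNet specification. Once two sources both name the synset by this URL, deciding whether they mean the same thing is a plain string comparison rather than a lookup in someone’s Rosetta stone table.

```python
import re

# Assumption: the ID format (8 digits, a hyphen, one lowercase letter for
# part of speech) is inferred from the single example "05828980-n".
SYNSET_ID = re.compile(r"\d{8}-[a-z]")
WORDNET_RDF_BASE = "http://wordnet-rdf.princeton.edu/id/"


def synset_url(internal_id: str) -> str:
    """Hypothetical helper: wrap a WordNet internal ID in its wordnet-rdf URL."""
    if not SYNSET_ID.fullmatch(internal_id):
        raise ValueError(f"unexpected synset ID: {internal_id!r}")
    return WORDNET_RDF_BASE + internal_id


# Two independent sources that both name the synset by its URL can check
# identity with string equality; no crosswalk table is involved.
ours = synset_url("05828980-n")
theirs = "http://wordnet-rdf.princeton.edu/id/05828980-n"
print(ours == theirs)  # True
```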
But that’s just the same internal ID shoved into a URL. How does that make a difference?
Putting it into a URL means that it’s more than just an internal ID now. Regardless of where the ID number originally came from, the URL is an implicit promise that it consistently refers to the synset {example, instance, illustration, representative}.
And, importantly, it suggests that if you’re building something on top of WordNet, you should use the same URL to identify the same synset. These wordnet-rdf URLs are also used by the Open Multilingual WordNet project, so you can tell when terms in different languages are intended to refer to the same thing, and you can align the data OMW provides with WordNet data you get from other sources.
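As a rough illustration of that alignment, the sketch below joins two sources on their shared wordnet-rdf URLs. Both dictionaries are hypothetical sample data: the English lemmas come from the synset discussed above, and the second source is a made-up stand-in for a wordnet in another language, not real OMW data. The function name align is also just for illustration.

```python
# HYPOTHETICAL sample data: two sources keyed by wordnet-rdf URL. The English
# lemmas come from the synset discussed in the text; the second source is a
# made-up stand-in for another language's wordnet, not real OMW data.
SYNSET = "http://wordnet-rdf.princeton.edu/id/05828980-n"

english = {SYNSET: ["example", "instance", "illustration", "representative"]}
other_language = {SYNSET: ["ejemplo"]}


def align(*sources: dict) -> dict:
    """For every URL present in all sources, collect each source's lemmas."""
    shared = set(sources[0]).intersection(*sources[1:])
    return {url: [source[url] for source in sources] for url in sorted(shared)}


print(align(english, other_language))
```

Nothing in align needs to know which WordNet release either source was built from; the shared URL is the whole join key.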