Monday, March 14, 2005

ST 2005 Note #1

A number of organisations have been discussing how they are using RDF as their data interchange format. They describe their use of RDF and present examples of RDF encoded in XML.

Representing data in XML often boils down to how you use XML to represent a directed graph whose arcs are labeled and carry meaning. This in turn raises the value/identity issue. For example:

<movements>
<movement>
<grade><name>JET</name></grade>
</movement>
<movement>
<grade><name>JET</name></grade>
</movement>
</movements>

In this example we assume that the message has to be self-contained, i.e. no external references. The example includes the grade JET, which is identified by value. There are a number of things we don't know. What is the relationship between a movement and the grade? Does a movement carry the grade? If a movement is deleted, is the grade deleted as well? Addressing the first issue involves making the relationship a first-class element. Whilst doing that, let's ensure there is only ever one JET grade defined in the document.

<movements>
<movement>
<carries>
<grade id='1'><name>JET</name></grade>
</carries>
</movement>
<movement>
<carries href='#1'>
<grade><name>JET</name></grade>
</carries>
</movement>
</movements>

In this example the second carries element back-references the first one. The implication is that the JET grade has an identity which is important. In complex schemas this type of approach is often used, though the reference and the referenced element may be in different locations. Of course, there are issues around containment (i.e. when I delete the first movement element, does that delete the JET grade or not?). However, let's ignore that for the time being.

Note that an approach to representing relationships has been constructed 'on the fly'. It's not standard and the semantics aren't clear.
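
To make the "on the fly" point concrete, here is a minimal Python sketch of what a consumer of that document would have to write. It assumes that href='#1' means "the element whose id is 1", which is exactly the kind of private convention the XML itself does not define. The function name resolve_grades is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The ad hoc document from above, with closing tags written out.
DOC = """
<movements>
  <movement>
    <carries>
      <grade id='1'><name>JET</name></grade>
    </carries>
  </movement>
  <movement>
    <carries href='#1'>
      <grade><name>JET</name></grade>
    </carries>
  </movement>
</movements>
"""

def resolve_grades(root):
    """Return the grade element each movement carries, following href='#id'."""
    # Index every element that declares an id so references can be looked up.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    grades = []
    for movement in root.findall("movement"):
        carries = movement.find("carries")
        href = carries.get("href")
        if href is not None:
            # Assumed convention: '#1' points at the element with id='1'.
            grades.append(by_id[href.lstrip("#")])
        else:
            grades.append(carries.find("grade"))
    return grades

root = ET.fromstring(DOC)
grades = resolve_grades(root)
names = [g.findtext("name") for g in grades]
# Both movements now resolve to the very same grade element (identity),
# not merely to two elements that happen to hold the same value.
same_identity = grades[0] is grades[1]
print(names, same_identity)
```

Every consumer of the format has to agree on this lookup rule separately, because nothing in the document says what href means.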

We could express each movement as a set of triples (subject, predicate, object) thus:

movement (some id) carries grade (some id)

Or, with a liberal addition of URIs for identifiers:

http://www.newco.com/movements/1234 http://www.newco.com/predicates/carries
http://www.newco.com/grades/5678.
http://www.newco.com/grades/5678 http://www.newco.com/predicates/name JET.
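
As a rough illustration, the two triples above can be held as plain tuples and the graph walked by matching on subject and predicate. This is only a sketch: the Triple class and the objects helper are invented here, and no distinction is made between a URI and a literal such as JET, which a real RDF library would track.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str

BASE = "http://www.newco.com"

# The two statements from the text, one tuple per labeled arc.
triples = {
    Triple(f"{BASE}/movements/1234", f"{BASE}/predicates/carries", f"{BASE}/grades/5678"),
    Triple(f"{BASE}/grades/5678", f"{BASE}/predicates/name", "JET"),
}

def objects(graph, subject, predicate):
    """Return every object linked from subject by predicate."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}

# Follow the arcs: movement -> grade -> name.
(grade,) = objects(triples, f"{BASE}/movements/1234", f"{BASE}/predicates/carries")
grade_names = objects(triples, grade, f"{BASE}/predicates/name")
print(grade, grade_names)
```

Because the grade is named by a URI rather than by its value, any number of movements can point at it without repeating it.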

This URI-based triples model is what RDF uses. Moving to XML, or rather an XML representation of the triples, we'll assume the following preamble:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [<!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:terms="http://www.newco.com/terms/"
xmlns:grades="http://www.newco.com/grades/"
xmlns:types="http://www.newco.com/types/">

In RDF we could describe the movement and grade thus:

<rdf:Description rdf:about="http://www.newco.com/movements/1234">
<rdf:type rdf:resource="http://www.newco.com/types/Movement"/>
<terms:carries rdf:resource="http://www.newco.com/grades/5678"/>
</rdf:Description>

<rdf:Description rdf:about="http://www.newco.com/grades/5678">
<rdf:type rdf:resource="http://www.newco.com/types/Grade"/>
<terms:name>JET</terms:name>
</rdf:Description>

Note that we added rdf:type, so we know what type of thing we are dealing with. For brevity, we can remove the rdf:Description and rdf:type verbosity by using the type as an element name:

<types:Movement rdf:about="http://www.newco.com/movements/1234">
<terms:carries rdf:resource="http://www.newco.com/grades/5678"/>
</types:Movement>

<types:Grade rdf:about="http://www.newco.com/grades/5678">
<terms:name>JET</terms:name>
</types:Grade>
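
The claim that the two forms say the same thing can be checked with a short Python sketch. It uses only the standard library and handles just the subset of RDF/XML shown here (rdf:about, rdf:resource, literal text and typed node elements); it is not a general RDF/XML parser, and the helper names uri and extract_triples are made up for illustration. The empty property elements are written self-closed so the XML is well-formed.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
HEADER = (
    f'<rdf:RDF xmlns:rdf="{RDF}" '
    'xmlns:terms="http://www.newco.com/terms/" '
    'xmlns:types="http://www.newco.com/types/">'
)

LONG_FORM = HEADER + """
<rdf:Description rdf:about="http://www.newco.com/movements/1234">
  <rdf:type rdf:resource="http://www.newco.com/types/Movement"/>
  <terms:carries rdf:resource="http://www.newco.com/grades/5678"/>
</rdf:Description>
<rdf:Description rdf:about="http://www.newco.com/grades/5678">
  <rdf:type rdf:resource="http://www.newco.com/types/Grade"/>
  <terms:name>JET</terms:name>
</rdf:Description>
</rdf:RDF>"""

SHORT_FORM = HEADER + """
<types:Movement rdf:about="http://www.newco.com/movements/1234">
  <terms:carries rdf:resource="http://www.newco.com/grades/5678"/>
</types:Movement>
<types:Grade rdf:about="http://www.newco.com/grades/5678">
  <terms:name>JET</terms:name>
</types:Grade>
</rdf:RDF>"""

def uri(tag):
    """Turn ElementTree's '{namespace}local' tag into a plain URI."""
    return tag[1:].replace("}", "", 1)

def extract_triples(xml_text):
    """Collect (subject, predicate, object) for the subset of RDF/XML used here."""
    triples = set()
    for node in ET.fromstring(xml_text):
        subject = node.get(f"{{{RDF}}}about")
        # A node element other than rdf:Description implies an rdf:type triple.
        if node.tag != f"{{{RDF}}}Description":
            triples.add((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            # Either a reference to another resource or a literal text value.
            obj = prop.get(f"{{{RDF}}}resource", prop.text)
            triples.add((subject, uri(prop.tag), obj))
    return triples

long_triples = extract_triples(LONG_FORM)
short_triples = extract_triples(SHORT_FORM)
print(long_triples == short_triples, len(short_triples))
```

Under these assumptions both documents reduce to the same four triples, so the shorter syntax is purely a convenience.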

What have we gained over the initial XML?

  • We have a formal model for representing information about an entity (triples) which we do not have to invent.
  • We have a well defined mapping from this model to XML and we didn't have to invent it.
  • We haven't had to invent a mechanism for handling the fact that our grade/movement model isn't hierarchical; the types are peers.
  • We have not ended up in a world of xsi:type pain.
  • We have RDF tool support if we need it.
  • If we need collections with defined semantics, RDF supplies them; we do not have to invent them.

Clearly RDF is a big topic, and there is also RDF-Schema and then OWL to layer some rules on top of structure. More on this later.


[1] The ST 2005 Notes are consolidated notes from the 2005 Semantic Technology conference.