Tuesday, March 15, 2005

ST 2005 Note #4 - The Semantic Conceptual Stack

The Stack


The conceptual stack in the semantic technology arena is composed of the following key elements:

Layer       | Description
Syntax      | The underlying representation of the structure. XML provides this foundation piece.
Vocabulary  | A collection of terms and their definitions.
Taxonomy    | A collection of terms organised into a classification scheme.
Ontology    | A specification of a conceptualisation[1]. More practically: data, structure, meaning and rules.

As an organisation moves up the stack, the level of detail and coverage of the semantic content of the business grows. This in turn means that the utility of the models grows. However, the effort required to define and manage the models increases rapidly.

Decomposing the stack


Each layer in the stack can be decomposed into high-level pieces. One such decomposition is shown below:

This figure shows the constituents of each layer related to semantics.

Vocabulary


The first rung on the ladder, once a common syntax has been defined, is the vocabulary. At its simplest this contains terms (e.g. Currency) and a definition (e.g. medium of exchange, monetary system). This can be elaborated in a number of ways. Firstly, the terms can be mapped to an existing lexical database such as WordNet. In this manner the definition of the term Currency could simply be the WordNet entry at http://wordnet.princeton.edu/cgi-bin/webwn2.0?stage=2&word=currency&posnumber=1&searchtypenumber=2&senses=1&showglosses=1 . Secondly, the term could be defined in relation to surrounding terms. In particular, guidance on how to decide whether or not an item is a Currency is very useful. The medical profession uses this technique (rule-in/rule-out) to determine whether symptoms rule a specific disease in or out. Finally, a term may be associated with a canonical name and a short name (for use in database schemas etc.).
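To make the elaborations above concrete, here is a minimal sketch of a single vocabulary entry in Python. It is an illustration, not a prescribed format: the field names, the idea of representing rule-in/rule-out guidance as simple predicate functions, and the choice that a rule-out test overrides a rule-in test are all assumptions made for this example. The short name "CCY" and the two rules are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A candidate item is described by a dictionary of attributes (assumption).
Item = Dict[str, object]


@dataclass
class VocabularyTerm:
    """One vocabulary entry: a term, its definition and optional elaborations."""
    term: str
    definition: str
    lexical_reference: Optional[str] = None  # e.g. a WordNet sense URL
    canonical_name: Optional[str] = None
    short_name: Optional[str] = None         # for database schemas etc.
    rule_in: List[Callable[[Item], bool]] = field(default_factory=list)
    rule_out: List[Callable[[Item], bool]] = field(default_factory=list)

    def classify(self, item: Item) -> Optional[bool]:
        """False if any rule-out test fires, True if any rule-in test fires,
        None if neither applies (the guidance is inconclusive)."""
        if any(test(item) for test in self.rule_out):
            return False
        if any(test(item) for test in self.rule_in):
            return True
        return None


currency = VocabularyTerm(
    term="Currency",
    definition="medium of exchange, monetary system",
    canonical_name="Currency",
    short_name="CCY",  # hypothetical short name
    # Hypothetical guidance, for illustration only.
    rule_in=[lambda item: item.get("accepted_as_payment") is True],
    rule_out=[lambda item: item.get("is_commodity") is True],
)

print(currency.classify({"accepted_as_payment": True}))                        # True
print(currency.classify({"accepted_as_payment": True, "is_commodity": True}))  # False
print(currency.classify({}))                                                   # None
```

Returning None rather than False for the undecided case keeps "not ruled in" distinct from "ruled out", which mirrors how rule-in/rule-out guidance is normally read.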

Taxonomy


The second rung, a taxonomy, presents terms within a classification framework. For example, Currency could be classified both as a unit of measure and as something countable.
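A classification scheme like this can be sketched as a mapping from each term to its broader classifications, where a term may sit under several at once. The sketch below assumes that structure; the "Measure" category above UnitOfMeasure is hypothetical and only there to show classifications being followed transitively.

```python
from typing import Dict, Set

# Hypothetical classification scheme: term or category -> direct classifications.
TAXONOMY: Dict[str, Set[str]] = {
    "Currency": {"UnitOfMeasure", "Countable"},
    "UnitOfMeasure": {"Measure"},
    "Countable": set(),
    "Measure": set(),
}


def classifications(term: str, taxonomy: Dict[str, Set[str]]) -> Set[str]:
    """Every classification of `term`, following broader links transitively."""
    seen: Set[str] = set()
    stack = list(taxonomy.get(term, set()))
    while stack:
        category = stack.pop()
        if category not in seen:
            seen.add(category)
            stack.extend(taxonomy.get(category, set()))
    return seen


print(sorted(classifications("Currency", TAXONOMY)))
# ['Countable', 'Measure', 'UnitOfMeasure']
```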

Ontology


The third rung, an ontology, takes terms and defines the data associated with them, the relationships between terms, and the constraints/rules that govern how terms and relationships can be combined and what their lifecycles are. An ontology can be viewed either as a data model with an associated constraint language or as a collection of assertions of the form [subject, predicate, object].
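The assertion view can be sketched in a few lines of Python: facts are held as (subject, predicate, object) tuples, and constraints are functions that report violations. This is a toy under stated assumptions, not an RDF/OWL implementation; the Trade/settledIn data and both rules ("exactly one settlement currency" and "settlement must be in a Currency") are hypothetical.

```python
from collections import Counter
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical assertions in [subject, predicate, object] form.
triples: Set[Triple] = {
    ("Trade1", "type", "Trade"),
    ("Trade1", "settledIn", "GBP"),
    ("Trade2", "type", "Trade"),
    ("Trade2", "settledIn", "USD"),
    ("Trade2", "settledIn", "EUR"),
    ("GBP", "type", "Currency"),
    ("USD", "type", "Currency"),
    ("EUR", "type", "Currency"),
}


def instances_of(cls: str, data: Set[Triple]) -> Set[str]:
    """Subjects asserted to be of type `cls`."""
    return {s for (s, p, o) in data if p == "type" and o == cls}


def check_exactly_one(cls: str, predicate: str, data: Set[Triple]) -> List[str]:
    """Cardinality rule: each instance of `cls` has exactly one `predicate` value.
    Returns the instances that break the rule."""
    counts = Counter(s for (s, p, _) in data if p == predicate)
    return sorted(i for i in instances_of(cls, data) if counts[i] != 1)


def check_range(predicate: str, cls: str, data: Set[Triple]) -> List[Tuple[str, str]]:
    """Range rule: every object of `predicate` must be an instance of `cls`.
    Returns the (subject, object) pairs that break the rule."""
    allowed = instances_of(cls, data)
    return sorted((s, o) for (s, p, o) in data if p == predicate and o not in allowed)


print(check_exactly_one("Trade", "settledIn", triples))  # ['Trade2']
print(check_range("settledIn", "Currency", triples))     # []
```

The same rules could equally be expressed against a relational data model; the triple form simply makes the data, the relationships and the rules share one representation.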

Practical application


Elements of this stack are in use now in many organisations.

Metadata repository


For instance, UPS has a metadata repository (started in the late 1980s) which stores a taxonomy. Terms are associated with a definition, a canonical name, a short name and an abbreviation. UPS ensure that all schemas that reference terms use the standard names. This makes it relatively simple to understand the meaning behind the entity definitions in, for example, a database schema. UPS use the classification scheme to reason about the vocabulary. For example, one classification is "code". This allowed UPS to identify that they had a growing list of code terms, move to establish a central code repository and identify sources for codes (such as standards organisations).
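The two uses described here, checking that schemas use standard names and querying the vocabulary by classification, can be sketched as below. This is not how UPS built their repository; every term, short name and abbreviation is hypothetical, and a real repository would sit in a database rather than a Python list.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RepositoryTerm:
    term: str
    definition: str
    canonical_name: str
    short_name: str
    abbreviation: str
    classification: str


# Hypothetical repository contents (not UPS data).
REPOSITORY: List[RepositoryTerm] = [
    RepositoryTerm("Country Code", "Identifies a country", "CountryCode", "CNTRY_CD", "CC", "code"),
    RepositoryTerm("Currency Code", "Identifies a currency", "CurrencyCode", "CCY_CD", "CCYC", "code"),
    RepositoryTerm("Package Weight", "Weight of a package", "PackageWeight", "PKG_WT", "PW", "measure"),
]


def nonstandard_columns(columns: List[str], repo: List[RepositoryTerm]) -> List[str]:
    """Schema columns that do not use a standard short name from the repository."""
    standard = {t.short_name for t in repo}
    return [c for c in columns if c not in standard]


def terms_classified_as(classification: str, repo: List[RepositoryTerm]) -> List[str]:
    """All terms carrying a given classification, e.g. every 'code' term."""
    return [t.term for t in repo if t.classification == classification]


print(nonstandard_columns(["CNTRY_CD", "PKG_WT", "PkgWeight"], REPOSITORY))  # ['PkgWeight']
print(terms_classified_as("code", REPOSITORY))  # ['Country Code', 'Currency Code']
```

Tracking how the result of a query such as `terms_classified_as("code", ...)` grows over time is one simple way a growing family of terms could be spotted.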

B2B standards alignment


Other organisations are establishing defined ontologies within specific domains. This has allowed groups to map an internal ontology onto external standards, which in turn has enabled standards alignment for B2B activities, both internally and externally.
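At its simplest, such a mapping can be sketched as a table from internal terms to external-standard terms, used to translate records and to surface terms that have no counterpart yet. The sketch assumes flat records keyed by term name; the "ext:" names are placeholders, not terms from any real B2B standard.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from internal ontology terms to external-standard terms.
INTERNAL_TO_EXTERNAL: Dict[str, str] = {
    "Customer": "ext:BuyerParty",
    "Currency": "ext:CurrencyCode",
    "Shipment": "ext:Consignment",
}


def translate(record: Dict[str, str],
              mapping: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Rename an internal record's fields to external-standard terms.
    Fields with no mapping are returned separately so alignment gaps are visible."""
    translated: Dict[str, str] = {}
    unmapped: List[str] = []
    for field_name, value in record.items():
        if field_name in mapping:
            translated[mapping[field_name]] = value
        else:
            unmapped.append(field_name)
    return translated, unmapped


out, gaps = translate({"Customer": "ACME", "Currency": "GBP", "Incoterm": "FOB"},
                      INTERNAL_TO_EXTERNAL)
print(out)   # {'ext:BuyerParty': 'ACME', 'ext:CurrencyCode': 'GBP'}
print(gaps)  # ['Incoterm']
```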

Web Services


There is a lot of activity around semantics and web services. This is covered in ST 2005 Note #5 - Semantic Web Services.

Enterprise Application and Information Integration


Organisations use a common message bus and/or data bus[2] to make the integration activity cost-effective in the medium term. Implicitly or explicitly, the data on these buses normally has a common data model and a master/static/reference data source. Without these elements the bus becomes a conduit (an expensive one at that) for point-to-point interfaces. When a common bus is used, it is important that the common models in use can be communicated clearly and that all involved understand the semantics of the model. This degree of understanding requires the following elements in the model:

  • Entities
  • Relationships, including roles, cardinality, ownership etc.
  • The business meaning of each entity.
  • The business meaning of each relationship.
  • The lifecycle of the entities and relationships.

It is this information that ontologies document, and the current crop of W3C standards (for example RDF and OWL) provides mechanisms for persisting this information in a machine-readable form.
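The elements listed above can be captured in a small, explicit model. The Python sketch below is one possible shape, assuming a lifecycle is expressed as a state machine and cardinality as a (minimum, maximum) count of linked targets; the Order entity, its states and the "places" relationship are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class Entity:
    """An entity in the common model, with its business meaning and lifecycle."""
    name: str
    meaning: str
    # Lifecycle as a state machine: state -> states it may move to next.
    lifecycle: Dict[str, Set[str]] = field(default_factory=dict)

    def can_transition(self, current: str, new: str) -> bool:
        return new in self.lifecycle.get(current, set())


@dataclass
class Relationship:
    """A relationship between two entities, with roles, cardinality and ownership."""
    name: str
    source: str
    target: str
    source_role: str
    target_role: str
    # How many target instances one source instance may be linked to:
    # (minimum, maximum), where a maximum of None means unbounded.
    cardinality: Tuple[int, Optional[int]]
    owner: str
    meaning: str

    def allows(self, count: int) -> bool:
        low, high = self.cardinality
        return count >= low and (high is None or count <= high)


# Hypothetical model fragment for illustration.
order = Entity(
    name="Order",
    meaning="A customer's request to purchase goods or services",
    lifecycle={
        "Draft": {"Submitted", "Cancelled"},
        "Submitted": {"Fulfilled", "Cancelled"},
    },
)
places = Relationship(
    name="places",
    source="Customer",
    target="Order",
    source_role="buyer",
    target_role="purchase",
    cardinality=(0, None),
    owner="Customer",
    meaning="A customer raises zero or more orders",
)

print(order.can_transition("Draft", "Submitted"))  # True
print(order.can_transition("Fulfilled", "Draft"))  # False
print(places.allows(3))                            # True
```

Keeping the business meaning next to the structural definition means the model that travels over the bus carries its own explanation, which is the point the list above makes.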

Inference


Ontologies provide the opportunity for organisations to infer implicit relationships between instances based on the explicit relationships in the ontology and associated business rules. However, there were few examples of its use in commercial organisations at ST 2005.
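A naive forward-chaining sketch shows the idea: two rules are applied repeatedly until no new facts appear. The rule names echo OWL's TransitiveProperty and RDFS subClassOf, but this is a toy illustration, not an RDF/OWL reasoner, and the company and class names are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infer(data: Set[Triple]) -> Set[Triple]:
    """Apply two rules until a fixpoint is reached:
    1. if p is a TransitiveProperty, (a p b) and (b p d) imply (a p d);
    2. (x type C) and (C subClassOf D) imply (x type D)."""
    closure = set(data)
    while True:
        transitive = {s for (s, p, o) in closure if p == "type" and o == "TransitiveProperty"}
        new: Set[Triple] = set()
        for (a, p, b) in closure:
            if p in transitive:
                for (c, q, d) in closure:
                    if q == p and c == b:
                        new.add((a, p, d))
            if p == "type":
                for (c, q, d) in closure:
                    if q == "subClassOf" and c == b:
                        new.add((a, "type", d))
        if new <= closure:  # nothing new was derived
            return closure
        closure |= new


# Hypothetical explicit facts.
facts: Set[Triple] = {
    ("subsidiaryOf", "type", "TransitiveProperty"),
    ("subClassOf", "type", "TransitiveProperty"),
    ("ACME", "subsidiaryOf", "GlobalCorp"),
    ("GlobalCorp", "subsidiaryOf", "HoldCo"),
    ("ACME", "type", "CorporateCustomer"),
    ("CorporateCustomer", "subClassOf", "Customer"),
    ("Customer", "subClassOf", "Party"),
}

derived = infer(facts) - facts
for triple in sorted(derived):
    print(triple)
# ('ACME', 'subsidiaryOf', 'HoldCo')
# ('ACME', 'type', 'Customer')
# ('ACME', 'type', 'Party')
# ('CorporateCustomer', 'subClassOf', 'Party')
```

The nested loops make each pass quadratic in the number of facts, which is fine for illustration; production reasoners use indexed or incremental strategies instead.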

Summary



  • Organisations are already using elements of the semantic conceptual stack.
  • Vocabularies, taxonomies and ontologies are reducing the cost of information and application integration.
  • The entire stack does not have to be adopted at once.



[1] Gruber, Tom. http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
[2] Data bus could be a data warehouse or an operational data store.