Tuesday, September 30, 2008

Expressiveness of Semantic MediaWiki

Semantic MediaWiki (SMW) represents a concept as a page. Each page is exactly one concept. Each link to another page is potentially a triple, with an annotation on the link identifying the relation. This, in a nutshell, is the syntactic representation of triples in SMW. Categories potentially reflect classes, while normal pages reflect individuals. This gives a class-instance distinction.
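To make the link-as-triple idea concrete, here is a minimal sketch in Python. It assumes the `[[relation::target]]` annotation syntax and treats the page title as the subject of every triple. The function name `extract_triples`, the regular expression, and the sample text are mine, for illustration only; they are not part of SMW.

```python
import re

# Matches annotated links of the form [[relation::target]] or
# [[relation::target|label]]. Plain links and category links do not match.
ANNOTATED_LINK = re.compile(r"\[\[([^\[\]|:]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_triples(page_title, wikitext):
    """Return (subject, relation, object) triples with the page as subject."""
    return [
        (page_title, relation.strip(), target.strip())
        for relation, target in ANNOTATED_LINK.findall(wikitext)
    ]


# Illustrative page content (an assumption, not taken from a real wiki).
text = (
    "Berlin is the [[capital of::Germany]] and lies on the "
    "[[located on::Spree|river Spree]]. See also [[Hamburg]]."
)
triples = extract_triples("Berlin", text)
```

Note that the unannotated link to Hamburg yields no triple, which is why I say a link is only potentially a triple.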

This simple approach allows us to represent a variety of knowledge. But there are many other key constructs required for authoring a full ontology. I'll just give one example here: template vs. own slots. These terms require some explanation. Template slots are defined on classes. Suppose (C1 r V2) is a triple representing a specification of a template slot. C1 is a class, and V2 is some valid value for r. By stating this triple, we're asserting that the filler of r in every instance of C1 must be V2. If r were an own slot, by contrast, the triple would describe C1 itself: C1 would have the value V2 as the filler of r, and the instances of C1 would not be directly aware of this triple.
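The distinction may be easier to see in code. The following is a rough sketch of a frame-style model, not anything SMW provides; the class names `KBClass` and `KBInstance`, the `filler` method, and the example slots are all hypothetical.

```python
class KBClass:
    """Hypothetical class frame that keeps own and template slots apart."""

    def __init__(self, name):
        self.name = name
        self.own_slots = {}       # facts about the class itself
        self.template_slots = {}  # fillers imposed on every instance


class KBInstance:
    """Hypothetical instance frame belonging to exactly one class."""

    def __init__(self, name, kb_class):
        self.name = name
        self.kb_class = kb_class
        self.own_slots = {}

    def filler(self, relation):
        """Template slots of the class apply first; the class's own slots never do."""
        if relation in self.kb_class.template_slots:
            return self.kb_class.template_slots[relation]
        return self.own_slots.get(relation)


elephant = KBClass("Elephant")
elephant.template_slots["has color"] = "grey"            # (C1 r V2) as a template slot
elephant.own_slots["documented in"] = "Zoology handbook"  # (C1 r V2) as an own slot

clyde = KBInstance("Clyde", elephant)
```

Here `clyde.filler("has color")` gives the class-level value, while `clyde.filler("documented in")` gives nothing, because that fact is about the class Elephant and not about its members.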

Further, the assumption that a single page describes only one concept is not tenable in any realistic situation. All but the simplest articles deal with multiple concepts. Splitting an article into components is often not feasible, as the content becomes too scattered for easy human consumption. I deal with both issues in the remainder of this post.

Given that we want to represent knowledge as succinctly as possible within the general wiki syntax, capturing every semantic distinction within the wiki is a significant challenge. The SMW syntax is very simple, but the knowledge one can describe with it is correspondingly limited. A few straightforward changes can make SMW much more expressive.

Let's start with classes and instances. I don't think it is a good idea to conflate a class with a category, and an instance with an article. Each of these objects has a distinct role that is then lost. For example, each article is an instance of a class Document. The article discusses subjects that can be mapped to concepts. But the article is not itself a concept.

Here's a simple mechanism for producing such mappings. Each time an article discusses something that can be related to a concept, declare a start tag. At the end of the discussion, close with an end tag.

But if an article no longer represents something in the KB, we can no longer rely on links alone to describe triples, because a link from the article no longer has a concept as its source. This is easily addressed. Insofar as an article talks about anything, it is generally possible to define a primary topic for the article. When we define triples from links, that primary topic stands in for the article.

The nesting of topics also represents a relationship between those topics. This is similar to annotated links describing relations between concepts in the KB.
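As a rough illustration of the tagging and nesting idea, here is a Python sketch. I have not fixed a tag syntax in this post, so the `<topic name="...">` and `</topic>` markers below are purely an assumption, as are the function name `nested_topics` and the sample text.

```python
import re

# Hypothetical tag syntax: <topic name="X"> opens a topic, </topic> closes it.
TAG = re.compile(r'<topic name="([^"]+)">|</topic>')


def nested_topics(wikitext):
    """Return (outer, inner) pairs for each topic opened directly inside another."""
    stack, pairs = [], []
    for match in TAG.finditer(wikitext):
        name = match.group(1)
        if name is not None:          # start tag
            if stack:
                pairs.append((stack[-1], name))
            stack.append(name)
        elif stack:                   # end tag closes the innermost open topic
            stack.pop()
    return pairs


# Illustrative article text (an assumption, not real wiki content).
article = (
    '<topic name="Berlin">Berlin is a city. '
    '<topic name="Spree">The Spree flows through it.</topic> '
    '<topic name="Reichstag">The Reichstag stands near the river.</topic>'
    '</topic>'
)
pairs = nested_topics(article)
```

Each pair could then be turned into a triple once we decide which relation the nesting stands for; the sketch deliberately leaves that relation unnamed.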

Finally, allow a document to represent a class, an individual, or neither. This can be declared easily, just as categories are declared. Documents can then describe classes, instances, or a combination of the two.

The current SMW implementation supports linking categories to anything one wishes. However, categories do not show factboxes, except in previews, and the OWL export of a category does not include the properties declared on it. These may just be bugs, but as far as I know there exists no specification of what it means to link a category to another via a relation. One needs to be able to say whether a property linking two classes is intended as an own slot, a template slot, or a restriction on the slot filler of the source class. I would contend that in most cases one can deduce the intended meaning by considering the domain and range restrictions for a slot.

In effect, my experience is that SMW can't describe anything but the most rudimentary content about classes. One cannot really develop an ontology in SMW as it exists today. At best one can describe a basic taxonomy.