Thursday, July 17, 2008

RDF and Hypergraphs

Today I discovered directed hypergraphs in all their glory. Well, OK, not all their glory, since information about them is pretty sketchy. There seems to be only one entry in Wikipedia about hypergraphs, with some of the math involved in them. Digging deeper into the literature brought up directed hypergraphs, which according to Wikipedia should not exist. So... what is a hypergraph?

While a graph has edges that connect pairs of nodes, hypergraphs have edges that connect an arbitrary subset of the graph's nodes. A directed hypergraph places a direction between two sets of nodes, where the sets are disjoint. There are mathematical formalizations for these graphs, naturally, but I just want to stick to the intuitions.

Naturally, the resulting data structure can't be visualized in any reasonable way using nodes and edges. So... what are they good for?

I started looking into this knowing that they have been applied to semantic web problems. Google brought me pretty quickly to the realization that hypergraphs have been widely considered for representing RDF. Each node in such a hypergraph could be either the subject, predicate or object of an RDF triple. The triple itself is then an edge in the hypergraph. The hypergraph must be directed, otherwise it isn't possible to distinguish subject from object.

Unfortunately there doesn't seem to be enough discussion about precisely how such an edge should be interpreted. Would the source and destination sets be {s, p} -> {o}, or {s} -> {p, o}? Because, really, we need to distinguish between subject, predicate and object. And in the representation above you can't really tell (other than by the letter prefix I've used) which one of the three is the predicate.

There are other ways I've seen of applying hypergraphs to representing RDF. There was a paper that described the application of hypergraphs to mergining ontologies. The description was quite precise and relatively clear.