Tuesday, August 15, 2006

XPath - bigger and better?

I recently started looking at XPath again, as a convenient way of manipulating XML documents programmatically. Naturally, I needed a Lisp implementation on top of the framework that cxml provides. Not finding any obvious candidates, I started rolling my own, beginning with something really simple that has a lispy syntax. To come up with that syntax, I started referring to the XPath documents at w3.org, and that's when the fun began. My, how XPath has grown! Don't get me wrong, some of the things in there seem to be fine ideas, like unifying XPath with XQuery and coming up with some formal semantics. But the reference material right now is unreadable.

So I'm starting with a syntax inspired by XPath 1.0, hoping that the semantics haven't changed drastically since then:

(expand-path '(child "xhtml:div" / child "xhtml:p")) ->
((LOOP FOR NODE IN (DOM:CHILD-NODES CONTEXT)
       WHEN (AND (DOM:ELEMENT-P NODE)
                 (PURI:URI= (DOM:NAMESPACE-URI NODE)
                            #<URI http://www.w3.org/1999/xhtml>)
                 (STRING= (DOM:LOCAL-NAME NODE) "div"))
       COLLECT NODE)
 (LOOP FOR NODE IN (DOM:CHILD-NODES CONTEXT)
       WHEN (AND (DOM:ELEMENT-P NODE)
                 (PURI:URI= (DOM:NAMESPACE-URI NODE)
                            #<URI http://www.w3.org/1999/xhtml>)
                 (STRING= (DOM:LOCAL-NAME NODE) "p"))
       COLLECT NODE))


Naturally, some more work is still needed to actually apply these fragments to an XML document.
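To make the idea concrete, here is a minimal sketch of how such an expander, and the missing "apply" step, might look. It is not the actual implementation: it handles only the child axis, it runs against a toy node structure instead of cxml's DOM, and it compares namespace URIs as plain strings rather than through puri. The names *namespaces*, split-steps, split-qname, expand-step, compile-path and the node struct are all made up for illustration; only expand-path is borrowed from the example above.

```lisp
;; Toy stand-in for DOM element nodes (an assumption, not cxml's DOM).
(defstruct node namespace-uri local-name children)

;; Hypothetical prefix table; how prefixes are really resolved is not shown above.
(defparameter *namespaces*
  '(("xhtml" . "http://www.w3.org/1999/xhtml")))

(defun split-steps (path)
  "Split (child \"a\" / child \"b\") into ((child \"a\") (child \"b\"))."
  (let ((steps '()) (current '()))
    (dolist (item path)
      (if (and (symbolp item) (string= item "/"))
          (progn (push (nreverse current) steps)
                 (setf current '()))
          (push item current)))
    (push (nreverse current) steps)
    (nreverse steps)))

(defun split-qname (qname)
  "Return the namespace URI (or NIL) and the local name of QNAME."
  (let ((colon (position #\: qname)))
    (if colon
        (let ((uri (cdr (assoc (subseq qname 0 colon) *namespaces*
                               :test #'string=))))
          (unless uri
            (error "Unknown namespace prefix in ~S" qname))
          (values uri (subseq qname (1+ colon))))
        (values nil qname))))

(defun expand-step (step)
  "Expand one (axis name) step into a LOOP form over the toy nodes."
  (destructuring-bind (axis name) step
    (unless (string-equal axis "child")
      (error "Only the child axis is sketched here, got ~S" axis))
    (multiple-value-bind (uri local-name) (split-qname name)
      `(loop for node in (node-children context)
             when (and (node-p node)
                       (equal (node-namespace-uri node) ,uri)
                       (string= (node-local-name node) ,local-name))
               collect node))))

(defun expand-path (path)
  "Return one LOOP form per step of PATH."
  (mapcar #'expand-step (split-steps path)))

(defun compile-path (path)
  "Return a function mapping a context node to the nodes PATH selects."
  (let ((step-fns (mapcar (lambda (form)
                            (compile nil `(lambda (context) ,form)))
                          (expand-path path))))
    (lambda (context)
      (let ((nodes (list context)))
        ;; Each step runs once per node selected by the previous step.
        (dolist (fn step-fns nodes)
          (setf nodes (loop for n in nodes append (funcall fn n))))))))

;; Usage: select the xhtml:p children of the xhtml:div children of the root.
(defparameter *toy-doc*
  (let ((xhtml "http://www.w3.org/1999/xhtml"))
    (flet ((el (name &rest children)
             (make-node :namespace-uri xhtml :local-name name
                        :children children)))
      (el "body" (el "div" (el "p") (el "span")) (el "p")))))

(defparameter *selected*
  (funcall (compile-path '(child "xhtml:div" / child "xhtml:p")) *toy-doc*))
```

Feeding each step the results of the previous one is enough for the child axis, where no node can be selected twice. Axes such as descendant would also need duplicate removal and document ordering, which this sketch does not attempt.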

Wednesday, August 2, 2006

XML tools for lisp

I've been trying to find a good set of tools for manipulating XML/SVG in Common Lisp. There are a bunch out there, but as should be expected, many of them are nearly unusable. After eliminating those that I considered to be obviously poor choices due to their syntax or their license, I was left with XMLisp and cxml. In the process I also learned a great deal about asdf and asdf-install. It's odd that in spite of having used Lisp for such a long time, I haven't really had a chance to work with these pieces of code that the Lisp community has put together.

First the XML tools. XMLisp is a fascinating idea: being able to turn expressions typed at the REPL into CLOS objects, and vice versa. However cool the idea, there are many things I found wrong with this approach. There isn't enough documentation. All that comes with XMLisp is a set of examples, and a set of examples is not a clear specification of the program's behavior. It isn't clear how content and attributes are distinguished. I considered how something like XHTML would be manipulated in XMLisp. It appears one would have to encode every tag manually as a class. This is an error-prone operation, and a maintenance nightmare. Finally, there are namespaces. It isn't clear if XMLisp handles them correctly.

cxml by comparison is well designed. It handles namespaces correctly. It creates a DOM tree for a document, or can function as a SAX parser. The documentation is better, though it still feels a bit inadequate. I haven't actually tried doing any real work with it, so I can't say with any certainty. The documentation also indicates an awareness of DTDs. Tonight I hope to take some time to see if I can manipulate XML documents through cxml. Finally, cxml is available through asdf-install.
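For a sense of what working with cxml's DOM looks like, here is a small sketch. It assumes cxml is already installed (for instance with (asdf-install:install :cxml)) and that the entry points cxml:parse-rod, cxml-dom:make-dom-builder and dom:do-node-list behave as described in cxml's documentation; names and signatures may differ between releases, so treat this as a starting point rather than a reference. The variable *xhtml-doc* and the function child-element-names are made up for the example. The file is meant to be loaded form by form with LOAD, so that the packages exist before the later forms are read.

```lisp
;; Load cxml through asdf (assumes it has already been installed).
(asdf:operate 'asdf:load-op :cxml)

;; Parse a small namespaced document into a DOM tree.
(defparameter *xhtml-doc*
  (cxml:parse-rod
   "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hello</p></body></html>"
   (cxml-dom:make-dom-builder)))

(defun child-element-names (node)
  "Collect the local names of the element children of NODE."
  (let ((names '()))
    (dom:do-node-list (child (dom:child-nodes node))
      (when (dom:element-p child)
        (push (dom:local-name child) names)))
    (nreverse names)))

;; The root element carries both a local name and a namespace URI.
(let ((root (dom:document-element *xhtml-doc*)))
  (format t "~A in ~A has element children ~S~%"
          (dom:local-name root)
          (dom:namespace-uri root)
          (child-element-names root)))
```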

As I've already mentioned, asdf and asdf-install are relatively new beasts to me. I have been largely working with ACL and ACL-specific tools so far. They tend to be good, but I have had some issues with them. asdf-install is so far straightforward to use, even with gpg. There are a few loose ends, but they are not asdf-install's fault: broken gpg key access, etc. Installing packages is one thing and managing them is another, though, and I can't judge the latter until I have worked with asdf-install for a little while.