Sunday, March 22, 2015

The Rise and Fall of XML

First there was XML. I remember XML from about 2000.

Then there was SOAP. Then WSDL. XML Schema, RDF, RSS, and other formats had also appeared by the time I checked out, which is about when I stopped paying attention. But on a recent rediscovery trip I was introduced to BPEL, BPMN, and other such acronyms. And now I know how XML died. These technologies live on as zombies, sort of like COBOL.

Well, I don't in fact know that they have actually died; I'm just hoping. XML is a reasonable standard. It has a spec that is arguably complete, and it solves a well-defined problem. SOAP is less well specified in practice. Its spec is again relatively complete, but usage in practice is a subset of what the spec allows. Witness RPC-style versus document/literal-style calling. The latter became the preferred approach for using SOAP, as far as I can tell because Microsoft's tooling supported it better. And the overriding consideration appears to be that one style meshes better with auto-generated interfaces in various programming languages than the other.
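The difference between the two styles is easiest to see in the body of the envelope. Here's a minimal sketch; the `getPrice` operation, the `urn:example` namespace, and the element names are all made up for illustration:

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# RPC style: the body's child element is named after the operation,
# with one child per parameter -- a function call spelled in XML.
rpc_envelope = """\
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <m:getPrice xmlns:m="urn:example">
      <symbol>ACME</symbol>
    </m:getPrice>
  </soap:Body>
</soap:Envelope>"""

# Document/literal style: the body carries a schema-defined document;
# the operation is implied by the document's type, not by a call wrapper.
doc_envelope = """\
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <m:PriceRequest xmlns:m="urn:example">
      <m:Symbol>ACME</m:Symbol>
    </m:PriceRequest>
  </soap:Body>
</soap:Envelope>"""

for label, envelope in [("rpc", rpc_envelope), ("doc/literal", doc_envelope)]:
    body = ET.fromstring(envelope).find(f"{{{SOAP_NS}}}Body")
    print(label, "->", body[0].tag)
```

Both are legal SOAP, which is exactly the problem: tooling on each side had to agree on which shape to expect.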

Generating SOAP interfaces relies on consuming WSDL documents, which allow for an ever wider set of interpretations. WSDL allows the logical description of an interface to be separated from its implementation. From an engineering perspective, I'd argue relatively few individuals care about the logical description of a service; most would only look at it to the extent that it supports calling a particular web service. And so we have a proliferation of possible interpretations of abstract service descriptions.

When we get to BPEL, we're building abstractions on top of abstractions. We're building a spec that is a union of existing approaches. And we have incomplete implementations of the spec. The rigor with which the spec is defined is greatly reduced. One can only put together a reasonable implementation of the spec based on existing software implementations, which feels too much like reverse engineering said implementations rather than working from a well thought out description.

The BPEL spec was put out in 2007, and I was fortunate not to have encountered it until now. In 2009, as I transitioned to a startup, I found REST and JSON. REST-based services are underspecified in how their messages work, how they are discovered, and how they might relate to each other. This is a good thing. It prevents train wrecks like BPEL from emerging.

Why Standardize?

There are probably many ways to arrive at a standard. The two I have encountered most frequently are:

  • Codifying existing practice vs specifying first
  • Capturing core concepts (concepts intersection) vs capturing all technology features (concepts union)

With most XML based technologies, the tendency appears to have been toward specification first, concepts union. With just a little thought, it should be apparent how such a specification allows for a wide variety of usage models and approaches.

XML seems to have been on the side of codifying first, capturing core concepts. XML grew out of HTML and SGML. It grew out of the desire to separate content from presentation. We can argue about how successful that's been, but that notion was the starting point. It brought greater structure and sanity to the tag soup that was the WWW at the time. I consider XML to have been incredibly successful.

SOAP was a specification of RPC in XML. I would consider it specification first, concepts intersection, but it was based on a long history of RPC implementations. It worked pretty well.

WSDL is where we start going off the reservation. Key elements of WSDL were brand new ideas: types specified in XSD being used for constructing messages that were being passed over SOAP protocols. Instead of stopping there, WSDL became incredibly aspirational, and attempted to separate the abstract description of services using these untested languages from the relatively clear RPC mechanism encoded in SOAP, all without existing implementations of the end-to-end system to see how it would work out in practice. And, as we can see, practice required simplifying away parts of the standard so that there was a sane core that could be implemented and used.

The specification-first tendency is at its maximum in BPEL. My guess right now is there isn't a single complete implementation of the BPEL spec. I'd argue, this is how you kill an emerging ecosystem.

By now it should be obvious that I dislike specification-first. It reminds me of other bad ideas, such as the waterfall process. We must recognize, however, that it is rooted in a good thought: that if we build the specification first, we will avoid building costly systems that later have to go through an even more costly process of re-adapting to a future standard. But instead you're left with a costly system that follows a standard few want to use or understand.

The union vs intersection argument is more challenging. I have a preference for intersection because it generally requires resolving conflict. In conflict we have to surface, discuss, and settle opposing points of view, arriving at a conclusion that covers just enough ground to settle the conflict. It requires taking a stand. Union, by contrast, is an effort to get along on paper while continuing to operate as you always have, in a way that sets you apart from the others with whom you've entered into union. The key advantage of an intersection is that you get something smaller that a new entrant into the ecosystem can pick up and get running relatively quickly, while with a union a new entrant can come in and build something that is different yet again from everything everyone else has built. With one of these approaches you at least have a prayer of interoperability.

So What About REST?

Let's now step back and consider REST. It is an approach to communication distinct from RPC. It comes from an understanding of and appreciation for the HTTP protocol, and from adapting it to building web services. Where RPC tries to mimic function calls over the network, REST tries to manipulate object state: one adapts the HTTP verbs to describe object state and the operations on it. I would therefore argue that it is an implementation-first approach.
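To make the contrast concrete, here is a toy sketch of the REST idea. The resource paths and payloads are invented; the point is that the verbs manipulate the state of a named resource rather than invoke a named procedure:

```python
# Toy in-memory "server": each HTTP verb maps to a state operation on a
# resource identified by its path. Paths and payloads are illustrative.
resources = {}

def handle(verb, path, body=None):
    if verb == "GET":
        return resources.get(path)          # read current state
    if verb == "PUT":
        resources[path] = body              # create or replace state
        return body
    if verb == "DELETE":
        return resources.pop(path, None)    # remove the resource
    raise ValueError("unsupported verb: " + verb)

handle("PUT", "/orders/1", {"status": "open"})
print(handle("GET", "/orders/1"))   # {'status': 'open'}
handle("DELETE", "/orders/1")
print(handle("GET", "/orders/1"))   # None
```

There is no operation name on the wire here, only a verb and a resource; that is the shape HTTP already had, which is why I call the approach implementation-first.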

More importantly, REST is a work in progress. There are many libraries that support REST-style communication, and many ways in which software has used them. There's an emerging consensus on how to define and present a REST interface. Being a work in progress means all implementations of these APIs will be subject to obsolescence, but at the same time we're freed of the requirement to conform to an arbitrary specification set ahead of time, before the potential of the protocol has been explored.

(Thank goodness for the timely death of WSDL 2.0, which for all the improvement it brought, would still have been a step backwards. Again, last spotted in 2007.)

In contrast to SOAP, WSDL, and the rest, REST has a lot to say about communication but nothing to say about the content of that communication. By convention JSON is the message format of choice, but XML messages aren't precluded. We thus have a good separation of concerns: the communication protocol states where the content should go but has nothing further to say about it.
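A rough illustration of that separation: the same resource can be serialized either way, and from the protocol's point of view only the Content-Type header and the opaque body change. (The resource and field names below are made up.)

```python
import json
import xml.etree.ElementTree as ET

# One hypothetical resource, two representations. The HTTP layer only
# cares where the body goes and which Content-Type header labels it.
order = {"id": "1", "status": "open"}

json_body = json.dumps(order)

root = ET.Element("order")
for key, value in order.items():
    ET.SubElement(root, key).text = value
xml_body = ET.tostring(root, encoding="unicode")

print("application/json ->", json_body)
print("application/xml  ->", xml_body)
```

Either body travels over the identical GET/PUT/POST machinery; the format is negotiated between the parties, not dictated by the protocol.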

The apparent disadvantage of such an approach is that there is nothing like BPEL in the REST ecosystem. Once again, I think this is a strength. If there is a great need for something like BPEL, the argument goes, there will over time be multiple implementations that satisfy the needs, and that's a good time to think about standards and approaches to support the need. Until then, time spent considering the problem will be time poorly spent.