2005-08-12

What is XML?

Posted in Uncategorized at 6:57 pm by Liam Quin

I wanted somewhere I could post without sounding too authoritative; my Advogato account isn’t really focussed on XML and my personal page isn’t bloggish.

I’ve recently held a number of Future of XML sessions at various conferences. I posted about it on my Advogato blog too. I got quite a lot of feedback, but one thing emerged clearly: XML is what you make of it. There is no one answer. There is no One True Data Model. There is no One True Meaning, and no One True Processing Model.

One person confidently told me (and maybe 150 other people in the room), “The 98% use-case for XML today is for Web services, and all features not needed for Web services should be removed.” Another (in the same session) said, more or less, “it seems to me that the main purpose of XML is to serialise RDF, and that RDF can do everything XML can, so let’s get rid of all the features of XML that RDF doesn’t need.” I do not mean to mock these two people: their viewpoints are, in their respective worlds, perfectly valid. It is clear, however, that neither viewpoint should be allowed to dominate to the exclusion of the other.

Last week I was at Extreme Markup, my favourite of the XML conferences. It’s where a whole bunch of markup geeks can talk about the philosophy of markup. Those conversations are important. As Tommie Usdin (conference co-chair) said, the edge cases are where the change comes from.

A strong theme at Extreme Markup this year (and last year) was the need to handle overlapping regions of text when one is using markup to describe existing non-XML documents. For example, consider the question, “How many times in Luke’s Gospel (a Biblical text) does Jesus speak?” It turns out that sometimes a quote can start in the middle of one verse, continue over two verses, and end in the middle of another verse. So the verse structure and the speech structure overlap. But neither is intrinsically more important than the other: they both exist. One can use empty elements to mark up verse boundaries, and then one can easily answer the question about speech. But then it’s really hard to fetch the contents of any particular verse.
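To make the trade-off concrete, here is a minimal sketch in Python using the standard library’s ElementTree. The markup is hypothetical and much simplified: speeches are ordinary `<speech>` elements, while verse boundaries are empty `<v n="..."/>` elements (sometimes called milestones). The element names, the sample text and the `verse_text` helper are all illustrative assumptions, not taken from any real edition. Counting speeches is a one-line query; getting a verse back means walking the whole document in order and switching collection on and off at each milestone.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: <speech> elements are real containers, but verses are
# only marked by empty milestone elements <v n="..."/>. The first speech
# starts in verse 1 and ends in verse 2, so the two structures overlap.
DOC = """<chapter>
<v n="1"/>He turned to the crowd.
<speech>Listen carefully. <v n="2"/>Remember what you saw.</speech>
They left. <v n="3"/>Later he spoke again:
<speech>Go in peace.</speech>
</chapter>"""

root = ET.fromstring(DOC)

# The easy question: speeches are elements, so we can simply count them.
speech_count = len(root.findall(".//speech"))


def verse_text(root, n):
    """Return the plain text of verse n (hypothetical helper).

    A verse is not an element here, so we walk the tree in document order,
    start collecting at <v n="n"/> and stop at the next milestone.
    """
    collecting = False
    parts = []

    def walk(elem):
        nonlocal collecting
        if elem.tag == "v":
            # Each milestone either starts verse n or ends the current verse.
            collecting = elem.get("n") == n
        elif collecting and elem.text:
            parts.append(elem.text)
        for child in elem:
            walk(child)
            # Text after a child's end tag (its "tail") belongs to the parent
            # and may belong to a different verse than the child started in.
            if collecting and child.tail:
                parts.append(child.tail)

    walk(root)
    # Normalise whitespace so the result reads as a single line of text.
    return " ".join(" ".join(parts).split())


print(speech_count)          # 2
print(verse_text(root, "2"))  # Remember what you saw. They left.
```

Note that even this only gives back flattened text. Returning verse 2 as well-formed XML would mean closing the `<speech>` element that began in verse 1 and reopening it, which is exactly the awkwardness described above.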

If this all sounds really abstruse and irrelevant, think again. The very idea of SGML and generic, vendor- and application-neutral markup was once considered equally weird and edge-case, and yet today HTML and XML are mainstream. It just took a while.

So, back to the question, what is XML? Is it a serialisation of computer data? A remote procedure call format? A vendor-neutral way to represent data? Or an application-independent way to represent text and its structure? It’s all of these, of course. The neat part is that a tool that handles any one of these might well handle them all, and yet specialisations (e.g. a graphics editor using SVG) can also exist alongside the more generic tools without any contradiction.
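As a small illustration of that point, here is a sketch (again Python with ElementTree) of a “generic” tool that knows nothing about any particular vocabulary: it just lists element names in document order. The three sample documents are hypothetical stand-ins for three uses of XML: an XML-RPC-style call, a tiny SVG drawing and a fragment of prose. The same few lines handle all three; only the namespace prefix on the SVG tags needs stripping for display.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples of three quite different uses of XML.
DOCS = {
    "rpc": "<methodCall><methodName>getQuote</methodName>"
           "<params><param><value><string>XYZ</string></value></param></params>"
           "</methodCall>",
    "graphics": '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
                '<rect x="1" y="1" width="8" height="8"/></svg>',
    "text": "<article><title>On markup</title>"
            "<para>Markup is <emph>generic</emph>.</para></article>",
}


def element_names(xml_text):
    """List element names in document order, with any namespace URI removed.

    Nothing here is specific to RPC, graphics or prose: the function only
    relies on the document being well-formed XML.
    """
    root = ET.fromstring(xml_text)
    # ElementTree writes namespaced tags as "{uri}local"; keep the local part.
    return [elem.tag.rsplit("}", 1)[-1] for elem in root.iter()]


for kind, doc in DOCS.items():
    print(kind, element_names(doc))
```

A specialised tool (an SVG renderer, say) would of course interpret `rect` and its attributes, but it can sit alongside this kind of generic processing without either getting in the other’s way.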

XML is many different things to many people. A former employer of mine, the late Yuri Rubinsky, used to tell the story of a group of blind people who encounter an elephant and try to describe it. One says it’s a long swinging thing (the trunk, get your mind out of the gutter!), another that it’s a long curved tusk, and another that it’s huge and flat (the middle part, I suppose). In the same way, SGML (and today, XML) is many things to many people, and that is its strength.

This all means that on the one hand we must try to welcome new communities, and on the other hand we must keep the communities we already have: a difficult balance. But a worthwhile one.