Don’t call me DOM

31 July 2007

Generating HTML documentation from DTD

I consider myself fairly fluent in XML technologies, but there is at least one that I have never bothered to learn fully and never plan to: XML DTDs.

Some people would like DTDs to disappear completely off the face of XML, and while I wouldn’t disagree with them, I still have to live in a world where some markup languages are formalized using DTDs and, even more importantly, where most of the markup actually produced is checked against a DTD (typically with the W3C Markup Validator).

So, instead of learning the arcane syntax of DTDs, I wrote this Python script that relies on an existing Python library that parses DTDs (and thus on someone else having actually read and understood the DTD specification). It takes a DTD URI as its parameter and outputs a couple of HTML tables describing which elements and attributes are defined in the given DTD.
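For readers who want a feel for the approach, here is a minimal sketch of the same idea. It is not the script mentioned above and does not use the same library: it assumes the DTD text is already in hand (rather than fetched from a URI) and leans on the expat parser from the Python standard library, wrapping the declarations in an internal subset. The function names `describe_dtd` and `to_html_table` are made up for this illustration. Because parameter entities cannot be used inside declarations in an internal subset, this trick will likely only work on simple DTDs, not on modularized ones such as the XHTML DTDs.

```python
import html
from xml.parsers import expat


def describe_dtd(dtd_text, root="root"):
    """Collect element and attribute declarations from simple DTD text.

    Hypothetical helper: the declarations are wrapped in an internal
    subset so that expat reports them through its declaration handlers.
    """
    elements = []    # element names, in declaration order
    attributes = []  # (element, attribute, type, default, required)

    def on_element(name, model):
        elements.append(name)

    def on_attlist(elname, attname, type_, default, required):
        attributes.append((elname, attname, type_, default, bool(required)))

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist

    # expat does not validate, so the dummy root element needs no content.
    document = "<!DOCTYPE %s [\n%s\n]>\n<%s/>" % (root, dtd_text, root)
    parser.Parse(document, True)
    return elements, attributes


def to_html_table(headers, rows):
    """Render a header row and data rows as a bare HTML table."""
    head = "".join("<th>%s</th>" % html.escape(h) for h in headers)
    body = "".join(
        "<tr>%s</tr>" % "".join(
            "<td>%s</td>" % html.escape("" if cell is None else str(cell))
            for cell in row
        )
        for row in rows
    )
    return "<table><tr>%s</tr>%s</table>" % (head, body)


if __name__ == "__main__":
    sample = """
    <!ELEMENT p (#PCDATA)>
    <!ATTLIST p id ID #IMPLIED
                class CDATA #IMPLIED>
    """
    elements, attributes = describe_dtd(sample, root="p")
    print(to_html_table(["Element"], [[name] for name in elements]))
    print(to_html_table(
        ["Element", "Attribute", "Type", "Default", "Required"], attributes))
```

Running the file prints two tables, one listing the declared elements and one listing their attributes.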

You can see the result of the script in the just-released XHTML Basic 1.1 reference.

The script is nowhere near the nicest code I’ve ever written, and it probably tops the list of the ugliest Python I’ve ever written; but then, it’s dealing with DTDs, so that seems only fair!

Dominique Hazaël-Massieux is part of the World Wide Web Consortium (W3C) Staff; his interests cover a number of Web technologies, as well as the usage of open source software in a distributed work environment.