I have just reported a bug in the w3c-dtd-xhtml Ubuntu package that had prevented me from using the Apache XML Catalog resolver to make the Saxon XSLT processor read local XHTML DTDs rather than the on-line ones.
Hitting the on-line DTDs on every invocation of Saxon unnecessarily burdens the W3C Web site. I had already found guidance on how to use the Apache XML Catalog resolver to avoid that, but it wouldn’t work for the XHTML DTDs with the default XML catalog list that Ubuntu provides in /etc/xml/catalog.
After some investigation, it turned out that the use of a bogus URL as a SystemID in the intermediate XHTML catalog files prevents these catalogs from being parsed properly, and thus makes the local DTDs undiscoverable.
With the patch provided in my bug report, I can now happily use /etc/xml/catalog with Saxon and never hit the network when transforming XHTML files:
java -cp /usr/share/java/xml-commons-resolver-1.1.jar:path-to-saxon8/saxon8.jar -Dxml.catalog.files=/etc/xml/catalog -Dxml.catalog.verbosity=1 net.sf.saxon.Transform -novw -r org.apache.xml.resolver.tools.CatalogResolver -x org.apache.xml.resolver.tools.ResolvingXMLReader -y org.apache.xml.resolver.tools.ResolvingXMLReader XMLFile XSLTFile
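For repeated use, the command above could be wrapped in a small shell function. What follows is only a sketch under assumptions: the function name xhtml_transform, the variable names, and the DRY_RUN switch are hypothetical and introduced here for illustration. The jar paths and options are the ones from the command above, and path-to-saxon8 remains a placeholder to adjust to the local Saxon install.

```shell
#!/bin/sh
# Sketch of a wrapper around the Saxon + catalog resolver command.
# Names (xhtml_transform, RESOLVER_JAR, SAXON_JAR, CATALOG, DRY_RUN)
# are hypothetical; paths and options are taken from the command above.

RESOLVER_JAR="${RESOLVER_JAR:-/usr/share/java/xml-commons-resolver-1.1.jar}"
SAXON_JAR="${SAXON_JAR:-path-to-saxon8/saxon8.jar}"   # placeholder path
CATALOG="${CATALOG:-/etc/xml/catalog}"
RESOLVING_READER=org.apache.xml.resolver.tools.ResolvingXMLReader

xhtml_transform() {
    # $1: XML source file, $2: XSLT stylesheet
    if [ "$#" -ne 2 ]; then
        echo "usage: xhtml_transform XMLFile XSLTFile" >&2
        return 2
    fi
    # When DRY_RUN is set and non-empty, the command is printed, not run.
    ${DRY_RUN:+echo} java \
        -cp "$RESOLVER_JAR:$SAXON_JAR" \
        -Dxml.catalog.files="$CATALOG" \
        -Dxml.catalog.verbosity=1 \
        net.sf.saxon.Transform -novw \
        -r org.apache.xml.resolver.tools.CatalogResolver \
        -x "$RESOLVING_READER" \
        -y "$RESOLVING_READER" \
        "$1" "$2"
}
```

Once sourced into a shell, it would be called as xhtml_transform XMLFile XSLTFile. Setting DRY_RUN=1 beforehand prints the resulting java command line, which is handy for checking the classpath before actually running Saxon.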