<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Don't call me DOM &#187; XSLT</title>
	<atom:link href="http://people.w3.org/~dom/archives/category/development/web-development/xslt/feed/" rel="self" type="application/rss+xml" />
	<link>http://people.w3.org/~dom</link>
	<description>W3C has the DOM, and the Dom; pick the one you prefer.</description>
	<lastBuildDate>Sat, 07 Nov 2009 11:02:42 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.5</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Using /etc/xml/catalog with org.apache.xml.resolver </title>
		<link>http://people.w3.org/~dom/archives/2009/07/using-etcxmlcatalog-with-org-apache-xml-resolver/</link>
		<comments>http://people.w3.org/~dom/archives/2009/07/using-etcxmlcatalog-with-org-apache-xml-resolver/#comments</comments>
		<pubDate>Thu, 16 Jul 2009 14:50:30 +0000</pubDate>
		<dc:creator>Dom</dc:creator>
				<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://people.w3.org/~dom/?p=300</guid>
		<description><![CDATA[I have just reported the bug in the w3c-dtd-xhtml Ubuntu package that had prevented me from using the Apache XML Catalog resolver to load local XHTML DTDs instead of the online ones when running the Saxon XSLT processor.

Hitting the online DTDs on every invocation of Saxon unnecessarily burdens the W3C Web site. I had [...]]]></description>
			<content:encoded><![CDATA[<p>I have just <a href="https://bugs.launchpad.net/bugs/400259">reported the bug in the w3c-dtd-xhtml Ubuntu package</a> that had prevented me from using the Apache XML Catalog resolver to load local XHTML DTDs instead of the online ones when running the Saxon XSLT processor.</p>

<p>Hitting the online DTDs on every invocation of Saxon <a href="http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dtd_traffic">unnecessarily burdens the W3C Web site</a>. I had already found <a href="http://xml.apache.org/commons/components/resolver/resolver-article.html">guidance on how to use the Apache XML Catalog resolver</a> to avoid that, but it wouldn&#8217;t work with the default XML catalog list that Ubuntu provides in <code>/etc/xml/catalog</code> for the XHTML DTDs.</p>

<p>After some investigation, it appeared that a bogus URL used as a SystemID in the intermediary XHTML catalog files prevents these catalogs from being parsed properly, and thus makes the local DTDs undiscoverable.</p>
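<p>To see why a single unparsable intermediary catalog is enough to hide the local DTDs, here is a minimal Python sketch of how a system identifier is looked up through a chain of OASIS XML catalogs: <code>system</code> entries in the current catalog first, then matching <code>delegateSystem</code> entries, then <code>nextCatalog</code> entries. It is a simplification (no public identifiers, no rewriting, no resolution of relative catalog paths), not the Apache resolver&#8217;s code, and <code>resolve_system</code> and its <code>load</code> parameter are hypothetical names. In this sketch a catalog that fails to load makes the lookup fail outright; either way, the entry pointing to the local DTD is never reached.</p>

```python
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalogs entries, in ElementTree's {uri}name notation.
CAT = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"


def resolve_system(system_id, catalog_path, load=ET.parse):
    """Return the local URI mapped to system_id, or None if no catalog matches.

    Simplified sketch: only system, delegateSystem and nextCatalog entries
    are handled. `load` must return an ElementTree for a catalog path.
    """
    root = load(catalog_path).getroot()

    # 1. An exact system entry in this catalog wins.
    for entry in root:
        if entry.tag == CAT + "system" and entry.get("systemId") == system_id:
            return entry.get("uri")

    # 2. Matching delegateSystem entries: try the delegated catalogs,
    #    longest prefix first, without falling back to nextCatalog.
    delegates = [
        entry for entry in root
        if entry.tag == CAT + "delegateSystem"
        and system_id.startswith(entry.get("systemIdStartString", ""))
    ]
    if delegates:
        delegates.sort(key=lambda e: len(e.get("systemIdStartString", "")), reverse=True)
        for entry in delegates:
            found = resolve_system(system_id, entry.get("catalog"), load)
            if found:
                return found
        return None

    # 3. Otherwise, continue with the next catalogs, in document order.
    for entry in root:
        if entry.tag == CAT + "nextCatalog":
            found = resolve_system(system_id, entry.get("catalog"), load)
            if found:
                return found
    return None
```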

<p>With the patch provided in my bug report, I can now happily use <code>/etc/xml/catalog</code> with Saxon and never hit the network when transforming XHTML files:</p>
<pre><code>
java -cp /usr/share/java/xml-commons-resolver-1.1.jar:<var>path-to-saxon8</var>/saxon8.jar \
  -Dxml.catalog.files=/etc/xml/catalog -Dxml.catalog.verbosity=1 \
  net.sf.saxon.Transform -novw \
  -r org.apache.xml.resolver.tools.CatalogResolver \
  -x org.apache.xml.resolver.tools.ResolvingXMLReader \
  -y org.apache.xml.resolver.tools.ResolvingXMLReader \
  <var>XMLFile</var> <var>XSLTFile</var>
</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://people.w3.org/~dom/archives/2009/07/using-etcxmlcatalog-with-org-apache-xml-resolver/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Synchronizing text and video</title>
		<link>http://people.w3.org/~dom/archives/2009/02/synchronizing-text-and-video/</link>
		<comments>http://people.w3.org/~dom/archives/2009/02/synchronizing-text-and-video/#comments</comments>
		<pubDate>Fri, 13 Feb 2009 12:31:32 +0000</pubDate>
		<dc:creator>Dom</dc:creator>
				<category><![CDATA[Video]]></category>
		<category><![CDATA[Work environment]]></category>
		<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://people.w3.org/~dom/?p=235</guid>
		<description><![CDATA[After having visited the land of transcription as my first stop in the world of Web video, the next logical step was to look into how this wonderful transcription of my video could actually be shown along with the video.
Transcriber, the tool I used to generate the captions of the video, saves the transcription into [...]]]></description>
			<content:encoded><![CDATA[<p>After having <a href="http://people.w3.org/~dom/archives/2009/02/diving-in-transcription/">visited the land of transcription</a> as my first stop in the <a href="http://people.w3.org/~dom/archives/2009/02/exploring-the-world-of-web-video/">world of Web video</a>, the next logical step was to look into how this wonderful transcription of my video could actually be shown along with the video.</p>
<p><a href="http://trans.sourceforge.net/en/presentation.php">Transcriber</a>, the tool I used to generate the captions of the video, saves the <a href="http://www.w3.org/2009/02/presentation-viewer/parisweb-transcriber.xml">transcription into its own XML format</a>:</p>
<pre ><code>&lt;Trans scribe="<span xml:lang="fr">Dominique Hazael-Massieux</span>"
  audio_filename="parisweb" version="5" version_date="090210" xml:lang="fr">
  &lt;Speakers>
    &lt;Speaker id="spk1" name="<span xml:lang="fr">Stéphane Deschamps</span>"
      check="no" type="male" dialect="native" accent="French" scope="local"/>
  &lt;/Speakers>
  &lt;Episode program="<span xml:lang="fr">ParisWeb 2007 - Les Bonnes Pratiques du Web Mobile</span>"
    air_date="2007-11-16">
    &lt;Section type="report" startTime="0" endTime="44.209">
      &lt;Turn startTime="0" endTime="19.933" speaker="spk1" mode="planned">
        &lt;Sync time="0"/>
        <span xml:lang="fr">Y'a quelque chose auquel on croit beaucoup à ParisWeb,</span>
        &lt;Sync time="3.458"/>
        <span xml:lang="fr">c'est "les standards, c'est bon, mangez-en",</span>
        &lt;Sync time="6.553"/>
        <span xml:lang="fr">c'est pour ça que cette association existe</span>
      &lt;/Turn>
    &lt;/Section>
  &lt;/Episode>
&lt;/Trans></code></pre>

<p>Transcriber can export its transcriptions to a variety of other formats (including HTML), but for the sake of exploring one of the technologies under development at W3C for that precise use case, <a href="http://www.w3.org/TR/ttaf1-dfxp/">Timed Text DFXP</a>, I started looking into transforming its XML format into Timed Text.</p>
<p>Another motivation was that <a href="http://home.gna.org/subtitleeditor/">Subtitle Editor</a>, the other tool I had looked at, can import and export Timed Text data; this also meant that the very same tool would let me quickly view the subtitles superimposed on the video, one of its advantages over Transcriber.</p>
<p>It turned out (unsurprisingly, I suppose) that converting between the two formats was really quite easy with <a href="http://www.w3.org/2009/02/presentation-viewer/transcriber2dfxp.xsl">an XSLT style sheet</a>; the main structural difference between them is that Transcriber marks break points with empty XML elements (<code>&lt;Sync&gt;</code> in the example above), while Timed Text wraps the transcribed content in elements (<code>&lt;span&gt;</code> or <code>&lt;p&gt;</code>).</p>
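<p>The heart of that conversion, turning milestone <code>&lt;Sync&gt;</code> elements into elements that wrap each piece of text, can be sketched in Python with the standard <code>xml.etree.ElementTree</code> module. This illustrates the technique and is not a port of the style sheet linked above: each segment starts at a <code>Sync</code> time, ends at the next one (or at the <code>Turn</code>&#8217;s <code>endTime</code>), and becomes a <code>p</code> element with <code>begin</code>/<code>end</code> attributes. The Timed Text namespace URI and the function names are assumptions.</p>

```python
import xml.etree.ElementTree as ET

# Assumption: namespace used by the DFXP drafts of that period; later
# versions of Timed Text use a different one.
TT_NS = "http://www.w3.org/2006/10/ttaf1"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def turn_to_segments(turn):
    """Split a Transcriber Turn element into (begin, end, text) tuples.

    ElementTree stores the text that follows each empty Sync element in
    that element's .tail, so each Sync plus its tail is one segment.
    """
    syncs = turn.findall("Sync")
    # A segment ends where the next one begins, or at the end of the Turn.
    ends = [sync.get("time") for sync in syncs[1:]] + [turn.get("endTime")]
    segments = []
    for sync, end in zip(syncs, ends):
        text = " ".join((sync.tail or "").split())  # normalize whitespace
        if text:
            segments.append((sync.get("time"), end, text))
    return segments


def segments_to_dfxp(segments, lang):
    """Build a minimal Timed Text tree with one p element per segment."""
    ET.register_namespace("", TT_NS)
    tt = ET.Element(f"{{{TT_NS}}}tt", {XML_LANG: lang})
    div = ET.SubElement(ET.SubElement(tt, f"{{{TT_NS}}}body"), f"{{{TT_NS}}}div")
    for begin, end, text in segments:
        # Offset times in seconds, e.g. begin="3.458s".
        p = ET.SubElement(div, f"{{{TT_NS}}}p", begin=f"{begin}s", end=f"{end}s")
        p.text = text
    return tt
```

<p>Fed with the <code>Turn</code> from the example above, this yields three <code>p</code> elements timed from 0s to 3.458s, from 3.458s to 6.553s, and from 6.553s to 19.933s.</p>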
<p>So, now that I had a <a href="http://www.w3.org/2009/02/presentation-viewer/parisweb.xml">Timed Text version of my transcription</a>, how did that help me put the transcribed video on the Web?</p>
<p>A quick look around the Web suggests that some video hosting services, including dotSub and Dailymotion but not (I think) YouTube, allow publishers to upload subtitles with their videos; as I have verified since, dotSub even supports importing and exporting subtitles in Timed Text format.</p>
<p>But I was curious to know how to include these subtitles when self-hosting a video; I had little hope of finding subtitle support through the classic <code>&lt;object></code> tag in HTML, but I was hoping that the new <a href="http://dev.w3.org/html5/spec/Overview.html#video"><code>&lt;video></code> element in HTML 5</a> would help solve that problem.</p>
<p>Unfortunately, it doesn&#8217;t, at least not out of the box, as of the February 12 draft:</p>
<blockquote cite="http://www.w3.org/TR/2009/WD-html5-20090212/#video">
<p>[&hellip;] authors are expected to provide alternative media streams and/or to embed accessibility aids (such as caption or subtitle tracks) into their media streams.</p>
</blockquote>
<p>That certainly seemed extremely suboptimal to me: having to download a whole video to access its transcript doesn&#8217;t sound like a good use of anyone&#8217;s bandwidth. <a href="http://blog.gingertech.net/2008/12/12/attaching-subtitles-to-html5-video/">Discussions on fixing this aspect of the HTML 5 spec have apparently started</a>, and they brought to my attention the work my colleague Philippe had started on <a href="http://www.w3.org/2008/12/dfxp-testsuite/web-framework/HTML5_player.js">a JavaScript-based Timed Text player</a> for HTML 5.</p>
<p>This was exactly what I needed, and I thus started to play with that code to embed subtitles of my video in an HTML page.</p>
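<p>At its core, a Timed Text player of this kind has to answer one question each time the video&#8217;s playback position changes: which caption, if any, is active right now? Assuming the captions have already been parsed into (begin, end, text) tuples in seconds, sorted by start time and not overlapping, that lookup can be a binary search over the start times, as in this minimal Python sketch. It illustrates the idea only and is not taken from Philippe&#8217;s player; <code>active_caption</code> is a hypothetical name.</p>

```python
import bisect


def active_caption(cues, t):
    """Return the text of the cue whose [begin, end) interval contains t, or None.

    `cues` is a list of (begin, end, text) tuples in seconds, sorted by
    begin time and non-overlapping.
    """
    begins = [begin for begin, _, _ in cues]
    # Index of the last cue starting at or before t (-1 if none does).
    i = bisect.bisect_right(begins, t) - 1
    if i >= 0:
        _, end, text = cues[i]
        if end > t:
            return text
    return None
```

<p>With the three segments from the Transcriber example, a position of 4 seconds falls in the second caption, while 25 seconds (past the last end time) yields <code>None</code>. A real player would compute the list of start times once rather than on every call.</p>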
<p>And this is what got me looking into <a href="http://people.w3.org/~dom/archives/2009/02/the-beauty-of-htmlmediaelementthe-beauty-of-htmlmediaelement/">why the new <code>&lt;video&gt;</code> element in HTML 5 is actually a game changer</a>, rather than just a nice wrapper around the existing functionality of <code>&lt;object&gt;</code>, which my next blog post will explore.</p>]]></content:encoded>
			<wfw:commentRss>http://people.w3.org/~dom/archives/2009/02/synchronizing-text-and-video/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Geographical site map</title>
		<link>http://people.w3.org/~dom/archives/2006/01/geographical-site-map/</link>
		<comments>http://people.w3.org/~dom/archives/2006/01/geographical-site-map/#comments</comments>
		<pubDate>Wed, 18 Jan 2006 16:58:57 +0000</pubDate>
		<dc:creator>Dom</dc:creator>
				<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://people.w3.org/~dom/archives/2006/01/geographical-site-map/</guid>
		<description><![CDATA[Inspired by MaxF&#8217;s recent cool photomap hack, I wrote my own version of the tool that works in a more general case: basically, you feed it an XHTML page, and it spiders every page linked from that page, extracts GeoURL data from them, and puts that data in a JavaScript file. When [...]]]></description>
			<content:encoded><![CDATA[<p>Inspired by <a href="http://people.w3.org/maxf/blog/?p=3">MaxF&#8217;s recent cool photomap hack</a>, I wrote <a href="/~dom/2006/photomap.xsl">my own version of the tool</a> that works in a more general case: basically, you feed it an XHTML page, and it spiders every page linked from that page, extracts <a href="http://geourl.org/add.html">GeoURL data</a> from them, and puts that data in a JavaScript file. When this JavaScript file is included in an HTML page, it inserts a Google Map with markers for the various pages.</p>
<p>See how it <a href="http://www.nimbustier.net/voyages/photomap.html">looks on my personal site</a>:<br /><img src="/~dom/2006/photomap-screenshot.png" alt="Screenshot of a page rendered using this tool" /></p>
<p>I chose to generate a separate JavaScript file, since having HTML embedded in JavaScript embedded in HTML does not play nicely with validation; it also makes the map easier to reuse in different pages.</p>
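<p>The extraction step can be illustrated outside XSLT with a rough Python sketch using the standard <code>xml.etree.ElementTree</code> and <code>json</code> modules. It assumes the GeoURL data sits in an <code>ICBM</code> <code>meta</code> element (<code>content="latitude, longitude"</code>, the form GeoURL asks for), and it groups the links found at the same location. The function names and the output shape (a <code>places</code> array) are hypothetical, not what the style sheet actually generates.</p>

```python
import json
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def geourl_entry(doc, href):
    """Return (lat, lon, href, title) for an XHTML page carrying GeoURL
    metadata in an ICBM meta element, or None when the page has none."""
    title = doc.findtext(f".//{XHTML_NS}title", default=href).strip()
    for meta in doc.iter(f"{XHTML_NS}meta"):
        if (meta.get("name") or "").lower() == "icbm":
            lat, lon = (float(v) for v in meta.get("content", "").split(","))
            return lat, lon, href, title
    return None


def places_script(entries):
    """Group entries sharing a location and emit them as a JavaScript array.

    `places` is a hypothetical variable name for the map code to read.
    """
    by_location = {}
    for lat, lon, href, title in entries:
        by_location.setdefault((lat, lon), []).append({"href": href, "title": title})
    places = [{"lat": lat, "lon": lon, "links": links}
              for (lat, lon), links in by_location.items()]
    return "var places = " + json.dumps(places, ensure_ascii=False) + ";\n"
```

<p>Pages without GeoURL data return <code>None</code> from <code>geourl_entry</code> and should be filtered out before calling <code>places_script</code>.</p>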
<p>The embedding HTML page needs to have the following elements:</p>
<ul>
<li>in the <code>&lt;head></code> element:
<pre><code>
&lt;script src="http://maps.google.com/maps?file=api&#038;v=1&#038;key=<var>GoogleMapAPIKey</var>"
type="text/javascript">&lt;/script>
</code></pre>
<p>where <var>GoogleMapAPIKey</var> is to be replaced by <a href="http://www.google.com/apis/maps/">your own API key</a></p></li>
<li>in the <code>&lt;body></code> element, at the place where the map must be generated:
<pre><code>
&lt;div id="map"
  style="width: 900px; height: 700px">
&lt;/div>
&lt;script type="text/javascript"
   src="<var>map.js</var>"&#038;;gt
&lt;/script>
</code></pre>
<p>where <var>map.js</var> is to be replaced by the location of the script generated by the XSLT.</p></li>
</ul>
<p>If you&#8217;re already marking up your pages with GeoURL and if your pages are in XHTML, a whole web site can be mapped very quickly on a Google Map using this system.</p>
<p>A few notes:</p>
<ul>
<li>the current style sheet deliberately doesn&#8217;t crawl URIs over the network, so it only works on a local copy of the site to be mapped</li>
<li>it doesn&#8217;t work on pages served as application/xhtml+xml; I don&#8217;t know whether Google Maps as a whole does (I know that Google AdSense <a href="http://www.cssplay.co.uk/menu/adsense.html">doesn&#8217;t work by default with this MIME type</a>)</li>
<li>ideally, there would be a way to generate a non-JavaScript version of the map for accessibility reasons; this tool doesn&#8217;t offer one</li>
</ul>
<p><ins datetime="2006-01-19"><strong>Update</strong> (Jan 19): I&#8217;ve updated the XSLT to make it use proper DOM nodes instead of <code>document.write</code>; it also now offers a list of links for a given location, instead of only the last link added for that location.</ins></p>]]></content:encoded>
			<wfw:commentRss>http://people.w3.org/~dom/archives/2006/01/geographical-site-map/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Links annotater</title>
		<link>http://people.w3.org/~dom/archives/2005/04/links-annotater/</link>
		<comments>http://people.w3.org/~dom/archives/2005/04/links-annotater/#comments</comments>
		<pubDate>Thu, 21 Apr 2005 16:23:52 +0000</pubDate>
		<dc:creator>Dom</dc:creator>
				<category><![CDATA[QA]]></category>
		<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://people.w3.org/~dom/archives/2005/04/links-annotater/</guid>
		<description><![CDATA[The Web is a formidable tool to host documentation; nothing new about that.
But documentation, be it on the Web or not, tends to rot when not maintained. Nothing new about that either.
While documentation maintenance is probably better addressed at the social-engineering level, there are tools that can help manage it; namely, a few weeks ago, [...]]]></description>
			<content:encoded><![CDATA[<p>The Web is a formidable tool to host documentation; nothing new about that.</p>
<p>But documentation, be it on the Web or not, tends to rot when not maintained. Nothing new about that either.</p>
<p>While documentation maintenance is probably better addressed at the social-engineering level, there are tools that can help manage it. A few weeks ago, the W3C Systems Team went through the process of cleaning up our internal documentation (on processes, tools, services, configurations, etc.), which sits on our Team-only Web site but is too rarely kept up to date with the latest developments.</p>
<p>To help fill the gap, I wrote a <a href="http://www.w3.org/2005/02/annotate-last-modification">small XSLT style sheet that annotates links</a> with information on how recently the linked pages were updated. The goal was to run it on our main documentation pages and spot, through a color code, the pages that hadn&#8217;t been updated recently.</p>
<p>The idea behind this tool, as illustrated when <a href="http://www.w3.org/2005/02/annotate?xmlfile=http%3A%2F%2Fwww.w3.org%2FQA%2F">applied to the QA Activity home page</a>, is to show the date of last modification of all pages linked from a given page. To make the most outdated pages easier to find, it also uses a color code, from pale yellow (most recent) to red (oldest), to denote how recently each linked page was updated.</p>

<p>I think the tool is useful as is, although it could use some user interface polishing. It could also provide some ideas for new features in the <a href="http://validator.w3.org/checklink">W3C link checker</a>, or the <a href="http://www.kevinfreitas.net/extensions/linkchecker/">linkchecker Firefox extension</a>.</p>

<p>Since it illustrates several interesting aspects of developing with HTTP and XSLT, I&#8217;m going to give a bit more detail on how it actually works.</p>

<h3>Getting the information from HTTP</h3>
<p>The links annotations rely on the <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.29"><code>Last-Modified</code> HTTP header</a>. This header, when set, indicates <q cite="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.29">the date and time at which the origin server believes the [page] was last modified</q>.</p>
<p>In other words, the annotations only give interesting results on servers that are properly configured to send this header. For a statically-served site on Apache (as most of the W3C Web site is), Apache sends it for static files out of the box, and enabling <code>ExpiresActive On</code> (mod_expires) complements it with expiration information (see the equivalent settings for <a href="http://www.mnot.net/cache_docs/#IMP-SERVER">other servers</a>); it&#8217;s a bit trickier to get right for dynamically-generated pages, but still <a href="http://www.mnot.net/cache_docs/#IMP-SCRIPT">very much doable</a>, and directly worth it in terms of better caching behavior from browsers and proxies. In any case, <a href="http://www.mnot.net/cache_docs/">Mark Nottingham&#8217;s caching tutorial</a> is one of the best references on this topic.</p>
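<p>Outside XSLT, reading this header takes only a few lines. This Python sketch, using only the standard library, sends a HEAD request, parses the <code>Last-Modified</code> date, and maps the page&#8217;s age to a CSS class for the color code. The age thresholds and class names are hypothetical, not the ones the style sheet uses; note that <code>urlopen</code> raises <code>HTTPError</code> on 4xx and 5xx responses, which a caller can treat as a broken link.</p>

```python
import email.utils
import urllib.request
from datetime import datetime, timezone

# Hypothetical color buckets: (age limit in days, CSS class), newest first.
AGE_CLASSES = [(30, "recent"), (180, "aging"), (365, "old")]


def fetch_last_modified(url, timeout=10):
    """Send a HEAD request; return Last-Modified as an aware datetime, or None.

    urlopen raises urllib.error.HTTPError for 4xx/5xx statuses, which a
    caller can treat as a broken link.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        value = response.headers.get("Last-Modified")
    return email.utils.parsedate_to_datetime(value) if value else None


def age_class(last_modified, now=None):
    """Map the age of a page to a CSS class, from most recent to oldest."""
    now = now or datetime.now(timezone.utc)
    age_days = (now - last_modified).days
    for max_days, css_class in AGE_CLASSES:
        if max_days > age_days:
            return css_class
    return "ancient"
```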

<h3>Using XSLT to get and show the results</h3>
<p>So, now that we know where the information is, how do we access it? Getting at HTTP headers is usually pretty easy in most HTTP-aware programming languages, but XSLT offers no way to do so by default. (Why use XSLT then? Because it is so damn easy to use for parsing XML!)</p>
<p>To circumvent this limitation, I&#8217;m using what is, in my opinion, the most powerful way to extend XSLT: the <code>document()</code> function combined with HTTP GET.</p>
<p>Indeed, XSLT lets you import and process content from documents other than the one being processed, through the <code>document()</code> function, which takes the URI of these other documents as a parameter. As this URI can be built at runtime (i.e. it can be the result of another expression), you can actually retrieve the content of resources based on the main document being processed.</p>
<p>In this case, we&#8217;re going to use this flexibility to get the HTTP headers from a <a href="http://cgi.w3.org/cgi-bin/headers?">completely separate tool</a>: as this tool takes its <code>url</code> parameter through HTTP GET and outputs its results in XHTML, one can parse in XSLT the results for any HTTP URI by appending that URI to <code>http://cgi.w3.org/cgi-bin/headers?url=</code>, i.e. with <code>document(concat('http://cgi.w3.org/cgi-bin/headers?url=',<var>$uri</var>))</code>.</p>
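<p>One caveat with plain concatenation: if <var>$uri</var> itself contains a query string or a fragment, its <code>&amp;</code> or <code>#</code> characters end up delimiting the service URI instead of staying inside the <code>url</code> parameter. XSLT 1.0 has no built-in escaping function (XPath 2.0 adds <code>encode-for-uri()</code>). Here is a small Python sketch of building the request URI with percent-encoding through the standard <code>urllib.parse.quote</code>; the function name is hypothetical, and it assumes the service decodes its parameter the usual way for CGI query strings.</p>

```python
from urllib.parse import quote

# Base URI of the headers service quoted in the text.
HEADERS_SERVICE = "http://cgi.w3.org/cgi-bin/headers?url="


def headers_service_uri(uri):
    """Return the service URI for `uri`, escaping every reserved character
    so that the whole target URI stays inside the `url` parameter."""
    return HEADERS_SERVICE + quote(uri, safe="")
```

<p>For instance, <code>headers_service_uri("http://www.w3.org/QA/")</code> gives <code>http://cgi.w3.org/cgi-bin/headers?url=http%3A%2F%2Fwww.w3.org%2FQA%2F</code>, the same encoding used in the QA Activity example link above.</p>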

<p>Since this ability to access HTTP headers has been useful to me more than once, a while back I created a <a href="http://www.w3.org/2001/11/http-head.xsl">small XSLT interface</a> that defines a set of named templates I can import into new style sheets with <code>&lt;xsl:import></code>. Over the years, I have gathered <a href="http://www.w3.org/2001/10/xslt-toolbox">a few of these interfaces</a>, which considerably reduce the development effort for small and complex tools alike.</p>

<p>In practice, the code of the main XSLT does the following processing:</p>
<ol>
<li><p>by default, we keep everything as is through the identity transformation:</p>
<pre><code>&lt;!-- default: Identity Transformation -->
&lt;xsl:template match="*|@*|comment()|text()">
  &lt;xsl:copy>
    &lt;xsl:apply-templates select="*|@*|comment()|text()"/>
  &lt;/xsl:copy>
&lt;/xsl:template></code></pre>
<p>(there are simpler ways to write the identity transformation, such as matching <code>@*|node()</code>, but they would trip over a bug in the XSLT servlet this tool runs on)</p></li>
<li><p>since we&#8217;re interested in visible links to HTTP resources, we process each of them through:</p>
<pre><code>&lt;xsl:template match="html:a[@href and (starts-with(@href,'http:') or not(contains(@href,':')))]"></code></pre>
<p>(i.e. what follows applies to all the <code>a</code> elements that have an <code>href</code> attribute which either starts with <code>http:</code> or doesn&#8217;t contain a URI scheme, and is thus a relative HTTP URI)</p></li>
<li>then, we transform the URI of the link into an absolute URI (using <a href="http://www.w3.org/2000/07/uri43/uri.xsl">another imported XSLT</a>)</li>
<li>we check its HTTP status code to detect broken links, and for working links, we extract the <code>Last-Modified</code> header</li>
<li>the rest of the code deals with displaying only the interesting part of this header (namely the date, since the day of the week and the time are less likely to be useful) and associating it with an HTML class, so as to facilitate the color coding through CSS</li>
</ol>
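<p>Steps 2 to 5 can be mimicked in Python with <code>xml.etree.ElementTree</code> and <code>urllib.parse.urljoin</code>, which makes the link-selection rule and the URI absolutization easy to test in isolation. This is a hypothetical sketch rather than a translation of the style sheet: the <code>lookup</code> callback stands in for the HTTP header retrieval, and the annotation markup (a <code>span</code> with a class, appended inside each link) is an assumption.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XHTML = "{http://www.w3.org/1999/xhtml}"


def is_http_link(href):
    """Same test as the match pattern: an http: URI, or a URI with no scheme."""
    return href.startswith("http:") or ":" not in href


def annotate_links(root, base_uri, lookup):
    """Add a date annotation inside every HTTP link of an XHTML tree.

    `lookup` (a caller-supplied, hypothetical callback) maps an absolute URI
    to a (date, css_class) pair, or to None for a broken link. All other
    nodes are left untouched, as with the identity transformation.
    """
    # Collect the links first so the tree is not modified while iterating it.
    for a in list(root.iter(XHTML + "a")):
        href = a.get("href")
        if href is None or not is_http_link(href):
            continue
        info = lookup(urljoin(base_uri, href))  # relative URI made absolute
        note = ET.SubElement(a, XHTML + "span")
        if info is None:
            note.set("class", "broken")
            note.text = " (broken link)"
        else:
            date, css_class = info
            note.set("class", css_class)
            note.text = f" ({date})"
    return root
```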
<p>Et voilà for the processing!</p>
<p>The last bit is to provide an HTML interface to it; to that end, I&#8217;m using a CSS trick on the XSLT style sheet itself (a trick that doesn&#8217;t work in Internet Explorer, last time I heard). The XHTML interface is directly embedded inside the XSLT, as a child of the root element, where XSLT processors silently ignore elements from namespaces they don&#8217;t recognize. Then, with a <a href="http://www.w3.org/2002/02/style-xsl.css">CSS style sheet</a>, I make sure the content of the templates is not displayed, so that by default browsers only display the embedded XHTML.</p>
<p>That doesn&#8217;t preclude building a separate HTML interface, as I have ended up doing for other tools; but I have always liked the idea (stolen from <a href="http://www.w3.org/People/Connolly/">DanC</a>, as far as I can remember) of embedding HTML in my XSLT style sheets, if only as a way to document what they are for and how to use them.</p>
]]></content:encoded>
			<wfw:commentRss>http://people.w3.org/~dom/archives/2005/04/links-annotater/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
