Don’t call me DOM

13 February 2009

The beauty of HTMLMediaElement

So, while exploring the world of Web video, after having successfully transcribed a one-hour-long video of one of my presentations and turned that transcription into an HTML 5 video with subtitles, I started to look in more detail at what HTML 5 brought to the table that made this synchronization possible.

The rather obvious change that HTML 5 brings is the HTMLMediaElement DOM interface, and in particular its currentTime property, which at any time reflects the current playback position in the media content.

This means that you can synchronize any part of your HTML page with the video, as well as navigate through the video by setting the property to the start time of the section you want to play!
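As a minimal sketch of that idea (not the code of the actual viewer): the section list, its titles and the function names below are hypothetical, and the only media API assumed is the currentTime property, so any object exposing it can stand in for the video element.

```javascript
// Hypothetical section list; times are in seconds from the start of the video.
const sections = [
  { title: "Introduction", begin: 0, end: 44.209 },
  { title: "First topic", begin: 44.209, end: 88.432 },
];

// Return the section that contains the given playback position, or null.
function findActiveSection(sectionList, currentTime) {
  for (const section of sectionList) {
    if (currentTime >= section.begin && currentTime < section.end) {
      return section;
    }
  }
  return null;
}

// Navigate to a section by assigning its start time to currentTime.
// `media` is anything with a writable currentTime, e.g. a <video> element.
function seekToSection(media, section) {
  media.currentTime = section.begin;
}
```

In a page, media would be the video element itself (for instance the result of document.querySelector("video")), and findActiveSection(sections, media.currentTime) would tell which slide or note to show at that moment.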

And since I had already gathered a lot of timing information in the transcript of the video, extracting meaningful timings of the various sequences of the video was again only an XSLT style sheet away, provided I added relevant metadata in the transcription: typically, identifying subsections as <div> in the timedtext transcript, with a ttm:title set (which I achieved directly through my transcribing tool, Transcriber, that has all the needed interfaces to set these metadata).

And so I wrote that XSLT, and added some further out-of-band metadata linking to the slides and additional notes that I wanted to include in my presentation viewer (more details on the process involved are available).

The fact that I couldn’t embed these additional data in TimedText is actually quite disappointing – that a Web format should be developed without any way to add hyperlinks seems quite wrong! Generally speaking, it’s not clear to me that timedtext should be anything other than a set of additional timing attributes on top of XHTML – but I can’t claim that I have explored that space sufficiently to give much weight to that assertion.

Given that these metadata were not stored in the TimedText file, I ended up embedding them in the resulting HTML page; it occurred to me that the best way to store them there was to use the extremely experimental media fragment syntax within an RDFa description of the table of contents, e.g.:

<ul class="toc">
            <li about=",00:01:28.432">
               <a target="slides" rel="foaf:depiction"  

This essentially annotates a given section of the video (#t=00:00:44.209,00:01:28.432 meaning between 44.209 seconds after the start of the video and 1 minute 28.432 seconds after the start) with a title and an illustration (in this case, the accompanying slide) – I chose foaf:depiction as a property, but it probably isn’t the best match – I’m hoping the Media Annotations Working Group will come up with a useful ontology that could be used in these types of contexts.
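To make the time arithmetic concrete, here is a small sketch of how such a fragment could be turned into seconds. It only handles the "#t=hh:mm:ss.mmm,hh:mm:ss.mmm" form shown above, which is an assumption based on this one example and not a full implementation of the (experimental) media fragment syntax; the function names and the video.ogv file name are hypothetical.

```javascript
// Convert a "hh:mm:ss" or "hh:mm:ss.mmm" clock value into seconds.
function clockToSeconds(clock) {
  const [hours, minutes, seconds] = clock.split(":").map(Number);
  return hours * 3600 + minutes * 60 + seconds;
}

// Parse a URI ending in "#t=begin,end" into { begin, end } in seconds.
// Returns null when the URI does not end with that form.
function parseTimeFragment(uri) {
  const clock = "\\d{2}:\\d{2}:\\d{2}(?:\\.\\d+)?";
  const pattern = new RegExp("#t=(" + clock + "),(" + clock + ")$");
  const match = pattern.exec(uri);
  if (!match) {
    return null;
  }
  return {
    begin: clockToSeconds(match[1]),
    end: clockToSeconds(match[2]),
  };
}

// Hypothetical file name; only the fragment part comes from the example above.
const range = parseTimeFragment("video.ogv#t=00:00:44.209,00:01:28.432");
```

Here range.begin is 44.209 and range.end is about 88.432 (1 minute plus 28.432 seconds, up to floating-point rounding), which are the values a script can compare against currentTime.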

These annotations are then parsed by a small JavaScript layer (built on top of jQuery) which reproduces most of what the TimedText JavaScript player does, but in a much less verbose way… – another reason to wish that timedtext were really just XHTML.

The resulting presentation viewer lets you navigate through the video, with synchronized slides, notes and subtitles, provided your browser supports the HTMLMediaElement interface, as Firefox 3.1 does:

(also available as Ogg/Theora video with a Timed Text transcript.)

It also carries a set of RDF annotations to the video itself.

(I discovered only today that apparently Ian Hickson made a very similar demonstration a few months ago)

I must confess that I’m not quite sure that the accessibility of the resulting page is great – it uses the <object> element to load external pages (the slides and notes), while it should probably include their content automatically through AJAX, with a pinch of WAI ARIA to announce page updates.

The AJAX inclusion of content would be much facilitated by scoped style sheets.

My conclusion from this exploration is that the new HTMLMediaElement DOM interface is clearly of great importance to really bring video (and similarly audio) into the Web browser; I can see several ways it could be improved to make it much easier to create synchronization effects:

  • using some sort of timer callback interface with a begin and end period – currently, you have to use the generic setInterval function to poll every tenth of a second and check whether you are in a period of the video where something should happen; ideally, you would just say video.setTimer(callback_function,begin,end), and callback_function would be called each time the video enters the period of time between begin and end. Err… it seems that’s exactly what addCueRange is about… I guess what I really need is having it implemented :)
  • ensuring that the HTMLMediaElement interface is applied to any element where a time-based animation is used: being able to use it on SVG animations, Flash animations, and maybe even animated GIFs (!) sounds as useful as on video and audio; maybe this “just” means that the <video> element implementations should support image/svg+xml and image/gif as acceptable media types?
  • it seems really backward that any JavaScript layer be required at all to run synchronized subtitles, and the <video> element should clearly support linking media content and their transcript in a uniform way;
  • it would be really neat if the media fragment URIs could in turn be used directly to go to a particular section of a video included in the page, without the JavaScript layer.
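The polling workaround mentioned in the first point can be sketched roughly as follows. This is not the addCueRange API nor the code of the viewer; createCueWatcher and the shape of the range objects are hypothetical names chosen for illustration, and the only media API assumed is currentTime.

```javascript
// Build a checker over a list of ranges shaped like
// { begin, end, onEnter, onExit }, with times in seconds.
// The returned function is meant to be called regularly with the
// current playback position; it fires onEnter/onExit on transitions only.
function createCueWatcher(ranges) {
  const active = new Set(); // ranges the playback position is currently inside
  return function check(currentTime) {
    for (const range of ranges) {
      const inside = currentTime >= range.begin && currentTime < range.end;
      if (inside && !active.has(range)) {
        active.add(range);
        if (range.onEnter) range.onEnter(range);
      } else if (!inside && active.has(range)) {
        active.delete(range);
        if (range.onExit) range.onExit(range);
      }
    }
  };
}

// Browser usage, assuming `video` references a <video> element:
//   const check = createCueWatcher(ranges);
//   setInterval(function () { check(video.currentTime); }, 100);
```

Keeping the time comparison separate from the timer means the same checker works whether it is driven by setInterval every tenth of a second, as described above, or by whatever notification mechanism the browser offers.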

2 Responses to “The beauty of HTMLMediaElement”

  1. karl Says:

    >that a Web format should be developed without any way to add hyperlinks seems quite wrong!

    One of my criteria for knowing if a format is a Web format is specifically: does the format have hyperlinking capabilities? If not, it can’t play in the Web. Example: PNG and JPEG are not Web formats but formats used on the Web.

  2. Mark Birbeck Says:

    Hi Dominique,

    I’ve just come across your excellent post, because it has been referred to in an RDFa discussion. I have nothing to add to any of that side of your post, other than it’s a very interesting use-case.

    But what I did want to quickly flag up was that the media fragment syntax you referred to might be better constructed using the XPointer framework.

    There are many ways it could be done, but here is just one example of how your URL might look:,tEnd=00:01:28.432)

    It’s only a minor change, to push the parameters inside the parentheses, but since there is an increasing need to pass information to the client application, rather than the server, I’m finding that XPointer [1] gives us a very useful way to do this [2, 3].



    [2] “Passing run-time parameters to internet applications” (
    [3] “Passing run-time parameters to UBX Viewer, via the URL” (

    Mark Birbeck

Dominique Hazaël-Massieux is part of the World Wide Web Consortium (W3C) Staff; his interests cover a number of Web technologies, as well as the usage of open source software in a distributed work environment.