So, while exploring the world of Web video, after having successfully transcribed a one-hour-long video of one of my presentations and turned that transcription into an HTML 5 video with subtitles, I started to look in more detail at what HTML 5 brought to the table that made this synchronization possible.
The most obvious change is the
HTMLMediaElement DOM interface, and in particular its
currentTime property, which at any time reflects the position in the media content that is currently playing.
This means you can synchronize any part of your HTML page with the video, as well as navigate through the video by setting the property to the time of the section you want to play!
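As a minimal sketch of that second point (the element id, function name and target time are made up for illustration):

```javascript
// Jump to a given point in the video: navigating is just an
// assignment to the currentTime property of the media element.
function seekTo(video, seconds) {
  video.currentTime = seconds;
}

// e.g. jump to the start of the first sequence of the talk:
// seekTo(document.getElementById("talk"), 44.209);
```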
And since I had already gathered a lot of timing information in the transcript of the video, extracting meaningful timings for the various sequences of the video was again only an XSLT style sheet away, provided I added the relevant metadata to the transcription: typically, identifying subsections as
<div> elements in the timedtext transcript, with a
ttm:title set (which I achieved directly through my transcribing tool, Transcriber, which has all the needed interfaces to set these metadata).
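For reference, such an annotated subsection might look roughly like the following fragment (the timing values and title are made up for illustration):

```
<div begin="00:00:44.209" end="00:01:28.432">
  <metadata><ttm:title>Introduction</ttm:title></metadata>
  ...
</div>
```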
And so I wrote that XSLT style sheet, and added some further out-of-band metadata linking to the slides and additional notes that I wanted to include in my presentation viewer (more details on the process involved are available).
The fact that I couldn’t embed these additional data in TimedText is actually quite disappointing – that a Web format should be developed without any way to add hyperlinks seems quite wrong! Generally speaking, it’s not clear to me that timedtext should be anything other than a set of additional timing attributes on top of XHTML – but I can’t claim that I have explored that space sufficiently to give much credit to that assertion.
Given that these metadata were not stored in the TimedText file, I ended up embedding them in the resulting HTML page; it occurred to me that the best combination to store them there was to use the extremely experimental media fragments syntax within an RDFa description of the table of contents, e.g.:
<ul class="toc">
  <li about="http://media.w3.org/2007/11/parisweb-dom.ogv#t=00:00:44.209,00:01:28.432">
    <a target="slides" rel="foaf:depiction" property="dc:title"
       href="http://www.w3.org/2007/Talks/11-parisweb/slide-1.html">
      Introduction
    </a>
  </li>
</ul>
This essentially annotates a given section of the video (
#t=00:00:44.209,00:01:28.432 meaning between 44.209 seconds after the start of the video and 1 minute 28.432 seconds after the start) with a title and an illustration (in this case, the accompanying slide) – I chose
foaf:depiction as a property, but it probably isn’t the best match – I’m hoping that the Media Annotations Working Group will come up with a useful ontology that could be used in these types of contexts.
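For what it’s worth, here is a quick sketch of how such a temporal fragment could be decoded in script (the function names are mine, and the media fragments syntax itself was still very much in flux at the time):

```javascript
// Parse an "hh:mm:ss.sss" timestamp (as used in the #t=start,end
// fragments above) into a number of seconds.
function parseClockTime(value) {
  var parts = value.split(":");
  return parseInt(parts[0], 10) * 3600 +
         parseInt(parts[1], 10) * 60 +
         parseFloat(parts[2]);
}

// Parse the "#t=start,end" part of a media fragment URI into a pair
// of offsets in seconds.
function parseTemporalFragment(hash) {
  var times = hash.replace(/^#t=/, "").split(",");
  return { start: parseClockTime(times[0]), end: parseClockTime(times[1]) };
}
```

With the fragment above, this would yield a start of 44.209 seconds and an end of 88.432 seconds, ready to be compared against currentTime.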
The resulting presentation viewer lets you navigate through the video, with synchronized slides, notes and subtitles, provided your browser supports the
HTMLMediaElement interface, as Firefox 3.1 does:
It also carries a set of RDF annotations to the video itself.
(I discovered only today that Ian Hickson apparently made a very similar demonstration a few months ago.)
I must confess that I’m not quite sure the accessibility of the resulting page is great – it uses the <object> element to load external pages (the slides and notes), while it should probably include their content dynamically through AJAX, with a pinch of WAI-ARIA to alert users of page updates.
The AJAX inclusion of content would be much facilitated by scoped style sheets.
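Such an AJAX-based inclusion might look roughly like the following (the split between the two functions and the aria-live assumption are only a sketch of the idea, not what the viewer currently does):

```javascript
// Inject new content into a pane; the container is assumed to carry
// aria-live="polite" in the HTML, so that assistive technologies
// announce the update.
function updatePane(container, html) {
  container.innerHTML = html;
}

// Fetch the content of an external page (e.g. a slide) and inject it
// into the given container instead of loading it via <object>.
function loadPane(container, url) {
  var req = new XMLHttpRequest();
  req.open("GET", url, true);
  req.onreadystatechange = function () {
    if (req.readyState === 4 && req.status === 200) {
      updatePane(container, req.responseText);
    }
  };
  req.send(null);
}
```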
My conclusion from this exploration is that the new
HTMLMediaElement DOM interface is clearly of great importance to really bring video (and similarly audio) into the Web browser; I can also see how it could be improved to make it much easier to create synchronization effects:
- using some sort of timer callback interface with a begin and end period – currently, you have to use the generic
setInterval function, polling every tenth of a second to check whether you are in a period of the video where something should happen; ideally, you would just register a
callback_function to be called each time the video enters a given period of time;
- ensuring that the HTMLMediaElement interface is applied to any element where a time-based animation is used: being able to use it on SVG animations, Flash animations, and maybe even animated GIFs (!) sounds as useful as on video and audio; maybe this “just” means that
<video> element implementations should support
image/gif as an acceptable media type?
- the <video> element should clearly support linking media content and its transcript in a uniform way.