Don’t call me DOM

13 February 2009

Synchronizing text and video

Having visited the land of transcription as my first stop in the world of Web video, my next logical step was to look into how this wonderful transcription of my video could actually be shown along with the video.

Transcriber, the tool I used to generate the captions of the video, saves the transcription into its own XML format:

<Trans scribe="Dominique Hazael-Massieux"
  audio_filename="parisweb" version="5" version_date="090210" xml:lang="fr">
  <Speakers>
    <Speaker id="spk1" name="Stéphane Deschamps"
      check="no" type="male" dialect="native" accent="French" scope="local"/>
  </Speakers>
  <Episode program="ParisWeb 2007 - Les Bonnes Pratiques du Web Mobile"
    air_date="2007-11-16">
    <Section type="report" startTime="0" endTime="44.209">
      <Turn startTime="0" endTime="19.933" speaker="spk1" mode="planned">
        <Sync time="0"/>
        Y'a quelque chose auquel on croit beaucoup à ParisWeb,
        <Sync time="3.458"/>
        c'est "les standards, c'est bon, mangez-en",
        <Sync time="6.553"/>
        c'est pour ça que cette association existe
      </Turn>
    </Section>
  </Episode>
</Trans>

Transcriber can export the transcription in a variety of other formats (including HTML), but for the sake of exploring one of the technologies in development at W3C for that precise use case, Timed Text DFXP, I started to look into transforming Transcriber's XML format into Timed Text.

Another motivation was that Subtitle Editor, the other tool I had looked at, can import and export Timed Text data; this meant that the same tool would let me quickly visualize the subtitles superimposed on the video, one of its advantages over Transcriber.

It turned out (unsurprisingly, I suppose) that converting between the two formats was quite easy with an XSLT style sheet; the main structural difference is that Transcriber marks break points with empty XML elements (<Sync> in the example above), while Timed Text wraps the transcribed content in elements (<span> or <p>).
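To make that structural difference concrete, here is a minimal sketch of the same idea in Python rather than XSLT; it is not the style sheet I actually used. It assumes each caption is the text that follows a <Sync> inside a <Turn>, that a caption ends where the next <Sync> begins (or at the <Turn>'s endTime for the last one), and that the output uses the namespace of the later TTML 1.0 Recommendation, which differs from the one in the DFXP drafts. The function name and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: TTML 1.0 namespace; DFXP drafts used a different one.
TT_NS = "http://www.w3.org/ns/ttml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def transcriber_to_timed_text(trans_xml: str) -> str:
    """Turn Transcriber <Sync/> milestones into Timed Text <p> elements."""
    trans = ET.fromstring(trans_xml)
    ET.register_namespace("", TT_NS)  # serialize TT elements without a prefix

    tt = ET.Element(f"{{{TT_NS}}}tt")
    if trans.get(XML_LANG):
        tt.set(XML_LANG, trans.get(XML_LANG))
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    div = ET.SubElement(body, f"{{{TT_NS}}}div")

    for turn in trans.iter("Turn"):
        syncs = turn.findall("Sync")
        for i, sync in enumerate(syncs):
            # The caption is the text that follows the milestone element.
            text = (sync.tail or "").strip()
            if not text:
                continue
            # It lasts until the next milestone, or the end of the turn.
            if i + 1 < len(syncs):
                end = syncs[i + 1].get("time")
            else:
                end = turn.get("endTime")
            p = ET.SubElement(div, f"{{{TT_NS}}}p")
            p.set("begin", f"{sync.get('time')}s")
            p.set("end", f"{end}s")
            p.text = text

    return ET.tostring(tt, encoding="unicode")


# Hypothetical, shortened input in the shape of the example above.
SAMPLE = """<Trans xml:lang="fr"><Episode>
  <Section type="report" startTime="0" endTime="44.209">
    <Turn startTime="0" endTime="19.933" speaker="spk1">
      <Sync time="0"/>
      Y'a quelque chose auquel on croit beaucoup à ParisWeb,
      <Sync time="3.458"/>
      c'est "les standards, c'est bon, mangez-en",
    </Turn>
  </Section>
</Episode></Trans>"""

if __name__ == "__main__":
    print(transcriber_to_timed_text(SAMPLE))
```

Each <Sync> milestone becomes the begin attribute of a <p>, and the following milestone supplies its end, which is the whole of the conversion as far as timing is concerned; speaker names and section boundaries are ignored in this sketch.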

So, now that I had a Timed Text version of my transcription, how did that help me put the transcribed video on the Web?

From a quick look around the Web, it seems that some video hosting services, including dotSub and Dailymotion but not (I think) YouTube, allow publishers to upload subtitles with their videos; as I have verified since, dotSub even supports importing and exporting subtitles in Timed Text format.

But I was curious to know how to include these subtitles in a self-hosted video situation; I had little hope of finding subtitle support through the classic <object> tag in HTML, but I was hoping that the new <video> element in HTML 5 would help solve that problem.

Unfortunately, it doesn’t out of the box, as of the draft dated February 12:

[…] authors are expected to provide alternative media streams and/or to embed accessibility aids (such as caption or subtitle tracks) into their media streams.

That certainly seemed extremely suboptimal to me: having to download a whole video to access its transcript doesn’t sound like a good use of anyone’s bandwidth. Discussions on fixing that state of the HTML 5 spec have apparently started, and they brought to my attention the work that my colleague Philippe had started on a JavaScript-based Timed Text player for HTML 5.

This was exactly what I needed, and I thus started to play with that code to embed subtitles of my video in an HTML page.

And this is what got me started looking into why the new <video> element in HTML 5 is actually a game changer, rather than just a nice wrapper around the existing functionality of <object>; that is what my next blog post will look into.


Dominique Hazaël-Massieux (dom@w3.org) is part of the World Wide Web Consortium (W3C) Staff; his interests cover a number of Web technologies, as well as the usage of open source software in a distributed work environment.