W3C Team blog: html 5: doctype to version

Discussions about the need for versioning in HTML 5 come up at regular intervals. The issue breaks down into a few points, including identification of the language itself for different kinds of user agents and parser libraries. A while ago, I published an overview of different methods to identify the HTML language.

Michael A. Puls II posted an interesting mail about doctype this morning. He writes:

In Opera, goto opera:config#CompatMode%20Override and set the CompatMode Override to 2. This will make Opera always use standards mode for HTML even if there isn't a doctype.

So,

<html>
  <head>
    <meta charset="utf-8">
    <title>HTML5 document</title>
  </head>
  <body>
    <p>This is an HTML5 document.</p>
  </body>
</html>

would be rendered in standards mode.

He warns people that:

What you will see is that on some sites, forcing standards mode breaks pages.

But he adds something that is interesting to me (emphasis is mine):

Now, for HTML5 pages specifically, browsers could always force standards mode, but without the HTML5 doctype, there would have to be something like <html version="5"> so browsers would know when to force standards mode. However, <!DOCTYPE html> already accomplishes that behavior now.

An explicit marker of this kind would help HTML validators and syntax checkers identify the intent of the author. It would also make converters easier to write, and it would make life easier for people who want to move back and forth between HTML and XML.
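To make the two signals concrete, here is a minimal sketch of how a tool might look for either marker in a markup string. The function name detectHtml5Marker and the regular expressions are illustrative assumptions, and version="5" is only the hypothetical attribute from the quoted mail, not part of any specification. A real validator or converter would rely on a proper HTML parser rather than regular expressions.

```javascript
// Hypothetical helper: reports which HTML5 markers a markup string carries.
function detectHtml5Marker(markup) {
  // <!DOCTYPE html> with no public or system identifier, in any letter case.
  const doctype = /^\s*<!doctype\s+html\s*>/i.test(markup);
  // version="5" on the root element (the hypothetical attribute from the mail).
  const versionAttr =
    /<html\b[^>]*\sversion\s*=\s*(?:"5"|'5'|5(?=[\s>]))/i.test(markup);
  return { doctype, versionAttr, html5: doctype || versionAttr };
}

console.log(detectHtml5Marker('<!DOCTYPE html><html><body></body></html>'));
console.log(detectHtml5Marker('<html version="5"><body></body></html>'));
```

Either marker alone is enough for the sketch to report html5 as true, which mirrors the point that <!DOCTYPE html> already does the job the version attribute would do.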

Lachlan Hunt: Life in Norway

As many of you know, I arrived in Norway on October 3rd this year, almost 2 months ago. Since then, I haven’t found much time to update this blog with new content. However, I always try to post at least one post per month, so here I am, on November 30th at 23:10 UTC writing this month’s entry. With just under an hour to go, I’m rushing to get this done.

Anyway, the last two months have been exciting. Working for Opera has been really fun and I’m starting to adjust to my new life in Norway reasonably well. Although I can’t yet tell you everything, I’m working in the Core QA department, finding and analysing bugs and working on some other really cool projects.

I have, of course, continued my involvement with the W3C HTML, Web API and WAF working groups. The HTML WG recently published the first public working draft of the HTML Design Principles. This is actually a really important document, since it outlines the core principles and design aims behind HTML5. Unfortunately, we have yet to formally publish a first public working draft of the spec itself (the editor’s draft is always available), but we are really hoping we can resolve the remaining issues sooner rather than later.

I’m currently living in an apartment, sharing with three other Opera employees. Some photos of this place have been published on my flickr account. My bedroom is a little bit small (it’s only 1.7m × 4.7m), but somehow I’ve managed to fit everything in.

One thing I’ve noticed here, which has required a fair bit of adjustment, is that almost everything is backwards here! Seriously! Here are just a few things to which I’ve had to adjust:

  • They drive on the right instead of the left. This is a problem because I instinctively look in the wrong direction before crossing the road and it’s not easy to break the habit.
  • People generally walk on the right as well, so I have to try to remember to move right instead of left to avoid running into people.
  • Escalators in shopping centres are ridden on the right as well.
  • While traffic lights are red, they briefly flash orange before changing to green. For those who don’t know, in Australia, the orange is only used while changing from green to red.
  • They have zebra crossings here, some with pedestrian lights! In Australia, zebra crossings mean that the cars must give way to pedestrians. Here, at the zebra crossings without lights, cars don’t always seem to stop when they see someone about to cross, only when the pedestrian is actually on the crossing. (Maybe they’re just bad drivers, I’m not sure of the rules)
  • Pedestrian lights flash green before turning red. In Australia, it’s the red that flashes before turning on fully.
  • They have both Norwegian and English TV shows here. They put Norwegian subtitles on the English shows and no subtitles on the Norwegian shows.

There’s probably more, but I can’t remember them all.

It started snowing here a few days ago, though it’s still a little warm and it melts away during the afternoon. I’m looking forward to the ski season starting sometime in the next few weeks. Although I wear warm clothes while walking to work, I generally walk around the office during the day wearing shorts and t-shirt. The office is just too warm to be wearing long pants or a jumper, yet some people are still surprised to see me like that.

As those of you who’ve been following my twitter posts would know, I recently bought myself a 17" MacBook Pro. I absolutely love it and I’m finding myself using it more and more instead of my PC. OS X Leopard rules! Windows Vista sucks. Seriously, I’ve never seen an OS so bad and unusable. It’s unbelievable how annoyed I got with it after only 5 minutes of use on a friend’s laptop.

Anyway, my time is almost up to get this published before the end of the day, so that’s it for now.

Justin Thorp: jthorp

Hypertext Markup Language (HTML) is the language of the World Wide Web, but it has been a while since it was updated to match where the world is today. At the W3C, there is a working group looking at the future of HTML.

As we move forward, there needs to be a set of principles which guides the decisions we make. This is why the HTML Working Group published a first public working draft of the HTML Design Principles.

Here’s my summary and translation…

  1. Be backwards compatible. Don’t break existing Web content.
  2. When a new feature isn’t supported by a browser, degrade gracefully.
  3. Consider what already exists before trying to make something new.
  4. Look at what current best practices already exist.
  5. It’s better to evolve a standard because then you don’t have to reteach and redo everything.
  6. Don’t do something for the sake of doing it.
  7. Don’t make the Web insecure.
  8. Have HTML elements behave in ways that authors can depend on.
  9. Be simple in creating a solution.
  10. The Web is filled with tag soup. Show how to deal with those errors.
  11. HTML should work across devices, environments, and platforms.
  12. Be publishable in the world’s languages.
  13. Be accessible to people with disabilities.

Was anything missed?  The HTML Working Group needs your feedback.

Sam Ruby: HTML5 needs a CarterPhone

Brendan Eich: Standards often are made by insiders, established players, vendors with something to sell and so something to lose. Web standards bodies organized as pay-to-play consortia thus leave out developers and users, although vendors of course claim to represent everyone fully and fairly.  I’ve worked within such bodies and continue to try to make progress in them, but I’ve come to the conclusion that open standards need radically open standardization processes.

The W3C HTML Working Group needs a CarterPhone.  Clearly, Brendan is talking about ES4, but the issues he brings up are general.

When I was growing up, all phones I ever used were the property of AT&T.  It takes a totally unified system to make it all work. One system. AT&T.  Yes, that gave us Princess Phones, but not answering machines, fax machines, cordless phones, or computer modems.

What are the four format freedoms that correspond to the four net freedoms?

In the summer of 2000, the notion of defining RSS as a monolithic entity was put forth.  “Interesting ideas” once defined in namespaces apparently included rdf:about (now guid) and dc:date (now pubDate).  Meanwhile, some barnacles like skipDays and cloud got included.  27 months later, RSS 2.0 got namespaces, and data like dc:creator, content:encoded, and slash:comments (all of which already had long histories with RSS 1.0) could now flow through that format.

Scope Issues

Dan Connolly: Of course, it would be easier to publish the spec right away if the spec took a much more conservative position on issues such as videoaudio, immediate-mode-graphics, and offline-applications-sql.

Apparently reasonable people disagree about whether these things are included in the charter.  That doesn’t mean that there aren’t good arguments to be made on both sides — in fact, that’s exactly what you would expect in a situation where reasonable people disagree.

One way to address this would be to fix the charter, though many have argued against that, fearing rats the size of lions.  Oh my.  Another way to address this would be to jettison for the moment features which don’t yet enjoy consensus — in the interest of enabling forward progress now.  Predictably that hasn’t gotten much traction yet either.

And yet talk about distributed extensibility goes nowhere.  Nearly four months later it is still on my list of things to look at, when I have time.

Why do I get the feeling that I am faced with an attitude of “It takes a totally unified system to make it all work. One system. HTML5”?

Like Brendan believes is true for ES4, I believe that what we really need here is a radically open standardization process.  The two standards are quite different, so different solutions may be in order.  In the case of HTML5, I believe that what is needed is a smaller spec which focuses on two things: fixing HTML4 (including things like well-defined error recovery), and setting the basis for separate (often overlapping) groups to work on things like canvas.  No, I’m not suggesting that canvas needs to be in a namespace, just that the rules for extending HTML be written down.

In such an organization, things could always be flowing.

Maciej Stachowiak (WebKit/Safari blog): HTML5 Media Support

Another nice feature from the HTML5 draft specification is now available in the WebKit nightly builds for Mac OS X. The new HTML5 <video> and <audio> elements add native support for embedding video and audio content in web pages. They also provide a rich scripting API for controlling playback. Adding video to a web page is almost as simple as adding an image:

<video src=sample.mov autoplay></video>

To make a button that gives the user basic playback controls you could do this: 

<script>
function playPause() {
  var myVideo = document.getElementsByTagName('video')[0];
  if (myVideo.paused)
    myVideo.play();
  else
    myVideo.pause();
}
</script>
<input type=button onclick="playPause()" value="Play/Pause">

The specification also defines a set of events that can be used to react to changes in media playback and load state. For example:

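// Assumes the page contains at least one <video> element.
var myVideo = document.getElementsByTagName('video')[0];
myVideo.addEventListener('ended', function () {
  alert('video playback finished');
}, false);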

To play audio from JavaScript you can simply do this:

var audio = new Audio("song.mp3");
audio.play();

The implementation is still a work in progress and not all features of the specification (including the ‘controls’ attribute, which gives native playback controls) are there yet. The current implementation supports all formats that QuickTime supports, including installed third-party codecs.

The example below uses the ‘poster’ attribute of the <video> element to display an initial image before the video is loaded, progress events to track loading, and play/pause/ended events to make the overlay button reflect the video’s state.
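The button logic behind such a demo can be sketched as a small state function. This is an illustrative sketch, not the code of the demo itself: the names nextButtonLabel and wireOverlayButton and the label strings are assumptions, while play, pause and ended are the media events mentioned above. The wiring function only touches the objects passed to it, so it does nothing until it is called with a real video element and button in a browser.

```javascript
// Hypothetical state function: given the current overlay label and a media
// event name, return the label the overlay button should show next.
function nextButtonLabel(currentLabel, eventName) {
  switch (eventName) {
    case 'play':
      return 'Pause'; // playback started, so offer to pause
    case 'pause':
    case 'ended':
      return 'Play'; // paused or finished, so offer to play (again)
    default:
      return currentLabel; // e.g. 'progress' does not change the button
  }
}

// Hypothetical wiring: keeps the button text in sync with the video's state.
function wireOverlayButton(video, button) {
  ['play', 'pause', 'ended'].forEach(function (name) {
    video.addEventListener(name, function () {
      button.textContent = nextButtonLabel(button.textContent, name);
    }, false);
  });
}
```

Keeping the label logic in a plain function separates it from the DOM, so it can be checked without a browser.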


Anne van Kesteren: HTML WG published HTML Design Principles

The shocking news is out, the HTML WG managed to publish something: HTML Design Principles. The other documents (among those the actual specification) are awaiting further discussion because non-responders are not ok with publishing. A new survey reveals that at least Microsoft and IBM think the HTML charter does not cover the canvas element.

Shawn Medero: Why HTML 5?

In response to the publishing of the HTML Design Principles, Gary McGath asks:

What bothers me most is that the document doesn’t say anything about why there should be an HTML 5 at all.

Simon Pieters, from Opera Software, answers:

  • Specify how existing text/html documents are to be parsed and processed, so that a browser can be created from scratch in the future when the source code of concurrent browsers are lost and the world has moved on to some other format. Implementing a browser based on the SGML, HTML4 and DOM specs will not result in a browser that can render most of the Web, because they don’t reflect the Web.

  • Increase interoperability between browsers so that authors don’t have to use hacks or workarounds for things to work as intended cross-browser, without having browser vendors being forced to reverse engineer each other.

  • Have browser vendors come together and discuss things they want to implement (like video), instead of having browser vendors implement new stuff in incompatible ways (like in the IE4/NS4 era).

WHATWG blog: Validator.nu Web service API

Validator.nu has had a Web service API for a while. It has not had documentation, though. This has now changed: Validator.nu Web service API docs.

Shawn Medero: W3C HTML Working Group Publishes First Public Documents

The W3C HTML Working Group have published our first “public”[1] documents: HTML Design Principles and HTML 5 specification. The Design Principles are like a project specification in English prose and are meant to keep us on track when discussing issues with the specification. One thing to keep in mind when reviewing the current HTML 5 specification is that it is really targeted at user-agent implementors (read: browser manufacturers), but there is an effort in progress to address the web author’s point of view. If you are new to (x)HTML 5 you might want to take a look at the HTML 5 differences from HTML 4 document Anne van Kesteren put together a while ago.

  [1] I say “public” because everything we’ve done (emails, IRC conversations, teleconferences, face-to-face meetings, document revisions, wiki collaboration, etc.) since March 2007 is out in the open for people to look at.

WHATWG blog: Validating attribute values

Traditionally, SGML-based HTML validation has treated most attribute values as “anything goes” strings. This has meant that all kinds of bogus values have passed as valid. W3C XML Schema added a fixed set of datatypes. That set is mostly useless for HTML5 validation, since the HTML5 microsyntaxes do not exactly match the XSD datatypes for the same concepts. XSD regular expressions were suitable for representing the syntax of a number of HTML5 microsyntaxes, though. XSD datatypes can be used from RELAX NG, and Validator.nu used to use XSD regular expressions for many HTML5 microsyntaxes.

The problem with using XSD regular expressions has been that they are not user-friendly. When an attribute value did not match the required regular expression, the UI told the user that the attribute value was “bad”. Nothing else.

Fortunately, unlike XSD, RELAX NG allows pluggable datatype libraries. A datatype library is a library written in a general-purpose programming language. The RELAX NG engine calls the library to check if a string conforms to a named datatype. Validator.nu has used this approach for a long time for the more complex microsyntaxes in HTML5.

I have recently made an effort to move Validator.nu away from XSD regular expressions to a more comprehensive custom datatype library. Even though regular expressions are, as a formalism, sufficient for many syntaxes, writing the checking code by hand allows more useful error messages in cases of failure. Moreover, having identifiers for the datatypes makes it possible to tell which datatype failed, as opposed to the UI only being able to tell that some regular expression failed under the hood. This allows the UI to pull in per-datatype advice from a wiki page.
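The difference can be sketched in a few lines. This is not Validator.nu's code; the function names, the datatype identifier, and the message wording are assumptions for illustration, using the non-negative integer microsyntax (one or more ASCII digits) as the example. The regular-expression check can only report failure, while the hand-written check can name the datatype and say what went wrong and where.

```javascript
// Regular-expression style: the only thing it can say is "bad".
function checkWithRegex(value) {
  return /^[0-9]+$/.test(value) ? null : 'bad value';
}

// Hand-written style for a hypothetical "non-negative-integer" datatype.
// Returns null when the value conforms, otherwise an object that carries the
// datatype identifier and a specific message.
function checkNonNegativeInteger(value) {
  const datatype = 'non-negative-integer';
  if (value.length === 0) {
    return { datatype, message: 'The empty string is not a valid non-negative integer.' };
  }
  for (let i = 0; i < value.length; i++) {
    const c = value[i];
    if (c < '0' || c > '9') {
      return {
        datatype,
        message: 'Expected a digit but saw "' + c + '" at position ' + i + '.',
      };
    }
  }
  return null;
}

console.log(checkWithRegex('4x'));
console.log(checkNonNegativeInteger('4x'));
```

The datatype field is what would let a UI look up advice specific to the failing datatype.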

In addition to improving the user experience with previously supported microsyntaxes such as integers, I have implemented support for previously unsupported microsyntaxes such as MIME types and Media Queries.

There is still work to do. For example, the syntaxes for accept-charset and WF2 type=email are not done. data: and mailto: IRIs are not properly validated yet. The syntaxes for image map coordinates still use XSD regular expressions. The advice on the wiki page is far from complete. (You can help!)

For the parts already implemented, please try the new features out and let me know what needs improvement.

Maciej Stachowiak (WebKit/Safari blog): Ten New Things in WebKit 3

Lately we’ve been talking about a lot of great new features in the latest development trunk of WebKit - features like web fonts, client-side database storage, CSS transforms and CSS animation. These features will likely make it to an official release someday. But I’d like to take a step back and talk about some older features, namely all the great stuff in our recent stable release.

Apple recently released Mac OS X 10.5 “Leopard”, including Safari 3. The latest Safari is also included in Mac OS X 10.4.11, the latest update to Tiger. A corresponding version is available as the latest Safari for Windows Beta, including the new features and lots of stability and usability improvements.

Apple’s site can tell you a lot about the new end-user features of Safari 3. But a lot of the goodness is on the inside, in the WebKit engine that powers Safari. Here’s a list of ten of the most exciting engine enhancements since the Safari 2 version of WebKit, with lots of details and demos. These features are all included in the WebKit that comes with Safari 3 - you don’t have to download nightlies or anything else to get them.

1. Enhanced Rich Text Editing

As you browse the web with a WebKit 3 based browser, you will get a complete and functional rich text editing experience on the new read-write web. Here’s a sweet demo of our improved editing support, just click the text and editing controls appear.

Specifically, we have worked together with developers of RTE libraries and applications to improve compatibility. WebKit 3 fixes many bugs, and supports additional text editing features like links and lists. We now have support from web applications like WordPress, Google Docs, GMail, Blogger, and many more. We’ve also improved editing to support libraries like TinyMCE and FCKeditor. We expect even more web apps and toolkits to add support over time.

2. Faster JavaScript and DOM

We have greatly improved the speed of JavaScript and DOM operations, both critical to the performance of today’s rich web applications. You can see this on a number of benchmarks. To gather the results below, I tested on a MacBook Pro (2 GHz Core Duo, 1 GB RAM). For the WebKit 2 results, I used Safari 2.0.4 on Mac OS X 10.4 Tiger. For the WebKit 3 results, I used Safari 3.0.4 on Mac OS X 10.5 Leopard.

  • i-Bench JavaScript Processing - The primary benchmark that Apple marketing has used is the JavaScript i-Bench. While you can download it yourself, it’s a bit of a pain to set up. Most of the other benchmarks listed below are easier to run yourself, but are not as realistic and comprehensive in their coverage.
    WebKit 2 - 1.99 sec
    WebKit 3 - 0.87 sec
    WebKit 3 is 2.3 times as fast!
  • Celtic Kane Javascript Speed Test 2007 - This popular benchmark is easy to try in the browser and covers a variety of JavaScript and DOM processing tasks.
    WebKit 2 - 1276 ms
    WebKit 3 - 624 ms
    WebKit 3 is 2 times as fast!
  • pentestmonkey MD5 test - This test times various cryptographic checksums coded in pure JavaScript. Run it here. I’m reporting only the MD5 numbers - the other changes are similar.
    WebKit 2 - 8.352 sec
    WebKit 3 - 3.794 sec
    WebKit 3 is 2.2 times as fast!
  • JavaScript Raytracer - The full mode of this JavaScript Ray Tracer is a test of many parts of the browser including JavaScript, DOM and layout.
    WebKit 2 - 853.594 sec
    WebKit 3 - 48.48 sec
    WebKit 3 is 17.6 times as fast!

If you try other JavaScript and DOM benchmarks on the web, you’ll see similar results - speedups of 2x or more. These are speedups you will really feel on advanced web applications.
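The “times as fast” figures are simply the WebKit 2 time divided by the WebKit 3 time, rounded to one decimal place. The snippet below recomputes them from the timings listed above; the speedup helper and the data layout are assumptions for illustration.

```javascript
// Hypothetical helper: how many times as fast "after" is compared to "before",
// rounded to one decimal place. Both arguments must use the same unit.
function speedup(beforeTime, afterTime) {
  return Math.round((beforeTime / afterTime) * 10) / 10;
}

// Timings copied from the list above (WebKit 2, then WebKit 3).
const benchmarks = [
  { name: 'i-Bench JavaScript', before: 1.99, after: 0.87 },       // seconds
  { name: 'Celtic Kane speed test', before: 1276, after: 624 },    // milliseconds
  { name: 'pentestmonkey MD5', before: 8.352, after: 3.794 },      // seconds
  { name: 'JavaScript Raytracer', before: 853.594, after: 48.48 }, // seconds
];

for (const bench of benchmarks) {
  console.log(bench.name + ': ' + speedup(bench.before, bench.after) + ' times as fast');
}
```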

3. Faster Page Loading

WebKit 3 also offers significantly improved raw page loading speed. Unfortunately it’s hard to find good benchmarks in this area. The best we know of is the HTML i-Bench, which is a pain for the casual user to set up, but which is based on real web content.

Some have argued that page loading benchmarks are unfair because browsers dispatch the load before painting, and Safari will sometimes even do it before the first layout. But the HTML i-Bench is one of the few tests to factor this out - it forces a layout and scrolls to ensure a paint. Here’s the numbers:

WebKit 2 - 2.95 sec
WebKit 3 - 2.06 sec
WebKit 3 is 1.4 times as fast!

In addition, independent researchers confirm that Safari 3’s page loading is really fast.

4. SVG

WebKit 3 features a major new technology - SVG (Scalable Vector Graphics). SVG is an XML markup language for graphics that allows rich interaction and which can be mixed directly with XHTML. Here’s some whizzy demos:

We haven’t profiled and optimized SVG quite as much as the rest of the engine, but early tests seem to indicate that it already has blazing performance. Look for this exciting new technology to see even more use on the web over time, now that it is supported by WebKit, the Gecko engine inside Firefox, and the Presto engine inside Opera.

5. XPath

Another major brand new technology in WebKit 3 is XPath, the XML Path Language. XPath is a W3C standard query language that lets web developers efficiently find particular elements in the document. Since XPath is a programming language, it’s hard to show a pretty demo, but this tutorial goes in depth and has a few examples. XPath is used in AJAX toolkits like TIBCO General Interface, and can be used by CSS query engines for improved performance, as in dojo.query.

6. New and Improved XML Technologies

In addition to the big new features of XPath and SVG, we have lots of new and improved XML technologies:

  • The XSLTProcessor JavaScript API for XSLT, and many XSLT fixes and improvements including support for external entities.
  • The DOMParser API.
  • The XMLSerializer API.
  • Incremental rendering support for XML.
  • Proper support for named character references in XHTML.
  • Much more complete and compatible XMLHttpRequest, including support for event listeners, incremental updates for persistent server connections, parsing of more XML MIME types, support for more HTTP methods.

7. Styleable Form Controls

WebKit 3 introduces the ability to customize the look of form controls with CSS. We still use standard-looking native form controls when no custom styles are applied, but we have the ability to customize the look to better support sites with a strong visual identity. Here’s a few simple examples:

Here’s some older, more advanced examples for styleable text fields. On other sites, you can find demo pages for styling all sorts of form controls.

8. Advanced CSS Styling

We have added many advanced CSS features that let content authors make better-looking sites with less effort. These include experimental WebKit features or early implementations of CSS3. Here’s a quick demo of some of them (you’ll only see the fancy stuff with a WebKit 3 based browser):

Text-stroke and text-shadow

WebKit supports multiple columns. This is a test of multiple columns, so it should really lay out in two columns. Multi-column layout is a CSS3 module. And hey, as an added bonus, why not use text-stroke/fill (WebKit extensions) and text-shadow? Those make for a nice fancy heading. And while we’re at it, there’s also border-radius and box-shadow for box decorations.

In addition to the features shown here, many more CSS 2.1, CSS 3, and WebKit experimental features are included. We support CSS Media queries, and lots of background improvements like background-origin and background-clip, multiple backgrounds (since Safari 2, but still only supported by WebKit), box-sizing and more. Another cool new feature is border-image, which lets you make resizable control backgrounds using a single image - there’s some demos in this sample code for the iPhone in the buttons section.

See this Safari CSS reference document for a complete list.

9. Reduced Memory Use

The latest stable WebKit has enabled significantly reduced memory use, compared to the Safari 2 version. We have made many kinds of improvements. Pages containing large amounts of text are stored more efficiently. JavaScript code generates smaller data structures. And most significant of all, we’ve revamped the way we handle the memory cache. The cache is now much better at holding the data that’s truly critical for faster page loading, but less of the data that can easily be recomputed, like decoded image data.

Memory use is something that is notoriously hard to measure. The browser has many caches, and many sites on the live web server. The best way I could find to measure repeatably was by looking at memory use after running through the HTML i-Bench, but your results on other sites may vary. Here is what I saw:

WebKit 2 - 26.7M RPRVT memory
WebKit 3 - 23M RPRVT memory
WebKit 3 uses 14% less memory!
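The percentage follows from the two RPRVT figures above: the saving divided by the WebKit 2 figure. A minimal check, where the percentReduction helper is a name assumed for illustration:

```javascript
// Hypothetical helper: percentage by which "after" is smaller than "before",
// rounded to the nearest whole percent.
function percentReduction(before, after) {
  return Math.round(((before - after) / before) * 100);
}

// RPRVT memory in megabytes, from the measurements above.
console.log(percentReduction(26.7, 23) + '% less memory');
```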

Improving memory use will remain an important focus for future releases.

10. Web Developer Tools

One of the best WebKit improvements is the availability of Web Developer tools, the Web Inspector and the Drosera JavaScript debugger. I can’t really describe these better than the original blog posts, so here’s some screenshots and links to the original posts:

Web Inspector:


Web Inspector screenshot

Drosera:



Conclusion

So that’s it, ten huge new features in WebKit 3. Grab Mac OS X Leopard, the 10.4.11 update to Mac OS X Tiger, Safari 3.0.4 Beta for Windows, or your favorite other WebKit-based browser to check them out for yourself, along with thousands of smaller features and bug fixes.

Justin Thorp: Links for 2007-11-23 [del.icio.us]

WHATWG blog: html5lib 0.10 Released

html5lib 0.10 is now available for your HTML-parsing pleasure.

html5lib is an implementation of the HTML 5 parsing algorithm, available in both Python and Ruby flavours. The HTML 5 algorithm is based on reverse engineering the behaviour of popular web browsers and so is compatible with the myriad of broken HTML encountered on the web.

Features in 0.10:

  • Parse HTML to a variety of common tree formats including minidom, ElementTree and BeautifulSoup (Python), and hpricot and rexml (Ruby) as well as a custom simpletree format
  • Automatic detection of character encoding from meta elements and using frequency analysis (if chardet is available)
  • Sanitization of markup and CSS using a whitelist approach
  • Liberal XML parsing
  • Conversion of trees to event streams and Genshi-inspired filters for those streams
  • Flexible serializers for writing out streams in HTML and XHTML-syntax
  • A prototype HTML 5 validator
  • A large test suite
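To illustrate the whitelist idea combined with stream filtering, here is a conceptual sketch in JavaScript. It does not use html5lib or mirror its API (html5lib itself comes in Python and Ruby flavours); the token shape, the names sanitizeTokens, ALLOWED_ELEMENTS, ALLOWED_ATTRIBUTES and ALLOWED_SCHEMES, and the choice to drop disallowed tags outright are all assumptions made for illustration.

```javascript
// Hypothetical whitelists: anything not listed here is rejected.
const ALLOWED_ELEMENTS = new Set(['p', 'em', 'strong', 'a']);
const ALLOWED_ATTRIBUTES = new Set(['href', 'title']);
const ALLOWED_SCHEMES = /^(?:https?|mailto):/i;

// Filters a stream of hypothetical tokens shaped like
// { type: 'StartTag' | 'EndTag' | 'Characters', name, attrs, data }.
function* sanitizeTokens(tokens) {
  for (const token of tokens) {
    if (token.type === 'Characters') {
      yield token; // text is kept; a serializer is expected to escape it
      continue;
    }
    if (token.type !== 'StartTag' && token.type !== 'EndTag') {
      continue; // unknown token types are not on the whitelist, so drop them
    }
    if (!ALLOWED_ELEMENTS.has(token.name)) {
      continue; // drop the tag itself (a simplification of this sketch)
    }
    const attrs = {};
    for (const [name, value] of Object.entries(token.attrs || {})) {
      if (!ALLOWED_ATTRIBUTES.has(name)) continue;
      // Only absolute URLs with a whitelisted scheme survive in href.
      if (name === 'href' && !ALLOWED_SCHEMES.test(value)) continue;
      attrs[name] = value;
    }
    yield { type: token.type, name: token.name, attrs };
  }
}
```

Because the filter is a generator over a token stream, it can be chained with other filters without building a tree first. The sketch also rejects relative URLs in href, which a real sanitizer would probably want to allow.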

Download:

WHATWG blog: La loterie du longdesc

Originally posted in French as a translation of this article: The longdesc lottery

Let’s now talk about the longdesc attribute. In HTML 4, it is defined as a pointer to a long description of a complex image. Anyone can learn to write a relevant long description. There is just one problem: in practice, nobody bothers, and whoever does bother gets it wrong.

Now, let’s quantify the phenomenon. In August 2007, Ian Hickson analyzed a sample of one billion <img> elements in Google’s index. Approximately 1.3 million (0.13%) actually had a longdesc attribute. Well, you might say, that’s normal: not every image needs such an attribute. And you would be right. But regardless of whether it is needed or not, longdesc is not used all that often: just over one image in a thousand.

Now, let’s see in how many cases the longdesc attribute is used properly. Of course, this criterion is more subjective, but we can still pick out the most obvious errors. From the 1.3 million images that had a longdesc attribute, let’s remove those where the longdesc attribute

  • is empty
  • is not a valid URL
  • points to the image itself (that is, the same URL as the src attribute)
  • points to the page you are already on
  • points to the root of another domain
  • is the same as the href attribute of the link surrounding the image (the longdesc is redundant, since you can follow the image’s link instead)

That eliminates, purely and simply, 1.25 million images (about 96%) from the lot. That is not 96% of all the images on the web; it is 96% of the 0.13% of images that included a longdesc attribute in the first place. And when you look more closely at the 50,000 remaining images (4% of 1.3 million), the results get even worse: links to other images, broken links, links to one-line descriptions identical to the alt attribute, and links to a page that tells you the image’s dimensions but not its content (Wikipedia, I’m talking about you). Extrapolated to the 1.3 million images, the 50,000 shrink to 10,000. That means fewer than 1% of the images that provide a longdesc attribute are actually useful. No more than one image in a hundred gets it right, out of the one in a thousand that even bother to try.

Meanwhile, the same people who wanted to keep the longdesc attribute recently carried out some user-testing experiments. That is, they tested how accurately a real blind person with a real screen reader could read a real web page. It turned out that the subject did not know the longdesc attribute existed until the tester mentioned it. Can you really blame him? 99.87% of the images he had encountered did not even have a longdesc attribute. Even if he had known about it, and had come across one by chance, there was still a 99% chance that the information provided would be of no use. He has a better chance of winning the lottery.

I am not saying there is no real problem here that needs solving. There certainly is one. People can publish complex images that need equally complex text alternatives: diagrams, charts, and other highly detailed photos. But never mind. “A picture is worth a thousand words” and all that… The longdesc attribute is, in theory, a solution to this problem. But that does not mean it is a good solution, let alone the only solution. We have now been living with longdesc for 10 years, and I can assure you: it does not work. So, could we skip the outcry and start talking about a better solution?

WHATWG blog: Not that 80

In his post Parroting Pareto, Jeremy Keith says that HTML5 needs to cover cases that “fall far outside the 80%-90% curve”, in particular accessibility. “By their very nature, accessibility concerns are not going to affect the majority of users. That doesn’t mean they can be dismissed.”

My understanding of applying the 80/20 rule to the design of HTML5 is that the “80” isn’t about 80% of users. It is about (proverbial) 80% of authoring cases. That is, it doesn’t make sense to support (for accessibility or otherwise) things that people would only publish very rarely if engineering support for the rarity would complicate the implementation of the language significantly.

See Hixie’s email to the HTML WG on the topic.

WHATWG blogThe longdesc lottery

Let’s talk about the longdesc attribute. In HTML 4, it’s defined as a pointer to a long description for a complex image. Anyone can learn how to write a good long description. There’s only one problem: virtually no one bothers, and virtually everyone who does bother gets it wrong.

Let’s quantify that. In August 2007, Ian Hickson analyzed a sample of 1 billion <img> elements in Google’s index. Approximately 1.3 million (0.13%) had a longdesc attribute. That’s OK, you say, not every image needs a longdesc attribute. And you would be right. But regardless of whether it’s needed or not, it’s not being used that often: just over one in a thousand images.

Now let’s look at how often the longdesc attribute is actually used correctly. Of course this is a more subjective question, but we can spot some obvious errors. Out of those 1.3 million images with a longdesc attribute, let’s subtract the ones where the longdesc attribute…

  • is blank
  • is not a valid URL
  • points to the image itself (i.e. the same URL as the src attribute)
  • points to the page you’re already on
  • points to the root level of another domain
  • is the same as a parent link’s href attribute (i.e. the longdesc is redundant because you could just follow the image link instead)
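As a rough illustration only (this is not the script used for the analysis), the checks listed above could be sketched in Python. The function name `longdesc_problem`, its parameters, and the crude definition of "not a valid URL" are all assumptions made for this sketch.

```python
from urllib.parse import urljoin, urlsplit


def longdesc_problem(longdesc, page_url, img_src, parent_href=None):
    """Return a short reason if longdesc is obviously wrong, else None.

    Hypothetical helper: page_url is the absolute URL of the page holding
    the image, img_src is the src attribute, and parent_href is the href of
    an enclosing link, if there is one.
    """
    value = longdesc.strip()
    if not value:
        return "blank"
    # Assumption: treat embedded whitespace or an unparseable value as
    # "not a valid URL". A real survey would need a stricter definition.
    if any(ch.isspace() for ch in value):
        return "not a valid URL"
    try:
        target = urljoin(page_url, value)
        parts = urlsplit(target)
    except ValueError:
        return "not a valid URL"
    if target == urljoin(page_url, img_src):
        return "points to the image itself"
    if target == page_url:
        return "points to the current page"
    other_site = parts.netloc != urlsplit(page_url).netloc
    if other_site and parts.path in ("", "/") and not parts.query:
        return "points to the root of another domain"
    if parent_href is not None and target == urljoin(page_url, parent_href):
        return "same as the parent link"
    return None
```

Anything for which this returns `None` is not thereby a good long description; it has only survived the obvious checks, which is exactly the group examined more closely in the next paragraph.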

That knocks out a whopping 1.25 million (about 96%) right off the bat. That’s not 96% of all the images on the web; that’s 96% of the 0.13% of images that included a longdesc attribute in the first place. And when you take a closer look at the remaining 50,000 (4% of 1.3 million), the results get even worse: links to other images, links gone 404, links to one-line text descriptions identical to the alt attribute, and links to pages that describe the image size but not its contents (Wikipedia, I’m looking at you). Extrapolating back to 1.3 million, that 50,000 shrinks to about 10,000. That means that less than 1% of images that provide a longdesc attribute are actually useful. No more than one in a hundred get it right, of one in a thousand that even try.
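For anyone who wants to check the arithmetic, here it is spelled out. The four input figures are the ones quoted in this post; everything else is derived from them.

```python
total_images = 1_000_000_000   # <img> elements in the sample
with_longdesc = 1_300_000      # images carrying a longdesc attribute
obviously_wrong = 1_250_000    # knocked out by the obvious checks
plausibly_useful = 10_000      # estimate after a closer look

share_with_longdesc = with_longdesc / total_images
share_wrong = obviously_wrong / with_longdesc
remaining = with_longdesc - obviously_wrong
share_useful = plausibly_useful / with_longdesc

print(f"{share_with_longdesc:.2%} of images have a longdesc")   # 0.13%
print(f"{share_wrong:.0%} of those are obviously wrong")        # 96%
print(f"{remaining:,} survive the obvious checks")              # 50,000
print(f"{share_useful:.2%} of longdesc values look useful")     # 0.77%
```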

Meanwhile, the very people advocating for keeping the longdesc attribute have recently conducted some user testing. That is, testing how well an actual blind person with an actual screen reader can read actual web pages. It turned out that the test subject didn’t know that longdesc even existed before the tester told him about it. Can you blame him? 99.87% of the images he’d ever encountered had no longdesc attribute at all. Even if he had known about it, and he had actually stumbled across one, he would still be up against 99 to 1 odds that following it would be worth his time. He has a better chance of winning the lottery.

I’m not saying there isn’t a real problem to be solved here. There is. People can publish complex images that require complex text alternatives. Charts, graphs, detailed photographs. Whatever. “A picture is worth 1000 words,” and all that. The longdesc attribute is, theoretically, a solution to this problem. But that doesn’t mean it’s a good solution, and it’s certainly not the only solution. We’ve been living with longdesc for 10 years now, and let me tell you, it’s not working out. So can we please get past the grandstanding and start talking about a better solution?

WHATWG blogValidation result formats for review

I’d like to enable the use of Validator.nu as a RESTful Web service. To this end, I have designed a Validator.nu-native XML response format.

There is also a JSON format for review.

I’d appreciate comments on the format—especially from people who can foresee wanting to write clients.

Comments on this blog seem to be broken right now. Comments can be sent directly to hsivonen@iki.fi or the implementors mailing list.

Comments are welcome here.

WHATWG blogPourquoi le texte alternatif peut être omis (French)

This article was published as a French translation of the article Why the Alt Attribute May Be Omitted; the French text is rendered in English below.

The specification of the alt attribute was recently reworked to improve its definition, including an in-depth explanation of how to provide the most appropriate alternative text, with real authoring requirements.

The specification describes situations where alternative text must be provided, where an empty alt attribute must be used and, more controversially, where the alt attribute may be omitted entirely. This omission may seem controversial because, at first glance, it looks like an attempt to justify the bad practice, contrary to the principles of accessibility, of leaving out the alt attribute, and to stir up trouble. That is an unfortunate misunderstanding, and it is worth pausing on to dispel the doubts it may raise for many people. Although it may seem backwards, the situation is actually much better this way.

There are many cases where alternative text is simply unavailable and there is little that can be done to remedy the situation. For example, most users of photo sharing sites like Flickr would certainly have no idea what to write as alternative text, even if Flickr gave them the option. And while everyone of course agrees that it would be wonderful if all users did so, as the specification encourages, most simply will not.

The problem just raised is this: what should we do when no alternative text has been specified and it remains virtually impossible to obtain? Many current systems try to satisfy the current requirement for alternative text by generating it from the images' metadata.

Flickr, for example, repeats the image's title; Photobucket appears to combine the image's filename, the title and the user's name; and Wikipedia redundantly reproduces the image caption. The problem with these approaches is that none of them provides useful additional information about the image and, in some cases, this is worse than providing no alternative text at all.

The benefit of allowing the alternative text to be omitted, rather than requiring an empty value, is that it creates a clear distinction between an image that has no alternative text (such as an icon or a graphical representation of the surrounding text) and an image that is an integral part of the content but for which no alternative text is available. It has been said that Lynx and Opera already make this distinction. For images without an alt attribute, Lynx shows the filename and Opera the text “Image”, but neither shows anything for images whose attribute is left empty. It remains to be determined whether this distinction is actually useful when displaying “real world” content, and there is a genuine debate to be had if you have evidence to put forward.

It has been suggested that removing the unconditional requirement for the alt attribute would affect the ability of validators to show users their errors and would take away a good tool for promoting accessibility. However, using validation errors as an accessibility evangelism tool is certainly not a good way to approach the problem.

While it is indeed very useful for authors to know when they have mistakenly left out an alt attribute, trying to force them to use it unconditionally with a tool as blunt as a validator is counterproductive, because it encourages the use of poor quality, automatically generated text. Meanwhile, nothing will prevent validation and web authoring tools from reporting these errors to authors if they so wish.

No accessibility benefit is lost by accepting the fact that it is impossible to force everyone to provide alternative text and by making the alt attribute optional for the purposes of document conformance. No one is claiming that conformance to HTML 5 is equivalent to conformance with accessibility guidelines. Many things are considered technically conforming in HTML yet remain “inaccessible” if used poorly. Making the alt attribute technically optional does not stand in the way of accessibility guidelines, nor does it have any impact on accessibility evangelism. It simply accepts the reality of the situation we face, in the hope of reducing the proliferation of poor quality, automatically generated alternative text.

WHATWG blogWhy the Alt Attribute May Be Omitted

The specification of the alt attribute was recently reworked to thoroughly improve its definition, including an in-depth explanation of how to provide appropriate alternate text, with clear authoring requirements.

The requirements describe situations where alternate text must be provided, where an empty alt attribute must be used and, most controversially, where the alt attribute may be omitted entirely. This is controversial because at first glance, it seems like an attempt to endorse the bad and inaccessible practice of omitting the alt attribute, and thus yet another slap in the face for accessibility. That is an unfortunate misconception that needs to be carefully examined to settle any concerns people have. Although it may seem backwards, the situation is actually much more positive.

There are many observed cases where alternate text is simply unavailable and there’s little that can be done about it. For example, most users of photo sharing sites like Flickr wouldn’t have a clue how or why to provide alternate text, even if Flickr provided the ability. While everyone agrees that it would be wonderful if all users did – indeed, the spec strongly encourages that – most users simply won’t.

The problem being addressed is what should be done in those cases where no alt text has been provided and is virtually impossible to acquire. With the current requirement for including the alt attribute in HTML4, it has been observed that many systems will attempt to fulfil the requirement by generating alternate text from the image's metadata.

Flickr, for example, repeats the image's title; Photobucket appears to combine the image's filename, title and the author's username; and Wikipedia redundantly repeats the image caption. The problem with these approaches is that using such values does not provide any additional or useful information about the image and, in some cases, this is worse than providing no alternate text at all.

The benefit of requiring the alt attribute to be omitted, rather than simply requiring the empty value, is that it makes a clear distinction between an image that has no alternate text (such as an iconic or graphical representation of the surrounding text) and an image that is a critical part of the content, but for which no alt text is available. It has been claimed that Lynx and Opera already use this distinction. For images without alt attributes, Lynx shows the filename and Opera displays "Image", but neither shows anything for images with empty alt attributes. It is still somewhat questionable whether this distinction is actually useful and whether or not browsers can realistically make such a distinction with real world content, and that is certainly open to debate if you have further evidence to provide.
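The three cases discussed here (alt omitted, alt empty, alt with text) can be told apart mechanically. Below is a minimal sketch using Python's standard `html.parser`; the class name and the labels are invented for illustration, and the sketch says nothing about how Lynx or Opera actually implement the distinction.

```python
from html.parser import HTMLParser


class AltClassifier(HTMLParser):
    """Collect (src, kind) pairs for every <img> in a document."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        if "alt" not in attrs:
            kind = "omitted"   # no alternate text is available
        elif not attrs["alt"]:
            kind = "empty"     # image needs no alternate text
        else:
            kind = "text"      # alternate text was provided
        self.results.append((attrs.get("src", ""), kind))


SAMPLE = (
    '<img src="photo.jpg">'
    '<img src="bullet.png" alt="">'
    '<img src="chart.png" alt="Sales by quarter">'
)

classifier = AltClassifier()
classifier.feed(SAMPLE)
print(classifier.results)
```

A user agent or checker could then decide what to present for each kind; whether that is useful with real world content is the open question raised above.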

It has been suggested that taking away the unconditional requirement for the alt attribute will affect the ability of validators to notify authors of their mistakes and take away a useful tool for promoting accessibility. However, using validation errors as an accessibility evangelism tool is not necessarily the only, nor the best, way to address the issue.

While it is indeed very useful for authors to know when they have mistakenly omitted an alt attribute, attempting to unconditionally enforce its use with a tool as blunt as a validator is counterproductive, since it encourages the use of poor quality, automatically generated text. Besides, nothing will prevent conformance checkers and authoring tools from notifying authors, if they so desire.

No practical accessibility benefits are lost by conceding the fact that you cannot force everyone to provide alternate text and making the alt attribute optional for the purpose of document conformance. No-one is claiming that conformance to HTML5 equates to conformance with accessibility requirements. There are lots of things that are considered technically conforming in HTML, yet still inaccessible if used poorly. Making alt technically optional doesn’t stand in the way of accessibility requirements, nor greatly impact upon accessibility evangelism. It just acknowledges the reality of the situation in the hope of reducing the prevalence of poor quality, automatically generated alt text.

W3C Team blogThree Buckets of Thoughts

Kevin Lawver has written a great blog post, Web Standards' Three Buckets of Pain, explaining cultural differences between communities.

Opening up the W3C

Justin Thorp commented (emphasis is mine):

Karl, I'm really excited by your efforts with opening the W3C. The only thing is I hope its more then just "opening up." We can make our mailing lists and our documents publicly available... we can make more primers... but I'm not sure that's enough. It's about engaging the development and design community in a genuine conversation... its as you said it "being in liaison with the developers, designers and the Web pro communities." I'm here to be a servant to your efforts however i can be.

I'm pretty sure we can work something out. Seen from the inside, the door has always been open, but people didn't know about it, or didn't know that they could interact. The efforts of W3C staff to publicize it might not have been enough. There is also a question of scalability. More on that a bit later.

There is a perception problem on both sides. And I'm pretty sure the mixing of communities will bring friction and unforeseen issues, but that's more exciting than worrying. :)

Learning the vocabulary and the way people think or interpret things is always challenging and wonderful at the same time. I emphasized the "we" because we will be able to achieve this only if Web designers, Web developers, implementers and hackers work together, which reminds me that I have an action item for the HTML working group about HTML 5 for authors.

Waterfall model

Then Dan Bradley said:

First, you have to have a spec, then you implement against that spec, then you test the implementation against the spec. The W3C has created several specs, and Browser developers developed code against those specs. Sadly it's the third part that falls down. The W3C did a great job providing tools to validate HTML, CSS, and other specs for the web development community at large, but they haven't done anything to validate there specs for the browser developers that I know of.

This model of development is usually called the waterfall model. Not all W3C working groups use it. Some have introduced test-driven development for their specifications; some have chosen to be very practical and to carefully specify what has already been deployed.

Life of W3C specifications

It starts with implementers (often from companies) who have a product in development or an idea for a future product. These developers create a specification around the requirements of their own products, plus things they think would be cool to have. Sometimes the passionate discussions produce a heavy specification. Knowing what should go in and what should stay out, and finding the right balance, is a difficult exercise.

Personally, I'm much in favor of a specification which starts small and can evolve, adding a new set of features at the next version.

When the specification reaches the Candidate Recommendation stage, the Working Group is officially asking for implementation experience. That doesn't mean there are no implementations already; it really depends on the working group. This is the point where some groups start to think about a test suite and about how to prove what is implemented in an interoperable way. The group produces an implementation report. The report is far from perfect and has issues of its own. I should write a separate article about it.

Personally again, I prefer it when the group works on test cases from the very beginning of the specification's life, so that there is an iterative process between specification, tests and implementation. But here I am just wearing my QA hat.

To enter the next step, Proposed Recommendation, the WG has to show that each feature is implemented twice in an interoperable way. Really challenging. A feature that does not have two interoperable implementations is removed from the specification.

Then it becomes a W3C Recommendation. Implementations are often already available on the market by this time and are in the process of being finalized. Software bugs caused by ambiguities in the specification or by early implementations are part of the software life cycle. The working group will fix minor bugs in errata, and bigger issues in the next release (which takes time).
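As a loose sketch of the "implemented twice" rule, assume an implementation report can be reduced to pass or fail per feature and per implementation (real reports are richer than this). Features lacking two passing implementations could then be listed as follows. The function name, the feature names and the implementation names are placeholders, not data from any actual report.

```python
def features_at_risk(report, required=2):
    """Return the features with fewer than `required` passing implementations.

    `report` maps a feature name to a dict of {implementation: passed}.
    """
    return sorted(
        feature
        for feature, results in report.items()
        if sum(1 for passed in results.values() if passed) < required
    )


# Placeholder data, for illustration only.
sample_report = {
    "feature-a": {"impl-1": True, "impl-2": True, "impl-3": False},
    "feature-b": {"impl-1": True, "impl-2": False, "impl-3": False},
    "feature-c": {"impl-1": False, "impl-2": False},
}

print(features_at_risk(sample_report))
```

Note that counting passes is only a stand-in: showing that two implementations are interoperable takes shared test cases, which is why the test suite matters.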

This is another opportunity for an article, about the ecosystem of products and specifications after publication.

W3C HTML blogReinventing HTML: Update

In a comment on Reinventing HTML: discuss, OffBeatMammal wrote:

I think it's very brave to try and reinvent something that's got such a widespread adoption and so many flawed interpretations. ...

I, for one, would love to be able to contribute to this process...

You can contribute. In fact, you just did. We have 44 valuable comments on that item alone. Comments on weblogs are an important part of the process, along with spec reviews, criticism, advocacy, tutorial articles, books, conference presentations, and helping your buddy down the hall find the part of the latest CSS draft that's relevant to the problem he's working on.

On the other hand, when you just write a weblog comment, it's hard to be certain who reads it and what their response is. There's nothing like face-to-face contact. One of the key benefits of paid W3C membership is an invitation to meet all the other W3C members and the staff at Advisory Committee meetings twice a year. I have been to almost every one since 1995, but I missed the recent AC meeting in Japan due to conflicting travel obligations. What a meeting to miss:

I wrote that this meeting was probably one of the most important ac meetings in the last eight years. I was right in that assumption.

The discussion level, including on controversial topics, was a real pleasure and I am immensely happy the consortium is able to do this work on itself.

Open letter to W3C staff and Members Daniel Glazman, 1 Dec 2006

Daniel's contributions to W3C and the Web include participating in HTML and CSS working groups and developing an open source HTML authoring tool. The Standards category in his blog includes Tim Berners-Lee, thank you, a 29 October response to Reinventing HTML, where he writes:

And Disruptive Innovations is probably going to join that Working Group to make the 21st century's web happen.

Those of you who commented on the disconnect between W3C and browser vendors and the WHAT WG, take note that Daniel praises the W3C's renewed connections with browser vendors. While he welcomes the contribution of Chris Wilson of Microsoft, Daniel also expresses concern at having the chair affiliated with any major browser vendor.

Formal comments on the charter from W3C members are due January 7, and a decision on whether to start the new working group should follow a few weeks later, following section 8.1 Advisory Committee Reviews of the W3C Process.

We consider informal comments too, on a best-effort basis. Ian Hickson has some interesting input on how the HTML Working Group should be chartered:

Regarding technical matters, there shouldn't be a difference between being a working group member as a W3C Member Company, a W3C Invited Expert, or participating as a non-W3C Member.

I support that goal, but I also support the goal of royalty-free Web specifications, and I'm not sure how to reconcile Hixie's suggestion with the W3C patent policy. He concludes:

This latest charter makes big strides towards being the basis of an important cornerstone of the Web in the coming years. I hope you will be able to take the above feedback to heart. I look forward to taking part in this new working group.

This is real progress since the 18 August www-svg comments from Maciej Stachowiak of Apple:

I don't think it makes sense for vendors of browser-hosted implementations to continue to participate. Instead we should work out amongst ourselves what makes sense to implement in a web browser.

In addition to the progress at an organizational level, there is lots of interesting technical work going on. The W3C Technical Architecture group acknowledged the broad impact of reinventing HTML on W3C's work as issue TagSoupIntegration-54:

Is the indefinite persistence of 'tag soup' HTML consistent with a sound architecture for the Web? If so, what changes, if any, to fundamental Web technologies are necessary to integrate 'tag soup' with SGML-valid HTML and well-formed XML?

We had a great discussion of the issue (mostly face-to-face, but with T. V. Raman participating remotely by phone and IRC). The problem is not just missing quotes and mixed-up nesting; the script tag is an example of The Rule of Least Power: "Powerful languages inhibit information reuse." XHTML is somewhat underspecified in the area of <script>, and it's as much art as science to figure out which idioms are sufficiently widely deployed in things like Google ads that we should standardize them and which ones we can deprecate in the interest of simplicity and interoperability. Survey work like David Hammond's Web Browser Standards Support is really great to have in cases like this.

W3C HTML blogThe Tracker, Tracked

Since W3C launched the new HTML Working Group in March, over 450 people have joined. This is great, but making sense of the thousands of mail messages that followed is too much for any one person. I think the new issues tracking task force is a promising development. The small group of trackers (working closely with the co-chairs Dan Connolly and Chris Wilson) is a valuable complement to the larger set of people participating in the group who write and test code, review documents, and represent a large set of user needs. A richly structured community has a better chance of producing a widely accepted standard.

In every W3C group I've been a part of, it takes a while for the people involved to forge the roles (formal or informal) that suit them, and to start speaking the same language. Only then do they start to get work done. The HTML WG is unique at W3C in its size and makeup, so I am not surprised that the participants may require additional time to establish their own rhythms and rituals for getting things done. The tracking task force is an encouraging sign. Dan explained that several pieces fell into place around the same time, leading to the formation of the task force:

  • The W3C systems Team made two improvements to a W3C issue tracking tool called Tracker that motivated Dan to try it out with the HTML WG. The first is that the tool keeps a detailed paper trail of issue state changes (who changed what and when, down to the second). Dan calls this "elephant never forgets" mode. He has told me many times that software must behave this way for him to trust his data to it. The second change is that Tracker now talks to W3C's internal database of groups and participants, making it easier for the Chair to manage the set of volunteers (and their accounts) without requiring special permission.
  • People volunteered! James Graham, Shawn Medero, Julian Reschke, Gregory Rosmaita, and Michael Smith all offered their services. Volunteers -- whether Chairs, Editors, Issue Trackers, Test Case Writers, or other contributors -- make the difference at W3C between indifference and success. So thanks to all those who raised their hand. Contact Dan or Chris Wilson if you are interested in helping as an issue tracker or other role.

Dan shared a tip that I think will be useful to Chairs of other groups: when W3C launched the HTML WG, he set up a questionnaire for group participants to indicate (among other things) which roles interested them. It doesn't hurt to ask! And then, when the stars aligned and it came time to ask for volunteers, Dan had a short list of candidates.

What comes next? Tracked issues lead to decisions. Decisions lead to changes to specifications (or not). I look forward to seeing how the community's voice, now listened to by perked up ears, shapes HTML 5.

WHATWG blogWiki collaboration to describe HTML5 microsyntaxes

Currently, Validator.nu mines the HTML 5 spec for UI text describing permissible content models, element contexts and element-specific attributes. The text is shown when an element or attribute is misplaced or missing.

Unfortunately, the spec does not contain similarly extractable text for microsyntax descriptions. Microsyntaxes are syntaxes that appear mostly in attribute values—for example, HTML5 integer, Web Forms 2.0 week, RFC 2616 media types (aka. MIME types) or CSS3 Media Queries.

Based on IRC discussions, there is interest in producing the descriptions collaboratively. To that end, I have seeded the WHATWG wiki with a page for microsyntax descriptions. If you would like to help make validator messages better, please feel free to edit the wiki (under the MIT license).
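To make the idea of a microsyntax concrete, here is a minimal sketch for one of the examples named above, the HTML5 valid integer, which as I read the draft is one or more ASCII digits optionally preceded by a hyphen-minus. The function name and the message text are invented for illustration; they are not taken from Validator.nu or from the wiki page.

```python
import re

# One or more ASCII digits, optionally prefixed with "-".
# [0-9] is used instead of \d so that non-ASCII digits are rejected.
_VALID_INTEGER = re.compile(r"-?[0-9]+")

# Hypothetical description text of the kind a validator could display.
INTEGER_DESCRIPTION = (
    "One or more digits (0-9), optionally prefixed with a hyphen (-)."
)


def check_integer(value):
    """Return None if value is a valid integer, else a message string."""
    if _VALID_INTEGER.fullmatch(value):
        return None
    return f"Bad value {value!r}: expected an integer. {INTEGER_DESCRIPTION}"


for candidate in ["42", "-7", "+7", "4.0", ""]:
    print(repr(candidate), "->", check_integer(candidate) or "ok")
```

The point of the wiki effort is the description string: the checking logic is easy, but the human-readable explanation has to be written by someone.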

W3C Team blogLinks Feast about Technical Plenary 2007

It was an amazing, long week for the W3C community: meetings, talks, corridor discussions, shared meals over brackets and parsers. Many new projects started, and some communities began to understand each other better. Some people posted their pictures on the Web. I hope there will be more at every W3C meeting.

About TPAC and topics discussed during the event

W3C Team

W3C staff group photo

Big meetings are often a good opportunity for the W3C staff to meet, some for the first time and some for the last time, like Janet Daly and Susan Lesch.

Janet Daly

Thanks to all of you who have blogged, talked and shared your impressions of the Technical Plenary and its technical topics. See you next year?

Dimitri GlazkovBack into the Future of Web: HTML5 SQL Player

The one where I release a cool new toy: a JavaScript sandbox with an implementation of the HTML5 client-side storage spec.

Sam RubyDark Side of Postel’s “law”

Simon Fell’s weblog contains the following line:

<link rel="alternate" type="application/atom+xml" title="Simon Fell > Its just code" href="http://www.pocketsoap.com/weblog/feed.atom">

feedfinder.py, atomautodiscovery.py, and feedparser.py version 4.1 will fail to pick it up.

nightly feedparser.py picks it up, as does html5lib.

Demonstration: code, output.

Of course, many that can find Simon’s feed will fail to parse it.

There is a dark side to Postel’s “law”
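To illustrate the kind of failure involved, here is a small sketch. It is not the code of feedfinder.py, atomautodiscovery.py or feedparser.py; `naive_feed_href` is a made-up stand-in for any approach that assumes ">" never appears inside a tag, and `FeedLinkFinder` is a hypothetical helper built on Python's standard `html.parser`, which tolerates ">" inside a quoted attribute value.

```python
import re
from html.parser import HTMLParser

SNIPPET = (
    '<link rel="alternate" type="application/atom+xml" '
    'title="Simon Fell > Its just code" '
    'href="http://www.pocketsoap.com/weblog/feed.atom">'
)


def naive_feed_href(html):
    """Regex approach that treats the first '>' as the end of the tag."""
    for tag in re.findall(r"<link[^>]*>", html):
        match = re.search(r'href="([^"]*)"', tag)
        if match:
            return match.group(1)
    return None


class FeedLinkFinder(HTMLParser):
    """Remember the href of the first <link rel="alternate">."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "alternate" and self.href is None:
            self.href = attrs.get("href")


def parsed_feed_href(html):
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.href


print(naive_feed_href(SNIPPET))   # None: the tag was cut off at the ">" in the title
print(parsed_feed_href(SNIPPET))  # http://www.pocketsoap.com/weblog/feed.atom
```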

W3C Team blogA story about namespaces, MIME types, and URIs

No one seems to know where the story begins; Ian Jacobs reminded me about magic namespaces as I enjoyed breakfast on Thursday; Steven Pemberton and Bert Bos had told it to him, perhaps prompted by Ian Hickson's question in the URI-based extensibility panel the day before: how do we make namespaces usable by HTML authors?

On Saturday, I took an action in the HTML Working Group discussion of ARIA to write it up. Little did I know that by the time I got back to the office, Norm would have written up Implicit Namespaces for me.

Thanks, Norm!

Anne van KesterenW3CTP: The Technical Plenary Day

The W3C Q&A Weblog covers most of the topics discussed today prefixed with “TPAC 2007.” I was on the panel called “HTML5 and XHTML2” which mostly was about explaining some of the design decisions behind HTML5 and saying that it’s very unlikely to see XHTML2 in browsers, even more so because recent drafts use the same namespace as XHTML5. It was encouraging to see how much more acceptance there now is for HTML5 within the W3C. Every panel today either mentioned the WHATWG, HTML5, or both. This doesn’t mean there’s agreement on what the specification says, but that “W3C people” are reviewing the specification is already a huge step forward from last year.

W3C Team blogTPAC 2007 - HTML Working Group had informal jamming session!

It was intended to be a fun session for the HTML Working Group face-to-face meeting, but word spread and suddenly many people joined us in the room. The jam started, and soon Tim Berners-Lee joined Dan Connolly, Steven Pemberton, Ian Jacobs, Janet Daly and others on the lyrics...

A singalong may not sound very serious when a working group with a stake in the future of the Web gets to meet. But face-to-face meetings are about much more than sitting around a table and arguing about issues: they are a great way to meet and get to know the people with whom you are going to work, sometimes for several years, over e-mail and other remote means of communication.

About a hundred people in the room enjoying life and music. Life is about small moments... and this was an awesome one.

note: Kevin Lawver and Justin Thorp published some pictures of the jamming session on the Web.

W3C Team blogTPAC 2007 - HTML Working Group holds first face-to-face meeting

The time has come for the much anticipated HTML Working Group face to face meeting, at the W3C Technical Plenary / Advisory Committee Meetings Week in Cambridge, MA (USA).

The HTML WG has 488 participants, by far the largest Working Group at W3C, with 69 participants from 26 organizations and 419 Public Invited Experts (as of 8 November).

The meeting was quiet for about 15 minutes, but after Dan Connolly introduced everybody and gave a brief presentation about the W3C process using Hixie's drawing, Aaron Leventhal gave an ARIA intro and presented some demos, and the discussion started.

As I write this, there are about 40 people in the room, 20+ on IRC, and Dan Connolly is playing the guitar and introducing Ben Millard who will be presenting the topic of data tables.

Maciej Stachowiak presents an extension to CSS that does transitions and effects but still degrades gracefully (as stated by Olivier).

Still playing the guitar, the chair introduces an interesting and democratic way of deciding the topics of a number of "unconference" sessions. Participants with an idea come to the front of the room and get 2 minutes to pitch their topic. Among them, "HTML for authors" wins the popularity vote, but others, such as versioning, test case management or hacking, data tables or media elements, get a lot of attention.

The group is discussing the social impact of, and the need for, a video element in the HTML specification. In terms of usability, and given the number of video web services, it seems natural to create one. The follow-on issue is how to define the video element: which attributes it has, and how compatible it should be with video elements in other markup languages such as SMIL. Then there is the issue of codecs. Shall we recommend one, and which one? An issue has been opened, as well as an entry in the wiki, to define the scope of this media element.

Something truly collaborative is going on. Hixie just did a brief presentation on how to make test cases, and then everybody started testing different parts of the HTML5 spec ... nice! Hixie is going around the tables answering some questions and discussing topics that are coming out from the tests.

It's past six and the meeting has been adjourned until tomorrow morning. It was a great day for W3C.

W3C HTML blogTPAC 2007 - Openness of W3C Working Groups

The participants of the W3C tech plenary are back from their lunch overlooking the gorgeous Charles River, to tackle the question of "openness".

This is a development from a topic already raised today: a lot of people's lives and living depend on Web technologies, and there is an enormous pool of opinions and contributions to the work now being done at W3C.

Update 2007-11-08: The slides of the panelists' presentations are now available:

  • Deborah Dahl (Conversational Technologies, HyperText CG, Multimodal Interaction WG chair)
  • Art Barstow (Nokia, Web Application Formats WG chair)
  • Ian Hickson (Google, WHAT-WG, HTML WG, CSS WG, ...)
  • Paul Cotton (Microsoft, Web Services Policy WG chair)

I get the impression that a lot is being crammed into this topic of "openness", which seems a bit dangerous. Is allowing public feedback openness? Is letting anyone with enough time and energy participate, which in effect keeps out people unable to follow an enormous volume of discussion (in English only...), a truly open process? As one contributor from the floor notes, open does not imply inclusive, and "openness" as defined by a mostly US-based mindset still has issues: language barriers, and not reaching all the communities you need to reach. Also, are we talking about openness, or transparency, or both?

Terminology and definition of openness notwithstanding, getting more people to contribute does have a great appeal: this means more ideas, more pairs of eyes and brains to look at the specification. All this, so far, was covered by a W3C process ensuring public reviews of every specification at every stage of their maturity, but for many, this was not sufficient: there is a clear wish for direct involvement in the decision making.

This indeed has some serious advantages: in the development of our Open source tools, I have observed that contributors react very differently when they feel they are "outside", as bug reporters for instance, and "inside", as part of the development team. Being part of the process means that you have a better chance of understanding it, and, sometimes, go and explain it to others.

In this way, I feel that opening working groups to the public, in a process similar to what has been done with the HTML Working Group since March this year, not only made a lot of people happy and brought up tons of ideas (too much, too soon?), but also turned many people who might otherwise have kept a cynical outlook on the new HTML work into ambassadors for it, genuinely excited by the work.

There are a lot of good concerns about making groups working on specifications more open, and as I write this, Art Barstow of Nokia, on stage, works on debunking them one by one...

  • Openness brings too much input? That's the point, he thinks: to get feedback early, not too late.
  • IPR and licensing terms become a hairy subject with hundreds of participants, not always clear on their affiliations? Ensuring that patent policy commitment gets signed prior to joining the group would (should?) solve most of the issue.

Does this mean that an open working group is the perfect solution, or are there technologies where a small task force, with frequent calls for feedback on the material they process, is more efficient? Standards veteran Paul Cotton tells us that you can have a very healthy community without needing a lot of "invited experts", if there is a solid submission with early support from a community. And eventually, you get down to a core group that actually does the work.

Which model will W3C adopt for its future? This is hard to tell. We know that the small group, public feedback scheme has worked well (and less well) for many specs in the past, whereas the recently born HTML working group, yet to have its first face-to-face meeting or publish a snapshot of its specification(s), is still an experiment.

WHATWG blogThe WHATWG at the W3C technical plenary

The W3C is having its technical plenary day today, and a number of WHATWG contributors are there. It’s hard to participate remotely in this event, but you can watch and listen: the W3C is publishing an audio stream (in Ogg; a Java applet alternative is available too), and has commissioned realtime captioning for the event. There’s also a W3C IRC channel on the topic on irc.w3.org, port 6665, channel #tp, password beantown * (a single asterisk) (it’s not clear why there’s a password, just go with it) (no password anymore). You can also chat with WHATWG contributors who are present at the event on our own IRC channel.

The agenda for the day is available from the W3C site. Don’t forget to adjust the times from the Boston timezone to your timezone if you want to listen to a particular session.

W3C Team blogTPAC 2007 - Cracks and Mortar

Tim Berners-Lee is taking the floor: "The world is a mess of interconnected communities and it is why it is working."

  • Content-Type: a way to define the content available at a specific URI. It gives flexibility for evolution. It reminds me that Olivier and I gave a talk on Web architecture at Keio University, in which we talked about content-type. HTML 5 proposes to jeopardize the content-type by ignoring the server part of the architecture.
  • Tag soup: Tim reminds us that documenting existing practice is fine, but evolving toward clean markup sounds like a good idea.
  • Validation: it should allow extensibility, it should explain problems and motivation, and balance its level of disapprobation.
  • Browsers: "Save As" could help produce cleaner markup.
  • Alternatives: central registry of terms - microformats, etc.
  • New tags and attributes in HTML: validator complains, then microformats overload classes. This is a vicious circle.
  • Canvas and SVG: Tim is asking for dialog between the communities. Look also at the comparison between the two.
  • XML top-down dispatch offers a recursive dispatch.
  • Multiple namespaces in RDF

The future opens doors to new things: linked data, Compound Document Formats as an alternative to Silverlight and AIR, FOAF+OpenID for blocking spam and making open social networks, the mobile web, video and more.

W3C Team blogTPAC 2007 - URI-Based Extensibility: Benefits, Deviations, Lessons-Learned

The Technical Plenary day is continuing. Someone in a comment earlier asked what TPAC was. TPAC means Technical Plenary and Advisory Committee meeting. All W3C Working Groups and representatives of W3C are meeting. This year we opened the technical plenary day a bit more to bloggers, journalists, and the Web design community.

The session which just started is a bit more technical, addressing URI-based extensibility in the Web architecture. URIs are identifiers, exactly like a barcode in the physical world. The principle of URI-based extensibility relies on associating a vocabulary (markup language) with a URI. Imagine there is a Smith family in Boston and a Smith family in New York: you will talk about the Smiths of Boston or the Smiths of New York. You qualify the name Smith to disambiguate which family you are talking about and to avoid misunderstandings. URI-based extensibility works in the same way, but uses URIs as the unique identifiers.

Dave Orchard introduces the topic, citing work in Web Architecture. Sam Ruby proposed a mechanism for HTML 5 and distributed extensibility. Ian Hickson challenged the benefits of URI extensibility by studying the small number of Web pages actually using the profile attribute in HTML pages. Dan Connolly says that even if his backyard is insignificant with regard to the rest of the planet, he might need a disambiguation mechanism. Tim Berners-Lee reminds us that there is not only one community. HTML is a big community, but there are other communities. Smaller communities are more in need of URI-based extensibility than bigger ones.
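The Smith analogy can be sketched in a few lines of JavaScript. This is only an illustration: the example.org URIs and the `qualify` helper are assumptions made up for the sketch, not something presented in the session.

```javascript
// Hypothetical vocabulary URIs, invented for this illustration.
const BOSTON = "http://example.org/boston#";
const NEW_YORK = "http://example.org/new-york#";

// Qualify a local name with the URI of the vocabulary it belongs to.
function qualify(vocabularyUri, localName) {
  return vocabularyUri + localName;
}

const smithOfBoston = qualify(BOSTON, "Smith");
const smithOfNewYork = qualify(NEW_YORK, "Smith");

// Same local name, different identifiers: the two families no longer collide.
console.log(smithOfBoston);
console.log(smithOfNewYork);
console.log(smithOfBoston === smithOfNewYork);
```

The local name stays short and readable, while the URI prefix carries the disambiguation.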

Dan Connolly is asking for support of the profile attribute in Dreamweaver. I tend to agree with Dan here. Some features are not used in the community because tools do not integrate them. Tools will not save us but they can help us a lot in many circumstances. Dan says that "it's a bug when you have a merger; it's a feature when people don't play nice."

Tim Berners-Lee: "The thing I like about web architecture is that I can follow my nose.... giving pointers, give context to things."

Ian Hickson stresses that there is little value in giving URIs to microformats, for example, and that you still need knowledge of the vocabulary to implement it.

Dan Connolly replies that it is not fair that a community removes the possibility of using some vocabulary from other communities, just because they decided it.

Chris Wilson finds it interesting that the community decides to formalize well-known semantics used widely by the community. I would reply that the English-speaking community imposes a lot of patterns on other communities.

W3C HTML blogTPAC 2007 - HTML 5, XHTML 2.0, Future Formats

The title, just by reading it, reminds me of long discussions over the past 6 months as the (interim) HTML WG staff contact. HTML 5 and XHTML 2.0: many fights, many misunderstandings, often due to dialogues of the deaf. Let's hope that the session will give hints on how to articulate opinions. The last time I attended this session was at XTech 2007 in Paris.

Richard Schwerdtfeger, IBM, proposed to merge the two groups. In his opinion, one group is looking forward and the other one is looking at things backward.

I really think that one of the issues in all the discussions we have is the use of the phrase "real world". It has no shared meaning across people in the Web community. As I said earlier on IRC: "My real world is not yours."

The discussion is taking an interesting turn about what it means for a technology to be ready to break. How do we process errors in technology? It is a much larger topic than HTML languages. It touches Web architecture at large. The discussion is lively on IRC as well.

Daniel Glazman stressed that we really need editor vendors and CMS developers in the HTML WG. Daniel is part of the HTML WG and is working on a new authoring tool for HTML.

fantasai reminds the audience that the topic is about HTML 5 and XHTML 2.0. She says that both languages solve a real problem which exists now, but one of the issues is the shared namespace.

Anne van KesterenPublishing HTML 5 as W3C Working Draft?

Last Friday the question was put whether to publish the HTML Design Principles draft after I updated it to address a few comments. A second survey was also published with regards to publishing HTML 5, and the HTML 5 differences from HTML 4 document. Results are coming this week and by Saturday we should know the outcome:

Anne van KesterenIn Cambridge, Boston

In the Hyatt in Cambridge (Boston) using the internet for ten bucks a day. Feeling sleepy. It seems to be quite easy to let someone pay for internet access here as all it requires is the name of a person and the room he’s staying in. I haven’t actually tested this yet.

The more interesting news is that I had a few beers legally yesterday evening (around fricking six or seven in the morning Amsterdam time or so). Around two in the morning Amsterdam time I had dinner in The Cheesecake Factory. Letting you wait a long time before getting in and then getting you out fast seems to be the idea. And apparently, this works. One of the reasons is probably that you get a beeper so you can shop until it goes off at which point you go back, wait another twenty minutes, and then you are guided to your table.

Just got back from breakfast which we had around three in the afternoon Amsterdam time. I think I’m getting used to the idea that it’s sort of morning here. Tomorrow the meetings will start. Excellent. This week I’ll be discussing cross-site requests, ARIA, XMLHttpRequest, HTML5, and maybe some CSS. We’ll see how it goes. Meeting the rest of the standards crowd is probably more fun than all that and maybe even the primary reason to have such meetings in the first place. I should probably mention W3CTP 2007 (AC doesn’t apply to me).

Dimitri GlazkovSlides from my IPSA presentation on HTML5 and Google Gears

See? I knew there was a way to somehow combine Alabama, Google and HTML5 in one sentence.

Maciej Stachowiak (WebKit/Safari blog)WebKit Does HTML5 Client-side Database Storage

The current working spec for the HTML5 standard has a lot of exciting features we would eventually like to implement in WebKit. One feature we felt was exciting enough to tackle now even though the spec is still in flux is client-side database storage. So for the last few weeks andersca, xenon, and I have been cooking up an implementation!

The client-side database storage API allows web applications to store structured data locally using a medium many web developers are already familiar with - SQL.

The API is asynchronous and uses callback functions to track the results of a database query.
Compact usage defining a callback function on the fly might look something like this:

var database = openDatabase("Database Name", "Database Version");

database.executeSql("SELECT * FROM test", function(result1) {
   // do something with the results
   database.executeSql("DROP TABLE test", function(result2) {
     // do some more stuff
     alert("My second database query finished executing!");
   });
});
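To see the order in which such nested callbacks run, here is a minimal sketch that works outside a browser. The `makeFakeDatabase` helper and its `flush` method are hypothetical stand-ins invented for this illustration; they only mimic the `executeSql(sql, callback)` shape shown above and are not WebKit's implementation.

```javascript
// Hypothetical in-memory stand-in for the database object. It is NOT WebKit's
// implementation and only mimics the executeSql(sql, callback) call shape.
function makeFakeDatabase(cannedResults) {
  const pending = [];
  return {
    // Queue the query; the callback is not invoked until flush() runs it.
    executeSql(sql, callback) {
      pending.push(() => callback(cannedResults[sql] || []));
    },
    // Run queued callbacks in order, including ones queued by other callbacks.
    flush() {
      while (pending.length > 0) {
        pending.shift()();
      }
    },
  };
}

const callLog = [];
const fakeDb = makeFakeDatabase({ "SELECT * FROM test": [{ id: 1 }] });

fakeDb.executeSql("SELECT * FROM test", function (result1) {
  callLog.push("first callback: " + result1.length + " row(s)");
  fakeDb.executeSql("DROP TABLE test", function (result2) {
    callLog.push("second callback");
  });
});
callLog.push("executeSql returned");

fakeDb.flush(); // stands in for the engine finishing its work later
console.log(callLog);
```

The call to `executeSql` returns before either callback fires, and the second query is only issued once the first one has delivered its result.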

There will also be a small example of how to use the API in a real site that we’ll try to keep up to date as things evolve.

This initial implementation has some things missing from the spec as well as a few known bugs. But it does the basics and the best way to discover what needs work is to get it out there for people to start using it!

If you find any bugs, would like to suggest features, or have gripes about the spec itself, please drop by #webkit or drop us a line on the WebKit email lists.

Oh, and one more thing…

We’re landing this initial implementation with pretty cool Web Inspector support!
So far you can view the full contents of any table and run arbitrary queries on each database a page is using. We have a lot of ideas for improvements but would also love to hear yours.

WHATWG blogCall for Comments

The WHATWG has now published a snapshot version of the HTML5 spec for review. Ian Hickson wrote to the WHATWG mailing list:

Last November, as part of the feedback on the W3C HTML WG charter, I wrote an e-mail saying that I thought a realistic timetable would have a first working draft released in October 2007.

We don’t really need archived copies with the way the WHATWG works, since everything happens in the open with a Subversion interface and everything, but, I figured that I should “publish” an archived copy anyway, so today I put out a frozen “call for comments” draft:

http://www.whatwg.org/specs/web-apps/2007-10-26/multipage/

If anyone was hoping for a semi-stable version to start reviewing the draft, I would say that this is it. We’re pretty much feature-complete at this point, which is to say I don’t think we’ll be adding any major features to HTML5 going forward (though of course minor features like additions to certain APIs are likely to still occur).

There is a public issues list:

http://www.whatwg.org/issues/

…which has about 3700 issues in it. The next order of business is simply to go through all of those issues. I’ve been tracking the issue count since early October, and at the moment the count is reducing at a rate of about 7 a day, which works out to being about a year and a bit of solid work, which puts us on track to reach Last Call in 2009, as I predicted in the aforementioned e-mail.

I’d like to thank everyone here in the WHATWG community for helping make this work fun and pleasant. It’s really nice to be able to work in such a friendly atmosphere. I hope the coming year will continue the same way!

Cheers,

I’d like to thank Ian for his hard work on editing the spec. Keep it up! :-)
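As a rough check of the estimate in Ian's mail, the arithmetic can be written out as below. The sketch assumes that the closing rate stays constant and that no new issues are filed, which the mail does not claim.

```javascript
// Figures quoted in the mail above.
const openIssues = 3700;
const issuesClosedPerDay = 7;

// Assumption: constant closing rate and no newly filed issues.
const daysNeeded = openIssues / issuesClosedPerDay;
const yearsNeeded = daysNeeded / 365;

console.log(Math.round(daysNeeded)); // 529
console.log(yearsNeeded.toFixed(2)); // "1.45"
```

Roughly 529 days, or about a year and five months, which matches "a year and a bit of solid work".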

Dimitri GlazkovHTML5 Wrapper for Gears

The one where I write a wrapper for Gears to support HTML5 API.

Anne van KesterenquerySelector() and querySelectorAll()

“Why do DOM interfaces suck so much?” argues that there should be more optional arguments for DOM interfaces. I agree that this is something we could look into making better. (Some time ago I suggested making the last argument of addEventListener optional, as people mess it up.) Feedback on that is welcome on public-webapi@w3.org and is likely also relevant to the WHATWG and W3C HTML WG if it affects interfaces defined by HTML 5. This is not to say that you’ll get an immediate reply and Web browsers will ship with the more convenient method next year, but it has a chance of improving things in the long run.
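To illustrate what an optional last argument buys authors, here is a small sketch. The `listen` wrapper and the `fakeNode` object are hypothetical, written only so the example runs outside a browser; they are not part of any DOM specification.

```javascript
// Hypothetical helper, not part of any DOM specification: it forwards to
// addEventListener and defaults the third argument to false (bubbling phase).
function listen(target, type, handler, useCapture = false) {
  target.addEventListener(type, handler, useCapture);
}

// Minimal stand-in for a DOM node that records what it was called with.
const recordedCalls = [];
const fakeNode = {
  addEventListener(type, handler, useCapture) {
    recordedCalls.push({ type, useCapture });
  },
};

listen(fakeNode, "click", function () {});       // capture flag omitted
listen(fakeNode, "focus", function () {}, true); // capture flag given
console.log(recordedCalls);
```

With the default in place, the common case needs only two arguments and the flag cannot be forgotten or mistyped.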

APIs we’re introducing now can start right away with sucking less. Well, apart from the names. Thanks to Lachlan Hunt the Selectors API specification has been updated with lots of new examples and the new names. The interfaces are as follows:

interface DocumentSelector {
  Element         querySelector(in DOMString selectors);
  Element         querySelector(in DOMString selectors, in NSResolver nsresolver);
  StaticNodeList  querySelectorAll(in DOMString selectors);
  StaticNodeList  querySelectorAll(in DOMString selectors, in NSResolver nsresolver);
};

interface ElementSelector {
  Element