W3C Team blog — Syntax for ARIA: Cost-benefit analysis
Table of Contents
1. Introduction
2. The core issue: How should the ARIA attributes be spelled?
3. Possible approaches: land-grab, colon or dash
4. The status quo: languages and implementations
5. The near future
5.1. HTML5
6. Cost-benefit analysis
6.1. Implementation cost
6.2. XML extensibility and SVG
6.3. Short- vs. long-range considerations
1. Introduction
This analysis is intended to be neutral with respect to ideology, history and constituency. For a useful overview of how we got here, see WAI-ARIA Implementation Concerns (member-only link) by Michael Cooper.
The W3C's WAI PF Working Group recently published the first public working draft of the Accessible Rich Internet Applications (WAI-ARIA) specification, which "describes mappings of user interface controls and navigation to accessibility APIs".
The ARIA spec. defines roles, states and properties to manage the interface between rich web documents and assistive technologies. The primary expression of roles, states and properties in markup languages is via attributes. Since ARIA is meant to augment web applications across a range of languages and user agents, ARIA has to specify how its vocabulary of attributes and values can be integrated into both existing and future languages.
In preparing this analysis, I have reviewed the available concrete evidence bearing on the matter, and have carried out a considerable amount of work to replicate, and in some cases correct or extend, testing done in the past. The details are available in a report entitled Some test results concerning ARIA attribute syntax.
2. The core issue: How should the ARIA attributes be spelled?
ARIA is useful only if it is widely supported. It therefore needs to integrate as cleanly as possible into existing and future languages. Before looking at possible answers to the spelling question, we need to consider exactly what supporting ARIA means.
We can distinguish two levels of support for ARIA on the part of user agents, which I'll call 'passive' and 'active' support. By 'passive' support, I mean that documents with ARIA-conformant markup are not rejected by the agent, and the markup is available in the same way as any other markup, e.g. via a DOM API or for matching by CSS selectors. By 'active' support, I mean that user agents actually implement their part of the ARIA semantics, that is, they reflect changes to ARIA-defined states and properties via accessibility APIs.
Although already deployed implementations cannot offer active support, an optimal answer to the spelling question would maximise passive support in existing languages, while also encouraging active support from subsequent implementations.
3. Possible approaches: land-grab, colon or dash
There are in principle three possible approaches to the spelling question:
- land-grab: Just use 'role' and the names of the properties (e.g. 'checked', 'hidden') as attribute names.
- colon: Use 'aria:' as a distinguishing prefix, giving e.g. 'aria:role', 'aria:checked' as attribute names.
- dash: Use 'aria' plus some other punctuation character, e.g. a dash, as a distinguishing prefix, giving e.g. 'aria-role', 'aria-checked' as attribute names.
The land-grab approach is pretty clearly unacceptable, because of clashes with existing vocabularies and the likelihood of clashes with future ones, and will not be considered further.
The current ARIA WD specifies a combination of the colon and dash approaches: the colon approach for XML-based languages, with the necessary additional requirement that 'aria' be bound to the ARIA namespace in the usual way (i.e. xmlns:aria="http://www.w3.org/2005/07/aaa"), and the dash approach for non-XML languages. We'll call this the mixed approach hereafter.
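As a rough illustration of the mixed approach, the sketch below computes how a single ARIA property would be spelled in each kind of host document. The function and type names are hypothetical; only the 'aria' prefix and the namespace URI come from the draft.

```typescript
// Namespace URI quoted in the current ARIA WD.
const ARIA_NS = "http://www.w3.org/2005/07/aaa";

// Hypothetical description of an attribute name: namespace plus the
// qualified name an author would actually write in markup.
interface AriaAttributeName {
  namespaceURI: string | null; // null means "no namespace"
  qualifiedName: string;
}

// Mixed approach: colon-prefixed and namespaced in XML-based languages,
// dash-prefixed (and not namespaced) in non-XML languages.
function mixedAriaAttribute(property: string, isXml: boolean): AriaAttributeName {
  if (isXml) {
    // Assumes the author has bound the prefix with
    // xmlns:aria="http://www.w3.org/2005/07/aaa" on an ancestor element.
    return { namespaceURI: ARIA_NS, qualifiedName: "aria:" + property };
  }
  return { namespaceURI: null, qualifiedName: "aria-" + property };
}
```

So mixedAriaAttribute("checked", true) gives aria:checked in the ARIA namespace, while mixedAriaAttribute("checked", false) gives a plain aria-checked attribute. The same property thus has two different spellings depending on the host language, which is the source of much of the cost discussed below.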
My understanding is that, as of the date of this note, the WAI PF working group have indicated that they intend the next draft of the ARIA specs to move to the dash approach.
4. The status quo: languages and implementations
Choosing an approach is complicated by the landscape of language and infrastructure standards it has to fit into, and by the fact that these are moving targets. We therefore have to distinguish between what is currently in place, what we have reason to expect in the near future, and what we can foresee in the longer term. Furthermore, existing languages fall into two categories: XML-based languages, with more or less explicit provision for extensibility in general, typically namespace-based, and non-XML languages, which for the purposes of this analysis we will take to be HTML 4.01 and nothing else.
As noted above, the best we can expect from deployed user agents is passive support. The table below sets out the extent of passive support which is available for the colon and dash approaches for each of three host languages, which exemplify the major relevant categories: HTML 4.01 (for the non-XML languages), XHTML (an XML language, but not always treated as such, so we actually get two columns for it below) and SVG (only an XML language).
| Passive support | HTML 4.01 | XHTML (as if HTML) (note 0) | XHTML (as XML) | SVG |
|---|---|---|---|---|
| Allowed at all | colon: Yes, by 'should ignore' advice; dash: Yes, by 'should ignore' advice | colon: Yes, by 'should ignore' advice; dash: Yes, by 'should ignore' advice | colon: Yes, by 'must ignore' rule; dash: Yes, by 'must ignore' rule | colon: Yes, by 'must ignore' rule; dash: In principle, no; in practice (note 1), yes |
| Available via DOM | colon: Yes, via getAttribute; dash: Yes, via getAttribute | colon: Yes, via getAttribute; dash: Yes, via getAttribute | colon: Yes (note 2), via getAttributeNS and getAttribute; dash: Yes (note 2), via getAttribute | colon: Yes (note 3), via getAttributeNS and getAttribute; dash: Yes (note 3), via getAttribute |
| Matches CSS selector | colon: Yes (note 4), using [aria\:attr]; dash: Yes (note 5) | colon: Yes (note 4), using [aria\:attr]; dash: Yes (note 5) | colon: Yes, using [aria\|attr]; dash: Yes (note 5) | colon: No; dash: No |
Notes:
- 0: This column applies to the IE family, and to other browsers whenever they treat XHTML as HTML.
- 1: Firefox 2.0.0.14, IE7 + Adobe 3.03 SVG plugin.
- 2: All browsers which treat XHTML as XML.
- 3: Firefox 2.0.0.14 (unable to test IE+plugin so far).
- 4: Except the IE family.
- 5: If attribute selectors are supported at all, i.e. not IE5 or IE6.
It should be noted that some of the entries above disagree with assertions made in the past about browser behaviour. At least some of those assertions were based on flawed test materials: see the discussion of experiments 1 and 2 in my testing report for details on the information summarised above.
5. The near future
A number of browser implementors have responded positively to the ARIA initiative and have included experimental active support for ARIA in pre-release versions of their products. Most of the test materials and implementation information I can find suggest that only the dash approach, and only HTML or XHTML, are currently being implemented.
With regard to improving passive support, it seems very possible that IE8 will support attribute selectors of the form [aaa\:checked], which would remove the qualification recorded in the table above by footnote 4.
5.1. HTML5
The situation with respect to HTML5 is complicated. As it currently stands, the HTML5 draft specification supports namespaces internally, and all HTML elements are parsed into DOM nodes in the HTML namespace, regardless of whether they are parsed "as HTML" or "as XML". But when parsing documents "as HTML", no other namespaces are recognised. Unless this changes before HTML5 is completed, the HTML/"XHTML (as if HTML)" columns above will apply to HTML5-conformant user agents in at least some cases.
6. Cost-benefit analysis
On the basis of the above survey, what follows is an attempt at a cost-benefit analysis of the colon and dash approaches, as well as the mixed approach as currently specced in the ARIA working draft, and a fourth approach, proposed by me in a message to www-tag, which I'll call the xcolon approach.
The xcolon approach attempts to address some of the problems revealed in the passive support table by defining a pair of getter/setter JavaScript functions for access to ARIA information in the DOM, and by giving a design pattern for duplicated CSS selectors (one using [aria\:xxx] and the other [aria|xxx]).
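The getter/setter pair and the selector pattern could look roughly like the TypeScript sketch below. It is not the code from the www-tag message; the names getAria, setAria and ariaRules, the AttrHost interface, and the lookup order are assumptions made for illustration, guided by the passive support table above.

```typescript
// Hypothetical xcolon-style helpers; names and details are assumptions.
const ARIA_NS = "http://www.w3.org/2005/07/aaa";

// The subset of the DOM Element interface these helpers need. The NS
// methods are optional because the IE family does not provide them.
interface AttrHost {
  getAttribute(qualifiedName: string): string | null;
  setAttribute(qualifiedName: string, value: string): void;
  getAttributeNS?(namespace: string | null, localName: string): string | null;
  setAttributeNS?(namespace: string | null, qualifiedName: string, value: string): void;
}

// Read an ARIA property, e.g. getAria(el, "checked").
function getAria(el: AttrHost, property: string): string | null {
  // XML parsing: look the attribute up by namespace, whatever its prefix.
  // "" is treated like null, since DOM Level 2 returned "" for a missing attribute.
  if (el.getAttributeNS) {
    const namespaced = el.getAttributeNS(ARIA_NS, property);
    if (namespaced !== null && namespaced !== "") {
      return namespaced;
    }
  }
  // HTML parsing: the attribute is literally named "aria:<property>".
  return el.getAttribute("aria:" + property);
}

// Write an ARIA property; isXml says whether the page has an XML DOM.
function setAria(el: AttrHost, property: string, value: string, isXml: boolean): void {
  if (isXml && el.setAttributeNS) {
    el.setAttributeNS(ARIA_NS, "aria:" + property, value);
  } else {
    el.setAttribute("aria:" + property, value);
  }
}

// Duplicated CSS rules: [aria\:x] matches under HTML parsing, [aria|x] under
// XML parsing (assuming @namespace aria url(http://www.w3.org/2005/07/aaa);).
// Two separate rules are emitted because a selector a browser cannot parse
// would invalidate a whole comma-separated selector list.
function ariaRules(property: string, value: string, declarations: string): string {
  const html = `[aria\\:${property}="${value}"] { ${declarations} }`;
  const xml = `[aria|${property}="${value}"] { ${declarations} }`;
  return html + "\n" + xml;
}
```

With helpers like these, script code calls getAria(el, "checked") without caring whether the page was parsed as HTML or as XML, provided the author used the literal aria prefix in HTML-parsed documents.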
- 1 HTML5's provision for extensibility, whether compatible with XML namespaces or not, is an open area of discussion at the moment.
- 2 That is, it requires the use of a fixed aria prefix and may not (in some browsers) correctly set the namespaceURI property even when targeting an XML DOM.
- 3 That is, in the IE family, only (putatively) IE8 and successors will recognize [aria\:...] selectors.
- 4 See discussion of re-implementation cost below.
- 5 See discussion of XML extensibility below
- 6 That is, adds the concept of a fixed, dash-delimited, prefix as a way of managing distinct symbol spaces to the existing non-fixed, colon-delimited prefix for the same purpose.
- 7 That is, requires all embedding languages to explicitly allow and manage an inventory of fixed prefixes and, possibly, their vocabularies.
6.1. Implementation cost
For wholly commendable reasons, development of the ARIA spec. and pilot implementation work have proceeded in parallel. Most if not all existing implementations support only the dash approach. What would adopting a different approach be likely to cost those implementations? My conclusion, having examined one implementation in some detail, is that the cost is likely to be very modest.
Michael Cooper, WAI PF staff contact, captured the reason for this very well, albeit unintentionally:
"The ARIA roles and properties are conceptually simple enough, but they are designed to provide a bridge between HTML and desktop accessibility APIs, a bridge which is exploited by the operating system, user agent, and assistive technology all working together. There's a complex set of interdependencies there and the feasibility and details of many of the ARIA features could only be worked out by testing in deployed systems, and therefore doing early implementation."
The complexity referred to above is fundamentally one of architecture, both static and dynamic. Not surprisingly, therefore, syntactic concerns account for a tiny fraction of the code needed to implement ARIA as it stands. Furthermore, and again not surprisingly, as it's what sound software engineering practice requires, the details of the concrete syntax are isolated, and the vast bulk of the code I looked at refers to them only indirectly. The consequence of all this is that the changes needed to move away from the dash approach will be very straightforward. For more details, see the discussion of experiment 3 in my testing report.
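To make the point about isolation concrete, here is a hypothetical sketch (not taken from any actual implementation) in which the concrete spelling lives in one small object. The collectStates function stands in for the large, syntax-independent part of an implementation; moving from the dash to the colon approach only changes which syntax object is passed in.

```typescript
// The only place that knows how ARIA attribute names are spelled.
interface AriaSyntax {
  attributeName(property: string): string;
}

const dashSyntax: AriaSyntax = { attributeName: (p) => "aria-" + p };
const colonSyntax: AriaSyntax = { attributeName: (p) => "aria:" + p };

// Syntax-independent code: gathers ARIA states into a record, roughly as an
// implementation might before handing them to an accessibility API.
function collectStates(
  getAttribute: (name: string) => string | null,
  syntax: AriaSyntax,
  properties: string[],
): Record<string, string> {
  const states: Record<string, string> = {};
  for (const p of properties) {
    const value = getAttribute(syntax.attributeName(p));
    if (value !== null) {
      states[p] = value;
    }
  }
  return states;
}
```

Under this design, switching approaches touches the two one-line syntax objects and nothing in collectStates, which is the kind of change the implementation review above found to be cheap.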
6.2. XML extensibility and SVG
Many existing XML languages make explicit, generic provision for extensibility by allowing, in their formal schemas and/or spec. prose, any namespace-qualified elements and attributes from namespaces other than those which make up the language itself. Tools such as NVDL and, to a lesser extent, W3C XML Schema and RELAX NG make it possible to combine the schemas for multiple XML languages to give a complete characterisation of mixed-language documents.
One particularly important example of this approach is SVG. ARIA integration into SVG is clean and straightforward under the colon or mixed approaches, but will require amending the spec. under the dash approach.
6.3. Short- vs. long-range considerations
In trying to weigh the tradeoffs which must necessarily be considered in light of the information given above, the matter of timescale is particularly hard to address. Any assertion about how things will look five, or even two, years hence can always be countered with a contrary assertion. Nonetheless, the centrality of the HTML languages for the Web, and the fundamental importance of accessibility for all of us, suggest that we must take the long-term impact of this decision seriously, and be prepared to discount some short-term discomfort in return for long-term stability and simplicity.
Steve Faulkner et al — HTML5 and alt: New Day Rising
Members of the HTML5 working group have produced an alternative to the text currently in the specification. It provides text alternative examples that conform to, rather than conflict with, WCAG 2.0.
W3C Team blog — utf-8 Growth On The Web
On Google's blog, Mark Davis explains that Google is moving to Unicode 5.1. Unfortunately, the article conflates Unicode and UTF-8, as David Goodger noted in Unicode misinformation. But the really interesting part is the growth of UTF-8 on the Web. These data should be of interest for the development of HTTP, HTML 5 and validators.
© graph from Google.
W3C Team blog — Vertical Layouts for Canvas Text (CJK)
I noticed a discussion about vertical text layout among participants of the HTML WG (I have cut some parts for readability).
<Hixie> ok for canvas text my proposal is:
<Hixie> drawHString(x, y, maxWidth, textAlign, s); and drawHString(x, y, maxHeight, textAlign, s);
<Hixie> drawVString(...) for the second one
<Lachy> what's the difference between them? drawVString for vertical stings where the letters are stacked on top of each other, and not just rotated 90 deg?
<Philip`> Hixie: They look complex and hard to use :-p
<Philip`> compared to e.g. translate(x,y);drawString(s)
<Hixie> lachy: drawVString() would be for vertical text (e.g. some CJK)
<Hixie> one is lack of support for vertical text :-)
Printed media have handled this well for a long time: Japanese books have some complex layouts mixing Western and Japanese characters.

It happens not only in CJK (Chinese, Japanese, Korean) texts. Think about a hotel's neon sign with the letters written vertically.
Felix Sasaki is my colleague at W3C/Keio and has worked with the Japanese Layout Task Force. He was sitting next to me when I was reading the logs of the discussion, so I asked him for some references. He sent me a link to 1.3 Directional Factors in Japanese Text Layout from the Requirements of Japanese Text Layout. He also reminded me about XSL 1.1: 7.29.3 "glyph-orientation-vertical".
Wikipedia has a page on Horizontal and vertical writing in East Asian scripts, and Unicode has a note on Robust Vertical Text Layout.
All of that should help to define the API for Canvas Text.
Steve Faulkner et al — Matt May: @alt and the Flickr Defense
@alt and the Flickr Defense - Matt May deconstructs the photo site use case, explaining why it is a bogus argument for making alt optional in HTML5.
Shawn Medero — HTML 5's `alt` Attribute Continues to Spark Debate
Gez Lemon and Steve Faulkner (among others) are working hard to make alt a required attribute on HTML 5 <img> elements. I'll publish my thoughts on this soon; I've still got some research to collect.
Steve Faulkner et al — HTML5 and alt: The editors new clothes
The HTML5 editor has recently stated, in his defence of making alt optional:
“We truly do believe in research, hard data, and analysis, rather than hypotheticals; and we truly do believe that evidence suggests that what we are arguing for is going to improve the accessibility of the Web.”
Problem is, no “research, hard data, and analysis” has been provided.
If the editor has such detailed research, please provide it so that the members of the HTML working group and those groups within the W3C WAI that have a stake in this issue, can use the “research, hard data, and analysis” to inform their decision.
Show us the goods
To put the matter in perspective:
What we don’t need from the editor is more Google code statistics and a bit of pseudo-scientific prose dressing the statistics up as facts to support his argument. What is required from the editor to back up his claims? A proper scientific study based on the scientific method: research with firm aims and objectives stated up front, and an agreed methodology.
Scientific Method
For the sake of clarity, I have reproduced some information about the steps involved in the scientific method:
The scientific method has four steps:
- Observation and description of a phenomenon or group of phenomena.
- Formulation of an hypothesis to explain the phenomena. In physics, the hypothesis often takes the form of a causal mechanism or a mathematical relation.
- Use of the hypothesis to predict the existence of other phenomena, or to predict quantitatively the results of new observations.
- Performance of experimental tests of the predictions by several independent experimenters and properly performed experiments.
If the experiments bear out the hypothesis it may come to be regarded as a theory or law of nature (more on the concepts of hypothesis, model, theory and law below). If the experiments do not bear out the hypothesis, it must be rejected or modified. What is key in the description of the scientific method just given is the predictive power (the ability to get more out of the theory than you put in; see Barrow, 1991) of the hypothesis or theory, as tested by experiment. It is often said in science that theories can never be proved, only disproved. There is always the possibility that a new observation or a new experiment will conflict with a long-standing theory.
Conclusion
If a scientific study with firm aims and objectives stated up front, and an agreed methodology, is not forthcoming, we are left relying on expert opinion, rational argument and the hope of consensus within the HTML WG or, failing that, a vote on the issue. Whatever route is taken, let’s get this issue sorted so we can move on to other important accessibility issues within HTML5.
Steve Faulkner et al — HTML5 Alternative Text, and Authoring Tools
HTML5 Alternative Text, and Authoring Tools - Gez Lemon cuts through the crap, making a forceful case for alt being a requirement in HTML5.
Lachlan Hunt — Conforming target Attribute
One of the biggest annoyances on the web, and something I really hate, is popup windows. It frustrates me, and many others, whenever a site attempts to forcibly open a new window for any reason whatsoever. So, it may be surprising to hear that the target attribute has actually been made conforming in HTML5, even though it was non-conforming in HTML 4.01 Strict, and that this is a good thing. There are in fact several valid reasons for making it conforming, which I will attempt to explain.
When embedding documents within an iframe, it’s important to be able to set the target of links and forms to be the iframe. This is a useful technique for cases where it’s undesirable to refresh the entire page to update a small section. Although there are alternative techniques that could be used, such as the many AJAX solutions, the simplicity of using an iframe can outweigh the cost of using an alternative JavaScript solution.
Similarly, it’s important for links within a framed document to be able to set their target to _parent or _top in order to break out of frames. Without those values, links would default to opening within the frame itself, which is not always useful. When done well, using target in this way can actually be quite beneficial for the usability of a site.
The purpose of the _blank value, however, is to cause a link to open within a new window or tab. Although there are many valid arguments against forcing a new window, mostly related to usability and accessibility, the reason for allowing this becomes clear when you consider the alternatives.
There are many authors who, for whatever reason, really want to have links opened in new windows, and nothing will convince them otherwise. But experience has shown over the years that because it is non-conforming in HTML 4.01 Strict, many authors will go to sometimes extreme lengths to get a popup window, while still writing technically valid markup.
Such techniques range from dynamically adding the target attribute to the DOM with script, to using event handlers and calls to window.open(). Authors who use them ignore the reason for making it non-conforming in the first place, which was presumably to avoid the usability issues, and have in effect chosen to give validity a higher priority.
When a new window is desired, the benefit of using the target attribute over many of the other techniques is that it is actually more beneficial to the user because it is easier to override. Many browsers offer options to cause such links to open in a new tab instead of a window, and some even allow them to open in the same tab. While it is also possible to do that with window.open(), doing so can actually interfere with sites that depend upon the new window to function correctly.
Allowing authors to get what they want using the least user-hostile method is significantly better than inadvertently forcing them to find more harmful workarounds. So this is why the target attribute has been made conforming in HTML 5.
W3C Team blog — font is dead, vive le style
Ian Hickson, one of the two editors of the HTML 5 specification, sent this message to the HTML WG mailing list this morning.
Summary:
<font> is gone, style="" is made global.
What does it mean? The font element is still part of the parser's list of active formatting elements: browsers (user agents) have to keep supporting the content already available online, following the guideline "Do not break the Web". But the font element has disappeared from the content model.
Basically, there is no way to use a font element to write a conforming HTML 5 document. You, or the authoring tool, will have to use the style attribute.
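For an authoring tool, the change amounts to emitting a style attribute where it used to emit font. A minimal sketch, assuming a hypothetical fontToStyle helper that only handles the color and face attributes; size is deliberately left out because its mapping onto CSS font-size keywords needs more care.

```typescript
// Hypothetical helper: convert the color and face attributes of a legacy
// <font> element into the value of an equivalent style="" attribute.
// Assumes simple font family names that are valid unquoted in CSS.
function fontToStyle(attrs: { color?: string; face?: string }): string {
  const declarations: string[] = [];
  if (attrs.color) {
    declarations.push("color: " + attrs.color);
  }
  if (attrs.face) {
    // face="Georgia, serif" carries over directly as a font-family list.
    declarations.push("font-family: " + attrs.face);
  }
  return declarations.join("; ");
}
```

So a tool that used to write <font color="red" face="Georgia"> might instead write <span style="color: red; font-family: Georgia">, using whatever element suits the context.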
WHATWG blog — Reverse Ordered Lists
One of the newly introduced features in HTML 5 is the ability to mark up reverse ordered lists. These are the same as ordered lists, but instead of counting up from 1, they instead count down towards 1. This can be used, for example, to count down the top 10 movies, music, or LOLCats, or anything else you want to present as a countdown list.
In previous versions of HTML, the only way to achieve this was to place a value attribute on each li element, with successively decreasing values.
<h3>Top 5 TV Series</h3>
<ol>
<li value="5">Friends
<li value="4">24
<li value="3">The Simpsons
<li value="2">Stargate Atlantis
<li value="1">Stargate SG-1
</ol>
The problem with that approach is that manually specifying each value can be time-consuming to write and maintain, and the value attribute was not allowed in the HTML 4.01 or XHTML 1.0 Strict DOCTYPEs (although HTML 5 fixes that problem and allows the value attribute).
The new markup is very simple: just add a reversed attribute to the ol element, and optionally provide a start value. If there’s no start value provided, the browser will count the number of list items, and count down from that number to 1.
<h3>Greatest Movie Sagas of All Time</h3>
<ol reversed>
<li>Police Academy (Series)
<li>Harry Potter (Series)
<li>Back to the Future (Trilogy)
<li>Star Wars (Saga)
<li>The Lord of the Rings (Trilogy)
</ol>
Since there are 5 list items in that list, the list will count down from 5 to 1.
The reversed attribute is a boolean attribute. In HTML, the value may be omitted, but in XHTML, it needs to be written as: reversed="reversed".
The start attribute can be used to specify the starting number for the countdown, or the value attribute can be used on an li element. Subsequent list items will, by default, be numbered with the value of 1 less than the previous item.
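The counting rules just described can be sketched as a small function. This is a simplified model that follows the description in this post rather than the specification's own algorithm text; each list item is represented only by its optional value attribute, and the function name is hypothetical.

```typescript
// Compute the numbers shown for the items of an <ol reversed> list.
// values[i] is the li's value attribute, or undefined if it has none.
function reversedListNumbers(values: Array<number | undefined>, start?: number): number[] {
  // Without a start attribute, count down from the number of list items.
  let next = start !== undefined ? start : values.length;
  const numbers: number[] = [];
  for (const value of values) {
    // A value attribute on an li sets that item's number explicitly.
    const n = value !== undefined ? value : next;
    numbers.push(n);
    // Subsequent items are numbered one less than the previous item.
    next = n - 1;
  }
  return numbers;
}
```

For the example that follows, reversedListNumbers([undefined, undefined, undefined, 3, undefined, undefined], 100) yields 100, 99, 98, 3, 2, 1, and the five-item movie list without a start attribute yields 5, 4, 3, 2, 1.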
The following example starts counting down from 100, but omits a few items from the middle of the list and resumes from 3.
<h3>Top 100 Logical Fallacies Used By Creationists</h3>
<ol reversed="reversed" start="100">
<li>False Dichotomy</li>
<li>Appeal to Ridicule</li>
<li>Begging the Question (Circular Logic)</li>
<!-- Items omitted here -->
<li value="3">Strawman</li>
<li>Bare Assertion Fallacy</li>
<li>Argumentum ad Ignorantiam</li>
</ol>
Anne van Kesteren — XTech 2008
In 2005, during exam time, I was able to attend a boat dinner at XTech. In the next two years (2006, 2007) I was able to attend the whole event. In 2007 I did a lightning talk on HTML5. This year I want to do another one of those (I’ve yet to hear back from the lightning talk committee) and I’ll also be giving a full presentation! The presentation will be about the future of cross-site requests, and I’ll post all the material online here once I’ve given it. I hope it goes well.
You can still sign up for XTech 2008. Like last year, we might have another browser summit hosted by Molly, though I believe the details are still to be worked out.
I’ll be arriving in Dublin in the evening of Sunday May 4 and since the conference starts May 7 I’ll gladly take any tips for what to do in Dublin. Thanks!
Shawn Medero — Web Browsers, HTML, and Multiple XML Feed Formats
Dean Allen asks weblog owners (and calls out Dan Hill's cityofsound.com) to stop advertising multiple feed formats (RSS, Atom, etc) and promote only one. Dean suggests that people and software interested in other formats will know where to find them.
Update: Upon rereading Dean's post for the third time, it is not clear to me who his intended audience is. What follows is still appropriate though... whether it is the HTML author, weblog software, or Safari, the solutions aren't that obvious.
Update 2: Dean follows up:
Having seen some responses to this, it’s clear I should’ve been less terse here. My point is that feed autodiscovery as it is in Safari (and Firefox, Camino, Firefox 2 on XP, Opera, iCab) is a very good thing, and I agree it’s perfect for this sort of application, allowing feeds carrying different content to be quickly tweezed out without one having to hunt for links on a page.
Cool, I think we're on the same page.
If however you argue that multiple formats are important because, say, Microsoft prefers RSS while Google prefers Atom, then it’s trivial for you, Microsoft and Google to work that out amongst yourselves (think CSS). Just please don’t require every single person who tries a feed autodiscovery popup to have to decide if they want their ice cream served in a boot, a Pontiac, or a waffle cone.
Ok, so now I picked a shitty example. :) My point was that: in the present "wild wild web" environment semantic web crawlers aren't capable of finding multiple XML formats unless you tell them where they are located. And sadly... given the completely shit state of feed formats (they are rarely valid XML) developers and content aggregators seem religiously devoted to one format or the other.
One of the commenters on Alfke's post asks:
Is there a single aggregator out there that can handle RSS 2.0 but not 1.0?
It is not solely about feed aggregators. There are other platforms grabbing and parsing XML feeds. They should be doing both (well, they should be doing Atom) but it is just not that straightforward.
What follows was the original post (with some modifications):
Since John Gruber didn't call him out on this one, I've gotta throw my voice into the conversation.
Knowledgeable weblog authors have long complained about the plethora of feed options displayed on the average weblog. RSS 0.91, RSS 2.0, Atom 0.3 and Atom 1.0? Who cares? Given that 88% of internet users don't even know what feeds are, why complicate matters?
The problem with Dean's complaint is that in his example Dan is only actively advertising one feed. Visit Dan's website and if you read the content in his sidebar you'll see he only promotes one link to an XML feed with the text "Subscribe to this blog's feed."
Dean's web browser, Safari, is parsing the HTML page using the only "official" method authors have to alert HTML user agents to alternate content: the <link> element combined with a rel attribute and a type attribute whose MIME type signals whether it is one of the known XML feed types. I put "official" in quotes because, to my knowledge, this practice has never been standardized. Mark Pilgrim pimped it and user agents implemented it after a couple of major blogging packages started using it.
HTML authors can place references to alternate representations of their content by using the attribute+value pair of rel="alternate" (and in this case rel="feed" is a possibility, but it is not as widely supported as alternate.) There's no other way to "semantically find feeds". Options such as sniffing at common URLs and attempting to guess if you are using a common weblog system don't scale.
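A user agent or crawler doing this kind of autodiscovery might filter link elements roughly as follows. This is a minimal sketch, assuming the links have already been parsed into plain objects; the LinkInfo type and the discoverFeeds name are hypothetical, and only the two common feed MIME types are recognised.

```typescript
// A <link> element reduced to the attributes autodiscovery looks at.
interface LinkInfo {
  rel: string;
  type?: string;
  href: string;
}

const FEED_TYPES = ["application/rss+xml", "application/atom+xml"];

// Return the links that advertise feeds, in document order.
function discoverFeeds(links: LinkInfo[]): LinkInfo[] {
  return links.filter((link) => {
    // rel is a space-separated list of tokens, compared case-insensitively.
    const rels = link.rel.toLowerCase().split(/\s+/);
    if (rels.indexOf("feed") !== -1) {
      return true; // rel="feed" marks a feed regardless of its type
    }
    // rel="alternate" only counts when the type is a known feed format.
    const type = (link.type || "").toLowerCase();
    return rels.indexOf("alternate") !== -1 && FEED_TYPES.indexOf(type) !== -1;
  });
}
```

Note that this returns every advertised feed, whether the links are different formats of the same content or different content streams; telling those two cases apart is exactly what a browser like Safari cannot do from the markup alone.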
Safari could choose to advertise only one of the possible feed formats, right? Just pick the first element in the order it appears in the DOM or Apple could simply promote one format only. If only it were that easy. <link rel="alternate feed"> is not just about promoting multiple XML feed formats, it can also be used for alerting user agents to multiple content feeds: recent entries, recent comments, recent links, and so on. There's no easy way out for Safari and it probably does the most logical thing it could. (For the record Firefox does the same thing ... I'm not sure about Opera or IE. It doesn't really matter because I'm assuming you'd like to let bots know you have different feed formats. Google might prefer Atom and Microsoft might prefer RSS... it is a crazy world out there. Sadly, not everyone parses feeds using the Universal Feed Parser.)
Update: Well, there is a very light specification for RSS Autodiscovery. I wouldn't call this an official standard though... it is more of a "gentleman's agreement." This specification recommends Dean's approach:
Publishers who offer the same feed content in several syndication formats SHOULD NOT use autodiscovery links for all of them. Choosing only one feed format for autodiscovery makes it easier on new subscribers, especially if they are unfamiliar with syndication and can't distinguish between the Atom, RSS 1.0 and RSS 2.0 formats.
I still think there are use cases for providing links to alternate feed formats. Especially in the world of feeds there are a lot of bozos.
For background, here's a note from Lachlan Hunt on how HTML 5 is handling Feed Autodiscovery.
Shawn Medero — Final Word on the Internet Explorer Image Toolbar Hack
Following up on my conversion to HTML 5, it looks like <meta http-equiv="imagetoolbar"> will not be considered valid in HTML 5. I missed an email, which is easy to do, back in February 2008 from Ian Hickson about the IE Image Toolbar hack. Henri Sivonen raised this issue, probably because he gets grief from authors who use his validator:
In short, some authors want to use <meta http-equiv="imagetoolbar" content="no"> but (X)HTML5 doesn't allow it. Personally, I think that authors who want to disable User Agent features like that are misguided.
Ian Hickson responded:
Proprietary extensions to HTML are just that, proprietary extensions, and are therefore by intentionally not conforming.
And you know what? I agree and I've removed it. If Internet Explorer wants to implement a usability flaw then who am I to disable it. If IE users don't like the image toolbar they should complain to MSFT or switch to Firefox or Opera.
Sam Ruby — Ingrates
Martin Atkins: it is impossible to use Yadis in this way while having a conforming HTML 5 document. The current ethos for HTML 5 seems to be to remove any mechanism by which it can be extended in any way without going through the HTML working group and changing the core spec.
Just because YADIS didn’t have the foresight to use the officially sanctioned way to embed custom non-visible data is no reason to complain.
WHATWG blog — New Image Report Feature in Validator.nu
There have been lots and lots of e-mail on the public-html mailing list about making the alt attribute syntactically required in HTML5. At the core of this debate is on one hand using HTML5 validators to send a strong message about accessibility and on the other hand of avoiding a situation where a simplified and idealistic strong message leads to behavior that is counterproductive considering the goal of making the Web accessible. As a policy debate, it is similar to abstinence-only sex education debates.
A validator is a computer program and cannot tell if a textual alternative is appropriate for a given image in a given context. That's why accessibility checking needs to be done by a person. A person may use a software tool to make the checking easier, but relying on fully automated software to determine whether a page is accessible is misguided.
Given this basic problem, a policy that insists on the alt attribute always being present doesn’t necessarily lead to accessibility. In fact, considering that syntactic correctness and accessibility are different evaluation axes both in terms of computability and in terms of how HTML authors (other than accessibility advocates) tend to view things (judging from observations about the behavior of HTML authors who use validators), a policy that insists on the alt attribute being always present will likely cause people to put the attribute in there but with inappropriate content. In particular, putting an empty alt on images whose presence is important for understanding the context of other content is bad, because in that case the presence of those images is concealed from a non-graphical user. Also, a textual alternative that just says “image” is not an improvement over what, for example, Safari with VoiceOver says in the absence of alt, but would be worse than a smarter client-side heuristic.
Furthermore, there is a very real case where a textual alternative simply isn’t available to the HTML generator: a user uploads photos to a content management system and refuses to supply textual alternatives at the same moment. HTML 4 didn’t account for this case. In fact, requiring alt under all circumstances assumes that markup is written by a person who knows what the images are at the time of writing markup. It doesn’t make sense to pretend that the case where the markup generator doesn’t have textual alternatives available doesn’t exist. The HTML 5 syntax needs to account for all use cases.
Expecting markup generators to knowingly emit markup that is not valid is not a winning proposition. Quoting me from 2006:
Authoring tools are judged by taking a page authored using the tool and running it through the W3C Validator or, presumably in the future, through an HTML5 conformance checker. Authoring tool makers who are capable of making their tool produce syntactically conforming documents will want to do so and minimize the chance that the users of their software tarnish the reputation of the tool in the eyes of people who use an automated test as a litmus test of authoring tool bogosity. (People who test tools that way will outnumber the people who make a more profound analysis due to the "validate, validate, validate" propaganda.)
To summarize: As a matter of principle, subjective checking or checking that is not applicable to all pages does not belong in the validation function. Practice is more important than principle, though. Baking the alt requirement into the validation function would be bad when the user of the validation function wants a clean report on syntax but isn’t as concerned with accessibility. It is bad for accessibility when authors put the simplest value that silences the validator into the attribute in order to make the validation report look clean, since doing so gives user agents like Safari with VoiceOver less information to work with. That's why I think that, as a practical matter too, the requirement to have an alt attribute present doesn’t belong in the validation function.
It turns out, though, that some people think of validation as a first step toward accessibility, even though syntactic correctness and accessibility really are different evaluation axes. They expect a validator to help them flag images that are lacking a textual alternative. Moreover, the alt issue seems to be taken as the single most important web accessibility issue with the rest of issues somewhere in the long tail. When there is a demand for validators to flag images without alt, validators probably should meet the demand.
To this end, I have developed a new feature for Validator.nu: Image Report. This new feature is not part of the validation function. It also doesn’t do exactly what people are asking of the syntax definition in the long e-mail thread. (It is not a new idea for a validator user interface to offer tools that help a human perform an assessment about the page outside the validation function. For example, the W3C Validator has offered a “Show Document Outline” feature, which is also on file as a request for enhancement for Validator.nu.)
The new feature tries to address the issue of finding missing textual alternatives but it also seeks to address the issue of faulty textual alternatives. Furthermore, it seeks to address these in a way that doesn’t induce people to write bad textual alternatives in order to make the report look cleaner.
When you turn the feature on, it always lists all the images. There is no textual alternative you can fake to make the list look shorter. Instead, there are four categories and you can only change the category in which an image appears.
This has the benefit of removing the badge hunting problem: people trying to silence the validator without actually raising the quality of their page. It also has the benefit that the user can review the textual alternatives for appropriateness and can check that the right images have been marked as omitted from non-graphical presentation. Since this tool addresses more problems than simply making alt required on the syntax level, I believe this solution is much better than staying furiously entrenched in the status quo of HTML 4 validation, fearing a step backwards so much as to be too afraid to explore steps forward.
Finally, it should be noted that this feature is, by necessity, itself inaccessible to people who cannot view bitmap images. Yet, I think it is legitimate for this feature to be implemented with an HTML user interface. Also, this feature itself is a case where the generator of the user interface markup has no knowledge of the content of the images it is presenting to the user. Hence, it is itself an example of omitting the alt attribute. It would be truly ironic if the syntax definition of HTML5 prevented Validator.nu from being self-validating.
Anne van Kesteren — HTML5: SVG in text/html Dropped For Now
At the request of the SVG Working Group, SVG support in the HTML parser has been commented out. The SVG WG says it will make rapid progress on something they think is more suitable. We’ll see how that goes; if it doesn’t happen, I guess the old proposal will go back in. Mathematics has not been tampered with, by the way.
Sam Ruby — HTML5 -= SVG
Planet WebKit — Introducing CSS Gradients
WebKit now supports gradients specified in CSS. There are two types of gradients: linear gradients and radial gradients.
The syntax is as follows:
-webkit-gradient(<type>, <point> [, <radius>]?, <point> [, <radius>]? [, <stop>]*)
The type of a gradient is either linear or radial.
A point is a pair of space-separated values. The syntax supports numbers, percentages or the keywords top, bottom, left and right for point values.
A radius is a number and may only be specified when the gradient type is radial.
A stop is a function, color-stop, that takes two arguments, the stop value (either a percentage or a number between 0 and 1.0), and a color (any valid CSS color). In addition the shorthand functions from and to are supported. These functions only require a color argument and are equivalent to color-stop(0, …) and color-stop(1.0, …) respectively.
Paraphrasing the HTML5 spec and adjusting the language slightly to not be canvas-specific:
“The color of the gradient at each stop is the color specified for that stop. Between each such stop, the colors and the alpha component must be linearly interpolated over the RGBA space without premultiplying the alpha value to find the color to use at that offset. Before the first stop, the color must be the color of the first stop. After the last stop, the color must be the color of the last stop. When there are no stops, the gradient is transparent black.
…
If multiple stops are added at the same offset on a gradient, they must be placed in the order added, with the first one closest to the start of the gradient, and each subsequent one infinitesimally further along towards the end point (in effect causing all but the first and last stop added at each point to be ignored).”
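For readers who want to experiment, here is a minimal Python sketch of the stop rules quoted above. It assumes colors are (r, g, b, a) tuples of floats in 0..1 and that stops arrive as (offset, color) pairs in the order they were added; `gradient_color` is a hypothetical helper, not a WebKit or CSS API. The `from` and `to` shorthands would simply be stops at offsets 0 and 1.0.

```python
from bisect import bisect_right

Color = tuple[float, float, float, float]  # (r, g, b, a), alpha not premultiplied
TRANSPARENT_BLACK: Color = (0.0, 0.0, 0.0, 0.0)


def gradient_color(stops: list[tuple[float, Color]], offset: float) -> Color:
    """Color at `offset` for stops given as (offset, color) in the order added.

    Hypothetical helper illustrating the quoted rules; not a WebKit API.
    """
    if not stops:
        return TRANSPARENT_BLACK  # no stops: the gradient is transparent black

    # sorted() is stable, so stops sharing an offset keep the order they were
    # added in; only the first and last stop of such a run can ever be reached.
    ordered = sorted(stops, key=lambda stop: stop[0])
    offsets = [stop_offset for stop_offset, _ in ordered]

    if offset <= offsets[0]:
        return ordered[0][1]  # before the first stop: the first stop's color
    if offset >= offsets[-1]:
        return ordered[-1][1]  # after the last stop: the last stop's color

    # First stop strictly after `offset`, and the stop just before it.
    i = bisect_right(offsets, offset)
    (left_offset, left), (right_offset, right) = ordered[i - 1], ordered[i]
    t = (offset - left_offset) / (right_offset - left_offset)

    # Interpolate each RGBA channel linearly, without premultiplying alpha.
    return tuple(start + (end - start) * t for start, end in zip(left, right))
```

For example, with a red stop at 0 and a blue stop at 1.0, the color at offset 0.5 comes out as `(0.5, 0.0, 0.5, 1.0)`.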
“The points of a linear gradient specify a line. Linear gradients must be rendered such that at and before the starting point the color at offset 0 is used, that at and after the ending point the color at offset 1 is used, and that all points on a line perpendicular to the line that crosses the start and end points have the color at the point where those two lines cross (with the colors coming from the interpolation described above).
If x0 = x1 and y0 = y1, then the linear gradient must paint nothing.
For a radial gradient, the first two arguments represent a start circle with origin (x0, y0) and radius r0, and the next two arguments represent an end circle with origin (x1, y1) and radius r1.
Radial gradients must be rendered by following these steps:
If x0 = x1 and y0 = y1 and r0 = r1, then the radial gradient must paint nothing. Abort these steps.
Let x(ω) = (x1-x0)ω + x0
Let y(ω) = (y1-y0)ω + y0
Let r(ω) = (r1-r0)ω + r0
Let the color at ω be the color of the gradient at offset 0.0 for all values of ω less than 0.0, the color at offset 1.0 for all values of ω greater than 1.0, and the color at the given offset for values of ω in the range 0.0 ≤ ω ≤ 1.0
For all values of ω where r(ω) > 0, starting with the value of ω nearest to positive infinity and ending with the value of ω nearest to negative infinity, draw the circumference of the circle with radius r(ω) at position (x(ω), y(ω)), with the color at ω, but only painting on the parts of the surface that have not yet been painted on by earlier circles in this step for this rendering of the gradient.
This effectively creates a cone, touched by the two circles defined in the creation of the gradient, with the part of the cone before the start circle (0.0) using the color of the first offset, the part of the cone after the end circle (1.0) using the color of the last offset, and areas outside the cone untouched by the gradient (transparent black).”
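The geometry in these rules boils down to finding an offset for each pixel: a projection onto the gradient line for linear gradients, and the largest ω whose circle passes through the pixel for radial ones. The Python sketch below assumes points are (x, y) tuples; `linear_offset` and `radial_omega` are hypothetical names, and a `None` result means the pixel is left unpainted (transparent black). Rather than literally drawing circles, the radial case solves |p - c(ω)|² = r(ω)², which is a quadratic in ω.

```python
import math
from typing import Optional

Point = tuple[float, float]


def linear_offset(p0: Point, p1: Point, p: Point) -> Optional[float]:
    """Gradient offset for pixel p on the line from p0 to p1, clamped to [0, 1]."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return None  # x0 = x1 and y0 = y1: paint nothing
    # Project p onto the gradient line; every point on a perpendicular shares t.
    t = ((p[0] - p0[0]) * dx + (p[1] - p0[1]) * dy) / length_sq
    return min(1.0, max(0.0, t))


def radial_omega(c0: Point, r0: float, c1: Point, r1: float,
                 p: Point) -> Optional[float]:
    """Largest omega whose circle passes through p with r(omega) > 0.

    Callers map omega < 0 to the color at offset 0.0 and omega > 1 to 1.0.
    """
    if c0 == c1 and r0 == r1:
        return None  # degenerate gradient: paint nothing
    dcx, dcy, dr = c1[0] - c0[0], c1[1] - c0[1], r1 - r0
    qx, qy = p[0] - c0[0], p[1] - c0[1]
    # |q - w*dc|^2 = (r0 + w*dr)^2 rearranged to a*w^2 + b*w + c = 0
    a = dcx * dcx + dcy * dcy - dr * dr
    b = -2.0 * (qx * dcx + qy * dcy + r0 * dr)
    c = qx * qx + qy * qy - r0 * r0
    if a == 0:
        roots = [] if b == 0 else [-c / b]
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return None  # no circle passes through p: outside the cone
        s = math.sqrt(disc)
        roots = [(-b + s) / (2.0 * a), (-b - s) / (2.0 * a)]
    # Circles are drawn from the largest omega downwards and never overpaint,
    # so the largest root with a positive radius determines the color.
    candidates = [w for w in roots if r0 + w * dr > 0]
    return max(candidates) if candidates else None
```

For concentric circles with r0 = 0 and r1 = 10, a pixel 5 units from the center gets ω = 0.5, and one 20 units away gets ω = 2.0, which then takes the color at offset 1.0.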
So what exactly is a gradient in CSS? It is an image, usable anywhere that image URLs were used before. That’s right… anywhere.
You can use gradients in the following places:
background-image
border-image
list-style-image
content property
Gradients as Backgrounds
When specifying a gradient as a background, the gradient becomes a background tile. If no size is specified, then the gradient will size to the box specified by the background-origin CSS3 property. This value defaults to padding, so the gradient will be as large as the padding-box. This is equivalent to a specified background-size of 100% in both directions.
If you want to achieve effects like tiling of a vertical gradient using a narrow strip, you should specify background-size to give the gradient tile an explicit size.
Gradients used as backgrounds will respect full page zoom, acquiring sharper resolution as the page is zoomed in.
Border Image Gradients
Gradients can be used as border images. The most sensible use for them is specifying only horizontal or only vertical borders (and splitting the gradients up between the top and bottom or left and right borders).
The size of the gradient image is always the size of the border box.
List Bullet Gradients
Gradients can be specified as list bullets. One problem with list bullet gradients is that there is currently no way in WebKit to specify the size of the marker box. Therefore the size of the image cannot be specified. WebKit has therefore chosen a default size based on the current font size of the list item.
Generated Content Gradients
Gradients can be used inside the content property. The image will fill the available width and height of its containing block. Therefore when using gradients inside ::before and ::after content with a specified size, it is important to set the display type to block or inline-block.
Gradients can also be used to do image replacement, so they can be used with the img element in HTML or to replace the contents of other elements like spans and divs.
Final Notes
WebKit now supports a generic architecture for generated images, making it easy to add new generator effects to CSS in the future (lenticular halos, checkerboards, starbursts, etc.). The rules for sizing of these generated images will match whatever is decided for SVG content with no intrinsic size (the two are sharing the same rules right now).
We encourage you to try gradients out and file bugs if you see any unexpected or weird behavior. They will degrade safely in other browsers as long as you use multiple declarations (e.g., specify the image in one declaration and the gradient in a following declaration).
W3C Team blog — A validator is not an accessibility evaluation tool?
Currently, the most active discussion thread on the HTML working group's public mailing list, public-html, is one regarding whether in HTML5 the alt attribute should always be required on images. Henri Sivonen is among the most active participants in that discussion, and among his messages to that thread are the following:
http://lists.w3.org/Archives/Public/public-html/2008Apr/0322.html
http://lists.w3.org/Archives/Public/public-html/2008Apr/0333.html
The question of whether or not alt should always be required is an issue that affects the behavior of validators, so it shouldn't be much of a surprise to see Henri taking interest in the discussion around it, because he maintains a validator (more precisely, a conformance-checking tool) called Validator.nu that he's spent a lot of time developing and that he clearly wants to be as beneficial and serviceable as possible to the people who take the time to use it.
Among the assertions that Henri makes in his postings to that thread is the following:
An HTML5 validator isn't an accessibility evaluation tool--or at least I think it shouldn't be.
He goes on to compare the purpose of a validator to that of a spell checker, and in a later message, adds this:
A validator cannot check that a page is semantically correct. It can't properly check for accessibility, either.
We should dispel misconceptions about what validators do instead of catering to the misconceptions.
And to clarify what he intends Validator.nu to be useful (and not useful) for, he adds this:
The validator I develop is not a stamping tool. It is a tool that helps authors detect mistakes that they didn't intend to make, so that they don't need to spend time wondering about the effects about their unintentional doings. For example, the validator I develop helps author detect that the alt attribute was typoed as 'atl', which is useful, because atl wouldn't work... I'm not interested in developing a formal stamp. I am interested in developing a development tool.
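The kind of mistake Henri describes, an `atl` typed where `alt` was meant, can be illustrated with a toy check. This Python sketch is not how Validator.nu works (it validates against a schema for the whole language); it assumes a hypothetical, deliberately incomplete list of attributes allowed on img and uses the standard library's `difflib.get_close_matches` to suggest a near match.

```python
from difflib import get_close_matches

# Hypothetical and deliberately incomplete: a real checker derives the
# allowed attributes from the language definition, not a hand-written set.
IMG_ATTRIBUTES = frozenset({
    "src", "alt", "width", "height", "usemap", "ismap",
    "id", "class", "style", "title", "lang", "dir",
})


def check_img_attributes(attrs: dict[str, str]) -> list[str]:
    """Report attributes not allowed on img, with a 'did you mean' hint."""
    messages = []
    for name in attrs:
        lowered = name.lower()  # HTML attribute names are case-insensitive
        if lowered in IMG_ATTRIBUTES:
            continue
        suggestion = get_close_matches(lowered, sorted(IMG_ATTRIBUTES), n=1)
        hint = f" (did you mean '{suggestion[0]}'?)" if suggestion else ""
        messages.append(f"Attribute '{name}' not allowed on img{hint}")
    return messages
```

A report like this flags the typo so the author can fix it, but it says nothing about whether the alt text is any good, which is exactly the limitation Henri points out.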
The assertion that earning a "this page is valid" stamp or badge should not be an end goal (or any kind of goal at all) for users of a validator or conformance checker is something that Henri has stated consistently since the earliest public versions of Validator.nu were available (and that others have been stating for quite a long time also) -- as is the assertion that a validator should be a development tool, not a tool for advocacy. Henri states that most succinctly in a section of the Validator.nu FAQ:
Validation is a tool for you as a page author -- not something your readers need to verify.
To make a somewhat ham-handed analogy of my own: Consider the case of when you create a document with a word processor like Microsoft Word or whatever and you run it through that application's built-in spell-checking and grammar-checking tools to find and fix any spelling or grammar problems. You're using those tools as an author to ensure that the document doesn't contain any unintentional errors before you share it with others. And after you use them, you would never consider embedding a badge in the page to indicate that it's free from spelling and grammar errors: although the fact that it is free from such errors is of real value to you as an author, there's no value in highlighting it to all your readers, and it's not something you want or need your readers to verify.
Anyway, that (bad) analogy aside, I think Henri and most other reasonable people would agree that there is great value in encouraging authors to produce valid content, and beyond that, to encourage authors to be familiar with best-practice accessibility and usability guidelines and to try to follow them to the best of their ability. The main difference of opinion here is around what role (if any) validators should be expected to have in encouraging authors to do those things.
Sam Ruby — SVG and MathML Annexes to HTML5
As Anne previewed, HTML5 has recently added support for data attributes and MathML and SVG vocabularies.
The former feels OPMLish to me. I predict it won’t be long before we see escaped HTML inside data attributes “in order to validate”. And the RDF/A proponents will (correctly) point out that this isn’t enough information to reliably identify subjects, predicates, and objects.
The latter is of more interest to me. It isn’t full distributed extensibility, nor is it entirely consistent with the direction that IE8 says they are going, but it is substantial progress.
By my read of the document, a typical standalone SVG document would be rendered correctly if served as text/html to a browser that supports this specification. Of which, there are none, but more on that in a minute. An HTML5 parser would insert implicit <html> and <body> elements, would ignore the XML prolog (but likely would sniff the page correctly). <metadata> would be placed in the wrong namespace, and likely would not validate, but would not cause any rendering issues. There may be some edge cases with CSS and attribute-value normalization rules and characters like form feeds, but these are likely to be minor. I’m not clear at all yet on how <scripts> would be handled and if there would be any differences. I suspect there will be. Update: Anne assures me that this is not a concern.
As near as I can tell, my weblog would be considered valid HTML5 if served as text/html, with one exception: xml:lang. But again, none of the current browsers would render the SVG content. On one hand that’s entirely understandable as it has only been a few dozen hours since the spec updates were made. But it is troublesome in that these requirements seem entirely author driven and no browser vendors have been very active in the discussion (to be fair, it is clear that Anne has been monitoring the discussion).
I do think an implementation or two at this point would be very helpful. There are enough documents to throw at such a prototype that we could quickly shake out any spec or implementation bugs.
IEBlog — HTML and DOM Standards Compliance in IE8 Beta 1
With the release of IE8 Beta 1, I'm pleased to be able to talk about the first round of improved standards compliance and bug fixes in IE's HTML and DOM support for the new IE8 standards mode. Doug hinted at some of these improvements, and I wrote a little bit about them in the IE8 Beta 1 whitepapers here and here. In this post, I'd like to enumerate the 'change list' (of sorts) here on the blog in response to requests for such a list that I received at MIX08. Personally, I've long been awaiting this release because of what I know it means to web developers (like myself) that have had to code around a lot of IE's DOM quirks for many years.
For IE8, I have really focused on the HTML and DOM Core standards and concentrated on building a solid cross-browser compatible foundation for many of the APIs that are already supported by Trident. This effort to fix some of the cracks in IE's foundation has been a long time in coming, and I believe it's a critical and necessary first step before adding on additional standards support.
For IE8 Beta1, we looked at many community-provided bug reports and found that the top pain-points were related to IE's attribute handling (with a few prominent exceptions like getElementById). Therefore, attribute-handling has served as the 'theme' for the set of issues to tackle in IE8. We probably won't be able to fix all of the community-reported bugs in the DOM in this release (there are many), but we want to make sure that we get to the worst offenders first. Help us out by submitting or voting on the bugs that you feel are most impactful to your business.
HTML/DOM Standards Compliance in IE8 Beta 1
Note: I use HTML5 nomenclature for DOM attribute/content attribute.
Big-impact improvements in Beta 1
Within the scope of attribute-related fixes, the following address some of the well-known, oft-cited, compliance issues in IE's HTML and DOM support.
- <BUTTON> type attribute defaults to 'submit' rather than 'button' in IE8 standards mode.
- setAttribute now uses the content attribute name (rather than the DOM attribute name) for applying an attribute value (also camelCase no longer required).
- This fixes the commonly reported issues regarding the 'style', 'class', and 'for' attributes not working.
- getElementById finds only elements with matching id (not name) and performs case-sensitive matching.
- <BUTTON> value attribute text is now submitted on form submission in IE8 standards mode. IE7 standards mode continues to submit the innerText.
- <OBJECT> now supports native image loading (see the whitepaper for more details).
- <OBJECT> now supports fallback for two additional scenarios: HTML embedding and native image loading (where the HTML/image resource cannot be loaded, i.e., 4xx-5xx HTTP response codes). ActiveX controls still do not support fallback (see the whitepaper for more details).
- URL-type DOM attributes separated from content attributes. For example: <A>.href (DOM attribute) != <A>.getAttribute('href') (content attribute). You will find that all URL-type DOM attributes return an absolute URL, while the content attribute returns the string that was provided in the source. These changes apply to the Attr.value and getAttributeNode as well. Specifically:
- The following elements' DOM attributes now return absolute URLs: applet [codebase], base [href], body [background], del [cite], form [action], frame [src, longdesc], head [profile], iframe [src, longdesc], img [longdesc], ins [cite], link [href], object [codebase, data], q [cite], script [src].
- The following elements' content attributes now return relative URLs: a [href], area [href], img [src], input [src].
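The split between a URL-type DOM attribute and its content attribute can be modelled in a few lines. This Python sketch is only an analogy: `AnchorModel` and the example URLs are hypothetical, and it resolves URLs with the standard library's `urllib.parse.urljoin` rather than anything from IE. The `href` property stands in for `<A>.href`, and `get_attribute('href')` for `<A>.getAttribute('href')`.

```python
from typing import Optional
from urllib.parse import urljoin


class AnchorModel:
    """Hypothetical stand-in for an <a> element, for illustration only."""

    def __init__(self, document_url: str, href: str) -> None:
        self.document_url = document_url
        self._content_attributes = {"href": href}

    def get_attribute(self, name: str) -> Optional[str]:
        # Content attribute: the string exactly as it was written in the source.
        return self._content_attributes.get(name.lower())

    @property
    def href(self) -> str:
        # URL-type DOM attribute: resolved against the document to an absolute URL.
        return urljoin(self.document_url, self._content_attributes["href"])
```

With a document at `http://example.com/docs/page.html` and `href="../img/cat.png"`, the content attribute stays `../img/cat.png` while the DOM attribute becomes `http://example.com/img/cat.png`.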
Consistency and reliability with Standards and other browsers (attribute-related) in Beta 1
Many reported (and some not-reported) issues with IE's attribute handling involve the NamedNodeMap interface object (object.attributes), correct DOM attribute reflection of content attributes, and case-sensitivity. In principle, the standards indicate that HTML documents are case-insensitive, while DOM Core-related APIs are case-sensing--they depend on the underlying document rules to determine their sensitivity. To resolve ambiguities, I appealed to the most common behavior of other browsers.
- <element>.attributes.getNamedItem no longer creates Attr objects that don't exist in the collection (returns null when an attribute is not found).
- Radio button fixes:
- Dynamically setting the 'name' attribute on a radio button now correctly adds that radio button to the same-named group (old known issue fixed in Quirks, IE7, and IE8 standards modes).
- Radio buttons without a name attribute can now be selected by the user in IE8 standards mode (I found it interesting that the code revealed this to be an old Netscape compatibility issue).
- <FORM> enctype DOM attribute now supported. Reflects the enctype content attribute.
- Checkbox fixes:
- Inserting checkboxes into the tree (and moving them around the document) no longer resets the 'checked' state with the 'defaultChecked' state.
- The 'defaultChecked' DOM attribute now reflects the 'checked' content attribute. The 'checked' DOM attribute affects both the intrinsic behavior on screen and the form's submitted value.
- Parsing operations on the 'checked' content attribute always affect both the 'checked' and 'defaultChecked' DOM attributes. For example, removeAttribute('checked') sets 'checked' and 'defaultChecked' to false, and setAttribute('checked', 'checked') sets both DOM attributes to true (as if the element were being re-parsed).
- getAttributeNode now correctly populates the .value property of the returned Attr object for all attributes (whether .specified=true or not).
- removeAttribute now uses case-insensitive comparisons.
- <P> element now closes when <TABLE> is encountered (ACID 2 compliance).
- <LINK> rel content attribute now finds 'alternate' token in any location in the string (ACID 2 compliance).
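The checkbox rules in the list above amount to a small state machine, sketched here in Python. `CheckboxModel` and its method names are hypothetical (`checked` and `default_checked` stand in for the DOM's `checked` and `defaultChecked`); it models only the behavior described above, not IE's implementation.

```python
class CheckboxModel:
    """Hypothetical model of the checked/defaultChecked rules listed above."""

    def __init__(self, has_checked_attribute: bool = False) -> None:
        self._checked_attribute = has_checked_attribute  # the content attribute
        # Live state: drives both the on-screen box and the submitted value.
        self.checked = has_checked_attribute

    @property
    def default_checked(self) -> bool:
        # defaultChecked reflects the 'checked' content attribute.
        return self._checked_attribute

    def set_checked_attribute(self) -> None:
        # setAttribute('checked', 'checked'): both become true, as if re-parsed.
        self._checked_attribute = True
        self.checked = True

    def remove_checked_attribute(self) -> None:
        # removeAttribute('checked'): both become false.
        self._checked_attribute = False
        self.checked = False

    def move_in_tree(self) -> None:
        # Inserting or moving the checkbox no longer resets checked to defaultChecked.
        pass
```

The point of the model is that user interaction changes only `checked`, while parsing-style operations on the content attribute change both values together.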
Additional compliance and feature completion in Beta 1
- <BASE> href no longer applies a 'new' document base if the supplied URL is a relative URL (relative URL being defined as not having a scheme ['http:'] and a hostname ['/' or 'domain']).
- Title attribute now preferred (over alt) when specified as the popup tooltip for images and maps (img, input, object, and area elements).
- When retrieving Boolean attributes by name, the value is now correctly reported as the canonical attribute name (e.g., checked='checked').
- Implemented hasAttribute (case insensitive matching) which is the suggested workaround while the NamedNodeMap is under construction.
- Completed the Attr interface (of DOM L2 Core) by implementing ownerElement.
- Completed the interfaces for object, iframe, and frame (DOM L2 HTML), by implementing contentDocument. Note: like contentWindow, this property will not allow cross-domain access to the inner content.
- HTMLCollection fixes:
- 'item' API is no longer overloaded to accept strings and act like 'namedItem'. 'item' now only accepts numerical indexes (or tries to convert a string to a numerical index as is JavaScript behavior).
- 'namedItem' no longer returns collections if more than one named item is found. Instead, the first matching (case-insensitive) element is returned.
- As IE8 does not implement all collections using the HTMLCollection interface, the following exceptions currently exist: elements [HTMLFormElement], rows/tbodies [HTMLTableElement], rows [HTMLTableSectionElement], and cells [HTMLTableRowElement].
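The two HTMLCollection changes can be contrasted with a toy model. In this Python sketch, elements are reduced to hypothetical attribute dictionaries and `named_item` compares against id or name. The list above says only that the first case-insensitive match is returned, so which attributes are compared is an assumption here. JavaScript's string-to-number conversion in `item` is not modelled.

```python
from typing import Optional

Element = dict[str, str]  # hypothetical: an element reduced to its attributes


class CollectionModel:
    """Hypothetical model of the item/namedItem behavior described above."""

    def __init__(self, elements: list[Element]) -> None:
        self._elements = list(elements)

    def item(self, index: int) -> Optional[Element]:
        # Numeric index only: no fallback to name lookup.
        if 0 <= index < len(self._elements):
            return self._elements[index]
        return None

    def named_item(self, name: str) -> Optional[Element]:
        # A single element, never a sub-collection: the first match in order.
        wanted = name.lower()
        for element in self._elements:
            if (element.get("id", "").lower() == wanted
                    or element.get("name", "").lower() == wanted):
                return element
        return None
```

Even when several elements match, `named_item` hands back only the first one instead of a nested collection.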
Known Issues
A significant bug in our JavaScript invoke code path in IE8 Beta 1 causes some JavaScript calls to inadvertently revert to IE7 compatibility mode and therefore make it appear as if some of the aforementioned bugs are not actually fixed. :( This has personally affected some of my tests that pass DOM objects (like HTMLCollections) through a function parameter for testing--I mention this only by way of example. While you will see this bug fixed in Beta2, it may indirectly impact your own testing--I recommend checking for the existence of document.querySelector to see if your script execution has reverted to IE7 compatibility mode before concluding that IE8 Beta1 has not fixed a particular bug (the Selectors API is only visible to IE8 standards mode).
Known issues we are planning to address in Beta 2
At a minimum, all previously available functionality in the DOM will be restored in Beta 2.
- setAttribute still does not work with event handlers.
- <element>.attributes.length fails. The IE8 NamedNodeMap object is in the middle of an overhaul.
- Many TABLE-related API are 'not implemented' as of Beta1. As critical pieces of the IE8 layout engine come online, these APIs are being re-enabled:
- rows/tbodies [HTMLTableElement], rows [HTMLTableSectionElement], cells [HTMLTableRowElement].
- <OBJECT> elements don't fall back on cross-domain security failures.
Known issues we are not planning to change in IE8
- <OBJECT> is not parsed in a cross-browser compatible way (parsing stops at the OBJECT, whereas other browsers continue parsing all the fallback content and make it available). No support for this parsing behavior is planned for IE8; I'll take this opportunity to ask for real-world scenarios that can help me prioritize this feature.
- <OBJECT> elements cannot be 'reactivated' by dynamically correcting the attributes that caused the original fallback. Again, your feedback on the potential benefits/use-cases for this feature is appreciated.
Acknowledgements
I'd like to acknowledge the amazing work done by all the IE developers and testers that make it possible to push a button and get IE7 compatible behavior for each of these significant changes.
Also, special thanks to PPK for updating his compatibility tables to showcase some of the work that we've done.
And there's more to come.
Regards,
Travis Leithead
Program Manager
IE8 Object Model
Anne van Kesteren — HTML5: custom data
Sometimes HTML is not enough and you need a mechanism to include some custom data in the document. In 2005, Validating a Custom DTD was published on A List Apart and illustrated how you could add custom attributes to HTML and have them validate. The problem with the approach outlined in that article is that DTDs are a thing of the past and only the W3C validator cares about them. (Newer validators, such as Validator.nu, don’t have this issue.) Browsers ignore DTDs, and the only reason they still look at the doctype of a page is to determine the rendering mode. So if you write a custom DTD that adds a required attribute, to the browser it will look as if you simply added a required attribute to HTML. If a future version of HTML then introduces an attribute with the same name but different semantics, your page might behave slightly weirdly in future browsers.
There is a custom data proposal for HTML5 that allows authors to add custom data to their pages without interfering with future extensions to HTML. The idea is that all attributes starting with data- are reserved for Web authors and they can do whatever they like with them. (They are not intended for browser extensions, et cetera.) In addition there will be a DOM attribute dataset that will allow easier access to these attributes. For an attribute data-opacity you can access that using dataset.opacity instead of having to use getAttribute("data-opacity") and setAttribute("data-opacity", x).
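The proposal as described here only shows a single-word name. The following Python sketch also handles multi-word names, using the camelCase rule from the HTML specification as later published: a hyphen followed by a lowercase ASCII letter becomes that letter uppercased. Treat it as an illustration, not as part of the 2008 proposal. The function names are hypothetical.

```python
import re

DATA_PREFIX = "data-"


def dataset_key(attribute_name: str) -> str:
    """Map a data-* attribute name to its dataset property name."""
    if not attribute_name.startswith(DATA_PREFIX):
        raise ValueError(f"not a custom data attribute: {attribute_name!r}")
    rest = attribute_name[len(DATA_PREFIX):]
    # "-" followed by an ASCII lowercase letter: drop the "-", uppercase the letter.
    return re.sub(r"-([a-z])", lambda m: m.group(1).upper(), rest)


def data_attribute_name(key: str) -> str:
    """Map a dataset property name back to its data-* attribute name."""
    if re.search(r"-[a-z]", key):
        # The spec rejects such names when a value is set through dataset.
        raise ValueError(f"invalid dataset name: {key!r}")
    # Each ASCII uppercase letter becomes "-" followed by its lowercase form.
    return DATA_PREFIX + re.sub(r"[A-Z]", lambda m: "-" + m.group(0).lower(), key)
```

So `data-opacity` maps to `dataset.opacity`, and a hypothetical `data-foo-bar` would map to `dataset.fooBar`.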
Namespaces were considered, but integrating them in the existing HTML environment is harder and they would also make it harder to author.
Now in the specification: embedding custom non-visible data.
Anne van Kesteren — HTML5: the foreign lands (mathematics and graphics)
Over the last few weeks there has been substantive discussion on the HTML WG mailing list (public-html) and the Math WG public discussion mailing list (www-math) regarding embedding non-HTML languages in the text/html serialization of HTML, focusing mostly on MathML and SVG. (Sidenote: the number of e-mails exceeded the yearly average of the www-math list.)
I wrote about SVG and text/html before and what the complexities of introducing it would be. What wasn’t mentioned back then was having HTML or MathML inside your SVG, having HTML or SVG in your MathML, et cetera. What’s currently in the HTML5 draft addresses all these scenarios. A few details on the parsing side have yet to be worked out and the authoring side needs a whole lot of introductory text to make it more understandable, but the basic concept is clear.
The <math> and <svg> start tags act as namespace scopes for MathML elements and SVG elements respectively. They represent the doors to the foreign lands, where:
- XML prefixes cannot be used.
- /> is no longer a token of faith and pops the start tag from the stack of open elements immediately.
- xmlns “attributes” have no effect, but will be allowed so that tool output is compatible by default. They might even end up in the correct namespace, http://www.w3.org/2000/xmlns/.
- Element and attribute names are case-insensitive. Case-insensitive names in SVG will be dealt with using a lookup table of some sort.
- Lots of MathML entities will be added. (Some may still be dropped, such as the one for ":" and other entities representing “ASCII” characters…) (For what it’s worth, these entities will work outside the foreign lands too.)
- A magic list of HTML element names, which is not currently defined, will provide a quick escape route. For instance, at the cost of one parse error, the p start tag in <math><p> will take you right back home safe, turning that fragment into <math/><p>. (I expect p to be in that list.)
- The MathML elements mi, mo, mn, ms, and mtext and the SVG elements foreignObject, desc, and title provide bridges so that the aforementioned magic list of HTML element names can be nested safely inside SVG and MathML (in the HTML namespace, too). They also allow for SVG in MathML and vice versa.
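The escape route and the bridges are easier to see in a toy Python model of tree construction that only handles start tags. Everything in it is an assumption. The post says the magic list is not yet defined and only expects p to be in it. The SVG case table is a small hypothetical excerpt, not the eventual lookup table. All function names are made up for illustration.

```python
# Toy model of the "foreign lands" rules listed above. All tables are
# hypothetical excerpts; the real lists were still to be defined.
SVG_ELEMENT_CASE = {
    "foreignobject": "foreignObject",
    "clippath": "clipPath",
    "lineargradient": "linearGradient",
    "radialgradient": "radialGradient",
}
SVG_ATTRIBUTE_CASE = {
    "viewbox": "viewBox",
    "preserveaspectratio": "preserveAspectRatio",
}
FOREIGN_ROOTS = {"math", "svg"}
BREAKOUT_TAGS = {"p"}  # the post only says it expects p in this list
BRIDGES = {"mi", "mo", "mn", "ms", "mtext", "foreignObject", "desc", "title"}


def restore_svg_case(name: str, table: dict) -> str:
    """Names are matched case-insensitively; restore SVG's camelCase when known."""
    lowered = name.lower()
    return table.get(lowered, lowered)


def build_tree(start_tags):
    """Toy tree construction over start tags only (no end tags, text or attributes).

    Returns (depth, name) pairs so the resulting nesting is easy to inspect.
    """
    stack, tree = [], []
    for tag in start_tags:
        name = tag.lower()
        in_foreign = any(n in FOREIGN_ROOTS for n in stack)
        if name in BREAKOUT_TAGS and in_foreign and stack[-1] not in BRIDGES:
            # Parse error: pop open elements until no math/svg ancestor remains.
            while any(n in FOREIGN_ROOTS for n in stack):
                stack.pop()
        elif "svg" in stack and name not in BREAKOUT_TAGS:
            name = restore_svg_case(name, SVG_ELEMENT_CASE)
        tree.append((len(stack), name))
        stack.append(name)
    return tree
```

With this model, build_tree(["body", "math", "p"]) puts p at the same depth as math, mirroring the <math/><p> example. Under foreignObject, a p nests as a child instead of breaking out.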
Not everybody likes this approach. Tim Berners-Lee said: "The idea of using SVG without XML is horrifying."
(I guess that statement does not necessarily apply to using the DOM instead of any specific serialization, which is how libraries currently use SVG.)
However, solving the generic extensibility problem for HTML is hard, and it’s not clear what the best approach would be. Also, mathematics and graphics are very basic utilities that everybody in the world should have easy access to. And just as with HTML, there should be no need for complex tools to publish mathematics and graphics (though admittedly MathML and SVG are not as simple as HTML, as they solve somewhat more complex problems). Given that both MathML and SVG have started gaining traction in user agents, and that the vast majority of authoring software is geared toward text/html, special-casing these two vocabularies makes sense. When implemented, it enables a very wide audience to share information in a more accessible way than now (mathematics is often hidden in a bitmap image). Just imagine how much richer Wikipedia could become.
WHATWG blog — Validator.nu HTML Parser 1.0.7 Released
There is now a new release of the Validator.nu HTML Parser. Change highlights:
- Adds optional support for heuristic encoding sniffing using the ICU4J sniffer, jchardet or both.
- Adds support for rewinding and reparsing when becoming confident about the character encoding and the tentative encoding was wrong.
- Performs encoding name matching per spec instead of using the JDK mechanism.
- Implements spec changes up until just before SVG and MathML support. (Those will merit 1.1 or something.)
- Warning: The semantics of the doctype token have changed in case you have your own token handler (unlikely).
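The release notes don't spell out the matching rule. As an assumption, here is a simplified Python sketch of loose encoding-name matching in the spirit of the charset alias matching in Unicode Technical Standard #22. It ignores everything except ASCII letters and digits and compares case-insensitively. UTS #22's extra rule about leading zeros is left out. The alias table and function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical excerpt of an alias table (names and aliases from the IANA registry).
KNOWN_CHARSETS = {
    "UTF-8": ["UTF-8"],
    "ISO-8859-1": ["ISO-8859-1", "ISO_8859-1", "latin1", "l1"],
    "Shift_JIS": ["Shift_JIS", "MS_Kanji"],
}


def loose_key(label: str) -> str:
    """Keep only ASCII letters and digits, lowercased (UTS #22's zero rule omitted)."""
    return re.sub(r"[^A-Za-z0-9]", "", label).lower()


# Index every alias by its loose key so a lookup is a single dict access.
_INDEX = {
    loose_key(alias): canonical
    for canonical, aliases in KNOWN_CHARSETS.items()
    for alias in aliases
}


def match_charset(label: str) -> Optional[str]:
    """Return the canonical name for a label, or None if it is unknown."""
    return _INDEX.get(loose_key(label))
```

Under this kind of loose matching, sloppy but common labels such as "utf8" or "Latin-1" still resolve. A strict lookup against a platform's own alias list might reject them.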
Henry Sivonen — The Validator.nu HTML Parser
Shawn Medero — Now in glorious HTML 5
As someone working on the HTML 5 specification I thought it made sense to convert my weblog into valid HTML 5.
Nearly.
It is not quite valid in a few spots:
- I'm still using the <meta http-equiv="imagetoolbar" content="no"> IE hack to prevent the image toolbar feature from kicking in. I believe that's still useful and I don't clearly understand why the spec prevents me from doing it. Sean Fraser was confused too, posted about it and got Ian's attention (scan the comments on Sean's post). Update: Highly unlikely this will be part of HTML 5.
- My use of the footnotes feature in PHP Markdown Extra causes an issue because it utilizes a rev attribute on anchor elements. rev is currently missing from the HTML 5 working draft but I believe that was done because not enough research had been done to support including it at the time. I've started a dialog with the maintainer of the PHP implementation of Markdown (Michel Fortin) in hopes of proposing a solution to the W3C HTML Working Group.
- Anywhere I've just blindly embedded the YouTube cut & paste code has a problem. YouTube doesn't automatically insert the data and type attributes on their <object> elements and that's a validation error. Given all of the issues with <object> I think forcing data and type is a good thing but the requirement does seem to fly against the "pave the cowpaths" design decision of HTML 5. Then again embedded YouTube videos hardly have any lasting meaning since I suspect many of these embedded uses will disappear with age in a very short period of time.
Though I'm about a year behind my Swedish cohort, my implementation makes use of the more experimental tags like <article>, <section>, <header>, <footer>, and <nav> where possible. I'll keep tweaking the format over time because there's no way I've interpreted the spec 100% correctly.
Steve Faulkner et al — ARIA Toggle Button and Tri-state Checkbox examples
It is an exciting time for proponents of WAI-ARIA (Web Accessibility Initiative - Accessible Rich Internet Applications), with support introduced in IE 8 (beta), better support in Firefox 3, support planned in Opera and mooted in Safari, coupled with recent changes to the specification that make it easier to develop and deploy ARIA-based widgets.
Henry Sivonen — ARIA in HTML5 Integration: Document Conformance (Draft, Take Two)
WHATWG blog — Exploring new vocabularies for HTML
The four hottest topics in the WHATWG Issues List are:
- Finding a suitable common codec for the video element.
- The accessibility of tabular data.
- Web Forms 2.
- Using markup from namespaces other than HTML in text/html.
The video codec issue is being actively worked on, but we're not close to a good solution yet (it's mostly an economic and political issue, not a technical one, which is why we don't have any transparency on this issue, sadly). I recently responded to most of the table-related feedback. Web Forms 2 work is waiting for a decision from the W3C's forms task force on whether WF2 will be integrated as-is into HTML5 or whether it will be changed before being merged. The namespace issue is the one I'm working on now.
The first thing I have to do is work out what the problem is! There has been a lot of discussion, but not much of it is focussed on a problem, most of it is focussed on possible solutions. One can't evaluate a solution without knowing what it's trying to solve, though. To this end, I have created a wiki page where I will note down any problem descriptions I can find as I read all 367 of the e-mails in this folder.
Feel free to help! If you want to coordinate, I'm Hixie in #whatwg on Freenode IRC.
Henry Sivonen — ARIA in HTML5 Integration: Document Conformance (Draft)
Planet WebKit — Google Summer Of Code
Are you a university student? Would you like to get paid to hack on WebKit? WebKit is participating in Google’s Summer of Code. Google will pay you to work on WebKit for the summer. Coding sure beats flipping burgers.
Google begins accepting applications March 24th, 2008. We’ve posted some example project ideas to start your thought processes. Some highlights:
- Finish SVG 1.1 - filters, animation, various other bugs
- Finish CSS 2.1 or parts of CSS 3 (e.g. paged media)
- Web technology support - MathML, Ruby, HTML5, XBL2, ARIA
- Port WebKit to your favorite platform
- Improve our developer tools (Drosera, Web Inspector) or add your own (e.g. a JS profiler)
- Propose your own idea! What cool things should tomorrow’s web browsers do?
Prospective applicants should read Google’s student application instructions and join #webkit on freenode.net to meet the WebKit community. We’re happy to answer any questions about possible projects or mentors. Join #gsoc to ask the Google folks any administrative questions you may have.
We look forward to hacking with you this summer!
Planet WebKit — WebKit Hits 93/100 in Acid3
This one didn’t actually involve any code changes. Rather, the Acid3 test itself was changed as a result of feedback from the browser developers. The issue was some rather obscure details of the click() DOM method, which are not specified anywhere but are implemented in a certain way by all browsers (more details in this thread). The behavior expected by earlier versions of Acid3 was different and would have broken some web content. The ongoing HTML5 work will allow standardizing these kinds of details.
It is good to see that besides testing many exciting new standards Acid3 also takes real-world web compatibility seriously.
Sam Ruby — Strange Loops
Mark Pilgrim: On a somewhat related note, I’ve cobbled together a firehose which tracks comments (like these) that I make on other selected sites. Many thanks to Sam for teaching me about Venus filters, which make it all possible.
Ah, yes. The Tools Will Save Us, circa 2004. I remember it well.
There’s more to the story. Filters are not unique to Venus. Mark’s first prototype was based on another popular tool. When he first showed the results to me, I immediately pointed out that relative URIs were not handled correctly. Better tools, based on better libraries, make problems like these go away. Tools that could convert his firehose template to XSLT, and thereby consistently produce well-formed XML.
On a somewhat related note, I see Planet HTML5 is powered by Venus and includes a selection of my entries. Presumably ones that mention HTML5. Like this one. Even though it isn’t about HTML5.
W3C Team blog — Browser wars, HTML test jam, and CSS awards at SXSW Interactive in Austin
When he opened the panel today to a packed room, Arun admitted that the "browser wars" title was a little sensationalist; mostly Brendan Eich, Chris Wilson, and Charles McCathieNevile are on the same side, trying to make the Web better for everybody.
The question from the audience that got the biggest reaction was "what would it take for you guys to implement margins and padding in the same way?" Chris Wilson said they worked hard on that for the IE8 beta and asked that people give feedback on any remaining problems. He also said test suites help a lot with building interoperable implementations.
The question was more about CSS, but I took the mic to add that we're doing an HTML Working Group test jam this evening mostly on IRC and tomorrow at lunch.
After having 500 people join the HTML Working Group, it's somewhat comforting to see that searching the hundreds of SXSW panel descriptions gives not a single hit for HTML, but CSS is big enough to get its own category at the awards show. (Congrats to Kevin and the ficlets gang!)
On that note, everybody please join Bert Bos and me in welcoming our new CSS Working Group co-chairs, Daniel Glazman of Disruptive Innovations and Peter Linss of HP.
W3C Team blog — Character encoding in HTML
In the beginning the Web had ASCII. And that was good. But then, not really. The Europeans and their strange accents were a bit of a problem.
So then the Web had iso-latin1. And HTML could be assumed to be using that, by default (RFC2854, section 4). And that was good. But then, not really. There was a whole world out there, with a lot of writing systems, tons of different characters. Many different character encodings...
Today we have Unicode, at long last well adopted in most modern computing systems, and a basic building block of a lot of web technologies. And although there are still many different character encodings in use for documents on the web, this is not an issue, as there are mechanisms, both in HTTP and within HTML for instance, to declare the encoding used and help tools determine how to decode the content.
All is not always rosy, however. The first issue is that there are quite a lot of mechanisms to declare encoding, and that they don't necessarily agree. The second issue is that not everyone can configure a Web server to declare encoding of HTML documents at the HTTP level.
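A few lines of Python show why disagreeing declarations matter. The same bytes decode to different text depending on which encoding the user agent believes. The helper name misdecode is hypothetical.

```python
def misdecode(text: str, written_as: str, read_as: str) -> str:
    """Encode text as the author did, then decode it as a (possibly misinformed) reader would."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
    return text.encode(written_as).decode(read_as, errors="replace")


# UTF-8 bytes read as ISO-8859-1: each byte of the two-byte "é" becomes its own character.
mojibake = misdecode("café", "utf-8", "iso-8859-1")      # 'cafÃ©'

# ISO-8859-1 bytes read as UTF-8: byte 0xE9 starts an incomplete sequence.
replacement = misdecode("café", "iso-8859-1", "utf-8")   # 'caf\ufffd'
```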
Many sources, One encoding
if the box says "dangerous, do not open", don't peek inside the box...
A long (web) time ago, there was a very serious discussion to determine whether a Web resource was supposed to know its encoding best, or whether the Web server should be the authoritative source.
In the "resource" camp, some were pushing the rather logical argument that a specific document surely knew more about its own metadata than a misconfigured Web server. Who cares if the server thinks that all HTML documents it serves are iso-8859-1, when I, as document author, know full well that I am authoring this particular resource as utf-8?
The other camp had two killer arguments.
The first, and perhaps the simplest, argument was: what's the point of having user agents sniff garbage in the hope of finding content, and perhaps a character encoding declaration, when the transport protocol has a way of declaring it? This is the basis for the authoritative metadata principle. This principle is also sometimes summarized as: If I want to show an HTML document as plain text source, rather than have it interpreted by browsers, I should be able to do so. I should be able to serve any document as text/plain if that is my choice.
The second killer argument was transcoding. A lot of proxies, they said, transform the content they proxy, sometimes from one character encoding to another. So even though a document might say "I am encoded in iso-2022-jp", the proxy should be able to say "actually, trust me, the content I am delivering to you is in utf-8".
In the end, the apparent consensus was that the "server knows best" camp had the sound architectural arguments behind them, and so, for everything on the web served via the HTTP protocol, HTTP has precedence over any other method in determining the encoding (and content type, etc.) of resources.
This means that regardless of what is in an (x)html document, if the server says "this is a text/html document encoded as utf-8", user agents should follow that information. Second guessing is likely to cause more harm than good.
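Here is a minimal Python sketch of the "server knows best" rule, assuming the header value is available as a string. When the charset parameter of the HTTP Content-Type header is present, it wins over anything the document declares. The standard library's email.message.Message is used only to parse the header parameters. The helper names are hypothetical.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def pick_encoding(http_content_type: str, declared_in_document: Optional[str]) -> Optional[str]:
    """HTTP metadata is authoritative; the in-document declaration is only a fallback."""
    http_charset = charset_from_content_type(http_content_type)
    return http_charset if http_charset else declared_in_document
```

For example, pick_encoding("text/html; charset=utf-8", "iso-2022-jp") returns "utf-8", whatever the document itself claims.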
Unlabeled boxes can be full of treasures, or full of trouble
But what if there is no character encoding declared at the HTTP level? This is where it gets tricky.
"Old school" HTML introduced a specific meta tag for the declaration of the encoding within the document:
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-5">
Over the years, we have seen that this method was plagued by two serious issues:
Its syntax.
Nobody seems to get it right (it is just... too complicated!) and the Web is littered with approximate, sometimes comical, variants of this syntax. This is no laughing matter for user agents, however, which can't even expect to find this encoding declaration properly marked up!
The meta elements have to be within the head of a document, but there is no guarantee that they will be anywhere near the top of the document. The head of a document can contain lots of other metadata (title, description, scripts and stylesheets) before declaring the encoding. This means a lot of sniffing and pseudo-parsing of undecoded garbage. In some cases, it can have dreadful consequences, such as security flaws in the approximate sniffing code.
It is worth noting that current work on HTML5 tries to work around these issues by providing a simpler alternate syntax, and by requiring the encoding declaration to be present at the very beginning of the head.
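The difficulty of finding the meta declaration shows up clearly in a deliberately naive Python prescan. It looks only at the first bytes of the document and uses regular expressions. It recognizes both the old http-equiv form and the simpler <meta charset> form. The 1024-byte limit and the function name are assumptions. A real user agent needs a careful byte-level parser that handles comments, attribute quoting and so on. A regex like this is exactly the kind of approximate sniffing the post warns about.

```python
import re
from typing import Optional

PRESCAN_LIMIT = 1024  # assumed cut-off, not taken from any specification

META_RE = re.compile(r"<meta\b[^>]*>", re.IGNORECASE)
CHARSET_RE = re.compile(r"""charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""", re.IGNORECASE)


def prescan_meta_charset(data: bytes) -> Optional[str]:
    """Naively look for a charset in meta elements near the start of the bytes."""
    # ISO-8859-1 maps every byte to one character, so decoding never fails
    # and ASCII markup stays intact for the regexes.
    head = data[:PRESCAN_LIMIT].decode("iso-8859-1")
    for tag in META_RE.finditer(head):
        found = CHARSET_RE.search(tag.group(0))
        if found:
            return found.group(1).lower()
    return None
```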
XML, on the other hand, had a way to declare encoding at the document level in the XML declaration. The good thing about it is that this declaration MUST be at the very beginning of the document, which alleviates the pain of having to sniff the content.
<?xml version="1.0" encoding="UTF-8"?>
The XML specification also defines, in its Appendix F, a recommended algorithm for the encoding detection.
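The detection described in Appendix F of the XML specification can be sketched in Python as follows. It covers only a subset: byte order marks, the no-BOM byte patterns for the opening "<?xml", and reading encoding= from an ASCII-compatible declaration. The function name and the returned labels are illustrative assumptions.

```python
import re

# (prefix, label) pairs. Order matters: the UCS-4 little-endian BOM starts
# with the UTF-16 little-endian BOM, so it must be tested first.
BOM_TABLE = [
    (b"\x00\x00\xfe\xff", "UCS-4BE"),
    (b"\xff\xfe\x00\x00", "UCS-4LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
]

# Without a BOM, the first four bytes of a document starting with "<?xml"
# differ per encoding family.
NO_BOM_TABLE = [
    (b"\x00\x00\x00\x3c", "UCS-4BE"),
    (b"\x3c\x00\x00\x00", "UCS-4LE"),
    (b"\x00\x3c\x00\x3f", "UTF-16BE"),
    (b"\x3c\x00\x3f\x00", "UTF-16LE"),
    (b"\x4c\x6f\xa7\x94", "EBCDIC"),
]

ENCODING_DECL = re.compile(rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def xml_autodetect(data: bytes) -> str:
    """Guess the encoding (or encoding family) of an XML entity from its first bytes."""
    for prefix, label in BOM_TABLE:
        if data.startswith(prefix):
            return label
    for prefix, label in NO_BOM_TABLE:
        if data.startswith(prefix):
            return label
    if data.startswith(b"<?xm"):
        # ASCII-compatible encoding: the declaration itself can be read as bytes.
        declared = ENCODING_DECL.match(data)
        if declared:
            return declared.group(1).decode("ascii")
    # No BOM and no usable encoding declaration: UTF-8.
    return "UTF-8"
```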
The Recipe
Given all these potential sources for the declaration (or automatic detection) of the document character encoding, all potentially contradicting the others, what should be the recipe to reliably figure out which encoding to use?
- The charset info in the HTTP Content-Type header should have precedence. Always.
- Next in line is the charset information in the XML declaration. Which may be there, or may not.
  For XHTML documents, and in particular for XHTML documents served as text/html, it is recommended to avoid using an XML declaration. But let's remember: XHTML is XML, and XML requires an XML declaration or some other method of declaration for XML documents using encodings other than UTF-8 or UTF-16 (or ASCII, which is a convenient subset...).
  As a result, anything served as application/xhtml+xml (or as text/html and looking a lot like XHTML) with neither an encoding declaration at the HTTP level nor an XML declaration is quite likely to be UTF-8 or UTF-16.
- Then there is the BOM, a signature for Unicode character encodings.
- Then comes the search for the meta information that might, just might, provide a character encoding declaration.
- Beyond that point, it's the land of defaults and heuristics. You may choose to default to iso-8859-1 for text/html resources, and utf-8 for application/xhtml+xml.
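Putting the recipe together, here is a Python sketch that follows the steps in the order given above. It is only an illustration under stated assumptions. The regular expressions are simplified, and the 1024-byte meta search window and the function name are assumptions. The XML declaration is only recognized at the very start of the body, so a body that begins with a BOM falls through to the BOM step.

```python
import re
from email.message import Message

BOMS = [
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]
XML_DECL = re.compile(rb"""<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")
META = re.compile(rb"""<meta\b[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9._:-]+)""", re.IGNORECASE)
DEFAULTS = {"text/html": "iso-8859-1", "application/xhtml+xml": "utf-8"}
META_WINDOW = 1024  # assumed search window for step 4


def determine_encoding(content_type: str, body: bytes) -> str:
    """Apply the recipe: HTTP, XML declaration, BOM, meta, then defaults."""
    msg = Message()
    msg["Content-Type"] = content_type
    # 1. The charset parameter of the HTTP Content-Type header always wins.
    http_charset = msg.get_content_charset()
    if http_charset:
        return http_charset
    # 2. An XML declaration, only if it sits at the very start of the body.
    declared = XML_DECL.match(body)
    if declared:
        return declared.group(1).decode("ascii").lower()
    # 3. A byte order mark.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # 4. A meta declaration near the start of the document.
    found = META.search(body[:META_WINDOW])
    if found:
        return found.group(1).decode("ascii").lower()
    # 5. Defaults by media type.
    return DEFAULTS.get(msg.get_content_type(), "iso-8859-1")
```

Keeping each step as an early return makes the precedence order explicit. Swapping two steps means swapping two blocks.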