W3C

HTML 5

A vocabulary and associated APIs for HTML and XHTML

Editor's Draft 26 May 2009

Latest Published Version:
http://www.w3.org/TR/html5/
Latest Editor's Draft:
http://www.w3.org/html/wg/html5/
Previous Versions:
http://www.w3.org/TR/2009/WD-html5-20090423/
http://www.w3.org/TR/2009/WD-html5-20090212/
http://www.w3.org/TR/2008/WD-html5-20080610/
http://www.w3.org/TR/2008/WD-html5-20080122/
Editors:
Ian Hickson, Google, Inc.
David Hyatt, Apple, Inc.

Abstract

This specification defines the 5th major revision of the core language of the World Wide Web: the Hypertext Markup Language (HTML). In this version, new features are introduced to help Web application authors, new elements are introduced based on research into prevailing authoring practices, and special attention has been given to defining clear conformance criteria for user agents in an effort to improve interoperability.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the most recently formally published revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

The WHATWG version of this specification is available under a license that permits reuse of the specification text.

If you wish to make comments regarding this document, please send them to the public-html-comments@w3.org or whatwg@whatwg.org mailing lists (both of which can be subscribed to and have public archives), or submit them using our public bug database. All feedback is welcome.

We maintain a list of all e-mails that have not yet been considered and a list of all bug reports that have not yet been resolved.

Implementors should be aware that this specification is not stable. Implementors who are not taking part in the discussions are likely to find the specification changing out from under them in incompatible ways. Vendors interested in implementing this specification before it eventually reaches the Candidate Recommendation stage should join the aforementioned mailing lists and take part in the discussions.

The publication of this document by the W3C as a W3C Working Draft does not imply that all of the participants in the W3C HTML working group endorse the contents of the specification. Indeed, for any section of the specification, one can usually find many members of the working group or of the W3C as a whole who object strongly to the current text, the existence of the section at all, or the idea that the working group should even spend time discussing the concept of that section.

The latest stable version of the editor's draft of this specification is always available on the W3C CVS server and in the WHATWG Subversion repository. The latest editor's working copy (which may contain unfinished text in the process of being prepared) is also available.

There are various ways to follow the change history for the specification:

E-mail notifications of changes:
  HTML-Diffs mailing list (diff-marked HTML versions for each change): http://lists.w3.org/Archives/Public/public-html-diffs/latest
  Commit-Watchers mailing list (complete source diffs): http://lists.whatwg.org/listinfo.cgi/commit-watchers-whatwg.org
Real-time notifications of changes:
  Generated diff-marked HTML versions for each change: http://twitter.com/HTML5
  All (non-editorial) changes to the spec source: http://twitter.com/WHATWG
Browsable version-control record of all changes:
  CVSWeb interface with side-by-side diffs: http://dev.w3.org/cvsweb/html5/spec/Overview.html
  Annotated summary with unified diffs: http://html5.org/tools/web-apps-tracker
  Raw Subversion interface: svn checkout http://svn.whatwg.org/webapps/

The W3C HTML Working Group is the W3C working group responsible for this specification's progress along the W3C Recommendation track. This specification is the 26 May 2009 Editor's Draft.

This specification is also being produced by the WHATWG. The two specifications are identical from the table of contents onwards.

This specification is intended to replace (that is, to be a new version of) the HTML4, XHTML 1.0, and DOM2 HTML specifications.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Stability

Different parts of this specification are at different levels of maturity.

Some of the more significant known issues are marked inline in the text. Many other issues have also been raised; the issues given in this document are not the only known issues! Also, the firing of events needs to be unified: right now some events bubble and some don't, and different sections use different text to fire events, etc.

Table of contents

  1 Introduction
    1.1 Background
    1.2 Audience
    1.3 Scope
    1.4 History
    1.5 Design notes
      1.5.1 Serializability of script execution
      1.5.2 Compliance with other specifications
    1.6 Relationships to other specifications
      1.6.1 Relationship to HTML 4.01 and DOM2 HTML
      1.6.2 Relationship to XHTML 1.x
      1.6.3 Relationship to XHTML2 and XForms
    1.7 HTML vs XHTML
    1.8 Structure of this specification
      1.8.1 How to read this specification
      1.8.2 Typographic conventions
  2 Common infrastructure
    2.1 Terminology
      2.1.1 XML
      2.1.2 DOM trees
      2.1.3 Scripting
      2.1.4 Plugins
      2.1.5 Character encodings
      2.1.6 Resources
    2.2 Conformance requirements
      2.2.1 Dependencies
      2.2.2 Features defined in other specifications
      2.2.3 Common conformance requirements for APIs exposed to JavaScript
      2.2.4 Extensibility
    2.3 Case-sensitivity and string comparison
    2.4 Common microsyntaxes
      2.4.1 Common parser idioms
      2.4.2 Boolean attributes
      2.4.3 Keywords and enumerated attributes
      2.4.4 Numbers
        2.4.4.1 Non-negative integers
        2.4.4.2 Signed integers
        2.4.4.3 Real numbers
        2.4.4.4 Ratios
        2.4.4.5 Percentages and lengths
        2.4.4.6 Lists of integers
        2.4.4.7 Lists of dimensions
      2.4.5 Dates and times
        2.4.5.1 Months
        2.4.5.2 Dates
        2.4.5.3 Times
        2.4.5.4 Local dates and times
        2.4.5.5 Global dates and times
        2.4.5.6 Weeks
        2.4.5.7 Vaguer moments in time
      2.4.6 Colors
      2.4.7 Space-separated tokens
      2.4.8 Comma-separated tokens
      2.4.9 Reversed DNS identifiers
      2.4.10 References
    2.5 URLs
      2.5.1 Terminology
      2.5.2 Parsing URLs
      2.5.3 Resolving URLs
      2.5.4 Dynamic changes to base URLs
      2.5.5 Interfaces for URL manipulation
    2.6 Fetching resources
      2.6.1 Protocol concepts
      2.6.2 Encrypted HTTP and related security concerns
    2.7 Determining the type of a resource
      2.7.1 Content-Type metadata
      2.7.2 Content-Type sniffing: Web pages
      2.7.3 Content-Type sniffing: text or binary
      2.7.4 Content-Type sniffing: unknown type
      2.7.5 Content-Type sniffing: image
      2.7.6 Content-Type sniffing: feed or HTML
    2.8 Character encodings
    2.9 Common DOM interfaces
      2.9.1 Reflecting content attributes in DOM attributes
      2.9.2 Collections
        2.9.2.1 HTMLCollection
        2.9.2.2 HTMLFormControlsCollection
        2.9.2.3 HTMLOptionsCollection
        2.9.2.4 HTMLPropertyCollection
      2.9.3 DOMTokenList
      2.9.4 DOMSettableTokenList
      2.9.5 Safe passing of structured data
      2.9.6 DOMStringMap
      2.9.7 DOM feature strings
      2.9.8 Exceptions
      2.9.9 Garbage collection
  3 Semantics and structure of HTML documents
    3.1 Introduction
    3.2 Documents
      3.2.1 Documents in the DOM
      3.2.2 Security
      3.2.3 Resource metadata management
      3.2.4 DOM tree accessors
    3.3 Elements
      3.3.1 Semantics
      3.3.2 Elements in the DOM
      3.3.3 Global attributes
        3.3.3.1 The id attribute
        3.3.3.2 The title attribute
        3.3.3.3 The lang and xml:lang attributes
        3.3.3.4 The xml:base attribute (XML only)
        3.3.3.5 The dir attribute
        3.3.3.6 The class attribute
        3.3.3.7 The style attribute
        3.3.3.8 Embedding custom non-visible data
    3.4 Content models
      3.4.1 Kinds of content
        3.4.1.1 Metadata content
        3.4.1.2 Flow content
        3.4.1.3 Sectioning content
        3.4.1.4 Heading content
        3.4.1.5 Phrasing content
        3.4.1.6 Embedded content
        3.4.1.7 Interactive content
      3.4.2 Transparent content models
    3.5 Paragraphs
    3.6 APIs in HTML documents
    3.7 Dynamic markup insertion
      3.7.1 Controlling the input stream
      3.7.2 document.write()
      3.7.3 document.writeln()
      3.7.4 innerHTML
      3.7.5 outerHTML
      3.7.6 insertAdjacentHTML()
  4 The elements of HTML
    4.1 The root element
      4.1.1 The html element
    4.2 Document metadata
      4.2.1 The head element
      4.2.2 The title element
      4.2.3 The base element
      4.2.4 The link element
      4.2.5 The meta element
        4.2.5.1 Standard metadata names
        4.2.5.2 Other metadata names
        4.2.5.3 Pragma directives
        4.2.5.4 Other pragma directives
        4.2.5.5 Specifying the document's character encoding
      4.2.6 The style element
      4.2.7 Styling
    4.3 Scripting
      4.3.1 The script element
        4.3.1.1 Scripting languages
        4.3.1.2 Inline documentation for external scripts
      4.3.2 The noscript element
    4.4 Sections
      4.4.1 The body element
      4.4.2 The section element
      4.4.3 The nav element
      4.4.4 The article element
      4.4.5 The aside element
      4.4.6 The h1, h2, h3, h4, h5, and h6 elements
      4.4.7 The hgroup element
      4.4.8 The header element
      4.4.9 The footer element
      4.4.10 The address element
      4.4.11 Headings and sections
        4.4.11.1 Creating an outline
        4.4.11.2 Distinguishing site-wide headings from page headings
    4.5 Grouping content
      4.5.1 The p element
      4.5.2 The hr element
      4.5.3 The br element
      4.5.4 The pre element
      4.5.5 The dialog element
      4.5.6 The blockquote element
      4.5.7 The ol element
      4.5.8 The ul element
      4.5.9 The li element
      4.5.10 The dl element
      4.5.11 The dt element
      4.5.12 The dd element
      4.5.13 Common grouping idioms
        4.5.13.1 Tag clouds
    4.6 Text-level semantics
      4.6.1 The a element
      4.6.2 The q element
      4.6.3 The cite element
      4.6.4 The em element
      4.6.5 The strong element
      4.6.6 The small element
      4.6.7 The mark element
      4.6.8 The dfn element
      4.6.9 The abbr element
      4.6.10 The time element
      4.6.11 The progress element
      4.6.12 The meter element
      4.6.13 The code element
      4.6.14 The var element
      4.6.15 The samp element
      4.6.16 The kbd element
      4.6.17 The sub and sup elements
      4.6.18 The span element
      4.6.19 The i element
      4.6.20 The b element
      4.6.21 The bdo element
      4.6.22 The ruby element
      4.6.23 The rt element
      4.6.24 The rp element
      4.6.25 Usage summary
      4.6.26 Footnotes
    4.7 Edits
      4.7.1 The ins element
      4.7.2 The del element
      4.7.3 Attributes common to ins and del elements
      4.7.4 Edits and paragraphs
      4.7.5 Edits and lists
    4.8 Embedded content
      4.8.1 The figure element
      4.8.2 The img element
        4.8.2.1 Requirements for providing text to act as an alternative for images
          4.8.2.1.1 A link or button containing nothing but the image
          4.8.2.1.2 A phrase or paragraph with an alternative graphical representation: charts, diagrams, graphs, maps, illustrations
          4.8.2.1.3 A short phrase or label with an alternative graphical representation: icons, logos
          4.8.2.1.4 Text that has been rendered to a graphic for typographical effect
          4.8.2.1.5 A graphical representation of some of the surrounding text
          4.8.2.1.6 A purely decorative image that doesn't add any information
          4.8.2.1.7 A group of images that form a single larger picture with no links
          4.8.2.1.8 A group of images that form a single larger picture with links
          4.8.2.1.9 A key part of the content
          4.8.2.1.10 An image not intended for the user
          4.8.2.1.11 An image in an e-mail or private document intended for a specific person who is known to be able to view images
          4.8.2.1.12 General guidelines
          4.8.2.1.13 Guidance for markup generators
          4.8.2.1.14 Guidance for conformance checkers
      4.8.3 The iframe element
      4.8.4 The embed element
      4.8.5 The object element
      4.8.6 The param element
      4.8.7 The video element
        4.8.7.1 Video and audio codecs for video elements
      4.8.8 The audio element
        4.8.8.1 Audio codecs for audio elements
      4.8.9 The source element
      4.8.10 Media elements
        4.8.10.1 Error codes
        4.8.10.2 Location of the media resource
        4.8.10.3 Media types
        4.8.10.4 Network states
        4.8.10.5 Loading the media resource
        4.8.10.6 Offsets into the media resource
        4.8.10.7 The ready states
        4.8.10.8 Cue ranges
        4.8.10.9 Playing the media resource
        4.8.10.10 Seeking
        4.8.10.11 User interface
        4.8.10.12 Time ranges
        4.8.10.13 Event summary
        4.8.10.14 Security and privacy considerations
      4.8.11 The canvas element
        4.8.11.1 The 2D context
          4.8.11.1.1 The canvas state
          4.8.11.1.2 Transformations
          4.8.11.1.3 Compositing
          4.8.11.1.4 Colors and styles
          4.8.11.1.5 Line styles
          4.8.11.1.6 Shadows
          4.8.11.1.7 Simple shapes (rectangles)
          4.8.11.1.8 Complex shapes (paths)
          4.8.11.1.9 Text
          4.8.11.1.10 Images
          4.8.11.1.11 Pixel manipulation
          4.8.11.1.12 Drawing model
        4.8.11.2 Color spaces and color correction
        4.8.11.3 Security with canvas elements
      4.8.12 The map element
      4.8.13 The area element
      4.8.14 Image maps
        4.8.14.1 Authoring
        4.8.14.2 Processing model
      4.8.15 MathML
      4.8.16 SVG
      4.8.17 Dimension attributes
    4.9 Tabular data
      4.9.1 Introduction
      4.9.2 The table element
      4.9.3 The caption element
      4.9.4 The colgroup element
      4.9.5 The col element
      4.9.6 The tbody element
      4.9.7 The thead element
      4.9.8 The tfoot element
      4.9.9 The tr element
      4.9.10 The td element
      4.9.11 The th element
      4.9.12 Attributes common to td and th elements
      4.9.13 Processing model
        4.9.13.1 Forming a table
        4.9.13.2 Forming relationships between data cells and header cells
    4.10 Forms
      4.10.1 The form element
      4.10.2 The fieldset element
      4.10.3 The label element
      4.10.4 The input element
        4.10.4.1 States of the type attribute
          4.10.4.1.1 Hidden state
          4.10.4.1.2 Text state and Search state
          4.10.4.1.3 Telephone state
          4.10.4.1.4 URL state
          4.10.4.1.5 E-mail state
          4.10.4.1.6 Password state
          4.10.4.1.7 Date and Time state
          4.10.4.1.8 Date state
          4.10.4.1.9 Month state
          4.10.4.1.10 Week state
          4.10.4.1.11 Time state
          4.10.4.1.12 Local Date and Time state
          4.10.4.1.13 Number state
          4.10.4.1.14 Range state
          4.10.4.1.15 Color state
          4.10.4.1.16 Checkbox state
          4.10.4.1.17 Radio Button state
          4.10.4.1.18 File Upload state
          4.10.4.1.19 Submit Button state
          4.10.4.1.20 Image Button state
          4.10.4.1.21 Reset Button state
          4.10.4.1.22 Button state
        4.10.4.2 Common input element attributes
          4.10.4.2.1 The autocomplete attribute
          4.10.4.2.2 The list attribute
          4.10.4.2.3 The readonly attribute
          4.10.4.2.4 The size attribute
          4.10.4.2.5 The required attribute
          4.10.4.2.6 The multiple attribute
          4.10.4.2.7 The maxlength attribute
          4.10.4.2.8 The pattern attribute
          4.10.4.2.9 The min and max attributes
          4.10.4.2.10 The step attribute
          4.10.4.2.11 The placeholder attribute
        4.10.4.3 Common input element APIs
        4.10.4.4 Common event behaviors
      4.10.5 The button element
      4.10.6 The select element
      4.10.7 The datalist element
      4.10.8 The optgroup element
      4.10.9 The option element
      4.10.10 The textarea element
      4.10.11 The keygen element
      4.10.12 The output element
      4.10.13 Association of controls and forms
      4.10.14 Attributes common to form controls
        4.10.14.1 Naming form controls
        4.10.14.2 Enabling and disabling form controls
        4.10.14.3 A form control's value
        4.10.14.4 Autofocusing a form control
        4.10.14.5 Limiting user input length
        4.10.14.6 Form submission
      4.10.15 Constraints
        4.10.15.1 Definitions
        4.10.15.2 Constraint validation
        4.10.15.3 The constraint validation API
        4.10.15.4 Security
      4.10.16 Form submission
        4.10.16.1 Introduction
        4.10.16.2 Implicit submission
        4.10.16.3 Form submission algorithm
        4.10.16.4 URL-encoded form data
        4.10.16.5 Multipart form data
        4.10.16.6 Plain text form data
      4.10.17 Resetting a form
      4.10.18 Event dispatch
    4.11 Interactive elements
      4.11.1 The details element
      4.11.2 The datagrid element
        4.11.2.1 Introduction
          4.11.2.1.1 Example: a datagrid backed by a static table element
          4.11.2.1.2 Example: a datagrid backed by nested ol elements
          4.11.2.1.3 Example: a datagrid backed by a server
        4.11.2.2 Populating the datagrid
          4.11.2.2.1 The listener
          4.11.2.2.2 The columns
          4.11.2.2.3 The rows
          4.11.2.2.4 The cells
        4.11.2.3 Listening to notifications from the datagrid
      4.11.3 The command element
      4.11.4 The bb element
        4.11.4.1 Browser button types
          4.11.4.1.1 The make application state
      4.11.5 The menu element
        4.11.5.1 Introduction
        4.11.5.2 Building menus and tool bars
        4.11.5.3 Context menus
        4.11.5.4 Tool bars
      4.11.6 Commands
        4.11.6.1 Using the a element to define a command
        4.11.6.2 Using the button element to define a command
        4.11.6.3 Using the input element to define a command
        4.11.6.4 Using the option element to define a command
        4.11.6.5 Using the command element to define a command
        4.11.6.6 Using the bb element to define a command
        4.11.6.7 Using the accesskey attribute to define a command
    4.12 Miscellaneous elements
      4.12.1 The legend element
      4.12.2 The div element
    4.13 Matching HTML elements using selectors
  5 Microdata
    5.1 Introduction
      5.1.1 The basic syntax
      5.1.2 Typed items
      5.1.3 Selecting names when defining vocabularies
      5.1.4 Using the microdata DOM API
    5.2 Encoding microdata
      5.2.1 The microdata model
      5.2.2 Items: the item attribute
      5.2.3 Associating names with items
      5.2.4 Names: the itemprop attribute
      5.2.5 Values
    5.3 Microdata DOM API
    5.4 Predefined vocabularies
      5.4.1 vCard
        5.4.1.1 Examples
      5.4.2 vEvent
        5.4.2.1 Examples
      5.4.3 BibTeX
        5.4.3.1 Examples
      5.4.4 RDF
    5.5 Converting HTML to other formats
      5.5.1 JSON
      5.5.2 RDF
      5.5.3 vCard
      5.5.4 iCalendar
      5.5.5 BibTeX
      5.5.6 Atom
  6 Web browsers
    6.1 Browsing contexts
      6.1.1 Nested browsing contexts
        6.1.1.1 Navigating nested browsing contexts in the DOM
      6.1.2 Auxiliary browsing contexts
        6.1.2.1 Navigating auxiliary browsing contexts in the DOM
      6.1.3 Secondary browsing contexts
      6.1.4 Security
      6.1.5 Groupings of browsing contexts
      6.1.6 Browsing context names
    6.2 The WindowProxy object
    6.3 The Window object
      6.3.1 Security
      6.3.2 APIs for creating and navigating browsing contexts by name
      6.3.3 Accessing other browsing contexts
      6.3.4 Named access on the Window object
      6.3.5 Garbage collection and browsing contexts
      6.3.6 Browser interface elements
    6.4 Origin
      6.4.1 Relaxing the same-origin restriction
    6.5 Scripting
      6.5.1 Introduction
      6.5.2 Enabling and disabling scripting
      6.5.3 Processing model
        6.5.3.1 Definitions
        6.5.3.2 Calling scripts
        6.5.3.3 Creating scripts
        6.5.3.4 Killing scripts
      6.5.4 Event loops
        6.5.4.1 Definitions
        6.5.4.2 Processing model
        6.5.4.3 Generic task sources
      6.5.5 The javascript: protocol
      6.5.6 Events
        6.5.6.1 Event handler attributes
        6.5.6.2 Event handler attributes on elements, Document objects, and Window objects
        6.5.6.3 Event firing
        6.5.6.4 Events and the Window object
        6.5.6.5 Runtime script errors
    6.6 Timers
    6.7 User prompts
      6.7.1 Simple dialogs
      6.7.2 Printing
      6.7.3 Dialogs implemented using separate documents
    6.8 System state and capabilities
      6.8.1 Client identification
      6.8.2 Custom protocol and content handlers
        6.8.2.1 Security and privacy
        6.8.2.2 Sample user interface
      6.8.3 Manually releasing the storage mutex
    6.9 Offline Web applications
      6.9.1 Introduction
      6.9.2 Application caches
      6.9.3 The cache manifest syntax
        6.9.3.1 A sample manifest
        6.9.3.2 Writing cache manifests
        6.9.3.3 Parsing cache manifests
      6.9.4 Updating an application cache
      6.9.5 Matching a fallback namespace
      6.9.6 The application cache selection algorithm
      6.9.7 Changes to the networking model
      6.9.8 Application cache API
      6.9.9 Browser state
    6.10 Session history and navigation
      6.10.1 The session history of browsing contexts
      6.10.2 The History interface
      6.10.3 Activating state object entries
      6.10.4 The Location interface
        6.10.4.1 Security
      6.10.5 Implementation notes for session history
    6.11 Browsing the Web
      6.11.1 Navigating across documents
      6.11.2 Page load processing model for HTML files
      6.11.3 Page load processing model for XML files
      6.11.4 Page load processing model for text files
      6.11.5 Page load processing model for images
      6.11.6 Page load processing model for content that uses plugins
      6.11.7 Page load processing model for inline content that doesn't have a DOM
      6.11.8 Navigating to a fragment identifier
      6.11.9 History traversal
      6.11.10 Unloading documents
        6.11.10.1 Event definition
    6.12 Links
      6.12.1 Hyperlink elements
      6.12.2 Following hyperlinks
        6.12.2.1 Hyperlink auditing
      6.12.3 Link types
        6.12.3.1 Link type "alternate"
        6.12.3.2 Link type "archives"
        6.12.3.3 Link type "author"
        6.12.3.4 Link type "bookmark"
        6.12.3.5 Link type "external"
        6.12.3.6 Link type "feed"
        6.12.3.7 Link type "help"
        6.12.3.8 Link type "icon"
        6.12.3.9 Link type "license"
        6.12.3.10 Link type "nofollow"
        6.12.3.11 Link type "noreferrer"
        6.12.3.12 Link type "pingback"
        6.12.3.13 Link type "prefetch"
        6.12.3.14 Link type "search"
        6.12.3.15 Link type "stylesheet"
        6.12.3.16 Link type "sidebar"
        6.12.3.17 Link type "tag"
        6.12.3.18 Hierarchical link types
          6.12.3.18.1 Link type "index"
          6.12.3.18.2 Link type "up"
        6.12.3.19 Sequential link types
          6.12.3.19.1 Link type "first"
          6.12.3.19.2 Link type "last"
          6.12.3.19.3 Link type "next"
          6.12.3.19.4 Link type "prev"
        6.12.3.20 Other link types
  7 User Interaction
    7.1 Introduction
    7.2 The hidden attribute
    7.3 Activation
    7.4 Scrolling elements into view
    7.5 Focus
      7.5.1 Sequential focus navigation
      7.5.2 Focus management
      7.5.3 Document-level focus APIs
      7.5.4 Element-level focus APIs
    7.6 The accesskey attribute
    7.7 The text selection APIs
      7.7.1 APIs for the browsing context selection
      7.7.2 APIs for the text field selections
    7.8 The contenteditable attribute
      7.8.1 User editing actions
      7.8.2 Making entire documents editable
    7.9 Spelling and grammar checking
    7.10 Drag and drop
      7.10.1 Introduction
      7.10.2 The DragEvent and DataTransfer interfaces
      7.10.3 Events fired during a drag-and-drop action
      7.10.4 Drag-and-drop processing model
        7.10.4.1 When the drag-and-drop operation starts or ends in another document
        7.10.4.2 When the drag-and-drop operation starts or ends in another application
      7.10.5 The draggable attribute
      7.10.6 Copy and paste
        7.10.6.1 Copy to clipboard
        7.10.6.2 Cut to clipboard
        7.10.6.3 Paste from clipboard
        7.10.6.4 Paste from selection
      7.10.7 Security risks in the drag-and-drop model
    7.11 Undo history
      7.11.1 Introduction
      7.11.2 Definitions
      7.11.3 The UndoManager interface
      7.11.4 Undo: moving back in the undo transaction history
      7.11.5 Redo: moving forward in the undo transaction history
      7.11.6 The UndoManagerEvent interface and the undo and redo events
      7.11.7 Implementation notes
    7.12 Editing APIs
  8 Communication
    8.1 Event definitions
    8.2 Cross-document messaging
      8.2.1 Introduction
      8.2.2 Security
        8.2.2.1 Authors
        8.2.2.2 User agents
      8.2.3 Posting messages
      8.2.4 Posting messages with message ports
    8.3 Channel messaging
      8.3.1 Introduction
      8.3.2 Message channels
      8.3.3 Message ports
        8.3.3.1 Ports and garbage collection
  9 The HTML syntax
    9.1 Writing HTML documents
      9.1.1 The DOCTYPE
      9.1.2 Elements
        9.1.2.1 Start tags
        9.1.2.2 End tags
        9.1.2.3 Attributes
        9.1.2.4 Optional tags
        9.1.2.5 Restrictions on content models
        9.1.2.6 Restrictions on the contents of CDATA and RCDATA elements
      9.1.3 Text
        9.1.3.1 Newlines
      9.1.4 Character references
      9.1.5 CDATA sections
      9.1.6 Comments
    9.2 Parsing HTML documents
      9.2.1 Overview of the parsing model
      9.2.2 The input stream
        9.2.2.1 Determining the character encoding
        9.2.2.2 Preprocessing the input stream
        9.2.2.3 Changing the encoding while parsing
      9.2.3 Parse state
        9.2.3.1 The insertion mode
        9.2.3.2 The stack of open elements
        9.2.3.3 The list of active formatting elements
        9.2.3.4 The element pointers
        9.2.3.5 Other parsing state flags
      9.2.4 Tokenization
        9.2.4.1 Data state
        9.2.4.2 Character reference data state
        9.2.4.3 Tag open state
        9.2.4.4 Close tag open state
        9.2.4.5 Tag name state
        9.2.4.6 Before attribute name state
        9.2.4.7 Attribute name state
        9.2.4.8 After attribute name state
        9.2.4.9 Before attribute value state
        9.2.4.10 Attribute value (double-quoted) state
        9.2.4.11 Attribute value (single-quoted) state
        9.2.4.12 Attribute value (unquoted) state
        9.2.4.13 Character reference in attribute value state
        9.2.4.14 After attribute value (quoted) state
        9.2.4.15 Self-closing start tag state
        9.2.4.16 Bogus comment state
        9.2.4.17 Markup declaration open state
        9.2.4.18 Comment start state
        9.2.4.19 Comment start dash state
        9.2.4.20 Comment state
        9.2.4.21 Comment end dash state
        9.2.4.22 Comment end state
        9.2.4.23 DOCTYPE state
        9.2.4.24 Before DOCTYPE name state
        9.2.4.25 DOCTYPE name state
        9.2.4.26 After DOCTYPE name state
        9.2.4.27 Before DOCTYPE public identifier state
        9.2.4.28 DOCTYPE public identifier (double-quoted) state
        9.2.4.29 DOCTYPE public identifier (single-quoted) state
        9.2.4.30 After DOCTYPE public identifier state
        9.2.4.31 Before DOCTYPE system identifier state
        9.2.4.32 DOCTYPE system identifier (double-quoted) state
        9.2.4.33 DOCTYPE system identifier (single-quoted) state
        9.2.4.34 After DOCTYPE system identifier state
        9.2.4.35 Bogus DOCTYPE state
        9.2.4.36 CDATA section state
        9.2.4.37 Tokenizing character references
      9.2.5 Tree construction
        9.2.5.1 Creating and inserting elements
        9.2.5.2 Closing elements that have implied end tags
        9.2.5.3 Foster parenting
        9.2.5.4 The "initial" insertion mode
        9.2.5.5 The "before html" insertion mode
        9.2.5.6 The "before head" insertion mode
        9.2.5.7 The "in head" insertion mode
        9.2.5.8 The "in head noscript" insertion mode
        9.2.5.9 The "after head" insertion mode
        9.2.5.10 The "in body" insertion mode
        9.2.5.11 The "in CDATA/RCDATA" insertion mode
        9.2.5.12 The "in table" insertion mode
        9.2.5.13 The "in caption" insertion mode
        9.2.5.14 The "in column group" insertion mode
        9.2.5.15 The "in table body" insertion mode
        9.2.5.16 The "in row" insertion mode
        9.2.5.17 The "in cell" insertion mode
        9.2.5.18 The "in select" insertion mode
        9.2.5.19 The "in select in table" insertion mode
        9.2.5.20 The "in foreign content" insertion mode
        9.2.5.21 The "after body" insertion mode
        9.2.5.22 The "in frameset" insertion mode
        9.2.5.23 The "after frameset" insertion mode
        9.2.5.24 The "after after body" insertion mode
        9.2.5.25 The "after after frameset" insertion mode
      9.2.6 The end
      9.2.7 Coercing an HTML DOM into an infoset
    9.3 Namespaces
    9.4 Serializing HTML fragments
    9.5 Parsing HTML fragments
    9.6 Named character references
  10 The XHTML syntax
    10.1 Writing XHTML documents
    10.2 Parsing XHTML documents
    10.3 Serializing XHTML fragments
    10.4 Parsing XHTML fragments
  11 Rendering
    11.1 Introduction
    11.2 The CSS user agent style sheet and presentational hints
      11.2.1 Introduction
      11.2.2 Display types
      11.2.3 Margins and padding
      11.2.4 Alignment
      11.2.5 Fonts and colors
      11.2.6 Punctuation and decorations
      11.2.7 Resetting rules for inherited properties
      11.2.8 The hr element
      11.2.9 The fieldset element
    11.3 Replaced elements
      11.3.1 Embedded content
      11.3.2 Images
      11.3.3 Attributes for embedded content and images
      11.3.4 Image maps
      11.3.5 Tool bars
    11.4 Bindings
      11.4.1 Introduction
      11.4.2 The bb element
      11.4.3 The button element
      11.4.4 The datagrid element
      11.4.5 The details element
      11.4.6 The input element as a text entry widget
      11.4.7 The input element as domain-specific widgets
      11.4.8 The input element as a range control
      11.4.9 The input element as a color well
      11.4.10 The input element as a check box and radio button widgets
      11. 11.4.11 The input element as a file upload control
      12. 11.4.12 The input element as a button
      13. 11.4.13 The marquee element
      14. 11.4.14 The meter element
      15. 11.4.15 The progress element
      16. 11.4.16 The select element
      17. 11.4.17 The textarea element
      18. 11.4.18 The keygen element
      19. 11.4.19 The time element
    5. 11.5 Frames and framesets
    6. 11.6 Interactive media
      1. 11.6.1 Links, forms, and navigation
      2. 11.6.2 The mark element
      3. 11.6.3 The title attribute
    7. 11.7 Print media
    8. 11.8 Interaction with CSS
  12. 12 Obsolete features
    1. 12.1 Self-contained features
      1. 12.1.1 The applet element
      2. 12.1.2 The marquee element
    2. 12.2 Other elements and attributes
    3. 12.3 Other DOM APIs
    4. 12.4 Conformance checkers
  13. 13 Things that you can't do with this specification because they are better handled using other technologies that are further described herein
    1. 13.1 Localization
    2. 13.2 Declarative 3D scenes
    3. 13.3 Rendering and the DOM
  14. Index
  15. References
  16. Acknowledgements

1 Introduction

1.1 Background

This section is non-normative.

The World Wide Web's markup language has always been HTML. HTML was primarily designed as a language for semantically describing scientific documents, although its general design and adaptations over the years have enabled it to be used to describe a number of other types of documents.

The main area that has not been adequately addressed by HTML is a vague subject referred to as Web Applications. This specification attempts to rectify this, while at the same time updating the HTML specifications to address issues raised in the past few years.

1.2 Audience

This section is non-normative.

This specification is intended for authors of documents and scripts that use the features defined in this specification, implementors of tools that are intended to conform to this specification, and individuals wishing to establish the correctness of documents or implementations with respect to the requirements of this specification.

This document is probably not suited to readers who do not already have at least a passing familiarity with Web technologies, as in places it sacrifices clarity for precision, and brevity for completeness. More approachable tutorials and authoring guides can provide a gentler introduction to the topic.

In particular, readers should be familiar with the basics of DOM Core and DOM Events before reading this specification. An understanding of WebIDL, HTTP, XML, Unicode, character encodings, JavaScript, and CSS will be helpful in places but is not essential.

1.3 Scope

This section is non-normative.

This specification is limited to providing a semantic-level markup language and associated semantic-level scripting APIs for authoring accessible pages on the Web ranging from static documents to dynamic applications.

The scope of this specification does not include providing mechanisms for media-specific customization of presentation (although default rendering rules for Web browsers are included at the end of this specification, and several mechanisms for hooking into CSS are provided as part of the language).

The scope of this specification does not include documenting every HTML or DOM feature supported by Web browsers. Browsers support many features that are considered to be very bad for accessibility or that are otherwise inappropriate. For example, the blink element is clearly presentational and authors wishing to cause text to blink should instead use CSS.

The scope of this specification is not to describe an entire operating system. In particular, hardware configuration software, image manipulation tools, and applications that users would be expected to use with high-end workstations on a daily basis are out of scope. In terms of applications, this specification is targeted specifically at applications that would be expected to be used by users on an occasional basis, or regularly but from disparate locations, with low CPU requirements. Examples include online purchasing systems, searching systems, games (especially multiplayer online games), public telephone books or address books, communications software (e-mail clients, instant messaging clients, discussion software), and document editing software.

1.4 History

This section is non-normative.

Work on HTML5 originally started in late 2003, as a proof of concept to show that it was possible to extend HTML4's forms to provide many of the features that XForms 1.0 introduced, without requiring browsers to implement rendering engines that were incompatible with existing HTML Web pages. At this early stage, while the draft was already publicly available, and input was already being solicited from all sources, the specification was only under Opera Software's copyright.

In early 2004, some of the principles that underlie this effort, as well as an early draft proposal covering just forms-related features, were presented to the W3C jointly by Mozilla and Opera at a workshop discussing the future of Web Applications on the Web. The proposal was rejected on the grounds that it conflicted with the previously chosen direction for the Web's evolution.

Shortly thereafter, Apple, Mozilla, and Opera jointly announced their intent to continue working on the effort. A public mailing list was created, and the drafts were moved to the WHATWG site. The copyright was subsequently amended to be jointly owned by all three vendors, and to allow reuse of the specifications.

In 2006, the W3C expressed interest in the specification, and created a working group chartered to work with the WHATWG on the development of the HTML5 specifications. The working group opened in 2007. Apple, Mozilla, and Opera allowed the W3C to publish the specifications under the W3C copyright, while keeping versions with the less restrictive license on the WHATWG site.

Since then, both groups have been working together.

1.5 Design notes

This section is non-normative.

It must be admitted that many aspects of HTML appear at first glance to be nonsensical and inconsistent.

HTML, its supporting DOM APIs, as well as many of its supporting technologies, have been developed over a period of several decades by a wide array of people with different priorities who, in many cases, did not know of each other's existence.

Features have thus arisen from many sources, and have not always been designed in especially consistent ways. Furthermore, because of the unique characteristics of the Web, implementation bugs have often become de facto, and now de jure, standards, as content is often unintentionally written in ways that rely on them before they can be fixed.

Despite all this, efforts have been made to adhere to certain design goals. These are described in the next few subsections.

1.5.1 Serializability of script execution

This section is non-normative.

To avoid exposing Web authors to the complexities of multithreading, the HTML and DOM APIs are designed such that no script can ever detect the simultaneous execution of other scripts. Even with workers, the intent is that the behavior of implementations can be thought of as completely serializing the execution of all scripts in all browsing contexts.

The navigator.getStorageUpdates() method, in this model, is equivalent to allowing other scripts to run while the calling script is blocked.

1.5.2 Compliance with other specifications

This section is non-normative.

HTML5 interacts with and relies on a wide variety of other specifications. In certain circumstances, unfortunately, the desire to be compatible with legacy content has led to HTML5 violating the requirements of these other specifications. Whenever this has occurred, the transgressions have been noted as "willful violations".

1.6 Relationships to other specifications

1.6.1 Relationship to HTML 4.01 and DOM2 HTML

This section is non-normative.

This specification represents a new version of HTML4, along with a new version of the associated DOM2 HTML API. Migration from HTML4 to the format and APIs described in this specification should in most cases be straightforward, as care has been taken to ensure that backwards-compatibility is retained. [HTML4] [DOM2HTML]

1.6.2 Relationship to XHTML 1.x

This section is non-normative.

This specification is intended to replace XHTML 1.0 as the normative definition of the XML serialization of the HTML vocabulary. [XHTML10]

While this specification updates the semantics and requirements of the vocabulary defined by XHTML Modularization 1.1 and used by XHTML 1.1, it does not attempt to provide a replacement for the modularization scheme defined and used by those (and other) specifications, and therefore cannot be considered a complete replacement for them. [XHTMLMOD] [XHTML11]

Thus, authors and implementors who do not need such a modularization scheme can consider this specification a replacement for XHTML 1.x, but those who do need such a mechanism are encouraged to continue using the XHTML 1.1 line of specifications.

1.6.3 Relationship to XHTML2 and XForms

This section is non-normative.

XHTML2 defines a new vocabulary with features for hyperlinks, multimedia content, annotating document edits, rich metadata, declarative interactive forms, and describing the semantics of human literary works such as poems and scientific papers. [XHTML2]

XForms similarly defines a new vocabulary with features for complex data entry, such as tax forms or insurance forms.

However, XHTML2 and XForms lack features to express the semantics of many of the non-document types of content often seen on the Web. For instance, they are not well-suited for marking up forum sites, auction sites, search engines, online shops, mapping applications, e-mail applications, word processors, real-time strategy games, and the like.

This specification aims to extend HTML so that it is also suitable in these contexts.

XHTML2, XForms, and this specification all use different namespaces and therefore can all be implemented in the same XML processor.

1.7 HTML vs XHTML

This section is non-normative.

This specification defines an abstract language for describing documents and applications, and some APIs for interacting with in-memory representations of resources that use this language.

The in-memory representation is known as "DOM5 HTML", or "the DOM" for short.

There are various concrete syntaxes that can be used to transmit resources that use this abstract language, two of which are defined in this specification.

The first such concrete syntax is "HTML5". This is the format recommended for most authors. It is compatible with all legacy Web browsers. If a document is transmitted with the MIME type text/html, then it will be processed as an "HTML5" document by Web browsers.

The second concrete syntax uses XML, and is known as "XHTML5". When a document is transmitted with an XML MIME type, such as application/xhtml+xml, Web browsers process it with an XML processor and treat it as an "XHTML5" document. Authors are reminded that the processing for XML and HTML differs; in particular, even minor syntax errors will prevent an XML document from being rendered fully, whereas they would be ignored in the "HTML5" syntax.
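As a rough illustration of the choice described above, the following Python sketch maps a Content-Type value to the parser a Web browser would use. It assumes that "XML MIME type" covers text/xml, application/xml, and any type whose subtype ends in "+xml"; the function name parser_for is hypothetical, and real browsers apply further rules (content sniffing, unsupported types) that are not modeled here.

```python
def parser_for(content_type: str) -> str:
    """Return "html" or "xml" for a Content-Type value (illustrative only).

    Assumption: an XML MIME type is text/xml, application/xml, or any
    type whose subtype ends in "+xml".
    """
    # Drop parameters such as "; charset=utf-8" and normalize case.
    essence = content_type.split(";", 1)[0].strip().lower()
    if essence == "text/html":
        return "html"  # processed as an "HTML5" document
    if essence in ("text/xml", "application/xml") or essence.endswith("+xml"):
        return "xml"  # handed to an XML processor, treated as "XHTML5"
    raise ValueError(f"neither text/html nor an XML MIME type: {content_type!r}")
```

For example, parser_for("application/xhtml+xml") selects the XML path, while parser_for("text/html; charset=utf-8") selects the HTML path.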

The "DOM5 HTML", "HTML5", and "XHTML5" representations cannot all represent the same content. For example, namespaces cannot be represented using "HTML5", but they are supported in "DOM5 HTML" and "XHTML5". Similarly, documents that use the noscript feature can be represented using "HTML5", but cannot be represented with "XHTML5" and "DOM5 HTML". Comments that contain the string "-->" can be represented in "DOM5 HTML" but not in "HTML5" and "XHTML5". And so forth.

1.8 Structure of this specification

This section is non-normative.

This specification is divided into the following major sections:

Common Infrastructure
The conformance classes, algorithms, definitions, and the common underpinnings of the rest of the specification.
The DOM
Documents are built from elements. These elements form a tree using the DOM. This section defines the features of this DOM, as well as introducing the features common to all elements, and the concepts used in defining elements.
Elements
Each element has a predefined meaning, which is explained in this section. Rules for authors on how to use the element, along with user agent requirements for how to handle each element, are also given.
Web Browsers
HTML documents do not exist in a vacuum — this section defines many of the features that affect environments that deal with multiple pages, links between pages, and running scripts.
User Interaction
HTML documents can provide a number of mechanisms for users to interact with and modify content, which are described in this section.
The Communication APIs
Applications written in HTML often require mechanisms to communicate with remote servers, as well as communicating with other applications from different domains running on the same client.
The Language Syntax
All of these features would be for naught if they couldn't be represented in a serialized form and sent to other people, and so this section defines the syntax of HTML, along with rules for how to parse HTML.

There are also a couple of appendices, defining rendering rules for Web browsers and listing areas that are out of scope for this specification.

1.8.1 How to read this specification

This specification should be read like all other specifications. First, it should be read cover-to-cover, multiple times. Then, it should be read backwards at least once. Then it should be read by picking random sections from the contents list and following all the cross-references.

1.8.2 Typographic conventions

This is a definition, requirement, or explanation.

This is a note.

This is an example.

This is an open issue.

This is a warning.

interface Example {
  // this is an IDL definition
};
variable = object . method( [ optionalArgument ] )

This is a note to authors describing the usage of an interface.

/* this is a CSS fragment */

The defining instance of a term is marked up like this. Uses of that term are marked up like this or like this.

The defining instance of an element, attribute, or API is marked up like this. References to that element, attribute, or API are marked up like this.

Other code fragments are marked up like this.

Variables are marked up like this.

This is an implementation requirement.

2 Common infrastructure

2.1 Terminology

This specification refers to both HTML and XML attributes and DOM attributes, often in the same context. When it is not clear which is being referred to, they are referred to as content attributes for HTML and XML attributes, and DOM attributes for those from the DOM. Similarly, the term "properties" is used for both JavaScript object properties and CSS properties. When these are ambiguous they are qualified as object properties and CSS properties respectively.

The term HTML documents is sometimes used in contrast with XML documents to specifically mean documents that were parsed using an HTML parser (as opposed to using an XML parser or created purely through the DOM).

Generally, when the specification states that a feature applies to HTML or XHTML, it also includes the other. When a feature specifically only applies to one of the two languages, it is called out by explicitly stating that it does not apply to the other format, as in "for HTML, ... (this does not apply to XHTML)".

This specification uses the term document to refer to any use of HTML, ranging from short static documents to long essays or reports with rich multimedia, as well as to fully-fledged interactive applications.

For simplicity, terms such as shown, displayed, and visible might sometimes be used when referring to the way a document is rendered to the user. These terms are not meant to imply a visual medium; they must be considered to apply to other media in equivalent ways.

When an algorithm B says to return to another algorithm A, it implies that A called B. Upon returning to A, the implementation must continue from where it left off in calling B.

2.1.1 XML

To ease migration from HTML to XHTML, UAs conforming to this specification will place elements in HTML in the http://www.w3.org/1999/xhtml namespace, at least for the purposes of the DOM and CSS. The term "elements in the HTML namespace", or "HTML elements" for short, when used in this specification, thus refers to both HTML and XHTML elements.

Unless otherwise stated, all elements defined or mentioned in this specification are in the http://www.w3.org/1999/xhtml namespace, and all attributes defined or mentioned in this specification have no namespace (they are in the per-element partition).

When an XML name, such as an attribute or element name, is referred to in the form prefix:localName, as in xml:id or svg:rect, it refers to a name with the local name localName and the namespace given by the prefix, as defined by the following table:

Prefix  Namespace
xml     http://www.w3.org/XML/1998/namespace
html    http://www.w3.org/1999/xhtml
svg     http://www.w3.org/2000/svg
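A minimal sketch of how a name written as prefix:localName resolves against the table above. The function expand_name is a hypothetical helper for illustration, not part of any DOM API.

```python
from typing import Tuple

# Prefix-to-namespace table from this section.
PREFIX_NAMESPACES = {
    "xml": "http://www.w3.org/XML/1998/namespace",
    "html": "http://www.w3.org/1999/xhtml",
    "svg": "http://www.w3.org/2000/svg",
}


def expand_name(prefixed: str) -> Tuple[str, str]:
    """Turn 'prefix:localName' into (namespace URL, local name)."""
    prefix, sep, local_name = prefixed.partition(":")
    if not sep or not local_name:
        raise ValueError(f"not of the form prefix:localName: {prefixed!r}")
    if prefix not in PREFIX_NAMESPACES:
        raise KeyError(f"prefix not defined in this specification: {prefix!r}")
    return PREFIX_NAMESPACES[prefix], local_name
```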

Attribute names are said to be XML-compatible if they match the Name production defined in XML, they contain no U+003A COLON (:) characters, and their first three characters are not an ASCII case-insensitive match for the string "xml". [XML]
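The three conditions can be checked mechanically. This sketch assumes the Name production from XML 1.0 (Fifth Edition); the [XML] reference may denote an earlier edition whose character classes differ. It lowercases only ASCII letters so that the "xml" test is an ASCII case-insensitive match, as the definition requires. The name is_xml_compatible is hypothetical.

```python
import re

# NameStartChar and NameChar from XML 1.0 (Fifth Edition); assumption noted above.
_NAME_START = (
    ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
_NAME_CHAR = _NAME_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
_XML_NAME = re.compile(f"[{_NAME_START}][{_NAME_CHAR}]*")


def _ascii_lower(s: str) -> str:
    # Lowercase A-Z only; leave every other character untouched.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def is_xml_compatible(name: str) -> bool:
    return (
        _XML_NAME.fullmatch(name) is not None  # matches the Name production
        and ":" not in name                     # no U+003A COLON
        and _ascii_lower(name[:3]) != "xml"     # does not start with "xml"
    )
```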

2.1.2 DOM trees

The term root element, when not explicitly qualified as referring to the document's root element, means the furthest ancestor element node of whatever node is being discussed, or the node itself if it has no ancestors. When the node is a part of the document, then that is indeed the document's root element; however, if the node is not currently part of the document tree, the root element will be an orphaned node.

A node's home subtree is the subtree rooted at that node's root element.
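To make the definition concrete, here is a toy node model in Python; the Node class and helper names are hypothetical, not DOM interfaces. root_element climbs parent links but stops below the Document node, since the Document is not an element. Two nodes are in the same home subtree exactly when they share a root element.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    name: str
    is_element: bool = True
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)

    def append(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def root_element(node: Node) -> Node:
    """Furthest ancestor element of node, or node itself if there is none."""
    while node.parent is not None and node.parent.is_element:
        node = node.parent
    return node


def in_same_home_subtree(a: Node, b: Node) -> bool:
    return root_element(a) is root_element(b)
```

An element that has not been inserted into a document is its own tree's root element, which is the "orphaned node" case described above.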

The Document of a Node (such as an element) is the Document that the Node's ownerDocument DOM attribute returns.

An element is said to have been inserted into a document when its root element changes and is now the document's root element. If a Node is in a Document then that Document is always the Node's Document, and the Node's ownerDocument DOM attribute thus always returns that Document.

The term tree order means a pre-order, depth-first traversal of the DOM nodes involved (through the parentNode/childNodes relationship).
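Tree order can be produced with an iterative pre-order, depth-first walk. This is a sketch over a hypothetical minimal Node class whose children list stands in for childNodes.

```python
from typing import Iterator, List, Optional


class Node:
    def __init__(self, name: str, children: Optional[List["Node"]] = None) -> None:
        self.name = name
        self.children = list(children) if children else []


def tree_order(root: Node) -> Iterator[Node]:
    """Yield root and its descendants in pre-order, depth-first order."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node  # a node is visited before any of its children
        # Push children in reverse so the first child is popped next.
        stack.extend(reversed(node.children))
```

An explicit stack is used instead of recursion so that very deep trees do not hit Python's recursion limit.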

When it is stated that some element or attribute is ignored, or treated as some other value, or handled as if it was something else, this refers only to the processing of the node after it is in the DOM. A user agent must not mutate the DOM in such situations.

The term text node refers to any Text node, including CDATASection nodes; specifically, any Node with node type TEXT_NODE (3) or CDATA_SECTION_NODE (4). [DOM3CORE]

2.1.3 Scripting

The construction "a Foo object", where Foo is actually an interface, is sometimes used instead of the more accurate "an object implementing the interface Foo".

A DOM attribute is said to be getting when its value is being retrieved (e.g. by author script), and is said to be setting when a new value is assigned to it.

If a DOM object is said to be live, then that means that any attributes returning that object must always return the same object (not a new object each time), and the attributes and methods on that object must operate on the actual underlying data, not a snapshot of the data.
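The difference between a live object and a snapshot can be sketched in Python. The class and attribute names here are hypothetical. The point is that child_list returns the same object on every get and reads through to the current data, whereas snapshot() returns a copy that does not track later changes.

```python
from typing import List


class LiveChildList:
    """A view onto its owner's children; reads always see current data."""

    def __init__(self, owner: "Element") -> None:
        self._owner = owner

    def __len__(self) -> int:
        return len(self._owner._children)

    def __getitem__(self, index: int) -> "Element":
        return self._owner._children[index]


class Element:
    def __init__(self, tag: str) -> None:
        self.tag = tag
        self._children: List["Element"] = []
        self._child_list = LiveChildList(self)  # created once, then reused

    @property
    def child_list(self) -> LiveChildList:
        return self._child_list  # same object on every get

    def snapshot(self) -> List["Element"]:
        return list(self._children)  # contrast: a copy that goes stale

    def append_child(self, child: "Element") -> "Element":
        self._children.append(child)
        return child
```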

The terms fire and dispatch are used interchangeably in the context of events, as in the DOM Events specifications. [DOM3EVENTS]

2.1.4 Plugins

The term plugin is used to mean any content handler, typically a third-party content handler, for Web content types that are not supported by the user agent natively, or for content types that do not expose a DOM, that supports rendering the content as part of the user agent's interface.

One example of a plugin would be a PDF viewer that is instantiated in a browsing context when the user navigates to a PDF file. This would count as a plugin regardless of whether the party that implemented the PDF viewer component was the same as that which implemented the user agent itself. However, a PDF viewer application that launches separately from the user agent (as opposed to using the same interface) is not a plugin by this definition.

This specification does not define a mechanism for interacting with plugins, as it is expected to be user-agent- and platform-specific. Some UAs might opt to support a plugin mechanism such as the Netscape Plugin API; others might use remote content converters or have built-in support for certain types. [NPAPI]

Browsers should take extreme care when interacting with external content intended for plugins. When third-party software is run with the same privileges as the user agent itself, vulnerabilities in the third-party software become as dangerous as those in the user agent.

2.1.5 Character encodings

An ASCII-compatible character encoding is one that is a superset of US-ASCII (specifically, ANSI_X3.4-1968) for bytes in the set 0x09, 0x0A, 0x0C, 0x0D, 0x20-0x22, 0x26, 0x27, 0x2C-0x3F, 0x41-0x5A, and 0x61-0x7A.
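One way to test an encoding against this definition, assuming Python's codec registry as the source of encoding implementations, is to decode each listed byte on its own and compare the result with the US-ASCII character of the same value. Decoding bytes in isolation is a simplification: it does not account for stateful encodings, whose output can depend on preceding bytes. The function name is hypothetical.

```python
# The byte values listed in the definition above.
REQUIRED_BYTES = (
    [0x09, 0x0A, 0x0C, 0x0D]
    + list(range(0x20, 0x23))  # 0x20-0x22
    + [0x26, 0x27]
    + list(range(0x2C, 0x40))  # 0x2C-0x3F
    + list(range(0x41, 0x5B))  # 0x41-0x5A
    + list(range(0x61, 0x7B))  # 0x61-0x7A
)


def is_ascii_compatible(encoding: str) -> bool:
    """True if every listed byte decodes, alone, to the same US-ASCII character."""
    for value in REQUIRED_BYTES:
        try:
            decoded = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            return False  # e.g. UTF-16 cannot decode a lone byte
        if decoded != chr(value):
            return False  # e.g. EBCDIC maps these bytes elsewhere
    return True
```

An unknown encoding name raises LookupError from the codec registry, which this sketch lets propagate.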

2.1.6 Resources

The specification uses the term supported when referring to whether a user agent has an implementation capable of decoding the semantics of an external resource. A format or type is said to be supported if the implementation can process an external resource of that format or type without critical aspects of the resource being ignored. Whether a specific resource is supported can depend on what features of the resource's format are in use.

For example, a PNG image would be considered to be in a supported format if its pixel data could be decoded and rendered, even if, unbeknownst to the implementation, the image actually also contained animation data.

An MPEG-4 video file would not be considered to be in a supported format if the compression format used was not supported, even if the implementation could determine the dimensions of the movie from the file's metadata.

2.2 Conformance requirements

All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC2119. For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc.) used in introducing the algorithm.

This specification describes the conformance criteria for user agents (relevant to implementors) and documents (relevant to authors and authoring tool implementors).

There is no implied relationship between document conformance requirements and implementation conformance requirements. User agents are not free to handle non-conformant documents as they please; the processing model described in this specification applies to implementations regardless of the conformity of the input documents.

User agents fall into several (overlapping) categories with different conformance requirements.

Web browsers and other interactive user agents

Web browsers that support XHTML must process elements and attributes from the HTML namespace found in XML documents as described in this specification, so that users can interact with them, unless the semantics of those elements have been overridden by other specifications.

A conforming XHTML processor would, upon finding an XHTML script element in an XML document, execute the script contained in that element. However, if the element is found within a transformation expressed in XSLT (assuming the user agent also supports XSLT), then the processor would instead treat the script element as an opaque element that forms part of the transform.

Web browsers that support HTML must process documents labeled as text/html as described in this specification, so that users can interact with them.

9.1.2.4 Optional tags

A body element's start tag may be omitted if the element is empty, or if the first thing inside the body element is not a space character or a comment, except if the first thing inside the body element is a script or style element.

A body element's end tag may be omitted if the body element is not immediately followed by a comment.

An li element's end tag may be omitted if the li element is immediately followed by another li element or if there is no more content in the parent element.

A dt element's end tag may be omitted if the dt element is immediately followed by another dt element or a dd element.

A dd element's end tag may be omitted if the dd element is immediately followed by another dd element or a dt element, or if there is no more content in the parent element.

A p element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, datagrid, dialog, dir, div, dl, fieldset, footer, form, h1, h2, h3, h4, h5, h6, header, hgroup, hr, menu, nav, ol, p, pre, section, table, or ul element, or if there is no more content in the parent element and the parent element is not an a element.

An rt element's end tag may be omitted if the rt element is immediately followed by an rt or rp element, or if there is no more content in the parent element.

An rp element's end tag may be omitted if the rp element is immediately followed by an rt or rp element, or if there is no more content in the parent element.

An optgroup element's end tag may be omitted if the optgroup element is immediately followed by another optgroup element, or if there is no more content in the parent element.

An option element's end tag may be omitted if the option element is immediately followed by another option element, or if it is immediately followed by an optgroup element, or if there is no more content in the parent element.
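The end-tag rules for li, dt, dd, p, rt, rp, optgroup, and option reduce to a lookup on the following sibling. In this sketch (hypothetical function and parameter names), next_sibling is the tag name of the node immediately after the element, "#text" or "#comment" for non-element nodes, or None when there is no more content in the parent. Computing that value from a real tree is left to the caller.

```python
from typing import Optional

# Elements that, immediately following a p element, allow its end tag to be omitted.
P_FOLLOWERS = frozenset({
    "address", "article", "aside", "blockquote", "datagrid", "dialog",
    "dir", "div", "dl", "fieldset", "footer", "form", "h1", "h2", "h3",
    "h4", "h5", "h6", "header", "hgroup", "hr", "menu", "nav", "ol",
    "p", "pre", "section", "table", "ul",
})

# tag -> (following elements that allow omission,
#         whether "no more content in the parent" also allows it)
END_TAG_RULES = {
    "li": (frozenset({"li"}), True),
    "dt": (frozenset({"dt", "dd"}), False),
    "dd": (frozenset({"dd", "dt"}), True),
    "rt": (frozenset({"rt", "rp"}), True),
    "rp": (frozenset({"rt", "rp"}), True),
    "optgroup": (frozenset({"optgroup"}), True),
    "option": (frozenset({"option", "optgroup"}), True),
}


def may_omit_end_tag(tag: str, next_sibling: Optional[str], parent: str) -> bool:
    if tag == "p":
        if next_sibling is None:
            # End of parent: allowed unless the parent is an a element.
            return parent != "a"
        return next_sibling in P_FOLLOWERS
    if tag in END_TAG_RULES:
        followers, allowed_at_end = END_TAG_RULES[tag]
        if next_sibling is None:
            return allowed_at_end
        return next_sibling in followers
    return False  # element not covered by this sketch
```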
A colgroup element's start tag may be omitted if the first thing inside the colgroup element is a col element, and if the element is not immediately preceded by another colgroup element whose end tag has been omitted. (It can't be omitted if the element is empty.)

A colgroup element's end tag may be omitted if the colgroup element is not immediately followed by a space character or a comment.

A thead element's end tag may be omitted if the thead element is immediately followed by a tbody or tfoot element.

A tbody element's start tag may be omitted if the first thing inside the tbody element is a tr element, and if the element is not immediately preceded by a tbody, thead, or tfoot element whose end tag has been omitted. (It can't be omitted if the element is empty.)

A tbody element's end tag may be omitted if the tbody element is immediately followed by a tbody or tfoot element, or if there is no more content in the parent element.

A tfoot element's end tag may be omitted if the tfoot element is immediately followed by a tbody element, or if there is no more content in the parent element.

A tr element's end tag may be omitted if the tr element is immediately followed by another tr element, or if there is no more content in the parent element.

A td element's end tag may be omitted if the td element is immediately followed by a td or th element, or if there is no more content in the parent element.

A th element's end tag may be omitted if the th element is immediately followed by a td or th element, or if there is no more content in the parent element.

However, a start tag must never be omitted if it has any attributes.

9.1.2.5 Restrictions on content models

For historical reasons, certain elements have extra restrictions beyond even the restrictions given by their content model.

A table element must not contain tr elements, even though these elements are technically allowed inside table elements according to the content models described in this specification.
(If a tr element is put inside a table in the markup, it will in fact imply a tbody start tag before it.)

A single U+000A LINE FEED (LF) character may be placed immediately after the start tag of pre and textarea elements. This does not affect the processing of the element. The otherwise optional U+000A LINE FEED (LF) character must be included if the element's contents start with that character (because otherwise the leading newline in the contents would be treated like the optional newline, and ignored).

The following two pre blocks are equivalent:

<pre>Hello</pre>
<pre>
Hello</pre>

9.1.2.6 Restrictions on the contents of CDATA and RCDATA elements

The text in CDATA and RCDATA elements must not contain any occurrences of the string "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS) followed by characters that case-insensitively match the tag name of the element followed by one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), U+0020 SPACE, U+003E GREATER-THAN SIGN (>), or U+002F SOLIDUS (/), unless that string is part of an escaping text span.

An escaping text span is a span of text that starts with an escaping text span start that is not itself in an escaping text span, and ends at the next escaping text span end. There cannot be any character references inside an escaping text span; sequences of characters that would look like character references do not have special meaning.

An escaping text span start is a part of text that consists of the four character sequence "<!--" (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS).

An escaping text span end is a part of text that consists of the three character sequence "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN) whose U+003E GREATER-THAN SIGN (>).

An escaping text span start may share its U+002D HYPHEN-MINUS characters with its corresponding escaping text span end.
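A conformance checker might test the "</" rule roughly as follows. This sketch (hypothetical function name) scans only the text outside escaping text spans, lets each span share hyphens between its start and its end, and also rejects a span start with no matching end, which the requirement stated next forbids. Text that merely ends in "</script" with no following terminator character is accepted, following the literal wording of the rule.

```python
import re


def cdata_text_is_conforming(text: str, tag: str) -> bool:
    """Check CDATA/RCDATA element text against the "</tagname" restriction."""
    # "</", the element's tag name (case-insensitive), then one of
    # TAB, LF, FF, SPACE, ">" or "/".
    closer = re.compile("</" + re.escape(tag) + "[\t\n\f />]", re.IGNORECASE)
    pos = 0
    while pos < len(text):
        span_start = text.find("<!--", pos)
        outside_end = len(text) if span_start == -1 else span_start
        # Text outside any escaping text span must not contain the closer.
        if closer.search(text, pos, outside_end):
            return False
        if span_start == -1:
            return True
        # The span ends at the next "-->"; searching from span_start + 2
        # lets the end share the start's two hyphens, as in "<!-->".
        span_end = text.find("-->", span_start + 2)
        if span_end == -1:
            return False  # escaping text span start with no end
        pos = span_end + 3
    return True
```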
The text in CDATA and RCDATA elements must not have an escaping text span start that is not followed by an escaping text span end.

9.1.3 Text

Text is allowed inside elements, attributes, and comments. Text must consist of Unicode characters. Text must not contain U+0000 characters. Text must not contain permanently undefined Unicode characters. Text must not contain control characters other than space characters. Extra constraints are placed on what is and what is not allowed in text based on where the text is to be put, as described in the other sections.

9.1.3.1 Newlines

Newlines in HTML may be represented either as U+000D CARRIAGE RETURN (CR) characters, U+000A LINE FEED (LF) characters, or pairs of U+000D CARRIAGE RETURN (CR), U+000A LINE FEED (LF) characters in that order.

9.1.4 Character references

In certain cases described in other sections, text may be mixed with character references. These can be used to escape characters that couldn't otherwise legally be included in text.

Character references must start with a U+0026 AMPERSAND (&). Following this, there are three possible kinds of character references:

Named character references
   The ampersand must be followed by one of the names given in the named character references section, using the same case. The name must be one that is terminated by a U+003B SEMICOLON (;) character.

Decimal numeric character reference
   The ampersand must be followed by a U+0023 NUMBER SIGN (#) character, followed by one or more digits in the range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE, representing a base-ten integer that itself is a Unicode code point that is not U+0000, U+000D, in the range U+0080 .. U+009F, or in the range U+D800 .. U+DFFF (surrogates). The digits must then be followed by a U+003B SEMICOLON character (;).
Hexadecimal numeric character reference
   The ampersand must be followed by a U+0023 NUMBER SIGN (#) character, which must be followed by either a U+0078 LATIN SMALL LETTER X or a U+0058 LATIN CAPITAL LETTER X character, which must then be followed by one or more digits in the range U+0030 DIGIT ZERO .. U+0039 DIGIT NINE, U+0061 LATIN SMALL LETTER A .. U+0066 LATIN SMALL LETTER F, and U+0041 LATIN CAPITAL LETTER A .. U+0046 LATIN CAPITAL LETTER F, representing a base-sixteen integer that itself is a Unicode code point that is not U+0000, U+000D, in the range U+0080 .. U+009F, or in the range U+D800 .. U+DFFF (surrogates). The digits must then be followed by a U+003B SEMICOLON character (;).

An ambiguous ampersand is a U+0026 AMPERSAND (&) character that is followed by some text other than a space character, a U+003C LESS-THAN SIGN character ('<'), or another U+0026 AMPERSAND (&) character.

9.1.5 CDATA sections

CDATA sections must start with the character sequence U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+005B LEFT SQUARE BRACKET, U+0043 LATIN CAPITAL LETTER C, U+0044 LATIN CAPITAL LETTER D, U+0041 LATIN CAPITAL LETTER A, U+0054 LATIN CAPITAL LETTER T, U+0041 LATIN CAPITAL LETTER A, U+005B LEFT SQUARE BRACKET (<![CDATA[). Following this sequence, the CDATA section may have text, with the additional restriction that the text must not contain the three-character sequence U+005D RIGHT SQUARE BRACKET, U+005D RIGHT SQUARE BRACKET, U+003E GREATER-THAN SIGN (]]>). Finally, the CDATA section must be ended by the three-character sequence U+005D RIGHT SQUARE BRACKET, U+005D RIGHT SQUARE BRACKET, U+003E GREATER-THAN SIGN (]]>).

9.1.6 Comments

Comments must start with the four-character sequence U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS (<!--).
Following this sequence, the comment may have text, with the additional restriction that the text must not start with a single U+003E GREATER-THAN SIGN ('>') character, nor start with a U+002D HYPHEN-MINUS (-) character followed by a U+003E GREATER-THAN SIGN ('>') character, nor contain two consecutive U+002D HYPHEN-MINUS (-) characters, nor end with a U+002D HYPHEN-MINUS (-) character. Finally, the comment must be ended by the three-character sequence U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN (-->).

9.2 Parsing HTML documents

This section only applies to user agents, data mining tools, and conformance checkers.

The rules for parsing XML documents (and thus XHTML documents) into DOM trees are covered by the next section, entitled "The XHTML syntax".

For HTML documents, user agents must use the parsing rules described in this section to generate the DOM trees. Together, these rules define what is referred to as the HTML parser.

While the HTML form of HTML5 bears a close resemblance to SGML and XML, it is a separate language with its own parsing rules.

Some earlier versions of HTML (in particular from HTML2 to HTML4) were based on SGML and used SGML parsing rules. However, few (if any) web browsers ever implemented true SGML parsing for HTML documents; the only user agents to strictly handle HTML as an SGML application have historically been validators. The resulting confusion, with validators claiming documents to have one representation while widely deployed Web browsers interoperably implemented a different representation, has wasted decades of productivity. This version of HTML thus returns to a non-SGML basis.

Authors interested in using SGML tools in their authoring pipeline are encouraged to use XML tools and the XML serialization of HTML5.

This specification defines the parsing rules for HTML documents, whether they are syntactically correct or not. Certain points in the parsing algorithm are said to be parse errors.
The error handling for parse errors is well-defined: user agents must either act as described below when encountering such problems, or must abort processing at the first error that they encounter for which they do not wish to apply the rules described below.

Conformance checkers must report at least one parse error condition to the user if one or more parse error conditions exist in the document and must not report parse error conditions if none exist in the document. Conformance checkers may report more than one parse error condition if more than one parse error condition exists in the document. Conformance checkers are not required to recover from parse errors.

Parse errors are only errors with the syntax of HTML. In addition to checking for parse errors, conformance checkers will also verify that the document obeys all the other conformance requirements described in this specification.

9.2.1 Overview of the parsing model

The input to the HTML parsing process consists of a stream of Unicode characters, which is passed through a tokenization stage followed by a tree construction stage. The output is a Document object.

Implementations that do not support scripting do not have to actually create a DOM Document object, but the DOM tree in such cases is still used as the model for the rest of the specification.

In the common case, the data handled by the tokenization stage comes from the network, but it can also come from script, e.g. using the document.write() API.

There is only one set of states for the tokenizer stage and the tree construction stage, but the tree construction stage is reentrant, meaning that while the tree construction stage is handling one token, the tokenizer might be resumed, causing further tokens to be emitted and processed before the first token's processing is complete.

In the following example, the tree construction stage will be called upon to handle a "p" start tag token while handling the "script" end tag token:

...
<script>
 document.write('<p>');
</script>
...

To handle these cases, parsers have a script nesting level, which must be initially set to zero, and a parser pause flag, which must be initially set to false.

9.2.2 The input stream

The stream of Unicode characters that comprises the input to the tokenization stage will be initially seen by the user agent as a stream of bytes (typically coming over the network or from the local file system). The bytes encode the actual characters according to a particular character encoding, which the user agent must use to decode the bytes into characters.

For XML documents, the algorithm user agents must use to determine the character encoding is given by the XML specification. This section does not apply to XML documents. [XML]

9.2.2.1 Determining the character encoding

In some cases, it might be impractical to unambiguously determine the encoding before parsing the document. Because of this, this specification provides for a two-pass mechanism with an optional pre-scan. Implementations are allowed, as described below, to apply a simplified parsing algorithm to whatever bytes they have available before beginning to parse the document. Then, the real parser is started, using a tentative encoding derived from this pre-parse and other out-of-band metadata. If, while the document is being loaded, the user agent discovers an encoding declaration that conflicts with this information, then the parser can get reinvoked to perform a parse of the document with the real encoding.

User agents must use the following algorithm (the encoding sniffing algorithm) to determine the character encoding to use when decoding a document in the first pass. This algorithm takes as input any out-of-band metadata available to the user agent (e.g. the Content-Type metadata of the document) and all the bytes available so far, and returns an encoding and a confidence. The confidence is either tentative, certain, or irrelevant.
The encoding used, and whether the confidence in that encoding is tentative or certain, is used during the parsing to determine whether to change the encoding. If no encoding is necessary, e.g. because the parser is operating on a stream of Unicode characters and doesn't have to use an encoding at all, then the confidence is irrelevant.

If the transport layer specifies an encoding, and it is supported, return that encoding with the confidence certain, and abort these steps.

The user agent may wait for more bytes of the resource to be available, either in this step or at any later step in this algorithm. For instance, a user agent might wait 500ms or 512 bytes, whichever comes first. In general, preparsing the source to find the encoding improves performance, as it reduces the need to throw away the data structures used when parsing upon finding the encoding information. However, if the user agent delays too long to obtain data to determine the encoding, then the cost of the delay could outweigh any performance improvements from the preparse.

For each of the rows in the following table, starting with the first one and going down, if there are at least as many bytes available as the number of bytes in the first column, and the first bytes of the file match the bytes given in the first column, then return the encoding given in the cell in the second column of that row, with the confidence certain, and abort these steps:

   Bytes in Hexadecimal   Encoding
   FE FF                  UTF-16BE
   FF FE                  UTF-16LE
   EF BB BF               UTF-8

This step looks for Unicode Byte Order Marks (BOMs).

Otherwise, the user agent will have to search for explicit character encoding information in the file itself. This should proceed as follows:

Let position be a pointer to a byte in the input stream, initially pointing at the first byte.
If at any point during these substeps the user agent either runs out of bytes or decides that scanning further bytes would not be efficient, then skip to the next step of the overall character encoding detection algorithm. User agents may decide that scanning any bytes is not efficient, in which case these substeps are entirely skipped.

Now, repeat the following "two" steps until the algorithm aborts (either because the user agent aborts, as described above, or because a character encoding is found):

1. If position points to:

   A sequence of bytes starting with: 0x3C 0x21 0x2D 0x2D (ASCII '<!--')
      Advance the position pointer so that it points at the first 0x3E byte which is preceded by two 0x2D bytes (i.e. at the end of an ASCII '-->' sequence) and comes after the 0x3C byte that was found. (The two 0x2D bytes can be the same as those in the '<!--' sequence.)

   A sequence of bytes starting with: 0x3C, 0x4D or 0x6D, 0x45 or 0x65, 0x54 or 0x74, 0x41 or 0x61, and finally one of 0x09, 0x0A, 0x0C, 0x0D, 0x20, 0x2F (case-insensitive ASCII '<meta' followed by a space or slash)
      1. Advance the position pointer so that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or 0x2F byte (the one in the sequence of characters matched above).
      2. Get an attribute and its value. If no attribute was sniffed, then skip this inner set of steps, and jump to the second step in the overall "two step" algorithm.
      3. If the attribute's name is neither "charset" nor "content", then return to step 2 in these inner steps.
      4. If the attribute's name is "charset", let charset be the attribute's value, interpreted as a character encoding.
      5. Otherwise, the attribute's name is "content": apply the algorithm for extracting an encoding from a Content-Type, giving the attribute's value as the string to parse. If an encoding is returned, let charset be that encoding. Otherwise, return to step 2 in these inner steps.
      6. If charset is a UTF-16 encoding, change it to UTF-8.
      7. If charset is a supported character encoding, then return the given encoding, with confidence tentative, and abort all these steps.
      8. Otherwise, return to step 2 in these inner steps.

   A sequence of bytes starting with a 0x3C byte (ASCII '<'), optionally a 0x2F byte (ASCII '/'), and finally a byte in the range 0x41-0x5A or 0x61-0x7A (an ASCII letter)
      1. Advance the position pointer so that it points at the next 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII '>') byte.
      2. Repeatedly get an attribute until no further attributes can be found, then jump to the second step in the overall "two step" algorithm.

   A sequence of bytes starting with: 0x3C 0x21 (ASCII '<!')
   A sequence of bytes starting with: 0x3C 0x2F (ASCII '</')
   A sequence of bytes starting with: 0x3C 0x3F (ASCII '<?')
      Advance the position pointer so that it points at the first 0x3E byte (ASCII '>') that comes after the 0x3C byte that was found.

   Any other byte
      Do nothing with that byte.

2. Move position so it points at the next byte in the input stream, and return to the first step of this "two step" algorithm.

When the above "two step" algorithm says to get an attribute, it means doing this:

1. If the byte at position is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x2F (ASCII '/') then advance position to the next byte and redo this substep.
2. If the byte at position is 0x3E (ASCII '>'), then abort the "get an attribute" algorithm. There isn't one.
3. Otherwise, the byte at position is the start of the attribute name. Let attribute name and attribute value be the empty string.
4. Attribute name: Process the byte at position as follows:
   If it is 0x3D (ASCII '='), and the attribute name is longer than the empty string
      Advance position to the next byte and jump to the step below labeled value.
   If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space)
      Jump to the step below labeled spaces.
   If it is 0x2F (ASCII '/') or 0x3E (ASCII '>')
      Abort the "get an attribute" algorithm. The attribute's name is the value of attribute name, its value is the empty string.
   If it is in the range 0x41 (ASCII 'A') to 0x5A (ASCII 'Z')
      Append the Unicode character with code point b+0x20 to attribute name (where b is the value of the byte at position).
   Anything else
      Append the Unicode character with the same code point as the value of the byte at position to attribute name. (It doesn't actually matter how bytes outside the ASCII range are handled here, since only ASCII characters can contribute to the detection of a character encoding.)
5. Advance position to the next byte and return to the previous step.
6. Spaces: If the byte at position is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then advance position to the next byte, then repeat this step.
7. If the byte at position is not 0x3D (ASCII '='), abort the "get an attribute" algorithm. The attribute's name is the value of attribute name, its value is the empty string.
8. Advance position past the 0x3D (ASCII '=') byte.
9. Value: If the byte at position is one of 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space) then advance position to the next byte, then repeat this step.
10. Process the byte at position as follows:
   If it is 0x22 (ASCII '"') or 0x27 (ASCII "'")
      1. Let b be the value of the byte at position.
      2. Advance position to the next byte.
      3. If the value of the byte at position is the value of b, then advance position to the next byte and abort the "get an attribute" algorithm. The attribute's name is the value of attribute name, and its value is the value of attribute value.
      4. Otherwise, if the value of the byte at position is in the range 0x41 (ASCII 'A') to 0x5A (ASCII 'Z'), then append a Unicode character to attribute value whose code point is 0x20 more than the value of the byte at position.
      5. Otherwise, append a Unicode character to attribute value whose code point is the same as the value of the byte at position.
      6. Return to the second step in these substeps.
   If it is 0x3E (ASCII '>')
      Abort the "get an attribute" algorithm. The attribute's name is the value of attribute name, its value is the empty string.
   If it is in the range 0x41 (ASCII 'A') to 0x5A (ASCII 'Z')
      Append the Unicode character with code point b+0x20 to attribute value (where b is the value of the byte at position). Advance position to the next byte.
   Anything else
      Append the Unicode character with the same code point as the value of the byte at position to attribute value. Advance position to the next byte.
11. Process the byte at position as follows:
   If it is 0x09 (ASCII TAB), 0x0A (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), 0x20 (ASCII space), or 0x3E (ASCII '>')
      Abort the "get an attribute" algorithm. The attribute's name is the value of attribute name and its value is the value of attribute value.
   If it is in the range 0x41 (ASCII 'A') to 0x5A (ASCII 'Z')
      Append the Unicode character with code point b+0x20 to attribute value (where b is the value of the byte at position).
   Anything else
      Append the Unicode character with the same code point as the value of the byte at position to attribute value.
12. Advance position to the next byte and return to the previous step.

For the sake of interoperability, user agents should not use a pre-scan algorithm that returns different results than the one described above. (But, if you do, please at least let us know, so that we can improve this algorithm and benefit everyone...)

If the user agent has information on the likely encoding for this page, e.g.
based on the encoding of the page when it was last visited, then return that encoding, with the confidence tentative, and abort these steps.

The user agent may attempt to autodetect the character encoding by applying frequency analysis or other algorithms to the data stream. If autodetection succeeds in determining a character encoding, then return that encoding, with the confidence tentative, and abort these steps. [UNIVCHARDET]

Otherwise, return an implementation-defined or user-specified default character encoding, with the confidence tentative. In non-legacy environments, the more comprehensive UTF-8 encoding is recommended. Due to its use in legacy content, windows-1252 is recommended as a default in predominantly Western demographics instead. Since these encodings can in many cases be distinguished by inspection, a user agent may heuristically decide which to use as a default.

The document's character encoding must immediately be set to the value returned from this algorithm, at the same time as the user agent uses the returned value to select the decoder to use for the input stream.

9.2.2.2 Preprocessing the input stream

Given an encoding, the bytes in the input stream must be converted to Unicode characters for the tokenizer, as described by the rules for that encoding, except that the leading U+FEFF BYTE ORDER MARK character, if any, must not be stripped by the encoding layer (it is stripped by the rule below).

Bytes or sequences of bytes in the original byte stream that could not be converted to Unicode characters must be converted to U+FFFD REPLACEMENT CHARACTER code points.

Bytes or sequences of bytes in the original byte stream that did not conform to the encoding specification (e.g. invalid UTF-8 byte sequences in a UTF-8 input stream) are errors that conformance checkers are expected to report.

Any byte or sequence of bytes in the original byte stream that is misinterpreted for compatibility is a parse error.
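The following Python sketch strings together the BOM step of the encoding sniffing algorithm, the decoding rule just described, and the BOM, NULL, control-character, and newline rules in the next paragraphs. It is a simplified illustration, not a conforming implementation: it skips the transport-layer, pre-scan, and autodetection steps, assumes windows-1252 as the default (one of the defaults recommended above), does not count decoder errors, and relies on Python's codecs, whose "utf-8", "utf-16-le", and "utf-16-be" decoders keep a leading U+FEFF as this section requires. The function names are hypothetical.

```python
import codecs

# Rows of the BOM table, checked in order.
_BOM_TABLE = [
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
]


def _is_parse_error_char(cp):
    """Control or permanently undefined characters listed as parse errors.

    U+0000 is handled separately because it is also replaced.
    """
    return (
        0x0001 <= cp <= 0x0008
        or cp == 0x000B
        or 0x007F <= cp <= 0x009F
        or 0xD800 <= cp <= 0xDFFF
        or 0xFDD0 <= cp <= 0xFDEF
        or (cp & 0xFFFE) == 0xFFFE  # U+xFFFE and U+xFFFF in every plane
    )


def to_tokenizer_input(data, default_encoding="windows-1252"):
    """Decode and preprocess bytes; return (text, parse_error_count)."""
    encoding = default_encoding
    for bom, name in _BOM_TABLE:
        if data.startswith(bom):
            encoding = name
            break
    # Unconvertible bytes become U+FFFD; the BOM itself survives decoding.
    text = data.decode(encoding, errors="replace")
    # One leading U+FEFF is ignored.
    if text.startswith("\ufeff"):
        text = text[1:]
    errors = text.count("\x00")
    errors += sum(1 for ch in text if _is_parse_error_char(ord(ch)))
    text = text.replace("\x00", "\ufffd")
    # CR LF becomes LF, then any remaining lone CR becomes LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text, errors
```

With a UTF-8 BOM, b"\xef\xbb\xbfa\r\nb" comes out as "a\nb": the BOM selects the decoder, is then stripped once, and the CR LF pair collapses to a single LF.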
One leading U+FEFF BYTE ORDER MARK character must be ignored if any are present.

All U+0000 NULL characters in the input must be replaced by U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters are parse errors.

Any occurrences of any characters in the ranges U+0001 to U+0008, U+007F to U+009F, U+D800 to U+DFFF, U+FDD0 to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and U+10FFFF are parse errors. (These are all control characters or permanently undefined Unicode characters.)

U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF) characters are treated specially. Any CR characters that are followed by LF characters must be removed, and any CR characters not followed by LF characters must be converted to LF characters. Thus, newlines in HTML DOMs are represented by LF characters, and there are never any CR characters in the input to the tokenization stage.

The next input character is the first character in the input stream that has not yet been consumed. Initially, the next input character is the first character in the input. The current input character is the last character to have been consumed.

The insertion point is the position (just before a character or just before the end of the input stream) where content inserted using document.write() is actually inserted. The insertion point is relative to the position of the character immediately after it; it is not an absolute offset into the input stream. Initially, the insertion point is uninitialized.

The "EOF" character in the tables below is a conceptual character representing the end of the input stream.
If the parser is a script-created parser, then the end of the input stream is reached when an explicit "EOF" character (inserted by the document.close() method) is consumed. Otherwise, the "EOF" character is not a real character in the stream, but rather the lack of any further characters.

9.2.2.3 Changing the encoding while parsing

When the parser requires the user agent to change the encoding, it must run the following steps. This might happen if the encoding sniffing algorithm described above failed to find an encoding, or if it found an encoding that was not the actual encoding of the file.

If the new encoding is a UTF-16 encoding, change it to UTF-8.

If the new encoding is identical or equivalent to the encoding that is already being used to interpret the input stream, then set the confidence to certain and abort these steps. This happens when the encoding information found in the file matches what the encoding sniffing algorithm determined to be the encoding, and in the second pass through the parser if the first pass found that the encoding sniffing algorithm described in the earlier section failed to find the right encoding.

If all the bytes up to the last byte converted by the current decoder have the same Unicode interpretations in both the current encoding and the new encoding, and if the user agent supports changing the converter on the fly, then the user agent may change to the new converter for the encoding on the fly. Set the document's character encoding and the encoding used to convert the input stream to the new encoding, set the confidence to certain, and abort these steps.

Otherwise, navigate to the document again, with replacement enabled, and using the same source browsing context, but this time skip the encoding sniffing algorithm and instead just set the encoding to the new encoding and the confidence to certain.
Whenever possible, this should be done without actually contacting the network layer (the bytes should be re-parsed from memory), even if, e.g., the document is marked as not being cacheable. If this is not possible and contacting the network layer would involve repeating a request that uses a method other than HTTP GET (or equivalent for non-HTTP URLs), then instead set the confidence to certain and ignore the new encoding. The resource will be misinterpreted. User agents may notify the user of the situation, to aid in application development.

9.2.3 Parse state

9.2.3.1 The insertion mode

The insertion mode is a state variable that controls the primary operation of the tree construction stage.

Initially, the insertion mode is "initial". It can change to "before html", "before head", "in head", "in head noscript", "after head", "in body", "in CDATA/RCDATA", "in table", "in caption", "in column group", "in table body", "in row", "in cell", "in select", "in select in table", "in foreign content", "after body", "in frameset", "after frameset", "after after body", and "after after frameset" during the course of the parsing, as described in the tree construction stage. The insertion mode affects how tokens are processed and whether CDATA sections are supported.

Eight of these modes, namely "in head", "in body", "in CDATA/RCDATA", "in table", "in table body", "in row", "in cell", and "in select", are special, in that the other modes defer to them at various times. When the algorithm below says that the user agent is to do something "using the rules for the m insertion mode", where m is one of these modes, the user agent must use the rules described under the m insertion mode's section, but must leave the insertion mode unchanged unless the rules in m themselves switch the insertion mode to a new value.

When the insertion mode is switched to "in CDATA/RCDATA", the original insertion mode is also set.
This is the insertion mode to which the tree construction stage will return when the corresponding end tag is parsed.

When the insertion mode is switched to "in foreign content", the secondary insertion mode is also set. This secondary mode is used within the rules for the "in foreign content" mode to handle HTML (i.e. not foreign) content.

When the steps below require the UA to reset the insertion mode appropriately, it means the UA must follow these steps:

1. Let last be false.
2. Let node be the last node in the stack of open elements.
3. If node is the first node in the stack of open elements, then set last to true and set node to the context element. (fragment case)
4. If node is a select element, then switch the insertion mode to "in select" and abort these steps. (fragment case)
5. If node is a td or th element and last is false, then switch the insertion mode to "in cell" and abort these steps.
6. If node is a tr element, then switch the insertion mode to "in row" and abort these steps.
7. If node is a tbody, thead, or tfoot element, then switch the insertion mode to "in table body" and abort these steps.
8. If node is a caption element, then switch the insertion mode to "in caption" and abort these steps.
9. If node is a colgroup element, then switch the insertion mode to "in column group" and abort these steps. (fragment case)
10. If node is a table element, then switch the insertion mode to "in table" and abort these steps.
11. If node is an element from the MathML namespace or the SVG namespace, then switch the insertion mode to "in foreign content", let the secondary insertion mode be "in body", and abort these steps.
12. If node is a head element, then switch the insertion mode to "in body" ("in body"! not "in head"!) and abort these steps. (fragment case)
13. If node is a body element, then switch the insertion mode to "in body" and abort these steps.
14. If node is a frameset element, then switch the insertion mode to "in frameset" and abort these steps. (fragment case)
15. If node is an html element, then: if the head element pointer is null, switch the insertion mode to "before head", otherwise, switch the insertion mode to "after head". In either case, abort these steps. (fragment case)
16. If last is true, then switch the insertion mode to "in body" and abort these steps. (fragment case)
17. Let node now be the node before node in the stack of open elements.
18. Return to step 3.

9.2.3.2 The stack of open elements

Initially, the stack of open elements is empty. The stack grows downwards; the topmost node on the stack is the first one added to the stack, and the bottommost node of the stack is the most recently added node in the stack (notwithstanding when the stack is manipulated in a random access fashion as part of the handling for misnested tags).

The "before html" insertion mode creates the html root element node, which is then added to the stack.

In the fragment case, the stack of open elements is initialized to contain an html element that is created as part of that algorithm. (The fragment case skips the "before html" insertion mode.)

The html node, however it is created, is the topmost node of the stack. It never gets popped off the stack.

The current node is the bottommost node in this stack.

The current table is the last table element in the stack of open elements, if there is one. If there is no table element in the stack of open elements (fragment case), then the current table is the first element in the stack of open elements (the html element).
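With the stack of open elements represented as a Python list (first entry is the html element, last entry is the current node), the "reset the insertion mode appropriately" steps from 9.2.3.1 might be sketched as below. Representing each element as a (namespace, tag name) pair, returning the modes as strings, and the parameter names are assumptions of this sketch; the namespace URIs are the standard HTML, MathML, and SVG ones.

```python
HTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"
SVG = "http://www.w3.org/2000/svg"

# HTML elements that map directly to a mode (steps 4, 6-10, and 12-14).
_MODE_FOR_HTML_ELEMENT = {
    "select": "in select",
    "tr": "in row",
    "tbody": "in table body",
    "thead": "in table body",
    "tfoot": "in table body",
    "caption": "in caption",
    "colgroup": "in column group",
    "table": "in table",
    "head": "in body",  # "in body", not "in head"
    "body": "in body",
    "frameset": "in frameset",
}


def reset_insertion_mode(stack, head_element_pointer=None, context=None):
    """Return (insertion mode, secondary insertion mode or None).

    stack: list of (namespace, tag name) pairs; stack[0] is the html element
    and stack[-1] is the current node. context is the fragment case's
    context element, or None when not parsing a fragment.
    """
    last = False
    index = len(stack) - 1                      # step 2
    node = stack[index]
    while True:
        if index == 0:                          # step 3
            last = True
            if context is not None:             # fragment case
                node = context
        namespace, name = node
        if namespace == HTML:
            if name in ("td", "th"):
                if not last:                    # step 5
                    return "in cell", None
            elif name in _MODE_FOR_HTML_ELEMENT:
                return _MODE_FOR_HTML_ELEMENT[name], None
            elif name == "html":                # step 15
                mode = "before head" if head_element_pointer is None else "after head"
                return mode, None
        elif namespace in (MATHML, SVG):        # step 11
            return "in foreign content", "in body"
        if last:                                # step 16
            return "in body", None
        index -= 1                              # step 17, then back to step 3
        node = stack[index]
```

A table-driven lookup replaces the chain of "if node is ..." steps here; because each HTML tag name matches at most one step, the order of those lookups does not change the result.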
Elements in the stack fall into the following categories:

Special
   The following HTML elements have varying levels of special parsing rules: address, area, article, aside, base, basefont, bgsound, blockquote, body, br, center, col, colgroup, command, datagrid, dd, details, dialog, dir, div, dl, dt, embed, fieldset, figure, footer, form, frame, frameset, h1, h2, h3, h4, h5, h6, head, header, hgroup, hr, iframe, img, input, isindex, li, link, listing, menu, meta, nav, noembed, noframes, noscript, ol, p, param, plaintext, pre, script, section, select, spacer, style, tbody, textarea, tfoot, thead, title, tr, ul, and wbr.

Scoping
   The following HTML elements introduce new scopes for various parts of the parsing: applet, button, caption, html, marquee, object, table, td, th, and SVG's foreignObject.

Formatting
   The following HTML elements are those that end up in the list of active formatting elements: a, b, big, code, em, font, i, nobr, s, small, strike, strong, tt, and u.

Phrasing
   All other elements found while parsing an HTML document.

The stack of open elements is said to have an element in scope when the following algorithm terminates in a match state:

1. Initialize node to be the current node (the bottommost node of the stack).
2. If node is the target node, terminate in a match state.
3. Otherwise, if node is one of the following elements, terminate in a failure state:
      applet in the HTML namespace
      caption in the HTML namespace
      html in the HTML namespace
      table in the HTML namespace
      td in the HTML namespace
      th in the HTML namespace
      button in the HTML namespace
      marquee in the HTML namespace
      object in the HTML namespace
      foreignObject in the SVG namespace
4. Otherwise, set node to the previous entry in the stack of open elements and return to step 2. (This will never fail, since the loop will always terminate in the previous step if the top of the stack, an html element, is reached.)
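A minimal Python sketch of this check, assuming the stack of open elements is a list of (namespace, tag name) pairs with the current node last, and assuming the target is identified by such a pair (the tree construction rules usually ask whether an element with a given tag name is in scope). The boundary set is a parameter, so the same walk also serves the "in table scope" variant defined next; the constant and function names are hypothetical.

```python
HTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"

# Failure-state elements for "has an element in scope" (step 3).
SCOPE_BOUNDARIES = frozenset({
    (HTML, "applet"), (HTML, "caption"), (HTML, "html"), (HTML, "table"),
    (HTML, "td"), (HTML, "th"), (HTML, "button"), (HTML, "marquee"),
    (HTML, "object"), (SVG, "foreignObject"),
})

# Failure-state elements for the "in table scope" variant.
TABLE_SCOPE_BOUNDARIES = frozenset({(HTML, "html"), (HTML, "table")})


def has_element_in_scope(stack, target, boundaries=SCOPE_BOUNDARIES):
    """Walk from the current node (stack[-1]) toward the html element."""
    for node in reversed(stack):   # step 1, then step 4 on each iteration
        if node == target:         # step 2: match state
            return True
        if node in boundaries:     # step 3: failure state
            return False
    return False  # not reached when stack[0] is the html element
```

For a stack html, body, table, tr, td, a tr element is not in scope (the td is a boundary) but is in table scope.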
The stack of open elements is said to have an element in table scope when the following algorithm terminates in a match state:

1. Initialize node to be the current node (the bottommost node of the stack).
2. If node is the target node, terminate in a match state.
3. Otherwise, if node is one of the following elements, terminate in a failure state:
   - html in the HTML namespace
   - table in the HTML namespace
4. Otherwise, set node to the previous entry in the stack of open elements and return to step 2. (This will never fail, since the loop will always terminate in the previous step if the top of the stack, an html element, is reached.)

Nothing happens if at any time any of the elements in the stack of open elements are moved to a new location in, or removed from, the Document tree. In particular, the stack is not changed in this situation. This can cause, amongst other strange effects, content to be appended to nodes that are no longer in the DOM.

In some cases (namely, when closing misnested formatting elements), the stack is manipulated in a random-access fashion.

9.2.3.3 The list of active formatting elements

Initially, the list of active formatting elements is empty. It is used to handle mis-nested formatting element tags.

The list contains elements in the formatting category, and scope markers. The scope markers are inserted when entering applet elements, buttons, object elements, marquees, table cells, and table captions, and are used to prevent formatting from "leaking" into applet elements, buttons, object elements, marquees, and tables.

In addition, each element in the list of active formatting elements is associated with the token for which it was created, so that further elements can be created for that token if necessary.
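Returning briefly to the stack of open elements: the "in table scope" walk defined above differs from the ordinary "in scope" walk only in which elements stop it. A minimal sketch, assuming the same hypothetical `Element` tuple and predicate convention (the function name is also hypothetical):

```python
from typing import Callable, List, NamedTuple

HTML_NS = "http://www.w3.org/1999/xhtml"


class Element(NamedTuple):
    # Hypothetical, simplified element: namespace plus local name only.
    namespace: str
    local_name: str


def has_element_in_table_scope(stack: List[Element],
                               is_target: Callable[[Element], bool]) -> bool:
    """stack[0] is the html element; stack[-1] is the current node."""
    for node in reversed(stack):
        if is_target(node):                       # match state
            return True
        if node.namespace == HTML_NS and node.local_name in ("html", "table"):
            return False                          # failure state
    return False  # not reached when stack[0] is an html element
```

Unlike the ordinary scope walk, table cells and rows do not stop this one, so a tbody remains in table scope even while a td is open inside it.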
When the steps below require the UA to reconstruct the active formatting elements, the UA must perform the following steps:

1. If there are no entries in the list of active formatting elements, then there is nothing to reconstruct; stop this algorithm.
2. If the last (most recently added) entry in the list of active formatting elements is a marker, or if it is an element that is in the stack of open elements, then there is nothing to reconstruct; stop this algorithm.
3. Let entry be the last (most recently added) element in the list of active formatting elements.
4. If there are no entries before entry in the list of active formatting elements, then jump to step 8.
5. Let entry be the entry one earlier than entry in the list of active formatting elements.
6. If entry is neither a marker nor an element that is also in the stack of open elements, go to step 4.
7. Let entry be the element one later than entry in the list of active formatting elements.
8. Create an element for the token for which the element entry was created, to obtain new element.
9. Append new element to the current node and push it onto the stack of open elements so that it is the new current node.
10. Replace the entry for entry in the list with an entry for new element.
11. If the entry for new element in the list of active formatting elements is not the last entry in the list, return to step 7.

This has the effect of reopening all the formatting elements that were opened in the current body, cell, or caption (whichever is youngest) that haven't been explicitly closed.

The way this specification is written, the list of active formatting elements always consists of elements in chronological order with the least recently added element first and the most recently added element last (except while steps 8 to 11 of the above algorithm are being executed, of course).
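The reconstruction steps can be sketched in Python as follows. The sketch assumes a simplified model that is not part of this specification: `MARKER` stands in for a scope marker, each hypothetical `Element` keeps a copy of its start tag token as a plain dictionary and a `children` list in place of a real DOM, and "create an element for the token" is reduced to copying the tag name and token. Comments map the code back to the numbered steps.

```python
from dataclasses import dataclass, field
from typing import List, Union


class Marker:
    """A scope marker in the list of active formatting elements."""


MARKER = Marker()


@dataclass(eq=False)  # eq=False: list membership tests use identity
class Element:
    tag_name: str
    token: dict = field(default_factory=dict)    # simplified start tag token
    children: list = field(default_factory=list)


Entry = Union[Element, Marker]


def reconstruct_active_formatting_elements(formatting: List[Entry],
                                           open_elements: List[Element]) -> None:
    # Step 1: an empty list has nothing to reconstruct.
    if not formatting:
        return
    # Step 2: stop if the last entry is a marker or is still open.
    last = formatting[-1]
    if last is MARKER or last in open_elements:
        return
    # Steps 3-7: walk backwards until the previous entry is a marker or
    # an open element (or the start of the list is reached).
    index = len(formatting) - 1
    while index > 0:
        previous = formatting[index - 1]
        if previous is MARKER or previous in open_elements:
            break
        index -= 1
    # Steps 8-11: recreate every entry from there to the end of the list.
    for i in range(index, len(formatting)):
        entry = formatting[i]
        new_element = Element(entry.tag_name, dict(entry.token))  # step 8
        open_elements[-1].children.append(new_element)  # step 9: append...
        open_elements.append(new_element)               # ...and push
        formatting[i] = new_element                     # step 10
```

The jumps in the prose algorithm (back to step 4, forward to step 8, back to step 7) become one backwards loop and one forwards loop here; the behaviour is the same because step 11 simply repeats steps 7 to 10 until the end of the list is reached.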
When the steps below require the UA to clear the list of active formatting elements up to the last marker, the UA must perform the following steps:

1. Let entry be the last (most recently added) entry in the list of active formatting elements.
2. Remove entry from the list of active formatting elements.
3. If entry was a marker, then stop the algorithm at this point. The list has been cleared up to the last marker.
4. Go to step 1.

9.2.3.4 The element pointers

Initially, the head element pointer and the form element pointer are both null.

Once a head element has been parsed (whether implicitly or explicitly) the head element pointer gets set to point to this node.

The form element pointer points to the last form element that was opened and whose end tag has not yet been seen. It is used to make form controls associate with forms in the face of dramatically bad markup, for historical reasons.

9.2.3.5 Other parsing state flags

The scripting flag is set to "enabled" if scripting was enabled for the Document with which the parser is associated when the parser was created, and "disabled" otherwise.

The frameset-ok flag is set to "ok" when the parser is created. It is set to "not ok" after certain tokens are seen.

9.2.4 Tokenization

Implementations must act as if they used the following state machine to tokenize HTML. The state machine must start in the data state. Most states consume a single character, which may have various side-effects, and then either switch the state machine to a new state to reconsume the same character, switch it to a new state to consume the next character, or stay in the same state to consume the next character. Some states have more complicated behavior and can consume several characters before switching to another state.

The exact behavior of certain states depends on a content model flag that is set after certain tokens are emitted. The flag has several states: PCDATA, RCDATA, CDATA, and PLAINTEXT.
Initially, the content model flag must be in the PCDATA state.

In the RCDATA and CDATA states, a further escape flag is used to control the behavior of the tokenizer. It is either true or false, and initially must be set to the false state.

The insertion mode and the stack of open elements also affect tokenization.

The output of the tokenization step is a series of zero or more of the following tokens: DOCTYPE, start tag, end tag, comment, character, end-of-file. DOCTYPE tokens have a name, a public identifier, a system identifier, and a force-quirks flag. When a DOCTYPE token is created, its name, public identifier, and system identifier must be marked as missing (which is a distinct state from the empty string), and the force-quirks flag must be set to off (its other state is on). Start and end tag tokens have a tag name, a self-closing flag, and a list of attributes, each of which has a name and a value. When a start or end tag token is created, its self-closing flag must be unset (its other state is that it be set), and its attributes list must be empty. Comment and character tokens have data.

When a token is emitted, it must immediately be handled by the tree construction stage. The tree construction stage can affect the state of the content model flag, and can insert additional characters into the stream. (For example, the script element can result in scripts executing and using the dynamic markup insertion APIs to insert characters into the stream being tokenized.)

When a start tag token is emitted with its self-closing flag set, if the flag is not acknowledged when it is processed by the tree construction stage, that is a parse error.

When an end tag token is emitted, the content model flag must be switched to the PCDATA state.

When an end tag token is emitted with attributes, that is a parse error.

When an end tag token is emitted with its self-closing flag set, that is a parse error.
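One possible in-memory representation of these tokens is sketched below. It is an assumption for illustration only: the class and field names are hypothetical, `None` models a "missing" DOCTYPE name or identifier so that it stays distinct from the empty string, and the hypothetical `emission_parse_errors` helper covers only the two end tag errors listed above (an unacknowledged self-closing flag is detected later, by the tree construction stage).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DoctypeToken:
    # None models "missing", which is distinct from the empty string "".
    name: Optional[str] = None
    public_identifier: Optional[str] = None
    system_identifier: Optional[str] = None
    force_quirks: bool = False  # created "off"


@dataclass
class TagToken:
    tag_name: str
    is_end_tag: bool = False
    self_closing: bool = False  # created "unset"
    attributes: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class CommentToken:
    data: str = ""


@dataclass
class CharacterToken:
    data: str


@dataclass
class EndOfFileToken:
    pass


def emission_parse_errors(token: TagToken) -> List[str]:
    """Parse errors raised when an end tag token is emitted."""
    errors = []
    if token.is_end_tag and token.attributes:
        errors.append("end tag with attributes")
    if token.is_end_tag and token.self_closing:
        errors.append("end tag with self-closing flag set")
    return errors
```

Keeping the attributes as a list of pairs, rather than a dictionary, preserves source order; duplicate names are dropped earlier, when the tokenizer leaves the attribute name state.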
Before each step of the tokenizer, the user agent must first check the parser pause flag. If it is true, then the tokenizer must abort the processing of any nested invocations of the tokenizer, yielding control back to the caller. If it is false, then the user agent may check to see if either one of the scripts in the list of scripts that will execute as soon as possible or the first script in the list of scripts that will execute asynchronously has completed loading. If one has, then it must be executed and removed from its list.

The tokenizer state machine consists of the states defined in the following subsections.

9.2.4.1 Data state

Consume the next input character:

U+0026 AMPERSAND (&) When the content model flag is set to one of the PCDATA or RCDATA states and the escape flag is false: switch to the character reference data state. Otherwise: treat it as per the "anything else" entry below.
U+002D HYPHEN-MINUS (-) If the content model flag is set to either the RCDATA state or the CDATA state, and the escape flag is false, and there are at least three characters before this one in the input stream, and the last four characters in the input stream, including this one, are U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, and U+002D HYPHEN-MINUS ("<!--"), then set the escape flag to true. In any case, emit the input character as a character token. Stay in the data state.
U+003C LESS-THAN SIGN (<) When the content model flag is set to the PCDATA state: switch to the tag open state. When the content model flag is set to either the RCDATA state or the CDATA state, and the escape flag is false: switch to the tag open state. Otherwise: treat it as per the "anything else" entry below.
U+003E GREATER-THAN SIGN (>) If the content model flag is set to either the RCDATA state or the CDATA state, and the escape flag is true, and the last three characters in the input stream including this one are U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN ("-->"), set the escape flag to false. In any case, emit the input character as a character token. Stay in the data state . EOF Emit an end-of-file token. Anything else Emit the input character as a character token. Stay in the data state . 9.2.4.2 Character reference data state (This cannot happen if the content model flag is set to the CDATA state.) Attempt to consume a character reference , with no additional allowed character . If nothing is returned, emit a U+0026 AMPERSAND character token. Otherwise, emit the character token that was returned. Finally, switch to the data state . 9.2.4.3 Tag open state The behavior of this state depends on the content model flag . If the content model flag is set to the RCDATA or CDATA states Consume the next input character . If it is a U+002F SOLIDUS (/) character, switch to the close tag open state . Otherwise, emit a U+003C LESS-THAN SIGN character token and reconsume the current input character in the data state . If the content model flag is set to the PCDATA state Consume the next input character : U+0021 EXCLAMATION MARK (!) Switch to the markup declaration open state . U+002F SOLIDUS (/) Switch to the close tag open state . U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z Create a new start tag token, set its tag name to the lowercase version of the input character (add 0x0020 to the character's code point), then switch to the tag name state . (Don't emit the token yet; further details will be filled in before it is emitted.) U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z Create a new start tag token, set its tag name to the input character, then switch to the tag name state . 
(Don't emit the token yet; further details will be filled in before it is emitted.) U+003E GREATER-THAN SIGN (>) Parse error . Emit a U+003C LESS-THAN SIGN character token and a U+003E GREATER-THAN SIGN character token. Switch to the data state . U+003F QUESTION MARK (?) Parse error . Switch to the bogus comment state . Anything else Parse error . Emit a U+003C LESS-THAN SIGN character token and reconsume the current input character in the data state . 9.2.4.4 Close tag open state If the content model flag is set to the RCDATA or CDATA states but no start tag token has ever been emitted by this instance of the tokenizer ( fragment case ), or, if the content model flag is set to the RCDATA or CDATA states and the next few characters do not match the tag name of the last start tag token emitted (compared in an ASCII case-insensitive manner), or if they do but they are not immediately followed by one of the following characters: U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE U+003E GREATER-THAN SIGN (>) U+002F SOLIDUS (/) EOF ...then emit a U+003C LESS-THAN SIGN character token, a U+002F SOLIDUS character token, and switch to the data state to process the next input character . Otherwise, if the content model flag is set to the PCDATA state, or if the next few characters do match that tag name, consume the next input character : U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z Create a new end tag token, set its tag name to the lowercase version of the input character (add 0x0020 to the character's code point), then switch to the tag name state . (Don't emit the token yet; further details will be filled in before it is emitted.) U+0061 LATIN SMALL LETTER A through to U+007A LATIN SMALL LETTER Z Create a new end tag token, set its tag name to the input character, then switch to the tag name state . (Don't emit the token yet; further details will be filled in before it is emitted.) 
U+003E GREATER-THAN SIGN (>) Parse error . Switch to the data state . EOF Parse error . Emit a U+003C LESS-THAN SIGN character token and a U+002F SOLIDUS character token. Reconsume the EOF character in the data state . Anything else Parse error . Switch to the bogus comment state . 9.2.4.5 Tag name state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Switch to the before attribute name state . U+002F SOLIDUS (/) Switch to the self-closing start tag state . U+003E GREATER-THAN SIGN (>) Emit the current tag token. Switch to the data state . U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z Append the lowercase version of the current input character (add 0x0020 to the character's code point) to the current tag token's tag name. Stay in the tag name state . EOF Parse error . Reconsume the EOF character in the data state . Anything else Append the current input character to the current tag token's tag name. Stay in the tag name state . 9.2.4.6 Before attribute name state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Stay in the before attribute name state . U+002F SOLIDUS (/) Switch to the self-closing start tag state . U+003E GREATER-THAN SIGN (>) Emit the current tag token. Switch to the data state . U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z Start a new attribute in the current tag token. Set that attribute's name to the lowercase version of the current input character (add 0x0020 to the character's code point), and its value to the empty string. Switch to the attribute name state . U+0022 QUOTATION MARK (") U+0027 APOSTROPHE (') U+003D EQUALS SIGN (=) Parse error . Treat it as per the "anything else" entry below. EOF Parse error . Reconsume the EOF character in the data state . Anything else Start a new attribute in the current tag token. 
Set that attribute's name to the current input character , and its value to the empty string. Switch to the attribute name state . 9.2.4.7 Attribute name state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Switch to the after attribute name state . U+002F SOLIDUS (/) Switch to the self-closing start tag state . U+003D EQUALS SIGN (=) Switch to the before attribute value state . U+003E GREATER-THAN SIGN (>) Emit the current tag token. Switch to the data state . U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z Append the lowercase version of the current input character (add 0x0020 to the character's code point) to the current attribute's name. Stay in the attribute name state . U+0022 QUOTATION MARK (") U+0027 APOSTROPHE (') Parse error . Treat it as per the "anything else" entry below. EOF Parse error . Reconsume the EOF character in the data state . Anything else Append the current input character to the current attribute's name. Stay in the attribute name state . When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute's name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a parse error and the new attribute must be dropped, along with the value that gets associated with it (if any). 9.2.4.8 After attribute name state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Stay in the after attribute name state . U+002F SOLIDUS (/) Switch to the self-closing start tag state . U+003D EQUALS SIGN (=) Switch to the before attribute value state . U+003E GREATER-THAN SIGN (>) Emit the current tag token. Switch to the data state . U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z Start a new attribute in the current tag token. 
Set that attribute's name to the lowercase version of the current input character (add 0x0020 to the character's code point), and its value to the empty string. Switch to the attribute name state . U+0022 QUOTATION MARK (") U+0027 APOSTROPHE (') Parse error . Treat it as per the "anything else" entry below. EOF Parse error . Reconsume the EOF character in the data state . Anything else Start a new attribute in the current tag token. Set that attribute's name to the current input character , and its value to the empty string. Switch to the attribute name state . 9.2.4.9 Before attribute value state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Stay in the before attribute value state . U+0022 QUOTATION MARK (") Switch to the attribute value (double-quoted) state . U+0026 AMPERSAND (&) Switch to the attribute value (unquoted) state and reconsume this input character. U+0027 APOSTROPHE (') Switch to the attribute value (single-quoted) state . U+003E GREATER-THAN SIGN (>) Parse error . Emit the current tag token. Switch to the data state . U+003D EQUALS SIGN (=) Parse error . Treat it as per the "anything else" entry below. EOF Parse error . Reconsume the EOF character in the data state . Anything else Append the current input character to the current attribute's value. Switch to the attribute value (unquoted) state . 9.2.4.10 Attribute value (double-quoted) state Consume the next input character : U+0022 QUOTATION MARK (") Switch to the after attribute value (quoted) state . U+0026 AMPERSAND (&) Switch to the character reference in attribute value state , with the additional allowed character being U+0022 QUOTATION MARK ("). EOF Parse error . Reconsume the EOF character in the data state . Anything else Append the current input character to the current attribute's value. Stay in the attribute value (double-quoted) state . 
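In the RCDATA and CDATA content models, the data state, tag open state, and close tag open state described earlier together decide where the raw contents of an element end. The sketch below combines the escape flag handling ("<!--" sets it, "-->" clears it) with the close tag open state's check against the last start tag. It is a simplification under stated assumptions: the function names are hypothetical, character references (relevant only to RCDATA) are ignored, `last_start_tag` is assumed to be the already-lowercased name of the last start tag token emitted (so the fragment case, with no such token, is not modelled), and characters inserted by scripts are not considered.

```python
import string

# Lowercase only ASCII A-Z, as an ASCII case-insensitive comparison requires.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_AFTER_TAG_NAME = {"\t", "\n", "\f", " ", ">", "/"}


def matches_last_start_tag(text: str, pos: int, last_start_tag: str) -> bool:
    """Close tag open state check: do the characters at `pos` (just after
    "</") spell `last_start_tag`, followed by a tab, LF, FF, space, '>',
    '/', or the end of the input?"""
    end = pos + len(last_start_tag)
    if text[pos:end].translate(_ASCII_LOWER) != last_start_tag:
        return False
    return end == len(text) or text[end] in _AFTER_TAG_NAME


def find_end_of_raw_text(text: str, last_start_tag: str) -> int:
    """Index of the '<' that begins the end tag closing RCDATA/CDATA
    content, or -1 if the content runs to the end of the input."""
    escape = False
    for i, ch in enumerate(text):
        if ch == "-" and not escape and i >= 3 and text[i - 3:i + 1] == "<!--":
            escape = True
        elif ch == ">" and escape and i >= 2 and text[i - 2:i + 1] == "-->":
            escape = False
        elif (ch == "<" and not escape and text[i + 1:i + 2] == "/"
              and matches_last_start_tag(text, i + 2, last_start_tag)):
            return i
        # Any other '<' is emitted as a character and scanning continues,
        # as when the tag open or close tag open state falls back.
    return -1
```

For example, in `a<!-- </script> -->b</SCRIPT >` with `script` as the last start tag, the first `</script>` is skipped because the escape flag is true, and the function returns 20, the index of the second `<`.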
9.2.4.11 Attribute value (single-quoted) state Consume the next input character : U+0027 APOSTROPHE (') Switch to the after attribute value (quoted) state . U+0026 AMPERSAND (&) Switch to the character reference in attribute value state , with the additional allowed character being U+0027 APOSTROPHE ('). EOF Parse error . Reconsume the EOF character in the data state . Anything else Append the current input character to the current attribute's value. Stay in the attribute value (single-quoted) state . 9.2.4.12 Attribute value (unquoted) state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Switch to the before attribute name state . U+0026 AMPERSAND (&) Switch to the character reference in attribute value state , with no additional allowed character . U+003E GREATER-THAN SIGN (>) Emit the current tag token. Switch to the data state . U+0022 QUOTATION MARK (") U+0027 APOSTROPHE (') U+003D EQUALS SIGN (=) Parse error . Treat it as per the "anything else" entry below. EOF Parse error . Reconsume the EOF character in the data state . Anything else Append the current input character to the current attribute's value. Stay in the attribute value (unquoted) state . 9.2.4.13 Character reference in attribute value state Attempt to consume a character reference , using the additional allowed character (if any) given by the state that switched into this one. If nothing is returned, append a U+0026 AMPERSAND character to the current attribute's value. Otherwise, append the character from the returned character token to the current attribute's value. Finally, switch back to the attribute value state that the tokenizer was in when it switched into this state. 9.2.4.14 After attribute value (quoted) state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Switch to the before attribute name state . U+002F SOLIDUS (/) Switch to the self-closing start tag state . U+003E GREATER-THAN SIGN (>) Emit the current tag token. Switch to the data state . EOF Parse error . 
Reconsume the EOF character in the data state . Anything else Parse error . Reconsume the character in the before attribute name state . 9.2.4.15 Self-closing start tag state Consume the next input character : U+003E GREATER-THAN SIGN (>) Set the self-closing flag of the current tag token. Emit the current tag token. Switch to the data state . EOF Parse error . Reconsume the EOF character in the data state . Anything else Parse error . Reconsume the character in the before attribute name state . 9.2.4.16 Bogus comment state (This can only happen if the content model flag is set to the PCDATA state.) Consume every character up to and including the first U+003E GREATER-THAN SIGN character (>) or the end of the file (EOF), whichever comes first. Emit a comment token whose data is the concatenation of all the characters starting from and including the character that caused the state machine to switch into the bogus comment state, up to and including the character immediately before the last consumed character (i.e. up to the character just before the U+003E or EOF character). (If the comment was started by the end of the file (EOF), the token is empty.) Switch to the data state . If the end of the file was reached, reconsume the EOF character. 9.2.4.17 Markup declaration open state (This can only happen if the content model flag is set to the PCDATA state.) If the next two characters are both U+002D HYPHEN-MINUS (-) characters, consume those two characters, create a comment token whose data is the empty string, and switch to the comment start state . Otherwise, if the next seven characters are an ASCII case-insensitive match for the word "DOCTYPE", then consume those characters and switch to the DOCTYPE state . 
Otherwise, if the insertion mode is " in foreign content " and the current node is not an element in the HTML namespace and the next seven characters are an ASCII case-sensitive match for the string "[CDATA[" (the five uppercase letters "CDATA" with a U+005B LEFT SQUARE BRACKET character before and after), then consume those characters and switch to the CDATA section state (which is unrelated to the content model flag 's CDATA state). Otherwise, this is a parse error . Switch to the bogus comment state . The next character that is consumed, if any, is the first character that will be in the comment. 9.2.4.18 Comment start state Consume the next input character : U+002D HYPHEN-MINUS (-) Switch to the comment start dash state . U+003E GREATER-THAN SIGN (>) Parse error . Emit the comment token. Switch to the data state . EOF Parse error . Emit the comment token. Reconsume the EOF character in the data state . Anything else Append the input character to the comment token's data. Switch to the comment state . 9.2.4.19 Comment start dash state Consume the next input character : U+002D HYPHEN-MINUS (-) Switch to the comment end state . U+003E GREATER-THAN SIGN (>) Parse error . Emit the comment token. Switch to the data state . EOF Parse error . Emit the comment token. Reconsume the EOF character in the data state . Anything else Append a U+002D HYPHEN-MINUS (-) character and the input character to the comment token's data. Switch to the comment state . 9.2.4.20 Comment state Consume the next input character : U+002D HYPHEN-MINUS (-) Switch to the comment end dash state . EOF Parse error . Emit the comment token. Reconsume the EOF character in the data state . Anything else Append the input character to the comment token's data. Stay in the comment state . 9.2.4.21 Comment end dash state Consume the next input character : U+002D HYPHEN-MINUS (-) Switch to the comment end state . EOF Parse error . Emit the comment token. Reconsume the EOF character in the data state . 
Anything else Append a U+002D HYPHEN-MINUS (-) character and the input character to the comment token's data. Switch to the comment state . 9.2.4.22 Comment end state Consume the next input character : U+003E GREATER-THAN SIGN (>) Emit the comment token. Switch to the data state . U+002D HYPHEN-MINUS (-) Parse error . Append a U+002D HYPHEN-MINUS (-) character to the comment token's data. Stay in the comment end state . EOF Parse error . Emit the comment token. Reconsume the EOF character in the data state . Anything else Parse error . Append two U+002D HYPHEN-MINUS (-) characters and the input character to the comment token's data. Switch to the comment state . 9.2.4.23 DOCTYPE state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Switch to the before DOCTYPE name state . EOF Parse error . Create a new DOCTYPE token. Set its force-quirks flag to on . Emit the token. Reconsume the EOF character in the data state . Anything else Parse error . Reconsume the current character in the before DOCTYPE name state . 9.2.4.24 Before DOCTYPE name state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Stay in the before DOCTYPE name state . U+003E GREATER-THAN SIGN (>) Parse error . Create a new DOCTYPE token. Set its force-quirks flag to on . Emit the token. Switch to the data state . U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z Create a new DOCTYPE token. Set the token's name to the lowercase version of the input character (add 0x0020 to the character's code point). Switch to the DOCTYPE name state . EOF Parse error . Create a new DOCTYPE token. Set its force-quirks flag to on . Emit the token. Reconsume the EOF character in the data state . Anything else Create a new DOCTYPE token. Set the token's name to the current input character . Switch to the DOCTYPE name state . 
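The five comment states (9.2.4.18 to 9.2.4.22 above) form a small state machine of their own. The sketch below runs them over a Python string, starting just after "<!--" has been consumed by the markup declaration open state. It is illustrative only: `tokenize_comment` is a hypothetical name, `None` stands for the EOF character, and parse errors are counted rather than reported.

```python
from typing import Optional, Tuple


def tokenize_comment(text: str, pos: int) -> Tuple[str, int, int]:
    """Run the comment states from `pos` (just after "<!--").
    Returns (comment data, index after the comment, parse error count)."""
    data = []
    errors = 0
    state = "start"
    while True:
        ch: Optional[str] = text[pos] if pos < len(text) else None
        pos += 1
        if ch is None:
            # Every comment state treats EOF as a parse error, emits the
            # comment, and reconsumes EOF in the data state.
            return "".join(data), len(text), errors + 1
        if state == "start":                      # comment start state
            if ch == "-":
                state = "start dash"
            elif ch == ">":
                return "".join(data), pos, errors + 1
            else:
                data.append(ch)
                state = "comment"
        elif state == "start dash":               # comment start dash state
            if ch == "-":
                state = "end"
            elif ch == ">":
                return "".join(data), pos, errors + 1
            else:
                data.append("-" + ch)
                state = "comment"
        elif state == "comment":                  # comment state
            if ch == "-":
                state = "end dash"
            else:
                data.append(ch)
        elif state == "end dash":                 # comment end dash state
            if ch == "-":
                state = "end"
            else:
                data.append("-" + ch)
                state = "comment"
        else:                                     # comment end state
            if ch == ">":
                return "".join(data), pos, errors
            errors += 1
            if ch == "-":
                data.append("-")
            else:
                data.append("--" + ch)
                state = "comment"
```

For example, `tokenize_comment("<!--a--b-->", 4)` returns `("a--b", 11, 1)`: the "--" inside the comment is a parse error in the comment end state, but the two hyphens are kept in the comment's data.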
9.2.4.25 DOCTYPE name state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Switch to the after DOCTYPE name state . U+003E GREATER-THAN SIGN (>) Emit the current DOCTYPE token. Switch to the data state . U+0041 LATIN CAPITAL LETTER A through to U+005A LATIN CAPITAL LETTER Z Append the lowercase version of the input character (add 0x0020 to the character's code point) to the current DOCTYPE token's name. Stay in the DOCTYPE name state . EOF Parse error . Set the DOCTYPE token's force-quirks flag to on . Emit that DOCTYPE token. Reconsume the EOF character in the data state . Anything else Append the current input character to the current DOCTYPE token's name. Stay in the DOCTYPE name state . 9.2.4.26 After DOCTYPE name state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Stay in the after DOCTYPE name state . U+003E GREATER-THAN SIGN (>) Emit the current DOCTYPE token. Switch to the data state . EOF Parse error . Set the DOCTYPE token's force-quirks flag to on . Emit that DOCTYPE token. Reconsume the EOF character in the data state . Anything else If the six characters starting from the current input character are an ASCII case-insensitive match for the word "PUBLIC", then consume those characters and switch to the before DOCTYPE public identifier state . Otherwise, if the six characters starting from the current input character are an ASCII case-insensitive match for the word "SYSTEM", then consume those characters and switch to the before DOCTYPE system identifier state . Otherwise, this is a parse error . Set the DOCTYPE token's force-quirks flag to on . Switch to the bogus DOCTYPE state . 9.2.4.27 Before DOCTYPE public identifier state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Stay in the before DOCTYPE public identifier state . 
U+0022 QUOTATION MARK (") Set the DOCTYPE token's public identifier to the empty string (not missing), then switch to the DOCTYPE public identifier (double-quoted) state . U+0027 APOSTROPHE (') Set the DOCTYPE token's public identifier to the empty string (not missing), then switch to the DOCTYPE public identifier (single-quoted) state . U+003E GREATER-THAN SIGN (>) Parse error . Set the DOCTYPE token's force-quirks flag to on . Emit that DOCTYPE token. Switch to the data state . EOF Parse error . Set the DOCTYPE token's force-quirks flag to on . Emit that DOCTYPE token. Reconsume the EOF character in the data state . Anything else Parse error . Set the DOCTYPE token's force-quirks flag to on . Switch to the bogus DOCTYPE state . 9.2.4.28 DOCTYPE public identifier (double-quoted) state Consume the next input character : U+0022 QUOTATION MARK (") Switch to the after DOCTYPE public identifier state . U+003E GREATER-THAN SIGN (>) Parse error . Set the DOCTYPE token's force-quirks flag to on . Emit that DOCTYPE token. Switch to the data state . EOF Parse error . Set the DOCTYPE token's force-quirks flag to on . Emit that DOCTYPE token. Reconsume the EOF character in the data state . Anything else Append the current input character to the current DOCTYPE token's public identifier. Stay in the DOCTYPE public identifier (double-quoted) state . 9.2.4.29 DOCTYPE public identifier (single-quoted) state Consume the next input character : U+0027 APOSTROPHE (') Switch to the after DOCTYPE public identifier state . U+003E GREATER-THAN SIGN (>) Parse error . Set the DOCTYPE token's force-quirks flag to on . Emit that DOCTYPE token. Switch to the data state . EOF Parse error . Set the DOCTYPE token's force-quirks flag to on . Emit that DOCTYPE token. Reconsume the EOF character in the data state . Anything else Append the current input character to the current DOCTYPE token's public identifier. Stay in the DOCTYPE public identifier (single-quoted) state . 
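The DOCTYPE, before DOCTYPE name, DOCTYPE name, and after DOCTYPE name states can be condensed into one scan, as in the sketch below. It assumes the input starts just after the seven characters "DOCTYPE" have been consumed and, as a simplification, returns a short description of what happens next instead of driving a full tokenizer; parse errors are not reported. The function name and the returned strings are hypothetical.

```python
import string
from typing import Optional, Tuple

_WS = "\t\n\f "
# Lowercase only ASCII A-Z ("add 0x0020 to the character's code point").
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def doctype_name_and_keyword(text: str, pos: int) -> Tuple[Optional[str], str]:
    """Returns (name, next step); the name is None when it is missing."""
    n = len(text)
    # DOCTYPE and before DOCTYPE name states: skip whitespace. (A missing
    # space is a parse error but is otherwise handled the same way.)
    while pos < n and text[pos] in _WS:
        pos += 1
    if pos >= n or text[pos] == ">":
        return None, "emit (force-quirks)"
    # DOCTYPE name state: collect the name, lowercasing ASCII letters.
    start = pos
    while pos < n and text[pos] not in _WS and text[pos] != ">":
        pos += 1
    name = text[start:pos].translate(_ASCII_LOWER)
    if pos >= n:
        return name, "emit (force-quirks)"
    # After DOCTYPE name state.
    while pos < n and text[pos] in _WS:
        pos += 1
    if pos >= n:
        return name, "emit (force-quirks)"
    if text[pos] == ">":
        return name, "emit"
    keyword = text[pos:pos + 6].translate(_ASCII_LOWER)
    if keyword == "public":
        return name, "PUBLIC"
    if keyword == "system":
        return name, "SYSTEM"
    return name, "bogus (force-quirks)"
```

For `<!DOCTYPE HTML PUBLIC ...>`, calling the function on the text after "DOCTYPE" yields the name `"html"` and the next step `"PUBLIC"`, after which the before DOCTYPE public identifier state takes over.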
9.2.4.30 After DOCTYPE public identifier state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Stay in the after DOCTYPE public identifier state . U+0022 QUOTATION MARK (") Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the DOCTYPE system identifier (double-quoted) state . U+0027 APOSTROPHE (') Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the DOCTYPE system identifier (single-quoted) state . U+003E GREATER-THAN SIGN (>) Emit the current DOCTYPE token. Switch to the data state . EOF Parse error . Set the DOCTYPE token's force-quirks flag to on . Emit that DOCTYPE token. Reconsume the EOF character in the data state . Anything else Parse error . Set the DOCTYPE token's force-quirks flag to on . Switch to the bogus DOCTYPE state . 9.2.4.31 Before DOCTYPE system identifier state Consume the next input character : U+0009 CHARACTER TABULATION U+000A LINE FEED (LF) U+000C FORM FEED (FF) U+0020 SPACE Stay in the before DOCTYPE system identifier state . U+0022 QUOTATION MARK (") Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the DOCTYPE system identifier (double-quoted) state . U+0027 APOSTROPHE (') Set the DOCTYPE token's system identifier to the empty string (not missing), then switch to the DOCTYPE system identifier (single-quoted) state . U+003E GREATER-THAN SIGN (>) Parse error . Set the DOCTYPE token's force-quirks flag to on . Emit that DOCTYPE token. Switch to the data state . EOF Parse error . Set the DOCTYPE token's force-quirks flag to on . Emit that DOCTYPE token. Reconsume the EOF character in the data state . Anything else Parse error . Set the DOCTYPE token's force-quirks flag to on . Switch to the bogus DOCTYPE state . 
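The quoted public identifier states above (and, by symmetry, the quoted system identifier states that follow) share one shape: they read up to the matching quote, and treat a ">" or the end of the file as a parse error that turns the force-quirks flag on. A hypothetical helper capturing that shape might look like this; it assumes `pos` points just after the opening quote and leaves the choice of the next state to the caller.

```python
from typing import Tuple


def read_quoted_identifier(text: str, pos: int, quote: str) -> Tuple[str, int, bool]:
    """Returns (identifier, index after the terminating character,
    force_quirks)."""
    start = pos
    while pos < len(text):
        ch = text[pos]
        if ch == quote:
            # Closing quote: continue in the "after ... identifier" state.
            return text[start:pos], pos + 1, False
        if ch == ">":
            # Parse error: force-quirks on, emit the DOCTYPE token,
            # and switch to the data state.
            return text[start:pos], pos + 1, True
        pos += 1
    # EOF: parse error, force-quirks on; EOF is reconsumed in the data state.
    return text[start:], len(text), True
```

Only the matching quote character closes the identifier, so a public identifier in double quotes may contain apostrophes and vice versa.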
9.2.4.32 DOCTYPE system identifier (double-quoted) state

Consume the next input character:

- U+0022 QUOTATION MARK (")
  Switch to the after DOCTYPE system identifier state.
- U+003E GREATER-THAN SIGN (>)
  Parse error. Set the DOCTYPE token's force-quirks flag to on. Emit that DOCTYPE token. Switch to the data state.
- EOF
  Parse error. Set the DOCTYPE token's force-quirks flag to on. Emit that DOCTYPE token. Reconsume the EOF character in the data state.
- Anything else
  Append the current input character to the current DOCTYPE token's system identifier. Stay in the DOCTYPE system identifier (double-quoted) state.

9.2.4.33 DOCTYPE system identifier (single-quoted) state

Consume the next input character:

- U+0027 APOSTROPHE (')
  Switch to the after DOCTYPE system identifier state.
- U+003E GREATER-THAN SIGN (>)
  Parse error. Set the DOCTYPE token's force-quirks flag to on. Emit that DOCTYPE token. Switch to the data state.
- EOF
  Parse error. Set the DOCTYPE token's force-quirks flag to on. Emit that DOCTYPE token. Reconsume the EOF character in the data state.
- Anything else
  Append the current input character to the current DOCTYPE token's system identifier. Stay in the DOCTYPE system identifier (single-quoted) state.

9.2.4.34 After DOCTYPE system identifier state

Consume the next input character:

- U+0009 CHARACTER TABULATION
- U+000A LINE FEED (LF)
- U+000C FORM FEED (FF)
- U+0020 SPACE
  Stay in the after DOCTYPE system identifier state.
- U+003E GREATER-THAN SIGN (>)
  Emit the current DOCTYPE token. Switch to the data state.
- EOF
  Parse error. Set the DOCTYPE token's force-quirks flag to on. Emit that DOCTYPE token. Reconsume the EOF character in the data state.
- Anything else
  Parse error. Switch to the bogus DOCTYPE state. (This does not set the DOCTYPE token's force-quirks flag to on.)

9.2.4.35 Bogus DOCTYPE state

Consume the next input character:

- U+003E GREATER-THAN SIGN (>)
  Emit the DOCTYPE token. Switch to the data state.
- EOF
  Emit the DOCTYPE token. Reconsume the EOF character in the data state.
- Anything else
  Stay in the bogus DOCTYPE state.

9.2.4.36 CDATA section state

(This can only happen if the content model flag is set to the PCDATA state, and is unrelated to the content model flag's CDATA state.)

Consume every character up to the next occurrence of the three-character sequence U+005D RIGHT SQUARE BRACKET U+005D RIGHT SQUARE BRACKET U+003E GREATER-THAN SIGN (]]>), or the end of the file (EOF), whichever comes first. Emit a series of character tokens consisting of all the characters consumed except the matching three-character sequence at the end (if one was found before the end of the file).

Switch to the data state. If the end of the file was reached, reconsume the EOF character.

9.2.4.37 Tokenizing character references

This section defines how to consume a character reference. This definition is used when parsing character references in text and in attributes.

The behavior depends on the identity of the next character (the one immediately after the U+0026 AMPERSAND character):

- U+0009 CHARACTER TABULATION
- U+000A LINE FEED (LF)
- U+000C FORM FEED (FF)
- U+0020 SPACE
- U+003C LESS-THAN SIGN
- U+0026 AMPERSAND
- EOF
- The additional allowed character, if there is one
  Not a character reference. No characters are consumed, and nothing is returned. (This is not an error, either.)
- U+0023 NUMBER SIGN (#)
  Consume the U+0023 NUMBER SIGN.

  The behavior further depends on the character after the U+0023 NUMBER SIGN:

    - U+0078 LATIN SMALL LETTER X
    - U+0058 LATIN CAPITAL LETTER X
      Consume the X.

      Follow the steps below, but using the range of characters U+0030 DIGIT ZERO through to U+0039 DIGIT NINE, U+0061 LATIN SMALL LETTER A through to U+0066 LATIN SMALL LETTER F, and U+0041 LATIN CAPITAL LETTER A through to U+0046 LATIN CAPITAL LETTER F (in other words, 0-9, A-F, a-f).

      When it comes to interpreting the number, interpret it as a hexadecimal number.
    - Anything else
      Follow the steps below, but using the range of characters U+0030 DIGIT ZERO through to U+0039 DIGIT NINE (i.e. just 0-9).

      When it comes to interpreting the number, interpret it as a decimal number.

  Consume as many characters as match the range of characters given above.

  If no characters match the range, then don't consume any characters (and unconsume the U+0023 NUMBER SIGN character and, if appropriate, the X character). This is a parse error; nothing is returned.

  Otherwise, if the next character is a U+003B SEMICOLON, consume that too. If it isn't, there is a parse error.

  If one or more characters match the range, then take them all and interpret the string of characters as a number (either hexadecimal or decimal as appropriate).

  If that number is one of the numbers in the first column of the following table, then this is a parse error. Find the row with that number in the first column, and return a character token for the Unicode character given in the second column of that row.
  Number  Unicode character
  0x0D    U+000A LINE FEED (LF)
  0x80    U+20AC EURO SIGN ('€')
  0x81    U+FFFD REPLACEMENT CHARACTER
  0x82    U+201A SINGLE LOW-9 QUOTATION MARK ('‚')
  0x83    U+0192 LATIN SMALL LETTER F WITH HOOK ('ƒ')
  0x84    U+201E DOUBLE LOW-9 QUOTATION MARK ('„')
  0x85    U+2026 HORIZONTAL ELLIPSIS ('…')
  0x86    U+2020 DAGGER ('†')
  0x87    U+2021 DOUBLE DAGGER ('‡')
  0x88    U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT ('ˆ')
  0x89    U+2030 PER MILLE SIGN ('‰')
  0x8A    U+0160 LATIN CAPITAL LETTER S WITH CARON ('Š')
  0x8B    U+2039 SINGLE LEFT-POINTING ANGLE QUOTATION MARK ('‹')
  0x8C    U+0152 LATIN CAPITAL LIGATURE OE ('Œ')
  0x8D    U+FFFD REPLACEMENT CHARACTER
  0x8E    U+017D LATIN CAPITAL LETTER Z WITH CARON ('Ž')
  0x8F    U+FFFD REPLACEMENT CHARACTER
  0x90    U+FFFD REPLACEMENT CHARACTER
  0x91    U+2018 LEFT SINGLE QUOTATION MARK ('‘')
  0x92    U+2019 RIGHT SINGLE QUOTATION MARK ('’')
  0x93    U+201C LEFT DOUBLE QUOTATION MARK ('“')
  0x94    U+201D RIGHT DOUBLE QUOTATION MARK ('”')
  0x95    U+2022 BULLET ('•')
  0x96    U+2013 EN DASH ('–')
  0x97    U+2014 EM DASH ('—')
  0x98    U+02DC SMALL TILDE ('˜')
  0x99    U+2122 TRADE MARK SIGN ('™')
  0x9A    U+0161 LATIN SMALL LETTER S WITH CARON ('š')
  0x9B    U+203A SINGLE RIGHT-POINTING ANGLE QUOTATION MARK ('›')
  0x9C    U+0153 LATIN SMALL LIGATURE OE ('œ')
  0x9D    U+FFFD REPLACEMENT CHARACTER
  0x9E    U+017E LATIN SMALL LETTER Z WITH CARON ('ž')
  0x9F    U+0178 LATIN CAPITAL LETTER Y WITH DIAERESIS ('Ÿ')

  Otherwise, if the number is in the range 0x0000 to 0x0008, 0x007F to 0x009F, 0xD800 to 0xDFFF, 0xFDD0 to 0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or 0x10FFFF, or is higher than 0x10FFFF, then this is a parse error; return a character token for the U+FFFD REPLACEMENT CHARACTER character instead.
  Otherwise, return a character token for the Unicode character whose code point is that number.
- Anything else
  Consume the maximum number of characters possible, with the consumed characters matching one of the identifiers in the first column of the named character references table (in a case-sensitive manner).

  If no match can be made, then this is a parse error. No characters are consumed, and nothing is returned.

  If the last character matched is not a U+003B SEMICOLON (;), there is a parse error.

  If the character reference is being consumed as part of an attribute, and the last character matched is not a U+003B SEMICOLON (;), and the next character is in the range U+0030 DIGIT ZERO to U+0039 DIGIT NINE, U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z, or U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z, then, for historical reasons, all the characters that were matched after the U+0026 AMPERSAND (&) must be unconsumed, and nothing is returned.

  Otherwise, return a character token for the character corresponding to the character reference name (as given by the second column of the named character references table).

  If the markup contains "I'm &notit; I tell you", the character reference is parsed as "not", as in, "I'm ¬it; I tell you". But if the markup was "I'm &notin; I tell you", the character reference would be parsed as "notin;", resulting in "I'm ∉ I tell you".

9.2.5 Tree construction

The input to the tree construction stage is a sequence of tokens from the tokenization stage. The tree construction stage is associated with a DOM Document object when a parser is created. The "output" of this stage consists of dynamically modifying or extending that document's DOM tree.

This specification does not define when an interactive user agent has to render the Document so that it is available to the user, or when it has to begin accepting user input.
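Returning to 9.2.4.37, the numeric and named character reference steps can be sketched in Python. This is an illustration under stated assumptions:

- The text after the U+0026 AMPERSAND is an in-memory string.
- The function names and error strings are hypothetical.
- The caller handles the early "not a character reference" cases (whitespace, <, &, EOF, the additional allowed character).
- NAMED_SUBSET is only a small excerpt of the named character references table. That table is not reproduced in this section.

```python
import string
from typing import List, Optional, Tuple

# 9.2.4.37: numbers that are remapped (each such number is a parse error).
NUMERIC_REPLACEMENTS = {
    0x0D: 0x000A, 0x80: 0x20AC, 0x81: 0xFFFD, 0x82: 0x201A, 0x83: 0x0192,
    0x84: 0x201E, 0x85: 0x2026, 0x86: 0x2020, 0x87: 0x2021, 0x88: 0x02C6,
    0x89: 0x2030, 0x8A: 0x0160, 0x8B: 0x2039, 0x8C: 0x0152, 0x8D: 0xFFFD,
    0x8E: 0x017D, 0x8F: 0xFFFD, 0x90: 0xFFFD, 0x91: 0x2018, 0x92: 0x2019,
    0x93: 0x201C, 0x94: 0x201D, 0x95: 0x2022, 0x96: 0x2013, 0x97: 0x2014,
    0x98: 0x02DC, 0x99: 0x2122, 0x9A: 0x0161, 0x9B: 0x203A, 0x9C: 0x0153,
    0x9D: 0xFFFD, 0x9E: 0x017E, 0x9F: 0x0178,
}

# Illustrative excerpt only; the real named character references table is far larger.
NAMED_SUBSET = {"amp": "&", "amp;": "&", "not": "\u00AC", "not;": "\u00AC", "notin;": "\u2209"}
MAX_NAME_LEN = max(len(name) for name in NAMED_SUBSET)

Result = Tuple[Optional[str], int, List[str]]


def is_disallowed(n: int) -> bool:
    """Numbers that become U+FFFD once the remapping table has been checked."""
    return (
        n <= 0x0008
        or 0x007F <= n <= 0x009F
        or 0xD800 <= n <= 0xDFFF
        or 0xFDD0 <= n <= 0xFDEF
        or n == 0x000B
        or (n & 0xFFFE) == 0xFFFE  # U+xFFFE and U+xFFFF in every plane
        or n > 0x10FFFF
    )


def consume_numeric_reference(text: str, pos: int) -> Result:
    """text[pos] is the U+0023 NUMBER SIGN; returns (char or None, new_pos, errors)."""
    errors: List[str] = []
    start = pos
    pos += 1  # consume '#'
    if pos < len(text) and text[pos] in "xX":
        pos += 1  # consume the X
        digits, base = string.hexdigits, 16
    else:
        digits, base = string.digits, 10
    end = pos
    while end < len(text) and text[end] in digits:
        end += 1
    if end == pos:
        # No digits: unconsume '#' (and X); nothing is returned.
        errors.append("no digits")
        return None, start, errors
    number = int(text[pos:end], base)
    pos = end
    if pos < len(text) and text[pos] == ";":
        pos += 1
    else:
        errors.append("missing semicolon")
    if number in NUMERIC_REPLACEMENTS:
        errors.append("remapped code point")
        return chr(NUMERIC_REPLACEMENTS[number]), pos, errors
    if is_disallowed(number):
        errors.append("disallowed code point")
        return "\uFFFD", pos, errors
    return chr(number), pos, errors


def consume_named_reference(text: str, pos: int, in_attribute: bool = False) -> Result:
    """Longest case-sensitive match against the (excerpted) table at text[pos]."""
    errors: List[str] = []
    for length in range(min(len(text) - pos, MAX_NAME_LEN), 0, -1):
        name = text[pos:pos + length]
        if name in NAMED_SUBSET:
            break
    else:
        errors.append("no match")
        return None, pos, errors
    end = pos + len(name)
    if not name.endswith(";"):
        errors.append("missing semicolon")
        nxt = text[end:end + 1]
        if in_attribute and nxt.isascii() and nxt.isalnum():
            return None, pos, errors  # historical rule: unconsume everything
    return NAMED_SUBSET[name], end, errors
```

With this sketch, "notit;" resolves to U+00AC NOT SIGN after consuming three characters. "notin;" resolves to U+2209 NOT AN ELEMENT OF, which matches the example in the text.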
As each token is emitted from the tokenizer, the user agent must process the token according to the rules given in the section corresponding to the current insertion mode.

When the steps below require the UA to insert a character into a node, the UA must check the child immediately before where the character is to be inserted. If that child is a Text node, and that Text node was the last node that the parser inserted into the document, then the character must be appended to that Text node. Otherwise, a new Text node whose data is just that character must be inserted in the appropriate place.

Here are some sample inputs to the parser and the corresponding number of text nodes that they result in, assuming a user agent that executes scripts.

Input:
    A<script>
    var script = document.getElementsByTagName('script')[0];
    document.body.removeChild(script);
    </script>B
Number of text nodes: Two adjacent text nodes in the document, containing "A" and "B".

Input:
    A<script>
    var text = document.createTextNode('B');
    document.body.appendChild(text);
    </script>C
Number of text nodes: Four text nodes: "A" before the script, the script's contents, "B" after the script, and then, immediately after that, "C".

Input:
    A<script>
    var text = document.getElementsByTagName('script')[0].firstChild;
    text.data = 'B';
    document.body.appendChild(text);
    </script>B
Number of text nodes: Two adjacent text nodes in the document, containing "A" and "BB".

Input:
    A<table>B<tr>C</tr>C</table>
Number of text nodes: Three adjacent text nodes before the table, containing "A", "B", and "CC" respectively. (This is caused by foster parenting.)

Input:
    A<table><tr> B</tr> B</table>
Number of text nodes: Two adjacent text nodes before the table, containing "A" and "B B" respectively, and one text node inside the table with a single space character. (This is caused by foster parenting and tainting.)

DOM mutation events must not fire for changes caused by the UA parsing the document. (Conceptually, the parser is not mutating the DOM, it is constructing it.)
This includes the parsing of any content inserted using document.write() and document.writeln() calls. [DOM3EVENTS]

Not all of the tag names mentioned below are conformant tag names in this specification; many are included to handle legacy content. They still form part of the algorithm that implementations are required to implement to claim conformance.

The algorithm described below places no limit on the depth of the DOM tree generated, or on the length of tag names, attribute names, attribute values, text nodes, etc. While implementors are encouraged to avoid arbitrary limits, it is recognized that practical concerns will likely force user agents to impose nesting depth limits.

9.2.5.1 Creating and inserting elements

When the steps below require the UA to create an element for a token in a particular namespace, the UA must create a node implementing the interface appropriate for the element type corresponding to the tag name of the token in the given namespace. The interface is the one given in the specification that defines that element; for example, for an a element in the HTML namespace, this specification defines it to be the HTMLAnchorElement interface. The node must have the tag name of that element, must be in the given namespace, and must carry the attributes given in the token.

The interface appropriate for an element in the HTML namespace that is not defined in this specification is HTMLElement. Elements in other namespaces whose interface is not defined by that namespace's specification must use the interface Element.

When a resettable element is created in this manner, its reset algorithm must be invoked once the attributes are set. (This initializes the element's value and checkedness based on the element's attributes.)
When the steps below require the UA to insert an HTML element for a token, the UA must first create an element for the token in the HTML namespace, then append this node to the current node, and push it onto the stack of open elements so that it is the new current node.

The steps below may also require that the UA insert an HTML element in a particular place. In that case the UA must follow the same steps, except that it must insert or append the new node in the location specified instead of appending it to the current node. (This happens in particular during the parsing of tables with invalid content.)

Consider an element created by the insert an HTML element algorithm that is a form-associated element, where the form element pointer is not null and the newly created element doesn't have a form attribute. The user agent must associate such an element with the form element pointed to by the form element pointer before inserting it wherever it is to be inserted.

When the steps below require the UA to insert a foreign element for a token, the UA must first create an element for the token in the given namespace, then append this node to the current node, and push it onto the stack of open elements so that it is the new current node. If the newly created element has an xmlns attribute in the XMLNS namespace whose value is not exactly the same as the element's namespace, that is a parse error. Similarly, if the newly created element has an xmlns:xlink attribute in the XMLNS namespace whose value is not the XLink namespace, that is a parse error.

When the steps below require the user agent to adjust MathML attributes for a token, then, if the token has an attribute named definitionurl, change its name to definitionURL (note the case difference).
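The Text-node merging rule at the start of 9.2.5 and the insert an HTML element steps can be illustrated with a toy tree. This minimal sketch has these limits:

- It uses hypothetical Node and TreeBuilder classes rather than a real DOM.
- It only models appending at the end of the current node. Foster parenting, the form element pointer, and namespaces are not covered.
- It tracks "the last node that the parser inserted" in an explicit field.

Its main block replays the first sample input from 9.2.5.

```python
from typing import List, Optional


class Node:
    """Hypothetical minimal DOM node; name is "#text" for Text nodes."""

    def __init__(self, name: str, data: str = "") -> None:
        self.name = name
        self.data = data
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def append(self, child: "Node") -> None:
        child.parent = self
        self.children.append(child)

    def remove(self, child: "Node") -> None:
        self.children.remove(child)
        child.parent = None


class TreeBuilder:
    def __init__(self) -> None:
        self.document = Node("#document")
        self.open_elements: List[Node] = []  # the stack of open elements
        self.last_inserted: Optional[Node] = None

    @property
    def current_node(self) -> Node:
        return self.open_elements[-1] if self.open_elements else self.document

    def insert_character(self, ch: str) -> None:
        parent = self.current_node
        before = parent.children[-1] if parent.children else None
        # Merge only into a Text node that the parser itself inserted last.
        if before is not None and before.name == "#text" and before is self.last_inserted:
            before.data += ch
        else:
            text = Node("#text", ch)
            parent.append(text)
            self.last_inserted = text

    def insert_html_element(self, tag_name: str) -> Node:
        element = Node(tag_name)
        self.current_node.append(element)
        self.open_elements.append(element)  # becomes the new current node
        self.last_inserted = element
        return element

    def pop(self) -> Node:
        return self.open_elements.pop()


if __name__ == "__main__":
    # First sample input: a script removes itself between "A" and "B".
    tb = TreeBuilder()
    body = tb.insert_html_element("body")
    tb.insert_character("A")
    script = tb.insert_html_element("script")
    tb.insert_character("x")  # stands in for the script's contents
    tb.pop()
    body.remove(script)  # what the script does when it runs
    tb.insert_character("B")
    print([child.data for child in body.children])  # ['A', 'B']
```

"B" is not merged into "A" because the last node the parser inserted was inside the script element. This explains why the first sample yields two adjacent text nodes rather than one.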
When the steps below require the user agent to adjust SVG attributes for a token, then, for each attribute on the token whose attribute name is one of the ones in the first column of the following table, change the attribute's name to the name given in the corresponding cell in the second column. (This fixes the case of SVG attributes that are not all lowercase.)

Attribute name on token     Attribute name on element
attributename               attributeName
attributetype               attributeType
basefrequency               baseFrequency
baseprofile                 baseProfile
calcmode                    calcMode
clippathunits               clipPathUnits
contentscripttype           contentScriptType
contentstyletype            contentStyleType
diffuseconstant             diffuseConstant
edgemode                    edgeMode
externalresourcesrequired   externalResourcesRequired
filterres                   filterRes
filterunits                 filterUnits
glyphref                    glyphRef
gradienttransform           gradientTransform
gradientunits               gradientUnits
kernelmatrix                kernelMatrix
kernelunitlength            kernelUnitLength
keypoints                   keyPoints
keysplines                  keySplines
keytimes                    keyTimes
lengthadjust                lengthAdjust
limitingconeangle           limitingConeAngle
markerheight                markerHeight
markerunits                 markerUnits
markerwidth                 markerWidth
maskcontentunits            maskContentUnits
maskunits                   maskUnits
numoctaves                  numOctaves
pathlength                  pathLength
patterncontentunits         patternContentUnits
patterntransform            patternTransform
patternunits                patternUnits
pointsatx                   pointsAtX
pointsaty                   pointsAtY
pointsatz                   pointsAtZ
preservealpha               preserveAlpha
preserveaspectratio         preserveAspectRatio
primitiveunits              primitiveUnits
refx                        refX
refy                        refY
repeatcount                 repeatCount
repeatdur                   repeatDur
requiredextensions          requiredExtensions
requiredfeatures            requiredFeatures
specularconstant            specularConstant
specularexponent            specularExponent
spreadmethod                spreadMethod
startoffset                 startOffset
stddeviation                stdDeviation
stitchtiles                 stitchTiles
surfacescale                surfaceScale
systemlanguage              systemLanguage
tablevalues                 tableValues
targetx                     targetX
targety                     targetY
textlength                  textLength
viewbox                     viewBox
viewtarget                  viewTarget
xchannelselector            xChannelSelector
ychannelselector            yChannelSelector
zoomandpan                  zoomAndPan

When the steps below require the user agent to adjust foreign attributes for a token, then, if any of the attributes on the token match the strings given in the first column of the following table, let the attribute be a namespaced attribute, with the prefix being the string given in the corresponding cell in the second column, the local name being the string given in the corresponding cell in the third column, and the namespace being the namespace given in the corresponding cell in the fourth column. (This fixes the use of namespaced attributes, in particular xml:lang.)

Attribute name  Prefix  Local name  Namespace
xlink:actuate   xlink   actuate     XLink namespace
xlink:arcrole   xlink   arcrole     XLink namespace
xlink:href      xlink   href        XLink namespace
xlink:role      xlink   role        XLink namespace
xlink:show      xlink   show        XLink namespace
xlink:title     xlink   title       XLink namespace
xlink:type      xlink   type        XLink namespace
xml:base        xml     base        XML namespace
xml:lang        xml     lang        XML namespace
xml:space       xml     space       XML namespace
xmlns           (none)  xmlns       XMLNS namespace
xmlns:xlink     xmlns   xlink       XMLNS namespace

The generic CDATA element parsing algorithm and the generic RCDATA element parsing algorithm consist of the following steps. These algorithms are always invoked in response to a start tag token.

1. Insert an HTML element for the token.
2. If the algorithm that was invoked is the generic CDATA element parsing algorithm, switch the tokenizer's content model flag to the CDATA state. Otherwise, the algorithm invoked was the generic RCDATA element parsing algorithm, so switch the tokenizer's content model flag to the RCDATA state.
3. Let the original insertion mode be the current insertion mode.
4. Then, switch the insertion mode to "in CDATA/RCDATA".
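The MathML, SVG, and foreign attribute adjustments above are plain table lookups. A minimal Python sketch might look like the following. It assumes attributes arrive as an ordered dict mapping name to value, which is a representation chosen for illustration. The namespace URIs are the standard XLink, XML, and XMLNS namespace names. The SVG mapping is derived from the second column of its table, since every first-column entry there is the all-lowercase form of its second-column entry.

```python
from typing import Dict, List, Optional, Tuple

XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# Second column of the SVG table; the first column is each name lowercased.
SVG_CAMEL_CASE = """
attributeName attributeType baseFrequency baseProfile calcMode clipPathUnits
contentScriptType contentStyleType diffuseConstant edgeMode
externalResourcesRequired filterRes filterUnits glyphRef gradientTransform
gradientUnits kernelMatrix kernelUnitLength keyPoints keySplines keyTimes
lengthAdjust limitingConeAngle markerHeight markerUnits markerWidth
maskContentUnits maskUnits numOctaves pathLength patternContentUnits
patternTransform patternUnits pointsAtX pointsAtY pointsAtZ preserveAlpha
preserveAspectRatio primitiveUnits refX refY repeatCount repeatDur
requiredExtensions requiredFeatures specularConstant specularExponent
spreadMethod startOffset stdDeviation stitchTiles surfaceScale systemLanguage
tableValues targetX targetY textLength viewBox viewTarget xChannelSelector
yChannelSelector zoomAndPan
""".split()
SVG_ATTRIBUTE_FIXUPS = {name.lower(): name for name in SVG_CAMEL_CASE}

# Foreign attributes table: token attribute name -> (prefix, local name, namespace).
FOREIGN_ATTRIBUTES: Dict[str, Tuple[Optional[str], str, str]] = {
    **{f"xlink:{local}": ("xlink", local, XLINK_NS)
       for local in ("actuate", "arcrole", "href", "role", "show", "title", "type")},
    **{f"xml:{local}": ("xml", local, XML_NS) for local in ("base", "lang", "space")},
    "xmlns": (None, "xmlns", XMLNS_NS),
    "xmlns:xlink": ("xmlns", "xlink", XMLNS_NS),
}


def adjust_mathml_attributes(attrs: Dict[str, str]) -> Dict[str, str]:
    # Only definitionurl needs its case fixed.
    return {("definitionURL" if k == "definitionurl" else k): v for k, v in attrs.items()}


def adjust_svg_attributes(attrs: Dict[str, str]) -> Dict[str, str]:
    # Names not in the table are left unchanged.
    return {SVG_ATTRIBUTE_FIXUPS.get(k, k): v for k, v in attrs.items()}


def adjust_foreign_attributes(
    attrs: Dict[str, str]
) -> List[Tuple[Optional[str], str, Optional[str], str]]:
    """Return (prefix, local name, namespace, value); unmatched names stay un-namespaced."""
    out = []
    for name, value in attrs.items():
        prefix, local, ns = FOREIGN_ATTRIBUTES.get(name, (None, name, None))
        out.append((prefix, local, ns, value))
    return out
```

Deriving the SVG keys with lower() keeps the code and the table from drifting apart. However, this only works because every row of that table has that shape.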
9.2.5.2 Closing elements that have implied end tags

When the steps below require the UA to generate implied end tags, then, while the current node is a dd element, a dt element, an li element, an option element, an optgroup element, a p element, an rp element, or an rt element, the UA must pop the current node off the stack of open elements.

If a step requires the UA to generate implied end tags but lists an element to exclude from the process, then the UA must perform the above steps as if that element was not in the above list.

9.2.5.3 Foster parenting

Foster parenting happens when content is misnested in tables.

When a node (call it node) is to be foster parented, node must be inserted into the foster parent element, and the current table must be marked as tainted. (Once the current table has been tainted, whitespace characters are inserted into the foster parent element instead of the current node.)

The foster parent element is determined as follows:

- If there is a table element in the stack of open elements and the last such table element has a parent element, the foster parent element is that parent element.
- If there is no table element in the stack of open elements (fragment case), the foster parent element is the first element in the stack of open elements (the html element).
- Otherwise, the last table element in the stack of open elements has no parent, or its parent node is not an element. In that case the foster parent element is the element before the last table element in the stack of open elements.

If the foster parent element is the parent element of the last table element in the stack of open elements, then node must be inserted into the foster parent element immediately before that last table element. Otherwise, node must be appended to the foster parent element.
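Generating implied end tags and locating the foster parent are both simple walks over the stack of open elements. The sketch below is a hypothetical Python model with these simplifications:

- Element is a stand-in for DOM elements.
- Every parent in the model is an Element, so the "parent is not an element" case collapses into "no parent".
- Table tainting is not modelled.

```python
from typing import List, Optional


class Element:
    """Hypothetical minimal element; parent is None or another Element."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional["Element"] = None
        self.children: List["Element"] = []

    def append(self, child: "Element") -> None:
        child.parent = self
        self.children.append(child)

    def insert_before(self, child: "Element", reference: "Element") -> None:
        child.parent = self
        self.children.insert(self.children.index(reference), child)


IMPLIED_END_TAGS = frozenset({"dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"})


def generate_implied_end_tags(stack: List[Element], exclude: Optional[str] = None) -> None:
    # Pop while the current node is in the list; an excluded name stops the loop.
    while stack and stack[-1].name in IMPLIED_END_TAGS and stack[-1].name != exclude:
        stack.pop()


def foster_parent(stack: List[Element], node: Element) -> None:
    """Insert node at the foster-parenting location (tainting not modelled)."""
    table_indexes = [i for i, el in enumerate(stack) if el.name == "table"]
    if not table_indexes:
        stack[0].append(node)  # fragment case: the html element
        return
    last = table_indexes[-1]
    table = stack[last]
    if table.parent is not None:
        # Insert immediately before the table, inside the table's parent.
        table.parent.insert_before(node, table)
    else:
        # Table has no parent: append to the element before it in the stack.
        stack[last - 1].append(node)
```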
9.2.5.4 The "initial" insertion mode

When the insertion mode is "initial", tokens must be handled as follows:

- A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
  Ignore the token.
- A comment token
  Append a Comment node to the Document object with the data attribute set to the data given in the comment token.
- A DOCTYPE token
  A parse error (the DOCTYPE parse error) occurs if any of the following is true:

    - The DOCTYPE token's name is not a case-sensitive match for the string "html".
    - The token's public identifier is not missing.
    - The token's system identifier is neither missing nor a case-sensitive match for the string "about:legacy-compat".

  Conformance checkers may, instead of reporting this error, switch to a conformance checking mode for another language. For example, based on the DOCTYPE token a conformance checker could recognize that the document is an HTML4-era document, and defer to an HTML4 conformance checker.

  Append a DocumentType node to the Document node, with its attributes set as follows:

    - The name attribute is set to the name given in the DOCTYPE token, or the empty string if the name was missing.
    - The publicId attribute is set to the public identifier given in the DOCTYPE token, or the empty string if the public identifier was missing.
    - The systemId attribute is set to the system identifier given in the DOCTYPE token, or the empty string if the system identifier was missing.
    - The other attributes specific to DocumentType objects are set to null and empty lists as appropriate.

  Associate the DocumentType node with the Document object so that it is returned as the value of the doctype attribute of the Document object.

  Then, if the DOCTYPE token matches one of the conditions in the following list, then set the document to quirks mode:

    - The force-quirks flag is set to on.
    - The name is set to anything other than "HTML".
    - The public identifier starts with: "+//Silmaril//dtd html Pro v0r11 19970101//"
    - The public identifier starts with: "-//AdvaSoft Ltd//DTD HTML 3.0 asWedit + extensions//"
    - The public identifier starts with: "-//AS//DTD HTML 3.0 asWedit + extensions//"
    - The public identifier starts with: "-//IETF//DTD HTML 2.0 Level 1//"
    - The public identifier starts with: "-//IETF//DTD HTML 2.0 Level 2//"
    - The public identifier starts with: "-//IETF//DTD HTML 2.0 Strict Level 1//"
    - The public identifier starts with: "-//IETF//DTD HTML 2.0 Strict Level 2//"
    - The public identifier starts with: "-//IETF//DTD HTML 2.0 Strict//"
    - The public identifier starts with: "-//IETF//DTD HTML 2.0//"
    - The public identifier starts with: "-//IETF//DTD HTML 2.1E//"
    - The public identifier starts with: "-//IETF//DTD HTML 3.0//"
    - The public identifier starts with: "-//IETF//DTD HTML 3.2 Final//"
    - The public identifier starts with: "-//IETF//DTD HTML 3.2//"
    - The public identifier starts with: "-//IETF//DTD HTML 3//"
    - The public identifier starts with: "-//IETF//DTD HTML Level 0//"
    - The public identifier starts with: "-//IETF//DTD HTML Level 1//"
    - The public identifier starts with: "-//IETF//DTD HTML Level 2//"
    - The public identifier starts with: "-//IETF//DTD HTML Level 3//"
    - The public identifier starts with: "-//IETF//DTD HTML Strict Level 0//"
    - The public identifier starts with: "-//IETF//DTD HTML Strict Level 1//"
    - The public identifier starts with: "-//IETF//DTD HTML Strict Level 2//"
    - The public identifier starts with: "-//IETF//DTD HTML Strict Level 3//"
    - The public identifier starts with: "-//IETF//DTD HTML Strict//"
    - The public identifier starts with: "-//IETF//DTD HTML//"
    - The public identifier starts with: "-//Metrius//DTD Metrius Presentational//"
    - The public identifier starts with: "-//Microsoft//DTD Internet Explorer 2.0 HTML Strict//"
    - The public identifier starts with: "-//Microsoft//DTD Internet Explorer 2.0 HTML//"
    - The public identifier starts with: "-//Microsoft//DTD Internet Explorer 2.0 Tables//"
    - The public identifier starts with: "-//Microsoft//DTD Internet Explorer 3.0 HTML Strict//"
    - The public identifier starts with: "-//Microsoft//DTD Internet Explorer 3.0 HTML//"
    - The public identifier starts with: "-//Microsoft//DTD Internet Explorer 3.0 Tables//"
    - The public identifier starts with: "-//Netscape Comm. Corp.//DTD HTML//"
    - The public identifier starts with: "-//Netscape Comm. Corp.//DTD Strict HTML//"
    - The public identifier starts with: "-//O'Reilly and Associates//DTD HTML 2.0//"
    - The public identifier starts with: "-//O'Reilly and Associates//DTD HTML Extended 1.0//"
    - The public identifier starts with: "-//O'Reilly and Associates//DTD HTML Extended Relaxed 1.0//"
    - The public identifier starts with: "-//SoftQuad Software//DTD HoTMetaL PRO 6.0::19990601::extensions to HTML 4.0//"
    - The public identifier starts with: "-//SoftQuad//DTD HoTMetaL PRO 4.0::19971010::extensions to HTML 4.0//"
    - The public identifier starts with: "-//Spyglass//DTD HTML 2.0 Extended//"
    - The public identifier starts with: "-//SQ//DTD HTML 2.0 HoTMetaL + extensions//"
    - The public identifier starts with: "-//Sun Microsystems Corp.//DTD HotJava HTML//"
    - The public identifier starts with: "-//Sun Microsystems Corp.//DTD HotJava Strict HTML//"
    - The public identifier starts with: "-//W3C//DTD HTML 3 1995-03-24//"
    - The public identifier starts with: "-//W3C//DTD HTML 3.2 Draft//"
    - The public identifier starts with: "-//W3C//DTD HTML 3.2 Final//"
    - The public identifier starts with: "-//W3C//DTD HTML 3.2//"
    - The public identifier starts with: "-//W3C//DTD HTML 3.2S Draft//"
    - The public identifier starts with: "-//W3C//DTD HTML 4.0 Frameset//"
    - The public identifier starts with: "-//W3C//DTD HTML 4.0 Transitional//"
    - The public identifier starts with: "-//W3C//DTD HTML Experimental 19960712//"
    - The public identifier starts with: "-//W3C//DTD HTML Experimental 970421//"
    - The public identifier starts with: "-//W3C//DTD W3 HTML//"
    - The public identifier starts with: "-//W3O//DTD W3 HTML 3.0//"
    - The public identifier is set to: "-//W3O//DTD W3 HTML Strict 3.0//EN//"
    - The public identifier starts with: "-//WebTechs//DTD Mozilla HTML 2.0//"
    - The public identifier starts with: "-//WebTechs//DTD Mozilla HTML//"
    - The public identifier is set to: "-/W3C/DTD HTML 4.0 Transitional/EN"
    - The public identifier is set to: "HTML"
    - The system identifier is set to: "http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd"
    - The system identifier is missing and the public identifier starts with: "-//W3C//DTD HTML 4.01 Frameset//"
    - The system identifier is missing and the public identifier starts with: "-//W3C//DTD HTML 4.01 Transitional//"

  Otherwise, if the DOCTYPE token matches one of the conditions in the following list, then set the document to limited quirks mode:

    - The public identifier starts with: "-//W3C//DTD XHTML 1.0 Frameset//"
    - The public identifier starts with: "-//W3C//DTD XHTML 1.0 Transitional//"
    - The system identifier is not missing and the public identifier starts with: "-//W3C//DTD HTML 4.01 Frameset//"
    - The system identifier is not missing and the public identifier starts with: "-//W3C//DTD HTML 4.01 Transitional//"

  The name, system identifier, and public identifier strings must be compared to the values given in the lists above in an ASCII case-insensitive manner. A system identifier whose value is the empty string is not considered missing for the purposes of the conditions above.

  Then, switch the insertion mode to "before html".
- Anything else
  Parse error. Set the document to quirks mode. Switch the insertion mode to "before html", then reprocess the current token.

9.2.5.5 The "before html" insertion mode

When the insertion mode is "before html", tokens must be handled as follows:

- A DOCTYPE token
  Parse error. Ignore the token.
- A comment token
  Append a Comment node to the Document object with the data attribute set to the data given in the comment token.
- A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
  Ignore the token.
- A start tag whose tag name is "html"
  Create an element for the token in the HTML namespace. Append it to the Document object. Put this element in the stack of open elements.

  If the Document is being loaded as part of navigation of a browsing context, then the application cache selection algorithm must be run, passing it the Document object:

    - If the newly created element has a manifest attribute, resolve the value of that attribute to an absolute URL, relative to the newly created element. If that is successful, run the application cache selection algorithm with the resulting absolute URL.
    - Otherwise, if there is no such attribute or resolving it fails, run the application cache selection algorithm with no manifest.

  Switch the insertion mode to "before head".
- Anything else
  Create an html element. Append it to the Document object. Put this element in the stack of open elements.

  If the Document is being loaded as part of navigation of a browsing context, then run the application cache selection algorithm with no manifest, passing it the Document object.

  Switch the insertion mode to "before head", then reprocess the current token.

  Should probably make end tags be ignored, so that "</head><!-- --><html>" puts the comment before the root node (or should we?)

The root element can end up being removed from the Document object, e.g. by scripts. Nothing in particular happens in such cases; content continues being appended to the nodes as described in the next section.
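The DOCTYPE checks of the "initial" insertion mode (9.2.5.4) reduce to ASCII case-insensitive exact and prefix comparisons. The following Python sketch assumes None stands for a missing field and returns hypothetical mode labels. To keep it short, QUIRKS_PREFIXES holds only an excerpt of the "starts with" conditions; a faithful implementation would list all of them. The remaining conditions are reproduced in full.

```python
import string
from typing import Optional

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s: str) -> str:
    # Folds only A-Z, as "ASCII case-insensitive" requires.
    return s.translate(_ASCII_LOWER)


# EXCERPT ONLY: a few of the "starts with" prefixes from the quirks list.
QUIRKS_PREFIXES = tuple(ascii_lower(p) for p in (
    "+//Silmaril//dtd html Pro v0r11 19970101//",
    "-//IETF//DTD HTML 2.0//",
    "-//W3C//DTD HTML 3.2 Final//",
    "-//W3C//DTD HTML 4.0 Transitional//",
    "-//WebTechs//DTD Mozilla HTML//",
))
QUIRKS_EXACT = tuple(ascii_lower(p) for p in (
    "-//W3O//DTD W3 HTML Strict 3.0//EN//",
    "-/W3C/DTD HTML 4.0 Transitional/EN",
    "HTML",
))
QUIRKS_SYSTEM_ID = ascii_lower("http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd")
HTML401 = tuple(ascii_lower(p) for p in (
    "-//W3C//DTD HTML 4.01 Frameset//", "-//W3C//DTD HTML 4.01 Transitional//"))
XHTML10 = tuple(ascii_lower(p) for p in (
    "-//W3C//DTD XHTML 1.0 Frameset//", "-//W3C//DTD XHTML 1.0 Transitional//"))


def document_mode(name: Optional[str], public_id: Optional[str],
                  system_id: Optional[str], force_quirks: bool) -> str:
    """Return "quirks", "limited-quirks" or "no-quirks"; None means missing."""
    pub = ascii_lower(public_id) if public_id is not None else None
    sys_id = ascii_lower(system_id) if system_id is not None else None
    if force_quirks:
        return "quirks"
    if name is not None and ascii_lower(name) != "html":
        return "quirks"
    if pub is not None and (pub.startswith(QUIRKS_PREFIXES) or pub in QUIRKS_EXACT):
        return "quirks"
    if sys_id == QUIRKS_SYSTEM_ID:
        return "quirks"
    if sys_id is None and pub is not None and pub.startswith(HTML401):
        return "quirks"
    if pub is not None and pub.startswith(XHTML10):
        return "limited-quirks"
    # An empty system identifier is not "missing", so it selects limited quirks here.
    if sys_id is not None and pub is not None and pub.startswith(HTML401):
        return "limited-quirks"
    return "no-quirks"
```

The HTML 4.01 Transitional public identifier shows why "missing" must be distinguished from "empty". Without a system identifier it triggers quirks mode. With any system identifier, even an empty one, it triggers limited quirks mode.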
9.2.5.6 The "before head" insertion mode

When the insertion mode is "before head", tokens must be handled as follows:

- A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
  Ignore the token.
- A comment token
  Append a Comment node to the current node with the data attribute set to the data given in the comment token.
- A DOCTYPE token
  Parse error. Ignore the token.
- A start tag whose tag name is "html"
  Process the token using the rules for the "in body" insertion mode.
- A start tag whose tag name is "head"
  Insert an HTML element for the token. Set the head element pointer to the newly created head element. Switch the insertion mode to "in head".
- An end tag whose tag name is one of: "head", "body", "html", "br"
  Act as if a start tag token with the tag name "head" and no attributes had been seen, then reprocess the current token.
- Any other end tag
  Parse error. Ignore the token.
- Anything else
  Act as if a start tag token with the tag name "head" and no attributes had been seen, then reprocess the current token.

  This will result in an empty head element being generated, with the current token being reprocessed in the "after head" insertion mode.

9.2.5.7 The "in head" insertion mode

When the insertion mode is "in head", tokens must be handled as follows:

- A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
  Insert the character into the current node.
- A comment token
  Append a Comment node to the current node with the data attribute set to the data given in the comment token.
- A DOCTYPE token
  Parse error. Ignore the token.
- A start tag whose tag name is "html"
  Process the token using the rules for the "in body" insertion mode.
- A start tag whose tag name is one of: "base", "command", "link"
  Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
  Acknowledge the token's self-closing flag, if it is set.
- A start tag whose tag name is "meta"
  Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.

  Acknowledge the token's self-closing flag, if it is set.

  If the element has a charset attribute, and its value is a supported encoding, and the confidence is currently tentative, then change the encoding to the encoding given by the value of the charset attribute. Otherwise, check whether all of the following hold:

    - The element has a content attribute.
    - Applying the algorithm for extracting an encoding from a Content-Type to that attribute's value returns a supported encoding (call it encoding).
    - The confidence is currently tentative.

  If they do, change the encoding to encoding.
- A start tag whose tag name is "title"
  Follow the generic RCDATA element parsing algorithm.
- A start tag whose tag name is "noscript", if the scripting flag is enabled
- A start tag whose tag name is one of: "noframes", "style"
  Follow the generic CDATA element parsing algorithm.
- A start tag whose tag name is "noscript", if the scripting flag is disabled
  Insert an HTML element for the token. Switch the insertion mode to "in head noscript".
- A start tag whose tag name is "script"
  1. Create an element for the token in the HTML namespace.
  2. Mark the element as being "parser-inserted".
     This ensures that, if the script is external, any document.write() calls in the script will execute in-line, instead of blowing the document away, as would happen in most other cases. It also prevents the script from executing until the end tag is seen.
  3. If the parser was originally created for the HTML fragment parsing algorithm, then mark the script element as "already executed". (fragment case)
  4. Append the new element to the current node and push it onto the stack of open elements.
  5. Switch the tokenizer's content model flag to the CDATA state.
  6. Let the original insertion mode be the current insertion mode.
  7. Switch the insertion mode to "in CDATA/RCDATA".
An end tag whose tag name is "head" Pop the current node (which will be the head element) off the stack of open elements . Switch the insertion mode to " after head ". An end tag whose tag name is one of: "body", "html", "br" Act as described in the "anything else" entry below. A start tag whose tag name is "head" Any other end tag Parse error . Ignore the token. Anything else Act as if an end tag token with the tag name "head" had been seen, and reprocess the current token. In certain UAs, some elements don't trigger the "in body" mode straight away, but instead get put into the head. Do we want to copy that? 9.2.5.8 The " in head noscript " insertion mode When the insertion mode is " in head noscript ", tokens must be handled as follows: A DOCTYPE token Parse error . Ignore the token. A start tag whose tag name is "html" Process the token using the rules for the " in body " insertion mode . An end tag whose tag name is "noscript" Pop the current node (which will be a noscript element) from the stack of open elements ; the new current node will be a head element. Switch the insertion mode to " in head ". A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE A comment token A start tag whose tag name is one of: "link", "meta", "noframes", "style" Process the token using the rules for the " in head " insertion mode . An end tag whose tag name is "br" Act as described in the "anything else" entry below. A start tag whose tag name is one of: "head", "noscript" Any other end tag Parse error . Ignore the token. Anything else Parse error . Act as if an end tag with the tag name "noscript" had been seen and reprocess the current token. 
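The "in head noscript" entries are order-sensitive. The "br" end tag must be matched before "any other end tag", and everything unmatched, including end-of-file, falls through to "anything else". Below is a minimal table-of-rules sketch. It assumes a hypothetical Token class with a kind string and returns descriptive labels instead of modifying a tree.

```python
from dataclasses import dataclass

# The four whitespace characters listed in the spec text.
WHITESPACE = {"\t", "\n", "\x0c", " "}

ANYTHING_ELSE = "parse error; act as if </noscript> was seen, then reprocess"


@dataclass
class Token:
    kind: str       # hypothetical kinds: "doctype", "start", "end", "char", "comment", "eof"
    name: str = ""  # tag name, or the character itself for "char" tokens


def in_head_noscript(token: Token) -> str:
    """Return a label describing how "in head noscript" handles the token."""
    if token.kind == "doctype":
        return "parse error; ignore"
    if token.kind == "start" and token.name == "html":
        return "use 'in body' rules"
    if token.kind == "end" and token.name == "noscript":
        return "pop noscript; switch to 'in head'"
    if (
        (token.kind == "char" and token.name in WHITESPACE)
        or token.kind == "comment"
        or (token.kind == "start" and token.name in {"link", "meta", "noframes", "style"})
    ):
        return "use 'in head' rules"
    # </br> must be tested before the generic "any other end tag" entry.
    if token.kind == "end" and token.name == "br":
        return ANYTHING_ELSE
    if (token.kind == "start" and token.name in {"head", "noscript"}) or token.kind == "end":
        return "parse error; ignore"
    return ANYTHING_ELSE
```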
9.2.5.9 The " after head " insertion mode When the insertion mode is " after head ", tokens must be handled as follows: A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE Insert the character into the current node . A comment token Append a Comment node to the current node with the data attribute set to the data given in the comment token. A DOCTYPE token Parse error . Ignore the token. A start tag whose tag name is "html" Process the token using the rules for the " in body " insertion mode . A start tag whose tag name is "body" Insert an HTML element for the token. Set the frameset-ok flag to "not ok". Switch the insertion mode to " in body ". A start tag whose tag name is "frameset" Insert an HTML element for the token. Switch the insertion mode to " in frameset ". A start tag token whose tag name is one of: "base", "link", "meta", "noframes", "script", "style", "title" Parse error . Push the node pointed to by the head element pointer onto the stack of open elements . Process the token using the rules for the " in head " insertion mode . Remove the node pointed to by the head element pointer from the stack of open elements . The head element pointer cannot be null at this point. An end tag whose tag name is one of: "body", "html", "br" Act as described in the "anything else" entry below. A start tag whose tag name is "head" Any other end tag Parse error . Ignore the token. Anything else Act as if a start tag token with the tag name "body" and no attributes had been seen, then set the frameset-ok flag back to "ok", and then reprocess the current token. 9.2.5.10 The " in body " insertion mode When the insertion mode is " in body ", tokens must be handled as follows: A character token Reconstruct the active formatting elements , if any. Insert the token's character into the current node . 
If the token is not one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE, then set the frameset-ok flag to "not ok". A comment token Append a Comment node to the current node with the data attribute set to the data given in the comment token. A DOCTYPE token Parse error . Ignore the token. A start tag whose tag name is "html" Parse error . For each attribute on the token, check to see if the attribute is already present on the top element of the stack of open elements . If it is not, add the attribute and its corresponding value to that element. A start tag token whose tag name is one of: "base", "command", "link", "meta", "noframes", "script", "style", "title" Process the token using the rules for the " in head " insertion mode . A start tag whose tag name is "body" Parse error . If the second element on the stack of open elements is not a body element, or, if the stack of open elements has only one node on it, then ignore the token. ( fragment case ) Otherwise, for each attribute on the token, check to see if the attribute is already present on the body element (the second element) on the stack of open elements . If it is not, add the attribute and its corresponding value to that element. A start tag whose tag name is "frameset" Parse error . If the second element on the stack of open elements is not a body element, or, if the stack of open elements has only one node on it, then ignore the token. ( fragment case ) If the frameset-ok flag is set to "not ok", ignore the token. Otherwise, run the following steps: Remove the second element on the stack of open elements from its parent node, if it has one. Pop all the nodes from the bottom of the stack of open elements , from the current node up to the root html element. Insert an HTML element for the token. Switch the insertion mode to " in frameset ". 
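Two small rules from the "in body" entries above can be shown in isolation. A stray "html" or "body" start tag only adds attributes the existing element lacks and never overwrites one. Any non-whitespace character token sets the frameset-ok flag to "not ok". In this sketch, element attributes are modelled as a plain dict and the flag as a boolean (True meaning "ok"). Both are assumptions of the sketch.

```python
from typing import Dict

WHITESPACE = {"\t", "\n", "\x0c", " "}


def merge_missing_attributes(target: Dict[str, str], token_attributes: Dict[str, str]) -> None:
    """Copy token attributes onto an existing element, never overwriting.

    Models a stray <html> or <body> start tag: only attributes the existing
    element does not already have are added.
    """
    for name, value in token_attributes.items():
        if name not in target:
            target[name] = value


def frameset_ok_after_character(frameset_ok: bool, char: str) -> bool:
    """Any non-whitespace character token sets frameset-ok to "not ok" (False)."""
    return frameset_ok and char in WHITESPACE
```

For instance, merging `{"class": "b", "id": "main"}` into a body whose attributes are `{"class": "a"}` leaves `class` as `"a"` and adds `id`.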
An end-of-file token If there is a node in the stack of open elements that is not either a dd element, a dt element, an li element, a p element, a tbody element, a td element, a tfoot element, a th element, a thead element, a tr element, the body element, or the html element, then this is a parse error . Stop parsing . An end tag whose tag name is "body" If the stack of open elements does not have a body element in scope , this is a parse error ; ignore the token. Otherwise, if there is a node in the stack of open elements that is not either a dd element, a dt element, an li element, an optgroup element, an option element, a p element, an rp element, an rt element, a tbody element, a td element, a tfoot element, a th element, a thead element, a tr element, the body element, or the html element, then this is a parse error . Switch the insertion mode to " after body ". An end tag whose tag name is "html" Act as if an end tag with tag name "body" had been seen, then, if that token wasn't ignored, reprocess the current token. The fake end tag token here can only be ignored in the fragment case . A start tag whose tag name is one of: "address", "article", "aside", "blockquote", "center", "datagrid", "details", "dialog", "dir", "div", "dl", "fieldset", "figure", "footer", "header", "hgroup", "menu", "nav", "ol", "p", "section", "ul" If the stack of open elements has a p element in scope , then act as if an end tag with the tag name "p" had been seen. Insert an HTML element for the token. A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6" If the stack of open elements has a p element in scope , then act as if an end tag with the tag name "p" had been seen. If the current node is an element whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6", then this is a parse error ; pop the current node off the stack of open elements . Insert an HTML element for the token. 
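The "h1" to "h6" start-tag entry closes an open p and then refuses to nest headings directly. Here is a minimal sketch on a stack of tag names. It rests on two simplifications. First, SCOPE_BOUNDARIES is a hypothetical, abbreviated set, because the real definition of "in scope" lives elsewhere in the specification. Second, "act as if an end tag p had been seen" is reduced to popping until a p is popped.

```python
from typing import List

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
# Hypothetical, abbreviated boundary set for the scope check.
SCOPE_BOUNDARIES = {"html", "table", "caption", "td", "th", "button", "applet", "marquee", "object"}


def has_in_scope(stack: List[str], name: str) -> bool:
    """Walk up from the current node, stopping at a scope boundary."""
    for tag in reversed(stack):
        if tag == name:
            return True
        if tag in SCOPE_BOUNDARIES:
            return False
    return False


def start_heading(stack: List[str], name: str) -> bool:
    """Apply the h1-h6 start tag steps; return True if a parse error occurred."""
    if has_in_scope(stack, "p"):
        # Simplified "act as if </p> had been seen": pop until a p is popped.
        while stack.pop() != "p":
            pass
    parse_error = False
    if stack[-1] in HEADINGS:
        parse_error = True  # headings do not nest: <h1><h2> closes the h1
        stack.pop()
    stack.append(name)  # insert an HTML element for the token
    return parse_error
```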
A start tag whose tag name is one of: "pre", "listing"
If the stack of open elements has a p element in scope, then act as if an end tag with the tag name "p" had been seen. Insert an HTML element for the token. If the next token is a U+000A LINE FEED (LF) character token, then ignore that token and move on to the next one. (Newlines at the start of pre blocks are ignored as an authoring convenience.) Set the frameset-ok flag to "not ok".
A start tag whose tag name is "form"
If the form element pointer is not null, then this is a parse error; ignore the token. Otherwise: If the stack of open elements has a p element in scope, then act as if an end tag with the tag name "p" had been seen. Insert an HTML element for the token, and set the form element pointer to point to the element created.
A start tag whose tag name is "li"
Run the following algorithm:
1. Set the frameset-ok flag to "not ok".
2. Initialize node to be the current node (the bottommost node of the stack).
3. Loop: If node is an li element, then act as if an end tag with the tag name "li" had been seen, then jump to the last step.
4. If node is not in the formatting category, and is not in the phrasing category, and is not an address, div, or p element, then jump to the last step.
5. Otherwise, set node to the previous entry in the stack of open elements and return to the step labeled loop.
6. This is the last step. If the stack of open elements has a p element in scope, then act as if an end tag with the tag name "p" had been seen. Finally, insert an HTML element for the token.
A start tag whose tag name is one of: "dd", "dt"
Run the following algorithm:
1. Set the frameset-ok flag to "not ok".
2. Initialize node to be the current node (the bottommost node of the stack).
3. Loop: If node is a dd or dt element, then act as if an end tag with the same tag name as node had been seen, then jump to the last step.
4. If node is not in the formatting category, and is not in the phrasing category, and is not an address, div, or p element, then jump to the last step.
5. Otherwise, set node to the previous entry in the stack of open elements and return to the step labeled loop.
6. This is the last step. If the stack of open elements has a p element in scope, then act as if an end tag with the tag name "p" had been seen. Finally, insert an HTML element for the token.
A start tag whose tag name is "plaintext"
If the stack of open elements has a p element in scope, then act as if an end tag with the tag name "p" had been seen. Insert an HTML element for the token. Switch the content model flag to the PLAINTEXT state. Once a start tag with the tag name "plaintext" has been seen, that will be the last token ever seen other than character tokens (and the end-of-file token), because there is no way to switch the content model flag out of the PLAINTEXT state.
An end tag whose tag name is one of: "address", "article", "aside", "blockquote", "center", "datagrid", "details", "dialog", "dir", "div", "dl", "fieldset", "figure", "footer", "header", "hgroup", "listing", "menu", "nav", "ol", "pre", "section", "ul"
If the stack of open elements does not have an element in scope with the same tag name as that of the token, then this is a parse error; ignore the token. Otherwise, run these steps: Generate implied end tags. If the current node is not an element with the same tag name as that of the token, then this is a parse error. Pop elements from the stack of open elements until an element with the same tag name as the token has been popped from the stack.
An end tag whose tag name is "form"
Let node be the element that the form element pointer is set to. Set the form element pointer to null. If node is null or the stack of open elements does not have node in scope, then this is a parse error; ignore the token. Otherwise, run these steps: Generate implied end tags.
If the current node is not node , then this is a parse error . Remove node from the stack of open elements . An end tag whose tag name is "p" If the stack of open elements does not have an element in scope with the same tag name as that of the token, then this is a parse error ; act as if a start tag with the tag name "p" had been seen, then reprocess the current token. Otherwise, run these steps: Generate implied end tags , except for elements with the same tag name as the token. If the current node is not an element with the same tag name as that of the token, then this is a parse error . Pop elements from the stack of open elements until an element with the same tag name as the token has been popped from the stack. An end tag whose tag name is one of: "dd", "dt", "li" If the stack of open elements does not have an element in scope with the same tag name as that of the token, then this is a parse error ; ignore the token. Otherwise, run these steps: Generate implied end tags , except for elements with the same tag name as the token. If the current node is not an element with the same tag name as that of the token, then this is a parse error . Pop elements from the stack of open elements until an element with the same tag name as the token has been popped from the stack. An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6" If the stack of open elements does not have an element in scope whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6", then this is a parse error ; ignore the token. Otherwise, run these steps: Generate implied end tags . If the current node is not an element with the same tag name as that of the token, then this is a parse error . Pop elements from the stack of open elements until an element whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6" has been popped from the stack. An end tag whose tag name is "sarcasm" Take a deep breath, then act as described in the "any other end tag" entry below. 
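The li start-tag algorithm earlier in this section walks up the stack of open elements. Each iteration moves one entry up and re-runs the li check. The walk stops at an li, which gets closed, or at the first element that is neither formatting, phrasing, address, div, nor p. The following is a minimal sketch on a stack of tag names, with these assumptions. The category sets are hypothetical and abbreviated. The scope check uses a hypothetical boundary set. The implied "li" and "p" end tags are reduced to popping. The frameset-ok flag is not modelled.

```python
from typing import List

# Hypothetical, abbreviated category sets; the full lists live elsewhere.
FORMATTING = {"a", "b", "big", "code", "em", "font", "i", "nobr", "s",
              "small", "strike", "strong", "tt", "u"}
PHRASING = {"span", "abbr", "label", "q", "sub", "sup"}
PASS_THROUGH = {"address", "div", "p"}
SCOPE_BOUNDARIES = {"html", "table", "caption", "td", "th", "button", "applet", "marquee", "object"}


def p_in_scope(stack: List[str]) -> bool:
    """Simplified scope check for a p element."""
    for tag in reversed(stack):
        if tag == "p":
            return True
        if tag in SCOPE_BOUNDARIES:
            return False
    return False


def start_li(stack: List[str]) -> None:
    """Apply the "li" start tag steps to a stack of tag names (simplified)."""
    index = len(stack) - 1              # begin at the current node
    while index >= 0:                   # the step labeled loop
        node = stack[index]
        if node == "li":
            # Simplified "act as if </li>": pop up to and including the li.
            del stack[index:]
            break
        if node not in FORMATTING and node not in PHRASING and node not in PASS_THROUGH:
            break                       # e.g. ul, ol, body: stop searching
        index -= 1                      # move up one entry and check again
    # The last step: close an open p, then insert the new li.
    if p_in_scope(stack):
        while stack.pop() != "p":
            pass
    stack.append("li")
```

With this model, `<ul><li><b>` followed by `<li>` yields siblings. A nested list, as in `<ul><li><div><ul>` followed by `<li>`, keeps the outer li open because the walk stops at the inner ul.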
A start tag whose tag name is "a" If the list of active formatting elements contains an element whose tag name is "a" between the end of the list and the last marker on the list (or the start of the list if there is no marker on the list), then this is a parse error ; act as if an end tag with the tag name "a" had been seen, then remove that element from the list of active formatting elements and the stack of open elements if the end tag didn't already remove it (it might not have if the element is not in table scope ). In the non-conforming stream <a href="a">a<table><a href="b">b</table>x , the first a element would be closed upon seeing the second one, and the "x" character would be inside a link to "b", not to "a". This is despite the fact that the outer a element is not in table scope (meaning that a regular </a> end tag at the start of the table wouldn't close the outer a element). Reconstruct the active formatting elements , if any. Insert an HTML element for the token. Add that element to the list of active formatting elements . A start tag whose tag name is one of: "b", "big", "code", "em", "font", "i", "s", "small", "strike", "strong", "tt", "u" Reconstruct the active formatting elements , if any. Insert an HTML element for the token. Add that element to the list of active formatting elements . A start tag whose tag name is "nobr" Reconstruct the active formatting elements , if any. If the stack of open elements has a nobr element in scope , then this is a parse error ; act as if an end tag with the tag name "nobr" had been seen, then once again reconstruct the active formatting elements , if any. Insert an HTML element for the token. Add that element to the list of active formatting elements . 
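The "a" start-tag check searches the list of active formatting elements from its end. It stops at the last marker or at the start of the list. Below is a minimal sketch of that search. It assumes list entries are plain tag names and markers are a hypothetical sentinel string. The real list holds elements together with the tokens that created them.

```python
from typing import List, Optional

MARKER = "#marker"  # hypothetical sentinel for a scope marker


def last_entry_after_marker(afe: List[str], name: str) -> Optional[int]:
    """Index of the last entry named `name` after the last marker, or None.

    The list is searched from its end; reaching a marker (or the start of
    the list) ends the search.
    """
    for index in range(len(afe) - 1, -1, -1):
        entry = afe[index]
        if entry == MARKER:
            return None
        if entry == name:
            return index
    return None
```

A non-None result for "a" is what triggers the parse error and the implied "a" end tag. Consider the non-conforming stream quoted above. A table start tag does not insert a marker, so the first a is still found when the second a start tag arrives.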
An end tag whose tag name is one of: "a", "b", "big", "code", "em", "font", "i", "nobr", "s", "small", "strike", "strong", "tt", "u"
Follow these steps:
1. Let the formatting element be the last element in the list of active formatting elements that:
   - is between the end of the list and the last scope marker in the list, if any, or the start of the list otherwise, and
   - has the same tag name as the token.
   If there is no such node, or, if that node is also in the stack of open elements but the element is not in scope, then this is a parse error; ignore the token, and abort these steps.
   Otherwise, if there is such a node, but that node is not in the stack of open elements, then this is a parse error; remove the element from the list, and abort these steps.
   Otherwise, there is a formatting element and that element is in the stack and is in scope. If the element is not the current node, this is a parse error. In any case, proceed with the algorithm as written in the following steps.
2. Let the furthest block be the topmost node in the stack of open elements that is lower in the stack than the formatting element, and is not an element in the phrasing or formatting categories. There might not be one.
3. If there is no furthest block, then the UA must skip the subsequent steps and instead just pop all the nodes from the bottom of the stack of open elements, from the current node up to and including the formatting element, and remove the formatting element from the list of active formatting elements.
4. Let the common ancestor be the element immediately above the formatting element in the stack of open elements.
5. Let a bookmark note the position of the formatting element in the list of active formatting elements relative to the elements on either side of it in the list.
6. Let node and last node be the furthest block. Follow these steps:
   1. Let node be the element immediately above node in the stack of open elements.
   2. If node is not in the list of active formatting elements, then remove node from the stack of open elements and then go back to step 1.
   3. Otherwise, if node is the formatting element, then go to the next step in the overall algorithm.
   4. Otherwise, if last node is the furthest block, then move the aforementioned bookmark to be immediately after the node in the list of active formatting elements.
   5. Create an element for the token for which the element node was created, replace the entry for node in the list of active formatting elements with an entry for the new element, replace the entry for node in the stack of open elements with an entry for the new element, and let node be the new element.
   6. Insert last node into node, first removing it from its previous parent node if any.
   7. Let last node be node.
   8. Return to step 1 of this inner set of steps.
7. If the common ancestor node is a table, tbody, tfoot, thead, or tr element, then foster parent whatever last node ended up being in the previous step, first removing it from its previous parent node if any. Otherwise, append whatever last node ended up being in the previous step to the common ancestor node, first removing it from its previous parent node if any.
8. Create an element for the token for which the formatting element was created.
9. Take all of the child nodes of the furthest block and append them to the element created in the last step.
10. Append that new element to the furthest block.
11. Remove the formatting element from the list of active formatting elements, and insert the new element into the list of active formatting elements at the position of the aforementioned bookmark.
12. Remove the formatting element from the stack of open elements, and insert the new element into the stack of open elements immediately below the position of the furthest block in that stack.
13. Jump back to step 1 in this series of steps.
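The formatting-element end-tag algorithm is easiest to follow on a concrete tree. Below is a simplified Python sketch, not a complete implementation. It omits the scope-marker boundary, the "in scope" check, and foster parenting. It assumes a tiny hypothetical Node class in place of DOM nodes, and a hypothetical, abbreviated phrasing set. The bookmark is kept as a list index and adjusted when the formatting element is removed.

```python
from typing import List, Optional

# Hypothetical, abbreviated category sets used to find the furthest block.
FORMATTING = {"a", "b", "big", "code", "em", "font", "i", "nobr", "s",
              "small", "strike", "strong", "tt", "u"}
PHRASING = {"span"}


class Node:
    """Tiny stand-in for a DOM node; "#text" nodes carry `text`."""

    def __init__(self, name: str, text: str = "") -> None:
        self.name = name
        self.text = text
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def append(self, child: "Node") -> None:
        if child.parent is not None:          # remove from the previous parent
            child.parent.children.remove(child)
        child.parent = self
        self.children.append(child)

    def serialize(self) -> str:
        if self.name == "#text":
            return self.text
        inner = "".join(child.serialize() for child in self.children)
        return f"<{self.name}>{inner}</{self.name}>"


def adoption_agency(tag: str, stack: List[Node], afe: List[Node]) -> None:
    """Simplified sketch: no markers, no scope checks, no foster parenting."""
    while True:
        formatting = next((e for e in reversed(afe) if e.name == tag), None)  # step 1
        if formatting is None:
            return                              # parse error; ignore the token
        if formatting not in stack:
            afe.remove(formatting)              # parse error; drop the entry
            return
        f_index = stack.index(formatting)
        furthest = next((n for n in stack[f_index + 1:]                       # step 2
                         if n.name not in FORMATTING and n.name not in PHRASING), None)
        if furthest is None:                    # step 3
            del stack[f_index:]
            afe.remove(formatting)
            return
        common_ancestor = stack[f_index - 1]    # step 4
        bookmark = afe.index(formatting)        # step 5, kept as a list index
        last_node = furthest                    # step 6
        index = stack.index(furthest)
        while True:
            index -= 1
            node = stack[index]                 # 6.1
            if node not in afe:                 # 6.2
                del stack[index]
                continue
            if node is formatting:              # 6.3
                break
            if last_node is furthest:           # 6.4
                bookmark = afe.index(node) + 1
            clone = Node(node.name)             # 6.5
            afe[afe.index(node)] = clone
            stack[index] = clone
            clone.append(last_node)             # 6.6
            last_node = clone                   # 6.7
        common_ancestor.append(last_node)       # step 7
        new_element = Node(formatting.name)     # step 8
        for child in list(furthest.children):   # step 9
            new_element.append(child)
        furthest.append(new_element)            # step 10
        if afe.index(formatting) < bookmark:    # step 11
            bookmark -= 1
        afe.remove(formatting)
        afe.insert(bookmark, new_element)
        stack.remove(formatting)                # step 12
        stack.insert(stack.index(furthest) + 1, new_element)
        # Step 13: loop back to step 1.


def demo() -> str:
    """Build <b>1<p>2</b>3</p> by hand and return the serialized body."""
    html, body = Node("html"), Node("body")
    html.append(body)
    stack: List[Node] = [html, body]
    afe: List[Node] = []
    b = Node("b")                             # <b>
    body.append(b)
    stack.append(b)
    afe.append(b)
    b.append(Node("#text", "1"))              # 1
    p = Node("p")                             # <p>
    b.append(p)
    stack.append(p)
    p.append(Node("#text", "2"))              # 2
    adoption_agency("b", stack, afe)          # </b>
    stack[-1].append(Node("#text", "3"))      # 3, into the current node
    return body.serialize()
```

Under these assumptions, `demo()` returns `<body><b>1</b><p><b>2</b>3</p></body>`. The p is reparented to the body and receives a fresh b wrapping its former contents. The outer loop's second pass then pops that new b, because it has no furthest block.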
Because of the way this algorithm causes elements to change parents, it has been dubbed the "adoption agency algorithm" (in contrast with other possible algorithms for dealing with misnested content, which included the "incest algorithm", the "secret affair algorithm", and the "Heisenberg algorithm").
A start tag whose tag name is "button"
If the stack of open elements has a button element in scope, then this is a parse error; act as if an end tag with the tag name "button" had been seen, then reprocess the token. Otherwise: Reconstruct the active formatting elements, if any. Insert an HTML element for the token. Insert a marker at the end of the list of active formatting elements. Set the frameset-ok flag to "not ok".
A start tag token whose tag name is one of: "applet", "marquee", "object"
Reconstruct the active formatting elements, if any. Insert an HTML element for the token. Insert a marker at the end of the list of active formatting elements. Set the frameset-ok flag to "not ok".
An end tag token whose tag name is one of: "applet", "button", "marquee", "object"
If the stack of open elements does not have an element in scope with the same tag name as that of the token, then this is a parse error; ignore the token. Otherwise, run these steps: Generate implied end tags. If the current node is not an element with the same tag name as that of the token, then this is a parse error. Pop elements from the stack of open elements until an element with the same tag name as the token has been popped from the stack. Clear the list of active formatting elements up to the last marker.
A start tag whose tag name is "table"
If the stack of open elements has a p element in scope, then act as if an end tag with the tag name "p" had been seen. Insert an HTML element for the token. Set the frameset-ok flag to "not ok". Switch the insertion mode to "in table".
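The markers inserted by "button", "applet", "marquee", and "object" stop formatting opened inside those elements from leaking out after they close. Here is a minimal sketch, with several assumptions. It uses stacks and lists of tag names and a hypothetical marker sentinel. It omits implied end tags, the scope check, and reconstruction of active formatting elements. Clearing "up to the last marker" is read here as removing entries from the end of the list up to and including that marker.

```python
from typing import List

MARKER = "#marker"  # hypothetical sentinel for a marker entry


def clear_to_last_marker(afe: List[str]) -> None:
    """Remove entries from the end of the list up to and including the last marker."""
    while afe:
        if afe.pop() == MARKER:
            break


def start_scoping_element(stack: List[str], afe: List[str], name: str) -> None:
    """<button>, <applet>, <marquee>, <object>: insert the element, then a marker."""
    stack.append(name)
    afe.append(MARKER)


def end_scoping_element(stack: List[str], afe: List[str], name: str) -> None:
    """Simplified end tag: assumes `name` is in scope; implied end tags omitted."""
    while stack.pop() != name:
        pass
    clear_to_last_marker(afe)
```

Because the marker sits between a b opened outside an object and an i opened inside it, closing the object discards the i entry but keeps the b.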
A start tag whose tag name is one of: "area", "basefont", "bgsound", "br", "embed", "img", "input", "keygen", "spacer", "wbr" Reconstruct the active formatting elements , if any. Insert an HTML element for the token. Immediately pop the current node off the stack of open elements . Acknowledge the token's self-closing flag , if it is set. Set the frameset-ok flag to "not ok". A start tag whose tag name is one of: "param", "source" Insert an HTML element for the token. Immediately pop the current node off the stack of open elements . Acknowledge the token's self-closing flag , if it is set. A start tag whose tag name is "hr" If the stack of open elements has a p element in scope , then act as if an end tag with the tag name "p" had been seen. Insert an HTML element for the token. Immediately pop the current node off the stack of open elements . Acknowledge the token's self-closing flag , if it is set. Set the frameset-ok flag to "not ok". A start tag whose tag name is "image" Parse error . Change the token's tag name to "img" and reprocess it. (Don't ask.) A start tag whose tag name is "isindex" Parse error . If the form element pointer is not null, then ignore the token. Otherwise: Acknowledge the token's self-closing flag , if it is set. Act as if a start tag token with the tag name "form" had been seen. If the token has an attribute called "action", set the action attribute on the resulting form element to the value of the "action" attribute of the token. Act as if a start tag token with the tag name "hr" had been seen. Act as if a start tag token with the tag name "p" had been seen. Act as if a start tag token with the tag name "label" had been seen. Act as if a stream of character tokens had been seen (see below for what they should say). Act as if a start tag token with the tag name "input" had been seen, with all the attributes from the "isindex" token except "name", "action", and "prompt". 
Set the name attribute of the resulting input element to the value "isindex". Act as if a stream of character tokens had been seen (see below for what they should say). Act as if an end tag token with the tag name "label" had been seen. Act as if an end tag token with the tag name "p" had been seen. Act as if a start tag token with the tag name "hr" had been seen. Act as if an end tag token with the tag name "form" had been seen. If the token has an attribute with the name "prompt", then the first stream of characters must be the same string as given in that attribute, and the second stream of characters must be empty. Otherwise, the two streams of character tokens should, together with the input element, express the equivalent of "This is a searchable index. Insert your search keywords here: (input field)" in the user's preferred language.
A start tag whose tag name is "textarea"
Insert an HTML element for the token. If the next token is a U+000A LINE FEED (LF) character token, then ignore that token and move on to the next one. (Newlines at the start of textarea elements are ignored as an authoring convenience.) Switch the tokenizer's content model flag to the RCDATA state. Let the original insertion mode be the current insertion mode. Set the frameset-ok flag to "not ok". Switch the insertion mode to "in CDATA/RCDATA".
A start tag whose tag name is "xmp"
Reconstruct the active formatting elements, if any. Set the frameset-ok flag to "not ok". Follow the generic CDATA element parsing algorithm.
A start tag whose tag name is "iframe"
Set the frameset-ok flag to "not ok". Follow the generic CDATA element parsing algorithm.
A start tag whose tag name is "noembed"
A start tag whose tag name is "noscript", if the scripting flag is enabled
Follow the generic CDATA element parsing algorithm.
A start tag whose tag name is "select"
Reconstruct the active formatting elements, if any. Insert an HTML element for the token.
Set the frameset-ok flag to "not ok". If the insertion mode is one of "in table", "in caption", "in column group", "in table body", "in row", or "in cell", then switch the insertion mode to "in select in table". Otherwise, switch the insertion mode to "in select".
A start tag whose tag name is one of: "optgroup", "option"
If the stack of open elements has an option element in scope, then act as if an end tag with the tag name "option" had been seen. Reconstruct the active formatting elements, if any. Insert an HTML element for the token.
A start tag whose tag name is one of: "rp", "rt"
If the stack of open elements has a ruby element in scope, then generate implied end tags. If the current node is then not a ruby element, this is a parse error; pop all the nodes from the current node up to the node immediately before the bottommost ruby element on the stack of open elements. Insert an HTML element for the token.
An end tag whose tag name is "br"
Parse error. Act as if a start tag token with the tag name "br" had been seen. Ignore the end tag token.
A start tag whose tag name is "math"
Reconstruct the active formatting elements, if any. Adjust MathML attributes for the token. (This fixes the case of MathML attributes that are not all lowercase.) Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink.) Insert a foreign element for the token, in the MathML namespace. If the token has its self-closing flag set, pop the current node off the stack of open elements and acknowledge the token's self-closing flag. Otherwise, if the insertion mode is not already "in foreign content", let the secondary insertion mode be the current insertion mode, and then switch the insertion mode to "in foreign content".
A start tag whose tag name is "svg"
Reconstruct the active formatting elements, if any. Adjust SVG attributes for the token. (This fixes the case of SVG attributes that are not all lowercase.)
Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink in SVG.) Insert a foreign element for the token, in the SVG namespace . If the token has its self-closing flag set, pop the current node off the stack of open elements and acknowledge the token's self-closing flag . Otherwise, if the insertion mode is not already " in foreign content ", let the secondary insertion mode be the current insertion mode , and then switch the insertion mode to " in foreign content ". A start tag whose tag name is one of: "caption", "col", "colgroup", "frame", "head", "tbody", "td", "tfoot", "th", "thead", "tr" Parse error . Ignore the token. Any other start tag Reconstruct the active formatting elements , if any. Insert an HTML element for the token. This element will be a phrasing element. Any other end tag Run the following steps: Initialize node to be the current node (the bottommost node of the stack). If node has the same tag name as the end tag token, then: Generate implied end tags . If the tag name of the end tag token does not match the tag name of the current node , this is a parse error . Pop all the nodes from the current node up to node , including node , then stop these steps. Otherwise, if node is in neither the formatting category nor the phrasing category, then this is a parse error ; ignore the token, and abort these steps. Set node to the previous entry in the stack of open elements . Return to step 2. 9.2.5.11 The " in CDATA/RCDATA " insertion mode When the insertion mode is " in CDATA/RCDATA ", tokens must be handled as follows: A character token Insert the token's character into the current node . An end-of-file token Parse error . If the current node is a script element, mark the script element as "already executed" . Pop the current node off the stack of open elements . Switch the insertion mode to the original insertion mode and reprocess the current token. 
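The "any other end tag" entry walks up the stack looking for a matching element. It gives up at the first element that is neither formatting nor phrasing, so the end tag cannot close anything beyond that barrier. Below is a minimal sketch on a stack of tag names, with these assumptions. The category sets are hypothetical and abbreviated. IMPLIED_END_TAGS is an assumed reading of "generate implied end tags", whose definition lives elsewhere in the specification. The function returns True when a parse error occurs.

```python
from typing import List

# Hypothetical, abbreviated category sets.
FORMATTING = {"a", "b", "big", "code", "em", "font", "i", "nobr", "s",
              "small", "strike", "strong", "tt", "u"}
PHRASING = {"span", "abbr", "label", "q", "sub", "sup"}
# Assumed set of elements popped by "generate implied end tags".
IMPLIED_END_TAGS = {"dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"}


def generate_implied_end_tags(stack: List[str]) -> None:
    while stack and stack[-1] in IMPLIED_END_TAGS:
        stack.pop()


def any_other_end_tag(stack: List[str], name: str) -> bool:
    """Apply the "any other end tag" steps; return True on a parse error."""
    index = len(stack) - 1                      # step 1: start at the current node
    while index >= 0:
        node = stack[index]
        if node == name:                        # step 2
            generate_implied_end_tags(stack)
            parse_error = stack[-1] != name
            del stack[index:]                   # pop up to and including node
            return parse_error
        if node not in FORMATTING and node not in PHRASING:
            return True                         # step 3: parse error; ignore the token
        index -= 1                              # steps 4-5: move up, back to step 2
    return True
```

In this model, a span end tag with an open div inside the span is ignored, because the walk reaches the div first.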
An end tag whose tag name is "script" Let script be the current node (which will be a script element). Pop the current node off the stack of open elements . Switch the insertion mode to the original insertion mode . Let the old insertion point have the same value as the current insertion point . Let the insertion point be just before the next input character . Increment the parser's script nesting level by one. Run the script . This might cause some script to execute, which might cause new characters to be inserted into the tokenizer , and might cause the tokenizer to output more tokens, resulting in a reentrant invocation of the parser . Decrement the parser's script nesting level by one. If the parser's script nesting level is zero, then set the parser pause flag to false. Let the insertion point have the value of the old insertion point . (In other words, restore the insertion point to its previous value. This value might be the "undefined" value.) At this stage, if there is a pending external script , then: If the tree construction stage is being called reentrantly , say from a call to document.write() : Set the parser pause flag to true, and abort the processing of any nested invocations of the tokenizer, yielding control back to the caller. (Tokenization will resume when the caller returns to the "outer" tree construction stage.) Otherwise: Follow these steps: Let the script be the pending external script . There is no longer a pending external script . Pause until the script has completed loading . Let the insertion point be just before the next input character . Increment the parser's script nesting level by one (it should be zero before this step, so this sets it to one). Execute the script . Decrement the parser's script nesting level by one. If the parser's script nesting level is zero (which it always should be at this point), then set the parser pause flag to false. Let the insertion point be undefined again. 
If there is once again a pending external script , then repeat these steps from step 1. Any other end tag Pop the current node off the stack of open elements . Switch the insertion mode to the original insertion mode . 9.2.5.12 The " in table " insertion mode When the insertion mode is " in table ", tokens must be handled as follows: A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE If the current table is tainted , then act as described in the "anything else" entry below. Otherwise, insert the character into the current node . A comment token Append a Comment node to the current node with the data attribute set to the data given in the comment token. A DOCTYPE token Parse error . Ignore the token. A start tag whose tag name is "caption" Clear the stack back to a table context . (See below.) Insert a marker at the end of the list of active formatting elements . Insert an HTML element for the token, then switch the insertion mode to " in caption ". A start tag whose tag name is "colgroup" Clear the stack back to a table context . (See below.) Insert an HTML element for the token, then switch the insertion mode to " in column group ". A start tag whose tag name is "col" Act as if a start tag token with the tag name "colgroup" had been seen, then reprocess the current token. A start tag whose tag name is one of: "tbody", "tfoot", "thead" Clear the stack back to a table context . (See below.) Insert an HTML element for the token, then switch the insertion mode to " in table body ". A start tag whose tag name is one of: "td", "th", "tr" Act as if a start tag token with the tag name "tbody" had been seen, then reprocess the current token. A start tag whose tag name is "table" Parse error . Act as if an end tag token with the tag name "table" had been seen, then, if that token wasn't ignored, reprocess the current token. The fake end tag token here can only be ignored in the fragment case . 
An end tag whose tag name is "table"
If the stack of open elements does not have an element in table scope with the same tag name as the token, this is a parse error. Ignore the token. (fragment case) Otherwise: Pop elements from this stack until a table element has been popped from the stack. Reset the insertion mode appropriately.

An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
Parse error. Ignore the token.

A start tag whose tag name is one of: "style", "script"
Process the token using the rules for the "in head" insertion mode.

A start tag whose tag name is "input"
If the token does not have an attribute with the name "type", or if it does, but that attribute's value is not an ASCII case-insensitive match for the string "hidden", then act as described in the "anything else" entry below. Otherwise: Parse error. Insert an HTML element for the token. Pop that input element off the stack of open elements.

An end-of-file token
If the current node is not the root html element, then this is a parse error. (The root html element can only be the current node in the fragment case.) Stop parsing.

Anything else
Parse error. Process the token using the rules for the "in body" insertion mode, except that if the current node is a table, tbody, tfoot, thead, or tr element, then, whenever a node would be inserted into the current node, it must instead be foster parented.

When the steps above require the UA to clear the stack back to a table context, it means that the UA must, while the current node is not a table element or an html element, pop elements from the stack of open elements.

The current node being an html element after this process is a fragment case.
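As an illustration only, two of the "in table" rules above (clearing the stack back to a table context, and deciding whether an "input" start tag gets the hidden-input treatment) can be sketched in Python. This is a minimal sketch under stated assumptions: the stack of open elements is modelled as a plain list of tag names with the current node last, namespaces are ignored, and the names clear_stack_back_to, input_is_hidden and TABLE_CONTEXT are hypothetical, not part of any API.

```python
import string

# Folds only A-Z to a-z, which is what "ASCII case-insensitive" requires
# (str.lower() would also fold non-ASCII characters).
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Hypothetical stop set for "clear the stack back to a table context".
TABLE_CONTEXT = frozenset({"table", "html"})


def clear_stack_back_to(stack, stop_names=TABLE_CONTEXT):
    """Pop while the current node (stack[-1]) is not one of stop_names.

    The stack is a list of tag names, current node last. Returns the
    popped names in the order they were popped.
    """
    popped = []
    while stack and stack[-1] not in stop_names:
        popped.append(stack.pop())
    return popped


def input_is_hidden(attributes):
    """True if the token's "type" attribute is an ASCII case-insensitive match for "hidden".

    `attributes` is a hypothetical dict mapping attribute names to values.
    """
    value = attributes.get("type")
    return value is not None and value.translate(_ASCII_LOWER) == "hidden"
```

For example, with a stack of html, body, table, tbody, tr, td, the first helper pops td, tr and tbody and leaves table as the current node. The stop set is a parameter because the same loop, with tbody, tfoot, thead and html (table body context) or tr and html (table row context), describes the other two "clear the stack back to" operations defined below.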
9.2.5.13 The "in caption" insertion mode

When the insertion mode is "in caption", tokens must be handled as follows:

An end tag whose tag name is "caption"
If the stack of open elements does not have an element in table scope with the same tag name as the token, this is a parse error. Ignore the token. (fragment case) Otherwise: Generate implied end tags. Now, if the current node is not a caption element, then this is a parse error. Pop elements from this stack until a caption element has been popped from the stack. Clear the list of active formatting elements up to the last marker. Switch the insertion mode to "in table".

A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr"
An end tag whose tag name is "table"
Parse error. Act as if an end tag with the tag name "caption" had been seen, then, if that token wasn't ignored, reprocess the current token.

The fake end tag token here can only be ignored in the fragment case.

An end tag whose tag name is one of: "body", "col", "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
Parse error. Ignore the token.

Anything else
Process the token using the rules for the "in body" insertion mode.

9.2.5.14 The "in column group" insertion mode

When the insertion mode is "in column group", tokens must be handled as follows:

A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
Insert the character into the current node.

A comment token
Append a Comment node to the current node with the data attribute set to the data given in the comment token.

A DOCTYPE token
Parse error. Ignore the token.

A start tag whose tag name is "html"
Process the token using the rules for the "in body" insertion mode.

A start tag whose tag name is "col"
Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.

An end tag whose tag name is "colgroup"
If the current node is the root html element, then this is a parse error; ignore the token. (fragment case) Otherwise, pop the current node (which will be a colgroup element) from the stack of open elements. Switch the insertion mode to "in table".

An end tag whose tag name is "col"
Parse error. Ignore the token.

An end-of-file token
If the current node is the root html element, then stop parsing. (fragment case) Otherwise, act as described in the "anything else" entry below.

Anything else
Act as if an end tag with the tag name "colgroup" had been seen, and then, if that token wasn't ignored, reprocess the current token.

The fake end tag token here can only be ignored in the fragment case.

9.2.5.15 The "in table body" insertion mode

When the insertion mode is "in table body", tokens must be handled as follows:

A start tag whose tag name is "tr"
Clear the stack back to a table body context. (See below.) Insert an HTML element for the token, then switch the insertion mode to "in row".

A start tag whose tag name is one of: "th", "td"
Parse error. Act as if a start tag with the tag name "tr" had been seen, then reprocess the current token.

An end tag whose tag name is one of: "tbody", "tfoot", "thead"
If the stack of open elements does not have an element in table scope with the same tag name as the token, this is a parse error. Ignore the token. Otherwise: Clear the stack back to a table body context. (See below.) Pop the current node from the stack of open elements. Switch the insertion mode to "in table".

A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead"
An end tag whose tag name is "table"
If the stack of open elements does not have a tbody, thead, or tfoot element in table scope, this is a parse error. Ignore the token. (fragment case) Otherwise: Clear the stack back to a table body context.
(See below.) Act as if an end tag with the same tag name as the current node ("tbody", "tfoot", or "thead") had been seen, then reprocess the current token.

An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th", "tr"
Parse error. Ignore the token.

Anything else
Process the token using the rules for the "in table" insertion mode.

When the steps above require the UA to clear the stack back to a table body context, it means that the UA must, while the current node is not a tbody, tfoot, thead, or html element, pop elements from the stack of open elements.

The current node being an html element after this process is a fragment case.

9.2.5.16 The "in row" insertion mode

When the insertion mode is "in row", tokens must be handled as follows:

A start tag whose tag name is one of: "th", "td"
Clear the stack back to a table row context. (See below.) Insert an HTML element for the token, then switch the insertion mode to "in cell". Insert a marker at the end of the list of active formatting elements.

An end tag whose tag name is "tr"
If the stack of open elements does not have an element in table scope with the same tag name as the token, this is a parse error. Ignore the token. (fragment case) Otherwise: Clear the stack back to a table row context. (See below.) Pop the current node (which will be a tr element) from the stack of open elements. Switch the insertion mode to "in table body".

A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "tfoot", "thead", "tr"
An end tag whose tag name is "table"
Act as if an end tag with the tag name "tr" had been seen, then, if that token wasn't ignored, reprocess the current token.

The fake end tag token here can only be ignored in the fragment case.

An end tag whose tag name is one of: "tbody", "tfoot", "thead"
If the stack of open elements does not have an element in table scope with the same tag name as the token, this is a parse error.
Ignore the token. Otherwise, act as if an end tag with the tag name "tr" had been seen, then reprocess the current token.

An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html", "td", "th"
Parse error. Ignore the token.

Anything else
Process the token using the rules for the "in table" insertion mode.

When the steps above require the UA to clear the stack back to a table row context, it means that the UA must, while the current node is not a tr element or an html element, pop elements from the stack of open elements.

The current node being an html element after this process is a fragment case.

9.2.5.17 The "in cell" insertion mode

When the insertion mode is "in cell", tokens must be handled as follows:

An end tag whose tag name is one of: "td", "th"
If the stack of open elements does not have an element in table scope with the same tag name as that of the token, then this is a parse error and the token must be ignored. Otherwise: Generate implied end tags. Now, if the current node is not an element with the same tag name as the token, then this is a parse error. Pop elements from this stack until an element with the same tag name as the token has been popped from the stack. Clear the list of active formatting elements up to the last marker. Switch the insertion mode to "in row". (The current node will be a tr element at this point.)

A start tag whose tag name is one of: "caption", "col", "colgroup", "tbody", "td", "tfoot", "th", "thead", "tr"
If the stack of open elements does not have a td or th element in table scope, then this is a parse error; ignore the token. (fragment case) Otherwise, close the cell (see below) and reprocess the current token.

An end tag whose tag name is one of: "body", "caption", "col", "colgroup", "html"
Parse error. Ignore the token.
An end tag whose tag name is one of: "table", "tbody", "tfoot", "thead", "tr"
If the stack of open elements does not have an element in table scope with the same tag name as that of the token (which can only happen for "tbody", "tfoot" and "thead", or, in the fragment case), then this is a parse error and the token must be ignored. Otherwise, close the cell (see below) and reprocess the current token.

Anything else
Process the token using the rules for the "in body" insertion mode.

Where the steps above say to close the cell, they mean to run the following algorithm: If the stack of open elements has a td element in table scope, then act as if an end tag token with the tag name "td" had been seen. Otherwise, the stack of open elements will have a th element in table scope; act as if an end tag token with the tag name "th" had been seen.

The stack of open elements cannot have both a td and a th element in table scope at the same time, nor can it have neither when the insertion mode is "in cell".

9.2.5.18 The "in select" insertion mode

When the insertion mode is "in select", tokens must be handled as follows:

A character token
Insert the token's character into the current node.

A comment token
Append a Comment node to the current node with the data attribute set to the data given in the comment token.

A DOCTYPE token
Parse error. Ignore the token.

A start tag whose tag name is "html"
Process the token using the rules for the "in body" insertion mode.

A start tag whose tag name is "option"
If the current node is an option element, act as if an end tag with the tag name "option" had been seen. Insert an HTML element for the token.

A start tag whose tag name is "optgroup"
If the current node is an option element, act as if an end tag with the tag name "option" had been seen. If the current node is an optgroup element, act as if an end tag with the tag name "optgroup" had been seen. Insert an HTML element for the token.
An end tag whose tag name is "optgroup"
First, if the current node is an option element, and the node immediately before it in the stack of open elements is an optgroup element, then act as if an end tag with the tag name "option" had been seen. If the current node is an optgroup element, then pop that node from the stack of open elements. Otherwise, this is a parse error; ignore the token.

An end tag whose tag name is "option"
If the current node is an option element, then pop that node from the stack of open elements. Otherwise, this is a parse error; ignore the token.

An end tag whose tag name is "select"
If the stack of open elements does not have an element in table scope with the same tag name as the token, this is a parse error. Ignore the token. (fragment case) Otherwise: Pop elements from the stack of open elements until a select element has been popped from the stack. Reset the insertion mode appropriately.

A start tag whose tag name is "select"
Parse error. Act as if the token had been an end tag with the tag name "select" instead.

A start tag whose tag name is one of: "input", "textarea"
Parse error. Act as if an end tag with the tag name "select" had been seen, and reprocess the token.

A start tag token whose tag name is "script"
Process the token using the rules for the "in head" insertion mode.

An end-of-file token
If the current node is not the root html element, then this is a parse error. (The root html element can only be the current node in the fragment case.) Stop parsing.

Anything else
Parse error. Ignore the token.

9.2.5.19 The "in select in table" insertion mode

When the insertion mode is "in select in table", tokens must be handled as follows:

A start tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
Parse error. Act as if an end tag with the tag name "select" had been seen, and reprocess the token.
An end tag whose tag name is one of: "caption", "table", "tbody", "tfoot", "thead", "tr", "td", "th"
Parse error. If the stack of open elements has an element in table scope with the same tag name as that of the token, then act as if an end tag with the tag name "select" had been seen, and reprocess the token. Otherwise, ignore the token.

Anything else
Process the token using the rules for the "in select" insertion mode.

9.2.5.20 The "in foreign content" insertion mode

When the insertion mode is "in foreign content", tokens must be handled as follows:

A character token
Insert the token's character into the current node. If the token is not one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE, then set the frameset-ok flag to "not ok".

A comment token
Append a Comment node to the current node with the data attribute set to the data given in the comment token.

A DOCTYPE token
Parse error. Ignore the token.

An end tag whose tag name is "script", if the current node is a script element in the SVG namespace
Pop the current node off the stack of open elements. Let the old insertion point have the same value as the current insertion point. Let the insertion point be just before the next input character. Increment the parser's script nesting level by one. Set the parser pause flag to true. Process the script element according to the SVG rules. [SVG] Even if this causes new characters to be inserted into the tokenizer, the parser will not be executed reentrantly, since the parser pause flag is true. Decrement the parser's script nesting level by one. If the parser's script nesting level is zero, then set the parser pause flag to false. Let the insertion point have the value of the old insertion point. (In other words, restore the insertion point to its previous value. This value might be the "undefined" value.)
A start tag whose tag name is neither "mglyph" nor "malignmark", if the current node is an mi element in the MathML namespace
A start tag whose tag name is neither "mglyph" nor "malignmark", if the current node is an mo element in the MathML namespace
A start tag whose tag name is neither "mglyph" nor "malignmark", if the current node is an mn element in the MathML namespace
A start tag whose tag name is neither "mglyph" nor "malignmark", if the current node is an ms element in the MathML namespace
A start tag whose tag name is neither "mglyph" nor "malignmark", if the current node is an mtext element in the MathML namespace
A start tag whose tag name is "svg", if the current node is an annotation-xml element in the MathML namespace
A start tag, if the current node is a foreignObject element in the SVG namespace
A start tag, if the current node is a desc element in the SVG namespace
A start tag, if the current node is a title element in the SVG namespace
A start tag, if the current node is an element in the HTML namespace
An end tag
Process the token using the rules for the secondary insertion mode. If, after doing so, the insertion mode is still "in foreign content", but there is no element in scope that has a namespace other than the HTML namespace, switch the insertion mode to the secondary insertion mode.

A start tag whose tag name is one of: "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div", "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol", "small", "span", "strong", "strike", "sub", "sup", "table", "tt", "u", "ul", "var"
A start tag whose tag name is "font", if the token has any attributes named "color", "face", or "size"
An end-of-file token
Parse error. Pop elements from the stack of open elements until the current node is in the HTML namespace. Switch the insertion mode to the secondary insertion mode, and reprocess the token.
Any other start tag
If the current node is an element in the MathML namespace, adjust MathML attributes for the token. (This fixes the case of MathML attributes that are not all lowercase.)

If the current node is an element in the SVG namespace, and the token's tag name is one of the ones in the first column of the following table, change the tag name to the name given in the corresponding cell in the second column. (This fixes the case of SVG elements that are not all lowercase.)

Tag name | Element name
--- | ---
altglyph | altGlyph
altglyphdef | altGlyphDef
altglyphitem | altGlyphItem
animatecolor | animateColor
animatemotion | animateMotion
animatetransform | animateTransform
clippath | clipPath
feblend | feBlend
fecolormatrix | feColorMatrix
fecomponenttransfer | feComponentTransfer
fecomposite | feComposite
feconvolvematrix | feConvolveMatrix
fediffuselighting | feDiffuseLighting
fedisplacementmap | feDisplacementMap
fedistantlight | feDistantLight
feflood | feFlood
fefunca | feFuncA
fefuncb | feFuncB
fefuncg | feFuncG
fefuncr | feFuncR
fegaussianblur | feGaussianBlur
feimage | feImage
femerge | feMerge
femergenode | feMergeNode
femorphology | feMorphology
feoffset | feOffset
fepointlight | fePointLight
fespecularlighting | feSpecularLighting
fespotlight | feSpotLight
fetile | feTile
feturbulence | feTurbulence
foreignobject | foreignObject
glyphref | glyphRef
lineargradient | linearGradient
radialgradient | radialGradient
textpath | textPath

If the current node is an element in the SVG namespace, adjust SVG attributes for the token. (This fixes the case of SVG attributes that are not all lowercase.)

Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink in SVG.)

Insert a foreign element for the token, in the same namespace as the current node.

If the token has its self-closing flag set, pop the current node off the stack of open elements and acknowledge the token's self-closing flag.
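Two of the "in foreign content" decisions above are table-driven and can be sketched in Python: whether a start tag falls under the breakout entry that pops back to the HTML namespace, and how an SVG tag name's case is restored. This is a rough sketch, not a normative implementation. It assumes tag names arrive already lowercased from the tokenizer, that the caller has already checked that the current node is in the SVG namespace before calling adjust_svg_tag_name, and that all helper names are hypothetical. The breakout set copies the list exactly as printed in this draft.

```python
# Start tag names that trigger the breakout entry, as listed in this draft
# ("font" is handled separately because it depends on its attributes).
BREAKOUT_TAG_NAMES = frozenset({
    "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div",
    "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "hr",
    "i", "img", "li", "listing", "menu", "meta", "nobr", "ol", "small",
    "span", "strong", "strike", "sub", "sup", "table", "tt", "u", "ul", "var",
})
FONT_BREAKOUT_ATTRIBUTES = frozenset({"color", "face", "size"})

# Second column of the SVG tag-name table; the lookup key is the lowercased name.
SVG_ELEMENT_NAMES = (
    "altGlyph", "altGlyphDef", "altGlyphItem", "animateColor", "animateMotion",
    "animateTransform", "clipPath", "feBlend", "feColorMatrix",
    "feComponentTransfer", "feComposite", "feConvolveMatrix",
    "feDiffuseLighting", "feDisplacementMap", "feDistantLight", "feFlood",
    "feFuncA", "feFuncB", "feFuncG", "feFuncR", "feGaussianBlur", "feImage",
    "feMerge", "feMergeNode", "feMorphology", "feOffset", "fePointLight",
    "feSpecularLighting", "feSpotLight", "feTile", "feTurbulence",
    "foreignObject", "glyphRef", "linearGradient", "radialGradient", "textPath",
)
SVG_TAG_NAME_FIXUPS = {name.lower(): name for name in SVG_ELEMENT_NAMES}


def start_tag_breaks_out(tag_name, attribute_names):
    """True if a start tag is handled by the "Parse error. Pop elements..." entry."""
    if tag_name in BREAKOUT_TAG_NAMES:
        return True
    return tag_name == "font" and not FONT_BREAKOUT_ATTRIBUTES.isdisjoint(attribute_names)


def adjust_svg_tag_name(tag_name):
    """Restore camel case for SVG element names; other names pass through unchanged."""
    return SVG_TAG_NAME_FIXUPS.get(tag_name, tag_name)
```

Building the lookup from the camel-case column keeps the two columns of the table from drifting apart: each key is by construction the lowercase form of its value. For example, adjust_svg_tag_name("clippath") returns "clipPath", while "circle" is returned unchanged.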
9.2.5.21 The "after body" insertion mode

When the insertion mode is "after body", tokens must be handled as follows:

A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
Process the token using the rules for the "in body" insertion mode.

A comment token
Append a Comment node to the first element in the stack of open elements (the html element), with the data attribute set to the data given in the comment token.

A DOCTYPE token
Parse error. Ignore the token.

A start tag whose tag name is "html"
Process the token using the rules for the "in body" insertion mode.

An end tag whose tag name is "html"
If the parser was originally created as part of the HTML fragment parsing algorithm, this is a parse error; ignore the token. (fragment case) Otherwise, switch the insertion mode to "after after body".

An end-of-file token
Stop parsing.

Anything else
Parse error. Switch the insertion mode to "in body" and reprocess the token.

9.2.5.22 The "in frameset" insertion mode

When the insertion mode is "in frameset", tokens must be handled as follows:

A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
Insert the character into the current node.

A comment token
Append a Comment node to the current node with the data attribute set to the data given in the comment token.

A DOCTYPE token
Parse error. Ignore the token.

A start tag whose tag name is "html"
Process the token using the rules for the "in body" insertion mode.

A start tag whose tag name is "frameset"
Insert an HTML element for the token.

An end tag whose tag name is "frameset"
If the current node is the root html element, then this is a parse error; ignore the token. (fragment case) Otherwise, pop the current node from the stack of open elements.
If the parser was not originally created as part of the HTML fragment parsing algorithm (fragment case), and the current node is no longer a frameset element, then switch the insertion mode to "after frameset".

A start tag whose tag name is "frame"
Insert an HTML element for the token. Immediately pop the current node off the stack of open elements. Acknowledge the token's self-closing flag, if it is set.

A start tag whose tag name is "noframes"
Process the token using the rules for the "in head" insertion mode.

An end-of-file token
If the current node is not the root html element, then this is a parse error. (The root html element can only be the current node in the fragment case.) Stop parsing.

Anything else
Parse error. Ignore the token.

9.2.5.23 The "after frameset" insertion mode

When the insertion mode is "after frameset", tokens must be handled as follows:

A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
Insert the character into the current node.

A comment token
Append a Comment node to the current node with the data attribute set to the data given in the comment token.

A DOCTYPE token
Parse error. Ignore the token.

A start tag whose tag name is "html"
Process the token using the rules for the "in body" insertion mode.

An end tag whose tag name is "html"
Switch the insertion mode to "after after frameset".

A start tag whose tag name is "noframes"
Process the token using the rules for the "in head" insertion mode.

An end-of-file token
Stop parsing.

Anything else
Parse error. Ignore the token.

This doesn't handle UAs that don't support frames, or that do support frames but want to show the NOFRAMES content. Supporting the former is easy; supporting the latter is harder.
9.2.5.24 The "after after body" insertion mode

When the insertion mode is "after after body", tokens must be handled as follows:

A comment token
Append a Comment node to the Document object with the data attribute set to the data given in the comment token.

A DOCTYPE token
A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
A start tag whose tag name is "html"
Process the token using the rules for the "in body" insertion mode.

An end-of-file token
Stop parsing.

Anything else
Parse error. Switch the insertion mode to "in body" and reprocess the token.

9.2.5.25 The "after after frameset" insertion mode

When the insertion mode is "after after frameset", tokens must be handled as follows:

A comment token
Append a Comment node to the Document object with the data attribute set to the data given in the comment token.

A DOCTYPE token
A character token that is one of U+0009 CHARACTER TABULATION, U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
A start tag whose tag name is "html"
Process the token using the rules for the "in body" insertion mode.

An end-of-file token
Stop parsing.

A start tag whose tag name is "noframes"
Process the token using the rules for the "in head" insertion mode.

Anything else
Parse error. Ignore the token.

9.2.6 The end

Once the user agent stops parsing the document, the user agent must follow the steps in this section.

First, the current document readiness must be set to "interactive".

Then, the rules for when a script completes loading start applying (script execution is no longer managed by the parser).
If any of the scripts in the list of scripts that will execute as soon as possible have completed loading, or if the list of scripts that will execute asynchronously is not empty and the first script in that list has completed loading, then the user agent must act as if those scripts just completed loading, following the rules given for that in the script element definition.

Then, if the list of scripts that will execute when the document has finished parsing is not empty, and the first item in this list has already completed loading, then the user agent must act as if that script just finished loading.

By this point, there will be no scripts that have loaded but have not yet been executed.

The user agent must then fire a simple event called DOMContentLoaded at the Document.

Once everything that delays the load event of the document has completed, the user agent must run the following steps:
1. Queue a task to set the current document readiness to "complete".
2. If the Document is in a browsing context, then queue a task to fire a simple event called load at the Document's Window object, but with its target set to the Document object (and the currentTarget set to the Window object).
3. If the Document has a pending state object, then queue a task to fire a popstate event in no namespace on the Document's Window object using the PopStateEvent interface, with the state attribute set to the current value of the pending state object. This event must bubble but not be cancelable and has no default action.

The task source for these tasks is the DOM manipulation task source.

Delaying the load event for things like image loads allows for intranet port scans (even without JavaScript!). Should we really encode that into the spec?

9.2.7 Coercing an HTML DOM into an infoset

When an application uses an HTML parser in conjunction with an XML pipeline, it is possible that the constructed DOM is not compatible with the XML tool chain in certain subtle ways.
For example, an XML toolchain might not be able to represent attributes with the name xmlns, since they conflict with the Namespaces in XML syntax. There is also some data that the HTML parser generates that isn't included in the DOM itself. This section specifies some rules for handling these issues.

If the XML API being used doesn't support DOCTYPEs, the tool may drop DOCTYPEs altogether.

If the XML API doesn't support attributes in no namespace that are named "xmlns", attributes whose names start with "xmlns:", or attributes in the XMLNS namespace, then the tool may drop such attributes.

The tool may annotate the output with any namespace declarations required for proper operation.

If the XML API being used restricts the allowable characters in the local names of elements and attributes, then the tool may map all element and attribute local names that the API wouldn't support to a set of names that are allowed, by replacing any character that isn't supported with the uppercase letter U and the six digits of the character's Unicode code point when expressed in hexadecimal, using digits 0-9 and capital letters A-F as the symbols, in increasing numeric order.

For example, the element name foo<bar, which can be output by the HTML parser, though it is neither a legal HTML element name nor a well-formed XML element name, would be converted into fooU00003Cbar, which is a well-formed XML element name (though it's still not legal in HTML by any means).

As another example, consider the attribute xlink:href. Used on a MathML element, it becomes, after being adjusted, an attribute with a prefix "xlink" and a local name "href". However, used on an HTML element, it becomes an attribute with no prefix and the local name "xlink:href", which is not a valid NCName, and thus might not be accepted by an XML API. It could thus get converted, becoming "xlinkU00003Ahref".
The resulting names from this conversion conveniently can't clash with any attribute generated by the HTML parser, since those are all either lowercase or those listed in the adjust foreign attributes algorithm's table.

If the XML API restricts comments from having two consecutive U+002D HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE character between any such offending characters.

If the XML API restricts comments from ending in a U+002D HYPHEN-MINUS character (-), the tool may insert a single U+0020 SPACE character at the end of such comments.

If the XML API restricts allowed characters in character data, the tool may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE character, and any other literal non-XML character with a U+FFFD REPLACEMENT CHARACTER.

If the tool has no way to convey out-of-band information, then the tool may drop the following information: whether the document is set to no quirks mode, limited quirks mode, or quirks mode; and the association between form controls and forms that aren't their nearest form element ancestor (use of the form element pointer in the parser).

The mutations allowed by this section apply after the HTML parser's rules have been applied. For example, a <a::> start tag will be closed by a </a::> end tag, and never by a </aU00003AU00003A> end tag, even if the user agent is using the rules above to then generate an actual element in the DOM with the name aU00003AU00003A for that start tag.
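The optional coercions in this section are mechanical enough to sketch in Python. This is a sketch, not a normative implementation, and it rests on assumptions: the allowed-character predicate ascii_name_char is a hypothetical stand-in for whatever restriction a real XML API imposes (it is deliberately stricter than XML's own name rules), "non-XML character" is taken to mean any character outside the XML 1.0 Char production, and all function names are illustrative.

```python
def is_xml_char(cp: int) -> bool:
    """XML 1.0 Char production; anything else counts as a non-XML character here."""
    return (cp in (0x9, 0xA, 0xD) or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD or 0x10000 <= cp <= 0x10FFFF)


def ascii_name_char(ch: str) -> bool:
    """Hypothetical API restriction: ASCII letters, digits, '-', '.', '_' only."""
    return ch.isascii() and (ch.isalnum() or ch in "-._")


def escape_name(name: str, is_allowed=ascii_name_char) -> str:
    """Replace each unsupported character with 'U' plus six uppercase hex digits."""
    return "".join(ch if is_allowed(ch) else f"U{ord(ch):06X}" for ch in name)


def fix_comment(data: str) -> str:
    """Put a space between consecutive hyphens, and after a trailing hyphen."""
    out = []
    for ch in data:
        if ch == "-" and out and out[-1] == "-":
            out.append(" ")
        out.append(ch)
    if out and out[-1] == "-":
        out.append(" ")
    return "".join(out)


def fix_character_data(text: str) -> str:
    """Form feed becomes a space; any other non-XML character becomes U+FFFD."""
    return "".join(
        " " if ch == "\x0c" else (ch if is_xml_char(ord(ch)) else "\ufffd")
        for ch in text
    )
```

With the predicate above, the sketch reproduces the section's own examples: escape_name("foo<bar") gives "fooU00003Cbar", escape_name("xlink:href") gives "xlinkU00003Ahref", and escape_name("a::") gives "aU00003AU00003A".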
9.3 Namespaces

The HTML namespace is: http://www.w3.org/1999/xhtml
The MathML namespace is: http://www.w3.org/1998/Math/MathML
The SVG namespace is: http://www.w3.org/2000/svg
The XLink namespace is: http://www.w3.org/1999/xlink
The XML namespace is: http://www.w3.org/XML/1998/namespace
The XMLNS namespace is: http://www.w3.org/2000/xmlns/

Data mining tools and other user agents that perform operations on text/html content without running scripts, evaluating CSS or XPath expressions, or otherwise exposing the resulting DOM to arbitrary content, may "support namespaces" by just asserting that their DOM node analogues are in certain namespaces, without actually exposing the above strings.

9.4 Serializing HTML fragments

The following steps form the HTML fragment serialization algorithm. The algorithm takes as input a DOM Element or Document, referred to as the node, and either returns a string or raises an exception.

This algorithm serializes the children of the node being serialized, not the node itself.

Let s be a string, and initialize it to the empty string.

For each child node of the node, in tree order, run the following steps:

Let current node be the child node being processed.

Append the appropriate string from the following list to s:

If current node is an Element
Append a U+003C LESS-THAN SIGN (<) character, followed by the element's tag name. (For nodes created by the HTML parser or Document.createElement(), the tag name will be lowercase.)

For each attribute that the element has, append a U+0020 SPACE character, the attribute's name (which, for attributes set by the HTML parser or by Element.setAttributeNode() or Element.setAttribute(), will be lowercase), a U+003D EQUALS SIGN (=) character, a U+0022 QUOTATION MARK (") character, the attribute's value, escaped as described below in attribute mode, and a second U+0022 QUOTATION MARK (") character.
While the exact order of attributes is UA-defined, and may depend on factors such as the order that the attributes were given in the original markup, the sort order must be stable, such that consecutive invocations of this algorithm serialize an element's attributes in the same order.

Append a U+003E GREATER-THAN SIGN (>) character.

If current node is an area, base, basefont, bgsound, br, col, embed, frame, hr, img, input, keygen, link, meta, param, spacer, or wbr element, then continue on to the next child node at this point.

If current node is a pre, textarea, or listing element, append a U+000A LINE FEED (LF) character.

Append the value of running the HTML fragment serialization algorithm on the current node element (thus recursing into this algorithm for that element), followed by a U+003C LESS-THAN SIGN (<) character, a U+002F SOLIDUS (/) character, the element's tag name again, and finally a U+003E GREATER-THAN SIGN (>) character.

If current node is a Text or CDATASection node
If one of the ancestors of current node is a style, script, xmp, iframe, noembed, noframes, noscript, or plaintext element, then append the value of current node's data DOM attribute literally. Otherwise, append the value of current node's data DOM attribute, escaped as described below.

If current node is a Comment
Append the literal string <!-- (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS), followed by the value of current node's data DOM attribute, followed by the literal string --> (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).

If current node is a ProcessingInstruction
Append the literal string <? (U+003C LESS-THAN SIGN, U+003F QUESTION MARK), followed by the value of current node's target DOM attribute, followed by a single U+0020 SPACE character, followed by the value of current node's data DOM attribute, followed by a single U+003E GREATER-THAN SIGN character (>).
    If current node is a DocumentType
      Append the literal string <!DOCTYPE (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL LETTER D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL LETTER C, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+0050 LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL LETTER E), followed by a space (U+0020 SPACE), followed by the value of current node's name DOM attribute, followed by the literal string > (U+003E GREATER-THAN SIGN).

  Other node types (e.g. Attr) cannot occur as children of elements. If, despite this, they somehow do occur, this algorithm must raise an INVALID_STATE_ERR exception.

The result of the algorithm is the string s.

Escaping a string (for the purposes of the algorithm above) consists of replacing any occurrences of the "&" character by the string "&amp;", any occurrences of the U+00A0 NO-BREAK SPACE character by the string "&nbsp;", and, if the algorithm was invoked in attribute mode, any occurrences of the U+0022 QUOTATION MARK (") character by the string "&quot;", or, if it was not, any occurrences of the "<" character by the string "&lt;" and any occurrences of the ">" character by the string "&gt;".

Entity reference nodes are assumed to be expanded by the user agent, and are therefore not covered in the algorithm above.

It is possible that the output of this algorithm, if parsed with an HTML parser, will not return the original tree structure. For instance, if a textarea element to which a Comment node has been appended is serialized and the output is then reparsed, the comment will end up being displayed in the text field. Similarly, if, as a result of DOM manipulation, an element contains a comment that contains the literal string "-->", then when the result of serializing the element is parsed, the comment will be truncated at that point and the rest of the comment will be interpreted as markup.
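The serialization steps and the escaping rules above can be sketched in Python as follows. This is a minimal, non-normative illustration under stated assumptions: the Element, Text, and Comment classes are hypothetical stand-ins for DOM nodes (not the DOM API), only element, text, and comment children are handled, and an element's attributes are kept in a list whose order serves as the stable attribute order the algorithm requires. Because these data classes have no parent pointer, the caller passes the tag names of the node's own ancestors so that the rule about text inside style, script, and similar elements can be applied.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Hypothetical stand-ins for DOM nodes, for illustration only.
@dataclass
class Text:
    data: str

@dataclass
class Comment:
    data: str

@dataclass
class Element:
    tag: str
    attrs: List[Tuple[str, str]] = field(default_factory=list)  # list order = stable attribute order
    children: List["Node"] = field(default_factory=list)

Node = Union[Element, Text, Comment]

VOID_ELEMENTS = {"area", "base", "basefont", "bgsound", "br", "col", "embed",
                 "frame", "hr", "img", "input", "keygen", "link", "meta",
                 "param", "spacer", "wbr"}
LITERAL_TEXT_ANCESTORS = {"style", "script", "xmp", "iframe", "noembed",
                          "noframes", "noscript", "plaintext"}


def escape(s: str, attribute_mode: bool = False) -> str:
    # "&" is replaced first so the "&" of the inserted strings is not escaped again.
    s = s.replace("&", "&amp;").replace("\u00a0", "&nbsp;")
    if attribute_mode:
        return s.replace('"', "&quot;")
    return s.replace("<", "&lt;").replace(">", "&gt;")


def serialize_children(node: Element, outer: Tuple[str, ...] = ()) -> str:
    """Serialize the children of `node` (not `node` itself).

    `outer` holds the tag names of `node`'s own ancestors; the literal-text
    rule looks at every ancestor of a Text child, not just its parent.
    """
    ancestors = outer + (node.tag,)
    s: List[str] = []
    for child in node.children:
        if isinstance(child, Element):
            s.append("<" + child.tag)
            for name, value in child.attrs:
                s.append(f' {name}="{escape(value, attribute_mode=True)}"')
            s.append(">")
            if child.tag in VOID_ELEMENTS:
                continue  # no children and no end tag
            if child.tag in ("pre", "textarea", "listing"):
                s.append("\n")
            s.append(serialize_children(child, ancestors))  # recurse
            s.append(f"</{child.tag}>")
        elif isinstance(child, Text):
            if LITERAL_TEXT_ANCESTORS.intersection(ancestors):
                s.append(child.data)  # appended literally, no escaping
            else:
                s.append(escape(child.data))
        elif isinstance(child, Comment):
            s.append(f"<!--{child.data}-->")
        else:
            # Mirrors the INVALID_STATE_ERR case for unexpected node types.
            raise ValueError("INVALID_STATE_ERR")
    return "".join(s)


if __name__ == "__main__":
    div = Element("div", children=[
        Element("p", [("title", 'a "b" & c')], [Text("1 < 2 & 3")]),
        Element("br"),
        Element("script", children=[Text("if (1 < 2) x = '&';")]),
        Element("pre", children=[Text("code")]),
        Comment(" note "),
    ])
    print(serialize_children(div))
```

With this sketch, a div containing a Comment whose data is "a --> b" serializes to <!--a --> b-->, which shows the reparsing hazard described above: a parser ends the comment at the first "-->".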
Further examples of output that does not survive reparsing include a script element that contains a text node with the string "</script>", or a p element that contains a ul element (as the ul element's start tag would imply the end tag for the p).

9.5 Parsing HTML fragments

The following steps form the HTML fragment parsing algorithm. The algorithm optionally takes as input an Element node, referred to as the context element, which gives the context for the parser, as well as input, a string to parse, and returns a list of zero or more nodes.

Parts marked fragment case in algorithms in the parser section are parts that only occur if the parser was created for the purposes of this algorithm (and with a context element). The algorithms have been annotated with such markings for informational purposes only; such markings have no normative weight. If it is possible for a condition described as a fragment case to occur even when the parser wasn't created for the purposes of handling this algorithm, then that is an error in the specification.

Create a new Document node, and mark it as being an HTML document.

If there is a context element, and the Document of the context element is in quirks mode, then let the Document be in quirks mode. Otherwise, if there is a context element, and the Document of the context element is in limited quirks mode, then let the Document be in limited quirks mode. Otherwise, leave the Document in no quirks mode.

Create a new HTML parser, and associate it with the just created Document node.

If there is a context element, run these substeps:

  Set the HTML parser's tokenization stage's content model flag according to the context element, as follows:

    If it is a title or textarea element
      Set the content model flag to the RCDATA state.
    If it is a style, script, xmp, iframe, noembed, or noframes element
      Set the content model flag to the CDATA state.
    If it is a noscript element
      If the scripting flag is enabled, set the content model flag to the CDATA state.
      Otherwise, set the content model flag to the PCDATA state.
    If it is a plaintext element
      Set the content model flag to the PLAINTEXT state.
    Otherwise
      Leave the content model flag in the PCDATA state.

  Let root be a new html element with no attributes.

  Append the element root to the Document node created above.

  Set up the parser's stack of open elements so that it contains just the single element root.

  Reset the parser's insertion mode appropriately. (The parser will reference the context element as part of that algorithm.)

  Set the parser's form element pointer to the nearest node to the context element that is a form element (going straight up the ancestor chain, and including the element itself, if it is a form element), or, if there is no such form element, to null.

Place the input into the input stream for the HTML parser just created. The encoding confidence is irrelevant.

Start the parser and let it run until it has consumed all the characters just inserted into the input stream.

If there is a context element, return the child nodes of root, in tree order. Otherwise, return the children of the Document object, in tree order.

9.6 Named character references

This table lists the character reference names that are supported by HTML, and the code points to which they refer. It is referenced by the previous sections.
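A minimal Python sketch of how a decoder might consult this table follows. It assumes the longest-match rule from the tokenizer's rules for consuming a character reference (not part of this excerpt): the tokenizer consumes the longest run of characters that matches a name in the first column, case-sensitively. The NAMED_REFS dictionary and the match_named_ref function are hypothetical names; the dictionary holds only a few rows copied from the table, and the sketch ignores parse errors and the special handling of references inside attribute values.

```python
from typing import Optional, Tuple

# A few rows copied from the table. Names without a trailing ";" are
# separate entries, and only some names have one.
NAMED_REFS = {
    "amp;": 0x00026, "amp": 0x00026,
    "lt;": 0x0003C, "lt": 0x0003C,
    "nbsp;": 0x000A0, "nbsp": 0x000A0,
    "not;": 0x000AC, "not": 0x000AC,
    "notin;": 0x02209,
    "Aopf;": 0x1D538,
}
MAX_NAME_LENGTH = max(len(name) for name in NAMED_REFS)


def match_named_ref(text: str, start: int) -> Optional[Tuple[str, int]]:
    """Find the longest table name at text[start:] (just after the "&").

    Returns (character, number of characters consumed), or None if no
    name matches. Matching is case-sensitive.
    """
    longest = min(MAX_NAME_LENGTH, len(text) - start)
    for length in range(longest, 0, -1):  # try longer names first
        name = text[start:start + length]
        if name in NAMED_REFS:
            return chr(NAMED_REFS[name]), length
    return None


if __name__ == "__main__":
    print(match_named_ref("&notin; x", 1))
    print(match_named_ref("I'm &notit; I tell you", 5))
```

In the second call, "notit;" is not a name in the table, so the longest match is "not" (U+000AC NOT SIGN, 3 characters consumed) and the remaining "it;" stays as text. Searching from the longest possible length downward keeps the sketch short; a full implementation would more likely store the table in a trie.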
Name Character AElig; U+000C6 AElig U+000C6 AMP; U+00026 AMP U+00026 Aacute; U+000C1 Aacute U+000C1 Abreve; U+00102 Acirc; U+000C2 Acirc U+000C2 Acy; U+00410 Afr; U+1D504 Agrave; U+000C0 Agrave U+000C0 Alpha; U+00391 Amacr; U+00100 And; U+02A53 Aogon; U+00104 Aopf; U+1D538 ApplyFunction; U+02061 Aring; U+000C5 Aring U+000C5 Ascr; U+1D49C Assign; U+02254 Atilde; U+000C3 Atilde U+000C3 Auml; U+000C4 Auml U+000C4 Backslash; U+02216 Barv; U+02AE7 Barwed; U+02306 Bcy; U+00411 Because; U+02235 Bernoullis; U+0212C Beta; U+00392 Bfr; U+1D505 Bopf; U+1D539 Breve; U+002D8 Bscr; U+0212C Bumpeq; U+0224E CHcy; U+00427 COPY; U+000A9 COPY U+000A9 Cacute; U+00106 Cap; U+022D2 CapitalDifferentialD; U+02145 Cayleys; U+0212D Ccaron; U+0010C Ccedil; U+000C7 Ccedil U+000C7 Ccirc; U+00108 Cconint; U+02230 Cdot; U+0010A Cedilla; U+000B8 CenterDot; U+000B7 Cfr; U+0212D Chi; U+003A7 CircleDot; U+02299 CircleMinus; U+02296 CirclePlus; U+02295 CircleTimes; U+02297 ClockwiseContourIntegral; U+02232 CloseCurlyDoubleQuote; U+0201D CloseCurlyQuote; U+02019 Colon; U+02237 Colone; U+02A74 Congruent; U+02261 Conint; U+0222F ContourIntegral; U+0222E Copf; U+02102 Coproduct; U+02210 CounterClockwiseContourIntegral; U+02233 Cross; U+02A2F Cscr; U+1D49E Cup; U+022D3 CupCap; U+0224D DD; U+02145 DDotrahd; U+02911 DJcy; U+00402 DScy; U+00405 DZcy; U+0040F Dagger; U+02021 Darr; U+021A1 Dashv; U+02AE4 Dcaron; U+0010E Dcy; U+00414 Del; U+02207 Delta; U+00394 Dfr; U+1D507 DiacriticalAcute; U+000B4 DiacriticalDot; U+002D9 DiacriticalDoubleAcute; U+002DD DiacriticalGrave; U+00060 DiacriticalTilde; U+002DC Diamond; U+022C4 DifferentialD; U+02146 Dopf; U+1D53B Dot; U+000A8 DotDot; U+020DC DotEqual; U+02250 DoubleContourIntegral; U+0222F DoubleDot; U+000A8 DoubleDownArrow; U+021D3 DoubleLeftArrow; U+021D0 DoubleLeftRightArrow; U+021D4 DoubleLeftTee; U+02AE4 DoubleLongLeftArrow; U+027F8 DoubleLongLeftRightArrow; U+027FA DoubleLongRightArrow; U+027F9 DoubleRightArrow; U+021D2 DoubleRightTee; U+022A8 DoubleUpArrow; 
U+021D1 DoubleUpDownArrow; U+021D5 DoubleVerticalBar; U+02225 DownArrow; U+02193 DownArrowBar; U+02913 DownArrowUpArrow; U+021F5 DownBreve; U+00311 DownLeftRightVector; U+02950 DownLeftTeeVector; U+0295E DownLeftVector; U+021BD DownLeftVectorBar; U+02956 DownRightTeeVector; U+0295F DownRightVector; U+021C1 DownRightVectorBar; U+02957 DownTee; U+022A4 DownTeeArrow; U+021A7 Downarrow; U+021D3 Dscr; U+1D49F Dstrok; U+00110 ENG; U+0014A ETH; U+000D0 ETH U+000D0 Eacute; U+000C9 Eacute U+000C9 Ecaron; U+0011A Ecirc; U+000CA Ecirc U+000CA Ecy; U+0042D Edot; U+00116 Efr; U+1D508 Egrave; U+000C8 Egrave U+000C8 Element; U+02208 Emacr; U+00112 EmptySmallSquare; U+025FB EmptyVerySmallSquare; U+025AB Eogon; U+00118 Eopf; U+1D53C Epsilon; U+00395 Equal; U+02A75 EqualTilde; U+02242 Equilibrium; U+021CC Escr; U+02130 Esim; U+02A73 Eta; U+00397 Euml; U+000CB Euml U+000CB Exists; U+02203 ExponentialE; U+02147 Fcy; U+00424 Ffr; U+1D509 FilledSmallSquare; U+025FC FilledVerySmallSquare; U+025AA Fopf; U+1D53D ForAll; U+02200 Fouriertrf; U+02131 Fscr; U+02131 GJcy; U+00403 GT; U+0003E GT U+0003E Gamma; U+00393 Gammad; U+003DC Gbreve; U+0011E Gcedil; U+00122 Gcirc; U+0011C Gcy; U+00413 Gdot; U+00120 Gfr; U+1D50A Gg; U+022D9 Gopf; U+1D53E GreaterEqual; U+02265 GreaterEqualLess; U+022DB GreaterFullEqual; U+02267 GreaterGreater; U+02AA2 GreaterLess; U+02277 GreaterSlantEqual; U+02A7E GreaterTilde; U+02273 Gscr; U+1D4A2 Gt; U+0226B HARDcy; U+0042A Hacek; U+002C7 Hat; U+0005E Hcirc; U+00124 Hfr; U+0210C HilbertSpace; U+0210B Hopf; U+0210D HorizontalLine; U+02500 Hscr; U+0210B Hstrok; U+00126 HumpDownHump; U+0224E HumpEqual; U+0224F IEcy; U+00415 IJlig; U+00132 IOcy; U+00401 Iacute; U+000CD Iacute U+000CD Icirc; U+000CE Icirc U+000CE Icy; U+00418 Idot; U+00130 Ifr; U+02111 Igrave; U+000CC Igrave U+000CC Im; U+02111 Imacr; U+0012A ImaginaryI; U+02148 Implies; U+021D2 Int; U+0222C Integral; U+0222B Intersection; U+022C2 InvisibleComma; U+02063 InvisibleTimes; U+02062 Iogon; U+0012E Iopf; U+1D540 
Iota; U+00399 Iscr; U+02110 Itilde; U+00128 Iukcy; U+00406 Iuml; U+000CF Iuml U+000CF Jcirc; U+00134 Jcy; U+00419 Jfr; U+1D50D Jopf; U+1D541 Jscr; U+1D4A5 Jsercy; U+00408 Jukcy; U+00404 KHcy; U+00425 KJcy; U+0040C Kappa; U+0039A Kcedil; U+00136 Kcy; U+0041A Kfr; U+1D50E Kopf; U+1D542 Kscr; U+1D4A6 LJcy; U+00409 LT; U+0003C LT U+0003C Lacute; U+00139 Lambda; U+0039B Lang; U+027EA Laplacetrf; U+02112 Larr; U+0219E Lcaron; U+0013D Lcedil; U+0013B Lcy; U+0041B LeftAngleBracket; U+027E8 LeftArrow; U+02190 LeftArrowBar; U+021E4 LeftArrowRightArrow; U+021C6 LeftCeiling; U+02308 LeftDoubleBracket; U+027E6 LeftDownTeeVector; U+02961 LeftDownVector; U+021C3 LeftDownVectorBar; U+02959 LeftFloor; U+0230A LeftRightArrow; U+02194 LeftRightVector; U+0294E LeftTee; U+022A3 LeftTeeArrow; U+021A4 LeftTeeVector; U+0295A LeftTriangle; U+022B2 LeftTriangleBar; U+029CF LeftTriangleEqual; U+022B4 LeftUpDownVector; U+02951 LeftUpTeeVector; U+02960 LeftUpVector; U+021BF LeftUpVectorBar; U+02958 LeftVector; U+021BC LeftVectorBar; U+02952 Leftarrow; U+021D0 Leftrightarrow; U+021D4 LessEqualGreater; U+022DA LessFullEqual; U+02266 LessGreater; U+02276 LessLess; U+02AA1 LessSlantEqual; U+02A7D LessTilde; U+02272 Lfr; U+1D50F Ll; U+022D8 Lleftarrow; U+021DA Lmidot; U+0013F LongLeftArrow; U+027F5 LongLeftRightArrow; U+027F7 LongRightArrow; U+027F6 Longleftarrow; U+027F8 Longleftrightarrow; U+027FA Longrightarrow; U+027F9 Lopf; U+1D543 LowerLeftArrow; U+02199 LowerRightArrow; U+02198 Lscr; U+02112 Lsh; U+021B0 Lstrok; U+00141 Lt; U+0226A Map; U+02905 Mcy; U+0041C MediumSpace; U+0205F Mellintrf; U+02133 Mfr; U+1D510 MinusPlus; U+02213 Mopf; U+1D544 Mscr; U+02133 Mu; U+0039C NJcy; U+0040A Nacute; U+00143 Ncaron; U+00147 Ncedil; U+00145 Ncy; U+0041D NegativeMediumSpace; U+0200B NegativeThickSpace; U+0200B NegativeThinSpace; U+0200B NegativeVeryThinSpace; U+0200B NestedGreaterGreater; U+0226B NestedLessLess; U+0226A NewLine; U+0000A Nfr; U+1D511 NoBreak; U+02060 NonBreakingSpace; U+000A0 Nopf; U+02115 
Not; U+02AEC NotCongruent; U+02262 NotCupCap; U+0226D NotDoubleVerticalBar; U+02226 NotElement; U+02209 NotEqual; U+02260 NotExists; U+02204 NotGreater; U+0226F NotGreaterEqual; U+02271 NotGreaterLess; U+02279 NotGreaterTilde; U+02275 NotLeftTriangle; U+022EA NotLeftTriangleEqual; U+022EC NotLess; U+0226E NotLessEqual; U+02270 NotLessGreater; U+02278 NotLessTilde; U+02274 NotPrecedes; U+02280 NotPrecedesSlantEqual; U+022E0 NotReverseElement; U+0220C NotRightTriangle; U+022EB NotRightTriangleEqual; U+022ED NotSquareSubsetEqual; U+022E2 NotSquareSupersetEqual; U+022E3 NotSubsetEqual; U+02288 NotSucceeds; U+02281 NotSucceedsSlantEqual; U+022E1 NotSupersetEqual; U+02289 NotTilde; U+02241 NotTildeEqual; U+02244 NotTildeFullEqual; U+02247 NotTildeTilde; U+02249 NotVerticalBar; U+02224 Nscr; U+1D4A9 Ntilde; U+000D1 Ntilde U+000D1 Nu; U+0039D OElig; U+00152 Oacute; U+000D3 Oacute U+000D3 Ocirc; U+000D4 Ocirc U+000D4 Ocy; U+0041E Odblac; U+00150 Ofr; U+1D512 Ograve; U+000D2 Ograve U+000D2 Omacr; U+0014C Omega; U+003A9 Omicron; U+0039F Oopf; U+1D546 OpenCurlyDoubleQuote; U+0201C OpenCurlyQuote; U+02018 Or; U+02A54 Oscr; U+1D4AA Oslash; U+000D8 Oslash U+000D8 Otilde; U+000D5 Otilde U+000D5 Otimes; U+02A37 Ouml; U+000D6 Ouml U+000D6 OverBar; U+000AF OverBrace; U+023DE OverBracket; U+023B4 OverParenthesis; U+023DC PartialD; U+02202 Pcy; U+0041F Pfr; U+1D513 Phi; U+003A6 Pi; U+003A0 PlusMinus; U+000B1 Poincareplane; U+0210C Popf; U+02119 Pr; U+02ABB Precedes; U+0227A PrecedesEqual; U+02AAF PrecedesSlantEqual; U+0227C PrecedesTilde; U+0227E Prime; U+02033 Product; U+0220F Proportion; U+02237 Proportional; U+0221D Pscr; U+1D4AB Psi; U+003A8 QUOT; U+00022 QUOT U+00022 Qfr; U+1D514 Qopf; U+0211A Qscr; U+1D4AC RBarr; U+02910 REG; U+000AE REG U+000AE Racute; U+00154 Rang; U+027EB Rarr; U+021A0 Rarrtl; U+02916 Rcaron; U+00158 Rcedil; U+00156 Rcy; U+00420 Re; U+0211C ReverseElement; U+0220B ReverseEquilibrium; U+021CB ReverseUpEquilibrium; U+0296F Rfr; U+0211C Rho; U+003A1 
RightAngleBracket; U+027E9 RightArrow; U+02192 RightArrowBar; U+021E5 RightArrowLeftArrow; U+021C4 RightCeiling; U+02309 RightDoubleBracket; U+027E7 RightDownTeeVector; U+0295D RightDownVector; U+021C2 RightDownVectorBar; U+02955 RightFloor; U+0230B RightTee; U+022A2 RightTeeArrow; U+021A6 RightTeeVector; U+0295B RightTriangle; U+022B3 RightTriangleBar; U+029D0 RightTriangleEqual; U+022B5 RightUpDownVector; U+0294F RightUpTeeVector; U+0295C RightUpVector; U+021BE RightUpVectorBar; U+02954 RightVector; U+021C0 RightVectorBar; U+02953 Rightarrow; U+021D2 Ropf; U+0211D RoundImplies; U+02970 Rrightarrow; U+021DB Rscr; U+0211B Rsh; U+021B1 RuleDelayed; U+029F4 SHCHcy; U+00429 SHcy; U+00428 SOFTcy; U+0042C Sacute; U+0015A Sc; U+02ABC Scaron; U+00160 Scedil; U+0015E Scirc; U+0015C Scy; U+00421 Sfr; U+1D516 ShortDownArrow; U+02193 ShortLeftArrow; U+02190 ShortRightArrow; U+02192 ShortUpArrow; U+02191 Sigma; U+003A3 SmallCircle; U+02218 Sopf; U+1D54A Sqrt; U+0221A Square; U+025A1 SquareIntersection; U+02293 SquareSubset; U+0228F SquareSubsetEqual; U+02291 SquareSuperset; U+02290 SquareSupersetEqual; U+02292 SquareUnion; U+02294 Sscr; U+1D4AE Star; U+022C6 Sub; U+022D0 Subset; U+022D0 SubsetEqual; U+02286 Succeeds; U+0227B SucceedsEqual; U+02AB0 SucceedsSlantEqual; U+0227D SucceedsTilde; U+0227F SuchThat; U+0220B Sum; U+02211 Sup; U+022D1 Superset; U+02283 SupersetEqual; U+02287 Supset; U+022D1 THORN; U+000DE THORN U+000DE TRADE; U+02122 TSHcy; U+0040B TScy; U+00426 Tab; U+00009 Tau; U+003A4 Tcaron; U+00164 Tcedil; U+00162 Tcy; U+00422 Tfr; U+1D517 Therefore; U+02234 Theta; U+00398 ThinSpace; U+02009 Tilde; U+0223C TildeEqual; U+02243 TildeFullEqual; U+02245 TildeTilde; U+02248 Topf; U+1D54B TripleDot; U+020DB Tscr; U+1D4AF Tstrok; U+00166 Uacute; U+000DA Uacute U+000DA Uarr; U+0219F Uarrocir; U+02949 Ubrcy; U+0040E Ubreve; U+0016C Ucirc; U+000DB Ucirc U+000DB Ucy; U+00423 Udblac; U+00170 Ufr; U+1D518 Ugrave; U+000D9 Ugrave U+000D9 Umacr; U+0016A UnderBar; U+00332 
UnderBrace; U+023DF UnderBracket; U+023B5 UnderParenthesis; U+023DD Union; U+022C3 UnionPlus; U+0228E Uogon; U+00172 Uopf; U+1D54C UpArrow; U+02191 UpArrowBar; U+02912 UpArrowDownArrow; U+021C5 UpDownArrow; U+02195 UpEquilibrium; U+0296E UpTee; U+022A5 UpTeeArrow; U+021A5 Uparrow; U+021D1 Updownarrow; U+021D5 UpperLeftArrow; U+02196 UpperRightArrow; U+02197 Upsi; U+003D2 Upsilon; U+003A5 Uring; U+0016E Uscr; U+1D4B0 Utilde; U+00168 Uuml; U+000DC Uuml U+000DC VDash; U+022AB Vbar; U+02AEB Vcy; U+00412 Vdash; U+022A9 Vdashl; U+02AE6 Vee; U+022C1 Verbar; U+02016 Vert; U+02016 VerticalBar; U+02223 VerticalLine; U+0007C VerticalSeparator; U+02758 VerticalTilde; U+02240 VeryThinSpace; U+0200A Vfr; U+1D519 Vopf; U+1D54D Vscr; U+1D4B1 Vvdash; U+022AA Wcirc; U+00174 Wedge; U+022C0 Wfr; U+1D51A Wopf; U+1D54E Wscr; U+1D4B2 Xfr; U+1D51B Xi; U+0039E Xopf; U+1D54F Xscr; U+1D4B3 YAcy; U+0042F YIcy; U+00407 YUcy; U+0042E Yacute; U+000DD Yacute U+000DD Ycirc; U+00176 Ycy; U+0042B Yfr; U+1D51C Yopf; U+1D550 Yscr; U+1D4B4 Yuml; U+00178 ZHcy; U+00416 Zacute; U+00179 Zcaron; U+0017D Zcy; U+00417 Zdot; U+0017B ZeroWidthSpace; U+0200B Zeta; U+00396 Zfr; U+02128 Zopf; U+02124 Zscr; U+1D4B5 aacute; U+000E1 aacute U+000E1 abreve; U+00103 ac; U+0223E acd; U+0223F acirc; U+000E2 acirc U+000E2 acute; U+000B4 acute U+000B4 acy; U+00430 aelig; U+000E6 aelig U+000E6 af; U+02061 afr; U+1D51E agrave; U+000E0 agrave U+000E0 alefsym; U+02135 aleph; U+02135 alpha; U+003B1 amacr; U+00101 amalg; U+02A3F amp; U+00026 amp U+00026 and; U+02227 andand; U+02A55 andd; U+02A5C andslope; U+02A58 andv; U+02A5A ang; U+02220 ange; U+029A4 angle; U+02220 angmsd; U+02221 angmsdaa; U+029A8 angmsdab; U+029A9 angmsdac; U+029AA angmsdad; U+029AB angmsdae; U+029AC angmsdaf; U+029AD angmsdag; U+029AE angmsdah; U+029AF angrt; U+0221F angrtvb; U+022BE angrtvbd; U+0299D angsph; U+02222 angst; U+0212B angzarr; U+0237C aogon; U+00105 aopf; U+1D552 ap; U+02248 apE; U+02A70 apacir; U+02A6F ape; U+0224A apid; U+0224B apos; U+00027 
approx; U+02248 approxeq; U+0224A aring; U+000E5 aring U+000E5 ascr; U+1D4B6 ast; U+0002A asymp; U+02248 asympeq; U+0224D atilde; U+000E3 atilde U+000E3 auml; U+000E4 auml U+000E4 awconint; U+02233 awint; U+02A11 bNot; U+02AED backcong; U+0224C backepsilon; U+003F6 backprime; U+02035 backsim; U+0223D backsimeq; U+022CD barvee; U+022BD barwed; U+02305 barwedge; U+02305 bbrk; U+023B5 bbrktbrk; U+023B6 bcong; U+0224C bcy; U+00431 bdquo; U+0201E becaus; U+02235 because; U+02235 bemptyv; U+029B0 bepsi; U+003F6 bernou; U+0212C beta; U+003B2 beth; U+02136 between; U+0226C bfr; U+1D51F bigcap; U+022C2 bigcirc; U+025EF bigcup; U+022C3 bigodot; U+02A00 bigoplus; U+02A01 bigotimes; U+02A02 bigsqcup; U+02A06 bigstar; U+02605 bigtriangledown; U+025BD bigtriangleup; U+025B3 biguplus; U+02A04 bigvee; U+022C1 bigwedge; U+022C0 bkarow; U+0290D blacklozenge; U+029EB blacksquare; U+025AA blacktriangle; U+025B4 blacktriangledown; U+025BE blacktriangleleft; U+025C2 blacktriangleright; U+025B8 blank; U+02423 blk12; U+02592 blk14; U+02591 blk34; U+02593 block; U+02588 bnot; U+02310 bopf; U+1D553 bot; U+022A5 bottom; U+022A5 bowtie; U+022C8 boxDL; U+02557 boxDR; U+02554 boxDl; U+02556 boxDr; U+02553 boxH; U+02550 boxHD; U+02566 boxHU; U+02569 boxHd; U+02564 boxHu; U+02567 boxUL; U+0255D boxUR; U+0255A boxUl; U+0255C boxUr; U+02559 boxV; U+02551 boxVH; U+0256C boxVL; U+02563 boxVR; U+02560 boxVh; U+0256B boxVl; U+02562 boxVr; U+0255F boxbox; U+029C9 boxdL; U+02555 boxdR; U+02552 boxdl; U+02510 boxdr; U+0250C boxh; U+02500 boxhD; U+02565 boxhU; U+02568 boxhd; U+0252C boxhu; U+02534 boxminus; U+0229F boxplus; U+0229E boxtimes; U+022A0 boxuL; U+0255B boxuR; U+02558 boxul; U+02518 boxur; U+02514 boxv; U+02502 boxvH; U+0256A boxvL; U+02561 boxvR; U+0255E boxvh; U+0253C boxvl; U+02524 boxvr; U+0251C bprime; U+02035 breve; U+002D8 brvbar; U+000A6 brvbar U+000A6 bscr; U+1D4B7 bsemi; U+0204F bsim; U+0223D bsime; U+022CD bsol; U+0005C bsolb; U+029C5 bull; U+02022 bullet; U+02022 bump; U+0224E bumpE; 
U+02AAE bumpe; U+0224F bumpeq; U+0224F cacute; U+00107 cap; U+02229 capand; U+02A44 capbrcup; U+02A49 capcap; U+02A4B capcup; U+02A47 capdot; U+02A40 caret; U+02041 caron; U+002C7 ccaps; U+02A4D ccaron; U+0010D ccedil; U+000E7 ccedil U+000E7 ccirc; U+00109 ccups; U+02A4C ccupssm; U+02A50 cdot; U+0010B cedil; U+000B8 cedil U+000B8 cemptyv; U+029B2 cent; U+000A2 cent U+000A2 centerdot; U+000B7 cfr; U+1D520 chcy; U+00447 check; U+02713 checkmark; U+02713 chi; U+003C7 cir; U+025CB cirE; U+029C3 circ; U+002C6 circeq; U+02257 circlearrowleft; U+021BA circlearrowright; U+021BB circledR; U+000AE circledS; U+024C8 circledast; U+0229B circledcirc; U+0229A circleddash; U+0229D cire; U+02257 cirfnint; U+02A10 cirmid; U+02AEF cirscir; U+029C2 clubs; U+02663 clubsuit; U+02663 colon; U+0003A colone; U+02254 coloneq; U+02254 comma; U+0002C commat; U+00040 comp; U+02201 compfn; U+02218 complement; U+02201 complexes; U+02102 cong; U+02245 congdot; U+02A6D conint; U+0222E copf; U+1D554 coprod; U+02210 copy; U+000A9 copy U+000A9 copysr; U+02117 crarr; U+021B5 cross; U+02717 cscr; U+1D4B8 csub; U+02ACF csube; U+02AD1 csup; U+02AD0 csupe; U+02AD2 ctdot; U+022EF cudarrl; U+02938 cudarrr; U+02935 cuepr; U+022DE cuesc; U+022DF cularr; U+021B6 cularrp; U+0293D cup; U+0222A cupbrcap; U+02A48 cupcap; U+02A46 cupcup; U+02A4A cupdot; U+0228D cupor; U+02A45 curarr; U+021B7 curarrm; U+0293C curlyeqprec; U+022DE curlyeqsucc; U+022DF curlyvee; U+022CE curlywedge; U+022CF curren; U+000A4 curren U+000A4 curvearrowleft; U+021B6 curvearrowright; U+021B7 cuvee; U+022CE cuwed; U+022CF cwconint; U+02232 cwint; U+02231 cylcty; U+0232D dArr; U+021D3 dHar; U+02965 dagger; U+02020 daleth; U+02138 darr; U+02193 dash; U+02010 dashv; U+022A3 dbkarow; U+0290F dblac; U+002DD dcaron; U+0010F dcy; U+00434 dd; U+02146 ddagger; U+02021 ddarr; U+021CA ddotseq; U+02A77 deg; U+000B0 deg U+000B0 delta; U+003B4 demptyv; U+029B1 dfisht; U+0297F dfr; U+1D521 dharl; U+021C3 dharr; U+021C2 diam; U+022C4 diamond; U+022C4 
diamondsuit; U+02666 diams; U+02666 die; U+000A8 digamma; U+003DD disin; U+022F2 div; U+000F7 divide; U+000F7 divide U+000F7 divideontimes; U+022C7 divonx; U+022C7 djcy; U+00452 dlcorn; U+0231E dlcrop; U+0230D dollar; U+00024 dopf; U+1D555 dot; U+002D9 doteq; U+02250 doteqdot; U+02251 dotminus; U+02238 dotplus; U+02214 dotsquare; U+022A1 doublebarwedge; U+02306 downarrow; U+02193 downdownarrows; U+021CA downharpoonleft; U+021C3 downharpoonright; U+021C2 drbkarow; U+02910 drcorn; U+0231F drcrop; U+0230C dscr; U+1D4B9 dscy; U+00455 dsol; U+029F6 dstrok; U+00111 dtdot; U+022F1 dtri; U+025BF dtrif; U+025BE duarr; U+021F5 duhar; U+0296F dwangle; U+029A6 dzcy; U+0045F dzigrarr; U+027FF eDDot; U+02A77 eDot; U+02251 eacute; U+000E9 eacute U+000E9 easter; U+02A6E ecaron; U+0011B ecir; U+02256 ecirc; U+000EA ecirc U+000EA ecolon; U+02255 ecy; U+0044D edot; U+00117 ee; U+02147 efDot; U+02252 efr; U+1D522 eg; U+02A9A egrave; U+000E8 egrave U+000E8 egs; U+02A96 egsdot; U+02A98 el; U+02A99 elinters; U+023E7 ell; U+02113 els; U+02A95 elsdot; U+02A97 emacr; U+00113 empty; U+02205 emptyset; U+02205 emptyv; U+02205 emsp13; U+02004 emsp14; U+02005 emsp; U+02003 eng; U+0014B ensp; U+02002 eogon; U+00119 eopf; U+1D556 epar; U+022D5 eparsl; U+029E3 eplus; U+02A71 epsi; U+003F5 epsilon; U+003B5 epsiv; U+003B5 eqcirc; U+02256 eqcolon; U+02255 eqsim; U+02242 eqslantgtr; U+02A96 eqslantless; U+02A95 equals; U+0003D equest; U+0225F equiv; U+02261 equivDD; U+02A78 eqvparsl; U+029E5 erDot; U+02253 erarr; U+02971 escr; U+0212F esdot; U+02250 esim; U+02242 eta; U+003B7 eth; U+000F0 eth U+000F0 euml; U+000EB euml U+000EB euro; U+020AC excl; U+00021 exist; U+02203 expectation; U+02130 exponentiale; U+02147 fallingdotseq; U+02252 fcy; U+00444 female; U+02640 ffilig; U+0FB03 fflig; U+0FB00 ffllig; U+0FB04 ffr; U+1D523 filig; U+0FB01 flat; U+0266D fllig; U+0FB02 fltns; U+025B1 fnof; U+00192 fopf; U+1D557 forall; U+02200 fork; U+022D4 forkv; U+02AD9 fpartint; U+02A0D frac12; U+000BD frac12 U+000BD 
frac13; U+02153 frac14; U+000BC frac14 U+000BC frac15; U+02155 frac16; U+02159 frac18; U+0215B frac23; U+02154 frac25; U+02156 frac34; U+000BE frac34 U+000BE frac35; U+02157 frac38; U+0215C frac45; U+02158 frac56; U+0215A frac58; U+0215D frac78; U+0215E frasl; U+02044 frown; U+02322 fscr; U+1D4BB gE; U+02267 gEl; U+02A8C gacute; U+001F5 gamma; U+003B3 gammad; U+003DD gap; U+02A86 gbreve; U+0011F gcirc; U+0011D gcy; U+00433 gdot; U+00121 ge; U+02265 gel; U+022DB geq; U+02265 geqq; U+02267 geqslant; U+02A7E ges; U+02A7E gescc; U+02AA9 gesdot; U+02A80 gesdoto; U+02A82 gesdotol; U+02A84 gesles; U+02A94 gfr; U+1D524 gg; U+0226B ggg; U+022D9 gimel; U+02137 gjcy; U+00453 gl; U+02277 glE; U+02A92 gla; U+02AA5 glj; U+02AA4 gnE; U+02269 gnap; U+02A8A gnapprox; U+02A8A gne; U+02A88 gneq; U+02A88 gneqq; U+02269 gnsim; U+022E7 gopf; U+1D558 grave; U+00060 gscr; U+0210A gsim; U+02273 gsime; U+02A8E gsiml; U+02A90 gt; U+0003E gt U+0003E gtcc; U+02AA7 gtcir; U+02A7A gtdot; U+022D7 gtlPar; U+02995 gtquest; U+02A7C gtrapprox; U+02A86 gtrarr; U+02978 gtrdot; U+022D7 gtreqless; U+022DB gtreqqless; U+02A8C gtrless; U+02277 gtrsim; U+02273 hArr; U+021D4 hairsp; U+0200A half; U+000BD hamilt; U+0210B hardcy; U+0044A harr; U+02194 harrcir; U+02948 harrw; U+021AD hbar; U+0210F hcirc; U+00125 hearts; U+02665 heartsuit; U+02665 hellip; U+02026 hercon; U+022B9 hfr; U+1D525 hksearow; U+02925 hkswarow; U+02926 hoarr; U+021FF homtht; U+0223B hookleftarrow; U+021A9 hookrightarrow; U+021AA hopf; U+1D559 horbar; U+02015 hscr; U+1D4BD hslash; U+0210F hstrok; U+00127 hybull; U+02043 hyphen; U+02010 iacute; U+000ED iacute U+000ED ic; U+02063 icirc; U+000EE icirc U+000EE icy; U+00438 iecy; U+00435 iexcl; U+000A1 iexcl U+000A1 iff; U+021D4 ifr; U+1D526 igrave; U+000EC igrave U+000EC ii; U+02148 iiiint; U+02A0C iiint; U+0222D iinfin; U+029DC iiota; U+02129 ijlig; U+00133 imacr; U+0012B image; U+02111 imagline; U+02110 imagpart; U+02111 imath; U+00131 imof; U+022B7 imped; U+001B5 in; U+02208 incare; 
U+02105 infin; U+0221E infintie; U+029DD inodot; U+00131 int; U+0222B intcal; U+022BA integers; U+02124 intercal; U+022BA intlarhk; U+02A17 intprod; U+02A3C iocy; U+00451 iogon; U+0012F iopf; U+1D55A iota; U+003B9 iprod; U+02A3C iquest; U+000BF iquest U+000BF iscr; U+1D4BE isin; U+02208 isinE; U+022F9 isindot; U+022F5 isins; U+022F4 isinsv; U+022F3 isinv; U+02208 it; U+02062 itilde; U+00129 iukcy; U+00456 iuml; U+000EF iuml U+000EF jcirc; U+00135 jcy; U+00439 jfr; U+1D527 jmath; U+00237 jopf; U+1D55B jscr; U+1D4BF jsercy; U+00458 jukcy; U+00454 kappa; U+003BA kappav; U+003F0 kcedil; U+00137 kcy; U+0043A kfr; U+1D528 kgreen; U+00138 khcy; U+00445 kjcy; U+0045C kopf; U+1D55C kscr; U+1D4C0 lAarr; U+021DA lArr; U+021D0 lAtail; U+0291B lBarr; U+0290E lE; U+02266 lEg; U+02A8B lHar; U+02962 lacute; U+0013A laemptyv; U+029B4 lagran; U+02112 lambda; U+003BB lang; U+027E8 langd; U+02991 langle; U+027E8 lap; U+02A85 laquo; U+000AB laquo U+000AB larr; U+02190 larrb; U+021E4 larrbfs; U+0291F larrfs; U+0291D larrhk; U+021A9 larrlp; U+021AB larrpl; U+02939 larrsim; U+02973 larrtl; U+021A2 lat; U+02AAB latail; U+02919 late; U+02AAD lbarr; U+0290C lbbrk; U+02772 lbrace; U+0007B lbrack; U+0005B lbrke; U+0298B lbrksld; U+0298F lbrkslu; U+0298D lcaron; U+0013E lcedil; U+0013C lceil; U+02308 lcub; U+0007B lcy; U+0043B ldca; U+02936 ldquo; U+0201C ldquor; U+0201E ldrdhar; U+02967 ldrushar; U+0294B ldsh; U+021B2 le; U+02264 leftarrow; U+02190 leftarrowtail; U+021A2 leftharpoondown; U+021BD leftharpoonup; U+021BC leftleftarrows; U+021C7 leftrightarrow; U+02194 leftrightarrows; U+021C6 leftrightharpoons; U+021CB leftrightsquigarrow; U+021AD leftthreetimes; U+022CB leg; U+022DA leq; U+02264 leqq; U+02266 leqslant; U+02A7D les; U+02A7D lescc; U+02AA8 lesdot; U+02A7F lesdoto; U+02A81 lesdotor; U+02A83 lesges; U+02A93 lessapprox; U+02A85 lessdot; U+022D6 lesseqgtr; U+022DA lesseqqgtr; U+02A8B lessgtr; U+02276 lesssim; U+02272 lfisht; U+0297C lfloor; U+0230A lfr; U+1D529 lg; U+02276 lgE; 
U+02A91 lhard; U+021BD lharu; U+021BC lharul; U+0296A lhblk; U+02584 ljcy; U+00459 ll; U+0226A llarr; U+021C7 llcorner; U+0231E llhard; U+0296B lltri; U+025FA lmidot; U+00140 lmoust; U+023B0 lmoustache; U+023B0 lnE; U+02268 lnap; U+02A89 lnapprox; U+02A89 lne; U+02A87 lneq; U+02A87 lneqq; U+02268 lnsim; U+022E6 loang; U+027EC loarr; U+021FD lobrk; U+027E6 longleftarrow; U+027F5 longleftrightarrow; U+027F7 longmapsto; U+027FC longrightarrow; U+027F6 looparrowleft; U+021AB looparrowright; U+021AC lopar; U+02985 lopf; U+1D55D loplus; U+02A2D lotimes; U+02A34 lowast; U+02217 lowbar; U+0005F loz; U+025CA lozenge; U+025CA lozf; U+029EB lpar; U+00028 lparlt; U+02993 lrarr; U+021C6 lrcorner; U+0231F lrhar; U+021CB lrhard; U+0296D lrm; U+0200E lrtri; U+022BF lsaquo; U+02039 lscr; U+1D4C1 lsh; U+021B0 lsim; U+02272 lsime; U+02A8D lsimg; U+02A8F lsqb; U+0005B lsquo; U+02018 lsquor; U+0201A lstrok; U+00142 lt; U+0003C lt U+0003C ltcc; U+02AA6 ltcir; U+02A79 ltdot; U+022D6 lthree; U+022CB ltimes; U+022C9 ltlarr; U+02976 ltquest; U+02A7B ltrPar; U+02996 ltri; U+025C3 ltrie; U+022B4 ltrif; U+025C2 lurdshar; U+0294A luruhar; U+02966 mDDot; U+0223A macr; U+000AF macr U+000AF male; U+02642 malt; U+02720 maltese; U+02720 map; U+021A6 mapsto; U+021A6 mapstodown; U+021A7 mapstoleft; U+021A4 mapstoup; U+021A5 marker; U+025AE mcomma; U+02A29 mcy; U+0043C mdash; U+02014 measuredangle; U+02221 mfr; U+1D52A mho; U+02127 micro; U+000B5 micro U+000B5 mid; U+02223 midast; U+0002A midcir; U+02AF0 middot; U+000B7 middot U+000B7 minus; U+02212 minusb; U+0229F minusd; U+02238 minusdu; U+02A2A mlcp; U+02ADB mldr; U+02026 mnplus; U+02213 models; U+022A7 mopf; U+1D55E mp; U+02213 mscr; U+1D4C2 mstpos; U+0223E mu; U+003BC multimap; U+022B8 mumap; U+022B8 nLeftarrow; U+021CD nLeftrightarrow; U+021CE nRightarrow; U+021CF nVDash; U+022AF nVdash; U+022AE nabla; U+02207 nacute; U+00144 nap; U+02249 napos; U+00149 napprox; U+02249 natur; U+0266E natural; U+0266E naturals; U+02115 nbsp; U+000A0 nbsp U+000A0 
ncap; U+02A43 ncaron; U+00148 ncedil; U+00146 ncong; U+02247 ncup; U+02A42 ncy; U+0043D ndash; U+02013 ne; U+02260 neArr; U+021D7 nearhk; U+02924 nearr; U+02197 nearrow; U+02197 nequiv; U+02262 nesear; U+02928 nexist; U+02204 nexists; U+02204 nfr; U+1D52B nge; U+02271 ngeq; U+02271 ngsim; U+02275 ngt; U+0226F ngtr; U+0226F nhArr; U+021CE nharr; U+021AE nhpar; U+02AF2 ni; U+0220B nis; U+022FC nisd; U+022FA niv; U+0220B njcy; U+0045A nlArr; U+021CD nlarr; U+0219A nldr; U+02025 nle; U+02270 nleftarrow; U+0219A nleftrightarrow; U+021AE nleq; U+02270 nless; U+0226E nlsim; U+02274 nlt; U+0226E nltri; U+022EA nltrie; U+022EC nmid; U+02224 nopf; U+1D55F not; U+000AC not U+000AC notin; U+02209 notinva; U+02209 notinvb; U+022F7 notinvc; U+022F6 notni; U+0220C notniva; U+0220C notnivb; U+022FE notnivc; U+022FD npar; U+02226 nparallel; U+02226 npolint; U+02A14 npr; U+02280 nprcue; U+022E0 nprec; U+02280 nrArr; U+021CF nrarr; U+0219B nrightarrow; U+0219B nrtri; U+022EB nrtrie; U+022ED nsc; U+02281 nsccue; U+022E1 nscr; U+1D4C3 nshortmid; U+02224 nshortparallel; U+02226 nsim; U+02241 nsime; U+02244 nsimeq; U+02244 nsmid; U+02224 nspar; U+02226 nsqsube; U+022E2 nsqsupe; U+022E3 nsub; U+02284 nsube; U+02288 nsubseteq; U+02288 nsucc; U+02281 nsup; U+02285 nsupe; U+02289 nsupseteq; U+02289 ntgl; U+02279 ntilde; U+000F1 ntilde U+000F1 ntlg; U+02278 ntriangleleft; U+022EA ntrianglelefteq; U+022EC ntriangleright; U+022EB ntrianglerighteq; U+022ED nu; U+003BD num; U+00023 numero; U+02116 numsp; U+02007 nvDash; U+022AD nvHarr; U+02904 nvdash; U+022AC nvinfin; U+029DE nvlArr; U+02902 nvrArr; U+02903 nwArr; U+021D6 nwarhk; U+02923 nwarr; U+02196 nwarrow; U+02196 nwnear; U+02927 oS; U+024C8 oacute; U+000F3 oacute U+000F3 oast; U+0229B ocir; U+0229A ocirc; U+000F4 ocirc U+000F4 ocy; U+0043E odash; U+0229D odblac; U+00151 odiv; U+02A38 odot; U+02299 odsold; U+029BC oelig; U+00153 ofcir; U+029BF ofr; U+1D52C ogon; U+002DB ograve; U+000F2 ograve U+000F2 ogt; U+029C1 ohbar; U+029B5 ohm; U+02126 
oint; U+0222E olarr; U+021BA olcir; U+029BE olcross; U+029BB oline; U+0203E olt; U+029C0 omacr; U+0014D omega; U+003C9 omicron; U+003BF omid; U+029B6 ominus; U+02296 oopf; U+1D560 opar; U+029B7 operp; U+029B9 oplus; U+02295 or; U+02228 orarr; U+021BB ord; U+02A5D order; U+02134 orderof; U+02134 ordf; U+000AA ordf U+000AA ordm; U+000BA ordm U+000BA origof; U+022B6 oror; U+02A56 orslope; U+02A57 orv; U+02A5B oscr; U+02134 oslash; U+000F8 oslash U+000F8 osol; U+02298 otilde; U+000F5 otilde U+000F5 otimes; U+02297 otimesas; U+02A36 ouml; U+000F6 ouml U+000F6 ovbar; U+0233D par; U+02225 para; U+000B6 para U+000B6 parallel; U+02225 parsim; U+02AF3 parsl; U+02AFD part; U+02202 pcy; U+0043F percnt; U+00025 period; U+0002E permil; U+02030 perp; U+022A5 pertenk; U+02031 pfr; U+1D52D phi; U+003C6 phiv; U+003C6 phmmat; U+02133 phone; U+0260E pi; U+003C0 pitchfork; U+022D4 piv; U+003D6 planck; U+0210F planckh; U+0210E plankv; U+0210F plus; U+0002B plusacir; U+02A23 plusb; U+0229E pluscir; U+02A22 plusdo; U+02214 plusdu; U+02A25 pluse; U+02A72 plusmn; U+000B1 plusmn U+000B1 plussim; U+02A26 plustwo; U+02A27 pm; U+000B1 pointint; U+02A15 popf; U+1D561 pound; U+000A3 pound U+000A3 pr; U+0227A prE; U+02AB3 prap; U+02AB7 prcue; U+0227C pre; U+02AAF prec; U+0227A precapprox; U+02AB7 preccurlyeq; U+0227C preceq; U+02AAF precnapprox; U+02AB9 precneqq; U+02AB5 precnsim; U+022E8 precsim; U+0227E prime; U+02032 primes; U+02119 prnE; U+02AB5 prnap; U+02AB9 prnsim; U+022E8 prod; U+0220F profalar; U+0232E profline; U+02312 profsurf; U+02313 prop; U+0221D propto; U+0221D prsim; U+0227E prurel; U+022B0 pscr; U+1D4C5 psi; U+003C8 puncsp; U+02008 qfr; U+1D52E qint; U+02A0C qopf; U+1D562 qprime; U+02057 qscr; U+1D4C6 quaternions; U+0210D quatint; U+02A16 quest; U+0003F questeq; U+0225F quot; U+00022 quot U+00022 rAarr; U+021DB rArr; U+021D2 rAtail; U+0291C rBarr; U+0290F rHar; U+02964 race; U+029DA racute; U+00155 radic; U+0221A raemptyv; U+029B3 rang; U+027E9 rangd; U+02992 range; U+029A5 
rangle; U+027E9
raquo; U+000BB
raquo U+000BB
rarr; U+02192
rarrap; U+02975
rarrb; U+021E5
rarrbfs; U+02920
rarrc; U+02933
rarrfs; U+0291E
rarrhk; U+021AA
rarrlp; U+021AC
rarrpl; U+02945
rarrsim; U+02974
rarrtl; U+021A3
rarrw; U+0219D
ratail; U+0291A
ratio; U+02236
rationals; U+0211A
rbarr; U+0290D
rbbrk; U+02773
rbrace; U+0007D
rbrack; U+0005D
rbrke; U+0298C
rbrksld; U+0298E
rbrkslu; U+02990
rcaron; U+00159
rcedil; U+00157
rceil; U+02309
rcub; U+0007D
rcy; U+00440
rdca; U+02937
rdldhar; U+02969
rdquo; U+0201D
rdquor; U+0201D
rdsh; U+021B3
real; U+0211C
realine; U+0211B
realpart; U+0211C
reals; U+0211D
rect; U+025AD
reg; U+000AE
reg U+000AE
rfisht; U+0297D
rfloor; U+0230B
rfr; U+1D52F
rhard; U+021C1
rharu; U+021C0
rharul; U+0296C
rho; U+003C1
rhov; U+003F1
rightarrow; U+02192
rightarrowtail; U+021A3
rightharpoondown; U+021C1
rightharpoonup; U+021C0
rightleftarrows; U+021C4
rightleftharpoons; U+021CC
rightrightarrows; U+021C9
rightsquigarrow; U+0219D
rightthreetimes; U+022CC
ring; U+002DA
risingdotseq; U+02253
rlarr; U+021C4
rlhar; U+021CC
rlm; U+0200F
rmoust; U+023B1
rmoustache; U+023B1
rnmid; U+02AEE
roang; U+027ED
roarr; U+021FE
robrk; U+027E7
ropar; U+02986
ropf; U+1D563
roplus; U+02A2E
rotimes; U+02A35
rpar; U+00029
rpargt; U+02994
rppolint; U+02A12
rrarr; U+021C9
rsaquo; U+0203A
rscr; U+1D4C7
rsh; U+021B1
rsqb; U+0005D
rsquo; U+02019
rsquor; U+02019
rthree; U+022CC
rtimes; U+022CA
rtri; U+025B9
rtrie; U+022B5
rtrif; U+025B8
rtriltri; U+029CE
ruluhar; U+02968
rx; U+0211E
sacute; U+0015B
sbquo; U+0201A
sc; U+0227B
scE; U+02AB4
scap; U+02AB8
scaron; U+00161
sccue; U+0227D
sce; U+02AB0
scedil; U+0015F
scirc; U+0015D
scnE; U+02AB6
scnap; U+02ABA
scnsim; U+022E9
scpolint; U+02A13
scsim; U+0227F
scy; U+00441
sdot; U+022C5
sdotb; U+022A1
sdote; U+02A66
seArr; U+021D8
searhk; U+02925
searr; U+02198
searrow; U+02198
sect; U+000A7
sect U+000A7
semi; U+0003B
seswar; U+02929
setminus; U+02216
setmn; U+02216
sext; U+02736
sfr; U+1D530
sfrown; U+02322
sharp; U+0266F
shchcy; U+00449
shcy; U+00448
shortmid; U+02223
shortparallel; U+02225
shy; U+000AD
shy U+000AD
sigma; U+003C3
sigmaf; U+003C2
sigmav; U+003C2
sim; U+0223C
simdot; U+02A6A
sime; U+02243
simeq; U+02243
simg; U+02A9E
simgE; U+02AA0
siml; U+02A9D
simlE; U+02A9F
simne; U+02246
simplus; U+02A24
simrarr; U+02972
slarr; U+02190
smallsetminus; U+02216
smashp; U+02A33
smeparsl; U+029E4
smid; U+02223
smile; U+02323
smt; U+02AAA
smte; U+02AAC
softcy; U+0044C
sol; U+0002F
solb; U+029C4
solbar; U+0233F
sopf; U+1D564
spades; U+02660
spadesuit; U+02660
spar; U+02225
sqcap; U+02293
sqcup; U+02294
sqsub; U+0228F
sqsube; U+02291
sqsubset; U+0228F
sqsubseteq; U+02291
sqsup; U+02290
sqsupe; U+02292
sqsupset; U+02290
sqsupseteq; U+02292
squ; U+025A1
square; U+025A1
squarf; U+025AA
squf; U+025AA
srarr; U+02192
sscr; U+1D4C8
ssetmn; U+02216
ssmile; U+02323
sstarf; U+022C6
star; U+02606
starf; U+02605
straightepsilon; U+003F5
straightphi; U+003D5
strns; U+000AF
sub; U+02282
subE; U+02AC5
subdot; U+02ABD
sube; U+02286
subedot; U+02AC3
submult; U+02AC1
subnE; U+02ACB
subne; U+0228A
subplus; U+02ABF
subrarr; U+02979
subset; U+02282
subseteq; U+02286
subseteqq; U+02AC5
subsetneq; U+0228A
subsetneqq; U+02ACB
subsim; U+02AC7
subsub; U+02AD5
subsup; U+02AD3
succ; U+0227B
succapprox; U+02AB8
succcurlyeq; U+0227D
succeq; U+02AB0
succnapprox; U+02ABA
succneqq; U+02AB6
succnsim; U+022E9
succsim; U+0227F
sum; U+02211
sung; U+0266A
sup1; U+000B9
sup1 U+000B9
sup2; U+000B2
sup2 U+000B2
sup3; U+000B3
sup3 U+000B3
sup; U+02283
supE; U+02AC6
supdot; U+02ABE
supdsub; U+02AD8
supe; U+02287
supedot; U+02AC4
suphsub; U+02AD7
suplarr; U+0297B
supmult; U+02AC2
supnE; U+02ACC
supne; U+0228B
supplus; U+02AC0
supset; U+02283
supseteq; U+02287
supseteqq; U+02AC6
supsetneq; U+0228B
supsetneqq; U+02ACC
supsim; U+02AC8
supsub; U+02AD4
supsup; U+02AD6
swArr; U+021D9
swarhk; U+02926
swarr; U+02199
swarrow; U+02199
swnwar; U+0292A
szlig; U+000DF
szlig U+000DF
target; U+02316
tau; U+003C4
tbrk; U+023B4
tcaron; U+00165
tcedil; U+00163
tcy; U+00442
tdot; U+020DB
telrec; U+02315
tfr; U+1D531
there4; U+02234
therefore; U+02234
theta; U+003B8
thetasym; U+003D1
thetav; U+003D1
thickapprox; U+02248
thicksim; U+0223C
thinsp; U+02009
thkap; U+02248
thksim; U+0223C
thorn; U+000FE
thorn U+000FE
tilde; U+002DC
times; U+000D7
times U+000D7
timesb; U+022A0
timesbar; U+02A31
timesd; U+02A30
tint; U+0222D
toea; U+02928
top; U+022A4
topbot; U+02336
topcir; U+02AF1
topf; U+1D565
topfork; U+02ADA
tosa; U+02929
tprime; U+02034
trade; U+02122
triangle; U+025B5
triangledown; U+025BF
triangleleft; U+025C3
trianglelefteq; U+022B4
triangleq; U+0225C
triangleright; U+025B9
trianglerighteq; U+022B5
tridot; U+025EC
trie; U+0225C
triminus; U+02A3A
triplus; U+02A39
trisb; U+029CD
tritime; U+02A3B
trpezium; U+023E2
tscr; U+1D4C9
tscy; U+00446
tshcy; U+0045B
tstrok; U+00167
twixt; U+0226C
twoheadleftarrow; U+0219E
twoheadrightarrow; U+021A0
uArr; U+021D1
uHar; U+02963
uacute; U+000FA
uacute U+000FA
uarr; U+02191
ubrcy; U+0045E
ubreve; U+0016D
ucirc; U+000FB
ucirc U+000FB
ucy; U+00443
udarr; U+021C5
udblac; U+00171
udhar; U+0296E
ufisht; U+0297E
ufr; U+1D532
ugrave; U+000F9
ugrave U+000F9
uharl; U+021BF
uharr; U+021BE
uhblk; U+02580
ulcorn; U+0231C
ulcorner; U+0231C
ulcrop; U+0230F
ultri; U+025F8
umacr; U+0016B
uml; U+000A8
uml U+000A8
uogon; U+00173
uopf; U+1D566
uparrow; U+02191
updownarrow; U+02195
upharpoonleft; U+021BF
upharpoonright; U+021BE
uplus; U+0228E
upsi; U+003C5
upsih; U+003D2
upsilon; U+003C5
upuparrows; U+021C8
urcorn; U+0231D
urcorner; U+0231D
urcrop; U+0230E
uring; U+0016F
urtri; U+025F9
uscr; U+1D4CA
utdot; U+022F0
utilde; U+00169
utri; U+025B5
utrif; U+025B4
uuarr; U+021C8
uuml; U+000FC
uuml U+000FC
uwangle; U+029A7
vArr; U+021D5
vBar; U+02AE8
vBarv; U+02AE9
vDash; U+022A8
vangrt; U+0299C
varepsilon; U+003B5
varkappa; U+003F0
varnothing; U+02205
varphi; U+003C6
varpi; U+003D6
varpropto; U+0221D
varr; U+02195
varrho; U+003F1
varsigma; U+003C2
vartheta; U+003D1
vartriangleleft; U+022B2
vartriangleright; U+022B3
vcy; U+00432
vdash; U+022A2
vee; U+02228
veebar; U+022BB
veeeq; U+0225A
vellip; U+022EE
verbar; U+0007C
vert; U+0007C
vfr; U+1D533
vltri; U+022B2
vopf; U+1D567
vprop; U+0221D
vrtri; U+022B3
vscr; U+1D4CB
vzigzag; U+0299A
wcirc; U+00175
wedbar; U+02A5F
wedge; U+02227
wedgeq; U+02259
weierp; U+02118
wfr; U+1D534
wopf; U+1D568
wp; U+02118
wr; U+02240
wreath; U+02240
wscr; U+1D4CC
xcap; U+022C2
xcirc; U+025EF
xcup; U+022C3
xdtri; U+025BD
xfr; U+1D535
xhArr; U+027FA
xharr; U+027F7
xi; U+003BE
xlArr; U+027F8
xlarr; U+027F5
xmap; U+027FC
xnis; U+022FB
xodot; U+02A00
xopf; U+1D569
xoplus; U+02A01
xotime; U+02A02
xrArr; U+027F9
xrarr; U+027F6
xscr; U+1D4CD
xsqcup; U+02A06
xuplus; U+02A04
xutri; U+025B3
xvee; U+022C1
xwedge; U+022C0
yacute; U+000FD
yacute U+000FD
yacy; U+0044F
ycirc; U+00177
ycy; U+0044B
yen; U+000A5
yen U+000A5
yfr; U+1D536
yicy; U+00457
yopf; U+1D56A
yscr; U+1D4CE
yucy; U+0044E
yuml; U+000FF
yuml U+000FF
zacute; U+0017A
zcaron; U+0017E
zcy; U+00437
zdot; U+0017C
zeetrf; U+02128
zeta; U+003B6
zfr; U+1D537
zhcy; U+00436
zigrarr; U+021DD
zopf; U+1D56B
zscr; U+1D4CF
zwj; U+0200D
zwnj; U+0200C

10 The XHTML syntax

This section only describes the rules for XML resources. Rules for text/html resources are discussed in the section above entitled "The HTML syntax".

10.1 Writing XHTML documents

The syntax for using HTML with XML, whether in XHTML documents or embedded in other XML documents, is defined in the XML and Namespaces in XML specifications. [XML] [XMLNS]

This specification does not define any syntax-level requirements beyond those defined for XML proper.

XML documents may contain a DOCTYPE if desired, but this is not required to conform to this specification. This specification does not define a public or system identifier, nor provide a formal DTD.

According to the XML specification, XML processors are not guaranteed to process the external DTD subset referenced in the DOCTYPE.
This means, for example, that using entity references for characters in XHTML documents is unsafe if they are defined in an external file (except for &lt;, &gt;, &amp;, &quot; and &apos;).

10.2 Parsing XHTML documents

This section describes the relationship between XML and the DOM, with a particular emphasis on how this interacts with HTML.

An XML parser, for the purposes of this specification, is a construct that follows the rules given in the XML specification to map a string of bytes or characters into a Document object.

An XML parser is either associated with a Document object when it is created, or creates one implicitly.

This Document must then be populated with DOM nodes that represent the tree structure of the input passed to the parser, as defined by the XML specification, the Namespaces in XML specification, and the DOM Core specification. DOM mutation events must not fire for the operations that the XML parser performs on the Document's tree, but the user agent must act as if elements and attributes were individually appended and set, respectively, so as to trigger the rules in this specification regarding what happens when an element is inserted into a document or has its attributes set. [XML] [XMLNS] [DOMCORE] [DOMEVENTS]

Certain algorithms in this specification spoon-feed the parser characters one string at a time. In such cases, the XML parser must act as it would have if faced with a single string consisting of the concatenation of all those characters.

When an XML parser creates a script element, it must be marked as being "parser-inserted". If the parser was originally created for the XML fragment parsing algorithm, then the element must also be marked as "already executed". When the element's end tag is parsed, the user agent must run the script element. If this causes there to be a pending external script, then the user agent must pause until that script has completed loading, and then execute it.
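The requirement that a parser fed one string at a time behave as if it had received the concatenation of those strings can be illustrated with Python's standard library. This is a minimal sketch, not an implementation of this specification's XML parser: it assumes only that xml.etree.ElementTree.XMLParser, whose feed() method accepts input incrementally, honours the same equivalence. The helper names parse_whole and parse_in_chunks are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_whole(text: str) -> ET.Element:
    """Parse the complete input in a single call."""
    return ET.fromstring(text)


def parse_in_chunks(chunks: list[str]) -> ET.Element:
    """Feed the parser one string at a time, then finish parsing."""
    parser = ET.XMLParser()
    for chunk in chunks:
        parser.feed(chunk)
    # close() signals end of input and returns the root element.
    return parser.close()


source = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>Hello, world</p></body></html>"
)
# Split at arbitrary offsets: inside the xmlns attribute value and
# inside the "body" tag name.
chunks = [source[:20], source[20:47], source[47:]]

# Both strategies should yield structurally identical trees.
assert ET.tostring(parse_whole(source)) == ET.tostring(parse_in_chunks(chunks))
```

The chunk boundaries deliberately fall inside markup, which is where a naive tokenizer that treated each chunk independently would go wrong.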
Since the document.write() API is not available for XML documents, much of the complexity in the HTML parser is not needed in the XML parser.

When an XML parser reaches the end of its input, it must stop parsing, following the same rules as the HTML parser.

10.3 Serializing XHTML fragments

The XML fragment serialization algorithm for a Document or Element node either returns a fragment of XML that represents that node or raises an exception.

For Documents, the algorithm must return a string in the form of a document entity, if none of the error cases below apply.

For Elements, the algorithm must return a string in the form of an internal general parsed entity, if none of the error cases below apply.

In both cases, the string returned must be XML namespace-well-formed and must be an isomorphic serialization of all of that node's child nodes, in tree order. User agents may adjust prefixes and namespace declarations in the serialization (and indeed might be forced to do so in some cases to obtain namespace-well-formed XML). User agents may use a combination of regular text, character references, and CDATA sections to represent text nodes in the DOM (and indeed might be forced to use representations that don't match the DOM's, e.g. if a CDATASection node contains the string "]]>").

For Elements, if any of the elements in the serialization are in no namespace, the default namespace in scope for those elements must be explicitly declared as the empty string. (This doesn't apply in the Document case.) [XML] [XMLNS]

If any of the following error cases are found in the DOM subtree being serialized, then the algorithm raises an INVALID_STATE_ERR exception instead of returning a string:

- A Document node with no child element nodes.
- A DocumentType node that has an external subset public identifier that contains characters that are not matched by the XML PubidChar production. [XML]
- A DocumentType node that has an external subset system identifier that contains both a U+0022 QUOTATION MARK ('"') and a U+0027 APOSTROPHE ("'").
- A node with a local name containing a U+003A COLON (":").
- An Attr node, Text node, CDATASection node, Comment node, or ProcessingInstruction node whose data contains characters that are not matched by the XML Char production. [XML]
- A Comment node whose data contains two adjacent U+002D HYPHEN-MINUS (-) characters or ends with such a character.
- A ProcessingInstruction node whose target name is an ASCII case-insensitive match for the string "xml".
- A ProcessingInstruction node whose target name contains a U+003A COLON (":").
- A ProcessingInstruction node whose data contains the string "?>".

These are the only ways to make a DOM unserializable. The DOM enforces all the other XML constraints; for example, trying to set an attribute with a name that contains an equals sign (=) will raise an INVALID_CHARACTER_ERR exception.

10.4 Parsing XHTML fragments

The XML fragment parsing algorithm either returns a list of nodes or raises a SYNTAX_ERR exception. Given a string input and an optional context element context, the algorithm is as follows:

1. Create a new XML parser.
2. If there is a context element, feed the parser just created the string corresponding to the start tag of that element, declaring all the namespace prefixes that are in scope on that element in the DOM, as well as declaring the default namespace (if any) that is in scope on that element in the DOM.
   A namespace prefix is in scope if the DOM Core lookupNamespaceURI() method on the element would return a non-null value for that prefix.
   The default namespace is the namespace for which the DOM Core isDefaultNamespace() method on the element would return true.
3. Feed the parser just created the string input.
4. If there is a context element, feed the parser just created the string corresponding to the end tag of that element.
5. If there is an XML well-formedness or XML namespace well-formedness error, then raise a SYNTAX_ERR exception and abort these steps.
6. If there is a context element, then return the child nodes of the root element of the resulting Document, in tree order. Otherwise, return the children of the Document object, in tree order.

11 Rendering

User agents are not required to present HTML documents in any particular way. However, this section provides a set of suggestions for rendering HTML documents that, if followed, are likely to lead to a user experience that closely resembles the experience intended by the documents' authors. So as to avoid confusion regarding the normativity of this section, RFC2119 terms have not been used. Instead, the term "expected" is used to indicate behavior that will lead to this experience.

11.1 Introduction

In general, user agents are expected to support CSS, and many of the suggestions in this section are expressed in CSS terms. User agents that use other presentation mechanisms can derive their expected behavior by translating from the CSS rules given in this section.

In the absence of style-layer rules to the contrary (e.g. author style sheets), user agents are expected to render an element so that it conveys to the user the meaning that the element represents, as described by this specification.

The suggestions in this section generally assume a visual output medium with a resolution of 96dpi or greater, but HTML is intended to apply to multiple media (it is a media-independent language). User agents are encouraged to adapt the suggestions in this section to their target media.

11.2 The CSS user agent style sheet and presentational hints

11.2.1 Introduction

The CSS rules given in these subsections are, unless otherwise specified, expected to be used as part of the user-agent level style sheet defaults for all documents that contain HTML elements.
Some rules are intended for the author-level zero-specificity presentational hints part of the CSS cascade; these are explicitly called out as presentational hints.

Some of the rules regarding left and right margins are given here as appropriate for elements whose 'direction' property is 'ltr', and are expected to be flipped around on elements whose 'direction' property is 'rtl'. These are marked "LTR-specific".

When the text below says that an attribute A on an element E maps to the pixel length property (or properties) P, it means that if E has attribute A set, and parsing that attribute's value using the rules for parsing non-negative integers doesn't generate an error, then the user agent is expected to use the parsed value as a pixel length for a presentational hint for P.

When the text below says that an attribute A on an element E maps to the dimension property (or properties) P, it means that if E has attribute A set, and parsing that attribute's value using the rules for parsing dimension values doesn't generate an error, then the user agent is expected to use the parsed dimension as the value for a presentational hint for P, with the value given as a pixel length if the dimension was an integer, and with the value given as a percentage if the dimension was a percentage.
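The two mapping definitions above can be sketched as small functions that turn an attribute into presentational-hint declarations. This is an illustration under stated assumptions: the parsers here are simplified stand-ins for this specification's rules for parsing non-negative integers and rules for parsing dimension values, which are defined elsewhere and may differ in edge cases, and the function names and the dictionary representation of attributes are hypothetical.

```python
import re
from typing import Optional

# Simplified stand-ins (assumptions): skip leading whitespace, read ASCII
# digits (plus an optional fraction and '%' for dimensions), ignore the rest.
_INT_RE = re.compile(r"[ \t\n\f\r]*([0-9]+)")
_DIM_RE = re.compile(r"[ \t\n\f\r]*([0-9]+(?:\.[0-9]+)?)(%?)")


def parse_non_negative_integer(value: str) -> Optional[int]:
    m = _INT_RE.match(value)
    return int(m.group(1)) if m else None


def parse_dimension(value: str) -> Optional[tuple[float, bool]]:
    """Return (number, is_percentage), or None if parsing fails."""
    m = _DIM_RE.match(value)
    if m is None:
        return None
    return float(m.group(1)), m.group(2) == "%"


def _css_number(n: float) -> str:
    # Render 50.0 as "50" but keep 12.5 as "12.5".
    return str(int(n)) if n.is_integer() else str(n)


def map_pixel_length(attrs: dict[str, str], attr: str, props: list[str]) -> dict[str, str]:
    """Hints for 'attr maps to the pixel length property (or properties) props'."""
    if attr not in attrs:
        return {}
    n = parse_non_negative_integer(attrs[attr])
    if n is None:
        return {}  # parse error: no presentational hint
    return {p: f"{n}px" for p in props}


def map_dimension(attrs: dict[str, str], attr: str, props: list[str]) -> dict[str, str]:
    """Hints for 'attr maps to the dimension property (or properties) props'."""
    if attr not in attrs:
        return {}
    parsed = parse_dimension(attrs[attr])
    if parsed is None:
        return {}  # parse error: no presentational hint
    number, is_percentage = parsed
    unit = "%" if is_percentage else "px"
    return {p: _css_number(number) + unit for p in props}
```

For example, map_dimension({"width": "50%"}, "width", ["width"]) yields {"width": "50%"}, while a value such as "-2" produces no hint at all, so the style sheet defaults apply.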
11.2.2 Display types

    @namespace url(http://www.w3.org/1999/xhtml);

    [hidden], area, audio:not([controls]), base, basefont, command,
    datalist, head, input[type=hidden], link, menu[type=context], meta,
    noembed, noframes, param, script, source, style, title {
      display: none;
    }

    address, article, aside, blockquote, body, center, dd, dialog, dir,
    div, dl, dt, figure, footer, form, h1, h2, h3, h4, h5, h6, header,
    hgroup, hr, html, legend, listing, menu, nav, ol, p, plaintext, pre,
    rp, section, ul, xmp {
      display: block;
    }

    table { display: table; }
    caption { display: table-caption; }
    colgroup { display: table-column-group; }
    col { display: table-column; }
    thead { display: table-header-group; }
    tbody { display: table-row-group; }
    tfoot { display: table-footer-group; }
    tr { display: table-row; }
    td, th { display: table-cell; }

    li { display: list-item; }

    ruby { display: ruby; }
    rt { display: ruby-text; }

For the purposes of the CSS table model, the col element is to be treated as if it was present as many times as its span attribute specifies.

For the purposes of the CSS table model, the colgroup element, if it contains no col element, is to be treated as if it had as many such children as its span attribute specifies.

For the purposes of the CSS table model, the colspan and rowspan attributes on td and th elements are expected to provide the special knowledge regarding cells spanning rows and columns.

For the purposes of the CSS ruby model, runs of descendants of ruby elements that are not rt or rp elements are expected to be wrapped in anonymous boxes whose 'display' property has the value 'ruby-base'.

User agents that do not support correct ruby rendering are expected to render parentheses around the text of rt elements in the absence of rp elements.

The br element is expected to render as if its contents were a single U+000A LINE FEED (LF) character and its 'white-space' property was 'pre'.
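The col and colgroup rules for the CSS table model amount to a simple count. The following sketch models the elements as hypothetical dataclasses holding already-parsed, positive span values (an assumption; real DOM nodes and span parsing are not modelled) and computes how many column boxes each colgroup contributes.

```python
from dataclasses import dataclass, field


@dataclass
class Col:
    span: int = 1  # assumed already parsed and positive


@dataclass
class ColGroup:
    span: int = 1
    cols: list[Col] = field(default_factory=list)


def column_boxes(groups: list[ColGroup]) -> list[int]:
    """Number of table-column boxes each colgroup contributes."""
    counts = []
    for group in groups:
        if group.cols:
            # Each col is treated as present as many times as its span says.
            counts.append(sum(col.span for col in group.cols))
        else:
            # A colgroup with no col children acts as if it had `span` of them.
            counts.append(group.span)
    return counts
```

Note that a colgroup's own span is ignored once it has col children, matching the "if it contains no col element" condition.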
User agents are expected to support the 'clear' property on inline elements (in order to render br elements with clear attributes) in the manner described in the non-normative note to this effect in CSS2.1.

The user agent is expected to hide noscript elements for which scripting is enabled, irrespective of CSS rules.

11.2.3 Margins and padding

    @namespace url(http://www.w3.org/1999/xhtml);

    article, aside, blockquote, dir, dl, figure, listing, menu, nav, ol,
    p, plaintext, pre, section, ul, xmp {
      margin-top: 1em; margin-bottom: 1em;
    }

    dir dir, dir dl, dir menu, dir ol, dir ul,
    dl dir, dl dl, dl menu, dl ol, dl ul,
    menu dir, menu dl, menu menu, menu ol, menu ul,
    ol dir, ol dl, ol menu, ol ol, ol ul,
    ul dir, ul dl, ul menu, ul ol, ul ul {
      margin-top: 0; margin-bottom: 0;
    }

    h1 { margin-top: 0.67em; margin-bottom: 0.67em; }
    h2 { margin-top: 0.83em; margin-bottom: 0.83em; }
    h3 { margin-top: 1.00em; margin-bottom: 1.00em; }
    h4 { margin-top: 1.33em; margin-bottom: 1.33em; }
    h5 { margin-top: 1.67em; margin-bottom: 1.67em; }
    h6 { margin-top: 2.33em; margin-bottom: 2.33em; }

    dd { margin-left: 40px; } /* LTR-specific: use 'margin-right' for rtl elements */
    dir, menu, ol, ul { padding-left: 40px; } /* LTR-specific: use 'padding-right' for rtl elements */
    blockquote, figure { margin-left: 40px; margin-right: 40px; }

    table { border-spacing: 2px; border-collapse: separate; }
    td, th { padding: 1px; }

The article, aside, nav, and section elements are expected to affect the margins of h1 elements, based on the nesting depth.
If x is a selector that matches elements that are either article, aside, nav, or section elements, then the following rules capture what is expected:

    @namespace url(http://www.w3.org/1999/xhtml);

    x h1 { margin-top: 0.83em; margin-bottom: 0.83em; }
    x x h1 { margin-top: 1.00em; margin-bottom: 1.00em; }
    x x x h1 { margin-top: 1.33em; margin-bottom: 1.33em; }
    x x x x h1 { margin-top: 1.67em; margin-bottom: 1.67em; }
    x x x x x h1 { margin-top: 2.33em; margin-bottom: 2.33em; }

For each property in the table below, given a body element, the first attribute that exists maps to the pixel length property on the body element. If none of the attributes for a property are found, or if the value of the attribute that was found cannot be parsed successfully, then a default value of 8px is expected to be used for that property instead.

    Property         Source
    'margin-top'     1. the body element's marginheight attribute
                     2. the body element's container frame element's marginheight attribute
                     3. the body element's topmargin attribute
    'margin-right'   1. the body element's marginwidth attribute
                     2. the body element's container frame element's marginwidth attribute
                     3. the body element's rightmargin attribute
    'margin-bottom'  1. the body element's marginheight attribute
                     2. the body element's container frame element's marginheight attribute
                     3. the body element's bottommargin attribute
    'margin-left'    1. the body element's marginwidth attribute
                     2. the body element's container frame element's marginwidth attribute
                     3. the body element's leftmargin attribute

If the body element's Document's browsing context is a nested browsing context, and the browsing context container of that nested browsing context is a frame or iframe element, then the container frame element of the body element is that frame or iframe element. Otherwise, there is no container frame element.
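The body margin lookup has one subtlety worth making explicit: only the first attribute that exists is consulted, so an unparsable value falls back to 8px rather than to the next source. The sketch below shows this for 'margin-top'; attributes are modelled as plain dictionaries, the integer parser is a simplified stand-in for the rules for parsing non-negative integers, and the function names are hypothetical. The other three properties follow the same pattern with their own source lists.

```python
import re
from typing import Optional

DEFAULT_MARGIN_PX = 8


def parse_non_negative_integer(value: str) -> Optional[int]:
    # Simplified stand-in (assumption) for the spec's integer parsing rules.
    m = re.match(r"[ \t\n\f\r]*([0-9]+)", value)
    return int(m.group(1)) if m else None


def resolve_margin(sources: list[tuple[dict[str, str], str]]) -> int:
    """Margin in pixels from an ordered list of (attributes, name) sources."""
    for attrs, name in sources:
        if name in attrs:
            # Only the first attribute that exists is consulted; if its value
            # cannot be parsed, the default applies (no fall-through).
            parsed = parse_non_negative_integer(attrs[name])
            return DEFAULT_MARGIN_PX if parsed is None else parsed
    return DEFAULT_MARGIN_PX


def body_margin_top(body: dict[str, str], frame: Optional[dict[str, str]]) -> int:
    """'margin-top' for a body element; frame is its container frame element, if any."""
    return resolve_margin([
        (body, "marginheight"),
        (frame or {}, "marginheight"),
        (body, "topmargin"),
    ])
```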
If the Document has a root element, and the Document's browsing context is a nested browsing context, and the browsing context container of that nested browsing context is a frame or iframe element, and that element has a scrolling attribute, then the user agent is expected to compare the value of the attribute in an ASCII case-insensitive manner to the values in the first column of the following table, and if one of them matches, then the user agent is expected to treat that attribute as a presentational hint for the aforementioned root element's 'overflow' property, setting it to the value given in the corresponding cell on the same row in the second column:

    Attribute value   'overflow' value
    on                'scroll'
    scroll            'scroll'
    yes               'scroll'
    off               'hidden'
    noscroll          'hidden'
    no                'hidden'
    auto              'auto'

The table element's cellspacing attribute maps to the pixel length property 'border-spacing' on the element.

The table element's cellpadding attribute maps to the pixel length properties 'padding-top', 'padding-right', 'padding-bottom', and 'padding-left' of any td and th elements that have corresponding cells in the table corresponding to the table element.

The table element's hspace attribute maps to the dimension properties 'margin-left' and 'margin-right' on the table element.

The table element's vspace attribute maps to the dimension properties 'margin-top' and 'margin-bottom' on the table element.

The table element's height attribute maps to the dimension property 'height' on the table element.

The table element's width attribute maps to the dimension property 'width' on the table element.

The col element's width attribute maps to the dimension property 'width' on the col element.

The tr element's height attribute maps to the dimension property 'height' on the tr element.

The td and th elements' height attributes map to the dimension property 'height' on the element.

The td and th elements' width attributes map to the dimension property 'width' on the element.
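The scrolling-attribute table reduces to a dictionary lookup after ASCII case folding. This sketch takes the attribute value as a string (None when the attribute is absent) and omits the checks on the nested browsing context and the root element; the names are hypothetical. It folds only A to Z, because Python's str.lower() would also fold some non-ASCII characters, which an ASCII case-insensitive comparison must not do.

```python
from typing import Optional

# scrolling attribute value -> 'overflow' value on the root element.
SCROLLING_TO_OVERFLOW = {
    "on": "scroll",
    "scroll": "scroll",
    "yes": "scroll",
    "off": "hidden",
    "noscroll": "hidden",
    "no": "hidden",
    "auto": "auto",
}

# ASCII case-insensitive comparison: fold only A-Z.
_ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"
)


def overflow_hint(scrolling: Optional[str]) -> Optional[str]:
    """Return the 'overflow' presentational hint, or None if there is none."""
    if scrolling is None:
        return None
    # The whole value must match; surrounding whitespace is not stripped.
    return SCROLLING_TO_OVERFLOW.get(scrolling.translate(_ASCII_LOWER))
```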
In quirks mode, the following rules are also expected to apply:

    @namespace url(http://www.w3.org/1999/xhtml);

    form { margin-bottom: 1em; }

When a Document is in quirks mode, margins on HTML elements at the top or bottom of the initial containing block, or the top or bottom of td or th elements, are expected to be collapsed to zero.

11.2.4 Alignment

    @namespace url(http://www.w3.org/1999/xhtml);

    thead, tbody, tfoot, table > tr { vertical-align: middle; }
    tr, td, th { vertical-align: inherit; }
    sub { vertical-align: sub; }
    sup { vertical-align: super; }
    th { text-align: center; }

The following rules are also expected to apply, as presentational hints:

    @namespace url(http://www.w3.org/1999/xhtml);

    table[align=left] { float: left; }
    table[align=right] { float: right; }
    table[align=center], table[align=abscenter],
    table[align=absmiddle], table[align=middle] {
      margin-left: auto; margin-right: auto;
    }

    caption[align=bottom] { caption-side: bottom; }

    p[align=left], h1[align=left], h2[align=left], h3[align=left],
    h4[align=left], h5[align=left], h6[align=left] {
      text-align: left;
    }
    p[align=right], h1[align=right], h2[align=right], h3[align=right],
    h4[align=right], h5[align=right], h6[align=right] {
      text-align: right;
    }
    p[align=center], h1[align=center], h2[align=center], h3[align=center],
    h4[align=center], h5[align=center], h6[align=center] {
      text-align: center;
    }
    p[align=justify], h1[align=justify], h2[align=justify], h3[align=justify],
    h4[align=justify], h5[align=justify], h6[align=justify] {
      text-align: justify;
    }

    col[valign=top], thead[valign=top], tbody[valign=top],
    tfoot[valign=top], tr[valign=top], td[valign=top], th[valign=top] {
      vertical-align: top;
    }
    col[valign=middle], thead[valign=middle], tbody[valign=middle],
    tfoot[valign=middle], tr[valign=middle], td[valign=middle], th[valign=middle] {
      vertical-align: middle;
    }
    col[valign=bottom], thead[valign=bottom], tbody[valign=bottom],
    tfoot[valign=bottom], tr[valign=bottom], td[valign=bottom], th[valign=bottom] {
      vertical-align: bottom;
    }
    col[valign=baseline], thead[valign=baseline], tbody[valign=baseline],
    tfoot[valign=baseline], tr[valign=baseline], td[valign=baseline], th[valign=baseline] {
      vertical-align: baseline;
    }

The center element, the caption element unless specified otherwise below, and the div element when its align attribute's value is an ASCII case-insensitive match for the string "center", are expected to center text within themselves, as if they had their 'text-align' property set to 'center' in a presentational hint, and to align descendants to the center.

The div, caption, thead, tbody, tfoot, tr, td, and th elements, when they have an align attribute whose value is an ASCII case-insensitive match for the string "left", are expected to left-align text within themselves, as if they had their 'text-align' property set to 'left' in a presentational hint, and to align descendants to the left.

The div, caption, thead, tbody, tfoot, tr, td, and th elements, when they have an align attribute whose value is an ASCII case-insensitive match for the string "right", are expected to right-align text within themselves, as if they had their 'text-align' property set to 'right' in a presentational hint, and to align descendants to the right.

The div, caption, thead, tbody, tfoot, tr, td, and th elements, when they have an align attribute whose value is an ASCII case-insensitive match for the string "justify", are expected to full-justify text within themselves, as if they had their 'text-align' property set to 'justify' in a presentational hint, and to align descendants to the left.
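Leaving aside the "align descendants" behavior, the 'text-align' part of the center, left, right, and justify paragraphs can be sketched as a small function. It models only those paragraphs: p and h1 to h6 are handled by the attribute-selector rules given earlier and are not covered here. The function name, the lowercase tag-name input, and the None-for-absent convention are assumptions for illustration.

```python
from typing import Optional

# ASCII case-insensitive comparison: fold only A-Z.
_ASCII_LOWER = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"
)

# Elements whose align attribute can request left, right, or justify.
ALIGNABLE = {"div", "caption", "thead", "tbody", "tfoot", "tr", "td", "th"}


def text_align_hint(tag: str, align: Optional[str]) -> Optional[str]:
    """'text-align' presentational hint for an element, or None.

    tag is the element's lowercase local name; align is its align
    attribute value, or None if the attribute is absent.
    """
    value = align.translate(_ASCII_LOWER) if align is not None else None
    if tag in ALIGNABLE and value in ("left", "right", "justify"):
        return value
    if tag == "center":
        return "center"
    if tag == "div" and value == "center":
        return "center"
    if tag == "caption":
        # caption centers its text unless an align value above overrides it.
        return "center"
    return None
```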
When a user agent is to align descendants of a node, the user agent is expected to align only those descendants that have both their 'margin-left' and 'margin-right' properties computing to a value other than 'auto', that are over-constrained and that have one of those two margins with a used value forced to a greater value, and that do not themselves have an applicable align attribute.

11.2.5 Fonts and colors

    @namespace url(http://www.w3.org/1999/xhtml);

    address, cite, dfn, em, i, var { font-style: italic; }
    b, strong, th { font-weight: bold; }
    code, kbd, listing, plaintext, pre, samp, tt, xmp { font-family: monospace; }
    h1 { font-size: 2.00em; font-weight: bold; }
    h2 { font-size: 1.50em; font-weight: bold; }
    h3 { font-size: 1.17em; font-weight: bold; }
    h4 { font-size: 1.00em; font-weight: bold; }
    h5 { font-size: 0.83em; font-weight: bold; }
    h6 { font-size: 0.67em; font-weight: bold; }
    big { font-size: larger; }
    small, sub, sup { font-size: smaller; }
    sub, sup { line-height: normal; }

    :link { color: blue; }
    :visited { color: purple; }

    mark { background: yellow; color: black; }

    table, td, th { border-color: gray; }
    thead, tbody, tfoot, tr { border-color: inherit; }

    table[rules=none], table[rules=groups], table[rules=rows],
    table[rules=cols], table[rules=all],
    table[frame=void], table[frame=above], table[frame=below],
    table[frame=hsides], table[frame=lhs], table[frame=rhs],
    table[frame=vsides], table[frame=box], table[frame=border],
    table[rules=none] > tr > td, table[rules=none] > tr > th,
    table[rules=groups] > tr > td, table[rules=groups] > tr > th,
    table[rules=rows] > tr > td, table[rules=rows] > tr > th,
    table[rules=cols] > tr > td, table[rules=cols] > tr > th,
    table[rules=all] > tr > td, table[rules=all] > tr > th,
    table[frame=void] > tr > td, table[frame=void] > tr > th,
    table[frame=above] > tr > td, table[frame=above] > tr > th,
    table[frame=below] > tr > td, table[frame=below] > tr > th,
    table[frame=hsides] > tr > td, table[frame=hsides] > tr > th,
    table[frame=lhs] > tr > td, table[frame=lhs] > tr > th,
    table[frame=rhs] > tr > td, table[frame=rhs] > tr > th,
    table[frame=vsides] > tr > td, table[frame=vsides] > tr > th,
    table[frame=box] > tr > td, table[frame=box] > tr > th,
    table[frame=border] > tr > td, table[frame=border] > tr > th,
    table[rules=none] > thead > tr > td, table[rules=none] > thead > tr > th,
    table[rules=groups] > thead > tr > td, table[rules=groups] > thead > tr > th,
    table[rules=rows] > thead > tr > td, table[rules=rows] > thead > tr > th,
    table[rules=cols] > thead > tr > td, table[rules=cols] > thead > tr > th,
    table[rules=all] > thead > tr > td, table[rules=all] > thead > tr > th,
    table[frame=void] > thead > tr > td, table[frame=void] > thead > tr > th,
    table[frame=above] > thead > tr > td, table[frame=above] > thead > tr > th,
    table[frame=below] > thead > tr > td, table[frame=below] > thead > tr > th,
    table[frame=hsides] > thead > tr > td, table[frame=hsides] > thead > tr > th,
    table[frame=lhs] > thead > tr > td, table[frame=lhs] > thead > tr > th,
    table[frame=rhs] > thead > tr > td, table[frame=rhs] > thead > tr > th,
    table[frame=vsides] > thead > tr > td, table[frame=vsides] > thead > tr > th,
    table[frame=box] > thead > tr > td, table[frame=box] > thead > tr > th,
    table[frame=border] > thead > tr > td, table[frame=border] > thead > tr > th,
    table[rules=none] > tbody > tr > td, table[rules=none] > tbody > tr > th,
    table[rules=groups] > tbody > tr > td, table[rules=groups] > tbody > tr > th,
    table[rules=rows] > tbody > tr > td, table[rules=rows] > tbody > tr > th,
    table[rules=cols] > tbody > tr > td, table[rules=cols] > tbody > tr > th,
    table[rules=all] > tbody > tr > td, table[rules=all] > tbody > tr > th,
    table[frame=void] > tbody > tr > td, table[frame=void] > tbody > tr > th,
    table[frame=above] > tbody > tr > td, table[frame=above] > tbody > tr > th,
    table[frame=below] > tbody > tr > td, table[frame=below] > tbody > tr > th,
    table[frame=hsides] > tbody > tr > td, table[frame=hsides] > tbody > tr > th,
    table[frame=lhs] > tbody > tr > td, table[frame=lhs] > tbody > tr > th,
    table[frame=rhs] > tbody > tr > td, table[frame=rhs] > tbody > tr > th,
    table[frame=vsides] > tbody > tr > td, table[frame=vsides] > tbody > tr > th,
    table[frame=box] > tbody > tr > td, table[frame=box] > tbody > tr > th,
    table[frame=border] > tbody > tr > td, table[frame=border] > tbody > tr > th,
    table[rules=none] > tfoot > tr > td, table[rules=none] > tfoot > tr > th,
    table[rules=groups] > tfoot > tr > td, table[rules=groups] > tfoot > tr > th,
    table[rules=rows] > tfoot > tr > td, table[rules=rows] > tfoot > tr > th,
    table[rules=cols] > tfoot > tr > td, table[rules=cols] > tfoot > tr > th,
    table[rules=all] > tfoot > tr > td, table[rules=all] > tfoot > tr > th,
    table[frame=void] > tfoot > tr > td, table[frame=void] > tfoot > tr > th,
    table[frame=above] > tfoot > tr > td, table[frame=above] > tfoot > tr > th,
    table[frame=below] > tfoot > tr > td, table[frame=below] > tfoot > tr > th,
    table[frame=hsides] > tfoot > tr > td, table[frame=hsides] > tfoot > tr > th,
    table[frame=lhs] > tfoot > tr > td, table[frame=lhs] > tfoot > tr > th,
    table[frame=rhs] > tfoot > tr > td, table[frame=rhs] > tfoot > tr > th,
    table[frame=vsides] > tfoot > tr > td, table[frame=vsides] > tfoot > tr > th,
    table[frame=box] > tfoot > tr > td, table[frame=box] > tfoot > tr > th,
    table[frame=border] > tfoot > tr > td, table[frame=border] > tfoot > tr > th {
      border-color: black;
    }

The initial value for the 'color' property is expected to be black. The initial value for the 'background-color' property is expected to be 'transparent'. The canvas's background is expected to be white.

The article, aside, nav, and section elements are expected to affect the font size of h1 elements, based on the nesting depth.
If x is a selector that matches elements that are either article, aside, nav, or section elements, then the following rules capture what is expected:

@namespace url(http://www.w3.org/1999/xhtml);
x h1 { font-size: 1.50em; }
x x h1 { font-size: 1.17em; }
x x x h1 { font-size: 1.00em; }
x x x x h1 { font-size: 0.83em; }
x x x x x h1 { font-size: 0.67em; }

When a body, table, thead, tbody, tfoot, tr, td, or th element has a background attribute set to a non-empty value, the attribute's value is expected to be resolved relative to the element, and if this is successful, the user agent is expected to treat the attribute as a presentational hint setting the element's 'background-image' property to the resulting absolute URL.

When a body, table, thead, tbody, tfoot, tr, td, or th element has a bgcolor attribute set, its value is expected to be parsed using the rules for parsing a legacy color value, and if that does not return an error, the user agent is expected to treat the attribute as a presentational hint setting the element's 'background-color' property to the resulting color.

When a body element has a text attribute, its value is expected to be parsed using the rules for parsing a legacy color value, and if that does not return an error, the user agent is expected to treat the attribute as a presentational hint setting the element's 'color' property to the resulting color.

When a body element has a link attribute, its value is expected to be parsed using the rules for parsing a legacy color value, and if that does not return an error, the user agent is expected to treat the attribute as a presentational hint setting the 'color' property of any element in the Document matching the ':link' pseudo-class to the resulting color.
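The nesting-depth rules for h1 (1.50em with one sectioning ancestor, stepping down to 0.67em) reduce to a lookup in which the deepest rule also covers any deeper nesting, because every shorter chain of x selectors matches too and the longest one is the most specific. The following is a minimal sketch, assuming the caller has already counted the article, aside, nav, and section ancestors; the function name h1_font_size_em is hypothetical.

```python
from typing import Optional

# Sizes from the h1 nesting rules, keyed by the number of
# article, aside, nav, or section ancestors of the h1.
H1_SIZES_EM = {1: 1.50, 2: 1.17, 3: 1.00, 4: 0.83}
DEEPEST_H1_SIZE_EM = 0.67  # the five-x rule also matches any deeper nesting


def h1_font_size_em(sectioning_depth: int) -> Optional[float]:
    """Expected 'font-size' of an h1, in em, for a given sectioning depth.

    Returns None at depth 0, where none of these rules match and
    whatever other h1 rule is in effect applies instead.
    """
    if sectioning_depth < 0:
        raise ValueError("sectioning depth cannot be negative")
    if sectioning_depth == 0:
        return None
    # All rules whose chain of x's is no longer than the depth match;
    # the one with the most x's is the most specific (and the last), so it wins.
    return H1_SIZES_EM.get(sectioning_depth, DEEPEST_H1_SIZE_EM)
```

For example, an h1 inside a section inside an article gets 1.17em. Like any em value, the result is relative to the parent element's font size.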
When a body element has a vlink attribute, its value is expected to be parsed using the rules for parsing a legacy color value, and if that does not return an error, the user agent is expected to treat the attribute as a presentational hint setting the 'color' property of any element in the Document matching the ':visited' pseudo-class to the resulting color.

When a body element has an alink attribute, its value is expected to be parsed using the rules for parsing a legacy color value, and if that does not return an error, the user agent is expected to treat the attribute as a presentational hint setting the 'color' property of any element in the Document matching the ':active' pseudo-class and either the ':link' pseudo-class or the ':visited' pseudo-class to the resulting color.

When a table element has a bordercolor attribute, its value is expected to be parsed using the rules for parsing a legacy color value, and if that does not return an error, the user agent is expected to treat the attribute as a presentational hint setting the element's 'border-top-color', 'border-right-color', 'border-bottom-color', and 'border-left-color' properties to the resulting color.

When a font element has a color attribute, its value is expected to be parsed using the rules for parsing a legacy color value, and if that does not return an error, the user agent is expected to treat the attribute as a presentational hint setting the element's 'color' property to the resulting color.

When a font element has a face attribute, the user agent is expected to treat the attribute as a presentational hint setting the element's 'font-family' property to the attribute's value.
When a font element has a pointsize attribute, the user agent is expected to parse that attribute's value using the rules for parsing non-negative integers, and if this doesn't generate an error, then the user agent is expected to use the parsed value as a point length for a presentational hint for the 'font-size' property on the element.

When a font element has a size attribute, the user agent is expected to use the following steps to treat the attribute as a presentational hint setting the element's 'font-size' property:

1. Let input be the attribute's value.
2. Let position be a pointer into input, initially pointing at the start of the string.
3. Skip whitespace.
4. If position is past the end of input, there is no presentational hint. Abort these steps.
5. If the character at position is a U+002B PLUS SIGN character (+), then let mode be relative-plus, and advance position to the next character. Otherwise, if the character at position is a U+002D HYPHEN-MINUS character (-), then let mode be relative-minus, and advance position to the next character. Otherwise, let mode be absolute.
6. Collect a sequence of characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), and let the resulting sequence be digits.
7. If digits is the empty string, there is no presentational hint. Abort these steps.
8. Interpret digits as a base-ten integer. Let value be the resulting number.
9. If mode is relative-plus, then increment value by 3. If mode is relative-minus, then let value be the result of subtracting value from 3.
10. If value is greater than 7, let it be 7.
11. If value is less than 1, let it be 1.
12. Set 'font-size' to the keyword corresponding to the value of value according to the following table:

value   'font-size' keyword   Notes
1       xx-small
2       small
3       medium
4       large
5       x-large
6       xx-large
7       xxx-large             see below

The 'xxx-large' value is a non-CSS value used here to indicate a font size one "step" larger than 'xx-large'.
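The size-attribute steps for the font element translate almost line for line into code. The sketch below follows them directly and returns None wherever the steps say there is no presentational hint. It assumes that "skip whitespace" refers to HTML's space characters (U+0020, TAB, LF, FF, CR); the names legacy_font_size and FONT_SIZE_KEYWORDS are hypothetical.

```python
from typing import Optional

# Keyword table for the clamped value (1..7), as given above.
FONT_SIZE_KEYWORDS = {
    1: "xx-small",
    2: "small",
    3: "medium",
    4: "large",
    5: "x-large",
    6: "xx-large",
    7: "xxx-large",  # non-CSS value: one step larger than 'xx-large'
}

# Assumed to be HTML's "space characters": SPACE, TAB, LF, FF, CR.
SPACE_CHARACTERS = " \t\n\f\r"


def legacy_font_size(value: str) -> Optional[str]:
    """Map a font element's size attribute to a 'font-size' keyword.

    Returns None when the steps say there is no presentational hint.
    """
    position = 0
    # Skip whitespace.
    while position < len(value) and value[position] in SPACE_CHARACTERS:
        position += 1
    if position >= len(value):
        return None
    # Detect a leading sign and choose the mode.
    mode = "absolute"
    if value[position] == "+":
        mode = "relative-plus"
        position += 1
    elif value[position] == "-":
        mode = "relative-minus"
        position += 1
    # Collect ASCII digits only (str.isdigit would accept other scripts).
    start = position
    while position < len(value) and "0" <= value[position] <= "9":
        position += 1
    digits = value[start:position]
    if not digits:
        return None
    number = int(digits)
    if mode == "relative-plus":
        number += 3
    elif mode == "relative-minus":
        number = 3 - number
    # Clamp to 1..7, then look up the keyword.
    number = max(1, min(7, number))
    return FONT_SIZE_KEYWORDS[number]
```

Note that characters after the digits are ignored, since the steps never examine them: size=" 5px" yields 'x-large', and size="+1" yields 'large' (1 + 3 = 4).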
11.2.6 Punctuation and decorations

@namespace url(http://www.w3.org/1999/xhtml);
:link, :visited, ins, u { text-decoration: underline; }
abbr[title], acronym[title] { text-decoration: dotted underline; }
del, s, strike { text-decoration: line-through; }
blink { text-decoration: blink; }
q:before { content: open-quote; }
q:after { content: close-quote; }
nobr { white-space: nowrap; }
listing, plaintext, pre, xmp { white-space: pre; }
ol { list-style-type: decimal; }
dir, menu, ul { list-style-type: disc; }
dir dl, dir menu, dir ul, menu dl, menu menu, menu ul, ol dl, ol menu, ol ul, ul dl, ul menu, ul ul { list-style-type: circle; }
dir dir dl, dir dir menu, dir dir ul, dir menu dl, dir menu menu, dir menu ul, dir ol dl, dir ol menu, dir ol ul, dir ul dl, dir ul menu, dir ul ul, menu dir dl, menu dir menu, menu dir ul, menu menu dl, menu menu menu, menu menu ul, menu ol dl, menu ol menu, menu ol ul, menu ul dl, menu ul menu, menu ul ul, ol dir dl, ol dir menu, ol dir ul, ol menu dl, ol menu menu, ol menu ul, ol ol dl, ol ol menu, ol ol ul, ol ul dl, ol ul menu, ol ul ul, ul dir dl, ul dir menu, ul dir ul, ul menu dl, ul menu menu, ul menu ul, ul ol dl, ul ol menu, ul ol ul, ul ul dl, ul ul menu, ul ul ul { list-style-type: square; }
table { border-style: outset; }
td, th { border-style: inset; }
[dir=ltr] { direction: ltr; unicode-bidi: embed; }
[dir=rtl] { direction: rtl; unicode-bidi: embed; }
bdo[dir=ltr], bdo[dir=rtl] { unicode-bidi: bidi-override; }

In addition, rules setting the 'quotes' property appropriately for the locales and languages understood by the user are expected to be present.
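The list-style-type rules in 11.2.6 amount to a depth count: ol is always decimal; dl, menu, and ul get disc, circle, or square depending on how many dir, menu, ol, or ul ancestors they have, because selectors with more compound parts are more specific; a nested dir keeps disc because no nested rule targets it. The following is a minimal sketch of that reading, assuming lowercase tag names and ignoring author style sheets and presentational hints such as the type attribute; the function name is hypothetical.

```python
from typing import Optional, Sequence

# Elements that count as list nesting in the 11.2.6 selectors.
LIST_ANCESTORS = {"dir", "menu", "ol", "ul"}
# Elements that the nested-list selectors apply to.
NESTED_SUBJECTS = {"dl", "menu", "ul"}


def default_list_style_type(tag: str, ancestors: Sequence[str]) -> Optional[str]:
    """Default 'list-style-type' from the 11.2.6 rules (author styles ignored).

    ancestors may contain any element names, since the selectors use
    descendant combinators rather than child combinators.
    """
    if tag == "ol":
        return "decimal"  # no nested rule targets ol
    depth = sum(1 for name in ancestors if name in LIST_ANCESTORS)
    if tag in NESTED_SUBJECTS:
        if depth >= 2:
            return "square"  # three-part selectors such as "ul ol ul"
        if depth == 1:
            return "circle"  # two-part selectors such as "ol ul"
    if tag in {"dir", "menu", "ul"}:
        return "disc"
    return None  # e.g. a dl that is not nested in any list
```

For instance, a ul inside a div inside an ol gets 'circle', since the div does not count toward the depth.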
The following rules are also expected to apply, as presentational hints:

@namespace url(http://www.w3.org/1999/xhtml);
td[nowrap], th[nowrap] { white-space: nowrap; }
pre[wrap] { white-space: pre-wrap; }
br[clear=left] { clear: left; }
br[clear=right] { clear: right; }
br[clear=all], br[clear=both] { clear: both; }
ol[type=1], li[type=1] { list-style-type: decimal; }
ol[type=a], li[type=a] { list-style-type: lower-alpha; }
ol[type=A], li[type=A] { list-style-type: upper-alpha; }
ol[type=i], li[type=i] { list-style-type: lower-roman; }
ol[type=I], li[type=I] { list-style-type: upper-roman; }
ul[type=disc], li[type=disc] { list-style-type: disc; }
ul[type=circle], li[type=circle] { list-style-type: circle; }
ul[type=square], li[type=square] { list-style-type: square; }
table[rules=none], table[rules=groups], table[rules=rows], table[rules=cols], table[rules=all] { border-style: none; border-collapse: collapse; }
table[frames=void] { border-style: hidden hidden hidden hidden; }
table[frames=above] { border-style: solid hidden hidden hidden; }
table[frames=below] { border-style: hidden hidden solid hidden; }
table[frames=hsides] { border-style: solid hidden solid hidden; }
table[frames=lhs] { border-style: hidden hidden hidden solid; }
table[frames=rhs] { border-style: hidden solid hidden hidden; }
table[frames=vsides] { border-style: hidden solid hidden solid; }
table[frames=box], table[frames=border] { border-style: solid solid solid solid; }
table[frames=void] > tr > td, table[frames=void] > tr > th, table[frames=above] > tr > td, table[frames=above] > tr > th, table[frames=below] > tr > td, table[frames=below] > tr > th, table[frames=hsides] > tr > td, table[frames=hsides] > tr > th, table[frames=lhs] > tr > td, table[frames=lhs] > tr > th, table[frames=rhs] > tr > td, table[frames=rhs] > tr > th, table[frames=vsides] > tr > td, table[frames=vsides] > tr > th, table[frames=box] > tr > td, table[frames=box] > tr > th, table[frames=border] > tr > td,
table[frames=border] > tr > th, table[frames=void] > thead > tr > td, table[frames=void] > thead > tr > th, table[frames=above] > thead > tr > td, table[frames=above] > thead > tr > th, table[frames=below] > thead > tr > td, table[frames=below] > thead > tr > th, table[frames=hsides] > thead > tr > td, table[frames=hsides] > thead > tr > th, table[frames=lhs] > thead > tr > td, table[frames=lhs] > thead > tr > th, table[frames=rhs] > thead > tr > td, table[frames=rhs] > thead > tr > th, table[frames=vsides] > thead > tr > td, table[frames=vsides] > thead > tr > th, table[frames=box] > thead > tr > td, table[frames=box] > thead > tr > th, table[frames=border] > thead > tr > td, table[frames=border] > thead > tr > th, table[frames=void] > tbody > tr > td, table[frames=void] > tbody > tr > th, table[frames=above] > tbody > tr > td, table[frames=above] > tbody > tr > th, table[frames=below] > tbody > tr > td, table[frames=below] > tbody > tr > th, table[frames=hsides] > tbody > tr > td, table[frames=hsides] > tbody > tr > th, table[frames=lhs] > tbody > tr > td, table[frames=lhs] > tbody > tr > th, table[frames=rhs] > tbody > tr > td, table[frames=rhs] > tbody > tr > th, table[frames=vsides] > tbody > tr > td, table[frames=vsides] > tbody > tr > th, table[frames=box] > tbody > tr > td, table[frames=box] > tbody > tr > th, table[frames=border] > tbody > tr > td, table[frames=border] > tbody > tr > th, table[frames=void] > tfoot > tr > td, table[frames=void] > tfoot > tr > th, table[frames=above] > tfoot > tr > td, table[frames=above] > tfoot > tr > th, table[frames=below] > tfoot > tr > td, table[frames=below] > tfoot > tr > th, table[frames=hsides] > tfoot > tr > td, table[frames=hsides] > tfoot > tr > th, table[frames=lhs] > tfoot > tr > td, table[frames=lhs] > tfoot > tr > th, table[frames=rhs] > tfoot > tr > td, table[frames=rhs] > tfoot > tr > th, table[frames=vsides] > tfoot > tr > td, table[frames=vsides] > tfoot > tr > th, table[frames=box] > tfoot > tr > td, 
table[frames=box] > tfoot > tr > th, table[frames=border] > tfoot > tr > td, table[frames=border] > tfoot > tr > th { border-style: solid; }
table[rules=none] > tr > td, table[rules=none] > tr > th, table[rules=none] > thead > tr > td, table[rules=none] > thead > tr > th, table[rules=none] > tbody > tr > td, table[rules=none] > tbody > tr > th, table[rules=none] > tfoot > tr > td, table[rules=none] > tfoot > tr > th, table[rules=groups] > tr > td, table[rules=groups] > tr > th, table[rules=groups] > thead > tr > td, table[rules=groups] > thead > tr > th, table[rules=groups] > tbody > tr > td, table[rules=groups] > tbody > tr > th, table[rules=groups] > tfoot > tr > td, table[rules=groups] > tfoot > tr > th, table[rules=rows] > tr > td, table[rules=rows] > tr > th, table[rules=rows] > thead > tr > td, table[rules=rows] > thead > tr > th, table[rules=rows] > tbody > tr > td, table[rules=rows] > tbody > tr > th, table[rules=rows] > tfoot > tr > td, table[rules=rows] > tfoot > tr > th { border-style: none; }
table[rules=groups] > colgroup, table[rules=groups] > thead, table[rules=groups] > tbody, table[rules=groups] > tfoot { border-style: solid; }
table[rules=rows] > tr, table[rules=rows] > thead > tr, table[rules=rows] > tbody > tr, table[rules=rows] > tfoot > tr { border-style: solid; }
table[rules=cols] > tr > td, table[rules=cols] > tr > th, table[rules=cols] > thead > tr > td, table[rules=cols] > thead > tr > th, table[rules=cols] > tbody > tr > td, table[rules=cols] > tbody > tr > th, table[rules=cols] > tfoot > tr > td, table[rules=cols] > tfoot > tr > th { border-style: none solid none solid; }
table[rules=all] > tr > td, table[rules=all] > tr > th, table[rules=all] > thead > tr > td, table[rules=all] > thead > tr > th, table[rules=all] > tbody > tr > td, table[rules=all] > tbody > tr > th, table[rules=all] > tfoot > tr > td, table[rules=all] > tfoot > tr > th { border-style: solid; }

When rendering li elements, user agents are expected to use the ordinal
value of the li element to render the counter in the list item marker.

The table element's border attribute maps to the pixel length properties 'border-top-width', 'border-right-width', 'border-bottom-width', and 'border-left-width' on the element. If the attribute is present but its value cannot be parsed successfully, a default value of 1px is expected to be used for each of those properties instead.

11.2.7 Resetting rules for inherited properties

The following rules are also expected to be in play, resetting certain properties to block inheritance by default.

@namespace url(http://www.w3.org/1999/xhtml);
table, input, select, option, optgroup, button, textarea, keygen { text-indent: initial; }

In quirks mode, the following rules are also expected to apply:

@namespace url(http://www.w3.org/1999/xhtml);
table { font-weight: initial; font-style: initial; font-variant: initial; font-size: initial; line-height: initial; white-space: initial; text-align: initial; }
input { box-sizing: border-box; }

11.2.8 The hr element

@namespace url(http://www.w3.org/1999/xhtml);
hr { color: gray; border-style: inset; border-width: 1px; }

The following rules are also expected to apply, as presentational hints:

@namespace url(http://www.w3.org/1999/xhtml);
hr[align=left] { margin-left: 0; margin-right: auto; }
hr[align=right] { margin-left: auto; margin-right: 0; }
hr[align=center] { margin-left: auto; margin-right: auto; }
hr[color], hr[noshade] { border-style: solid; }

If an hr element has either a color attribute or a noshade attribute, and also has a size attribute, and parsing that attribute's value using the rules for parsing non-negative integers doesn't generate an error, then the user agent is expected to use the parsed value divided by two as a pixel length for presentational hints for the properties 'border-top-width', 'border-right-width', 'border-bottom-width', and 'border-left-width' on the element.
Otherwise, if an hr element has neither a color attribute nor a noshade attribute, but does have a size attribute, and parsing that attribute's value using the rules for parsing non-negative integers doesn't generate an error, then: if the parsed value is one, the user agent is expected to use the attribute as a presentational hint setting the element's 'border-bottom-width' to 0; otherwise, if the parsed value is greater than one, the user agent is expected to use the parsed value minus two as a pixel length for presentational hints for the 'height' property on the element.

The width attribute on an hr element maps to the dimension property 'width' on the element.

When an hr element has a color attribute, its value is expected to be parsed using the rules for parsing a legacy color value, and if that does not return an error, the user agent is expected to treat the attribute as a presentational hint setting the element's 'color' property to the resulting color.

11.2.9 The fieldset element

@namespace url(http://www.w3.org/1999/xhtml);
fieldset { margin-left: 2px; margin-right: 2px; border: groove 2px ThreeDFace; padding: 0.35em 0.625em 0.75em; }

The fieldset element is expected to establish a new block formatting context.

The first legend element child of a fieldset element, if any, is expected to be rendered over the top border edge of the fieldset element.

If the legend element in question has an align attribute, and its value is an ASCII case-insensitive match for one of the strings in the first column of the following table, then the legend is expected to be rendered horizontally aligned over the border edge in the position given in the corresponding cell on the same row in the second column. If the attribute is absent or has a value that doesn't match any of the cases in the table, then the position is expected to be on the right if the 'direction' property on this element has a computed value of 'rtl', and on the left otherwise.
Attribute value   Alignment position
left              On the left
right             On the right
center            In the middle

11.3 Replaced elements

11.3.1 Embedded content

The applet, canvas, embed, iframe, and video elements are expected to be treated as replaced elements.

An object element that represents an image, plugin, or nested browsing context is expected to be treated as a replaced element. Other object elements are expected to be treated as ordinary elements in the rendering model.

The audio element, when it has a controls attribute, is expected to be treated as a replaced element about one line high, as wide as is necessary to expose the user agent's user interface features.

The video element's controls attribute is not expected to affect the size of the rendering; controls are expected to be overlaid on the page content without causing any layout changes, and are expected to disappear when the user does not need them.

When a video element represents its poster frame, the poster frame is expected to be rendered at the largest size that maintains the poster frame's aspect ratio without being taller or wider than the video element itself, and is expected to be centered in the video element.

Resizing video and canvas elements does not interrupt video playback or clear the canvas.

The following CSS rules are expected to apply:

@namespace url(http://www.w3.org/1999/xhtml);
iframe { border: 2px inset; }

11.3.2 Images

When an img element, or an input element whose type attribute is in the Image Button state, represents an image, it is expected to be treated as a replaced element.

When an img element, or an input element whose type attribute is in the Image Button state, does not represent an image, but the element already has intrinsic dimensions (e.g. from the dimension attributes or CSS rules), and either the user agent has reason to believe that the image will become available and be rendered in due course or the Document is in quirks mode, the element is expected to be treated as a replaced element whose content is the text that the element represents, if any, optionally alongside an icon indicating that the image is being obtained. For input elements, the text is expected to appear button-like to indicate that the element is a button.

When an img element represents some text and the user agent does not expect this to change, the element is expected to be treated as an inline element whose content is the text, optionally with an icon indicating that an image is missing.

When an img element represents nothing and the user agent does not expect this to change, the element is expected not to be rendered at all.

When an img element might be a key part of the content, but neither the image nor any kind of alternative text is available, and the user agent does not expect this to change, the element is expected to be treated as an inline element whose content is an icon indicating that an image is missing.

When an input element whose type attribute is in the Image Button state does not represent an image and the user agent does not expect this to change, the element is expected to be treated as a replaced element consisting of a button whose content is the element's alternative text. The intrinsic dimensions of the button are expected to be about one line in height and whatever width is necessary to render the text on one line.

The icons mentioned above are expected to be relatively small so as not to disrupt most text but be easily clickable. In a visual environment, for instance, icons could be 16 pixels by 16 pixels square, or 1em by 1em if the images are scalable. In an audio environment, the icon could be a short bleep.
The icons are intended to indicate to the user that they can be used to get to whatever options the UA provides for images, and, where appropriate, are expected to provide access to the context menu that would have come up if the user interacted with the actual image.

The following CSS rules are expected to apply when the Document is in quirks mode:

@namespace url(http://www.w3.org/1999/xhtml);
img[align=left] { margin-right: 3px; }
img[align=right] { margin-left: 3px; }

11.3.3 Attributes for embedded content and images

The following CSS rules are expected to apply as presentational hints:

@namespace url(http://www.w3.org/1999/xhtml);
iframe[frameborder=0], iframe[frameborder=no] { border: none; }
applet[align=left], embed[align=left], iframe[align=left], img[align=left], input[type=image][align=left], object[align=left] { float: left; }
applet[align=right], embed[align=right], iframe[align=right], img[align=right], input[type=image][align=right], object[align=right] { float: right; }
applet[align=top], embed[align=top], iframe[align=top], img[align=top], input[type=image][align=top], object[align=top] { vertical-align: top; }
applet[align=bottom], embed[align=bottom], iframe[align=bottom], img[align=bottom], input[type=image][align=bottom], object[align=bottom], applet[align=baseline], embed[align=baseline], iframe[align=baseline], img[align=baseline], input[type=image][align=baseline], object[align=baseline] { vertical-align: baseline; }
applet[align=texttop], embed[align=texttop], iframe[align=texttop], img[align=texttop], input[type=image][align=texttop], object[align=texttop] { vertical-align: text-top; }
applet[align=absmiddle], embed[align=absmiddle], iframe[align=absmiddle], img[align=absmiddle], input[type=image][align=absmiddle], object[align=absmiddle], applet[align=abscenter], embed[align=abscenter], iframe[align=abscenter], img[align=abscenter], input[type=image][align=abscenter], object[align=abscenter] { vertical-align: middle; }
applet[align=bottom], embed[align=bottom], iframe[align=bottom], img[align=bottom], input[type=image][align=bottom], object[align=bottom] { vertical-align: bottom; }

When an applet, embed, iframe, img, or object element, or an input element whose type attribute is in the Image Button state, has an align attribute whose value is an ASCII case-insensitive match for the string "center" or the string "middle", the user agent is expected to act as if the element's 'vertical-align' property were set to a value that aligns the vertical middle of the element with the parent element's baseline.

The hspace attribute of applet, embed, iframe, img, or object elements, and of input elements with a type attribute in the Image Button state, maps to the dimension properties 'margin-left' and 'margin-right' on the element.

The vspace attribute of applet, embed, iframe, img, or object elements, and of input elements with a type attribute in the Image Button state, maps to the dimension properties 'margin-top' and 'margin-bottom' on the element.

When an img element, object element, or input element with a type attribute in the Image Button state is contained within a hyperlink and has a border attribute whose value, when parsed using the rules for parsing non-negative integers, is found to be a number greater than zero, the user agent is expected to use the parsed value for eight presentational hints: four setting the parsed value as a pixel length for the element's 'border-top-width', 'border-right-width', 'border-bottom-width', and 'border-left-width' properties, and four setting the element's 'border-top-style', 'border-right-style', 'border-bottom-style', and 'border-left-style' properties to the value 'solid'.

The width and height attributes on applet, embed, iframe, img, object, or video elements, and on input elements with a type attribute in the Image Button state, map to the dimension properties 'width' and 'height' on the element respectively.
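The align rules in 11.3.3 can be summarized as a lookup table. Two points of interpretation are baked into this sketch. First, 'bottom' appears in two rules of equal specificity, and since the later declaration wins in the CSS cascade, it maps to 'vertical-align: bottom'. Second, the text requires ASCII case-insensitive matching only for "center" and "middle"; applying it to every value here is an assumption. The token MIDDLE_ON_BASELINE is a hypothetical stand-in for "vertical middle of the element on the parent's baseline", which the text describes but does not name as a CSS keyword, and all function names are illustrative.

```python
import string
from typing import Dict, Optional, Tuple

# Hypothetical marker for "align the element's vertical middle with the
# parent element's baseline" (no single CSS keyword is named for it).
MIDDLE_ON_BASELINE = "middle-on-parent-baseline"

ALIGN_HINTS: Dict[str, Tuple[str, str]] = {
    "left": ("float", "left"),
    "right": ("float", "right"),
    "top": ("vertical-align", "top"),
    "baseline": ("vertical-align", "baseline"),
    # 'bottom' is in two rules of equal specificity; the later one wins.
    "bottom": ("vertical-align", "bottom"),
    "texttop": ("vertical-align", "text-top"),
    "absmiddle": ("vertical-align", "middle"),
    "abscenter": ("vertical-align", "middle"),
    "center": ("vertical-align", MIDDLE_ON_BASELINE),
    "middle": ("vertical-align", MIDDLE_ON_BASELINE),
}

ALIGNABLE = {"applet", "embed", "iframe", "img", "object"}
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(value: str) -> str:
    """Lowercase ASCII letters only, for ASCII case-insensitive matching."""
    return value.translate(_ASCII_LOWER)


def align_hint(tag: str, input_type: Optional[str], align: str) -> Optional[Tuple[str, str]]:
    """Return the (property, value) hint for an align attribute, if any."""
    is_image_button = tag == "input" and ascii_lower(input_type or "") == "image"
    if tag not in ALIGNABLE and not is_image_button:
        return None
    return ALIGN_HINTS.get(ascii_lower(align))


def linked_image_border_hints(border: Optional[int], in_hyperlink: bool) -> Dict[str, str]:
    """The eight border hints for an image inside a hyperlink.

    border is the already-parsed non-negative integer (None on a parse
    error); the caller is assumed to have checked that the element is an
    img, an object, or an Image Button input.
    """
    if not in_hyperlink or border is None or border <= 0:
        return {}
    hints: Dict[str, str] = {}
    for side in ("top", "right", "bottom", "left"):
        hints[f"border-{side}-width"] = f"{border}px"
        hints[f"border-{side}-style"] = "solid"
    return hints
```

A translation table is used instead of str.lower() because str.lower() also folds non-ASCII characters (for example U+212A KELVIN SIGN becomes "k"), which ASCII case-insensitive matching must not do.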
11.3.4 Image maps

Shapes on an image map are expected to act, for the purpose of the CSS cascade, as elements independent of the original area element that happen to match the same style rules but inherit from the img or object element.

For the purposes of the rendering, only the 'cursor' property is expected to have any effect on the shape.

Thus, for example, if an area element has a style attribute that sets the 'cursor' property to 'help', then when the user designates that shape, the cursor would change to a Help cursor.

Similarly, if an area element had a CSS rule that set its 'cursor' property to 'inherit' (or if no rule setting the 'cursor' property matched the element at all), the shape's cursor would be inherited from the img or object element of the image map, not from the parent of the area element.

11.3.5 Tool bars

When a menu element's type attribute is in the tool bar state, the element is expected to be treated as a replaced element about two lines high, with a width derived from the contents of the element.

The element is expected to have, by default, the appearance of a tool bar on the user agent's platform. It is expected to contain the menu that is built from the element.

...example with screenshot...

11.4 Bindings

11.4.1 Introduction

A number of elements have their rendering defined in terms of the 'binding' property. [BECSS]

The CSS snippets below set the 'binding' property to a user-agent-defined value, represented below by keywords like bb. The rules then described for these bindings are only expected to apply if the element's 'binding' property has not been overridden (e.g. by the author) to have another value.

Exactly how the bindings are implemented is not specified by this specification.
User agents are encouraged to make their bindings set the 'appearance' CSS property appropriately to achieve platform-native appearances for widgets, and are expected to implement any relevant animations, etc., that are appropriate for the platform. [CSSUI]

The converting a character width to pixels algorithm, used by some of the bindings below, returns (size - 1) × avg + max, where size is the character width to convert, avg is the average character width of the primary font for the element for which the algorithm is being run, in pixels, and max is the maximum character width of that same font, also in pixels. (The element's 'letter-spacing' property does not affect the result.)

11.4.2 The bb element

@namespace url(http://www.w3.org/1999/xhtml);
bb:empty { binding: bb; }

When the bb binding applies to a bb element, the element is expected to render as an 'inline-block' box rendered as a button, about one line high, containing text derived from the element's type attribute in a user-agent-defined (and probably locale-specific) fashion.

11.4.3 The button element

@namespace url(http://www.w3.org/1999/xhtml);
button { binding: button; }

When the button binding applies to a button element, the element is expected to render as an 'inline-block' box rendered as a button whose contents are the contents of the element.

11.4.4 The datagrid element

This section will probably include details on how to render DATAGRID (including its pseudo-elements), drag-and-drop, etc., in a visual medium, in concert with CSS. Implementation experience is desired before this section is filled in.

11.4.5 The details element

@namespace url(http://www.w3.org/1999/xhtml);
details { binding: details; }

When the details binding applies to a details element, the element is expected to render as a 'block' box with its 'padding-left' property set to '40px'.
The element's shadow tree is expected to take a child element that matches the selector :bound-element > legend:first-child and place it in a first 'block' box container, and then take the remaining child nodes and place them in a later 'block' box container.

The first container is expected to contain at least one line box, and that line box is expected to contain a triangle widget, horizontally positioned within the left padding of the details element. That widget is expected to allow the user to request that the details be shown or hidden.

The later container is expected to have its 'overflow' property set to 'hidden'. When the details element has an open attribute, the later container is expected to have its 'height' set to 'auto'; when it does not, the later container is expected to have its 'height' set to 0.

11.4.6 The input element as a text entry widget

@namespace url(http://www.w3.org/1999/xhtml);
input { binding: input-textfield; }
input[type=password] { binding: input-password; }
/* later rules override this for other values of type="" */

When the input-textfield binding applies to an input element whose type attribute is in the Text, Search, Telephone, URL, or E-mail state, the element is expected to render as an 'inline-block' box rendered as a text field.

When the input-password binding applies to an input element whose type attribute is in the Password state, the element is expected to render as an 'inline-block' box rendered as a text field whose contents are obscured.

If an input element whose type attribute is in one of the above states has a size attribute, and parsing that attribute's value using the rules for parsing non-negative integers doesn't generate an error, then the user agent is expected to use the attribute as a presentational hint for the 'width' property on the element, with the value obtained from applying the converting a character width to pixels algorithm to the value of the attribute.
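Taken together with the converting a character width to pixels algorithm from 11.4.1, the width rules for text fields come down to a few lines: a parsable size attribute gives (size - 1) × avg + max, a missing size attribute behaves like size 20, and a size attribute that fails to parse yields no hint from these rules. This is a minimal sketch under stated assumptions: avg and max_width are font metrics supplied by the caller (measuring the primary font is out of scope), parse_non_negative_integer is a simplified reading of HTML's rules for parsing non-negative integers, and all function names are hypothetical.

```python
from typing import Optional

# Assumed to be HTML's "space characters": SPACE, TAB, LF, FF, CR.
SPACE_CHARACTERS = " \t\n\f\r"


def parse_non_negative_integer(value: str) -> Optional[int]:
    """Simplified reading of HTML's rules for parsing non-negative integers.

    Returns None where those rules return an error.
    """
    position = 0
    while position < len(value) and value[position] in SPACE_CHARACTERS:
        position += 1
    negative = position < len(value) and value[position] == "-"
    if negative:
        position += 1
    start = position
    while position < len(value) and "0" <= value[position] <= "9":
        position += 1
    if position == start:
        return None  # no digits: error
    number = int(value[start:position])
    if negative:
        number = -number
    return None if number < 0 else number


def char_width_to_pixels(size: int, avg: float, max_width: float) -> float:
    """(size - 1) * avg + max, with avg and max in pixels."""
    return (size - 1) * avg + max_width


def text_field_width(size_attr: Optional[str], avg: float, max_width: float) -> Optional[float]:
    """Width in pixels for a text-like input; None means no hint from these rules."""
    if size_attr is None:
        # No size attribute: act as if a UA rule set the width for 20 characters.
        return char_width_to_pixels(20, avg, max_width)
    size = parse_non_negative_integer(size_attr)
    if size is None:
        return None  # present but unparsable: the size rule does not apply
    return char_width_to_pixels(size, avg, max_width)
```

With avg = 7 and max = 12, for example, size="10" gives 9 × 7 + 12 = 75 pixels and a missing attribute gives 19 × 7 + 12 = 145 pixels.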
If an input element whose type attribute is in one of the above states does not have a size attribute, then the user agent is expected to act as if it had a user-agent-level style sheet rule setting the 'width' property on the element to the value obtained from applying the converting a character width to pixels algorithm to the number 20.

11.4.7 The input element as domain-specific widgets

@namespace url(http://www.w3.org/1999/xhtml);
input[type=datetime] { binding: input-datetime; }
input[type=date] { binding: input-date; }
input[type=month] { binding: input-month; }
input[type=week] { binding: input-week; }
input[type=time] { binding: input-time; }
input[type=datetime-local] { binding: input-datetime-local; }
input[type=number] { binding: input-number; }

When the input-datetime binding applies to an input element whose type attribute is in the Date and Time state, the element is expected to render as an 'inline-block' box depicting a Date and Time control.

When the input-date binding applies to an input element whose type attribute is in the Date state, the element is expected to render as an 'inline-block' box depicting a Date control.

When the input-month binding applies to an input element whose type attribute is in the Month state, the element is expected to render as an 'inline-block' box depicting a Month control.

When the input-week binding applies to an input element whose type attribute is in the Week state, the element is expected to render as an 'inline-block' box depicting a Week control.

When the input-time binding applies to an input element whose type attribute is in the Time state, the element is expected to render as an 'inline-block' box depicting a Time control.

When the input-datetime-local binding applies to an input element whose type attribute is in the Local Date and Time state, the element is expected to render as an 'inline-block' box depicting a Local Date and Time control.
When the input-number binding applies to an input element whose type attribute is in the Number state, the element is expected to render as an 'inline-block' box depicting a Number control.

These controls are all expected to be about one line high, and about as wide as necessary to show the widest possible value.

11.4.8 The input element as a range control

@namespace url(http://www.w3.org/1999/xhtml);
input[type=range] { binding: input-range; }

When the input-range binding applies to an input element whose type attribute is in the Range state, the element is expected to render as an 'inline-block' box depicting a slider control.

When the control is wider than it is tall (or square), the control is expected to be a horizontal slider, with the lowest value on the right if the 'direction' property on this element has a computed value of 'rtl', and on the left otherwise. When the control is taller than it is wide, it is expected to be a vertical slider, with the lowest value on the bottom. Predefined suggested values (provided by the list attribute) are expected to be shown as tick marks on the slider, which the slider can snap to.

11.4.9 The input element as a color well

@namespace url(http://www.w3.org/1999/xhtml);
input[type=color] { binding: input-color; }

When the input-color binding applies to an input element whose type attribute is in the Color state, the element is expected to render as an 'inline-block' box depicting a color well, which, when activated, provides the user with a color picker (e.g. a color wheel or color palette) from which the color can be changed. Predefined suggested values (provided by the list attribute) are expected to be shown in the color picker interface, not on the color well itself.
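The orientation rules for the input-range binding (horizontal when wider than tall or square, lowest value on the right under 'rtl', lowest value at the bottom when vertical) can be illustrated by computing where a thumb would sit. This is only a sketch: it assumes a linear mapping of the value onto the full length of the control and ignores thumb size and tick marks, neither of which the text specifies, and the caller supplies the control's minimum, maximum, and value. The function name is hypothetical.

```python
from typing import Tuple


def slider_thumb_position(value: float, minimum: float, maximum: float,
                          width: float, height: float, rtl: bool) -> Tuple[str, float]:
    """Orientation and thumb offset for a range control.

    The offset is measured from the left edge (horizontal) or the top edge
    (vertical). A linear mapping along the full length is assumed.
    """
    if maximum > minimum:
        clamped = min(max(value, minimum), maximum)
        fraction = (clamped - minimum) / (maximum - minimum)
    else:
        fraction = 0.0  # degenerate range: park the thumb at the lowest value
    if width >= height:
        # Wider than tall, or square: horizontal slider. The lowest value
        # sits on the right when 'direction' is 'rtl', on the left otherwise.
        offset = (1.0 - fraction) * width if rtl else fraction * width
        return "horizontal", offset
    # Taller than wide: vertical slider with the lowest value at the bottom.
    return "vertical", (1.0 - fraction) * height
```

For a 200 by 20 control with range 0 to 100 and value 25, the thumb sits 50 pixels from the left edge in 'ltr' and 150 pixels from it in 'rtl'.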
11.4.10 The input element as check box and radio button widgets

@namespace url(http://www.w3.org/1999/xhtml);
input[type=checkbox] { binding: input-checkbox; }
input[type=radio] { binding: input-radio; }

When the input-checkbox binding applies to an input element whose type attribute is in the Checkbox state, the element is expected to render as an 'inline-block' box containing a single check box control, with no label.

When the input-radio binding applies to an input element whose type attribute is in the Radio Button state, the element is expected to render as an 'inline-block' box containing a single radio button control, with no label.

11.4.11 The input element as a file upload control

@namespace url(http://www.w3.org/1999/xhtml);
input[type=file] { binding: input-file; }

When the input-file binding applies to an input element whose type attribute is in the File Upload state, the element is expected to render as an 'inline-block' box containing a span of text giving the filename(s) of the selected files, if any, followed by a button that, when activated, provides the user with a file picker from which the selection can be changed.

11.4.12 The input element as a button

@namespace url(http://www.w3.org/1999/xhtml);
input[type=submit], input[type=reset], input[type=button] { binding: input-button; }

When the input-button binding applies to an input element whose type attribute is in the Submit Button, Reset Button, or Button state, the element is expected to render as an 'inline-block' box rendered as a button, about one line high, containing the contents of the element's value attribute, if any, or text derived from the element's type attribute in a user-agent-defined (and probably locale-specific) fashion, if not.

11.4.13 The marquee element

...(Waiting until I've specced the DOM side of this)...
11.4.14 The meter element

@namespace url(http://www.w3.org/1999/xhtml);
meter { binding: meter; }

When the meter binding applies to a meter element, the element is expected to render as an 'inline-block' box with a 'height' of '1em' and a 'width' of '5em', a 'vertical-align' of '-0.2em', and with its contents depicting a gauge.

When the element is wider than it is tall (or square), the depiction is expected to be of a horizontal gauge, with the minimum value on the right if the 'direction' property on this element has a computed value of 'rtl', and on the left otherwise. When the element is taller than it is wide, it is expected to depict a vertical gauge, with the minimum value on the bottom.

User agents are expected to use a presentation consistent with platform conventions for gauges, if any.

Requirements for what must be depicted in the gauge are included in the definition of the meter element.

11.4.15 The progress element

@namespace url(http://www.w3.org/1999/xhtml);
progress { binding: progress; }

When the progress binding applies to a progress element, the element is expected to render as an 'inline-block' box with a 'height' of '1em' and a 'width' of '10em', a 'vertical-align' of '-0.2em', and with its contents depicting a horizontal progress bar, with the start on the right and the end on the left if the 'direction' property on this element has a computed value of 'rtl', and with the start on the left and the end on the right otherwise.

User agents are expected to use a presentation consistent with platform conventions for progress bars. In particular, user agents are expected to use different presentations for determinate and indeterminate progress bars. User agents are also expected to vary the presentation based on the dimensions of the element.
For example, on some platforms for showing indeterminate progress there is an asynchronous progress indicator with square dimensions, which could be used when the element is square, and an indeterminate progress bar, which could be used when the element is wide.

Requirements for how to determine if the progress bar is determinate or indeterminate, and what progress a determinate progress bar is to show, are included in the definition of the progress element.

11.4.16 The select element

@namespace url(http://www.w3.org/1999/xhtml);
select { binding: select; }

When the select binding applies to a select element whose multiple attribute is present, the element is expected to render as a multi-select list box.

When the select binding applies to a select element whose multiple attribute is absent, and the element's size attribute specifies a value greater than 1, the element is expected to render as a single-select list box.

When the element renders as a list box, it is expected to render as an 'inline-block' box whose 'height' is the height necessary to contain as many rows for items as specified by the element's size attribute, or four rows if the attribute is absent, and whose 'width' is the width of the select's labels plus the width of a scrollbar.

When the select binding applies to a select element whose multiple attribute is absent, and the element's size attribute is absent, specifies no value (an error), or specifies a value less than or equal to 1, the element is expected to render as a one-line drop-down box whose width is the width of the select's labels.

In either case (list box or drop-down box), the element's items are expected to be the element's list of options, with the element's optgroup element children providing headers for groups of options where applicable.
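As a rough illustration, the choice between the three renderings above, and the row count used for list boxes, can be sketched as a small Python function. It assumes the size attribute has already been parsed to an integer (None when the attribute is absent or has no valid value). The SelectRendering class and select_rendering function are hypothetical names, and treating size="0" like an absent attribute is an assumption that the text above does not make.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SelectRendering:
    kind: str            # "multi-select list box", "single-select list box" or "drop-down box"
    rows: Optional[int]  # visible rows for a list box; None for a drop-down box


def select_rendering(multiple: bool, size: Optional[int]) -> SelectRendering:
    """Choose how a select element renders, following the rules above.

    `size` is assumed to be the size attribute already parsed to an integer,
    or None when the attribute is absent or has no valid value.
    """
    if multiple:
        kind = "multi-select list box"
    elif size is not None and size > 1:
        kind = "single-select list box"
    else:
        # multiple is absent and size is absent, invalid, or <= 1.
        return SelectRendering("drop-down box", None)
    # A list box shows as many rows as size specifies, or four if size is absent.
    # Assumption (not stated in the text): a size of 0 is treated like an absent attribute.
    rows = size if size is not None and size > 0 else 4
    return SelectRendering(kind, rows)


print(select_rendering(True, None))  # multi-select list box with 4 rows
print(select_rendering(False, 6))    # single-select list box with 6 rows
print(select_rendering(False, 1))    # drop-down box
```

Note that the multiple attribute is checked first: a select with multiple present always renders as a list box, whatever its size.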
The width of the select's labels is the wider of the width necessary to render the widest optgroup, and the width necessary to render the widest option element in the element's list of options (including its indent, if any).

An optgroup element is expected to be rendered by displaying the element's label attribute.

An option element is expected to be rendered by displaying the element's label, indented under its optgroup element if it has one.

11.4.17 The textarea element

@namespace url(http://www.w3.org/1999/xhtml);
textarea { binding: textarea; }

When the textarea binding applies to a textarea element, the element is expected to render as an 'inline-block' box rendered as a multiline text field.

If the element has a cols attribute, and parsing that attribute's value using the rules for parsing non-negative integers doesn't generate an error, then the user agent is expected to use the attribute as a presentational hint for the 'width' property on the element, with the value obtained from applying the "converting a character width to pixels" algorithm to the value of the attribute and then adding the width of a scrollbar.

If the element has a rows attribute, and parsing that attribute's value using the rules for parsing non-negative integers doesn't generate an error, then the user agent is expected to use the attribute as a presentational hint for the 'height' property on the element, with the value being the specified number of lines, plus the height of a scrollbar.

For historical reasons, if the element has a wrap attribute whose value is an ASCII case-insensitive match for the string "off", then the user agent is expected not to wrap the rendered value; otherwise, the value of the control is expected to be wrapped to the width of the control.
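The textarea hints can be sketched in Python as below. This is a minimal illustration under stated assumptions: cols and rows arrive already parsed with the rules for parsing non-negative integers (None when absent or when parsing failed), and the character-width conversion, line height and scrollbar size are passed in as hypothetical parameters, because those metrics come from the platform rather than from this section. The wrap comparison folds only A to Z, which is what "ASCII case-insensitive" requires.

```python
import string
from typing import Callable, Dict, Optional, Union

# Fold only A-Z to a-z ("ASCII case-insensitive"); str.lower() would also
# fold non-ASCII characters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def textarea_hints(
    cols: Optional[int],
    rows: Optional[int],
    wrap: Optional[str],
    char_width_to_px: Callable[[int], float],
    line_height_px: float,
    scrollbar_px: float,
) -> Dict[str, Union[float, bool]]:
    """Compute the textarea presentational hints described above.

    char_width_to_px, line_height_px and scrollbar_px are hypothetical
    stand-ins for platform metrics; they are not defined by this section.
    """
    hints: Dict[str, Union[float, bool]] = {}
    if cols is not None:
        # 'width': converted character width plus the width of a scrollbar.
        hints["width"] = char_width_to_px(cols) + scrollbar_px
    if rows is not None:
        # 'height': the given number of lines plus the height of a scrollbar.
        hints["height"] = rows * line_height_px + scrollbar_px
    # Only wrap="off" (in any ASCII case) disables wrapping.
    hints["wraps"] = wrap is None or wrap.translate(_ASCII_LOWER) != "off"
    return hints


# Assume 8px per character, 16px lines and a 15px scrollbar.
print(textarea_hints(40, 5, "OFF", lambda n: 8 * n, 16, 15))
# {'width': 335, 'height': 95, 'wraps': False}
```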
11.4.18 The keygen element

@namespace url(http://www.w3.org/1999/xhtml);
keygen { binding: keygen; }

When the keygen binding applies to a keygen element, the element is expected to render as an 'inline-block' box containing a user interface to configure the key pair to be generated.

11.4.19 The time element

@namespace url(http://www.w3.org/1999/xhtml);
time:empty { binding: time; }

When the time binding applies to a time element, the element is expected to render as if it contained text conveying the date (if known), time (if known), and time zone (if known) represented by the element, in the fashion most convenient for the user.

11.5 Frames and framesets

When an html element's second child element is a frameset element, the user agent is expected to render the frameset element as described below across the surface of the view, instead of applying the usual CSS rendering rules.

When rendering a frameset on a surface, the user agent is expected to use the following layout algorithm:

The cols and rows variables are lists of zero or more pairs consisting of a number and a unit, the unit being one of percentage, relative, and absolute.

Use the rules for parsing a list of dimensions to parse the value of the element's cols attribute, if there is one. Let cols be the result, or an empty list if there is no such attribute.

Use the rules for parsing a list of dimensions to parse the value of the element's rows attribute, if there is one. Let rows be the result, or an empty list if there is no such attribute.

For any of the entries in cols or rows that have the number zero and the unit relative, change the entry's number to one.

If cols has no entries, then add a single entry consisting of the value 1 and the unit relative to cols.

If rows has no entries, then add a single entry consisting of the value 1 and the unit relative to rows.
Invoke the algorithm defined below to convert a list of dimensions to a list of pixel values using cols as the input list, and the width of the surface that the frameset is being rendered into, in CSS pixels, as the input dimension. Let sized cols be the resulting list.

Invoke the algorithm defined below to convert a list of dimensions to a list of pixel values using rows as the input list, and the height of the surface that the frameset is being rendered into, in CSS pixels, as the input dimension. Let sized rows be the resulting list.

Split the surface into a grid of w × h rectangles, where w is the number of entries in sized cols and h is the number of entries in sized rows.

Size the columns so that each column in the grid is as many CSS pixels wide as the corresponding entry in the sized cols list.

Size the rows so that each row in the grid is as many CSS pixels high as the corresponding entry in the sized rows list.

Let children be the list of frame and frameset elements that are children of the frameset element for which the algorithm was invoked.

For each row of the grid of rectangles created above, from top to bottom, run these substeps:

    For each rectangle in the row, from left to right, run these substeps:

        If there are any elements left in children, take the first element in the list, and assign it to the rectangle.

            If this is a frameset element, then recursively apply the entire frameset layout algorithm to that frameset element, with the rectangle as the surface.

            Otherwise, it is a frame element; create a nested browsing context sized to fit the rectangle.

        If there are any elements left in children, remove the first element from children.

If the frameset element has a border, draw an outer set of borders around the rectangles, using the element's frame border color.
For each rectangle, if there is an element assigned to that rectangle, and that element has a border, draw an inner set of borders around that rectangle, using the element's frame border color.

For each (visible) border that does not abut a rectangle that is assigned a frame element with a noresize attribute (including rectangles in further nested frameset elements), the user agent is expected to allow the user to move the border, resizing the rectangles within, keeping the proportions of any nested frameset grids.

A frameset or frame element has a border if the following algorithm returns true:

If the element has a frameborder attribute whose value is not the empty string and whose first character is either a U+0031 DIGIT ONE (1), a U+0079 LATIN SMALL LETTER Y, or a U+0059 LATIN CAPITAL LETTER Y, then return true.

Otherwise, if the element has a frameborder attribute, return false.

Otherwise, if the element has a parent element that is a frameset element, then return true if that element has a border, and false if it does not.

Otherwise, return true.

The frame border color of a frameset or frame element is the color obtained from the following algorithm:

If the element has a bordercolor attribute, and applying the rules for parsing a legacy color value to that attribute's value does not result in an error, then return the color so obtained.

Otherwise, if the element has a parent element that is a frameset element, then return the frame border color of that element.

Otherwise, return gray.

The algorithm to convert a list of dimensions to a list of pixel values consists of the following steps:

Let input list be the list of numbers and units passed to the algorithm.

Let output list be a list of numbers the same length as input list, all zero. Entries in output list correspond to the entries in input list that have the same position.

Let input dimension be the size passed to the algorithm.
Let count percentage be the number of entries in input list whose unit is percentage.

Let total percentage be the sum of all the numbers in input list whose unit is percentage.

Let count relative be the number of entries in input list whose unit is relative.

Let total relative be the sum of all the numbers in input list whose unit is relative.

Let count absolute be the number of entries in input list whose unit is absolute.

Let total absolute be the sum of all the numbers in input list whose unit is absolute.

Let remaining space be the value of input dimension.

If total absolute is greater than remaining space, then for each entry in input list whose unit is absolute, set the corresponding value in output list to the number of the entry in input list multiplied by remaining space and divided by total absolute. Then, set remaining space to zero.

Otherwise, for each entry in input list whose unit is absolute, set the corresponding value in output list to the number of the entry in input list. Then, decrement remaining space by total absolute.

If total percentage multiplied by the input dimension and divided by 100 is greater than remaining space, then for each entry in input list whose unit is percentage, set the corresponding value in output list to the number of the entry in input list multiplied by remaining space and divided by total percentage. Then, set remaining space to zero.

Otherwise, for each entry in input list whose unit is percentage, set the corresponding value in output list to the number of the entry in input list multiplied by the input dimension and divided by 100. Then, decrement remaining space by total percentage multiplied by the input dimension and divided by 100.

For each entry in input list whose unit is relative, set the corresponding value in output list to the number of the entry in input list multiplied by remaining space and divided by total relative.

Return output list.
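The cols/rows clean-up steps from the frameset layout algorithm and the conversion algorithm just given can be sketched in Python as follows. The sketch assumes the attribute values have already been turned into (number, unit) pairs by the rules for parsing a list of dimensions, which are not reproduced here, and it works in floating-point pixels, so the integer-remainder rule for user agents without subpixel layout is not modelled. The names normalize and to_pixels are hypothetical; the count variables of the algorithm are omitted because no later step uses them.

```python
from typing import List, Tuple

PERCENTAGE, RELATIVE, ABSOLUTE = "percentage", "relative", "absolute"
Dimension = Tuple[float, str]  # (number, unit)


def normalize(dimensions: List[Dimension]) -> List[Dimension]:
    """Layout-algorithm clean-up: relative entries of zero become one,
    and an empty list becomes a single entry of 1 relative."""
    fixed = [(1, RELATIVE) if (n == 0 and u == RELATIVE) else (n, u) for n, u in dimensions]
    return fixed or [(1, RELATIVE)]


def to_pixels(input_list: List[Dimension], input_dimension: float) -> List[float]:
    """Convert a list of dimensions to a list of pixel values.

    Assumes normalize() has been applied, so total relative is non-zero
    whenever a relative entry exists.
    """
    output: List[float] = [0.0] * len(input_list)
    total = {unit: sum(n for n, u in input_list if u == unit)
             for unit in (PERCENTAGE, RELATIVE, ABSOLUTE)}
    remaining = float(input_dimension)

    # Absolute entries first; if they overflow, shrink them proportionally.
    absolute_overflows = total[ABSOLUTE] > remaining
    for i, (n, u) in enumerate(input_list):
        if u == ABSOLUTE:
            output[i] = n * remaining / total[ABSOLUTE] if absolute_overflows else n
    remaining = 0.0 if absolute_overflows else remaining - total[ABSOLUTE]

    # Percentages are of the whole input dimension, but are squeezed into
    # the remaining space if they do not fit.
    wanted = total[PERCENTAGE] * input_dimension / 100
    percentage_overflows = wanted > remaining
    for i, (n, u) in enumerate(input_list):
        if u == PERCENTAGE:
            output[i] = (n * remaining / total[PERCENTAGE] if percentage_overflows
                         else n * input_dimension / 100)
    remaining = 0.0 if percentage_overflows else remaining - wanted

    # Relative entries share whatever is left, in proportion to their numbers.
    for i, (n, u) in enumerate(input_list):
        if u == RELATIVE:
            output[i] = n * remaining / total[RELATIVE]
    return output


cols = normalize([(100, ABSOLUTE), (25, PERCENTAGE), (1, RELATIVE), (3, RELATIVE)])
print(to_pixels(cols, 800))  # [100, 200.0, 125.0, 375.0]
```

Those four pairs correspond to a cols value such as "100,25%,1*,3*": on an 800-pixel-wide surface the absolute column gets 100 pixels, the percentage column 200, and the two relative columns split the remaining 500 pixels in a 1:3 ratio.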
User agents working with integer values for frame widths (as opposed to user agents that can lay frames out with subpixel accuracy) are expected to distribute the remainder first to the last entry whose unit is relative, then equally (not proportionally) to each entry whose unit is percentage, then equally (not proportionally) to each entry whose unit is absolute, and finally, failing all else, to the last entry.

11.6 Interactive media

11.6.1 Links, forms, and navigation

User agents are expected to allow the user to control aspects of hyperlink activation and form submission, such as which browsing context is to be used for the subsequent navigation.

User agents are expected to allow users to discover the destination of hyperlinks and of forms before triggering their navigation.

User agents are expected to inform the user of whether a hyperlink includes hyperlink auditing, and to let them know at a minimum which domains will be contacted as part of such auditing.

User agents are expected to allow users to navigate browsing contexts to the resources indicated by the cite attributes on q, blockquote, section, article, ins, and del elements.

User agents are expected to surface hyperlinks created by link elements in their user interface.

While link elements that create hyperlinks will match the ':link' or ':visited' pseudo-classes, will react to clicks if visible, and so forth, this does not extend to any browser interface constructs that expose those same links. Activating a link through the browser's interface, rather than in the page itself, does not trigger click events and the like.

11.6.2 The mark element

User agents are expected to allow the user to cycle through all the mark elements in a Document. User agents are also expected to bring their existence to the user's attention, even when they are off-screen, e.g. by highlighting portions of the scrollbar that represent portions of the document that contain mark elements.
11.6.3 The title attribute

Given an element (e.g. the element designated by the mouse cursor), if the element, or one of its ancestors, has a title attribute, and the nearest such attribute has a value that is not the empty string, the user agent is expected to expose the contents of that attribute as a tooltip. U+000A LINE FEED (LF) characters are expected to cause line breaks in the tooltip.

11.7 Print media

User agents are expected to allow the user to request the opportunity to obtain a physical form (or a representation of a physical form) of a Document, for example by selecting the option to print a page or convert it to PDF format.

When the user actually obtains a physical form (or a representation of a physical form) of a Document, the user agent is expected to create a new view with the print media, render the result, and then discard the view.

11.8 Interaction with CSS

Must define that in CSS, tag and attribute names in HTML documents, and class names in quirks mode documents, are case-insensitive, as well as saying which attribute values must be compared case-insensitively.

12 Obsolete features

Authors and documents must not use the features listed in this section. They are documented to enable user agents to support legacy content in an interoperable fashion.

12.1 Self-contained features

12.1.1 The applet element

The applet element is a Java-specific variant of the embed element. In HTML5 the applet element is obsoleted so that all extension frameworks (Java, .NET, Flash, etc.) are handled in a consistent manner.

When the sandboxed plugins browsing context flag is set on the browsing context for which the applet element's document is the active document, and when the element has an ancestor object element that is not showing its fallback content, the element must be ignored (it represents nothing).

Otherwise, define how the element works, if supported.
[XXX] interface HTMLDocument {
  readonly attribute HTMLCollection applets;
};

The applets attribute must return an HTMLCollection rooted at the Document node, whose filter matches only applet elements.

12.1.2 The marquee element

...

12.2 Other elements and attributes

The following elements are obsolete and either have no meaning whatsoever or have no requirements beyond those described elsewhere in this specification:

center

The following attributes are obsolete and either have no meaning whatsoever or have no requirements beyond those described elsewhere in this specification:

name on a elements
alink on body elements
background on body elements
bgcolor on body elements
link on body elements
text on body elements
vlink on body elements

12.3 Other DOM APIs

These APIs expose obsolete content attributes. The [XXX] below is for some annotation meaning "this is just another part of the named interface, and should be treated as if it had been part of the main interface definition".

[XXX] interface HTMLBodyElement {
  attribute DOMString text;
  attribute DOMString bgColor;
  attribute DOMString background;
  attribute DOMString link;
  attribute DOMString aLink;
  attribute DOMString vLink;
};

The text DOM attribute of the body element must reflect the element's text content attribute.

The bgColor DOM attribute of the body element must reflect the element's bgcolor content attribute.

The background DOM attribute of the body element must reflect the element's background content attribute. (The background content attribute is not defined to contain a URL, despite rules regarding its handling in the rendering section above.)

The link DOM attribute of the body element must reflect the element's link content attribute.

The aLink DOM attribute of the body element must reflect the element's alink content attribute.

The vLink DOM attribute of the body element must reflect the element's vlink content attribute.
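To make "reflect" concrete, here is a minimal Python sketch of the body element's legacy attributes. It assumes the usual reflection rules for DOMString attributes that are not defined to contain a URL: getting returns the content attribute's value, or the empty string if it is absent, and setting writes the content attribute. The BodyElement class and reflect helper are hypothetical stand-ins, not a real DOM API. Because background is not a URL attribute, it reflects as a plain string with no URL resolution.

```python
from typing import Dict, Optional


def reflect(content_attribute: str) -> property:
    """A DOMString DOM attribute reflecting a non-URL content attribute:
    getting returns the attribute's value, or "" if it is absent;
    setting stores the new value (as a string) in the content attribute."""

    def getter(self) -> str:
        return self.attributes.get(content_attribute, "")

    def setter(self, value: str) -> None:
        self.attributes[content_attribute] = str(value)

    return property(getter, setter)


class BodyElement:
    """Hypothetical stand-in for a body element: only its content attributes
    are modelled, as a dict keyed by lowercase attribute name."""

    text = reflect("text")
    bgColor = reflect("bgcolor")
    background = reflect("background")  # plain string, not resolved as a URL
    link = reflect("link")
    aLink = reflect("alink")
    vLink = reflect("vlink")

    def __init__(self, attributes: Optional[Dict[str, str]] = None) -> None:
        self.attributes: Dict[str, str] = dict(attributes or {})


body = BodyElement({"bgcolor": "white"})
print(body.bgColor)      # white
print(repr(body.vLink))  # '' (no vlink attribute yet)
body.vLink = "purple"
print(body.attributes)   # {'bgcolor': 'white', 'vlink': 'purple'}
```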
[XXX] interface HTMLDocument {
  attribute DOMString fgColor;
  attribute DOMString bgColor;
  attribute DOMString linkColor;
  attribute DOMString vLinkColor;
  attribute DOMString aLinkColor;
};

The fgColor attribute on the Document object must reflect the text attribute on the body element.

The bgColor attribute on the Document object must reflect the bgcolor attribute on the body element.

The linkColor attribute on the Document object must reflect the link attribute on the body element.

The vLinkColor attribute on the Document object must reflect the vlink attribute on the body element.

The aLinkColor attribute on the Document object must reflect the alink attribute on the body element.

For the above attributes, when there is no body element, the attributes must instead return the empty string on getting and do nothing on setting.

12.4 Conformance checkers

To ease the transition from HTML4 Transitional documents to the language defined in this specification, conformance checkers are encouraged to categorize errors that represent usage of obsolete features that generally have no effect (as defined below) into a separate part of their report, to allow authors to distinguish between likely mistakes and mere vestigial markup.
The following errors may be categorized as described above:

The DOCTYPE parse error, if the DOCTYPE token's name is an ASCII case-insensitive match for the string "HTML", and one of the following holds:
    the token's public identifier is the case-sensitive string "-//W3C//DTD HTML 4.0//EN" and the token's system identifier is either missing or the case-sensitive string "http://www.w3.org/TR/REC-html40/strict.dtd", or
    the token's public identifier is the case-sensitive string "-//W3C//DTD HTML 4.01//EN" and the token's system identifier is either missing or the case-sensitive string "http://www.w3.org/TR/html4/strict.dtd", or
    the token's public identifier is the case-sensitive string "-//W3C//DTD XHTML 1.0 Strict//EN" and the token's system identifier is the case-sensitive string "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd", or
    the token's public identifier is the case-sensitive string "-//W3C//DTD XHTML 1.1//EN" and the token's system identifier is the case-sensitive string "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd".

The presence of a profile attribute on the head element, if its value is an unordered set of unique space-separated tokens where the words are all valid URLs.

The presence of a meta element with an http-equiv attribute in the Content Language state.

The presence of a border attribute on an img element if its value is the string "0".

The presence of a longdesc attribute on an img element, if its value is a valid URL.

The presence of a language attribute on a script element if its value is an ASCII case-insensitive match for the string "JavaScript".

The presence of a name attribute on an a element, if its value is not the empty string.

The presence of a summary attribute on a table element.

The presence of an abbr attribute on a td or th element.

13 Things that you can't do with this specification because they are better handled using other technologies that are further described herein

This section is non-normative.
There are certain features that are not handled by this specification because a client-side markup language is not the right level for them, or because the features exist in other languages that can be integrated into this one. This section covers some of the more common requests.

13.1 Localization

If you wish to create localized versions of an HTML application, the best solution is to preprocess the files on the server, and then use HTTP content negotiation to serve the appropriate language.

13.2 Declarative 3D scenes

Embedding 3D imagery into XHTML documents is the domain of X3D, or technologies based on X3D that are namespace-aware.

13.3 Rendering and the DOM

This section is expected to be moved to its own specification in due course. It needs a lot of work to actually make it into a semi-decent spec.

Any object implementing the AbstractView interface must also implement the MediaModeAbstractView interface.

interface MediaModeAbstractView {
  readonly attribute DOMString mediaMode;
};

The mediaMode attribute on objects implementing the MediaModeAbstractView interface must return the string that represents the canvas' current rendering mode (screen, print, etc.). This is a lowercase string, as defined by the CSS specification. [CSS21]

Some user agents may support multiple media, in which case there will exist multiple objects implementing the AbstractView interface. Only the default view implements the Window interface. The other views can be reached using the view attribute of the UIEvent interface, during event propagation. There is currently no way to enumerate all the views.

Index

This section is non-normative.

List of elements
List of attributes
List of reflecting DOM attributes and their corresponding content attributes
List of interfaces
List of events

References

This section will be written in a future draft.
Acknowledgements Thanks to Aankhen, Aaron Boodman, Aaron Leventhal, Adam Barth, Adam Roben, Addison Phillips, Adele Peterson, Adrian Sutton, Agustín Fernández, Ajai Tirumali, Alan Plum, Alastair Campbell, Alex Nicolaou, Alexander J. Vincent, Alexey Feldgendler, Алексей Проскуряков (Alexey Proskuryakov), Alexis Deveria, Allan Clements, Anders Carlsson, Andreas, Andrew Clover, Andrew Gove, Andrew Sidwell, Andrew Smith, Andy Heydon, Andy Palay, Anne van Kesteren, Anthony Boyd, Anthony Bryan, Anthony Hickson, Anthony Ricaud, Antti Koivisto, Arphen Lin, Asbjørn Ulsberg, Ashley Sheridan, Aurelien Levy, Ave Wrigley, Ben Boyle, Ben Godfrey, Ben Meadowcroft, Ben Millard, Benjamin Hawkes-Lewis, Bert Bos, Bijan Parsia, Bill Mason, Bill McCoy, Billy Wong, Björn Höhrmann, Blake Frantz, Boris Zbarsky, Brad Fults, Brad Neuberg, Brady Eidson, Brendan Eich, Brenton Simpson, Brett Wilson, Brian Campbell, Brian Korver, Brian Ryner, Brian Smith, Brian Wilson, Bruce Lawson, Bruce Miller, C. Williams, Cameron McCormack, Cao Yipeng, Carlos Perelló Marín, Chao Cai, 윤석찬 (Channy Yun), Charl van Niekerk, Charles Iliya Krempeaux, Charles McCathieNevile, Chris Morris, Chris Pearce, Christian Biesinger, Christian Johansen, Christian Schmidt, Christopher Aillon, Chriswa, Cole Robison, Colin Fine, Collin Jackson, Corprew Reed, Craig Cockburn, Csaba Gabor, Daniel Barclay, Daniel Bratell, Daniel Brooks, Daniel Brumbaugh Keeney, Daniel Davis, Daniel Glazman, Daniel Peng, Daniel Schattenkirchner, Daniel Spång, Daniel Steinberg, Danny Sullivan, Darin Adler, Darin Fisher, Dave Camp, Dave Hodder, Dave Singer, Dave Townsend, David Baron, David Bloom, David Carlisle, David E. Cleary, David Flanagan, David Håsäther, David Hyatt, David Matja, David Smith, David Woolley, DeWitt Clinton, Dean Edridge, Dean Edwards, Debi Orton, Derek Featherstone, Dimitri Glazkov, dolphinling, Doron Rosenberg, Doug Kramer, Drew Wilson, Edmund Lai, Edward O'Connor, Edward Welbourne, Edward Z. 
Yang, Eira Monstad, Elliotte Harold, Eric Carlson, Eric Law, Eric Rescorla, Erik Arvidsson, Evan Martin, Evan Prodromou, fantasai, Felix Sasaki, Francesco Schwarz, Franck 'Shift' Quélain, Garrett Smith, Geoffrey Garen, Geoffrey Sneddon, George Lund, Greg Botten, Greg Houston, Grey, Gytis Jakutonis, Håkon Wium Lie, Hallvord Reiar Michaelsen Steen, Hans S. Tømmerhalt, Henri Sivonen, Henrik Lied, Henry Mason, Hugh Winkler, Ian Bicking, Ian Davis, Ignacio Javier, Ivan Enderlin, Ivo Emanuel Gonçalves, J. King, Jacques Distler, James Craig, James Graham, James Justin Harrell, James M Snell, James Perrett, Jan-Klaas Kollhof, Jason Kersey, Jason Lustig, Jason White, Jasper Bryant-Greene, Jed Hartman, Jeff Cutsinger, Jeff Schiller, Jeff Walden, Jens Bannmann, Jens Fendler, Jens Lindström, Jens Meiert, Jeroen van der Meer, Jim Jewett, Jim Ley, Jim Meehan, Jjgod Jiang, Joe Clark, Joe Gregorio, Joel Spolsky, Johan Herland, John Boyer, John Bussjaeger, John Fallows, John Harding, John Keiser, John-Mark Bell, Johnny Stenback, Jon Ferraiolo, Jon Gibbins, Jon Perlow, Jonas Sicking, Jonathan Worent, Jonny Axelsson, Jorgen Horstink, Jorunn Danielsen Newth, Joseph Kesselman, Josh Aas, Josh Levenberg, Joshua Randall, Jukka K. Korpela, Jules Clément-Ripoche, Julian Reschke, Justin Sinclair, Kai Hendry, Kartikaya Gupta, Kristof Zelechovski, 黒澤剛志 (KUROSAWA Takeshi), Kyle Hofmann, Léonard Bouchet, Lachlan Hunt, Larry Masinter, Larry Page, Lars Gunther, Lars Solberg, Laura L. 
Carlson, Laura Wisewell, Laurens Holst, Lee Kowalkowski, Leif Halvard Silli, Lenny Domnitser, Leons Petrazickis, Logan, Loune, Maciej Stachowiak, Magnus Kristiansen, Maik Merten, Malcolm Rowe, Mark Birbeck, Mark Miller, Mark Nottingham, Mark Rowe, Mark Schenk, Mark Wilton-Jones, Martijn Wargers, Martin Atkins, Martin Dürst, Martin Honnen, Martin Kutschker, Masataka Yakura, Mathieu Henri, Matt Wright, Matthew Gregan, Matthew Mastracci, Matthew Raymond, Matthew Thomas, Mattias Waldau, Max Romantschuk, Menno van Slooten, Micah Dubinko, Michael 'Ratt' Iannarelli, Michael A. Nachbaur, Michael A. Puls II, Michael Carter, Michael Daskalov, Michael Enright, Michael Gratton, Michael Nordman, Michael Powers, Michael(tm) Smith, Michel Fortin, Michiel van der Blonk, Mihai Şucan, Mike Brown, Mike Dierken, Mike Dixon, Mike Schinkel, Mike Shaver, Mikko Rantalainen, Mohamed Zergaoui, Neil Deakin, Neil Rashbrook, Neil Soiffer, Nicholas Shanks, Nicolas Gallagher, Ojan Vafai, Olaf Hoffmann, Olav Junker Kjær, Oliver Hunt, Olli Pettay, Patrick H. Lauke, Paul Norman, Peter Karlsson, Peter Kasting, Peter Stark, Peter-Paul Koch, Philip Jägenstedt, Philip Taylor, Philip TAYLOR, Rachid Finge, Rajas Moonka, Ralf Stoltze, Ralph Giles, Raphael Champeimont, Rene Saarsoo, Rene Stach, Rich Doughty, Richard Ishida, Rigo Wenning, Rikkert Koppes, Rimantas Liubertas, Robert Blaut, Robert O'Callahan, Robert Sayre, Roman Ivanov, Ryan King, S. 
Mike Dierken, Sam Kuper, Sam Ruby, Sam Weinig, Sander van Lambalgen, Scott Hess, Sean Fraser, Sean Hogen, Sean Knapp, Sebastian Schnitzenbaumer, Shanti Rao, Shaun Inman, Shiki Okasaka, Sierk Bornemann, Sigbjørn Vik, Silvia Pfeiffer, Simon Montagu, Simon Pieters, Stefan Haustein, Steffen Meschkat, Stephen Ma, Steve Faulkner, Steve Runyon, Steven Garrity, Stewart Brodie, Stuart Ballard, Stuart Parmenter, Subramanian Peruvemba, Sunava Dutta, Susan Borgrink, Susan Lesch, Tantek Çelik, Ted Mielczarek, Terrence Wood, Thomas Broyer, Thomas O'Connor, Tim Altman, Tim Johansson, Toby Inkster, Todd Moody, Tom Pike, Tommy Thorsen, Travis Leithead, Tyler Close, Vladimir Vukićević, voracity, Wakaba, Wayne Pollock, Wellington Fernando de Macedo, Will Levine, William Swanson, Wladimir Palant, Wolfram Kriesing, Yi-An Huang, Yngve Nysaeter Pettersen, Zhenbin Xu, and Øistein E. Andersen, for their useful comments, both large and small, that have led to changes to this specification over the years.

Thanks also to everyone who has ever posted about HTML5 to their blogs, public mailing lists, or forums, including the W3C public-html list and the various WHATWG lists.

Special thanks to Richard Williamson for creating the first implementation of canvas in Safari, from which the canvas feature was designed.

Special thanks also to the Microsoft employees who first implemented the event-based drag-and-drop mechanism, contenteditable, and other features first widely deployed by the Windows Internet Explorer browser.

Special thanks and $10,000 to David Hyatt who came up with a broken implementation of the adoption agency algorithm that the editor had to reverse-engineer and fix before using it in the parsing section.

Thanks to the many sources that provided inspiration for the examples used in the specification.
Thanks also to the Microsoft blogging community for some ideas, to the attendees of the W3C Workshop on Web Applications and Compound Documents for inspiration, to the #mrt crew, the #mrt.no crew, and the #whatwg crew, and to Pillar and Hedral for their ideas and support.