This page provides links to articles, papers and pages I have written.
Much more is available from the W3C Internationalization Activity site.
Tutorial that helps you understand key requirements for implementing writing systems in information technology. It does this by examining real examples of a wide range of modern scripts to discover features that a computerized implementation must support. It also makes special reference, where appropriate, to how the Unicode Standard points the way forward for meeting these requirements.
Lists translations of phrases relating to the W3C in various scripts for use as examples in documents and presentations. You will need appropriate fonts and rendering mechanisms to view them properly.
Pages illustrating the features of numerous non-Latin writing systems. These pages also allow you to experiment with various CSS3 styling techniques using dynamic HTML.
Paper delivered at the Unicode Conference in Sept 2002 and Mar 2003. This is also available as Unicode Technical Note #10. Most people will need to read the PDF version to see the Indic text.
In-progress draft of notes that list the symbols used to represent Bengali, describe their use, and relate them to appropriate characters for representation in Unicode. There is an index of shapes you can use to look up Bengali glyphs and track them down to their constituent Unicode codepoints.
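To give a feel for what tracking a glyph down to its constituent codepoints means, here is a minimal Python sketch. It is not taken from the notes themselves: the helper name `describe_codepoints` is made up for illustration, and the sample is the common Bengali conjunct that is displayed as one shape but stored as three characters.

```python
import unicodedata

def describe_codepoints(text):
    """Return a (U+XXXX, character name) pair for each code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# One visible conjunct, written with escapes so the example does not
# depend on having a Bengali font installed.
conjunct = "\u0995\u09CD\u09B7"

for code, name in describe_codepoints(conjunct):
    print(code, name)
```

Running this lists KA, the virama sign and SSA, which is the kind of breakdown the index of shapes leads you to.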
In-progress draft of notes that list the symbols used to represent Lao, describe their use, and relate them to appropriate characters for representation in Unicode.
In-progress draft of notes that list the symbols used to represent Khmer, describe their use, and relate them to appropriate characters for representation in Unicode.
In-progress draft of notes that list the symbols used to represent Myanmar, describe their use, and relate them to appropriate characters for representation in Unicode.
In-progress draft of notes that list the symbols used to represent Urdu with the Arabic script, describe their use, and relate them to appropriate characters for representation in Unicode. I still need to address which Unicode characters are most appropriate when there is a choice, but there is already a lot of information about the use of letters and symbols when writing Urdu. Most people will need to read the PDF version to see the nasta'liq font, but the HTML version points to the location of a free font.
Is it correct that simplified and traditional Chinese are not completely separate sets of code entries in Unicode? If so, are they simply like two different fonts for the same Unicode point? Would I have to have a simplified and a traditional font installed? One traditional character may correspond to several simplified ones, right?
People who want to point to pages in other languages or for other countries keep asking me where to find information about how to write a country or language name in the native language and script, so I thought I'd try to put a list together myself. This is a draft listing language names - corrections and additions are welcome!
Following the same rationale as the previous item, this draft lists names of countries in their own script. Note that I am currently working on the official list of countries as used by the UN, which is likely to supersede this list.
Describes a hack that allows you to do language negotiation across files that are not necessarily in the same directory. It is a method described by Dominique Hazaël-Massieux with a couple of refinements I added relating to handling default files and language extensions appearing before the .html extension. Note: I’m not convinced that it’s a good idea.
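For readers unfamiliar with language negotiation, the general idea can be sketched in Python as follows. This is not the hack described in the article, which is not reproduced here. It is only an illustration under assumptions: the function names are hypothetical, files are assumed to be named like `page.fr.html` (language extension before `.html`), and the default language is assumed to be English.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into language tags, best first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a tag without a q-value has quality 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed q-value as "not acceptable"
        tag = tag.strip().lower()
        if tag and quality > 0:
            # Sort by descending quality, then by order of appearance.
            prefs.append((-quality, position, tag))
    return [tag for _, _, tag in sorted(prefs)]


def choose_variant(basename, available, header, default="en"):
    """Pick a file such as 'page.fr.html' for the best matching language."""
    for tag in parse_accept_language(header):
        # Try the full tag first, then its primary subtag (fr-ca -> fr).
        for candidate in (tag, tag.split("-")[0]):
            if candidate in available:
                return f"{basename}.{candidate}.html"
    # Nothing matched: fall back to the default file.
    return f"{basename}.{default}.html"


print(choose_variant("index", {"en", "fr", "de"}, "fr-CA,fr;q=0.8,en;q=0.5"))
```

A real server would normally do this through its own content negotiation configuration rather than application code. The sketch only shows the matching and fallback logic.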
Designers must be very careful about how they split up and reuse text on-screen, since the linguistic differences between languages can lead to real headaches for localizers and may in some cases make a reasonable translation impossible to achieve. (Article in Multilingual Computing magazine).
Effective localization of XML documents begins with the development of an internationalized document structure. (Article in Multilingual Computing magazine).
Requirements to guide developers of DTDs or Schemas, or of an internationalised tag set or namespace that can be included in DTDs.
This paper refers to standard topics such as character encoding and language declarations, but also covers topics such as implementation of emphasis and style conventions, handling of citations, use of text in attribute values, and the need for an element like HTML's SPAN. In addition, other topics that have traditionally been associated with translation of user interface messages become applicable due to the nature of XML documents. These include the provision of designer's notes, identification of non-translatable text, and use of element ids for automatic translation of elements.
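As a rough illustration of the last two points, the following Python sketch pulls translatable text out of a document, keyed by element id, and skips anything marked as non-translatable. The `translate="no"` attribute, the element names and the function name are assumptions made for this example, not markup proposed in the paper, and the sketch ignores mixed content for simplicity.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the markup is invented for illustration.
SAMPLE = """<doc>
  <title id="t1">User guide</title>
  <para id="p1">Save your work before closing.</para>
  <para id="p2" translate="no">ACME-2000</para>
  <note id="n1" translate="no">
    <para id="p3">Internal build 42</para>
  </note>
</doc>"""


def translatable_segments(xml_text):
    """Map element ids to their text, skipping subtrees marked translate="no"."""
    root = ET.fromstring(xml_text)
    segments = {}

    def visit(elem):
        if elem.get("translate") == "no":
            return  # skip this element and everything inside it
        text = (elem.text or "").strip()
        if text and elem.get("id"):
            segments[elem.get("id")] = text
        for child in elem:
            visit(child)

    visit(root)
    return segments


print(translatable_segments(SAMPLE))
```

Because each segment carries its element id, a translated string can later be matched back to the element it came from.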
When I was learning to use FO I needed a table that showed at a glance what each formatting object's children were and which properties applied to it.
Implementor's cheat sheet - gives fast access to WAI's recommendations for those implementing HTML (still not quite finished, and still very badly styled, but works for most common things).
On this page I explore some thoughts on requirements for XML editors that will support editing of markup for languages such as Arabic, Hebrew and Farsi, that involve right-to-left and bidirectional text. It is still a rough first draft.
I use XMetal for all my XHTML editing because it ensures validity and I find it very easy to add, move and change tags and attributes. This article describes how I set up my environment to handle XHTML, in the hope that others might find bits of it useful to get started quickly.
Tips I've picked up and want to remember to help me use XMetal. (Currently only about using it as an XHTML editor.)