This page provides links to articles, papers and pages I have written in addition to my blog posts.
Tutorial that helps you understand key requirements for implementing writing systems in information technology. It does this by examining real examples of a wide range of modern scripts to discover features that a computerized implementation must support. It also makes special reference, where appropriate, to how the Unicode Standard points the way forward for meeting these requirements.
Lists translations of phrases relating to the W3C in various scripts, for use as examples in documents and presentations. You will need appropriate fonts and rendering mechanisms to view them properly.
Provides information about script characteristics for a number of languages. This gives a rough idea of what is needed to support a given language on a platform. You can also rearrange the data to see, for example, which languages use right-to-left scripts, which languages require the largest number of combining characters, etc.
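As a rough illustration of the kind of data involved, the sketch below derives two such characteristics (text direction and number of combining marks) from sample strings using Python's standard `unicodedata` module. The function name `script_traits` and the sample words are assumptions made for this example; they are not taken from the page described above, which may gather its data quite differently.

```python
import unicodedata

def script_traits(sample):
    """Return rough script characteristics for a sample string."""
    # Bidi classes 'R' (e.g. Hebrew) and 'AL' (Arabic letters) mark
    # strongly right-to-left characters.
    rtl = any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in sample)
    # General categories Mn, Mc and Me are the combining marks.
    combining = sum(1 for ch in sample if unicodedata.category(ch).startswith("M"))
    return {"right_to_left": rtl, "combining_marks": combining}

# Hypothetical sample words, one per language.
samples = {
    "English": "hello",
    "Arabic": "سلام",
    "Hindi": "हिन्दी",
}

traits = {lang: script_traits(text) for lang, text in samples.items()}

# "Rearranging" the data: which languages use a right-to-left script?
rtl_languages = [lang for lang, t in traits.items() if t["right_to_left"]]
print(rtl_languages)
print(traits["Hindi"]["combining_marks"])
```

A single word is of course a very small sample; a realistic survey would need to look at the full character repertoire of each language.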
Introduction to Indic Scripts (updated)
Paper delivered at the Unicode Conferences in Sept 2002 and Mar 2003. This is also available as Unicode Technical Note #10. Most people will need to read the PDF version to see the Indic text.
Notes that describe the Balinese script and the characters from the Unicode Balinese block, as used for the Balinese language.
Notes that describe the Bengali script and the characters from the Unicode Bengali block.
In-progress draft of notes that list the symbols used to represent Lao, describe their use, and relate them to appropriate characters for representation in Unicode.
In-progress draft of notes that list the symbols used to represent Khmer, describe their use, and relate them to appropriate characters for representation in Unicode.
In-progress draft of notes that list the symbols used to represent Myanmar, describe their use, and relate them to appropriate characters for representation in Unicode.
Notes that describe the Tamil script and the characters from the Unicode Tamil block.
Notes that describe the Arabic script as used for Urdu and the characters used in Urdu from the Unicode Arabic block.
Notes that describe the Ishidic script. There are no Ishidic characters encoded in Unicode as yet. The script is a mixture of abugida, abjad, alphabet and other script techniques.
What's the difference between Simplified & Traditional Chinese, and are they separate in Unicode?
Is it correct that simplified and traditional Chinese are not completely separate sets of code entries in Unicode? If so, are they simply like two different fonts for the same Unicode point? Would I have to have a simplified and a traditional font installed? One traditional character may correspond to several simplified ones, right?
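One way to explore the question is to look at the code points directly. The sketch below, using only built-in Python, compares a few simplified/traditional pairs: where the two forms are distinct characters, Unicode gives each its own code point, and where the same character is used in both systems there is a single shared code point. The pairs chosen and the helper name `describe` are assumptions made for illustration, and the article itself should be consulted for the full answer.

```python
# (simplified, traditional) pairs chosen for illustration.
pairs = [
    ("发", "發"),  # "to emit": distinct simplified and traditional forms
    ("发", "髮"),  # "hair": the same simplified form covers a second traditional character
    ("人", "人"),  # "person": identical in both systems
]

def describe(simplified, traditional):
    """Show both code points and whether the pair shares one."""
    status = "shared" if simplified == traditional else "separate"
    return f"U+{ord(simplified):04X} U+{ord(traditional):04X} {status}"

report = [describe(s, t) for s, t in pairs]
for line in report:
    print(line)
```

The first two rows also show that one simplified character can stand in for more than one traditional character, so the correspondence is not always one-to-one.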
Non-Latin Script Samples (updated)
Pages illustrating the features of numerous non-Latin writing systems. These pages also allow you to experiment with various CSS3 styling techniques using dynamic HTML.
Fonts supplied with Windows 7 and Mac OS X, by script (new)
A list of fonts provided by the Windows 7 and Mac OS X Snow Leopard/Lion operating systems, grouped by script.
Blog posts
Most of my articles on Web internationalization can be found at the W3C Internationalization site.
Blog posts
Other articles
Language negotiating with remote files using Apache
Describes a hack that allows you to do language negotiation across files that are not necessarily in the same directory. It is a method described by Dominique Hazaël-Massieux with a couple of refinements I added relating to handling default files and language extensions appearing before the .html extension. Note: I’m not convinced that it’s a good idea.
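For readers unfamiliar with the general idea, the sketch below shows in Python what language negotiation does: it reads the browser's `Accept-Language` header and picks the best available language variant, falling back to a default file. This is not the Apache method the article describes. The function names, the file paths, and the `page.en.html` naming pattern (language extension before `.html`) are hypothetical, and wildcard (`*`) entries are not handled.

```python
def parse_accept_language(header):
    """Return (language tag, q-value) pairs, highest preference first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a tag without an explicit q-value has full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def choose_variant(header, available, default):
    """Pick the file for the most preferred language we can serve."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        # Try the full tag first, then its primary subtag ("fr-ca" -> "fr").
        for candidate in (tag, tag.split("-")[0]):
            if candidate in available:
                return available[candidate]
    return default

# Hypothetical variants that need not live in the same directory.
available = {"en": "docs/page.en.html", "fr": "archive/fr/page.fr.html"}

print(choose_variant("fr-CA,fr;q=0.8,en;q=0.5", available, "docs/page.en.html"))
print(choose_variant("de", available, "docs/page.en.html"))
```

In a real deployment the web server does this work itself; the sketch is only meant to make the default-file and language-extension handling mentioned above easier to picture.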
Text Fragmentation and Reuse in User Interfaces
Designers must be very careful about how they split up and reuse text on-screen, since the linguistic differences between languages can lead to real headaches for localizers and may in some cases make a reasonable translation impossible to achieve. (Article in Multilingual Computing magazine).
Effective localization of XML documents begins with the development of an internationalized document structure. (Article in Multilingual Computing magazine).
Localizable DTD Design Requirements
Requirements to guide DTD or Schema developers, or for an internationalised tag set or namespace that can be included in DTDs.
Localisation Considerations in DTD Design
This paper refers to standard topics such as character encoding and language declarations, but also covers topics such as implementation of emphasis and style conventions, handling of citations, use of text in attribute values, and the need for an element like HTML's SPAN. In addition, other topics that have traditionally been associated with translation of user interface messages become applicable due to the nature of XML documents. These include the provision of designer's notes, identification of non-translatable text, and use of element ids for automatic translation of elements.
Blog posts
Other articles
When I was learning to use FO I needed a table that showed at a glance what each formatting object's children were and what properties it supported.
Implementor's cheat sheet - gives fast access to WAI's recommendations for those implementing HTML (still not quite finished, and still very badly styled, but works for most common things).
Editing requirements for XML markup and RTL scripts
Exploratory requirements for XML editors that will support editing of markup for languages such as Arabic, Hebrew and Farsi, that involve right-to-left and bidirectional text. It is still a rough first draft.
My setup for editing XHTML with XMetal
I use XMetal for all my XHTML editing because it ensures validity and I find it very easy to add, move and change tags and attributes. This article describes how I set up my environment to handle XHTML, in the hope that others might find bits of it useful to get started quickly.
Tips I've picked up and want to remember to help me use XMetal. (Currently covers only its use as an XHTML editor.)