
A distinction between upper and lower case applies to the Latin, Cyrillic, Greek and Armenian scripts. (Georgian has two variant forms of its letters, a distinction that has been compared to case, but it is not used in modern Georgian.)

Like sorting, case conversion in Unicode cannot be achieved simply by adding an offset to, or subtracting one from, a code point. The arrangement of upper and lower case variants differs from one Unicode block to another. Also, mappings are not always straightforward or reversible, as the Turkish example on the top line of the slide shows: Turkish pairs dotted i with İ and dotless ı with I, so the usual i/I pairing does not apply.
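A minimal Python sketch of both points. `naive_upper` is a hypothetical, deliberately wrong function that assumes the ASCII offset of 0x20 holds everywhere. Python's built-in `str.upper()` and `str.lower()` apply the default, language-independent Unicode mappings, so Turkish behaviour has to be layered on top; `turkish_upper` and `turkish_lower` are illustrative helpers written for this example, not a library API.

```python
import unicodedata

def naive_upper(ch: str) -> str:
    """Hypothetical ASCII-style conversion: subtract 0x20 from the code point.
    Correct for a-z, wrong in general."""
    return chr(ord(ch) - 0x20)

def turkish_upper(s: str) -> str:
    """Illustrative Turkish tailoring: dotted i uppercases to İ (U+0130)."""
    return s.replace("i", "\u0130").upper()

def turkish_lower(s: str) -> str:
    """Illustrative Turkish tailoring: I lowercases to dotless ı (U+0131)."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

# The distance between a lower case letter and its upper case partner
# depends on the block: 0x20 for a, é, α, ж; 0x30 for Armenian ա;
# 0x1 for Latin Extended-A ā; negative for ÿ, whose capital is U+0178.
for lower in ["a", "é", "α", "ж", "ա", "ā", "ÿ"]:
    upper = lower.upper()
    print(f"{unicodedata.name(lower)}: offset {ord(lower) - ord(upper):#x}")

print(naive_upper("ā"), "ā".upper())  # á Ā: the offset trick picks the wrong letter
print(naive_upper("ÿ"), "ÿ".upper())  # ß Ÿ

# Default mappings ignore Turkish: dotless ı uppercases to I, which then
# lowercases to dotted i, so the round trip changes the letter.
print("ı".upper().lower())            # i
print(turkish_upper("istanbul"))      # İSTANBUL
print(turkish_lower("ISPARTA"))       # ısparta
```

The Turkish helpers simply pre-substitute the two letters that differ before delegating to the default mappings; a production system would normally rely on a locale-aware library instead.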

Like sorting, case conversion is also subject to rules that vary by language or regional variety. The second line alludes to rules for accenting upper case letters that differ between European French and Canadian French. In Greek, the context in which a letter appears can affect the mapping: for example, capital sigma (Σ) lowercases to final ς at the end of a word and to σ elsewhere.
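A minimal sketch, assuming the accent question can be modelled as a simple on/off policy. The slide does not spell out the two French conventions, so `upper_fr` and its `keep_accents` flag are hypothetical and do not describe either variety's actual rules. Stripping removes every combining mark, including the cedilla on Ç, which a real policy might keep. The last lines show that Python's `str.lower()` already applies the Greek final-sigma rule.

```python
import unicodedata

def upper_fr(text: str, keep_accents: bool = True) -> str:
    """Uppercase French text; keep_accents is a hypothetical policy switch."""
    upper = text.upper()
    if keep_accents:
        return upper
    # Decompose, drop nonspacing combining marks (category Mn), recompose.
    decomposed = unicodedata.normalize("NFD", upper)
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", stripped)

print(upper_fr("élève"))                      # ÉLÈVE
print(upper_fr("élève", keep_accents=False))  # ELEVE

# Greek: capital sigma lowercases to final ς (U+03C2) at the end of a word
# and to σ (U+03C3) elsewhere.
print("ΟΔΟΣ".lower())  # οδος, ending in ς
print("ΣΑΣ".lower())   # σας
```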

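The third line shows mappings in German and French that are not one-to-one. For example, German ß is uppercased as the two letters SS, and lowercasing SS gives ss rather than ß.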
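A short Python illustration of a one-to-many mapping. Python's `str.upper()` applies Unicode's full case mappings, so a single ß becomes two letters and the original cannot be recovered by lowercasing. For caseless comparison, `str.casefold()` is the appropriate tool; `caseless_equal` is a hypothetical helper name used only here.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare strings ignoring case via Unicode case folding."""
    return a.casefold() == b.casefold()

word = "straße"
upper = word.upper()
print(upper, len(word), len(upper))       # STRASSE 6 7: one letter became two
print(upper.lower())                      # strasse: ß is not restored
print(caseless_equal(word, "STRASSE"))    # True
print(word.lower() == "STRASSE".lower())  # False: lower() alone is not enough
```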

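The fourth line shows an additional mapping that arises because Serbian distinguishes lower case, upper case and title case: in Latin-script Serbian, a digraph such as dž is written DŽ in all-capital text but Dž at the start of a capitalized word.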
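A minimal Python sketch of the three-way distinction, using the Unicode digraph characters U+01C4 to U+01C6. `str.title()` uses the titlecase mapping, which for these characters differs from the uppercase mapping. These single-character digraphs are compatibility characters; text that spells dž as two letters gets the same visible results from the two-letter sequence.

```python
import unicodedata

dz = "\u01c6"  # ǆ LATIN SMALL LETTER DZ WITH CARON

# Lower case, upper case and title case are three distinct characters.
for form in (dz, dz.upper(), dz.title()):
    print(f"U+{ord(form):04X} {unicodedata.category(form)} {unicodedata.name(form)}")

word = dz + "ungla"
print(word.upper())  # ǄUNGLA: all-capital text
print(word.title())  # ǅungla: capitalized word uses the titlecase form
```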

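The Unicode Character Database provides case-mapping data to support these conversions: simple one-to-one mappings in UnicodeData.txt, and one-to-many or language- and context-sensitive mappings, such as German ß and the Turkish dotted and dotless i, in SpecialCasing.txt.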
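A minimal sketch of reading the simple case mappings from the UnicodeData.txt format, where each line has semicolon-separated fields and fields 12, 13 and 14 hold the simple uppercase, lowercase and titlecase mappings. The two sample lines are given in that format for illustration; check them against the version of the file you use. `SimpleCaseMapping` and `parse_unicode_data_line` are hypothetical names. Note that ß has no simple uppercase mapping: its mapping to SS lives in SpecialCasing.txt.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SimpleCaseMapping:
    code_point: int
    category: str
    upper: Optional[int]
    lower: Optional[int]
    title: Optional[int]

def parse_unicode_data_line(line: str) -> SimpleCaseMapping:
    """Parse one UnicodeData.txt line; empty mapping fields become None."""
    fields = line.rstrip("\n").split(";")

    def code_point_or_none(field: str) -> Optional[int]:
        return int(field, 16) if field else None

    return SimpleCaseMapping(
        code_point=int(fields[0], 16),
        category=fields[2],
        upper=code_point_or_none(fields[12]),
        lower=code_point_or_none(fields[13]),
        title=code_point_or_none(fields[14]),
    )

SAMPLE_LINES = [
    "01C5;LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON;Lt;0;L;"
    "<compat> 0044 017E;;;;N;LATIN LETTER CAPITAL D SMALL Z HACEK;;01C4;01C6;01C5",
    "00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;;;;",
]

for line in SAMPLE_LINES:
    print(parse_unicode_data_line(line))
```

Because the simple mappings are strictly one code point to one code point, a complete implementation has to consult SpecialCasing.txt as well, which is what Python's built-in string methods do.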


Copyright © 2003-2005 Richard Ishida. All rights reserved.