Following up on a suggestion by Nathan Hill of SOAS, I added a la-swe glyph to the default view of the picker alongside the medial consonants. If you click on it, it produces U+1039 MYANMAR SIGN VIRAMA + U+101C MYANMAR LETTER LA.

I also rearranged the font pull-down list a little, adding information about which fonts are available on your Mac OS X or Windows 7 system, and added a placeholder, as I did recently for the Khmer picker.

You can find the Myanmar picker at http://rishida.net/scripts/pickers/myanmar/

Following up on a very good suggestion by Roger Sperberg, I added two webfonts to the Khmer picker and arranged the font selection list so that you can see which fonts are available on your Mac OS X or Windows 7 system.

The webfonts make it possible to use the picker on an iPad or other device that doesn’t have a Khmer font installed. I added two webfonts because one worked on my iPad and the other didn’t, and vice versa on my Snow Leopard MacBook.

I also added an HTML5 placeholder for the output box. (I’m wishing you could style that differently from the standard content – and wishing that markup designers would think about this sort of thing and stop using attributes for natural language text…).

You can find the Khmer picker at http://rishida.net/scripts/pickers/khmer/

Picture of the page in action.

>> Use it

This picker contains characters from the Unicode Balinese block needed for writing the Balinese language. Characters needed for Sasak are also available in the Advanced section. Balinese musical notation characters are not included.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.

About this picker: Characters are grouped to aid input. The consonant block includes characters needed for Kawi and Sanskrit as well as the native Balinese characters, all arranged according to the Brahmi pronunciation grid.

The picker has only a default view and a font grid view. It’s difficult to put in the time for the shape-based, keyboard-based, and various transcription-based views in some other pickers. In a new departure, however, I have included a list of Latin characters on the default view to assist in writing transcriptions alongside Balinese text.

There is, however, a significant issue with this picker, due to the lack of support for Balinese as a script on computers. The only Unicode-based Balinese font I know of is Aksara Bali, but that font seems to work as expected only in Firefox on Mac OS X. Furthermore, the Aksara Bali font doesn’t handle ra repa as described in the Unicode Standard. The sequence <consonant, adeg-adeg, ra repa> produces a visible adeg-adeg, rather than the post-fixed form of ra repa. The sequence <consonant, vowel sign ra repa> produces the post-fixed form of ra repa, rather than the subjoined form. You can produce the post-fixed form with this font by using <consonant, vowel sign ra repa> and the subjoined form by using <consonant, adeg-adeg, ra, pepet>, but these sequences produce content that cannot be matched against sequences using the Unicode approach, and that may fail with other Unicode-compliant fonts in the future.

Hopefully some new, fully Unicode-compliant fonts will come along soon. This is one of the most beautiful scripts I have come across.

(Btw, I’m working on a set of notes for Balinese characters, linked from UniView, with some feature innovations to get around the font issue. Look out for that later. And I’m thinking I should develop a Javanese picker to go with this one. Just need a bit of time…)

For the curious, here’s the first article of the Universal Declaration of Human Rights, as typed in the Balinese picker. Translation by Tri Ediwan (reproduced from Omniglot).

>> Use it

Inspired by some comments on John Wells’s blog, I decided to add a keyboard layout to the IPA picker today. It follows the layout of Mark Huckvale’s Unicode Phonetic Keyboard (UCL) v1.01.

I can’t say I understand why many of the characters are allocated to the keys they are, but I figured that if John Wells uses this keyboard it would probably be worth using its layout.

Picture of the page in action.

>> Use it

This picker contains characters from the Unicode Mongolian block needed for writing the Mongolian language. It doesn’t include Sibe, Todo or Manchu characters. Mongolian is a complex script, and I am still familiarising myself with it. This is an initial trial version of a Mongolian picker, and I may need to make changes as people use it and send feedback.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.

About this picker: The output area for this picker is set up for vertical text. However, only Internet Explorer currently supports vertical text display, and only IE8 supports Mongolian’s left-to-right column progression. In addition, it seems that IE doesn’t support ltr columns in textarea elements. The bottom line is that, although the output area is the right shape and position for vertical text, mostly the output will be horizontal. You will see vertical text in IE, but the column positions will look wrong. Nevertheless, in any of these cases, when you cut and paste text into another document, the characters will still be correctly ordered.

Consonants are to the left, and in the order listed in the Wikipedia article about Mongolian text. To their right are vowels, then punctuation, spaces and control characters, and number digits. The variation selectors are positioned just below the consonants.

As you mouse over the letters, the various combining forms appear in a column to the far left. This is to help identify characters, for those less familiar with the alphabet.

Picture of the page in action.

>> Use it

In 1992 the Chinese government recognised the Fraser alphabet as the official script for the Lisu language and has encouraged its use since then. There are 630,000 Lisu people in China, mainly in the regions of Nujiang, Diqing, Lijiang, Dehong, Baoshan, Kunming and Chuxiong in the Yunnan Province. Another 350,000 Lisu live in Myanmar, Thailand and India. Other user communities are mostly Christians from the Dulong, the Nu and the Bai nationalities in China.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.

Latest changes: This picker is new. The default view was modified from an original proposal by Benjamin Lee, and is likely to be more useful to people who are somewhat familiar with the alphabet and characters of Lisu. Characters are arranged to simplify entry, with consonants to the left, vowels to their right, and tone marks further to the right again.

There is also a keyboard view. Many of the positions of characters are based on keyboard layouts I have seen. Those keyboards, however, tended to use some ASCII characters for punctuation where the Unicode Standard recommends other characters (in particular, MODIFIER LETTER LOW MACRON and MODIFIER LETTER APOSTROPHE), or to omit some punctuation characters mentioned in the Unicode Standard. The current version of this keyboard therefore adds some extra characters.

The layout is adequate, given that pickers assume the availability of a QWERTY keyboard. If a real standardised keyboard layout is to be made, however, it should involve some further changes. For example, people wanting to use syntax characters such as comma, period, semi-colon, single quote, etc., while writing text in Lisu will need direct access to those characters. They are missing from this layout.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

Latest changes: This picker has been upgraded to use the version 10 look and feel, and incorporate new characters from Unicode version 5.2. Characters whose use is discouraged in Unicode have been moved to the advanced section – similar looking images in the main section put multiple characters into the output, as per NFC normalization.

>> Use it

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

Latest changes: Both pickers have been upgraded to use the version 10 look and feel.

The Arabic block picker now includes the latest characters added to the Arabic and Arabic Supplement blocks in Unicode 5.1. Characters are displayed using the shape view of version 10 pickers. This saves a lot of space on-screen.

The Ethiopic picker was also updated to include more recent characters from the Unicode Ethiopic block (added in version 4.1), and the layout was improved to make it easier to locate a character. It still covers only the basic Ethiopic block.

>> Use the Arabic Block picker

>> Use the Ethiopic picker

The new characters.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

Latest changes: I recently added U+2C71 LATIN SMALL LETTER V WITH RIGHT HOOK (labiodental tap or flap) to the IPA picker. This was in the IPA chart for a long time, but was only added to Unicode in version 5.1.

Today I also added, at the request of Dan McCloy, four prosodic markers: prosodic phrase, prosodic word, syllable and mora (see the second line of the picture).

Regular users will also notice that I recently upgraded the picker chrome to version 10.

>> Use it


Picture of the page in action.
 
Picture of the page in action.

About the tools: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.

Latest changes: The Urdu and Tamil pickers have been upgraded to version 10. This provides new views of the data, but also involved a thorough overhaul and redesign of the pickers. Transliteration functions have also been added for the Tamil picker.

In addition, the Urdu notes page was updated and a new Tamil notes page was created. Database entries were also updated or, in the case of Tamil, created to support the notes pages. These notes pages are the first to use a new look and feel, based on the analyse-string tool I produced earlier this year. This adds information about each character from the Unicode descriptions data to that from my own database.

Picture of the page in action.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

Latest changes: Over the Christmas break I’ve applied version 10 upgrades to the following pickers: Bengali, Hebrew, Khmer, Lao, Malayalam, Myanmar, Thai and Tifinagh. In the case of Hebrew and Tifinagh, this came down to completely rewriting the pickers.

Key changes in version 10 include the following:

  • The visible layout of the shape view has been reduced in the vertical direction by showing a group of characters only when you mouse over the orange keys at the top. This makes it easier and faster to locate characters, and also improves use on screens with restricted space. The way similar characters in other groups are handled has been reinvented to fit the new approach better, and to enable faster creation of pickers in the future.
  • The visible layout of the transcription view has been adapted in a similar way to the shape view.
  • The button to dump the phonetic buffer has been moved to just below the output area.
  • The Detail button is now called the Analyse button, and both this and the Codepoints commands now bring up the new String Analyser utility, which provides much better results than the old pages.
  • A keyboard view has been added to the Tifinagh picker. This new view may pop up in other pickers in the future.

There were a number of other changes to the code and, not least, to the instructions for use on the main picker page and to each set of notes below the pickers themselves.

>> Use it


Picture of the page in action.

About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

Latest changes: This is the first version 9 picker. Changes introduced in version 9 include moving the buttons that allow you to display different views to just below the page title. Also, in version 8 pickers, there was an icon in the phonic view that allowed you to dump to the output the phonetic transcription that builds up while selecting characters. This has been replaced with a button just below the output field. There were a number of other superficial changes.

A significant addition to the Malayalam picker is the ability to convert Malayalam text into a Latin transliteration, based on ISO 15919. There was already a way to convert Latin transliterations to Malayalam script.

This version also continues to allow you to type in chillu characters either as the single characters included in Unicode v5.1, or as a sequence of consonant+virama+zwj. Characters added to the Malayalam repertoire in v5.2 have not yet been added to the picker.

>> Use it


>> Use it !

Picture of the page in action.

I have just upgraded the Malayalam picker to level 7, and added a bunch of new features that should show up in other pickers at level 7 as I get time:

Shape view: The pickers are aimed particularly at people who are not familiar enough with a script to use the keyboard. However, there are many ligatures and conjuncts in Malayalam, which makes it difficult to identify the character sequences needed. This view provides most of the shapes you’ll see in Malayalam text, grouped by shape. It’s something I’ve been wanting to add to the pickers for some time.

Picture of the page in action.

Phonic view: This has been done in other pickers, but it has some new features over those. The sounds have been arranged along similar lines to a standard IPA chart, and multiple transcriptions are supported. In addition, you can click on the transcription text to build up a phonemic string in IPA. This is particularly useful for creating examples.

Picture of the page in action.

Regular expressions in searches: The search feature was upgraded to allow for regular expressions. So now you can highlight characters containing GA without highlighting ones containing NGA: just search for \bga\b (or use the convenient short-cut form .ga.). Of course you can do more complicated searches too.
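
To illustrate why the word boundaries matter, here is a minimal Python sketch. It is not the picker's own code (which runs in the browser). The function name and the three sample code points are assumptions chosen for the example, and the character names come from Python's unicodedata module.

```python
import re
import unicodedata


def matching_characters(pattern, codepoints):
    """Return the characters whose Unicode name matches the regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [chr(cp) for cp in codepoints
            if regex.search(unicodedata.name(chr(cp), ""))]


# Hypothetical sample: MALAYALAM LETTER GA, GHA and NGA.
sample = [0x0D17, 0x0D18, 0x0D19]

plain = matching_characters(r"ga", sample)        # GA and NGA both match
bounded = matching_characters(r"\bga\b", sample)  # only GA matches
```

Without the \b anchors the pattern also finds the "GA" inside "NGA". With them, the match has to be a whole word in the character name.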

Add codepoint: You can add a hex codepoint value to the box in the yellow area to insert into the text. This is useful for things like the odd unusual character, or for just figuring out what a sequence of codepoints represents. You can input any number of codepoints (including surrogates) into the input box, separating them by spaces.
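
The conversion from such a field to text can be sketched as follows. This is an illustrative Python version under stated assumptions, not the picker's implementation: the function name is hypothetical, and the handling of surrogate pairs is simply the standard UTF-16 arithmetic.

```python
def codepoints_to_text(field):
    """Convert space-separated hex code points to a string, pairing surrogates."""
    units = [int(token, 16) for token in field.split()]
    chars = []
    i = 0
    while i < len(units):
        cp = units[i]
        # A high surrogate followed by a low surrogate encodes one
        # supplementary-plane character.
        if (0xD800 <= cp <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 1
        chars.append(chr(cp))
        i += 1
    return "".join(chars)


# MALAYALAM LETTER KA + VIRAMA + ZERO WIDTH JOINER
sequence = codepoints_to_text("0D15 0D4D 200D")
# A surrogate pair is combined into a single character (U+1D400).
supplementary = codepoints_to_text("D835 DC00")
```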

Chillus: This version of the picker supports all Unicode 5.1 characters, including the chillu characters. Because most Malayalam fonts support the old way of inputting chillu forms, you can specify in the yellow box area what you want the output to be when clicking on a chillu letter: the pre-5.1 sequence or the new atomic character. (The default is the atomic character.)
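
The choice between the two outputs can be pictured with a small Python sketch. The table and function names are assumptions made for illustration. The code points are those of the Unicode 5.1 chillu letters and the consonant + virama + ZWJ sequences used before 5.1.

```python
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200D"     # ZERO WIDTH JOINER

# Atomic chillu letter -> base consonant of the pre-5.1 sequence.
CHILLU_BASE = {
    "\u0D7A": "\u0D23",  # CHILLU NN -> NNA
    "\u0D7B": "\u0D28",  # CHILLU N  -> NA
    "\u0D7C": "\u0D30",  # CHILLU RR -> RA
    "\u0D7D": "\u0D32",  # CHILLU L  -> LA
    "\u0D7E": "\u0D33",  # CHILLU LL -> LLA
    "\u0D7F": "\u0D15",  # CHILLU K  -> KA
}


def chillu_output(atomic, use_atomic=True):
    """Return what a click on a chillu letter would add to the output."""
    if use_atomic:
        return atomic
    return CHILLU_BASE[atomic] + VIRAMA + ZWJ


new_style = chillu_output("\u0D7B")                    # atomic CHILLU N
old_style = chillu_output("\u0D7B", use_atomic=False)  # NA + VIRAMA + ZWJ
```

Note that the two forms are not canonically equivalent, so Unicode normalization will not convert one into the other. Text that mixes them has to be mapped explicitly, as above, before it can be compared.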

The picker also comes with the usual set of level 7 features, such as font grid view, graphic characters, hiding of uncommon characters, optimised ordering of characters in the alphabetic view, two-tone highlighting, etc.

You can start up directly in any of the available views by appending ?view= to your URI, followed by one of alphabet, shape, phonic or fontgrid.
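
As a rough sketch of how such a parameter might be read, here is a Python version using the standard urllib.parse module. The example.org address, the function name and the choice of alphabet as the fallback view are all assumptions for illustration. The picker itself does this in the browser.

```python
from urllib.parse import parse_qs, urlparse

VIEWS = ("alphabet", "shape", "phonic", "fontgrid")


def initial_view(uri, default="alphabet"):
    """Pick the start-up view from the ?view= parameter, if it is a known one."""
    query = parse_qs(urlparse(uri).query)
    requested = query.get("view", [default])[0]
    return requested if requested in VIEWS else default


# Hypothetical URIs, used only to exercise the function.
shape = initial_view("http://example.org/pickers/malayalam/?view=shape")
fallback = initial_view("http://example.org/pickers/malayalam/?view=unknown")
```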

Enjoy.

>> Use it !

Picture of the page in action.

I have just upgraded the Burmese picker as follows:

Rearranged characters: The Myanmar3 font expects multiple combining characters to be entered in the order described in the Unicode 5.1 Standard for correct display. The panel of combining characters has been arranged so that you can easily remember what that order is. Characters to the left precede those to the right, and characters higher up precede those lower down.

In addition to that, I have rearranged all the character positions so that it is easier to locate the various parts of a syllable as you type.

I also added some combinations of characters that make up multi-part vowels and the kinzi with a single click.

I have also moved some of the less common characters to an ‘advanced’ area to the right which can be opened and closed by clicking on the arrow-head icon.

New highlighting: As you mouse over a character the picker will show you other characters that are visually similar (particularly useful for those not very familiar with the script). This new version shows the more likely confusable characters with a blue outline, and other similar characters with orange. This is useful given that many Myanmar characters look quite similar.

As always, you can turn off this feature or disable it in the URI you use to call the picker.

Font grid view: Shows characters in Unicode order, using whatever font is specified in the Font list or Custom font input fields. This allows comparison of fonts (especially useful in IE, which shows if a glyph is missing from a font).

You can start up directly in either of the available views by appending ?view= to your URI, followed by alphabet or fontgrid.

Enjoy.

>> Use it !

Picture of the page in action.

This latest picker includes all characters in the Unicode Lao block, plus a few punctuation characters. There are several alternative views.

Alphabetic: By default, characters are arranged by groups, and consonants and vowels are listed in alphabetic order. Digits are in keypad order. Similar characters are highlighted by default, but this can be switched off using the ‘Hint’ selector.

Tone marks and combining vowels are reordered automatically so that vowels come first in the output character sequence.
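
One way to picture this reordering is the following Python sketch. It is an assumption about the general approach, not the picker's actual code. It treats U+0EC8 to U+0ECB as the tone marks and U+0EB1, U+0EB4 to U+0EB9 and U+0EBB as the combining vowels, and swaps any tone mark that is immediately followed by one of those vowels.

```python
import re

# A Lao tone mark directly followed by a combining vowel sign
# (character sets assumed for this example).
TONE_THEN_VOWEL = re.compile("([\u0EC8-\u0ECB])([\u0EB1\u0EB4-\u0EB9\u0EBB])")


def reorder(text):
    """Move each combining vowel in front of the tone mark that precedes it."""
    return TONE_THEN_VOWEL.sub(r"\2\1", text)


# KO + TONE MAI EK + VOWEL SIGN I, typed tone first.
typed = "\u0E81\u0EC8\u0EB4"
stored = reorder(typed)  # KO + VOWEL SIGN I + TONE MAI EK
```

Text that is already in vowel-then-tone order does not match the pattern, so it passes through unchanged.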

Phonic: Characters are grouped and ordered by sound. I set this up for myself so that I could enter Lao text that was accompanied by a transcription. Initial consonants are followed by tones and consonants that come second in a cluster, then vowels. Alternatives with the same sound are separated by a red dot. Consonants that have different sounds when word final are also listed under those sounds. (Dropped aspiration is not considered significant.)

Dashes representing consonants indicate which vowels are non-final or occur before the consonant. Where a vowel has a part that comes before a consonant, a single click should arrange the parts properly. This behaviour speeds up typing. It may not be so intuitive to people familiar with Lao, however, since it makes Lao behave like Khmer and Indic scripts.

You should add any tone mark before the vowel and the picker will automatically reorder characters as needed. If you want to wrap text around a combination of two syllable-initial characters, type the characters then click on ‘flag as cluster’ before clicking on the tone mark or vowel.

Two old vowel spellings are only displayed if you click on the grey arrow, top right.

Font grid: Shows characters in Unicode order, using whatever font is specified in the Font list or Custom font input fields. This allows comparison of fonts (especially useful in IE, which shows if a glyph is missing from a font).

You can start up directly in one of the above views by appending ?view= to your URI, followed by one of alphabet, phonic or fontgrid.

Enjoy.

>> Use it !

Picture of the page in action.

The default arrangement for this picker is still shape-based (though with some small improvements), but I have added a new view that is arranged by sound.

Update: After some initial feedback, I decided to change the phonic view of the picker so that vowels are entered by single click. This will probably disconcert people familiar with typing Thai. Revised description follows.

Another update (2008-03-03): I have added additional ways of viewing the characters, and re-architected the picker as a basis for extending this to other pickers in the future. I also changed the way of dealing with initial clusters in the phonic view. I changed the text below again to reflect what’s new:

Alphabetic view: By default, characters are arranged by groups, and consonants and vowels are listed in alphabetic order. Digits are in keypad order. Obsolete and rare characters are only displayed if you click on the grey arrow, top right. Similar characters are highlighted by default, but this can be switched off using the ‘Hint’ selector.

Comparison view: This was the original view for the Thai picker. Characters are grouped by shape or type to enable easy identification by people who are unfamiliar with the Thai script. Vowels are shown near the bottom. Digits are on the right, in keypad order.

Phonic view: Characters are grouped and ordered by sound. I set this up for myself, because I wanted to enter Thai text that was accompanied by a transcription.

Initial consonants are followed by tones and consonants that come second in a cluster, then vowels. Alternatives with the same sound are separated by a red dot. Consonants that have different sounds when word final are also listed under those sounds. (Dropped aspiration is not considered significant.)

Dashes representing consonants indicate which vowels are non-final or occur before the consonant.

Where a vowel has a part that comes before a consonant, a single click should arrange the parts properly. This behaviour speeds up typing. It may not be so intuitive to people familiar with Thai, however, since it makes Thai behave like Khmer and Indic scripts. You should add any tone mark before the vowel and the picker will automatically reorder characters as needed.

If you want to wrap text around a combination of two syllable-initial characters, type the characters then click on ‘flag as cluster’ before clicking on the tone mark or vowel.
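
The wrapping behaviour described above can be sketched in Python as follows. This is only an illustration under stated assumptions. The function name and the cluster_size parameter (standing in for ‘flag as cluster’) are hypothetical, tone-mark reordering is left out, and the consonant range U+0E01 to U+0E2E is used to check that the vowel is being wrapped around consonants.

```python
# Thai consonants KO KAI (U+0E01) to HO NOKHUK (U+0E2E).
THAI_CONSONANTS = {chr(cp) for cp in range(0x0E01, 0x0E2F)}


def add_wrapping_vowel(text, before, after="", cluster_size=1):
    """Place the parts of a vowel around the last consonant or flagged cluster."""
    start = len(text) - cluster_size
    if cluster_size < 1 or start < 0:
        raise ValueError("no consonant to wrap the vowel around")
    if any(ch not in THAI_CONSONANTS for ch in text[start:]):
        raise ValueError("vowel can only wrap around consonants")
    return text[:start] + before + text[start:] + after


# One click on a two-part vowel after typing KO KAI.
single = add_wrapping_vowel("\u0E01", "\u0E40", "\u0E32")
# The same vowel wrapped around a two-consonant cluster (PO PLA + LO LING).
cluster = add_wrapping_vowel("\u0E1B\u0E25", "\u0E40", "\u0E32", cluster_size=2)
```

The first part of the vowel ends up before the consonant in the stored text, which is the order Thai uses, even though the user clicked the consonant first.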

Font grid view: Shows characters in Unicode order, using whatever font is specified in the Font list or Custom font input fields. This allows comparison of fonts (especially useful in IE, which shows if a glyph is missing from a font).

You can start up directly in any one of the above views by appending ?view= to your URI, followed by one of alphabet, comparison, phonic or fontgrid.

Enjoy.

>> Use it !

Picture of the page in action.

This latest picker includes characters used for writing Vietnamese. Characters are taken from various Latin Unicode blocks.

Tones are separated from base characters in the selection area, but the output you create is always fully precomposed. If you copy and paste text into the output area, you can normalize the Vietnamese text as NFC by selecting the tab below. The Vietnamese text in the output area is also normalized when you select one of the transcription tabs.
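
For readers who want to see what NFC does to Vietnamese text, here is a minimal Python sketch using the standard unicodedata module. The function name and sample word are chosen for illustration, and the picker itself does its normalization in the browser.

```python
import unicodedata


def to_nfc(text):
    """Return the text in Unicode Normalization Form C (fully precomposed)."""
    return unicodedata.normalize("NFC", text)


# "Viet" with the e followed by a combining circumflex and a combining dot below.
decomposed = "Vie\u0302\u0323t"
composed = to_nfc(decomposed)  # the e and both marks become U+1EC7
```

The base letter and its two combining marks collapse into one precomposed character, so the six-character input becomes a four-character string.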

The IPA N and IPA S tabs provide a basic, mostly phonemic-level, transcription of the pronunciation. N means North Vietnamese, S is for South. The sources I used for this varied a great deal, particularly in the choice of symbols to represent vowels. There are also more than two main dialects. So this is a synthesis and a rough guide. Some rare vowel combinations may be missing, although I have covered quite a number.

There are a large number of UVN fonts – so many that I didn’t know which ones to pick for the font pulldown. I chose the two that show up on Alan Wood’s page. If you think certain others are so common that they ought to be there, please let me know.

Enjoy.

>> Use it !

Picture of the page in action.

Although I already have a picker for Arabic, Persian and Urdu, I have developed another that is specifically for inputting Urdu. One reason for this is to reduce the choice of characters so that the user is more likely to select the right character for Urdu (e.g. heh goal rather than Arabic heh). Another is to provide shortcuts for things like aspirated letters and some common combinations (like the word ‘allah’).

It includes characters used for Urdu in Unicode 5.0. Most of the characters in the Urdu standard UZT 1.01 are included.

The aspirated letters of the alphabet can be entered with a single click. Also, base characters with diacritics can be inserted into the text with a single click where NFC normalisation would produce a single precomposed character.
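
Both kinds of single-click output can be illustrated with a short Python sketch. The function names are hypothetical and this is not the picker's code. It assumes that an aspirated letter is stored as the base letter followed by U+06BE ARABIC LETTER HEH DOACHASHMEE, and it uses unicodedata to check whether NFC yields a single precomposed character.

```python
import unicodedata

HEH_DOACHASHMEE = "\u06BE"


def aspirated(base):
    """Return the two-character sequence for an aspirated letter."""
    return base + HEH_DOACHASHMEE


def precomposed_or_none(base, mark):
    """Return the single NFC character for base + mark, or None if there is none."""
    composed = unicodedata.normalize("NFC", base + mark)
    return composed if len(composed) == 1 else None


bh = aspirated("\u0628")                                 # BEH + HEH DOACHASHMEE
alef_madda = precomposed_or_none("\u0627", "\u0653")     # ALEF + MADDA ABOVE
beh_fatha = precomposed_or_none("\u0628", "\u064E")      # BEH + FATHA
```

ALEF followed by MADDA ABOVE composes to U+0622, so it qualifies for a single-character insert. BEH with FATHA has no precomposed form, so the function returns None.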

Letters of the alphabet are shown in alphabetic order at the top left, digits are in keypad order, and combining characters related to vowel sounds are shown along the bottom. The lower middle section contains useful but non-alphabetic characters and punctuation. To the right are various symbols. Hinting is implemented for visually similar glyphs.

>> Use it !

Picture of the page in action.

Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more useable than a regular character map utility.

The Bengali picker includes all the characters in the Unicode 5.0 Bengali block. Note: There was an important addition to the Bengali block in version 4.1, a single character for khanda ta, that may not yet be supported in fonts, but has been added to this version of the picker.

Consonants are mostly in a typical articulatory arrangement, vowels are aligned with vowel signs, and digits are in keypad order. Hinting is implemented for visually similar glyphs.

A function has also been added to transliterate Bengali text to Latin, though the scheme used is not standard, and may change at short notice. Don’t use it in anger yet.

I’ve been wanting to improve the editing behaviour of my pickers for quite some time, so that users could interact more easily with the keyboard, and insert characters into the middle of a composition, not just at the end. In fact, the output area maintains the focus all the time, now – which makes a major improvement to the usability of the pickers.

This week I made those things happen, and created a new template with some other changes, too.

An updated Bengali picker is first out of the box, but look out for a brand new Urdu-specific picker to follow close on its heels. I will retrofit the new template to other pickers as time allows, or need dictates.

I also beefed up the font selection list with a large number of TT and OT fonts, and improved the reference material at the bottom.

I improved the mechanism that highlights similar characters, to give more fine-grained control to the associations between characters.

I also added a field just under the title that gives information about the character the user is mousing over, and added a search field to help users find characters for which they know the Unicode name or number. I plan to extend the information associated with characters in future to include native names (e.g. e-kar) and other useful search info.

I also changed the scripting and HTML so that a single click can now produce multiple characters in the composition field. This will allow users to input ligatures like the Indic ‘ksha’ or Urdu aspirated consonants, or complex sequences tied to ligatures (like the word ‘Allah’) with a simple click.
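
Taken together, inserting into the middle of a composition and producing several characters per click amount to something like the following Python sketch. The function name and the caret handling are assumptions for illustration. The real pickers do this with JavaScript on the output field. The sample sequence is the Bengali ‘ksha’ conjunct, KA + VIRAMA + SSA.

```python
def insert_at_caret(text, caret, sequence):
    """Insert a (possibly multi-character) sequence at the caret position.

    Returns the new text and the new caret position, just after the insert.
    """
    caret = max(0, min(caret, len(text)))  # keep the caret inside the text
    return text[:caret] + sequence + text[caret:], caret + len(sequence)


KSHA = "\u0995\u09CD\u09B7"  # BENGALI LETTER KA + SIGN VIRAMA + LETTER SSA

# One click adds three characters between two existing letters.
new_text, new_caret = insert_at_caret("\u0986\u09AE", 1, KSHA)
```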

Some things have also been removed. There is no DEL button now, since you can interact more easily with the keyboard for that. Spaces are available from the (now rationalised) character area, rather than a button. And there is no longer an option to switch between graphics and characters for the selection. This is partly for simplicity, and partly to make it easier to represent some of the slightly more complicated selection options I want to add in future – for example, specific shapes are appropriate for Urdu Arabic characters, and I don’t want to leave it to chance as to whether the user’s system has the right fonts to produce the desired shapes.

Getting to this actually required a huge amount of unseen work, since I had to wrap all the images in button markup and move and change attributes, etc. so that the composition box retains the focus in IE (it worked fine for Firefox, Opera and Safari). I also, of course, made significant, but probably not noticeable, changes to the JavaScript and CSS.