Search Results for 'uniview'
Posted on Wed 1 Aug 2012 under general, i18n, utilities, web

>> Use UniView
The main addition in this version is a couple of buttons that appear when you ask UniView to display a block.
Clicking on Show annotated list generates a list of all characters in the block, with annotations.
Clicking on Show script links displays a list of links to key sources of information about the script of the block, links to relevant articles and apps on the rishida.net site, and related fonts and input methods. This provides a very quick way of finding this information. One particularly useful link (‘Historical documentation’, which links to a Scriptsource.org page) allows you to find the proposals for all additions to Unicode related to the relevant script. These proposals are a mine of useful information about the individual characters in a block, and SIL staff should get a medal for trawling through all the relevant data to provide this.
In addition, there were some changes to the user interface, including the following:
- The order of information in the lower right panel (detailed character information) was slightly changed, and two alternative representations of the character were added: an HTML escape, and a URI escape.
- The search box at the top left was constrained to appear closer to the other controls when the window is stretched wide.
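As a rough illustration of what those two representations look like, here is a minimal Python sketch. The function names `html_escape` and `uri_escape` are made up for this example, and the exact format UniView emits (hex versus decimal, zero padding) is an assumption; the sketch uses a hexadecimal numeric character reference and percent-encoded UTF-8 bytes.

```python
from urllib.parse import quote


def html_escape(ch: str) -> str:
    """Return a hexadecimal numeric character reference for one character."""
    return f"&#x{ord(ch):04X};"


def uri_escape(ch: str) -> str:
    """Percent-encode the UTF-8 bytes of the character."""
    return quote(ch, safe="")


ch = "\u0634"  # ARABIC LETTER SHEEN
print(html_escape(ch))  # &#x0634;
print(uri_escape(ch))   # %D8%B4
```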
Various bugs were also fixed.
Posted on Mon 5 Mar 2012 under general, utilities, web

>> Use it
This picker contains characters from the Unicode Balinese block needed for writing the Balinese language. Characters needed for Sasak are also available in the Advanced section. Balinese musical notation characters are not included.
About the tool: Pickers allow you to quickly create phrases in a script by clicking on Unicode characters arranged in a way that aids their identification. Pickers are likely to be most useful if you don’t know a script well enough to use the native keyboard. The arrangement of characters also makes it much more usable than a regular character map utility.
About this picker: Characters are grouped to aid input. The consonant block includes characters needed for Kawi and Sanskrit as well as the native Balinese characters, all arranged according to the Brahmi pronunciation grid.
The picker has only a default view and a font grid view. It’s difficult to put in the time for the shape-based, keyboard-based, and various transcription-based views in some other pickers. In a new departure, however, I have included a list of Latin characters on the default view to assist in writing transcriptions alongside Balinese text.
There is, however, a significant issue with this picker, due to the lack of support for Balinese as a script in computers. The only Unicode-based Balinese font I know of is Aksara Bali, but that font seems to only work as expected in Firefox on Mac OS X. Furthermore, the Aksara Bali font doesn’t handle ra repa as described in the Unicode Standard. The sequence <consonant, adeg-adeg, ra repa> produces a visible adeg-adeg, rather than the post-fixed form of ra repa. The sequence <consonant, vowel sign ra repa> produces the post-fixed form of ra repa, rather than the subjoined form. You can produce the post-fixed form with this font by using <consonant, vowel sign ra repa> and the subjoined form by using <consonant, adeg-adeg, ra, pepet>, but these sequences will produce content that cannot be matched against sequences using the Unicode approach, and content that may fail with other Unicode-compliant fonts in the future.
Hopefully some new, fully Unicode-compliant fonts will come along soon. This is one of the most beautiful scripts I have come across.
(Btw, I’m working on a set of notes for Balinese characters, linked from UniView, with some feature innovations to get around the font issue. Look out for that later. And I’m thinking I should develop a Javanese picker to go with this one. Just need a bit of time…)
For the curious, here’s the first article of the Universal Declaration of Human Rights, as typed in the Balinese picker. Translation by Tri Ediwan (reproduced from Omniglot).

Posted on Tue 31 Jan 2012 under general, i18n, utilities, web

>> Use UniView
The major change in this update is the update of the data to support Unicode version 6.1.0, which should be released today. (See the list of links to new Unicode blocks below.)
There are also a number of feature and bug related changes.
What UniView does: Look up and see characters (using graphics or fonts) and property information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and discover unknown characters, search for characters, do hex/dec/ncr conversions, highlight character types, etc. etc. Supports Unicode 6.1 and written with Web Standards to work on a variety of browsers. No need to install anything.
List of changes:
- One significant change enables you to display information in a separate window, rather than overwriting the information currently displayed. This can be done by typing/pasting/dragging a set of characters or character code values into the new Popout area and selecting the icon alongside the Characters or Copy & paste input fields (depending on what you put in the popout window).
- Two new icons were added to the Copy & paste area:
  - Clicking on the first will display the characters in the area in the lower right part of the page with all relevant characters converted to uppercase, lowercase and titlecase. Characters that have no case conversion information are also listed.
  - Clicking on the second produces the same kind of output as the first, but shows the mappings for those characters that have been changed, eg. e→E.
- Where character information displayed in the lower right panel has a case or decomposition mapping, UniView now displays the characters involved, rather than just giving the hex value(s), eg. Uppercase mapping: 0043 C. You will need a font on your system to see the characters displayed in this way, but whether or not you have a font, this provides a quick and easy way to copy the case-changed character (rather than having to copy the hex value and convert it first).
- There is also a new line, slightly further down, when UniView is in graphic mode. This line starts with ‘As text:’, and shows the character using whatever default font you have on your system. Of course, if you don’t have a font that includes that character you won’t see it. This has been added to make it easier to copy and paste a character into text.
- Fixed some small bugs, such as problems with search when U+29DC INCOMPLETE INFINITY is returned.
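For anyone who wants to reproduce the case-mapping view outside UniView, here is a minimal Python sketch. It is not UniView's code: `case_mappings` is a hypothetical helper, and it relies on Python's built-in `str.upper()`, `str.lower()` and `str.title()`, which follow the Unicode data bundled with your Python version.

```python
def case_mappings(text):
    """For each character, return only the case mappings that change it."""
    rows = []
    for ch in text:
        mapped = {"upper": ch.upper(), "lower": ch.lower(), "title": ch.title()}
        changed = {kind: value for kind, value in mapped.items() if value != ch}
        rows.append((ch, changed))
    return rows


for ch, changed in case_mappings("eE1"):
    if changed:
        print(", ".join(f"{kind}: {ch}→{value}" for kind, value in changed.items()))
    else:
        print(f"{ch}: no case conversion")
```

Characters with an empty mapping (such as the digit above) correspond to the "no case conversion information" entries mentioned in the list.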
Enjoy.
Here are direct links to the new blocks added to Unicode 6.1:
Posted on Tue 25 Oct 2011 under general, i18n, utilities, web, writings
One of the more useful features of UniView is its ability to list the characters in a string with names and codepoints. This is particularly useful when you can’t tell what a string of characters contains because you don’t have a font, or because the script is too complex, etc.
For example, I was recently sent an email where my name was written in Persian as ایشیدا. The image shows how it looks in a nastaliq font.
To see the component characters, drop the string into UniView’s Copy & Paste field and click on the icon. Here is the result:

Note how you can now see that there’s an invisible control character in the string. Note also that you see a graphic image for each character, which is a big help if the string you are investigating is just a sequence of boxes on your system.
Not only can you discover characters in this way, but you can create lists of characters which can be pasted into another document, and customise the format of those lists.
Pasting the list elsewhere
If you select this list and paste it into a document, you’ll see something like this:
0627 ARABIC LETTER ALEF
06CC ARABIC LETTER FARSI YEH
0634 ARABIC LETTER SHEEN
06CC ARABIC LETTER FARSI YEH
200C ZERO WIDTH NON-JOINER
062F ARABIC LETTER DAL
0627 ARABIC LETTER ALEF
You can make the characters appear by deselecting Use graphics on the Look up tab. (Of course, you need an Arabic font to see the list as intended.)
ا 0627 ARABIC LETTER ALEF
ی 06CC ARABIC LETTER FARSI YEH
ش 0634 ARABIC LETTER SHEEN
ی 06CC ARABIC LETTER FARSI YEH
200C ZERO WIDTH NON-JOINER
د 062F ARABIC LETTER DAL
ا 0627 ARABIC LETTER ALEF
Customising the list format
What may be less obvious is that you can also customise the format of this list using the settings under the Options tab. For example, using the List format settings, I can produce a list that moves the character column between the number and the name, like this:
0627 ا ARABIC LETTER ALEF
06CC ی ARABIC LETTER FARSI YEH
0634 ش ARABIC LETTER SHEEN
06CC ی ARABIC LETTER FARSI YEH
200C ZERO WIDTH NON-JOINER
062F د ARABIC LETTER DAL
0627 ا ARABIC LETTER ALEF
Or I can remove one or more columns from the list, such as:
ا ARABIC LETTER ALEF
ی ARABIC LETTER FARSI YEH
ش ARABIC LETTER SHEEN
ی ARABIC LETTER FARSI YEH
ZERO WIDTH NON-JOINER
د ARABIC LETTER DAL
ا ARABIC LETTER ALEF
With the option Show U+ in lists I can also add or remove the U+ before the codepoint value. For example, this lets me produce the following list:
U+0627 ARABIC LETTER ALEF
U+06CC ARABIC LETTER FARSI YEH
U+0634 ARABIC LETTER SHEEN
U+06CC ARABIC LETTER FARSI YEH
U+200C ZERO WIDTH NON-JOINER
U+062F ARABIC LETTER DAL
U+0627 ARABIC LETTER ALEF
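The list formats above are easy to approximate in a script. The following Python sketch is only an illustration, not how UniView is implemented: `list_characters`, its `columns` argument and its `u_plus` flag are names invented here to mirror the List format and Show U+ in lists options.

```python
import unicodedata


def list_characters(text, columns=("char", "code", "name"), u_plus=False):
    """Return one line per character, with the chosen columns in the chosen order."""
    lines = []
    for ch in text:
        code = f"{ord(ch):04X}"
        fields = {
            "char": ch,
            "code": "U+" + code if u_plus else code,
            "name": unicodedata.name(ch, "<no name>"),
        }
        lines.append(" ".join(fields[column] for column in columns))
    return lines


# The Persian string from the example, written with escapes so that the
# invisible ZERO WIDTH NON-JOINER is easy to spot.
name = "\u0627\u06CC\u0634\u06CC\u200C\u062F\u0627"
print("\n".join(list_characters(name, columns=("code", "name"), u_plus=True)))
```

Changing `columns` to `("code", "char", "name")` or `("char", "name")` gives the other layouts shown above.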
Other lists in UniView
We’ve shown how you can make a list of characters in the Copy & paste box, but don’t forget that you can create lists for a Unicode block, custom range of characters, search list results, or list of codepoint values, etc. And not only that, but you can filter lists in various ways.
Here is just one quick example of how you can obtain a list of numbers for the Devanagari script.
- On the Look up tab, select Devanagari from the Unicode block pull down list.
- Select Show range as list and deselect (optional) Use graphics.
- Under the Filter tab, select Number from the Show properties pull down list.
- Click on Make list from highlights
You end up with the following list, which you can paste into your document.
० 0966 DEVANAGARI DIGIT ZERO
१ 0967 DEVANAGARI DIGIT ONE
२ 0968 DEVANAGARI DIGIT TWO
३ 0969 DEVANAGARI DIGIT THREE
४ 096A DEVANAGARI DIGIT FOUR
५ 096B DEVANAGARI DIGIT FIVE
६ 096C DEVANAGARI DIGIT SIX
७ 096D DEVANAGARI DIGIT SEVEN
८ 096E DEVANAGARI DIGIT EIGHT
९ 096F DEVANAGARI DIGIT NINE
(Of course, you can also customise the layout of this list as described in the previous section.)
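The same filtering can be sketched in Python. This assumes that the Number filter corresponds to the Unicode general categories beginning with N (Nd, Nl, No), which is my guess at what the filter does rather than a statement about UniView's internals; `numbers_in_block` is a hypothetical helper.

```python
import unicodedata

DEVANAGARI = range(0x0900, 0x0980)  # the Devanagari block, U+0900..U+097F


def numbers_in_block(block):
    """List the characters in a code point range whose general category is a Number."""
    result = []
    for cp in block:
        ch = chr(cp)
        if unicodedata.category(ch).startswith("N"):
            result.append(f"{ch} {cp:04X} {unicodedata.name(ch)}")
    return result


print("\n".join(numbers_in_block(DEVANAGARI)))
```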
Try it out.
Reversing the process: from list to string
To complete the circle, you can also cut & paste any of the lists in the blog text above into UniView, to explore each character’s properties or recreate the string.
Select one of the lists above and paste it into the Characters input field on the Look up tab. Hit the icon alongside, and UniView will recreate the list for you. Click on each character to view detailed information about it.
If you want to recreate the string from the list, simply click on the icon below the Copy & paste box, and the list of characters will be reconstituted in the box as a string.
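If you ever need to do the same round trip in a script, a minimal Python sketch might look like this. The helper `string_from_list` is hypothetical, and it assumes that each line contains a hex codepoint column (with or without U+) before the character name; a list with the codepoint column removed can't be converted this way.

```python
import re

# First standalone run of 4-6 hex digits on a line, optionally prefixed with U+.
CODEPOINT = re.compile(r"\b(?:U\+)?([0-9A-Fa-f]{4,6})\b")


def string_from_list(listing):
    """Rebuild a string from a pasted character list, one character per line."""
    chars = []
    for line in listing.splitlines():
        match = CODEPOINT.search(line)
        if match:
            chars.append(chr(int(match.group(1), 16)))
    return "".join(chars)


listing = """U+0627 ARABIC LETTER ALEF
U+06CC ARABIC LETTER FARSI YEH
U+0634 ARABIC LETTER SHEEN"""
print(string_from_list(listing))
```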
Voila!
Posted on Fri 4 Mar 2011 under general, i18n, utilities, web

>> Use UniView
About the tool: Look up and see characters (using graphics or fonts) and property information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and discover unknown characters, search for characters, do hex/dec/ncr conversions, highlight character types, etc. etc. Supports Unicode 6.0 and written with Web Standards to work on a variety of browsers. No need to install anything.
Latest changes: The majority of changes in this update relate to the user interface. They include the following:
- Many controls have been grouped under three tabs: Look up, Filter, and Options. Various previously dispersed controls were gathered together under the Filter and Options tabs. Many of the controls have been slightly renamed.
- The Search control has been moved to the top right of the window, where it is always visible.
- The old Text Area is now a Copy & Paste control that has a 2-dimensional input box. In browsers such as Safari, Chrome and Firefox 4, this box can be stretched by the user to whatever size is preferred.
- The icon that provides a toggle switch between revealing detailed information for a character in a list or table, or copying that character to the Copy & Paste box has been redesigned. It stands alone and indicates the location of the current outcome using arrows.
It looks like this:
or this
.
- Title text has been provided for all controls, describing briefly what that control does. You can see this information by hovering over the control with the mouse.
Many of these changes were introduced to make it a little easier for newcomers to get to grips with UniView.
There were also some feature changes:
- The ‘Codepoints’ control was converted to accept text as well as code points and renamed ‘Characters’. By default the control expects hex code point values, but this can be switched using the radio buttons. For text, you would usually use the ‘Copy & Paste’ control, but if you want to check out some characters without disturbing the contents of that control, you can now do so by setting the ‘Character’ radio button on the ‘Characters’ control.
- The control to look up characters in the Unihan database was fixed, but also extended to handle multiple characters at a time, opening a separate window for each character. (UniView warns you if you try to open more than 5 windows.)
- The control to send characters to the Unicode Conversion tool was fixed and now puts the character content of the field in the green box of the Converter Tool. If you need to convert hex or decimal code point values, do that in the converter.
- The Show Age feature now works with lists, not just tables.
Posted on Fri 3 Dec 2010 under general, i18n, script notes, web, writings
Bopomofo, or zhùyīn fúhào, is an alphabet that is used for phonetic transliteration of Chinese text. It is usually only used in dictionaries or educational texts, to clarify the pronunciation of the Chinese ideographic characters.
This post is intended to evolve over time. I’ll post other blog posts or tweets as it changes. The current content is to the best of my knowledge correct. Please contribute comments (preferably with pointers to live examples) to help build an accurate picture if you spot something that needs correcting or expanding.
The name bopomofo is equivalent to saying “ABCD” in English, ie. it strings together the pronunciation of the first four characters in the zhuyin fuhao alphabet.
For more information about bopomofo, see Wikipedia and the Unicode Standard.
In this post we will summarise how bopomofo is displayed, to assist people involved in developing the CSS3 Ruby specification. These notes will focus on typical usage for Mandarin Chinese, rather than the extended usage for Minnan and Hakka languages.
Characters and tone marks
These are the bopomofo characters in the basic Unicode Bopomofo block.
One of these characters, U+3127 BOPOMOFO LETTER I, can appear as either a horizontal or vertical line, depending on the context.
In addition to the base characters, there is a set of Unicode characters that are used to express tones. For Mandarin Chinese, these characters are:
02C9 MODIFIER LETTER MACRON
02CA MODIFIER LETTER ACUTE ACCENT
02C7 CARON
02CB MODIFIER LETTER GRAVE ACCENT
02D9 DOT ABOVE
See the list in UniView.
It is important to understand that bopomofo tone marks are not combining characters. They are regular spacing characters that are stored after the sequence of bopomofo letters that make up a syllable. These tone marks can be displayed alongside bopomofo base characters in one of two ways.
Bopomofo used as ruby
When used to describe the phonetics of Chinese ideographs in running text (ie. ruby), bopomofo can be rendered in different ways. A bopomofo transliteration is always done on a character by character basis (ie. mono-ruby).
Horizontal base, horizontal ruby
In this approach the bopomofo is generally written above horizontal base text.
There appear to be two ways of displaying tone marks: (1) following the bopomofo characters for each ideograph, and (2) above the bopomofo characters, as if they were combining characters. We need clarity on which of these approaches is most common, and which needs to be supported. For details about tone placement in (2) see the next section.
Tones following:
Tones above:
Horizontal base, vertical ruby
This is a common configuration. The bopomofo appears in a vertical line to the right of each base character. In general, tone marks then appear to the right of the bopomofo characters; however, there are some complications with regard to the actual positioning of these marks (see the next section for details).
Vertical base, vertical ruby
This works just like horizontal base+vertical ruby.
Vertical base, horizontal ruby
I don’t believe that this exists.
Tones in bopomofo ruby
In ruby text, tones 2-4 are displayed in their own vertical column to the right of the bopomofo letters, and the light tone is displayed above the column of bopomofo letters.
The first tone
The first tone is not displayed. Here is an example of a syllable with the first tone. There are two bopomofo letters, but no tone mark.
Tones 2 to 4
The position of tones 2-4 depends on the number of bopomofo characters the tone modifies.
The Ministry of Education in Taiwan has issued charts indicating the expected positioning for vertically aligned bopomofo that conform roughly to this diagram:
Essentially, about half of the tone glyph box extends upwards from the top of the last bopomofo character box.
Tones in horizontal ruby are placed differently, relative to the bopomofo characters, according to the Ministry charts. Essentially, about half the width of the tone glyph extends to the right of the last bopomofo character in the sequence.
The charts cover alignment for vertical text (here, here and here) and for horizontal text (here, here and here).
In some cases the tone appears to be simply displayed alongside the last character in vertical text, as shown in these examples:
The light tone
When a light tone is used (U+02D9 DOT ABOVE), it appears at the top of the column of bopomofo letters, even though it is stored after them in memory. The image just below illustrates this.
Note that the actual sequence of characters in memory is:
3109: BOPOMOFO LETTER D
3127: BOPOMOFO LETTER I
02D9: DOT ABOVE
The apparent placing of the dot above the first bopomofo letter is an artifact of rendering only.
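To make the storage-versus-display distinction concrete, here is a small Python sketch. It checks that the tone marks listed earlier are spacing characters (combining class 0, not a Mark category), and then shows a deliberately simplified reordering for the light tone only. `is_spacing` and `display_order` are hypothetical helpers, and real ruby layout involves positioning rather than just reordering.

```python
import unicodedata

TONE_MARKS = "\u02C9\u02CA\u02C7\u02CB\u02D9"
LIGHT_TONE = "\u02D9"  # DOT ABOVE


def is_spacing(ch):
    """True if the character is not a combining mark."""
    not_mark = not unicodedata.category(ch).startswith("M")
    return unicodedata.combining(ch) == 0 and not_mark


def display_order(syllable):
    """Top-to-bottom order for vertical ruby: a trailing light tone moves to the top.

    Simplification: tones 2 to 4 sit in their own column and are left in place.
    """
    if syllable.endswith(LIGHT_TONE):
        return LIGHT_TONE + syllable[:-1]
    return syllable


stored = "\u3109\u3127\u02D9"  # D, I, DOT ABOVE, in memory order
for ch in display_order(stored):
    print(f"{ord(ch):04X}: {unicodedata.name(ch)}")
```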
Bopomofo written on its own
It is not common to see text written only in bopomofo, but it does occur from time to time for Chinese, and sometimes it is used for aboriginal Taiwanese languages.
In horizontal text
When written on its own in horizontal layout any tone marks are displayed as spacing characters after the syllable they modify.
Example: 
In vertical text
I haven’t seen bopomofo used in its own right in vertical text, so I don’t know whether in that case one puts the tone marks below the bopomofo letters for a syllable, or to the side like when bopomofo is used as ruby.
In horizontal text
I have also come across instances where a bopomofo character has been included among Chinese ideographs. It may be that this reflects slang or colloquial usage.
Example 1. Example 2.
Posted on Sun 22 Aug 2010 under general, i18n, utilities, web
Analyser: http://rishida.net/tools/analysestring/
Converter: http://rishida.net/tools/conversion/
The string analyser tool provides information about the characters in a string. One difference in this version is a new section “Data input as graphics”, where you see a horizontal sequence of graphics for each of the characters in the string you are analysing. This can be useful to get a screen snap of the characters. Of course, there is no combining or ligaturing behaviour involved – just a graphic per character.
You can reverse the character order for right-to-left scripts.
Another difference is that you can explode example text in the notes. Take this example: if you click on the Arabic word for Koran (red word near the bottom of the notes), you’ll see a pop-up window in the bottom right corner of the window that lists the characters in that word.
The other change is that the former “Related links” section in the sidebar is now called “Do more”, and the links carry the string you are analysing to the Converter or UniView apps.
Oh, and the page now remembers the options you set between refreshes, which makes life much easier.
The converter tool converts between characters and various escaped character formats. It was changed so that the “View names” button sends the characters to the string analyser tool. This means that you’ll now see graphics for the characters, and that, once on the string analyser page, you can change the amount of information displayed for each character (including showing font-based characters, if you need to).
I also fixed a bug related to the UTF-8 and UTF-16 input. Including spaces after the code values no longer causes an error.
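For comparison, here is a minimal Python sketch of converting UTF-8 and UTF-16 code values to characters in a way that tolerates trailing spaces. The function names are made up for this example and it says nothing about how the converter tool itself is written; splitting on whitespace is simply one way to avoid that class of bug.

```python
def from_utf8_hex(code_values):
    """Decode space-separated UTF-8 byte values given in hex, e.g. 'D8 B4'."""
    data = bytes(int(value, 16) for value in code_values.split())
    return data.decode("utf-8")


def from_utf16_hex(code_values):
    """Decode space-separated UTF-16 code units given in hex, e.g. 'D835 DC00'."""
    units = [int(value, 16) for value in code_values.split()]
    data = b"".join(unit.to_bytes(2, "big") for unit in units)
    return data.decode("utf-16-be")


# str.split() with no arguments ignores leading, trailing and repeated whitespace.
print(from_utf8_hex("D8 B4  "))
print(from_utf16_hex("0634 "))
```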
Posted on Tue 18 May 2010 under general, i18n, utilities, web

>> Use UniView lite
>> Use UniView
About the tool: Look up and see characters (using graphics or fonts) and property information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and discover unknown characters, search for characters, do hex/dec/ncr conversions, highlight character types, etc. etc. Supports Unicode 5.2 and written with Web Standards to work on a variety of browsers. No need to install anything.
Latest changes: The major change in this update is the addition of an alternative UniView lite interface for the tool that makes it easier to use UniView in restricted screen sizes, such as on mobile devices. The lite interface offers a subset of the functionality provided in the full version, rearranges the user interface and sets up some different defaults (eg. list view is the default, rather than the matrix view). However, the underlying code is the same – only the initial markup and the CSS are different.
Another significant change is that when you click on a character in a list or matrix that character is either added to the text area or detailed information for that character is displayed, but not now both at the same time. You switch between the two possibilities by clicking on the icon. When the background is white (default) details are shown for the character. When the background is orange the character will be added to the text area (like a character map or picker).
Information from my character database is now shown by default when you are shown detailed information for a character. The switch to disable this has been moved to the Options panel.
Text highlighted in red in information from the character database contains examples. In case you don’t have a font for viewing such examples, or in case you just want to better understand the component characters, you can now click on these and the component characters will be listed in a new window (using the String Analyzer tool).
Access to the Settings panel has been moved slightly downwards and renamed Options in the full version.
The default order for items in lists is now <character><codepoint><name>, rather than the previous <codepoint><character><name>. This can still be changed in the Options panel, or by setting query parameters.
I changed the Next and Previous functions in the character detail pane so that they move one codepoint at a time through the Unicode encoding space. The controls are now buttons rather than images.
Posted on Mon 8 Feb 2010 under general, i18n, utilities, web

About the tool: Look up and see characters (using graphics or fonts) and property information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and discover unknown characters, search for characters, do hex/dec/ncr conversions, highlight character types, etc. etc. Supports Unicode 5.2 and written with Web Standards to work on a variety of browsers. No need to install anything.
Latest changes: The major change in this update is the addition of a function, Show age, to show the version of Unicode where a character was added (after version 1.1). The same information is also listed in the details given for a character in the lower right panel.
The trigger for context-sensitive help was reduced to the first character of a command name, rather than the whole command name. This improves behaviour for commands under More actions by allowing you to click on the command name rather than just the icon alongside to activate the command.
Some ‘quick start’ instructions were also added to the initial display to orient people new to the tool, and this help text was updated in various areas.
The highlighting mechanism was changed. Rather than highlight characters using a coloured border (which is typically not very visible), highlighting now works by greying out characters that are not highlighted. This also makes it clearer when nothing is highlighted.
In the recent past, when you converted a matrix to a list in the lower left panel, greyed-out rows would be added for non-characters. These are no longer displayed. Consequently, the command to remove such rows from the list (previously under More actions) has been removed.
A lot of invisible work went into replacing style attributes in the code with class names. This produces better source code, but doesn’t affect the user experience.
>> Use it
Posted on Mon 4 Jan 2010 under general, i18n, utilities, web

About the tool: This tool shows you what characters are in a string of Unicode characters, and gives you information about each one. Either type/paste the string into the box on the right of the page, or send it in the URL. It’s especially useful if you have no font for the text, or you are trying to unravel a sequence of characters in a complex script, but also allows you to just dig out information about one or more characters.
Here’s an example
By default you see a large graphic image of each character, the Unicode code point number and name, the Unicode script block in which it occurs, any annotations in the Unicode Standard, and any notes for that character in my character database (which I also updated today with information about Hebrew, Malayalam, Lisu and other scripts).
However, the result can be tailored in terms of the level of information and various aspects of the presentation. Simply click on the options to the right of the page, or (again) include the relevant info in the URI.
For example, you can remove any of these items of information individually (except the codepoint and name), or add a text version of the character. You can also choose a smaller graphic.
In addition, notes from my character database contain examples (coloured red). By clicking on these examples you can list the characters in the example text without leaving the page. The list of characters shows up in the right margin.
Oh, and you can click on links to see a character in UniView (to explore its Unicode properties) or to show the whole block in which the character lives.
You’ll shortly see my other applications such as pickers, UniView, etc, linking to this app.
Hope it’s useful.
>> Use it
Posted on Sat 5 Dec 2009 under general, i18n, utilities, web
>> See what it can do
>> Use it

It took me a while to find the time, but I have finally upgraded UniView to support the final 5.2 release of Unicode, plus a few extra features.
The order of blocks listed in the top left pulldown menu was changed to resemble the order in the Unicode Charts page. Several sub-block selections were also added to the list (as in the Unicode page), and are displayed in italics.
When you display details of a character in the right panel, the heading Script group has now been used to indicate the sub-block-level headings in the block listings of the Unicode Standard. The link to the Unicode block now follows the heading Unicode block. These sub-block-level headings are also shown when you display a range as a list (as opposed to a matrix).
When you mouse over characters displayed in a matrix, the codepoint and name information for that character now appear just above the matrix. This makes it much easier to locate characters you are looking for.
Finally, but by no means least, small and large graphics are now available for all 1071 Egyptian Hieroglyph characters. This was the last block for which graphics were completely unavailable.
Posted on Wed 19 Aug 2009 under general, i18n, utilities, web
>> Use it

I have added a number of new features to my lookup tool to help with choosing language tags. There is additional information available when you look up subtags (such as what to use if the subtag is deprecated, and what subtags macrolanguages enclose, etc.), and more tests on well-formedness with clearer explanations of the problem. Example.
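To give a flavour of what a well-formedness test involves, here is a heavily simplified Python sketch. It is not the lookup tool's logic: it only covers tags of the shape language-script-region-variant with a 2 or 3 letter language subtag, and ignores extlang, extension, private-use and grandfathered tags that BCP 47 also allows. The name `looks_well_formed` is made up, and the check does not consult the subtag registry, so it can't tell you whether a subtag actually exists.

```python
import re

SIMPLE_LANGTAG = re.compile(
    r"[a-z]{2,3}"                                # primary language subtag
    r"(?:-[a-z]{4})?"                            # optional script subtag
    r"(?:-(?:[a-z]{2}|[0-9]{3}))?"               # optional region subtag
    r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*",  # zero or more variant subtags
    re.IGNORECASE,
)


def looks_well_formed(tag):
    """Simplified syntax check; a True result says nothing about validity."""
    return SIMPLE_LANGTAG.fullmatch(tag) is not None


for tag in ["zh-Hant-TW", "es-419", "sl-rozaj", "en_GB", "en-"]:
    print(tag, looks_well_formed(tag))
```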
This should make it a lot more useful to people who haven’t read BCP 47 and want to create language tags. Hopefully, in a short while, I’ll also write and link to an article that describes how to use subtags from the ground up in a procedural way, that will complement the tool.
For further assistance, you can now link from a language subtag result to the SIL Ethnologue, to make it easier to check whether that subtag really does refer to the language you were thinking of.
In addition, script subtag results link to Unicode blocks in UniView.
Posted on Fri 31 Jul 2009 under general, i18n, utilities, web
>> See what it can do
>> Use it

Following hot on the heels of the last release come some further significant changes to UniView aimed at making it easier to use as Unicode grows.
The big change is that UniView now starts up in graphics mode by default. This means that pages load more slowly, but (especially with the continuing growth of Unicode) also means that you are more likely to be able to see the characters you are looking for. It’s easy to switch between modes at any point, using the “Use graphics” checkbox. (And if you preferred font glyphs as a default, you just need to change the URI in your bookmarked link slightly, and you can continue to work that way.)
To facilitate this change, I created my own graphics for a number of blocks which are not yet covered by decodeunicode, or which are no longer fully covered by decodeunicode. The blocks for which I provided graphics are Latin Extended-C, Latin Extended-D, Latin Extended Additional, Cyrillic Supplement, Cyrillic Extended-B, Modifier Tone Letters, Tibetan, Malayalam, Saurashtra, Ol Chiki, Myanmar, Kayah Li, Cham, Rejang, Vai, Supplemental Punctuation, and Miscellaneous Symbols and Arrows.
There are still many characters for which there are no graphics (especially the new characters in Unicode 5.2), but coverage is much better than it was. As I find more fonts, I will be able to create graphics for the remaining characters.
I also put a grey box around the characters in tables. This is particularly useful if there are no graphics or font glyphs for a block or range of characters, as it makes it easier to locate the character you are looking for.
I also fixed a bug that was preventing Chrome and Safari and IE from displaying the first two Latin blocks. I think the bug was actually in the Unicode data file.
Posted on Mon 27 Jul 2009 under general, i18n, utilities, web
>> See what it can do
>> Use it

With the family now in Japan, I had some extra time to spare this weekend, so I upgraded UniView to handle all the proposed characters for Unicode 5.2.
While the properties for new and modified characters are still in beta, they are not officially stable. The character allocations, however, should be stable at this point. UniView therefore alerts you if you are looking at a new character.
If the Unicode database information has changed for a given character you are also warned, and provided with a link that points to the previous information for that character. These warnings will be removed from UniView when Unicode 5.2 is released.
Of course, you are unlikely to be able to actually see the new characters themselves, unless you are lucky enough to have a very new font to hand. The graphic alternatives are not available yet for these characters. I’m wondering whether it’s possible for me to do something about that, but that will take a little longer. In the meantime, you might find it more useful to view blocks in list view. (Click on ‘Show range as list’).
This release also fixes a few small bugs in the HTML and JavaScript code.
Posted on Sat 14 Feb 2009 under general, i18n, utilities, web
>> See what it can do !
>> Use it !

The major changes in this version include a new feature to normalise text as NFC or NFD, the ability to accept decimal code point values, and an overhaul of the top part of the user interface.
Added buttons to the Text area to allow conversion of the text to NFC or NFD normalization forms. (You may not notice the change until you list the characters.)
The control panel was also substantially rearranged again to hopefully make it easier for newcomers to see what they can do.
The Code point conversion feature was upgraded to handle decimal code point values.
A single character in the codepoints area or text area is now listed in the lower left panel when you click on the associated icon, rather than in the right-hand properties panel. This is to improve consistency and avoid surprises.
Added a link to the CLDR property demo from the right panel to give access to additional properties.
Improved the parsing of codepoints when surrounded by text in the Code point input field, so that it now works with &#x…; and \u… and \U… escapes.
Jettisoned some unneeded code to reduce download by around 40-50K bytes. Implemented the NFC/NFD feature using AJAX, to avoid putting the download size back up.
When you delete the contents of the text area or the code point area, the associated input field is given focus, so you are ready for input.
A couple more minor bug fixes.
Posted on Wed 4 Feb 2009 under code notes, general, i18n, utilities, web
I was asked to make available the code for my normalization functions in JavaScript and PHP. The links are below. I’m making the code available under a Creative Commons Attribution-Noncommercial-Share Alike licence.
Disclaimers: Note that I make no claim to have produced polished, compact or well-optimised code! The code does what I need, and I’m happy with that. You are welcome to suggest improvements, and I’m sure there are many that could be made.
As they say, this code is made available in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.
The code is a little more convoluted than it ought to be, to get around the fact that JavaScript doesn’t understand supplementary characters, and PHP just doesn’t naturally understand Unicode. (How I long for PHP6.)
Update: I meant to mention that there is a way of doing normalization in PHP already. I made this code available just because I had it. I created it as a learning exercise. It may be useful, however, if you are unable to load the ICU and intl packages onto your server.
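To illustrate the supplementary-character workaround mentioned above: JavaScript strings are sequences of 16-bit code units, so a character above U+FFFF is stored as a surrogate pair and has to be assembled by hand. The following is only a minimal sketch of that arithmetic. The function names are hypothetical and are not taken from the published routines.

```javascript
// Illustrative helpers only; the names are hypothetical.

// Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('Not a supplementary code point: ' + codePoint);
  }
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // top 10 bits of the offset
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits of the offset
  return [high, low];
}

// Combine a high and a low surrogate back into a single code point.
function fromSurrogatePair(high, low) {
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

// Walk a string one code unit at a time and return an array of code points,
// joining any well-formed surrogate pairs along the way.
function codePointsOf(text) {
  const result = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const next = text.charCodeAt(i + 1); // NaN past the end of the string
    if (unit >= 0xD800 && unit <= 0xDBFF && next >= 0xDC00 && next <= 0xDFFF) {
      result.push(fromSurrogatePair(unit, next));
      i++; // skip the low surrogate that was just consumed
    } else {
      result.push(unit);
    }
  }
  return result;
}
```

Newer JavaScript engines provide codePointAt and String.fromCodePoint, which make most of this unnecessary, but the hand-written version shows what code of this era had to do.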
To use the code, simply call nfc('your-text-string') or nfd('your-text-string') from your code and capture the result.
For PHP you’ll need these routines and this data.
For JavaScript look at these routines and this data. There is also a lite version of the data file that doesn’t include Han characters. I use this sometimes for bandwidth savings (about 14K less).
Test files: I also created some test files for PHP and for JavaScript.
Both of these expect to find a copy of http://www.unicode.org/Public/UNIDATA/NormalizationTest.txt in the local directory. These files run 71,076 tests.
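For readers who want to see what such a test involves, here is a minimal sketch of checking a single line of NormalizationTest.txt for NFC and NFD. It is not the test file linked above. The names parseTestLine and checkLine are hypothetical, and the example swaps in the built-in String.prototype.normalize of newer JavaScript engines in place of the nfc() and nfd() routines, which would be passed in the same way. Each data line holds five semicolon-separated columns of space-separated hex code points (source, NFC, NFD, NFKC, NFKD), followed by a comment.

```javascript
// Parse one line of NormalizationTest.txt into its five columns as strings.
// Returns null for comment lines, part markers (@Part...) and blank lines.
function parseTestLine(line) {
  const data = line.split('#')[0].trim();
  if (data === '' || data.startsWith('@')) return null;
  const columns = data.split(';').slice(0, 5);
  if (columns.length < 5) return null;
  return columns.map(col =>
    String.fromCodePoint(...col.trim().split(/\s+/).map(hex => parseInt(hex, 16))));
}

// Check the NFC and NFD invariants for one line:
//   c2 == NFC(c1) == NFC(c2) == NFC(c3), and c4 == NFC(c4) == NFC(c5)
//   c3 == NFD(c1) == NFD(c2) == NFD(c3), and c5 == NFD(c4) == NFD(c5)
function checkLine(line, nfc, nfd) {
  const cols = parseTestLine(line);
  if (cols === null) return true; // nothing to test on this line
  const [c1, c2, c3, c4, c5] = cols;
  return [c1, c2, c3].every(c => nfc(c) === c2 && nfd(c) === c3) &&
         [c4, c5].every(c => nfc(c) === c4 && nfd(c) === c5);
}

// Assumption: the built-in normalizer stands in for the routines under test.
const builtinNfc = s => s.normalize('NFC');
const builtinNfd = s => s.normalize('NFD');

// A line in the file's format (U+1E0A decomposes to U+0044 U+0307).
const sample = '1E0A;1E0A;0044 0307;1E0A;0044 0307; # LATIN CAPITAL LETTER D WITH DOT ABOVE';
console.log(checkLine(sample, builtinNfc, builtinNfd));
```

Running checkLine over every line of the file, and counting the failures, gives the same kind of pass/fail summary for the NFC and NFD columns.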
Cautions: Be careful about the editor you use for the data files. I spent several hours fruitlessly debugging the routines, only to find that Notepad++ was displaying certain supplementary characters OK, but corrupting them on save. I switched to Notepad and the problem evaporated. And I probably don’t need to add that editing the data files in something like DreamWeaver is a bad idea because it will probably normalize the data before saving.
Another point: you may see Unicode replacement characters at a couple of points in the PHP source. These represent the first and last characters in the high surrogate range.
Experimenting: If you want to play with something that uses this, you could try my Tłįchǫ (Dogrib) character picker, or my Normalizer tool. I will slowly fit this to all the pickers and to UniView. I have a local version of UniView waiting in the wings that uses the PHP files via AJAX, to reduce download size. For that you need a file that returns the result as plain text across the wire, such as this.
Well, I hope that that may be of use to someone, somewhere. I hope I haven’t forgotten anything.
Posted on Wed 14 Jan 2009 under general, i18n, utilities, web
>> See what it can do !
>> Use it !

The major changes in this version relate to the way searching and property-based lookup is done on characters in the lower left panel, and features for refining and capturing the resulting lists.
Removed the two Highlight selection boxes. These used to highlight characters in the lower left panel with a specific property value. The Show selection box on the left (used to be Show list) now does that job if you set the Local checkbox alongside it. (Local is the default for this feature.)
As part of that move, the former SiR (search in range) checkbox that used to be alongside Custom range has been moved below the Search for input field, and renamed to Local. If Local is checked, searching can now be done on any content in the lower left panel, and the results are shown as highlighting, rather than a new list.
To complement these new highlighting capabilities, a new feature was added. If you click on the icon next to Make list from highlights the content of the lower left panel will be replaced by a list of just those items that are currently highlighted – whether the highlighting results from a search or a property listing. Note that this can also be useful to refine searches: perform an initial search, convert the result to a list, then perform another search on that list, and so on.
Finally got around to putting icons after the pull-down lists. This means that if you want to reapply, say, a block selection after doing something else, only one click is needed (rather than having to choose another option, then choose the original option). The effect of this on the ease of use of UniView is much greater than I expected.
Added an icon to the text area. If you click on this, all the characters in the lower left panel are copied into the text area. This is very useful for capturing the result of a search, or even a whole block. Note that if a list in the lower left panel contains unassigned code points, these are not copied to the text area.
As a result of the above changes, the way Show as graphics and Show range as list work internally was essentially rewritten, but users shouldn’t see the difference.
Changed the label Character area to Text area.
Posted on Wed 7 Jan 2009 under general, i18n, utilities, web
>> See what it can do !
>> Use it !

The main change in this version is the reworking of the former Cut & paste and Code point(s) fields to make it easier to use UniView as a generalised picker.
Moved the Cut & paste field downwards, made it larger, and changed the label to Character area. This should make it easier to deal with text copy/cut & paste, and more obvious that that is possible with UniView. It is much clearer now that UniView provides character map/picker functionality, and not just character lookup.
Whereas previously you had to double-click to put a character in the lower left pane into the Cut&paste field, UniView now echoes characters to the Character area every time you (single) click on a character in the lower left hand pane. This can be turned off. Double-clicking will still add the codepoint of a character in the lower left panel to the Code points field.
The Character area has its own set of icons, some of which are new: i.e. you can select the text, add a space, and change the font of the text in the area (as well as turn the echo on and off). I also spruced up the icons on the UI in general.
Note that on most browsers you can insert characters at the point in the Character area where you set the cursor, or you can overwrite a highlighted range of characters, whereas (because of the non-standard way it handles selections and ranges) Internet Explorer will always add characters to the end of the line.
The Code points field has also been enlarged, and I moved the Show list pull-down to the left and Show as graphics and Show page as list to the right. This puts all the main commands for creating lists together on the left.
When you mouse over a character in the lower left pane you now see both hex and decimal codepoint information. (Previously you just saw an unlabelled decimal number.) You will also find decimal code point values for characters displayed in the lower right panel.
Fixed a bug in the Code points input feature so that trailing spaces no longer produce errors, but also went much further than that. You can now add random text containing codepoints or most types of hex-based escaped characters to the input field, and UniView will seek them out to create the list. For example, if you paste the following into the Code points field:
the decomposition mapping is <U+CE20, U+11B8>, and not <U+110E, U+1173, U+11B8>.
the result will be:
CE20: 츠 [Hangul Syllables]
11B8: ᆸ HANGUL JONGSEONG PIEUP
110E: ᄎ HANGUL CHOSEONG CHIEUCH
1173: ᅳ HANGUL JUNGSEONG EU
11B8: ᆸ HANGUL JONGSEONG PIEUP
Of course, UniView is not able to tell that an ordinary word like ‘Abba’ is not a hex codepoint, so you obviously need to watch out for that and a few other situations, but much of the time this should make it much easier to extract codepoint information.
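The kind of scanning described above can be sketched roughly as follows. This is a guess at the general approach, not UniView's actual code: the function names and the regular expression are assumptions made for illustration. It picks out hex values written as U+…, &#x…;, \u… or \U… escapes, plus any bare run of hex digits that forms a whole word, which is exactly why a word like ‘Abba’ gets caught.

```javascript
// Hypothetical helper: find hex code point values in arbitrary text.
// Accepts U+XXXX, &#xXXXX;, \uXXXX, \UXXXXXXXX, or a bare word made up
// of 2 to 8 hex digits. Returns an array of numbers, in order of appearance.
function extractCodePoints(text) {
  const pattern = /(?:U\+|&#x|\\[uU]|\b)([0-9A-Fa-f]{2,8})\b/g;
  const found = [];
  let match;
  while ((match = pattern.exec(text)) !== null) {
    const value = parseInt(match[1], 16);
    if (value <= 0x10FFFF) found.push(value); // ignore values outside Unicode
  }
  return found;
}

// Hypothetical helper: format each code point as "HEX: character".
function describe(text) {
  return extractCodePoints(text).map(cp =>
    cp.toString(16).toUpperCase().padStart(4, '0') + ': ' + String.fromCodePoint(cp));
}

const input = 'the decomposition mapping is <U+CE20, U+11B8>, and not <U+110E, U+1173, U+11B8>.';
console.log(describe(input).join('\n'));
```

Ordinary words such as ‘the’ or ‘and’ are skipped because they are not made up entirely of hex digits, whereas ‘Abba’ is read as the code point ABBA. A stricter pattern that required a prefix would avoid that, at the cost of missing bare values.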
I still haven’t found a way to fix the display bug in Safari and Google Chrome that causes initial content in the lower left pane to be only partially displayed.
Posted on Sat 1 Nov 2008 under general, i18n, utilities, web
>> See what it can do !
>> Use it !

A large amount of code was rewritten to enable data to be downloaded from the server via AJAX at the point of need. This eliminates the long wait when you start to use UniView without the database information in your cache. The trade-off is a slightly longer delay when you view a new block, but the code is designed so that if you have already downloaded data, you don’t have to retrieve it again from the server.
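The fetch-once behaviour described here can be sketched in a few lines. This is an illustration of the general pattern only, written with modern promises rather than the XMLHttpRequest code of the time. The names createBlockLoader and fetchBlock are hypothetical, and the function that actually talks to the server is passed in as a parameter.

```javascript
// Hypothetical sketch: load data for a block on demand, and remember it so
// that the same block is never requested from the server twice.
// fetchBlock is an assumed function that takes a block name and returns the
// block's data (or a promise for it).
function createBlockLoader(fetchBlock) {
  const cache = new Map(); // block name -> promise for that block's data

  return function loadBlock(blockName) {
    if (!cache.has(blockName)) {
      const pending = Promise.resolve(fetchBlock(blockName)).catch(err => {
        cache.delete(blockName); // allow a retry after a failed request
        throw err;
      });
      cache.set(blockName, pending);
    }
    return cache.get(blockName);
  };
}

// Usage with a stand-in fetcher that just counts how often it is called.
let requests = 0;
const loadBlock = createBlockLoader(name => {
  requests++;
  return Promise.resolve('data for ' + name);
});

loadBlock('Tibetan');
loadBlock('Tibetan'); // served from the cache, no second request
loadBlock('Vai');
console.log(requests); // two requests: one for Tibetan, one for Vai
```

Caching the promise rather than the finished data means that two requests for the same block made in quick succession still result in a single download.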
The search mechanism was also rewritten. The regular expressions used must now be supported in both JavaScript and PHP (PHP is used if not searching within the current range). When ‘other’ is ticked, the search will look in the alternative name fields, but not in other property settings. This means you can no longer use something like ;AL; to search for characters with a particular property. (Use ‘Show list’ instead.)
Removed several zero-width space characters from the code, which means that UniView now works with Google Chrome, except for some annoying display bugs that I’m not sure how to fix – for example, the first time you try to display any block you only seem to get the top line (although, if you click or drag the mouse, the block is actually there). This seems to be WebKit related, since it happens in Safari, too.
Please report any bugs to me, and don’t forget to refresh any UniView files in your cache before using the new version.
Posted on Mon 7 Apr 2008 under general, i18n, utilities, web
>> See what it can do !
>> Use it !

Those of you who have used UniView over the last couple of days will have seen that it now supports Unicode 5.1. All Unicode 5.1 character information is available; however, you will only be able to see the new characters if you have fonts that cover them. The decodeunicode graphics for the new characters are not yet available.
Last night I also fixed a long-running bug that had meant that additional information available in my character database was not accessible in Internet Explorer (due to AJAX issues). (See the related post if you are interested in the code).
There are no other changes at this time (though those two are pretty significant).
Please report any bugs to me, and don’t forget to refresh any UniView files in your cache before using the new version.