UniView Help & User Guide

UniView is an XHTML-based application that lets you look up and see characters (using graphics or fonts) and property information, view whole character blocks or custom ranges, select characters to paste into your document, paste in and discover unknown characters, search for characters using regular expressions, do hex/dec/ncr conversions, highlight character types, and more. It supports Unicode 5.2 and is written with Web Standards to work on a variety of browsers.

This help file relates to Version 5.2.0b of UniView. The major change in version 5.2.0b was the introduction of an alternative UniView lite interface, which is particularly well-suited for mobile devices. A number of other small changes were introduced to the full-sized version as well. For details of this and other changes see the change history.

The UniView lite interface provides a subset of the functionality described below, presented in a compact format that works well on screens with restricted size. Some optional settings have been moved to the Options tab to save space. In this help file FV describes the full version and LV the lite version.

The first three digits of the UniView version number reflect the version of Unicode that it supports. This version therefore supports Unicode version 5.2.0.

With UniView you can...

see a range of Unicode characters

FV Either select a Unicode block from the Show range pull-down, or type it using hex numbers into the Custom text box and click on the icon alongside.

LV These controls are available under the Range tab.

The Custom field will accept various formats. The numbers must be in hexadecimal form and separated by a colon (the default), a hyphen, one or more spaces, or one or more periods. The numbers can be in the following formats: 1234, &#x1234;, \1234;, \u1234, U+1234. The actual number of hex digits can be between 1 and 6.
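As an illustration of the formats listed above, here is a minimal Python sketch of how such a range string could be parsed. It is not UniView's own code; the function name `parse_custom_range` and the check against U+10FFFF are assumptions made for the example.

```python
import re

# Optional escape prefix, 1-6 hex digits, optional trailing semicolon.
_CODE_POINT = r"(?:&#x|\\u|\\|U\+)?([0-9A-F]{1,6});?"
# Separators: a colon, a hyphen, one or more spaces, or one or more periods.
_SEPARATOR = r"(?::|-|\s+|\.+)"
_RANGE = re.compile(rf"\s*{_CODE_POINT}{_SEPARATOR}{_CODE_POINT}\s*", re.IGNORECASE)


def parse_custom_range(text):
    """Return (start, end) as integers for a hex range such as '0E00:0E7F'."""
    match = _RANGE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognisable range: {text!r}")
    start, end = (int(group, 16) for group in match.groups())
    # Assumption: reject values beyond the Unicode code space.
    if max(start, end) > 0x10FFFF:
        raise ValueError("code point beyond U+10FFFF")
    return start, end


print(parse_custom_range("0E00:0E7F"))
print(parse_custom_range("U+0041 U+005A"))
```

The sketch accepts exactly one contiguous range and raises an error for anything else.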

You can display the result as either a table of characters or a list (that includes names) by clicking the checkbox entitled Show range as list. If you use a list, you will also see the sub-block-level headings that are used in the Unicode Standard.

By default characters are shown as a matrix in the full version of UniView and as a list in UniView lite.

Unassigned character positions in a matrix are shown with a greyed out background (though you can change the colour, if you want).

view characters as graphics or font glyphs

Enable Use graphics to toggle between font glyphs and graphics. Using UTF-8 loads the page faster, but relies on you having a good Unicode font or selection of fonts to cover the Unicode code points. Many of the graphics are downloaded from the decodeunicode server, others (particularly more recently added characters) are provided by myself. The default is graphics mode.

If you are looking for fonts I recommend Wazu Japan's Gallery of Unicode Fonts or Alan Wood’s Unicode Resources. You should be able to find free fonts there for most characters. (You can also change the default font in UniView, if you wish.)

You can also tell UniView to start up in either graphic or font glyph mode by changing the URI. If you prefer to always start up in font glyph mode, just add the relevant extension to your bookmarked URI.

search for one or more characters based on text in the Unicode database (eg. you know the name contains 'khwai')

FV Type in the string you want to search for in the box labeled Search text and hit enter or click on the icon alongside.

LV Enter the text in the field under the Find tab and hit the Search button.

You can also use regular expressions in searches. For example, suppose you wanted to find all characters with the word 'tet'. You could type into the input field, \btet\b. (In UniView a colon can be used as a short form of \b so the example could have been written :tet:). The \b represents a word boundary. If you wanted to search for entries containing either the word 'tet' or the word 'tat' you can use the 'or' operator | as in \btet\b|\btat\b.

Another example: You want to search for 'alpha', but you only want results for the Latin characters (not the many Greek or mathematical results). Simply use the following search string latin.*alpha. The .* represents any number of intervening characters.
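The two search examples above can be tried outside UniView with a short Python sketch. It searches only the character names in Python's `unicodedata` module (not UniView's descriptions or alternative names), and it assumes that the colon shorthand is a plain substitution for \b. The function name `search_names` is hypothetical.

```python
import re
import unicodedata


def search_names(query, code_points):
    """Return (code point, name) pairs whose Unicode name matches the query."""
    # Assumption: the colon shorthand is a plain substitution for \b.
    pattern = re.compile(query.replace(":", r"\b"), re.IGNORECASE)
    results = []
    for cp in code_points:
        name = unicodedata.name(chr(cp), "")
        if name and pattern.search(name):
            results.append((cp, name))
    return results


hebrew = range(0x0590, 0x0600)   # Hebrew block
ipa = range(0x0250, 0x02B0)      # IPA Extensions block
print(search_names(":tet:", hebrew))
print(search_names("latin.*alpha", ipa))
```

Matching is case-insensitive here so that lower-case queries find the upper-case names in the database.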

I haven't tested this feature to destruction, but most basic regular expressions that work in both JavaScript and PHP code should work.

Note that by default searches match against character names and alternative names in the main Unicode database, and also against the information displayed for an individual character under the heading Description in the right panel. You can limit the search using the Names, Descriptions and Other checkboxes. Other refers to alternative names.

You can also limit the search to the specific range of characters currently in a displayed list. To limit the search, select the checkbox labelled Local. Matching characters will be highlighted. (In the full version, you can produce a list of just the highlighted characters by clicking on the icon next to Make list from highlights. If you need to refine your search, you can then search again on this list, and so on.)

discover what characters are in a string via cut and paste

FV Cut and paste the string into the box labeled Text area and hit enter or click on the icon alongside.

LV Paste the characters into the field under the Find tab and hit the Chars button.

In the full version you can also apply a particular font to the text in the Text area by using the Font field alongside.
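For comparison, the same kind of listing can be approximated in a few lines of Python. This is only a sketch using the names in Python's `unicodedata` module; `list_characters` is a hypothetical helper, not part of UniView.

```python
import unicodedata


def list_characters(text):
    """Return (hex code point, name) for every character in the string."""
    return [
        (f"{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        for ch in text
    ]


for code_point, name in list_characters("a\u00e9"):
    print(code_point, name)
```

Characters without a name in the database in use (for example control characters) are reported as `<no name>`.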

find out about one or more characters, whose hex or decimal code point value you know

FV Type or paste the hexadecimal codepoints in the box labeled Code points, and hit return or click on the icon alongside. (See also the next point.) By default, this feature will find and list hex code points. If you check the box labeled Decimal you can look up decimal code points instead.

LV Paste the codepoints into the field under the Find tab and hit the HexCP or DecCP button.

This field is very forgiving about the format of the text entered into the box. Most types of character escapes will be recognised, and you can even paste in surrounding text. For example, UniView will detect and list the characters referred to by codepoint in the text "the decomposition mapping is <U+CE20, U+11B8>, and not <U+110E, U+1173, U+11B8>." Of course, this is not foolproof, but should provide the desired results most of the time.
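A rough idea of how code points can be picked out of surrounding text is sketched below in Python. The pattern only recognises the U+, \u and &#x escape forms and is an assumption for illustration; UniView itself recognises more forms, and its actual rules are not shown here.

```python
import re

# Assumption: only U+XXXX, \uXXXX and &#xXXXX; style escapes are recognised.
_ESCAPE = re.compile(r"(?:U\+|\\u|&#x)([0-9A-Fa-f]{1,6})")


def find_code_points(text):
    """Return the code points referred to in the text, in order of appearance."""
    return [int(hex_digits, 16) for hex_digits in _ESCAPE.findall(text)]


sample = ("the decomposition mapping is <U+CE20, U+11B8>, "
          "and not <U+110E, U+1173, U+11B8>.")
print([f"{cp:04X}" for cp in find_code_points(sample)])
```

Requiring an escape prefix is what stops ordinary words such as "decomposition" from being read as hex digits.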

link to the Unihan database for any han character (FV only)

Type the hex number in the box labeled Code points or the character itself in the box labelled Text area, and click on the Look up this codepoint in the UniHan database icon alongside. The Unihan information will be displayed in a separate page.

You can also find a link to the Unihan database in the detailed character information in the lower right panel when a CJK Unified Ideograph is displayed there.

convert to hex, decimal, NCR, percent-encoded escape formats, or view utf-8 or utf-16 code equivalents, etc. (FV only)

Click on the icon next to either the Code points or the Text area boxes to open the conversion tool. If there is a code point value or a string of characters in the box, values for those will be automatically shown when the conversion page opens.

(You can look up a single character at a time in FV by opening the details for that character and clicking on the Conversion tool link.)
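To show what these representations look like for a single character, here is a small Python sketch. It is independent of the conversion tool; the function name `representations` and the dictionary keys are made up for the example.

```python
from urllib.parse import quote


def representations(ch):
    """Return several common representations of one character."""
    cp = ord(ch)
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    return {
        "hex": f"{cp:04X}",
        "decimal": str(cp),
        "ncr_hex": f"&#x{cp:X};",
        "ncr_decimal": f"&#{cp};",
        # quote() percent-encodes the UTF-8 bytes of non-ASCII characters.
        "percent": quote(ch),
        "utf8": " ".join(f"{byte:02X}" for byte in utf8),
        # Group the big-endian bytes into 16-bit code units.
        "utf16": " ".join(
            utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)
        ),
    }


for label, value in representations("\u00e9").items():
    print(label, value)
```

A character outside the Basic Multilingual Plane, such as U+1F600, produces two UTF-16 code units (a surrogate pair) and four UTF-8 bytes.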

apply a different font to displayed Unicode characters (FV only)

Type the name of the font in the input field labeled Font. Then hit enter or click on the icon to the right. Characters for which there is a glyph in the font will use it. The default is that no font is set.

(This only has a visible effect if you have unchecked Show as graphics.)

You can also type a series of fonts, as per the usual CSS syntax, so that if one font is not available the next will be looked for (eg. 'Arial Unicode MS', sans-serif). If you want to use quotes, make them single quotes.

On Gecko-based and Opera browsers, font substitution will ensure that characters will be rendered if they are not in the font chosen but available in another font on your system. In Internet Explorer, if the font chosen doesn't have a glyph for a character, that character will not be displayed. (This can sometimes be useful for determining which characters are contained in a font.)

Note that a specific font only covers a certain range of Unicode characters. To return to the default font, empty the box and hit return or click on the icon alongside.

If you don't have a font on your system that covers the characters shown, you can use the graphics switch.

This does not affect characters in the text area or in notes from the database.

(You can also specify your preferred default font).

Note that the font of the text in the Text area is set independently of this mechanism.

set the height of the viewable area in the left panel (FV only)

You can change the height of the display box in the lower left panel by clicking on Settings, then changing the value in the field. Don't forget to specify the 'px' or other measurement!

This is particularly useful when you are dealing with lists on a small screen such as a netbook. If you set the height to something like 400px, you can scroll through a long list, but still see the details in the right panel when you click on a character in the list.

list characters with a given General or Bidirectional property (FV only)

Select a property from the Search properties drop down list. If the Local checkbox is selected this will show characters only in the specified custom range.
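The idea of filtering a range by General Category or Bidirectional class can be sketched in Python with the standard `unicodedata` module. The helper name `with_property` is hypothetical, and the property values come from whichever Unicode version the Python installation ships, which may differ from the version UniView supports.

```python
import unicodedata


def with_property(code_points, general_category=None, bidi_class=None):
    """Return the code points that have the requested property values."""
    matches = []
    for cp in code_points:
        ch = chr(cp)
        if general_category and unicodedata.category(ch) != general_category:
            continue
        if bidi_class and unicodedata.bidirectional(ch) != bidi_class:
            continue
        matches.append(cp)
    return matches


thai_block = range(0x0E00, 0x0E80)
# Decimal digits (General Category Nd) in the Thai block.
print([f"{cp:04X}" for cp in with_property(thai_block, general_category="Nd")])
```

Passing an explicit range plays the role of the Local checkbox; iterating over the whole code space would correspond to an unrestricted search.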

show when a character was added to Unicode (FV only)

Click on the icon alongside Show age. This currently only works when characters are displayed in a matrix. It shows version numbers for characters added after Unicode version 1.1.

You can also find the same information on a character-by-character basis when details are displayed in the lower right panel.

To remove the information, click on the Clear. icon alongside Show age.

While working with characters in a list or matrix you can also...

show detailed information for any character

UPDATED Ensure that the icon has a white background (if not, click or tap on it), then click on the character. Details for that character are shown.

quickly transfer one or all characters to the text area

UPDATED Ensure that the icon has an orange background (if not, click or tap on it), then click on the character. The character will be appended to the Text area.

To copy all characters in the list or matrix into the Text area, click on the icon just below the Text area.

switch between viewing as a matrix or a list

Enable or disable Show range as list. (This only applies if the lower left panel was populated by selecting a range using Show range or Custom, since these are the only functions that produce a matrix.) Lists show character names and hex codepoints, in addition to the character.

search the names or descriptions

Set the Local checkbox for the search options, then specify what you want to search for in the search input box. Search strings can be regular expressions, and you can specify what aspects of the information about a character are searched (see above for details). Characters that match will be highlighted.

highlight the characters by property (FV only)

Select the type of property you want from the Show properties selection list, after ensuring that the adjacent Local checkbox is checked. Characters with the selected property will be highlighted. Properties available include general category and combining class or directionality.

create a list of only those characters that are highlighted (FV only)

Mouse over More actions and click on the icon next to the Make list from highlights control. The characters shown in the lower left panel will be reduced to a list of just those that were highlighted. (This is particularly useful for refining searches.)

create a list of only those characters that are NOT highlighted (FV only)

Mouse over More actions and click on the icon next to the Make list from non-highlighted items control. The characters shown in the lower left panel will be reduced to a list of just those that were not highlighted. (This is particularly useful for refining searches.)

find out the decimal codepoint value for a character

Mouse over a character and the decimal code point value pops up in a tooltip. The decimal code point value is also shown in the right panel.

If mouseover doesn't work (for example on a mobile phone), display the details for the character and you will see the decimal codepoint listed.

increase the size of the characters in a matrix (FV only)

Click on Settings at the top right of the menu panels, then select a zoom factor from the pull down menu next to the label Left panel size that contains the text "100%".

This only works when Use graphics is unchecked. You can, however, use the browser's zoom function instead, although that affects all text, rather than just the characters themselves.

Increasing or decreasing a browser's text zoom can multiply the effect of the selector.

change the order or number of items on the lines in a list

This can be particularly useful when you want to copy and paste a list into another document. In the options, use the checkboxes after List format to indicate what you want to see.

switch between showing U+ before hex code point values

Toggle the checkbox labelled Show U+ in lists.

switch between showing or hiding hex numbers around the matrix

Toggle the checkbox labelled Hide numbers around matrix.

While looking at the detailed character information you can also...

view any character represented by a hex number (FV only)

Double click on the hex number, and release the mouse button. Then click on the highlighted text and drag and drop or copy and paste the Hex number to the area with a yellow background towards the right of the menu panels. The character will be displayed just above as you move your cursor out of the yellow area.

NEW view the composition of examples

In the information from the character database examples are coloured red. It is possible that you don't have a font for such examples, or cannot tell the composition easily in a complex script. If you click or tap on the red text a new window will open that lists the component characters using graphics.

view the previous codepoint in the Unicode database

Click on the Previous button at the top of the detailed information pane.

view the next codepoint in the Unicode database

Click on the Next button at the top of the detailed information pane.

display additional notes about characters where available

If Show character database information is enabled (default) in the options, then when you view detailed information for a character you will also see any available notes in my character database. The notes are continuously growing.

look up information about that character in other databases

Click on the CLDR's Property demo link. A new window will open to show the entry for that character in the CLDR database. This provides additional, less commonly used data and properties relating to the character.

Click on the decodeUnicode link. A new window will open to show the entry for that character in the decodeUnicode database. decodeUnicode is a wiki where people can provide information about characters.

decodeUnicode.org is a wiki where people can contribute information about Unicode blocks and characters. It is developed at the Department of Design at the University of Applied Sciences in Mainz. The project is supported by the Federal Ministry of Education and Research (BMBF) and has the objectives of creating a basis for fundamental typographic research and facilitating a textual approach to the characters of the world for all computer users. (They also provide the graphic versions of characters for UniView.)

Click on the FileFormat link. A new window will open to show the entry for that character in the FileFormat database.

The FileFormat pages provide useful information for Java and .Net programmers.

Click on the Conversion tool link. A new window will open to show a number of possible alternative representations of the character, eg. numeric character entity references, percent escaped forms, hex and decimal codepoint information, etc.

display the block to which the character belongs

Click on the link next to the subheading Unicode block and all characters in that block will be displayed as a list or matrix (according to your settings).

Working with the text area

The Text area is a set of controls for managing characters as text. It makes UniView like a character map or picker tool, but also much more.

To add characters to the text area, ensure that the icon has an orange background (if not, click or tap on it), then click on the character. The character will be appended to the Text area.

Alternatively, you can paste text into this area, or edit it directly. An icon is also provided that adds a space with a click.

The insertion point for characters echoed to the text area can be changed in most browsers by just clicking where you want characters to appear (but not Internet Explorer, where characters are always added to the end of the line). You can also highlight a range of text and any typed or echoed characters will replace the highlighted range.

An icon is provided to simplify copying the text in the text area. It highlights all the text in the text area. (Particularly useful to check you have caught all combining characters.)

(FV only) You can display in the lower left panel a list of the characters in the text area, with names and codepoints, by clicking on the  icon. This is particularly useful for investigating text with characters you can't see or correctly identify. Simply paste the text into the text area and click on  , and UniView will produce a list of the names and codepoints for all the characters. (To do this in the lite version, paste the characters into the field just above and click/tap on the Chars button.)

Conversely, clicking on the  icon will copy to the text area all the characters currently displayed in the lower left panel. This is particularly useful for capturing search results, or making a list of all characters in a block, etc.

(FV only) You can convert the text to Unicode Normalization Form C or D (NFC or NFD) by clicking on either  or  . The change may not be obvious, but if you click on  you should see any changes to the text listed below.
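The effect of normalization is easiest to see by listing code points before and after. The following Python sketch uses the standard `unicodedata.normalize` function; the helper `code_points` is a hypothetical name used only for this example.

```python
import unicodedata


def code_points(text):
    """Return the hex code points of a string, for easy comparison."""
    return [f"{ord(ch):04X}" for ch in text]


composed = "\u00e9"                      # e with acute as a single character
decomposed = unicodedata.normalize("NFD", composed)
recomposed = unicodedata.normalize("NFC", decomposed)

print(code_points(composed))
print(code_points(decomposed))
print(code_points(recomposed))
```

The composed and decomposed strings usually look identical on screen, which is why comparing the code point lists is the reliable check.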

(FV only) You can change the font of the text in the text area using the provided input field. This can help when the text is not supported by the default font used by UniView.

(FV only) You can also convert characters to various escape or other forms by clicking on Open the converter file, or look up han characters in the UniHan database by clicking on Look up this codepoint in the UniHan database. The Clear. icon clears all text from the text area.

Controls explained

If you click on the first letter of the label of any of the controls listed below, you will open this document at the appropriate place for an explanation of that control. This feature is available only for the full version.

Show range

Select a Unicode block from the pull-down list and the characters in the block will be displayed in the lower left panel. You can then click on characters to view detailed information about them or add them to the text area, etc.

You can display the result as either a matrix or a list (that includes names) by clicking the checkbox lower down entitled Show range as list.

Unassigned character positions are shown in the matrix with a greyed out background (though you can change the colour, if you want).

You can also specify a custom range by typing or pasting a hex codepoint range into the Custom box alongside this control.

Custom

If you type or paste a start and end code point value (in hex) into this control, the characters in the range will display in the lower left panel. Note that this can only be one contiguous range.

You can display the result as either a matrix or a list (that includes names) by clicking the checkbox lower down entitled Show range as list.

If the range you select does not fill a whole column when displayed as a matrix, surrounding characters are greyed out. (When displaying as a list, you will only see the characters in the range.)

The Custom field will accept various formats, making it easier to paste a range from elsewhere. The numbers must be in hexadecimal form but can be separated by either a colon (the default), a hyphen, one or more spaces, or one or more periods. The code point values themselves can be in the following formats: 1234, &#x1234;, \1234;, \u1234, U+1234. The actual number of hex digits can be between 1 and 6.

Code points

Add a list of hex code point values to this control and they will display in a list below.

You can also work with decimal code point values if the Decimal checkbox is selected.

You can do two other things with these code point values, in addition to listing the characters below.

  1. Click on Open the converter file to convert the code points to various escaped or other forms, using the conversion tool.
  2. Click on Look up this codepoint in the UniHan database to look up a character in the UniHan database. (Only the first character will be looked up.)

Click on Clear. to quickly clear the control.

This field is very forgiving about the format of the text entered into the box. Most types of character escapes will be recognised, and you can even paste in surrounding text. For example, UniView will detect and list the characters referred to by codepoint in the text "the decomposition mapping is <U+CE20, U+11B8>, and not <U+110E, U+1173, U+11B8>." Of course, this is not foolproof, but should provide the desired results most of the time.

Search text

This control allows you to search for text in the Unicode database, and returns a list of matching characters.

You can use regular expressions in searches. For example, suppose you wanted to find all characters with the word 'tet'. You could type into the input field, \btet\b. The \b represents a word boundary. If you wanted to search for entries containing either the word 'tet' or the word 'tat' you can use the 'or' operator | as in \btet\b|\btat\b. (In UniView a colon can be used as a short form of \b so the example could have been written :tet:).

Another example: You want to search for 'alpha', but you only want results for the Latin characters (not the many Greek or mathematical results). Simply use the following search string latin.*alpha. The .* represents any number of intervening characters.

I haven't tested this feature to destruction, but most basic regular expressions that work in both JavaScript and PHP code should work.

Note that by default searches match against character names and alternative names in the main Unicode database, and also against the information displayed for an individual character under the heading Description in the right panel. You can limit the search using the Names, Descriptions and Other checkboxes under the search input field. Other refers to alternative names.

You can also limit the search to the specific range of characters currently in the lower left panel. To limit the search, select the checkbox labelled Local. Matching characters will be highlighted. (You can then produce a list of just the highlighted characters by clicking on the icon next to More actions > Make list from highlights. If you need to refine your search, you could then search again on this list, and so on.)

Search properties

This control allows you to search for characters with a particular property. It lists matching characters below.

By default searches match against the characters currently listed in the lower left panel. Matching characters will be highlighted. (You can then produce a list of just the highlighted characters by clicking on the icon next to More actions > Make list from highlights. If you need to refine your search, you could then search again on this list, and so on.)

To enlarge the search to the whole of Unicode, deselect the checkbox labelled Local.

Show age

This control allows you to see when a character was added to Unicode. (This currently only works when characters are displayed in a matrix.) It shows version numbers for characters added after Unicode version 1.1.

You can also find the same information on a character-by-character basis when details are displayed in the lower right panel.

To remove the information, click on the Clear. icon alongside.

Font

Unless you have specified a default font in the query of the URL you used to call UniView, the default is for no font to be set. Most browsers will look for available fonts on your system to display the characters.

You can use this control to explicitly change the font to one you have on your system. Simply type the name of the font in the box and hit return or click on the icon alongside.

You can also type a series of fonts, as per the usual CSS syntax, so that if one font is not available the next will be looked for (eg. 'Arial Unicode MS', sans-serif). If you want to use quotes, make them single quotes.

On Gecko-based and Opera browsers, font substitution will ensure that characters will be rendered if they are not in the font chosen but available in another font on your system. In Internet Explorer, if the font chosen doesn't have a glyph for a character, that character will not be displayed. (This can sometimes be useful for determining which characters are contained in a font.)

Note that a specific font only covers a certain range of Unicode characters. To return to the default font, empty the box and hit return or click on the icon alongside.

If you don't have a font on your system that covers the characters shown, you can use the graphics switch.

This does not affect characters in the text area or in notes from the database.

(You can also specify your preferred default font).

Use graphics

If the checkbox is selected, all characters except those in the text area and notes from the database will be shown as graphics, rather than text. The graphics are downloaded from the decodeunicode server.

If Unicode has recently issued a new version, it may take a while for the new characters to become available.

Show range as list

If the checkbox is selected, when you select a range using Show range or Custom the characters will be displayed as a list, rather than a matrix.

You can also use this to switch between matrix and list views of a range you have just selected.

Show character database information

When you click on a character listed in the lower left panel, detailed information about that character is displayed lower right. When the DB checkbox is selected, additional notes about a character are displayed (where available) at the bottom of the lower right hand panel. When you view character details for which notes exist you will automatically see those notes.

The notes are stored in a database compiled by myself. The information changes from time to time, as I add to or adapt the information in the database.

If you set up your bookmark to UniView to include database=on this feature will be on by default when you start UniView.

More actions > Make list from highlights

If you have highlighted items in a list in the lower left panel, using the Search text or Search properties controls, this control will remove all but the highlighted items from the list.

More actions > Make list from non-highlighted items

If you have highlighted items in a list in the lower left panel, using the Search text or Search properties controls, this control will remove all the highlighted items from the list, leaving the non-highlighted items only.

More actions > Remove unassigned characters from list

If you have unassigned characters in a list in the lower left panel this control will remove them from the list.

Settings > Show U+ in lists

If this is checked, hex code point numbers in lists in the lower left panel will be preceded by U+. The default is just the number.

Settings > List format

This allows you to change the order and items in lists appearing in the lower left panel. By default, you would see something like this:

0968 २ DEVANAGARI DIGIT TWO

With this control you can position the character before or after the number (or both!) or remove it altogether. You can also specify whether the list should show the number and/or the name of the character.

This control is provided for people who want some control over how the list will look when copied and pasted into their text.
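If you want to produce similar lines yourself, for example when post-processing copied text, a Python sketch along the following lines may help. The function `format_entry`, its parameters and the field names are assumptions for illustration and do not correspond to UniView settings by name.

```python
import unicodedata


def format_entry(cp, order=("codepoint", "character", "name"), show_u_plus=False):
    """Build one list line from the chosen fields, in the chosen order."""
    fields = {
        "codepoint": ("U+" if show_u_plus else "") + f"{cp:04X}",
        "character": chr(cp),
        "name": unicodedata.name(chr(cp), ""),
    }
    return " ".join(fields[key] for key in order)


print(format_entry(0x0968))
print(format_entry(0x0968, order=("character", "codepoint"), show_u_plus=True))
```

Leaving a field out of `order` removes it from the line, and repeating one (such as the character) shows it on both sides of the number.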

Settings > Hide numbers around matrix

This allows you to hide the column and row numbers around a matrix. The default is to show the numbers.

Settings > Left panel height

This control allows you to change the height of the display box in the lower left panel. Don't forget to specify the 'px' or other measurement!

This is particularly useful when you are dealing with lists on a small screen such as a netbook. If you set the height to something like 400px, you can scroll through a long list, but still see the details in the right panel when you click on a character in the list.

Settings > Left panel size

This control allows you to increase the size of the characters in the lower left panel (independently of text elsewhere on the page). Note: It has no effect when viewing characters as graphics.

To see all text larger, use the normal browser method for zooming (eg. Ctrl++ and Ctrl+- in browsers on Windows).

You can tailor the program by...

using URIs to start up UniView with data in left or right panels

This is useful for pointing people to particular information using a URI, for example in email. By providing query parameters in the URI you can start up UniView with specific information displayed as follows:

You should only use one of these query parameters in a single call to UniView.

You can also start up UniView with character notes as follows:

uniview/?database=on This will automatically load notes from my character database when you view character details in the lower right panel. You can combine this parameter with any other. For more information about notes, see "display additional notes about characters where available" above.
eg. http://rishida.net/scripts/uniview/?block=thai&database=on
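If you generate such links from a script, the query string can be assembled rather than typed by hand. The Python sketch below uses only the two parameters that appear in the example above (block and database); the helper name `uniview_uri` is hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://rishida.net/scripts/uniview/"


def uniview_uri(**params):
    """Return a UniView URI with the given query parameters appended."""
    return BASE + ("?" + urlencode(params) if params else "")


print(uniview_uri(block="thai", database="on"))
```

Using `urlencode` also takes care of escaping any characters that are not safe in a query string.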

setting default display preferences

By providing query parameters when you call UniView you can modify the default settings for look and feel as follows:

You can use all or none of these query parameters in a single call to UniView.

If you store your bookmark with these parameters set, you will always open UniView with your preferences.

Acknowledgements and thanks

François Yergeau co-developed the Unicode Code Converter utility, and translated it into French.

Patrick Andries translated UniView into French, but that was many versions ago, and the French version is no longer available.

Change history

Changes in version 5.2.0b

The major change in this update is the addition of a new UniView lite interface for the tool that makes it easier to use UniView in restricted screen sizes, such as on mobile devices. The lite interface offers a subset of the functionality provided in the full version, rearranges the user interface and sets up some different defaults (eg. list view is the default, rather than the matrix view). However, the underlying code is the same - only the initial markup and the CSS are different.

Another significant change is that when you click on a character in a list or matrix that character is either added to the text area or detailed information for that character is displayed, but not now both at the same time. You switch between the two possibilities by clicking on the icon. When the background is white (default) details are shown for the character. When the background is orange the character will be added to the text area (like a character map or picker).

Information from my character database is now shown by default when you are shown detailed information for a character. The switch to disable this has been moved to the Options panel.

Text highlighted in red in information from the character database contains examples. In case you don't have a font for viewing such examples, or in case you just want to better understand the component characters, you can now click on these and the component characters will be listed in a new window (using the String Analyzer tool).

Access to the Settings panel has been moved slightly downwards, and the panel has been renamed Options in the full version.

The default order for items in lists is now <character><codepoint><name>, rather than the previous <codepoint><character><name>. This can still be changed in the Options panel, or by setting query parameters.

I changed the Next and Previous functions in the character detail pane so that they move one codepoint at a time through the Unicode encoding space. The controls are now buttons rather than images.

Changes in version 5.2.0a

The major change in this update is the addition of a function, Show age, to show the version of Unicode where a character was added. The same information is also listed in the details given for a character in the lower right panel.

The trigger for context-sensitive help was reduced to the first character of a command name, rather than the whole command name. This improves behaviour for commands under More actions by allowing you to click on the command name rather than just the icon alongside to activate the command.

The highlighting mechanism was changed. Rather than highlight characters using a coloured border (which is typically not very visible), highlighting now works by greying out characters that are not highlighted. This also makes it clearer when nothing is highlighted.

In the recent past, when you converted a matrix to a list in the lower left panel, greyed-out rows would be added for non-characters. These are no longer displayed. Consequently, the command to remove such rows from the list (previously under More actions) has been removed.

A lot of invisible work went into replacing style attributes in the code with class names. This produces better source code, but doesn't affect the user experience.

Some 'quick start' instructions were added to the initial display to orient people new to the tool, and this help text was updated in various areas.

Changes in version 5.2.0

The major change in this update is conformance to the final 5.2.0 version of the Unicode database.

The order of blocks listed in the top left pulldown menu was changed to resemble the order in the Unicode Charts page. Several sub-block selections were also added to the list (as in the Unicode page), and are displayed in italics.

When you display details of a character in the right panel, the heading Script group has now been used to indicate the sub-block-level headings in the block listings of the Unicode Standard. The link to the Unicode block now follows the heading Unicode block. These sub-block-level headings are also shown when you display a range as a list (as opposed to a matrix).

When you mouse over characters displayed in a matrix, the codepoint and name information for that character now appear just above the matrix. This makes it much easier to locate characters you are looking for.

Finally, graphics are now available for all the many Egyptian Hieroglyph characters. This was the last block for which graphics were completely unavailable.

Changes in version 5.2(beta)a

The big change in this update is that UniView starts up in graphics mode by default. This means that pages load more slowly, but (especially with the continuing growth of Unicode) also means that you are more likely to be able to see the characters you are looking for. (If you preferred font glyphs as a default, you just need to change the URI in your bookmarked link slightly, and you can continue to work that way.)

To facilitate this change, I created my own graphics for blocks which are not yet covered by decodeunicode, or which are no longer fully covered by decodeunicode. The blocks for which I provided graphics are Latin Extended-C, Latin Extended-D, Latin Extended Additional, Cyrillic Supplement, Cyrillic Extended-B, Modifier Tone Letters, Tibetan, Malayalam, Saurashtra, Ol Chiki, Myanmar, Kayah Li, Cham, Rejang, Vai, Supplemental Punctuation, and Miscellaneous Symbols and Arrows.

There are still many characters for which there are no graphics (especially the new characters in Unicode 5.2), but coverage is much better than it was. As I find more fonts, I will be able to create graphics for the remaining characters.

I also put a grey box around the characters in tables. This is particularly useful if there are no graphics or font glyphs for a block or range of characters.

I also fixed a problem that was preventing Chrome, Safari and IE from displaying the first two Latin blocks.

Changes in version 5.2(beta)

This update adds the characters and changes proposed for Unicode 5.2.

While the properties for new and modified characters are still in beta they are not officially stable. The characters themselves, however, should be stable at this point. UniView therefore alerts you if you are looking at a new character. If the Unicode database information has changed for a given character you are also warned, and provided with a link that points to the previous information for that character.

These warnings will be removed from UniView when Unicode 5.2 is released.

This release also fixes a few small bugs in the HTML and JavaScript code.

Changes in version 5.1.0f

This is a very small update.

Moved the  icon (to select all text in the text area) near to the beginning of the row, since this is a frequently used icon.

Caused the Open the converter file icons to link to the latest version of the converter tool, and work properly.

Text area now has the focus when you open UniView.

Changes in version 5.1.0e

Custom ranges are no longer enlarged to fill full columns in the matrix. Full columns are still shown, but characters not in the range specified are greyed out. When displaying the range as a list, only the characters in the specified range appear.
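The greying-out behaviour described above can be sketched as follows. This is only an illustration in Python, not UniView's own code (which is JavaScript), and it assumes that each matrix column holds 16 code points, as in the Unicode code charts. The names COLUMN_SIZE and matrix_cells are hypothetical.

```python
# Assumption: one matrix column covers 16 consecutive code points.
COLUMN_SIZE = 16


def matrix_cells(start, end):
    """Return (code point, in_range) pairs covering whole columns.

    Cells outside start..end are kept so that columns stay full,
    but are flagged False so that they can be greyed out.
    """
    first = start - start % COLUMN_SIZE
    last = end - end % COLUMN_SIZE + COLUMN_SIZE - 1
    return [(cp, start <= cp <= end) for cp in range(first, last + 1)]


cells = matrix_cells(0x36, 0x67)

# Matrix view: every cell from U+0030 to U+006F is present.
print(f"U+{cells[0][0]:04X} .. U+{cells[-1][0]:04X}")  # U+0030 .. U+006F

# List view: only the code points in the requested range appear.
listed = [cp for cp, in_range in cells if in_range]
print(len(listed))  # 50
```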

Context-sensitive help was added. The labels for controls link to a new section in this document where the controls are explained one by one. To get the help, click on the label.

Some changes were made to help those who want to copy and paste lists of characters to other documents. There are new settings that make it possible to automatically prefix hex code point numbers in lists with U+, if you prefer, and tailor what appears in the list, and where the character is shown. (You can also set U+ to appear by default by including u=yes in the URI you use to call UniView.) These controls are accessed by clicking on Settings (used to be called Options).

A control was added to remove unassigned characters from a list. This can be useful, for example, if you want a list of all characters in a block.

Another control was added that removes the highlighted characters from a list. This and the previous control were moved into a popup that opens as you mouse over the text More actions.

The default font was set to nothing, and the Font control was moved from the settings popup to the main control area. To reset the default font, simply delete the last font name in the font control and hit return.

If an unassigned character is displayed in the right panel, it is now possible to display the group it belongs to on the left. Hex code point numbers for unassigned characters in lists are now a minimum of 4 characters long. Unassigned characters now have a grey background in all lists. DB notes are no longer reported for unassigned characters (bugfix).

A couple more minor user interface changes.

Changes in version 5.1.0d

A major feature change is the addition of buttons to the Text area to allow conversion of the text to NFC or NFD normalization forms. (You may not notice the change until you list the characters.)
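To show why the change may only become visible when you list the characters, here is a small Python sketch using the standard unicodedata module. It is an illustration of NFC and NFD in general, not the code UniView uses; the helper name code_points is hypothetical.

```python
import unicodedata


def code_points(text):
    """Format each character of text as a U+XXXX code point."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


composed = "\u00E9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = unicodedata.normalize("NFD", composed)
recomposed = unicodedata.normalize("NFC", decomposed)

print(code_points(composed))    # U+00E9
print(code_points(decomposed))  # U+0065 U+0301
print(code_points(recomposed))  # U+00E9
```

The composed and decomposed strings normally look identical on screen, which is why listing the code points is the easiest way to confirm the conversion.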

The control panel was also substantially rearranged again to hopefully make it easier for newcomers to see what they can do.

The Code point conversion feature was upgraded to handle decimal code point values.

A single character in the codepoints area or text area is now listed in the lower left panel when you click on  , rather than in the right-hand properties panel. This is to improve consistency and avoid surprises.

Added a link to the CLDR property demo from the right panel to give access to additional properties.

Improved the parsing of codepoints when surrounded by text in the Code point input field, so that it now works with &#x...; and \u... and \U... escapes.

Jettisoned some unneeded code to reduce download by around 40-50K bytes. Implemented the NFC/NFD feature using AJAX, to avoid putting the download size back up.

When you delete the contents of the text area or the code point area, the associated input field is given focus, so you are ready for input.

A couple more minor bug fixes.

Changes in version 5.1.0c

Removed the two Highlight selection boxes. These used to highlight characters in the lower left panel with a specific property value. The Show selection box on the left (used to be Show list) now does that job if you set the Local checkbox alongside it. (Local is the default for this feature.)

As part of that move, the former SiR (search in range) checkbox that used to be alongside Custom range has been moved below the Search for input field, and renamed to Local. If Local is checked, searching can now be done on any content in the lower left panel, and the results are shown as highlighting, rather than a new list.

To complement these new highlighting capabilities, a new feature was added. If you click on the icon next to Make list from highlights the content of the lower left panel will be replaced by a list of just those items that are currently highlighted - whether the highlighting results from a search or a property listing. Note that this can also be useful to refine searches: perform an initial search, convert the result to a list, then perform another search on that list, and so on.

Finally got around to putting  icons after the pull-down lists. This means that if you want to reapply, say, a block selection after doing something else, only one click is needed (rather than having to choose another option, then choose the original option). The effect of this on the ease of use of UniView is much greater than I expected.

Added an icon  to the text area. If you click on this, all the characters in the lower left panel are copied into the text area. This is very useful for capturing the result of a search, or even a whole block. Note that if a list in the lower left panel contains unassigned code points, these are not copied to the text area.

As a result of the above changes, the way Show as graphics and Show range as list work internally was essentially rewritten, but users shouldn't see the difference.

Changed the label Character area to Text area.

Changes in version 5.1.0b

Moved the cut&paste field downwards, made it larger, and changed the label to character area. This should make it easier to deal with text copy/cut & paste, and more obvious that that is possible with UniView. It is much clearer now that UniView provides character map/picker functionality, and not just character lookup.

Whereas previously you had to double-click to put a character in the lower left pane into the Cut&paste field, UniView now echoes characters to the Character area every time you (single) click on a character in the lower left hand pane. This can be turned off. Double-clicking will still add the codepoint of a character in the lower left panel to the Code points field.

The Character area has its own set of icons, some of which are new: ie. you can select the text, add a space, and change the font of the text in the area (as well as turn the echo on and off). I also spruced up the icons on the UI in general.

Note that on most browsers you can insert characters at the point in the Character area where you set the cursor, or you can overwrite a highlighted range of characters, whereas (because of the non-standard way it handles selections and ranges) Internet Explorer will always add characters to the end of the line.

The Code points field has also been enlarged, and I moved the Show list pull-down to the left and Show as graphics and Show page as list to the right. This puts all the main commands for creating lists together on the left.

When you mouse over a character in the lower left pane you now see both hex and decimal codepoint information. (Previously you just saw an unlabelled decimal number.) You will also find decimal code point values for characters displayed in the lower right panel.

Fixed a bug in the Code points input feature so that trailing spaces no longer produce errors, but also went much further than that. You can now add random text containing codepoints or most types of hex-based escaped characters to the input field, and UniView will seek them out to create the list. For example, if you paste the following into the Code points field:

the decomposition mapping is <U+CE20, U+11B8>, and not <U+110E, U+1173, U+11B8>.

the result will be:

CE20: 츠 [Hangul Syllables]
11B8: ᆸ HANGUL JONGSEONG PIEUP
110E: ᄎ HANGUL CHOSEONG CHIEUCH
1173: ᅳ HANGUL JUNGSEONG EU
11B8: ᆸ HANGUL JONGSEONG PIEUP

Of course, UniView is not able to tell that an ordinary word like 'Abba' is not a hex codepoint, so you obviously need to watch out for that and a few other situations, but much of the time this should make it much easier to extract codepoint information.
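The kind of extraction described above can be approximated with a regular expression. The Python sketch below is an assumption about how such a scan might work, not UniView's actual JavaScript: it accepts U+, \u, \U and &#x prefixes, and also treats any bare word of 4 to 6 hex digits as a codepoint, which is exactly why a word like 'Abba' is picked up. The names CODEPOINT_RE and extract_codepoints are hypothetical.

```python
import re

# Either a prefixed escape (U+, \u, \U, &#x) followed by hex digits,
# or a bare word made up of 4 to 6 hex digits (an assumed rule).
CODEPOINT_RE = re.compile(
    r"(?:U\+|\\[uU]|&#x)([0-9A-Fa-f]{1,8})|\b([0-9A-Fa-f]{4,6})\b"
)


def extract_codepoints(text):
    """Return the code points found in text, in order of appearance."""
    found = []
    for match in CODEPOINT_RE.finditer(text):
        value = int(match.group(1) or match.group(2), 16)
        if value <= 0x10FFFF:  # ignore values outside the Unicode range
            found.append(value)
    return found


sample = ("the decomposition mapping is <U+CE20, U+11B8>, "
          "and not <U+110E, U+1173, U+11B8>.")
print([f"{cp:04X}" for cp in extract_codepoints(sample)])
# ['CE20', '11B8', '110E', '1173', '11B8']

# An ordinary word made only of hex digits is a false positive.
print([f"{cp:04X}" for cp in extract_codepoints("Abba")])
# ['ABBA']
```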

I still haven't found a way to fix the display bug in Safari and Google Chrome that causes initial content in the lower left pane to be only partially displayed.

 

Changes in version 5.1.0a

A large amount of code was rewritten to enable data to be downloaded from the server via AJAX at the point of need. This eliminates the long wait when you start to use UniView without the database information in your cache. This means that there is a slightly longer delay when you view a new block, but the code is designed so that if you have already downloaded data, you don't have to retrieve it again from the server.

The search mechanism was also rewritten. The regular expressions used must now be supported in both JavaScript and PHP (PHP is used if not searching within the current range). When 'other' is ticked, the search will look in the alternative name fields, but not in other property settings, so you can no longer use something like ;AL; to search for characters with a particular property. (Use 'Show list' instead.)

Removed several zero-width space characters from the code, which means that UniView now works with Google Chrome, except for some annoying display bugs that I'm not sure how to fix - for example, the first time you try to display any block you only seem to get the top line (although, if you click or drag the mouse, the block is actually there). This seems to be WebKit related, since it happens in Safari, too.

Changes in version 5.1.0

Updated to cover Unicode Version 5.1.0.

Added <option value="(;R;)|(;AL;)">Right-to-left (R or AL)</option> to property lister.

Bugfix: fixed ranges supplied via URI query (used to still split).

Changes in version 5.0.0c

Changed the custom range input to a single field that will accept various range formats.
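As an illustration of what accepting various range formats involves, the Python sketch below parses the separators and number formats listed earlier in this help file (colon, hyphen, spaces or periods as separators; 1234, U+1234, \u1234, \1234; and &#x1234; as numbers). It is a guess at the logic, not UniView's code, and the names SEPARATOR, NUMBER and parse_range are hypothetical. Raising an error for a reversed range stands in for the alert that UniView shows.

```python
import re

# Separators: a colon or hyphen, one or more periods, or spaces.
SEPARATOR = re.compile(r"\s*[:\-]\s*|\s*\.+\s*|\s+")
# A hex number of 1 to 6 digits with an optional escape prefix
# and an optional trailing semicolon.
NUMBER = re.compile(r"(?:U\+|\\u|&#x|\\)?([0-9A-Fa-f]{1,6});?")


def parse_range(text):
    """Parse a custom range such as '0036:0067' into (start, end)."""
    parts = [p for p in SEPARATOR.split(text.strip()) if p]
    if len(parts) != 2:
        raise ValueError("expected two hex numbers")
    values = []
    for part in parts:
        match = NUMBER.fullmatch(part)
        if match is None:
            raise ValueError(f"not a hex code point: {part!r}")
        values.append(int(match.group(1), 16))
    start, end = values
    if start > end:
        raise ValueError("range start is higher than range end")
    return start, end


print(parse_range("0036:0067"))      # (54, 103)
print(parse_range("U+0036..U+0067")) # (54, 103)
```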

Added the ability to select whether Search looks at any combination of character names only, other parts of a record in the Unicode database, or the other character description information, and added a message to say how many characters were matched.

Added the ability to search within the range specified in the field entitled Range.

Added the ability to list characters with a given General or Bidirectional property (within a specified range or not).

Added an AJAX link to my database of information about Unicode characters. If enabled, using the DB checkbox, this automatically retrieves any available data for a character when information about that character is displayed in the lower right panel. You can also specify that UniView should open with that set as the default using database=on in the URI used to call UniView.

Because of the previous improvement, I removed the ability to link in a file of information about characters. (The information in the files was a copy of the information in the database.)

Moved the Code point(s) and Cut & paste fields lower, to make them easier to use.

Fixed a bug that was preventing the Search function finding characters in the Basic Latin block.

Bugfix: a range like 0036:0067 will always show full rows now; a range with start higher than end will show alert.

Added a reference to decodeunicode when graphics are displayed in the left column.

Bugfix: the search parameter no longer breaks when graphics etc. are toggled.

You can now specify windowHeight parameter at startup in the URI's query string.

Changes in version 5.0.0b

Extended the ability to open UniView with data displayed from a URI. In addition to specifying a block and a character, you can now specify a range, a list of codepoints, a list of characters, or a search string. This is useful for pointing people to results using URIs in links or email.

Switching between graphics or fonts for display of characters now refreshes the right panel also.

Clicking on the information about the script group of a character displayed in the right panel will cause that block to be displayed in the left panel. This is particularly useful when you find a single character and want to know what's around it.

Replaced the use of hyphens to specify block names in URI queries with underscores or %20. This may break some existing URIs, but fixes a bug that meant that block names that actually contain hyphens were not displaying.
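The two accepted spellings can be produced as in the Python sketch below. It assumes that the block parameter takes the block name as it appears in the Unicode Standard; the function names are hypothetical, and only the underscore and %20 conventions come from the note above.

```python
from urllib.parse import quote


def block_with_underscores(block_name):
    """Replace spaces with underscores, leaving real hyphens alone."""
    return block_name.replace(" ", "_")


def block_percent_encoded(block_name):
    """Percent-encode spaces as %20; hyphens pass through unchanged."""
    return quote(block_name)


print(block_with_underscores("Latin Extended-A"))  # Latin_Extended-A
print(block_percent_encoded("Latin Extended-A"))   # Latin%20Extended-A
```

Because hyphens are no longer used as word separators, a hyphen that is genuinely part of a block name survives in both forms.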

Added an option to the right hand panel to display the current character in the Unicode Conversion tool.

Fixed some other bugs related to specifying Basic Latin block in a URI.

Reinstated CJK Unified Ideographs and Hangul Syllables in the block selection pull-down, but added a warning and opt out if the block you are about to display contains more than 2000 characters. Also added a warning and opt out if you try to specify a range of over 2000 characters.

Changes in version 5.0.0a

Substantially revised the code so that UniView now handles ideographic and hangul characters and other characters not in the Unidata database. For example, ideographs now display in the left panel for a specified range and property values are available in the right panel.

Added regular expression support to the search input field.

Changes to the user interface: moved highlighting controls to the initial screens and moved others, such as the chart numbering toggle, to the submenu under "Options"; provided wider input fields for codepoint and cut&paste input; replaced the graphics and list toggle icons with checkboxes; provided an icon to quickly clear the contents of the codepoint and cut&paste input fields. A link to the UniHan database was added alongside the Cut & paste input field: when clicked, this icon looks up the first character in either field. A link to the UniHan database was also added to the right panel when a Unified CJK character is displayed there.

The Codepoint input field now accepts more than one codepoint (separated by spaces).

When you double-click on a character in the left panel the codepoint is appended to the Codepoint input field as well as adding the character to the Cut & paste field.

When you click in the checkbox Show as graphics the change is immediately applied to whatever is in the left panel. It no longer redisplays the range if you are looking at, say, a list of characters generated by the Codepoint input, but redisplays the same list.

Set the default font to "Arial Unicode MS, sans-serif".

Added a message for those who do not have JavaScript turned on, and messages to please wait while data is being downloaded on initial startup.

Fixed the icons linking to the converter tool, so that the contents of the adjacent field are passed to the converter and converted automatically.

Added links in the right panel to FileFormat pages (in addition to decodeUnicode). The FileFormat pages provide useful information for Java and .Net users about a given character.

Removed the option to specify your own character notes (I'm not aware that anyone ever did, since it hasn't worked for a while now and no-one has complained). This is because AJAX technology will not allow an XML file to be included from another domain. When that is fixed I will reinstate it.

Fixed a number of other bugs, particularly related to supplementary character support and highlighting.

Changes in version 5.0.0

Updated to support Unicode 5.0.0.

Restyled the menu panels, moving some less used functions to pop up windows to save on horizontal space.

Implemented an AJAX approach for incorporating notes files. This means that the page no longer has to be reloaded to add notes. It is now also possible to add more than one set of notes at a time. Note that these changes require a small change to the markup of notes files: the div containing the notes for display has to have a class name 'notes' as well as the id for the character.

I added some bundled notes files - most notably myanmar. Note that these are subject to change on an ongoing basis.

Most of the properties display in the character-detail panel on the right are taken from the unicodedata file at the moment. I plan to incorporate additional property information over the coming months, but wanted to release this now so that you can get information about Unicode 5 characters sooner rather than later.

Changes in version 4.1.0b

Added a link to the decodeUnicode wiki for each character that is displayed in the right-hand panel.

Provided a way to start up UniView with a particular block and/or character displayed as a table in the lower panels. This should be particularly useful for pointing a person to a particular Unicode block or character in a URI.

Fixed a couple of minor bugs in the CSS.

Changes in version 4.1.0a

Rearranged the top of the page to allow UniView to be used in narrower windows.

Added support for Unicode version 4.1.0.

Retrieves graphics from decodeunicode.org rather than the slow-loading and sparse graphics that were available from the Unicode site. Also added my own graphics where decodeunicode has gaps.

Moved the files to PHP. This enables a different approach to the inclusion of user-defined notes that now works on IE and Opera, too.

Another benefit of using PHP is that you can now prep the conversion page with data in the 'Code point' or 'Cut & paste' fields. By clicking on the appropriate icon, the conversion page will now open with the conversions already done for the relevant field.

Yet another benefit of PHP is that, if you really want to, you can now set various preferences related to the initial look and feel by specifying them as query parameters when you call UniView.

NOTE: If you want to be able to download UniView to your hard drive and you don't have a server and PHP, let me know. If enough people ask for it, I will create a downloadable zipped package again that will work without PHP (and without the additional notes feature). I will also post notes on how to customise various aspects of the setup.

Note also that I have disabled links to the French version until a new French translation has been prepared. I will probably not do language-based content negotiation.

Changes in version 4.1

Surrogate support added.

You can now double-click on any line in a list on the left, and the character will appear in the Cut&Paste field above.

Han and Hangul character glyphs are now displayed in the right panel after entering a codepoint in the Code Point field. There may not be much information available, but at least you can see the character if you have a font that supports it.

Changes in version 4.0.1

Minor improvements to user interface, including provision of tooltips for all feature selectors.

Disabled (attempts to) display user-defined notes for IE and Opera. I still haven't found how to make it work, even using proprietary coding, but at least the attempt won't crash the browser now.

Provided a facility to allow visible area in left panel to be increased.

Changes in version 4.0

Name changed to UniView.

Support for Unicode 4.0.0.

No frames. Cross-browser support.

You can specify your preferred default font for display of Unicode characters in prefs.js. If an alternative font is applied using the control on the page, it remains in force for any view until the user sets it back to the default.

Highlighting of General or Bidi properties remains in force until you disable it, and applies to any matrix or list in the left panel (ie. including search results and cut & paste results).

Script blocks are now grouped with visible labels in the main range-selection pulldown.

Mousing over a character in matrix or list view produces a tool tip containing the decimal code value for the character. In the previous version this was the Hex value, and was limited to the matrix view.

There are no facilities to display information in a pop-up window instead of in the main window. If you want to temporarily display information separately, open a new window.

You used to be able to double-click on a list or on the character descriptions to make the highlighted text appear in various fields. This has not been implemented, but you can still highlight and drag, or copy and paste the text.

Because character sizes are specified in pixels for cross-browser consistency, you must use IE's accessibility options to increase character size in IE over and above what is available from the font size setting provided on the page.

Options for displaying in-page descriptions of script blocks have been disabled. Open the files in a separate tab or window as a standalone file.

Author: Richard Ishida.


Last update 2010-05-18 6:07 GMT