I’ve been lucky enough to have access to a pre-publication electronic version of the new Unicode Standard 5 book, and though I’ve been terribly busy just lately, I’ve carved out a little time to read and even use some of it. And I like what I see.

I’ve always thought the Unicode book was a really useful thing to have if you need to understand the ins and outs of Unicode for implementation purposes, or if you are simply interested in how scripts work. It has always been relatively easy to read, and more like a guidebook than a standard, if you know what I mean. The good news is that this seems to be even more the case in the latest version. There are lots of small edits that improve the clarity of the text and make it more readable.

There are, however, some more significant changes that are also very welcome. For example, I’ve been looking at first-letter styling in CSS recently, particularly in the context of Indian scripts, but despite a lot of searching in the previous version I was unable to figure out where the Standard actually told me that a default grapheme cluster didn’t cover a whole Indic syllable. The grapheme cluster concept is really quite an important one for implementations, and it was frustrating to see it described so poorly.

In simple terms, a grapheme cluster is a sequence of characters that need to be kept together for things like wrapping text at the end of a line, cursor movement, deletion, etc.
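To make the point about Indic syllables concrete, here is a small Python sketch. It is only a rough approximation of default grapheme clusters, not the Standard’s actual rules: the function name `rough_clusters` is my own, and I am assuming it is good enough for illustration to attach nonspacing and enclosing marks (general categories Mn and Me) to the character before them, ignoring Hangul, CRLF and the other special cases.

```python
import unicodedata


def rough_clusters(text):
    """Split text into approximate grapheme clusters.

    Assumption: a cluster is a character plus any following
    nonspacing (Mn) or enclosing (Me) marks. The real rules
    cover more cases; this is illustrative only.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) in ("Mn", "Me"):
            # Combining mark: keep it with the preceding character.
            clusters[-1] += ch
        else:
            # Anything else starts a new cluster.
            clusters.append(ch)
    return clusters


def code_points(cluster):
    """Show a cluster as a list of U+XXXX labels for readability."""
    return [f"U+{ord(c):04X}" for c in cluster]


# "e" followed by COMBINING ACUTE ACCENT stays together.
print([code_points(c) for c in rough_clusters("e\u0301")])

# Devanagari KA + VIRAMA + SSA: one syllable to a reader,
# but the virama only attaches to the KA before it.
print([code_points(c) for c in rough_clusters("\u0915\u094D\u0937")])
```

Under these assumptions the Devanagari sequence splits in two, with the virama staying on the first consonant and the second consonant starting a new cluster, which is exactly the behaviour I was struggling to find stated anywhere.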

All that has changed with extensive additions to Chapter 3. Section 3.6, Combination, now contains a substantial amount of new text that explains grapheme clusters quite clearly. Don’t be put off by the dour-sounding title of Chapter 3, Conformance. It contains lots of useful definitions and explanations in the typical clear and succinct style of the book.

I have to admit to a tinge of disappointment that the Standard Annexes, which are now included in the book, have simply been added as appendices rather than integrated into the text proper. My evaluation copy didn’t actually contain this text, though, so I can’t comment further.

Also, I had decided a short while ago that I need to finally get to grips with Tibetan script, and some urgency has been added to that given that I will visit Bhutan in January. I was disappointed, therefore, to find that the section on Tibetan script had not been edited at all. That section has always been substandard, to my mind, in terms of clarity and writing style.

On the other hand, I see that useful additions have been made to existing block descriptions elsewhere (such as a new section on Rendering of Thai Combining Marks in the Thai description). I see similar additions to block descriptions such as Lao, Gujarati and Gurmukhi, and the Bengali block description seems to have been largely rewritten. I’m looking forward to getting my teeth into those and also the numerous, enticing new block descriptions, such as Phags-pa, N’Ko, Sumero-Akkadian (cuneiform) and the like.

So would I recommend it? Certainly. The Unicode Standard is a mine of useful and accessible information if, as I said, you are implementing Unicode-based applications or you are interested in how scripts work. And it’s worth replacing your previous version, not only because the new smaller format will make it much easier to handle and keep on your bookshelf, but because of the value of the many useful additions. I’ll be picking up my copy at the Unicode Conference in Washington next month.