As I mentioned a few days ago, my blogging tool, WordPress 1.5, doesn’t handle named entities as it should: when fed named entities, it outputs them as-is in every context. But while named entities are fine in (X)HTML, they are not in the various flavors of RSS/RDF, where XML parsers cannot resolve them.
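To make the problem concrete, here is a small sketch that checks whether a fragment is well-formed XML. It is only an illustration: the helper name `is_well_formed_xml` is made up for this example, and it assumes PHP 5 or later with the DOM extension, which is not what WordPress 1.5 itself relies on.

```php
<?php
// Hypothetical helper (not part of WordPress): reports whether a string
// parses as well-formed XML. Assumes PHP 5+ with the DOM extension.
function is_well_formed_xml($xml) {
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok !== false;
}

// A named entity with no DTD declaring it is rejected by the XML parser...
var_dump(is_well_formed_xml('<title>caf&eacute;</title>')); // bool(false)
// ...while the equivalent numeric character reference is accepted.
var_dump(is_well_formed_xml('<title>caf&#233;</title>'));   // bool(true)
```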
So, since I have two blogs running on WordPress that I don’t expect to upgrade to a more recent version anytime soon, I first coded up a plugin that converts HTML named entities into numeric entities. I’ve applied it to this very instance of WordPress, and it seems to work as expected.
And since a plugin that unbreaks something doesn’t feel quite right, I also made a proper patch that will hopefully fix the problem in future versions of WordPress if it is applied to the trunk; I have filed a bug report to that effect.
A few notes on the PHP code used to implement this change:
- I was first hoping that a simple function call à la
html_entity_decode($content, ENT_NOQUOTES, get_settings('blog_charset')) would do the right thing; unfortunately, a bug in PHP 4.x makes this fail for UTF-8, which is the encoding all my blogs run in (and, I expect, most WordPress blogs do too)
- then I was hoping to re-use the existing HTML entities table available in PHP through
get_html_translation_table(HTML_ENTITIES), but then again, that table doesn’t have all the named entities defined in HTML! A quick count shows that it has 107 known entities, when HTML defines 253 of them; I really can’t tell why this is so, and haven’t found a relevant bug report on this yet (although someone already reported that not all named entities were in the table)
- so, as a last resort, I had to build the mapping between named entities and their numeric equivalents myself; it was easy enough to extract it from the HTML DTD itself using:
grep -h '^<!ENTITY' /usr/share/xml/entities/xhtml/*.ent | sed -e 's/^<\!ENTITY[ \t]*\([A-Za-z0-9]*\)[ \t]*"&#\([0-9]*\);".*$/"\1"=>\2,/'
- the rest of the coding was then completely trivial
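For illustration, here is a minimal sketch of what such a conversion might look like; it is not the actual plugin or patch code. The function name `named_to_numeric_entities` and the three-entry table are assumptions made for this example; a real table would hold every `"name"=>code,` pair produced by the command above.

```php
<?php
// Hypothetical sketch, not the real plugin: rewrites named HTML entities
// as numeric character references using a hand-built lookup table.
function named_to_numeric_entities($content, $map) {
    // Build "&name;" => "&#code;" replacement pairs from the table.
    $pairs = array();
    foreach ($map as $name => $code) {
        $pairs['&' . $name . ';'] = '&#' . $code . ';';
    }
    // strtr() replaces each match once and never rescans its own output;
    // anything not in the table (such as &amp; here) is left untouched.
    return strtr($content, $pairs);
}

// Tiny excerpt of the table, in the format the sed command emits.
$entity_map = array(
    "nbsp"=>160,
    "eacute"=>233,
    "hellip"=>8230,
);

echo named_to_numeric_entities('caf&eacute;&hellip; &amp; more', $entity_map), "\n";
// prints: caf&#233;&#8230; &amp; more
```

Using an explicit table sidesteps both the UTF-8 problem with html_entity_decode and the incomplete table returned by get_html_translation_table, at the cost of maintaining the list by hand.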
The good news about all this is that it also made me discover how easy it is to create plugins for WordPress, so, inspiration permitting, I’ll be able to create more of them quite quickly from now on…