Don’t call me DOM


24 September 2004


I have been busy lately deploying a tool, informally called Annospam, that I (and others) started developing a year ago and that had been stalled since then. The tool makes it possible to cleanse the W3C Mailing List Archives of the huge amount of spam they host and are likely to continue to receive, however clever our anti-spam systems are getting.

The idea is to use the Annotea protocol to store and retrieve spam marks on archived messages, and to regenerate the relevant archives based on these marks. It uses lots of W3C technologies (XSLT to build the user interface, RDF/XML as the data format, HTTP as the query/update protocol), which makes it really interesting, if sometimes somewhat challenging.
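As a rough illustration of the data-format side, here is how a spam mark on an archived message could be written as a small RDF/XML document. This is only a sketch: the `SPAM` namespace, the `isSpam` property and the message URL are hypothetical names made up for the example, not the vocabulary that Annospam or Annotea actually uses.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical vocabulary, invented for this sketch only.
SPAM = "http://example.org/spam-mark#"


def build_spam_mark(message_url: str) -> str:
    """Return an RDF/XML document stating that message_url is marked as spam."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("s", SPAM)
    root = ET.Element(f"{{{RDF}}}RDF")
    # One rdf:Description per marked message, identified by its archive URL.
    description = ET.SubElement(
        root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": message_url}
    )
    mark = ET.SubElement(description, f"{{{SPAM}}}isSpam")
    mark.text = "true"
    return ET.tostring(root, encoding="unicode")


# Hypothetical archive URL, for illustration only.
print(build_spam_mark("http://example.org/archives/list/0001.html"))
```

Such a document would then be sent to the annotation server over HTTP; that step is left out here because it depends on the server setup.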

22 September 2004

XHTMLizer on steroids

One of the coolest things about XHTML is that it is an XML language, so you can apply any kind of XML tool to it.
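For instance, a generic XML parser is enough to pull the links out of an XHTML page. The sketch below uses Python's standard `xml.etree.ElementTree`; the sample document and the `extract_links` helper are made up for the illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up sample document, for illustration only.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample</title></head>
  <body>
    <p>See <a href="http://www.w3.org/">W3C</a> and
       <a href="http://www.w3.org/TR/xhtml1/">XHTML 1.0</a>.</p>
  </body>
</html>"""


def extract_links(xhtml: str) -> list[str]:
    """Return the href of every XHTML <a> element, in document order."""
    root = ET.fromstring(xhtml)
    # Elements live in the XHTML namespace, so the tag name must be qualified.
    return [a.get("href") for a in root.iter(f"{{{XHTML_NS}}}a") if a.get("href")]


print(extract_links(SAMPLE))
```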

One of the terrible things about XHTML (and XML more generally) is how hard it sometimes is to get it right.

One of the depressing things about building XML-based tools for Web technologies is that most of the content out there is in HTML (or the tag soup that some people call by that name), or in ill-formed XHTML.
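To make the problem concrete, here is a minimal sketch of a well-formedness check with Python's standard XML parser; `is_well_formed` is a hypothetical helper name. Anything that fails this check cannot be fed to XML tools without being cleaned up first.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Well-formed: the empty element is explicitly closed.
print(is_well_formed("<p>Hello <br/> world</p>"))
# Tag soup: <br> is never closed, so an XML parser rejects it.
print(is_well_formed("<p>Hello <br> world</p>"))
```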

13 September 2004

Give Spammers a rest!

Spammers, like many people down here, need to rest after all their efforts; spammers need to take a weekend break too, as the distribution of the number of messages per weekday shows:

Figure: Statistics of received messages per weekday (1 is Monday, 7 is Sunday).

These plots are based on the spam I received in the past 2 months.

Note that this interpretation is probably buggy; for instance, it's likely that a fair number of the zombie computers used to send spam are shut down during the weekend.
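Per-weekday counts of this kind can be computed from the `Date` headers of the messages, along the following lines. This is a sketch, not the script used for the plots: the `messages_per_weekday` helper and the sample headers are made up, and the weekday is taken in whatever timezone each header declares.

```python
from collections import Counter
from email.utils import parsedate_to_datetime


def messages_per_weekday(date_headers):
    """Count messages per ISO weekday (1 is Monday, 7 is Sunday)."""
    # Start every weekday at zero so that quiet days still show up.
    counts = Counter({day: 0 for day in range(1, 8)})
    for header in date_headers:
        counts[parsedate_to_datetime(header).isoweekday()] += 1
    return dict(sorted(counts.items()))


# Made-up Date headers, for illustration only.
SAMPLE_HEADERS = [
    "Mon, 13 Sep 2004 09:15:00 +0200",
    "Sat, 18 Sep 2004 23:05:00 +0200",
    "Mon, 20 Sep 2004 11:40:00 +0200",
    "Wed, 22 Sep 2004 18:30:00 +0200",
    "Fri, 24 Sep 2004 07:45:00 +0200",
]

print(messages_per_weekday(SAMPLE_HEADERS))
```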

Dominique Hazaël-Massieux is part of the World Wide Web Consortium (W3C) Staff; his interests cover a number of Web technologies, as well as the usage of open source software in a distributed work environment.