Don’t call me DOM

10 January 2005

Spellchecker code available


I’ve been asked privately whether the code of the spellchecker service run on the W3C site was available; it wasn’t, but now it is, along with the code of the Tidy online and HTTP HEAD services.

The spellchecker is a fairly simple Python wrapper around aspell that:

  • lets you pick the language of the document being spell-checked; ideally, this would be autodetected from the HTTP headers (Content-Language) and from the HTML document itself (through the lang/xml:lang attributes)
  • presents the errors found, and optionally the possible corrections
  • links to a different form (whose code hasn’t been released yet) to add words to the local dictionary
  • works for HTTP-protected resources (Basic Authentication only)
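For readers curious about what such a wrapper involves, here is a minimal sketch, not the released code. It runs aspell in its ispell-compatible pipe mode (aspell -a) and keeps the "&" lines (misspelled word with suggestions) and the "#" lines (misspelled word, no suggestions). The names Misspelling, parse_aspell_pipe_output and spellcheck are hypothetical, and the sketch assumes the document has already been reduced to plain text and that the language code (e.g. "en", "fr") is passed explicitly rather than autodetected.

```python
import subprocess
from typing import List, NamedTuple


class Misspelling(NamedTuple):
    word: str
    offset: int
    suggestions: List[str]


def parse_aspell_pipe_output(output: str) -> List[Misspelling]:
    """Parse the ispell-compatible output of `aspell -a`.

    Only '&' (misspelled, with suggestions) and '#' (misspelled, no
    suggestions) lines are kept; '*', '+' and '-' lines mean the word
    was accepted, and the '@(#)' banner line is ignored.
    """
    errors = []
    for line in output.splitlines():
        if line.startswith("& "):
            # Format: "& word count offset: sugg1, sugg2, ..."
            head, _, tail = line.partition(": ")
            _, word, _count, offset = head.split(" ")
            suggestions = [s.strip() for s in tail.split(",") if s.strip()]
            errors.append(Misspelling(word, int(offset), suggestions))
        elif line.startswith("# "):
            # Format: "# word offset"
            _, word, offset = line.split(" ")
            errors.append(Misspelling(word, int(offset), []))
    return errors


def spellcheck(text: str, lang: str = "en") -> List[Misspelling]:
    """Run aspell in pipe mode on plain text (aspell must be installed)."""
    # Prefix each line with '^' so that lines starting with ispell
    # command characters (such as '*' or '#') are checked as plain text.
    piped = "".join("^" + line + "\n" for line in text.splitlines())
    result = subprocess.run(
        ["aspell", "-a", "--lang=" + lang],
        input=piped, capture_output=True, text=True, check=True,
    )
    return parse_aspell_pipe_output(result.stdout)
```

Keeping the parser separate from the subprocess call makes it testable without aspell installed; the offsets are passed through as aspell reports them, so mapping them back to positions in the original HTML would need extra work.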

For the last point, it relies on a Python module I use in pretty much all my Python CGIs, http_auth.py, which basically intercepts the 401 (Unauthorized) responses received when doing an HTTP GET, sends them back to the originator, and re-uses the originator’s credentials to redo the request.
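The relaying idea can be sketched roughly as follows, using Python 3’s urllib.request. This is an illustration of the mechanism, not http_auth.py itself; build_request, challenge_response and fetch are hypothetical names, and the sketch assumes the browser’s credentials reach the CGI in the HTTP_AUTHORIZATION environment variable.

```python
import urllib.error
import urllib.request
from typing import Mapping, Optional, Tuple


def build_request(url: str, environ: Mapping[str, str]) -> urllib.request.Request:
    """Build a GET request that re-uses the originator's credentials, if any."""
    request = urllib.request.Request(url, method="GET")
    credentials = environ.get("HTTP_AUTHORIZATION")
    if credentials:
        # Forward the browser's Authorization header unchanged.
        request.add_header("Authorization", credentials)
    return request


def challenge_response(www_authenticate: str) -> str:
    """CGI output that sends the 401 challenge back to the originator."""
    return (
        "Status: 401 Unauthorized\n"
        "WWW-Authenticate: " + www_authenticate + "\n"
        "Content-Type: text/plain\n"
        "\n"
        "Authentication is required to check this document.\n"
    )


def fetch(url: str, environ: Mapping[str, str]) -> Tuple[Optional[bytes], Optional[str]]:
    """Return (body, None) on success, or (None, cgi_output) on a 401."""
    try:
        with urllib.request.urlopen(build_request(url, environ)) as response:
            return response.read(), None
    except urllib.error.HTTPError as err:
        challenge = err.headers.get("WWW-Authenticate")
        if err.code != 401 or not challenge:
            raise
        return None, challenge_response(challenge)
```

A CGI would call fetch(url, os.environ); when the second element is not None, it prints that string as its whole response. The browser then asks the user for a login and password and re-submits the form with an Authorization header, which build_request forwards on the next attempt.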

If this feature is to be used:

  • http_auth.py must be in the path where Python searches for modules, that is either in the same directory as the CGI script itself, in the global directory where Python searches (à la /usr/lib/python2.3/site-packages/), or in any directory added manually to the path using sys.path.insert(0, '/path/to/my/directory/') in the CGI script
  • the HTTP header containing the authentication credentials must be passed to the CGI script, which no sane Web server configuration does by default; in Apache, this requires the following mod_rewrite directive (with RewriteEngine On):
    RewriteRule ^name_of_the_script(.*) name_of_the_script$1 [E=HTTP_AUTHORIZATION:%{HTTP:AUTHORIZATION},PT,L]
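Once the directive is in place, the credentials show up in the CGI environment as HTTP_AUTHORIZATION. A minimal sketch of how a script might check for them and decode Basic credentials follows; basic_credentials is a hypothetical helper, not part of http_auth.py.

```python
import base64
import binascii
import os
from typing import Mapping, Optional, Tuple


def basic_credentials(environ: Mapping[str, str] = os.environ) -> Optional[Tuple[str, str]]:
    """Return (user, password) from HTTP_AUTHORIZATION, or None.

    None means the browser sent no Basic credentials, or the server did
    not pass the header to the CGI (e.g. the RewriteRule is missing).
    """
    header = environ.get("HTTP_AUTHORIZATION", "")
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    # Split on the first colon only: passwords may themselves contain colons.
    user, sep, password = decoded.partition(":")
    if not sep:
        return None
    return user, password
```

Getting None on every request, even after the browser has prompted for and sent a login, is a fairly good sign that the RewriteRule is not active.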

If the script is going to be used without HTTP authentication proxying, it is probably simpler to remove the corresponding code, namely by replacing:

    import http_auth
    url_opener = http_auth.ProxyAuthURLopener()

by

    import urllib
    url_opener = urllib.FancyURLopener()
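The replacement above is Python 2 code, matching the python2.3 setup mentioned earlier. In Python 3, FancyURLopener lives in urllib.request and has been deprecated since 3.3, so a rough equivalent would use build_opener instead; fetch_document is a hypothetical name used only for illustration.

```python
import urllib.request


def fetch_document(url: str, timeout: float = 30.0) -> bytes:
    """Fetch url with the standard opener; no credential proxying."""
    opener = urllib.request.build_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Unlike FancyURLopener, this opener raises urllib.error.HTTPError on error statuses such as 404 or 401, so the calling script would need to catch it.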

Patches to the code are welcome.

Dominique Hazaël-Massieux (dom@w3.org) is part of the World Wide Web Consortium (W3C) Staff; his interests cover a number of Web technologies, as well as the usage of open source software in a distributed work environment.