Don’t call me DOM

23 September 2005

Introducing “BigBrother”, yet another Semantic Web bot

Like anybody really interested in the Semantic Web, I had to write a Semantic Web-enabled IRC bot; and so I did…

Well actually, the first thing I did was to review the existing bots; I’m OK with writing a bot to prove my attachment to the Semantic Web, but not if that requires some actual work. And the good news is: there is plenty of existing code to work from. A few notes on the ones I quickly looked at:

  • Emeka, which I didn’t really test, since it relies on a recent API addition to 4Suite that wasn’t available on my Debian testing system
  • FOAFBot had a lot of the features I was looking for in my bot, but seemed a bit too complex for the amount of work I wanted to put into the project; I was really looking for a toy, nothing more
  • swBot sounded interesting, but was obviously unmaintained, since its code hadn’t changed in the past 22 months; in particular, it wouldn’t run with a recent version of cwm

The final winner of this harsh selection was Julie, also known as Redlandbot; it almost failed my tests because its actual code was so hard to find. (Christopher, if you read this: Julie’s homepage links to a non-existent SVN repository.)

But I eventually found it, and downloaded the four modules needed: redlandbot.py, flowcontrol.py, datauri.py, and wrapper.py. I quickly understood that redlandbot was a perfect match: it relies on Redland, and so provides seamless access to a complete and compliant RDF parser and to a reasonably complete SPARQL engine (thus allowing one to query RDF statements); furthermore, the code was in Python, easy enough for me to read and understand, and seemed fairly easy to extend.

And indeed, since one of the features I wanted for my bot was the ability to read iCalendar resources, it took me only a few minutes and exactly 11 lines to add support for that by modifying the modules managing Web access. Of course, parsing an iCalendar resource takes more than 11 lines of code, but again, it was easy to re-use existing code: in this particular case, the iCalendar-to-RDF converter developed in the Semantic Web Interest Group.
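Redlandbot’s own Web-access code isn’t reproduced here, but the kind of change involved can be sketched: once a resource is fetched, the bot has to decide which converter turns it into RDF, which is what the “by adding RDF/XML / GRDDL / ICS data” replies in the transcripts report. The function name pick_converter and its detection rules are hypothetical, assumed for illustration; they are not the bot’s actual 11 lines.

```python
def pick_converter(content_type: str, body: str) -> str:
    """Return a label naming the converter for a fetched resource.

    Hypothetical sketch: the labels mirror the bot's replies, but the
    detection rules are assumptions, not redlandbot's code.
    """
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "application/rdf+xml":
        return "RDF/XML"
    # iCalendar data goes to an iCalendar-to-RDF converter; some servers
    # mislabel .ics files, so the body is sniffed as well.
    if mime == "text/calendar" or body.lstrip().startswith("BEGIN:VCALENDAR"):
        return "ICS"
    # (X)HTML pages may declare GRDDL transformations.
    if mime in ("text/html", "application/xhtml+xml"):
        return "GRDDL"
    raise ValueError(f"no converter for {mime!r}")
```

For instance, pick_converter("text/calendar; charset=utf-8", "BEGIN:VCALENDAR") returns "ICS", while an unknown type such as "image/png" raises ValueError rather than being silently ignored.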

So, what’s my new bot about anyway? Well, it doesn’t really do much more than Julie, to the point that I probably should have imported Julie’s code instead of modifying it; but I didn’t know that when I started. Anyway, here are the few additional features it has:

  • it reads iCalendars as mentioned
  • it uses a slightly more user-friendly interaction language: instead of addressing it with commands starting with a caret (^), you ask “BigBrother, do this”, which I find easier to read, and which also allows more than one bot to be on a channel without all of them reacting to the same input
  • instead of stored commands, BigBrother stores questions; as demonstrated below, you can ask it “BigBrother, where is dom?”
  • its default set of questions is pretty cool; you can ask where someone is on a given date, what’s someone’s phone number, etc; see below for more details
  • it doesn’t rely on a database connection, but on in-memory storage, making it slightly easier to run anywhere, but probably slower too
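The addressing convention and the stored questions can be sketched in a few lines of Python. This is a minimal sketch assuming questions are recognised with regular expressions; the names addressed_text and match_question, and the two patterns in QUESTIONS, are hypothetical and not BigBrother’s code.

```python
import re

BOT_NICK = "BigBrother"

# Hypothetical stored questions: each regular expression maps to the name
# of a question whose parameters are captured as named groups.
QUESTIONS = [
    (re.compile(r"^where is (?P<who>\w+)\??$", re.I), "whereis"),
    (re.compile(r"^(?P<who>\w+)'s phones\??$", re.I), "phones"),
]


def addressed_text(line, nick=BOT_NICK):
    """Return the text after 'Nick,' or 'Nick:', or None if the line
    is not addressed to this bot."""
    m = re.match(rf"^{re.escape(nick)}\s*[,:]\s*(.*)$", line, re.I)
    return m.group(1).strip() if m else None


def match_question(text):
    """Return (question name, captured parameters), or None."""
    for pattern, name in QUESTIONS:
        m = pattern.match(text)
        if m:
            return name, m.groupdict()
    return None
```

Here match_question(addressed_text("BigBrother, where is dom?")) gives ("whereis", {"who": "dom"}); a line addressed to another nick yields None, which is what lets several bots share a channel.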

So, here come a few examples of what the bot can do; I’ll start with a couple of fake URIs and modified responses, since the data on which the bot was working when this was captured are not publicly available.

So, let’s say that we have an HTML phonebook available at http://www.example.org/team-phonelist, and that this HTML has been annotated as FOAF data using GRDDL, for instance with one of my GRDDL transformations. We can ask the bot to load that page directly:

<dom> BigBrother, load http://www.example.org/team-phonelist
<BigBrother>  Model size increased by 571 to 11386 by adding GRDDL data.
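Loading a GRDDL-annotated page starts with finding the transformations the page declares. In the GRDDL convention, the HTML head carries the profile http://www.w3.org/2003/g/data-view and points to XSLT transformations with rel="transformation". The sketch below, using only Python’s standard html.parser, assumes that convention; it only discovers the transformation URLs, and applying the XSLT with an external processor is left out. The class and function names, and the .xsl file name in the usage note, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


class TransformationFinder(HTMLParser):
    """Collect GRDDL transformation links from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.has_profile = False
        self.transformations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            # The profile attribute may list several space-separated URIs.
            profiles = (attrs.get("profile") or "").split()
            self.has_profile = GRDDL_PROFILE in profiles
        elif tag in ("link", "a"):
            rels = (attrs.get("rel") or "").split()
            if "transformation" in rels and attrs.get("href"):
                # Resolve relative hrefs against the page's URL.
                self.transformations.append(urljoin(self.base_url, attrs["href"]))


def find_transformations(html, base_url):
    finder = TransformationFinder(base_url)
    finder.feed(html)
    finder.close()
    # Only honour transformation links when the GRDDL profile is declared.
    return finder.transformations if finder.has_profile else []
```

With a page at http://www.example.org/team-phonelist linking to a hypothetical hypothetical-foaf.xsl, this returns ["http://www.example.org/hypothetical-foaf.xsl"]; the same page without the profile returns an empty list.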

And then ask it for specific information from what it just read:

<dom> BigBrother, dom's phones?
<BigBrother>  tel:+33.4.55.55.55.55 Hazaël-Massieux
<BigBrother>  tel:+1.555.123.456 Hazaël-Massieux
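Behind a question such as “dom’s phones?” there is a SPARQL query over the FOAF data just loaded. The template below is an assumption written to match the two columns of the reply (a tel: URI, then a surname): it supposes people are identified by foaf:nick and that the data use foaf:phone and foaf:surname, which may not be what the real stored question does. The names PHONES_QUERY, sparql_literal and phones_query are hypothetical.

```python
# Hypothetical stored question; the variable order matches the reply
# format "tel:... Surname" seen in the transcript.
PHONES_QUERY = """PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?phone ?surname WHERE {
  ?person foaf:nick %(nick)s .
  ?person foaf:phone ?phone .
  ?person foaf:surname ?surname .
}"""


def sparql_literal(value: str) -> str:
    """Quote a string as a SPARQL literal, escaping backslashes and quotes
    so user input cannot break out of the query."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def phones_query(nick: str) -> str:
    return PHONES_QUERY % {"nick": sparql_literal(nick)}
```

Escaping the user-supplied nick, rather than pasting it in as-is, keeps an IRC user from injecting extra triple patterns into the stored question.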

Now, let’s ask it to look at an iCalendar file at http://www.example.org/team-calendar which holds information on who is where and when:

<dom> BigBrother,  load http://www.example.org/team-calendar
<BigBrother>  Model size increased by 1046 to 12432 by adding ICS data.
<dom> BigBrother, where is jo?
<BigBrother>  Lambda, Joe: Foo meeting, Edinburgh
<dom> BigBrother, where will be jo tomorrow?
<BigBrother>  Lambda, Joe: National Holiday
<dom> BigBrother, where will be jo on Dec 25?
<BigBrother>  Lambda, Joe: Vacation
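Once the calendar is converted to RDF, a “where is” question boils down to finding the events concerning one person that cover a given day. The sketch below does that over plain Python objects rather than RDF, assuming each event records who it concerns in an attendee field; the Event class and where_is function are hypothetical. One detail worth getting right: for all-day iCalendar events, DTEND is the non-inclusive end, so an event ending on the 24th does not cover the 24th.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    attendee: str   # hypothetical: who the event concerns
    summary: str
    location: str
    start: date     # first day of the event
    end: date       # exclusive, like an all-day DTEND in iCalendar


def where_is(events, who, day):
    """Return 'summary, location' strings for events of `who` covering `day`."""
    hits = []
    for ev in events:
        if ev.attendee == who and ev.start <= day < ev.end:
            # Skip an empty location instead of printing a dangling comma.
            hits.append(", ".join(p for p in (ev.summary, ev.location) if p))
    return hits
```

With a made-up event Event("jo", "Foo meeting", "Edinburgh", date(2005, 9, 22), date(2005, 9, 24)), asking for 23 September returns ["Foo meeting, Edinburgh"], while 24 September returns nothing.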

As you can see in the exchange above, the bot is capable of parsing dates expressed in various formats, thanks partly to mx.DateTime; unfortunately, Python doesn’t seem to have a function or module nearly as powerful as PHP’s strtotime.
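Python’s standard library has no strtotime equivalent, but the handful of phrases used in the transcript can be handled with datetime alone. This minimal sketch (the parse_when name and its supported formats are assumptions, and it does not use mx.DateTime) covers “today”, “tomorrow”, “yesterday”, ISO dates, and month-day phrases, which it assumes refer to the current year.

```python
from datetime import date, datetime, timedelta

RELATIVE = {"today": 0, "tomorrow": 1, "yesterday": -1}


def parse_when(phrase, today=None):
    """Turn 'tomorrow', 'on Dec 25' or '2005-12-25' into a date.

    A deliberately small stand-in for PHP's strtotime: only the formats
    handled below are understood.
    """
    if today is None:
        today = date.today()
    text = phrase.strip().lower()
    if text.startswith("on "):
        text = text[3:].strip()
    if text in RELATIVE:
        return today + timedelta(days=RELATIVE[text])
    # ISO dates such as 2005-12-25.
    try:
        return datetime.strptime(text, "%Y-%m-%d").date()
    except ValueError:
        pass
    # Month-day phrases such as "dec 25" or "december 25", assumed to be in
    # the current year; appending the year means strptime never guesses it.
    for fmt in ("%b %d %Y", "%B %d %Y"):
        try:
            return datetime.strptime(f"{text} {today.year}", fmt).date()
        except ValueError:
            continue
    raise ValueError(f"cannot understand date {phrase!r}")
```

Month names are matched with the current locale’s names, so this assumes an English (or C) locale.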

Now, let’s look at other questions, this time with publicly available information. The list of W3C Team members is available as GRDDL-annotated HTML exposing FOAF data, which allows us to ask the following:

<dom> BigBrother,  load http://www.w3.org/People/all
<BigBrother>  Model size increased by 482 to 12914 by adding GRDDL data.
<dom> BigBrother, what does steven look like?
<BigBrother>  http://www.cwi.nl/~steven/steven-london.jpg Pemberton
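Every load reply reports how much the model grew. With in-memory storage, such a reply can be produced by comparing the store’s size before and after adding the new statements. The toy MemoryModel below stands in for Redland’s in-memory model and is not its API; because it keeps statements in a set, a statement loaded twice is only counted once in this sketch.

```python
class MemoryModel:
    """Toy in-memory triple store: a set of (subject, predicate, object) tuples.

    Hypothetical stand-in for the bot's in-memory Redland storage.
    """

    def __init__(self):
        self.triples = set()

    def size(self):
        return len(self.triples)

    def add_all(self, triples, source_label):
        """Add statements and return a reply in the bot's format."""
        before = self.size()
        self.triples.update(triples)
        added = self.size() - before
        return (f"Model size increased by {added} to {self.size()} "
                f"by adding {source_label} data.")
```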

Of course, we can also load RDF/XML directly:

<dom> BigBrother,  load http://www.w3.org/2002/01/tr-automation/tr.rdf
<BigBrother>  Model size increased by 2938 to 15852 by adding RDF/XML data.
<dom> BigBrother, what specifications did Dom edit?
<BigBrother>  Dominique Hazaël-Massieux Gleaning Resource Descriptions from Dialects of Languages (GRDDL)
<BigBrother>  Dominique Hazaël-Massieux QA Framework: Specification Guidelines
<BigBrother>  Dominique Hazaël-Massieux Variability in Specifications

And more generally, we can feed the bot any SPARQL query (the line breaks were added for the sake of readability):

<dom> BigBrother, query PREFIX rec: <http://www.w3.org/2001/02pd/rec54#>
            PREFIX dc: <http://purl.org/dc/elements/1.1/>
            SELECT ?title WHERE {
            ?tr a rec:REC .
            ?tr rec:patentRules <http://www.w3.org/Consortium/Patent-Policy-20040205/> .
            ?tr dc:title ?title }
<BigBrother>  xml:id Version 1.0
<BigBrother>  QA Framework: Specification Guidelines

All in all, I had a lot of fun playing with this, especially seeing how powerful SPARQL is, and how useful RDF can be as a data aggregation format. I’m not sure I’ll put any more effort into the code at this point, unless someone else works with me on it; a few improvements are already needed to make it robust enough for real use.

And it shows again how much power one can gain from open source: had it not been for Dave’s, Christopher’s, DanC’s and others’ existing tools, I wouldn’t even have been able to start thinking about developing such a tool.

6 Responses to “Introducing “BigBrother”, yet another Semantic Web bot”

  1. B:datenbrei » Blog Archive » short update Says:

    […] Now, I’ve been in bed for a few days, thus haven’t been hacking. Now I’m trying to catch up with all the interesting stuff around. My main interest right now is Sparql/XMPP, and Jena for that. I’m going to implement a bot very similar to BigBrother, but split up into a bot that talks with the user and one that serves Sparql queries via XMPP, which seems to me like a natural extension of the RDF Net API. […]

  2. Stephanie_B Says:

    I think we’ll see a lot of interesting results when people start stress-testing SPARQL (i.e. writing queries that are more than just SELECT ?x WHERE (?x foaf:knows rich:Rich)). It’s debatable whether SPARQL is “good enough” (striking the right balance of simplicity and familiarity versus power), or whether missing features or a basis in SQL is detrimental, and I think that exposure to real applications will illuminate this.

  3. John Beale Says:

    I had my doubts when I first saw the SQL style in RDQL (or whichever), wondering why that approach was taken rather than an RPath. The latter did seem the intuitive approach. Since then I’ve only read the articles on the Path side, but have played around with the SQL side. Ok, I’ve run into some situations where something like a recursive call would have been welcome. But generally SPARQL appears to work a treat, and I reckon conceptually very easy for anyone with a vague idea of SQL.

  4. Andy Freeman Says:

    We have a bot project called Intel/Agent. You can find it at:

    http://www.intelagent.org

  6. nchauvat Says:

    I refactored the code of redlandbot. It was fun. If you want a more general-purpose intelligent agent, try http://www.logilab.org/project/narval

Dominique Hazaël-Massieux (dom@w3.org) is part of the World Wide Web Consortium (W3C) Staff; his interests cover a number of Web technologies, as well as the usage of open source software in a distributed work environment.