Don’t call me DOM

28 August 2007

New W3C GRDDL service

Screenshot of the new W3C GRDDL Service

To celebrate the progress of GRDDL towards its final stage, and to replace the aging and somewhat unreliable XSLT-based GRDDL demonstrator, I’ve just released a brand new W3C GRDDL service.

It simply takes a Web page and extracts the RDF statements it can find in there using GRDDL.
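
As a rough illustration of the discovery step, here is a minimal Python sketch that lists the GRDDL transformations an XHTML page declares explicitly. It is not the service's code: the function name, the sample page and the example.org stylesheet URL are made up for illustration, and it leaves out profile-based and namespace-based transformations as well as the XSLT step that actually produces the RDF.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
# Profile URI that marks an XHTML page as carrying GRDDL transformation links.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


def find_grddl_transformations(xhtml_source):
    """Return the transformation URIs explicitly declared in an XHTML page.

    Hypothetical helper: relative hrefs are returned as-is, not resolved.
    """
    root = ET.fromstring(xhtml_source)
    head = root.find(XHTML_NS + "head")
    if head is None:
        return []
    # The head's profile attribute is a space-separated list of URIs.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    uris = []
    for element in root.iter():
        if element.tag not in (XHTML_NS + "link", XHTML_NS + "a"):
            continue
        # rel is also a space-separated list of tokens.
        if "transformation" in element.get("rel", "").split():
            href = element.get("href")
            if href:
                uris.append(href)
    return uris


# Made-up page used only to exercise the sketch.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Hypothetical page</title>
    <link rel="transformation" href="http://example.org/extract-rdf.xsl" />
  </head>
  <body><p>Hello</p></body>
</html>"""

if __name__ == "__main__":
    print(find_grddl_transformations(SAMPLE))
```

Each URI found this way points to a transformation (typically XSLT) that, applied to the page, yields RDF.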

I have made its source code available on the W3C Public CVS server, but the gist of the work is done by the underlying library, python-librdf, the Python binding for Redland. Particular thanks to Dave Beckett, who provided amazing user support to help set this up!

11 Responses to “New W3C GRDDL service”

  1. Justin Thorp Says:

    So for the average Joe developer, what does this mean to them? Will this take my microformatted data and turn it into RDF? If so, once the data is in RDF what would we do with it?

  2. dom Says:

    I fear it would be too long to answer that question in a single blog comment but hopefully the GRDDL primer answers some of it already:

    Another choice is to use microformats. A microformat that allows for more information about friends to be gleaned from the document is XFN, “XHTML Friends Network”. Examples of such relationships are friends, colleagues, co-workers, and so on, as given in this example file.

    Since XFN relationships are embedded in anchor (a) elements, they can be expressed in RDF in a variety of ways. Given Jane’s HTML document uses the XFN microformat, a GRDDL transformation can extract RDF data. These descriptions would allow a RDF spider (a “scutter”) to follow links to additional RDF content that may include more XFN, vCard, or FOAF descriptions.

    To make a long story short, a microformat that uses a profile URI can be made GRDDL-friendly very easily, and thus provide a wealth of data to the Semantic Web; these data can then be used and re-used, typically in mash-up services.

  3. Justin Thorp Says:

    So for me as the end-user I don’t really have to worry about GRDDL or RDF, I just have to make sure that my blogroll is marked up using the XFN microformat? Someone else is going to screen scrape my blog to gather my relationships data and translate that into something more Semantic Web friendly?

  4. dom Says:

    Indeed (with the caveat that not all microformats are GRDDL-friendly).

  5. masaka Says:

    Hi, nice to see new service.

    Unfortunately, this doesn’t seem to work well with a profile-based GRDDL whose encoding is not UTF-8 (it is OK for link-based GRDDL). For example, I can get a proper result from http://www.kanzaki.com/docs/sw/ using the XSLT service, but I get an “Input is not proper UTF-8” error from the new service.

    Could you check this issue?

    thank you.

  6. Dom Says:

    Indeed, that’s a bug; I’m looking into it, but haven’t found a straightforward workaround yet…

  7. Dom Says:

    Looking at it more closely, it happens because the profile document http://www.kanzaki.com/ns/metaprof has its encoding only declared in the HTTP header, not in the XML encoding declaration.

    Of course, this is acceptable per the spec, so I have reported a bug in the underlying library:
    http://bugs.librdf.org/mantis/view.php?id=231
    (it ties back to a bug I had reported a while back in libxml2 I fear:
    http://bugzilla.gnome.org/show_bug.cgi?id=104790 )

  8. masaka Says:

    OK, I added an XML declaration to my profile, and got proper result.

    thank you for quick response!

  9. Dave Beckett Says:

    Turns out that due to the way I use libxml as a push parser, there is no way to pass in an external encoding to xmlCreatePushParserCtxt:

    xmlParserCtxtPtr
    xmlCreatePushParserCtxt(xmlSAXHandlerPtr sax, void *user_data,
    const char *chunk, int size, const char *filename);

    Either libxml would need to add a new API, or I would have to call it a different way with knowledge of its internals (yuck), or libxml would need internal changes.

  10. Laurent Saint Jean Says:

    This service seems fabulous, but I get the UTF-8 error too when trying the service on the 3 sites I’ve tested.

    And when trying to parse this very blog, I get a blank page on Mac-Firefox2 (scripts are enabled).

    I still have the encoding stipulated though.

    ??

  11. Nodalities » Blog Archive » This Week’s Semantic Web Says:

    [...] W3C GRDDL service – announcement – also [...]
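
The encoding problem discussed in comments 5 to 9 can be sketched in Python as follows. This is only one possible workaround under stated assumptions, not what the service or Redland does: the helper names are hypothetical, and the idea is to transcode the document to UTF-8, using the charset from the HTTP header when there is one, before it reaches a parser that cannot be told about an external encoding.

```python
import re
from email.message import Message

# Matches the encoding pseudo-attribute of a leading XML declaration.
XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def charset_from_header(content_type):
    """Extract the charset parameter from a Content-Type header value."""
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset()  # None when no charset is given


def to_utf8(body, content_type=None):
    """Return body re-encoded as UTF-8 (hypothetical helper).

    The HTTP charset wins when present; otherwise the XML declaration
    is used, and UTF-8 is assumed when neither gives an encoding.
    """
    encoding = charset_from_header(content_type) if content_type else None
    if encoding is None:
        match = XML_DECL_ENCODING.match(body)
        encoding = match.group(1).decode("ascii") if match else "utf-8"
    text = body.decode(encoding)
    # Any existing declaration would now name the wrong encoding: rewrite it.
    text = re.sub(
        r"^<\?xml[^>]*\?>",
        '<?xml version="1.0" encoding="UTF-8"?>',
        text,
        count=1,
    )
    return text.encode("utf-8")


if __name__ == "__main__":
    # Encoding declared only in the HTTP header, as in the reported case.
    raw = "<p>caf\u00e9</p>".encode("iso-8859-1")
    print(to_utf8(raw, "application/xml; charset=ISO-8859-1"))
```

Without the header, the same bytes carry no declared encoding and fail to decode as UTF-8, which mirrors the error reported above.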

Dominique Hazaël-Massieux (dom@w3.org) is part of the World Wide Web Consortium (W3C) Staff; his interests cover a number of Web technologies, as well as the usage of open source software in a distributed work environment.