RDF


As the W3C Team lead for financial data and the Semantic Web, I am looking at how the Web is changing the way investors assess the value of companies.

Public companies worldwide are required to file regular reports setting out the financial health of the company. These are available from corporate investor relations websites and from regulatory agencies like the Securities and Exchange Commission (SEC). If you want to analyze this data, you have to re-key it, which involves a lot of work and introduces errors. That is all about to change.

The SEC and kindred agencies around the world are starting to require companies to file reports in XBRL (the eXtensible Business Reporting Language). XBRL ties each reported item of data to the reporting concept used to collect it, and does so in a way that computers can make sense of, avoiding the need to re-key data.
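To make the idea of tying each reported item to its concept more concrete, here is a minimal sketch in Python. The instance document is invented and heavily simplified: the "ex" taxonomy, the entity identifier scheme and the figures are all assumptions for illustration, and only the xbrli namespace and the contextRef/unitRef attributes come from XBRL itself. Real filings also carry schema references and linkbases, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

NS = {"xbrli": "http://www.xbrl.org/2003/instance"}

# Hypothetical instance: the "ex" taxonomy, identifier scheme and values are invented.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="FY2008">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/ids">ACME</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2008-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Assets contextRef="FY2008" unitRef="USD" decimals="0">1500000</ex:Assets>
  <ex:Liabilities contextRef="FY2008" unitRef="USD" decimals="0">900000</ex:Liabilities>
</xbrli:xbrl>"""


def read_facts(xml_text):
    """Return one dict per fact, resolved against its context and unit."""
    root = ET.fromstring(xml_text)

    # Contexts say who the fact is about and when it applies.
    contexts = {}
    for ctx in root.findall("xbrli:context", NS):
        contexts[ctx.get("id")] = {
            "entity": ctx.findtext("xbrli:entity/xbrli:identifier", namespaces=NS),
            "instant": ctx.findtext("xbrli:period/xbrli:instant", namespaces=NS),
        }

    # Units say what the number is measured in.
    units = {
        unit.get("id"): unit.findtext("xbrli:measure", namespaces=NS)
        for unit in root.findall("xbrli:unit", NS)
    }

    facts = []
    for el in root:
        context_id = el.get("contextRef")
        if context_id is None:
            continue  # contexts and units themselves are not facts
        # ElementTree tags look like "{namespace}LocalName".
        namespace, _, concept = el.tag[1:].partition("}")
        facts.append({
            "concept": concept,
            "taxonomy": namespace,
            "value": (el.text or "").strip(),
            "unit": units.get(el.get("unitRef")),
            **contexts[context_id],
        })
    return facts


for fact in read_facts(INSTANCE):
    print(fact["entity"], fact["instant"], fact["concept"], fact["value"], fact["unit"])
```

Each number arrives already labelled with its concept, reporting entity, date and unit, which is what removes the need for re-keying.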

XBRL will allow investor relations sites to support interactive access to, and sharing of, tagged financial data. This will build upon Web 2.0 and the phenomenon of user-provided content on wikis, blogs and social networking sites aimed at investors, e.g. wikinvest, where investors share data, insights and analyses. YouTube provides a powerful precedent in the way it allows people to share content by embedding a view, or a link to a view, in their blogs.

For XBRL, this means providing a way for people to browse the data, and to pull out tables and charts as needed for their blogs. These items could be rendered by the investor relations site and shared à la YouTube, or the blog could itself make use of a script to query data across one or more investor relations sites and render it locally. This is where the Semantic Web and linked open data come in. I’ve previously reported on techniques for converting XBRL into RDF triples.

W3C and XBRL International are looking for your help in understanding what some people are calling “Investor relations 2.0”, and we invite you to attend a workshop at the FDIC training facility in Arlington, Virginia, this October. We want your help with identifying the opportunities and challenges for interactive access to business and financial data expressed in XBRL and related languages. This doesn’t just apply to the investor community, as the same technologies also offer huge potential for data published by governments on sites like data.gov (see demos). For more details on the workshop see the call for papers.

JustSystems have kindly agreed to sponsor me as a W3C Fellow for work on XBRL and the Semantic Web. XBRL gives precise semantics to financial reports and has the backing of financial institutions around the world. There is tremendous potential for combining XBRL with the Semantic Web as a means to support the analysis and exploration of huge amounts of financial data. I hope to explore this potential in collaboration with XBRL International, the research groups working in this area, and the many individuals and companies interested in XBRL. Some of the things under consideration include open source tools, ontologies for relating XBRL taxonomies, and an exploration of ramifications for both XBRL and the Semantic Web, e.g. provenance and authenticity, the closed world assumption, and mathematical relationships within financial data. A likely starting point could be the launch of an Interest Group or even an Incubator Group to explore possible standardization activities complementing the role of XBRL International.

XBRL is an XML language designed for filing company reports and backed by the SEC and regulatory authorities in Europe and Japan. It makes it possible to be very precise about the accounting concepts used in a particular report, and includes the means to define extensions to existing taxonomies. XBRL makes extensive use of XLink and as a result is hard to process with XSLT. I am exploring how to translate XBRL into RDF Turtle syntax with C and libxml2, and preliminary experiments are very promising. The code processes the XBRL instance, its schema and all associated linkbases to extract RDF triples, which are loaded into a scalable triple store such as Sesame. XBRL viewers can then be implemented as server-side scripts that query the triple store via SPARQL, which is much easier than manipulating the original XML files.
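As a rough illustration of the general idea, and not of the C and libxml2 converter itself, here is a simplified Python sketch that turns the facts in a tiny instance document into Turtle. It handles only the instance, not the schema or linkbases. The x: vocabulary, the "ex" taxonomy, the way a concept name is turned into an IRI (namespace plus "#" plus local name) and the sample figures are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary for illustration; not a published ontology.
VOCAB = "http://example.com/xbrl-rdf#"

# Hypothetical instance: the "ex" taxonomy and the values are invented.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:ex="http://example.com/taxonomy">
  <xbrli:context id="FY2008">
    <xbrli:entity>
      <xbrli:identifier scheme="http://example.com/ids">ACME</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period><xbrli:instant>2008-12-31</xbrli:instant></xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <ex:Assets contextRef="FY2008" unitRef="USD" decimals="0">1500000</ex:Assets>
  <ex:Liabilities contextRef="FY2008" unitRef="USD" decimals="0">900000</ex:Liabilities>
</xbrli:xbrl>"""

# A query that a viewer script might send to the triple store (assumed shape,
# matching the hypothetical vocabulary above; it is not executed here).
QUERY = """PREFIX x: <http://example.com/xbrl-rdf#>
SELECT ?concept ?value
WHERE { ?fact a x:Fact ; x:concept ?concept ; x:value ?value . }"""


def facts_to_turtle(xml_text):
    """Emit one blank node per numeric fact, written in Turtle."""
    root = ET.fromstring(xml_text)
    lines = [
        f"@prefix x: <{VOCAB}> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        "",
    ]
    for el in root:
        context = el.get("contextRef")
        if context is None:
            continue  # contexts and units carry no contextRef
        # ElementTree tags look like "{namespace}LocalName".
        namespace, _, local = el.tag[1:].partition("}")
        value = (el.text or "").strip()
        lines.append("[] a x:Fact ;")
        # Assumption: concept IRI is namespace + "#" + local name.
        lines.append(f"   x:concept <{namespace}#{local}> ;")
        # Context and unit ids are XML names, so no string escaping is needed.
        lines.append(f'   x:context "{context}" ;')
        unit = el.get("unitRef")
        if unit is not None:
            lines.append(f'   x:unit "{unit}" ;')
        # Assumption: every fact in this sample is a plain decimal number.
        lines.append(f'   x:value "{value}"^^xsd:decimal .')
    return "\n".join(lines) + "\n"


print(facts_to_turtle(INSTANCE))
```

Once triples like these are in a store, a viewer only needs a query along the lines of QUERY rather than code that follows XLink references through the original XML.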

This also opens the theoretical possibility for XBRL filings to be submitted in one of the RDF syntaxes, e.g. Turtle. The current XML syntax makes use of XML Schema to assist with validation of XBRL filings, and it will be interesting to look at validation using Semantic Web technologies as an alternative. I am looking forward to exploring the use of RDF with the rendering linkbase that is under development at XBRL.org.