W3C


As the W3C Team lead for financial data and the Semantic Web, I am looking at how the Web is changing the way investors assess the value of companies.

Public companies worldwide are required to file regular reports setting out the financial health of the company. These are available from corporate investor relations websites and from regulatory agencies like the Securities and Exchange Commission (SEC). If you want to analyze this data, you have to re-key it, which involves a lot of work and introduces errors. That is all about to change.

The SEC and kindred agencies around the world are starting to require companies to file reports in XBRL (eXtensible Business Reporting Language). XBRL ties each reported item of data to the reporting concept used to collect it, and does so in a form that computers can process directly, removing the need to re-key the data.
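
To make the idea of a tagged fact concrete, here is a minimal Python sketch that reads facts from a tiny, hand-written XBRL instance using only the standard library. The XBRL 2.1 instance namespace and the contextRef, unitRef and decimals attributes are part of XBRL. The taxonomy namespace http://example.com/taxonomy, the entity identifier and the figures are hypothetical placeholders standing in for a real taxonomy and a real filing. A genuine instance would also need schema references and validation, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

XBRLI = "http://www.xbrl.org/2003/instance"  # XBRL 2.1 instance namespace
TAX = "http://example.com/taxonomy"           # hypothetical taxonomy namespace

# A hand-written instance: one reporting period (context), one unit, two facts.
# The entity identifier and all figures are placeholders, not real data.
INSTANCE = f"""<xbrli:xbrl xmlns:xbrli="{XBRLI}" xmlns:tax="{TAX}"
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217">
  <xbrli:context id="FY2008">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.sec.gov/CIK">0000000000</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2008-01-01</xbrli:startDate>
      <xbrli:endDate>2008-12-31</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <tax:Revenues contextRef="FY2008" unitRef="USD" decimals="-3">1250000</tax:Revenues>
  <tax:CostOfRevenue contextRef="FY2008" unitRef="USD" decimals="-3">800000</tax:CostOfRevenue>
</xbrli:xbrl>"""

def extract_facts(xml_text):
    """Return (concept, context, unit, value) for each fact in the taxonomy namespace."""
    root = ET.fromstring(xml_text)
    facts = []
    for elem in root:
        # ElementTree reports tags as "{namespace}localname".
        namespace, _, concept = elem.tag[1:].partition("}")
        if namespace == TAX:
            facts.append((concept, elem.get("contextRef"), elem.get("unitRef"), int(elem.text)))
    return facts

for fact in extract_facts(INSTANCE):
    print(fact)
```

Each number carries its concept, period and unit. Software can therefore pick out "revenues for FY2008" without anyone re-keying the report.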

XBRL will allow investor relations sites to support interactive access to, and sharing of, tagged financial data. This will build upon Web 2.0 and the phenomenon of user-provided content on wikis, blogs and social networking sites aimed at investors, e.g. Wikinvest, where investors share data, insights and analyses. YouTube provides a powerful precedent in the way it allows people to share content by embedding a view, or a link to a view, in their blogs.

For XBRL, this means providing a way for people to browse the data, and to pull out tables and charts as needed for their blogs. These items could be rendered by the investor relations site and shared à la YouTube, or the blog could itself use a script to query data across one or more investor relations sites and render it locally. This is where the Semantic Web and linked open data come in. I’ve previously reported on techniques for converting XBRL into RDF triples.
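
The techniques referred to above are not reproduced here. As a rough illustration of the general idea only, the sketch below maps facts of the form (concept, context, unit, value) to N-Triples using plain string formatting. The vocabulary http://example.com/xbrl-vocab# and the data URIs under http://example.com/company/ are hypothetical. A real mapping would use agreed ontologies and proper IRI escaping.

```python
# Hypothetical vocabulary and data URIs, used only for illustration.
BASE = "http://example.com/company/"
VOCAB = "http://example.com/xbrl-vocab#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

def fact_to_triples(index, concept, context, unit, value):
    """Describe one XBRL fact as a resource with concept, context, unit and value."""
    subject = f"<{BASE}fact/{index}>"
    return [
        f"{subject} <{VOCAB}concept> <{VOCAB}{concept}> .",
        f"{subject} <{VOCAB}context> <{BASE}context/{context}> .",
        f"{subject} <{VOCAB}unit> <{VOCAB}{unit}> .",
        f'{subject} <{VOCAB}value> "{value}"^^<{XSD_DECIMAL}> .',
    ]

# Placeholder facts in (concept, context, unit, value) form.
facts = [("Revenues", "FY2008", "USD", 1250000), ("CostOfRevenue", "FY2008", "USD", 800000)]
triples = [t for i, f in enumerate(facts) for t in fact_to_triples(i, *f)]
print("\n".join(triples))
```

Suppose facts from several investor relations sites were published as triples like these. A blog script could then merge them and query by concept or period, which is the kind of cross-site access described above.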

W3C and XBRL International are looking for your help in understanding what some people are calling “Investor relations 2.0”, and we invite you to attend a workshop at the FDIC training facility in Arlington, Virginia, this October. We want your help with identifying the opportunities and challenges for interactive access to business and financial data expressed in XBRL and related languages. This doesn’t just apply to the investor community, as the same technologies also offer huge potential for data published by governments on sites like data.gov (see demos). For more details on the workshop see the call for papers.

I recently joined the PrimeLife Project, which is funded by the European Commission’s 7th Framework Programme. It aims to bring sustainable privacy and identity management to future networks and services, and builds upon the former Prime Project. Privacy is something that most people take for granted, but we leave a digital trail as we interact with websites, and this can lead to abuses ranging from identity theft and discrimination to mild embarrassment. Privacy enhancing technologies have the potential to restore the balance and give all of us better control over data we would prefer to keep private.

One of the challenges is the ease with which interactions can be linked across websites. Having to remember user names and passwords for a large number of websites is hard. The increasing use of email addresses in place of user names for signing into websites makes it easier to link interactions across sites since email addresses are globally unique names. OpenID offers users the means to use a single digital identity for accessing participating websites, and relies on the user providing an HTTP URL as a globally unique identifier, with the same drawback as using an email address.

Having to remember lots of user names is much too hard, but using a globally unique identifier just makes it easier for people to track your detailed behavior. What’s the solution? I have been thinking about the possible role of a trusted privacy provider. With OpenID you are asked to provide your HTTP URL to the website you are connecting to. Imagine instead that you are asked to disclose your privacy provider (e.g. through a drop-down list or by typing a URL). The website then redirects the browser to your privacy provider for you to sign in. If this is the first time you have visited the website, your privacy provider will ask you for your privacy preferences for interacting with that site. This approach lets you effortlessly use a different identity for each website if you wish, and, like OpenID, avoids the need to sign in separately to every website you visit. There are lots of further opportunities for privacy management, but I will leave those to another blog post.
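
The post does not specify how a privacy provider would mint identities. One plausible approach, assumed here purely for illustration, is a pairwise pseudonym. The provider derives each site-specific identifier with HMAC-SHA256, keyed by a secret it holds for the user and applied to the site’s origin. A site then always sees the same identifier for you, while two sites cannot link you just by comparing identifiers. The PrivacyProvider class and its methods are hypothetical, and the browser redirect and your sign-in to the provider are not modelled.

```python
import hashlib
import hmac
import secrets

class PrivacyProvider:
    """Hypothetical privacy provider issuing a different pseudonym per website."""

    def __init__(self):
        self._keys = {}         # user account -> secret key known only to the provider
        self._preferences = {}  # (user, site origin) -> preferences chosen by the user

    def register(self, user):
        self._keys[user] = secrets.token_bytes(32)

    def pseudonym(self, user, site_origin):
        # Same user and site -> same identifier; different sites -> unrelated identifiers.
        mac = hmac.new(self._keys[user], site_origin.encode("utf-8"), hashlib.sha256)
        return mac.hexdigest()

    def sign_in(self, user, site_origin, ask_preferences):
        """Called after the site redirects the browser here (redirect not modelled)."""
        key = (user, site_origin)
        if key not in self._preferences:
            # First visit to this site: ask the user what it may learn about them.
            self._preferences[key] = ask_preferences(site_origin)
        return self.pseudonym(user, site_origin), self._preferences[key]

provider = PrivacyProvider()
provider.register("alice")
shop_id, _ = provider.sign_in("alice", "https://shop.example", lambda site: {"share_email": False})
news_id, _ = provider.sign_in("alice", "https://news.example", lambda site: {"share_email": True})
print(shop_id != news_id)  # the two sites see unrelated identifiers
```

The key design choice is that the secret never leaves the provider. Sites receive only the derived identifiers, which unlike an email address or an OpenID URL are not globally unique names.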

JustSystems have kindly agreed to sponsor me as a W3C Fellow for work on XBRL and the Semantic Web. XBRL gives precise semantics to financial reports and has the backing of financial institutions around the world. There is tremendous potential for combining XBRL with the Semantic Web as a means to support the analysis and exploration of huge amounts of financial data. I hope to explore this potential in collaboration with XBRL International, the research groups working in this area, and the many individuals and companies interested in XBRL. Some of the things under consideration include open source tools, ontologies for relating XBRL taxonomies, and an exploration of ramifications for both XBRL and the Semantic Web, e.g. provenance and authenticity, the closed world assumption, and mathematical relationships within financial data. A likely starting point could be the launch of an Interest Group or even an Incubator Group to explore possible standardization activities complementing the role of XBRL International.
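
As a small illustration of the mathematical relationships mentioned above, the sketch below checks summation relationships of the kind expressed in an XBRL calculation linkbase, where a total equals the weighted sum of its items with weights of +1 or -1. The relationship table and the figures are hypothetical. Real XBRL validation also accounts for rounding via the decimals attribute, which is omitted here. The sketch skips totals with missing items rather than treating absent facts as zero, and this is one place where the closed world question shows up in practice.

```python
from decimal import Decimal

# Hypothetical calculation relationships in the style of an XBRL calculation
# linkbase: each total is the weighted sum of its items (weights +1 or -1).
CALCULATIONS = {
    "GrossProfit": [("Revenues", Decimal(1)), ("CostOfRevenue", Decimal(-1))],
}

def check_calculations(facts, calculations):
    """Return (total, reported, computed) for every total that does not add up."""
    errors = []
    for total, items in calculations.items():
        if total not in facts or any(item not in facts for item, _ in items):
            # A fact is missing: skip the check instead of assuming it is zero.
            continue
        computed = sum(weight * facts[item] for item, weight in items)
        if computed != facts[total]:
            errors.append((total, facts[total], computed))
    return errors

reported = {
    "Revenues": Decimal("1250000"),
    "CostOfRevenue": Decimal("800000"),
    "GrossProfit": Decimal("460000"),  # deliberately inconsistent placeholder
}
print(check_calculations(reported, CALCULATIONS))
```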

We are all used to the way in which bookmarks and links stop working as websites come and go, or as organizations feel the need for a change. Can anything be done to reduce this link rot? It matters especially for organizations like W3C, which seek to provide persistent URIs for key documents, but even that is problematic, as W3C won’t be around forever.

The inability of organizations to guarantee persistence for bindings from identifiers to resources would seem to merit work on solutions that survive beyond the life of such organizations. Another angle on this: who is to say now which resources will be more valuable in a few decades’ time, an obsolete spec from a standards organisation or a poem from a personal website? That is hard to know in advance.

A possible solution, or at least one worthy of study, is the idea of a distributed cache of resources that isn’t dependent on any one organization. The Google cache is promising, but we have no guarantees for how long items are kept and made available, nor whether Google itself will still be around in fifty years.

A distributed cache would need sufficient redundancy to preserve copies of occasionally accessed resources. The value of a resource could perhaps be measured by how often it is accessed. Static copies of dynamically generated resources may be okay for some purposes, but it may also be worth considering how to cache services and associated metadata.
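
One way to sketch the redundancy requirement, assumed here rather than taken from any existing system, is rendezvous (highest random weight) hashing. Every URI ranks all cooperating providers by a hash of the provider name and the URI, and the top k providers keep copies. The hypothetical replica_count policy reflects the idea that value might be measured by access frequency: it keeps at least three copies of everything and adds one per tenfold increase in monthly accesses. Provider names and thresholds are illustrative.

```python
import hashlib

def score(provider, uri):
    """Deterministic pseudo-random weight for a (provider, URI) pair."""
    digest = hashlib.sha256(f"{provider}|{uri}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def replica_count(accesses_per_month, minimum=3, maximum=10):
    """Hypothetical policy: at least `minimum` copies, plus one per power of ten."""
    extra = len(str(accesses_per_month)) - 1 if accesses_per_month >= 1 else 0
    return min(maximum, minimum + extra)

def place_replicas(uri, providers, accesses_per_month):
    """Choose which cooperating providers keep a copy of `uri` (rendezvous hashing)."""
    k = min(len(providers), replica_count(accesses_per_month))
    ranked = sorted(providers, key=lambda p: score(p, uri), reverse=True)
    return ranked[:k]

providers = [f"provider-{i}" for i in range(20)]
print(place_replicas("http://www.w3.org/TR/xml/", providers, accesses_per_month=2))
print(place_replicas("http://www.w3.org/TR/xml/", providers, accesses_per_month=50000))
```

A useful property of rendezvous hashing is that when a provider leaves, only the URIs it held need new homes. Also, a popular resource’s replica set extends a less popular one’s rather than replacing it.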

This hints at a next-generation Web where addresses are resolved through a distributed system rather than by contacting the named HTTP server directly. This would live alongside the existing Web and would be an opt-in solution for individual websites. There is no need for a change of addressing scheme.
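
The resolution step might look roughly like the following toy sketch. For sites that have opted in, an unchanged HTTP URL is the lookup key into a set of cooperating caches. Everything else falls back to contacting the origin server as today. The Resolver class, the in-memory dictionaries standing in for caches, and the fetch_from_origin callback are all hypothetical.

```python
from urllib.parse import urlsplit

class Resolver:
    """Hypothetical resolver for a distributed cache that sites opt in to."""

    def __init__(self, opted_in_hosts, caches, fetch_from_origin):
        self.opted_in_hosts = set(opted_in_hosts)
        self.caches = caches                      # stand-ins for cooperating providers
        self.fetch_from_origin = fetch_from_origin

    def resolve(self, url):
        if urlsplit(url).hostname in self.opted_in_hosts:
            # The ordinary HTTP URL is the lookup key: no new addressing scheme.
            for cache in self.caches:
                if url in cache:
                    return cache[url]
        # Not opted in, or no cached copy: contact the named server as today.
        return self.fetch_from_origin(url)

resolver = Resolver(
    opted_in_hosts=["www.w3.org"],
    caches=[{}, {"https://www.w3.org/TR/xml/": "<cached copy of the XML spec>"}],
    fetch_from_origin=lambda url: f"<fetched {url} from its origin server>",
)
print(resolver.resolve("https://www.w3.org/TR/xml/"))
print(resolver.resolve("https://example.org/poem.html"))
```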

You can think of this as a mass migration to virtual websites, where the hosting service defines a framework for metadata and for static and dynamic resources (executable service descriptions). It would need careful attention to privacy, identity, security, and perhaps payment mechanisms. The framework would be implemented in a distributed way involving multiple cooperating providers.

How would such providers be rewarded for the resources they provide? I believe that multiple mechanisms are needed and would change over time. A very popular website would consume vastly more resources than one that is accessed infrequently. There are also considerations of differences in value systems across cultures and national boundaries. So a single solution is unlikely to work, and further study is needed to better understand how to balance a healthy business model for providers with the disparate needs of users.

It is time to move on from the perennial discussions of URNs versus URLs, and to consider the kind of Web we want to leave for our descendants. Do we want persistence and open data or are we willing to embrace an era where everything is ephemeral? Is Web Science up to the challenge?

There is huge potential for mobile web applications that can access device capabilities from client-side scripts. There has been a lot of work on J2ME APIs for Java-based applications, but we lack standards for exposing local device capabilities to applications running in web browsers. The time has surely come for W3C to bring interested parties together to work on fixing this as a matter of priority.