Open Data and auto-discovery

Hi, my name is Christopher Gutteridge. I work for the innovation and development team at the University of Southampton, created the first version of their open data service, data.soton.ac.uk, and am one of the founders of data.ac.uk. </bragsheet>

For a long time I’ve been interested in open data from organisations. Each organisation owns its own data, but there’s lots of value in many organisations publishing similar open data in similar ways. Your organisation isn’t special; it almost certainly has some of:

  • sites, buildings, rooms, desks
  • people, teams, departments, job roles
  • key webpages: contact us, search, freedom-of-information, message from the boss
  • a product catalogue
  • places (physical or online) where you can get a service which may have opening hours and specific offers of a service at a price, from coffee to brain surgery to car parking
  • research outputs or publications
  • social media accounts
  • news and notices
  • events

The exact data you store or publish about these things may vary (this includes the links between things, eg people-in-buildings). However, the basic concepts should be the same for many organisations, and we’ve been looking at ideas around how to share this information without the need for Google or Facebook to act as an intermediary. The schema.org route is cool, but it doesn’t solve the problem I want to solve, because web crawling embedded data isn’t the best way to get a dataset. Also, there’s no trust that data found by crawling http://www.badgers.ac.uk/jeff/ is really official information, and not just a demo by Jeff the PhD student.

[Image: Screenshot from Auto-discovery document]

At data.ac.uk we have created a simple mechanism to discover such predictable information sets from an organisation, starting from its web homepage. We are using this to autodiscover lists of research equipment in the UK academic sector, and it has proved both effective and cheap (sustainable) while protecting the community from the risks normally associated with a hub that collates data suddenly going away. At the time of writing, 16 organisations, including 5 of the Russell Group, have implemented the OPD (organisation profile document), which is basically an auto-discoverable FOAF profile in Turtle which also describes the information sets an organisation has. While we’ve piloted this technique, it is by design anarchistic: anybody can expand and add to it. I want a web of data which doesn’t require Silicon Valley heavy hitters to let me work with open data.
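To give a feel for what discovery from a homepage could look like on the consuming side, here is a minimal Python sketch. It is not the official data.ac.uk implementation, and it rests on an assumption this post doesn't spell out: that the homepage advertises its profile document with a <link> element whose rel value is "openorg". The function name, class name and example URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: the homepage points at its profile document with a <link>
# element whose rel value is "openorg". Treat this name as illustrative.
PROFILE_REL = "openorg"


class ProfileLinkFinder(HTMLParser):
    """Collects the href of every <link> element carrying the profile rel."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.profile_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely
        rels = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if PROFILE_REL in rels and href:
            # Resolve relative hrefs against the homepage URL
            self.profile_urls.append(urljoin(self.base_url, href))


def find_profile_urls(homepage_html, homepage_url):
    """Return absolute URLs of profile documents advertised by a homepage."""
    finder = ProfileLinkFinder(homepage_url)
    finder.feed(homepage_html)
    finder.close()
    return finder.profile_urls


# Hypothetical homepage markup, used here instead of a live HTTP request
example_homepage = """
<html><head>
  <link rel="stylesheet" href="/style.css">
  <link rel="openorg" href="/profile.ttl">
</head><body>Welcome</body></html>
"""

print(find_profile_urls(example_homepage, "http://www.example.ac.uk/"))
```

A consumer would then fetch each returned URL and parse the Turtle to find the information sets the organisation describes. Because discovery starts from the organisation's own homepage, the profile carries more authority than a page a crawler happened to stumble on.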

Oh, there’s also equipment.data.ac.uk which now has open data from 40 contributing ac.uk institutions. Actually, there’s a whole lot of other datasets: http://www.data.ac.uk/data

I’ll be attending the Open Data Camp on Sunday and I’d love to tell you more about our work, either one-on-one or maybe in a session.

cjg@ecs.soton.ac.uk

@cgutteridge

@dataacuk
