
Open Data Case Study: How Belfast found £350,000 in rates revenues using open FHRS data

How do you prove the value of open data?

Here’s how.

FSA/SBRI case study

The Food Standards Agency’s Food Hygiene Rating Scheme (FHRS) data is released as open data in near real-time, and the Department for the Economy in Northern Ireland found a use for it.

Like every authority in the country, Belfast has a rates shortfall – business rates that should be collected but, for various reasons, aren’t. A group of smart people across various parts of government and the city council had a feeling that they could use datasets to improve the collection rate within the city.

To test the theory, the Small Business Research Initiative – which one of those people, Eoin McFadden of the Department for the Economy, describes as “public procurement of R&D to fix wicked issues” – invested in four early-stage proof-of-concept projects aimed at identifying formerly empty premises that were back in use.

One project used footfall data, another wifi and bluetooth signals, and the other two applied machine learning techniques to a range of public datasets, both open and closed. The programme, which ran from July 2016 to March 2017, had two stages – and both machine learning projects made it through to the second phase.

Massive return on investment

Over the two-week test period, they identified around £350,000 of uncollected rates – from a total project cost of £130,000. If you want to prove the benefit of open data, money talks…

So, how did the FHRS open data contribute to this?

Well, the normal method of checking for missing rates was to target high value empty properties, and manually inspect them. That had a success rate of around 20%. But what if you could work out which properties were likely to be occupied, and target your inspections on them?

The machine learning projects used a mix of closed public datasets, including the ratings data and water rates, and two open data sets:

  • FHRS
  • Companies House data

All those data sets are good indicators of a property in use, but matching them is hard. Hence the application of machine learning derived fuzzy logic to identify properties which are likely to be back in use.
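The write-up doesn’t say which tools or algorithms the teams used, but a minimal sketch of the underlying matching problem – comparing business name and address strings from two sources – might look like this in Python (the records and the 0.8 threshold are invented for illustration):

```python
# A minimal sketch of fuzzy matching between datasets, assuming simple
# name/address strings. The real projects used machine learning over
# several datasets; this only illustrates the matching problem.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0-1 similarity score between two normalised strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

# Hypothetical records from two sources describing the same premises.
fhrs_record = "The Ormeau Bakery, 241 Ormeau Road, Belfast"
rates_record = "Ormeau Bakery Ltd, 241 Ormeau Rd, Belfast"

score = similarity(fhrs_record, rates_record)
if score > 0.8:  # a real threshold would be tuned against known matches
    print(f"Likely the same premises (score {score:.2f})")
```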

Making it work

There was a fair amount of persuasion needed to make this work. Public bodies needed to be persuaded to provide closed data sets to private companies, under tight non-disclosure agreements, but the effect was remarkable.

Once the systems were up and running, inspection visits based on a list of probable occupation had a 51% success rate – about 2.5 times the average before. Given that rates make up 75% of the council’s revenue, the potential for economic benefit from these systems has only just begun to be realised.

The Food Hygiene data is valuable in this, both because it is updated in near real time (updates hit daily), and because it indicates businesses that have begun trading at an address. A month’s delay in publication would make it significantly less useful to the project.

Of course, it’s not only hospitality and food businesses that get missed – two examples of businesses that were found were a hostel and, to everyone’s amusement, a firm of accountants.

Future revenue potential

The two successful projects – one based in Belfast and one in Southampton – are both continuing to develop their systems, and Belfast is moving forward with a procurement process to make this a routine part of its ratings work.

That’s a pretty good return on some open data and £130,000 of up-front investment.

Don’t let anyone tell you that open data successes are only small scale and personal…

Making Open Data Camp matter – to local economies and more

What is the value to the local economy of open data – and of open data unconferences? The wider benefit of open data to local economies is hard to quantify: there’s no E = mc² equation of open data benefit yet.

So let’s talk about unconferences, and Open Data Camp in particular.

Local value of Open Data Camp

Some organisers have a sense that it stimulates the economy, but no sense of how to measure that. There’s local sponsorship – so they’re expecting some return on that investment. It might be an opportunity to meet potential customers, or to improve their operational intelligence.

Corporate social responsibility is one reason people sponsor: it’s a community benefit, but it also benefits companies to have a thriving open data ecosystem.

Escaping the gravity of the capital

Just NOT having it in London is a good thing. Holding events away from London can be an incentive.

There’s a distinction between the benefit of open data to the city, and the value of an open data conference in the city. There’s a clear basic financial benefit to the city in terms of hotel rooms, food and entertainment from events, as long as people are prepared to travel to attend. One event succeeding in an area gives others the confidence to happen.

Travel is not just about money, but also about the time it takes to get there.

Buy in from the host city makes a big difference. The city saying “no” to some investment in an event can kill it. There needs to be some vision for the city of the benefit, so you can sell it to them.

Unconferences can be woolly as to what the benefit is. Open Data Camp deliberately avoids London, both because too many events happen in London, and because people can be resistant to it. You get local character and flavour, and you get people who might never come to an event if it wasn’t near them. And you still get national organisations coming – because they don’t get out of London much, and they get to make connections with local projects.

Connections make benefits

Those connections can turn into valuable projects. You’re not just connecting geographies, but different forms of organisation. People who don’t do data can start to see it in a physical way – to understand the data that describes the city they can see around them.

Can we improve the outcomes by theming the event? Or would that corrupt unconferences? People tend to take advantage of the location to discuss local issues – like the interface/divided communities session at this event. And that can be very valuable – giving people insights into unexpected uses of data.

Queen’s University Belfast’s computing department is often empty at the weekend. Why not use the space for events like this? Let people come in and find out new things. Being physically in different places gives opportunities to explore new technologies, like iBeacons or VR tech.

Look at the data you have, the data you can get, and the technologies that are coming along – and then find the space to think about how to combine them. Ideas start at these sorts of meetings – and we need those case studies.

Catalysing other fields

Bring in other kinds of people – English Lit students could find open data techniques useful in extracting what they need from books. You can avoid massive wastes of time and effort by bringing people together in a way that allows them to realise what they can offer to each other.

At the moment, Open Data Camps are open data people talking to open data people. Could we have a Friday where we open up our experts to other people? Then we could say we advised start-ups and students, and contributed to the economy.

Pre-activation of people – letting them plan for hacker spaces, or offering open data surgeries would be possible ideas.

We’re trying to capture the sessions via Drawnalism, and we’re putting that on the blog. But should we be pushing onwards with it, telling case studies and stories around events or projects that spin out of Open Data Camp sessions and meetings?

But what about the wider benefit of open data to local economies? There’s no E = mc² equation of open data benefit yet.

Session Notes

Maps, Maps, Maps: good maps, bad maps and accessible maps

What do you do if you find QGIS too easy (and like pain)? You start mapping in R.

But what do people in the room do with mapping, and what data sets do they use?

Putting Open Data on Maps. A good thing?

In Birmingham they used Edubase to plot previous ‘catchment’ areas for schools. Some schools measure from the centre of the school, some from the school gates. And some schools have more than one gate… Some were basing it on distance to the nearest train station. It was about creating boundaries, and then you could set up a tool based on postcodes to see if people are within the boundaries or not.
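The session didn’t go into implementation detail, but a postcode checker like that ultimately reduces to a point-in-polygon test. A minimal sketch using the shapely package, with invented coordinates standing in for a geocoded postcode and a catchment boundary:

```python
# Minimal point-in-polygon sketch using shapely (pip install shapely).
# The boundary and point coordinates are invented for illustration; a
# real tool would geocode the postcode and load the catchment polygon
# from a shapefile or GeoJSON.
from shapely.geometry import Point, Polygon

catchment = Polygon([
    (-1.95, 52.45), (-1.90, 52.45), (-1.90, 52.48), (-1.95, 52.48),
])

home = Point(-1.92, 52.46)  # longitude, latitude for a geocoded postcode

print("Inside catchment" if catchment.contains(home) else "Outside catchment")
```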

Mapping challenges

Newer Crossrail stations might positively benefit house prices – so it was an interesting thing to map. But what is a railway station? That matters if you are measuring distances – is it the station building, the platform or the car park? These things matter.

The Ordnance Survey released a green spaces map – which seems to have quite a few holes in it. There are some open tree and green spaces databases. There hasn’t been any analysis done on the relationship between house prices and green spaces. The research tends to focus on negative factors, not positive ones. There’s research about well-being and green spaces, but not house prices.

Tom Smith and the Data Science Campus have started using machine learning to identify areas – and then layer other data on top of that, like house price data. Scottish LiDAR data is available, as is Northern Ireland’s, but it’s somewhat patchy. The centre of Belfast isn’t covered! It’s an issue with how the data was collected.

There’s been too much of companies just shoving data on a map. Often totally useless visualisations are created, with too much data on a single map. Should it be four different maps? Or do we need to segment by seriousness? What’s the use case for the map? There are some cases where putting data on a map is actively unhelpful. Wales has such massive variations in population density that just putting data on a map is often not useful.

You need to ask yourself what the narrative is of the data you’re putting on a map.

Mapping disasters

All the Trafford Centre crime was mapped to a nearby street, which had a huge impact on the perception of crime there. There have been other examples where all the crime without a geo-reference was assigned to the police station – which suddenly made the surrounding area a massive crime blackspot, according to the figures.

In the 1920s the American petrol companies took all the railways off the maps they sold, changing people’s perceptions of travel.

A small GIS change made at a national level can have huge ramifications at a local level – like school applications. And that creates real anger. One school deliberately changed its measurement point to the middle of its playing fields to exclude a council estate. It’s about more than points on maps – it’s spatial analysis, and the patterns and correlations might not be readily available.

Mapping Tools

Pub data from Open Street Map can be used for fun maps – like pub densities. The hotspot is in Islington. One person recommended ScapeToad to create cartograms.
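For anyone who wants to reproduce the pub-density idea, pub locations can be pulled from OpenStreetMap via the Overpass API. A rough sketch (the bounding box roughly covers the Islington area and is approximate):

```python
# Rough sketch: fetch pubs from OpenStreetMap via the Overpass API
# (requires the requests package). The bounding box is approximate.
import requests

query = """
[out:json][timeout:25];
node["amenity"="pub"](51.53,-0.12,51.56,-0.08);
out;
"""

resp = requests.post("https://overpass-api.de/api/interpreter", data={"data": query})
resp.raise_for_status()
pubs = resp.json()["elements"]

print(f"{len(pubs)} pubs found")
for pub in pubs[:5]:
    print(pub.get("tags", {}).get("name", "unnamed"), pub["lat"], pub["lon"])
```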

There’s plenty of debate on the value of R in mapping. Some people (the Python fans) love it. Some really don’t. There are options to integrate statistics and mapping with Shiny. QGIS is great for spatial analysis, but R Shiny is great for pushing out tools that people can use to explore data. It can have problems with scaling, though. Choose the context for use carefully.

R is good for learning to code – Jupyter with R is a great combination.

But what about paper maps?

There is an issue of access – 20% of people still aren’t digitally connected, and you need paper to reach them. And there’s a role for paper maps in exhibitions. People will screenshot interactive tools to put them in reports – we’re not great at producing aesthetically pleasing maps for those uses. If they screenshot, things like dates and copyright statements are missing. We should make automatic “print this” options.

One attendee has a printed paper map of a county in the 17th century, which they use for their writing. Big, printed maps have their uses. They’re great tools for community interaction in groups.

The French OpenStreetMap community used Field Papers to engage with people, getting them to colour in where, for example, schools are.

Map accessibility

This raises the question: are pop-ups in an interactive map a bad thing? Paper maps appear to show everything. But they don’t really – there’s a real art to what they leave out, to make sure nothing undermines the purpose of the map.

Maps really are the ultimate visualisation tool, but you have to ask what the map is for. But can you really guess what people are actually going to use maps for? Probably not – they will always surprise you.

Do you think about red/green colour blindness? It’s such a simple thing to miss, but it can destroy the usefulness of your map for a percentage of the population or, worse, lead to them misinterpreting it.

3D maps? It depends on the context. You can do that in Excel 2016, and it can be really effective. Physical maps are really useful for blind people – but consult with the people you’re targeting: not all blind people are the same. Acuity Design are good at this. UCL are experimenting with soundscapes for the blind.

Session Notes

ODI Nodes: a state of the nation discussion

One of the first sessions on Sunday morning at Open Data Camp 5 gave people from the ODI Nodes network the chance to meet and discuss progress, under the Chatham House rule.

ODI Node Map

There’s some tension around the ODI’s suggestion that the nodes might become more commercial; some nodes aren’t really keen on that direction. Some – including Bristol – have reorganised in a way that would allow the work to continue even if they are no longer a node.

On the other hand, ODI Leeds was founded to be commercialised. It’s unique in that. Any money made is reinvested – it’s a not-for-profit. That allows things like keeping pretty much all events free. They have a huge network of data scientists, agencies and developers – and that allows them to put project teams together. They’ve been working with Adobe (on a better PDF…). They also work a lot with the sponsoring councils, including on Yorkshire flooding.

ODI HQ put out open calls on their website, based on people who approach them. Either nodes or private organisations can bid for those tenders. There tend to be a lot of them in the autumn, to be completed by the spring. Probably something to do with budgets.

Open data in Northern Ireland

What about ODI Belfast? They’ve been running for two and a half years. There’s a team of three, down from four. Their impression is that open data is a bit behind in Northern Ireland compared to the rest of the UK, so a lot of their work is campaigning. The government came out with a strategy to establish a node – after the node was established!

But they need more data releases to kickstart the data supply chain. They’re a community-based, storytelling node. They had a 120-person conference that had a huge impact. The connections made there have led to more projects between commercial organisations and government.

They have a strategy, they have a government commitment, they have infrastructure going into place – and no government at the moment. They’re all hoping for something positive to happen with the government.

This does emphasise how important senior buy-in is – some people feel that government commitment has waned since the changes at the top of the government there.

The Open Data view from elsewhere

ODI Cardiff is run by a commercial organisation. They were banging on about open data before that, but the node status has definitely helped open doors. Wales is a bit like Northern Ireland, a little behind. But a positive debate in the Welsh parliament gave a sense that both sides of the house were behind open data. And ODI Cardiff were cited in that debate.

ODI Birmingham is hosted by Birmingham Innovation, and has started using the logo in its footer. ODI HQ is going to be doing some training in Birmingham. Several Nodes expressed a desire to offer more training. The commitment needed for the five-day train-the-trainers course in London is a big one – and you need an ODI-accredited trainer to deliver ODI-branded courses.

There’s a distinction between awareness-raising sessions and non-branded training, though; some people have found the flexibility to deliver what is needed.

There’s also a need to recognise income disparities in different parts of the country, and the prices they can reasonably be expected to pay.

Relationship with ODI HQ

Some Nodes feel that it’s better to ask for forgiveness than permission, because people in London don’t always have a perfect understanding of local political context.

Everyone seems to want a big meetup of Nodes with the ODI in London. There’s still a strong sense that affiliation with the ODI is valuable and desirable, both practically and emotionally. It has both incredible lineage and some significant power. There’s commercial value in the ODI name.

Money flows from the Nodes to the ODI, not the other way around. There is discussion about non-commercial nodes not paying, and commercial nodes only paying after an income threshold. And that money may be ring-fenced for the benefit of the Node network.

There’s some disappointment that there’s no official ODI HQ representation at Open Data Camp – but also a sense that the Nodes can self-organise, and arrange their own meetings and communication.

Session Notes

Open Data Camp Belfast: Day Two Pitches

A good turnout for a Sunday morning, as we get ready for the pitches. But we have someone significant here…

And we’re off…

The Pitches

ODCamp 5 Pitches

  • How do you effectively engage the tech community to use open government data?
  • Accessible file formats. Is there life beyond the five stars of open?
  • ODI Nodes – what are they and why should you care?
  • Open data and art – storytelling.
  • The impact of private sector demand on open data.
  • Why map open data? What do you get out of it? What data sources do you use?
  • Create a shared doc about the reasons people don’t use open data – and solutions
  • How do you overcome data quality concerns? Good enough – is it good enough?
  • Who owns the data? How do you get it open, how do you share it – and how do you deal with people withdrawing?

  • How you use open data for public good…
  • Dog-fooding brain-picking. How? What’s the best way?
  • Case Studies of NI open data: what have we done for you lately?
  • Data Catalogues – what are they and why do you publish them?

  • Data Ethics Canvas – ethics before compliance!
  • Open Data Challenges – how can we do them better?
  • Gender data in the UK
  • What do we do about wrong or inaccurate open data? Where are the 100m trees in Belfast?
  • Meta-metadata – why do we publish it? Why should we publish it? What annoys us?
  • Open data careers – there’s a lot of public or ex-public sector people here. How do you get involved – and what’s next?
  • Building APIs for open data
  • How can the community overcome arcane attitudes to data?

  • FSA – the open data journey (with a focus on reference data)
  • What’s the value of Open Data Camp to the local economy?
  • Chatham House session on government data in the face of GDPR
  • How can we help charities do more with open data?

Could free wifi use data be useful to Belfast?

If you walk through a wifi area and have wifi enabled on your phone, the system can track a certain amount about your presence and movement. They could have that data for Belfast’s city-council-run wifi networks, which cover around 70 buildings – so what can be done with it? If they had enough compelling use cases they could partner with other organisations to grow the data set.

That data includes things like the device MAC address, the SSID of the network you’ve connected to, and so on.

When you log in, you give consent for that data to be collected and used. You don’t if you haven’t connected. Most mobile phones announce their presence to find wifi hotspots.


What value would a wifi nerd see in this?

Tracking people in aggregate, learning about their movements, and seeing how traffic patterns change over time. It could identify the need for public transport, crossings or even taxi provision. It can all be anonymised – and Transport for London already does this to understand London’s transport needs.

You could even use real-time analytics to spot incidents happening through traffic pattern disruptions. Accumulation of this data over time makes it more useful.

The consent conundrum

“If you get consent, keep it forever,” said one attendee. Belfast would want at least a few years, for seasonality analysis, for example.

People are quite happy to give up this information in exchange for a benefit. There’s the tangible benefit of wifi access, but the other benefit is the positive change that it can create in the city. The stories of that need to be told.

That said, exceptional movement can be tracked down to the individual if there are repeating patterns – so you want to apply some form of filter to that. This was an issue with London’s bike scheme. Maybe hash the MAC address with a date/time salt to make it extremely difficult to trace back to a particular individual. But you lose potentially useful patterns by doing that. Under GDPR, you’ll need to hold on to the unique identifier, but you can still anonymise the data and release it as usefully as possible.
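A minimal sketch of that salted-hash idea, assuming a salt that rotates daily so a device keeps one pseudonym within a day but can’t be linked across days (the handling below is illustrative, not how Belfast actually processes its data):

```python
# Sketch of daily-salted hashing of MAC addresses. With a salt that
# changes each day, the same device gets a consistent pseudonym within
# a day (useful for movement analysis) but can't be linked across days.
import hashlib
import secrets
from datetime import date

# In practice the salt would be generated once per day, stored securely
# and never published alongside the data.
DAILY_SALT = secrets.token_hex(16)

def pseudonymise(mac: str, day: date) -> str:
    material = f"{DAILY_SALT}:{day.isoformat()}:{mac.lower()}"
    return hashlib.sha256(material.encode()).hexdigest()[:16]

print(pseudonymise("AA:BB:CC:DD:EE:FF", date.today()))
```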

Remember, the phone networks already know where you are, and are commercialising that data.

Real-time data

Heat maps for events could be useful – where did attendees go, and what was the impact? Did they go on to other attractions? It would give you a bigger sense of the impact of an event on the city. But, of course, that also makes it more of a target… Then again, you don’t need that wifi data to know that an event is popular. That doesn’t mean the security panels won’t come back to you on it, though.

In this context, live data is more dangerous than delayed publication – which is one option to look at. Belfast doesn’t need to do this alone – lots of places are looking at this.

Some businesses are already using this data – supermarkets are using it to generate footfall data for store planning.

Session Notes

Getting the open data you need for good Neighbourhood Planning

Neighbourhood plans are a crucial part of the UK’s planning infrastructure, allowing people to have a serious say in the development of their own area. People in Bramcote decided to take advantage of this – the move to do a neighbourhood plan was driven by a desire to preserve the green belt in the area.

They decided to work on Bramcote ward – a political ward – for simplicity’s sake.

Getting open data for neighbourhood plans

Judith’s first step in building the maps needed for the plan was working out what’s already there. She sought open data that showed what existed within the ward, from walks to infrastructure to areas of green belt. Local wildlife sites were easily defined – the shapes were downloaded from data.gov.uk – but some local sites weren’t there. They were found on Nottingham Insight mapping, but weren’t downloadable. A printout isn’t much use for GIS work – and the data wasn’t released for anything but personal use. And the data owners wouldn’t give permission.

Green belt boundaries have been published, so they could see how they’ve been changed. But shapefiles from planning consultations weren’t available for use.

Why the copyright restrictions on public data?

Where did the constraints come from? Possibly a local supplier added copyright to information extracted from Ordnance Survey data. And OSNI does enforce copyright on anything that uses its data. There’s the option to buy the data, of course, but that’s expensive.

In the end, generating your own maps is one solution. The other is to talk programmatically to the services underlying the OS data. Of course, just because the data is available that way doesn’t mean it’s necessarily open data.
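As an illustration of what “talking programmatically to the services” can look like, here is a generic OGC WFS request sketch – the endpoint URL and layer name are placeholders rather than real Ordnance Survey services, and anything you query this way still comes with its own licence terms:

```python
# Generic OGC WFS GetFeature request sketch. The endpoint URL and layer
# name below are placeholders for illustration only; check the licence
# of whatever service you actually query before reusing the data.
import requests

WFS_ENDPOINT = "https://example.gov.uk/geoserver/wfs"  # placeholder

params = {
    "service": "WFS",
    "version": "2.0.0",
    "request": "GetFeature",
    "typeNames": "planning:green_belt",  # placeholder layer name
    "outputFormat": "application/json",
}

resp = requests.get(WFS_ENDPOINT, params=params)
resp.raise_for_status()
features = resp.json()["features"]
print(f"{len(features)} features returned")
```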

It’s the copy & paste paradox. Just because you can copy it doesn’t mean you can paste it.

The general conclusion was that the restrictions were almost certainly more cultural than anything else. But how do you deal with that? Campaigning – people need to be aware that these things should be available. And many times, releasing the data isn’t anywhere near central to the officer’s job or priorities.

Solving the problem in a neighbourly way

How about co-operation with other neighbourhoods creating their plans? That brings a weight of political pressure to bear. Another way to exert pressure is to put a request in, mentioning that you’ll use a Freedom of Information request to get it instead. That tends to shake it loose.

Useful tip: there’s an active QGIS user group working in neighbourhood planning. That’s a useful network of expertise and experience.

Turning the question on its head: are they releasing their neighbourhood plan polygons? The local planning officer would probably leap at that, and you’re setting things up for an exchange.

OpenStreetMap? It was tried, but there were problems registering it. The OS GridInQuest software could be useful for dealing with that.

Session Notes

Open Data Horror Stories: 2017 Edition

There’s a tendency to focus on personal data as the major risk of open data. But there has to be more than that.

Open Data: The Horror – by Drawnalism's Matthew Buck

ODI Devon has made a policy of holding its meetings around the county. This avoids everything becoming Exeter-centric, but there is a cost to hiring the meeting rooms, and as they publish their spending as open data, it’s led to some criticism.

There’s lots of work going on around databases of GPs. That could be used for ranking GPs on a simple scale – which could be too simplistic. And there’s not really consumer choice in GPs, so how useful would that be? Could you end up with property price issues, as you do with schools?

Fun fact: there’s no such thing as a school catchment, only derived areas when the school is over-subscribed…

Trafford has a selective education system, with an exam splitting pupils between grammar and high schools. The net result? The grammars are full of children whose parents can afford tutors. So people started looking at the ward-by-ward data, to move the discussion beyond anecdote, using a visualisation people could explore. The Labour councillors could see that their wards were being discriminated against in favour of people from outside Trafford – but then nothing really happened.

Data does not come with intent. But it can then enable dynamics which lead to inequality or gaming the system. Is it right, ethically, to withhold the data because of that? The instinct seems to be “no” – but the system needs to be looked at.

Personal data problems

If we cock up and release personal data – that’s on us. It’s not the fault of the open data system. It’s good that people examine how we spend money – because it’s their money! But it should be a dialogue, not a broadcast – let them come back and discuss what they find in the data.

Does open data make accidental personal data releases more likely?

Well, possibly, if you put deadline and statutory pressure on people, without the resources and expertise to do it well.

Matching data sets is one concern: where you can de-anonymise data by matching sets together. It’s very complex to deal with. You don’t only have to think about your own data, but also be aware of what else is out there. That’s challenging. Pollution data is very easily connected with location and individual farms, for example. The converse risk is aggregating it upwards until it becomes meaningless.

There’s also the risk of releasing data that harms people economically.

Analysing the extent of risk

Astrophysics is rarely front-page news. Medical research is. Medical researchers can’t self-publish; in physics you can. Open data needs this – a sense of the potential damage a dataset can do. For some it will be negligible, for some it will be serious.

There are two dimensions worth considering:

  • Likelihood of risk, from unlikely to almost certain
  • Severity of risk, from minor boo boo to full-scale zombie outbreak

At some places, no data is released until it’s been analysed through that process. However, this assumes that you have experts with the knowledge to do it well. You also have issues of impartiality – reputational risk shouldn’t be a factor, but it will be for some organisations. Innate bias – political, racial or sexual – could influence the person making the decisions or scoring.

How do you balance this against the opportunity cost of NOT releasing the data?

There are a small number of high-walled reservoirs that are at high risk of catastrophic damage if they fail. The government won’t release which ones they are, because they could become terrorist targets – but equally, the people who live in the areas at risk have no idea and can’t prepare.

Session Notes

Can Open Data help Northern Ireland bring down its interfaces?

The interface team in Northern Ireland is tasked with dealing with the peace walls – interfaces – which separate Protestant and Catholic areas of Belfast and elsewhere, and which are due to come down by 2023. The programme has Twitter and Facebook accounts to increase engagement with the individuals and communities concerned.

Cupar Way is the largest of the interface structures.

In order to get them down, the government has committed to only removing them with the consent of the communities involved – but actually reaching this point presents significant challenges. And some of these areas are the most deprived in Northern Ireland.

The data accuracy problem

They have some data, but it’s not open yet. They’re developing mapping data, and have existing data on crime, health, bonfires and so on. Could there be an open data platform to bring this all together? There are some data sharing agreements with the various sources of data – and there are some problems surfacing because in some places the data sources aren’t accurate (or detailed) enough. That needs to be solved before it’s opened, because of the sensitivities involved.

It’s clearly very important to get this right. They need the best possible information before they can make decisions on whether the walls are safe to come down.

Academia must have useful data for this process. Is there some? How would they get hold of it?

How can they ensure that the general public engage with the data? A portal would be ideal – but they’re a long way from that. There’s a lack of technical expertise in the team, but there’s a lot of interest that needs transforming into resources and actual help. They’re more than keen to add new people to the “dream team” behind it.

The definition problem

There’s some contention about the number and length of interfaces out there. How do you define communities for consultation purposes? Residents? Businesses? Churches?

Once you’ve done that, how do you consult?

Academics have been doing some interesting work mapping religion and communities around the walls – some communities live right up to them, some don’t, as the nearby houses are now gone.

There’s some debate about what is or isn’t an interface. The DoJ is responsible for 59 structures, which have been reduced to 49 to date. There is a physical map of interfaces – but it’s not owned by them. They have their own data – which they would like to publish. Something as simple as GPS co-ordinates linked to walls could serve. Postcodes are not open in Northern Ireland, which doesn’t help.

A need for more informed consultation

The Interface team engages with people day to day in a grassroots manner. But they’d like more data on service duplication, travel time increases and so on, that could help persuade communities that they’d be better off without the interfaces. It will help them understand the benefits and impact.

Current responses from these communities are genuinely mixed. They’ve been building up their engagement over the last year, but they’re still not reaching the local residents enough. There are issues of power and control over the communities to deal with.

One attendee pointed out that the data shows that many communities either side of an interface are identical in terms of economic, health and crime data. The only difference is religion. Can that data be used to help reconcile people?

There is trans-generational trauma at work in some communities, which makes even testing the opening up of doors in the interface problematic. They can’t just go in with sledgehammers – you need to bring the communities along with the idea. Tech’s A/B testing doesn’t normally lead to petrol bombs…

In summary

They need assistance. Anyone who can get the data portal idea to move forwards, or who has ideas should get in contact.

Open Data for Newbies (2017 edition)

It’s OK to accept that bright, engaged people might not know what Open Data is. So, here’s a beginner’s guide for them, liveblogged at Open Data Camp 5 in Belfast.

What is open data?

It’s a data set that anyone can access, and which has a licence, and which has been published. Primarily, it’s in a machine-readable format.

Data, in this context, is anything! A photo can be open data. Generally, though, we’re talking about data that can be presented in rows and columns in a CSV (comma-separated values) file. It’s an open format akin to (and compatible with) Excel, but which isn’t dependent on owning Microsoft Office.
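To make that concrete, a CSV is just plain text that any programming language can read; a minimal Python sketch, with an invented filename and column names:

```python
# Reading a CSV with the standard library; the filename and column
# names are invented for illustration.
import csv

with open("food_hygiene_ratings.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row["BusinessName"], row["RatingValue"])
```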

Data is the first stage – it’s just data, not yet information.

Where is it published?

On a website – in a way that you can easily access, ideally without limitation or need to register. You shouldn’t have to pay.

There are various platforms and portals that make open data available.

What about the licences?

It should be published with an open data licence attached. The licence tells you what you can do with the data, and under what conditions (attribution, for example). The Open Government Licence (OGL) is one example. Creative Commons is another.

Data without a licence isn’t really open, because you don’t know how you can use it.

Technically, if you abuse the licence, you can be cut off from using it – but that’s hard to enforce.

Do you need an ethics licence?

Open data should never be personal.

There’s a data spectrum from closed, private data, via shared data (available to a subset of people) and public data (like Twitter’s feed, for example), to open data.

Open Data is data that is free for anyone to access or share. Even if it is derived from personally-identifiable data, that data should be anonymised.

What is metadata?

Metadata is data about data. It’s information like the source of the data, or how it was collected. Sometimes the metadata is great, but sometimes it doesn’t exist. Metadata is where you can give your data context.
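There’s no single required format, but as an illustration, a dataset’s metadata often records fields like these (the field names below are typical examples, not a formal standard):

```python
# Illustrative metadata record for a dataset; field names are examples,
# not a formal schema.
metadata = {
    "title": "Food hygiene ratings",
    "publisher": "Food Standards Agency",
    "licence": "Open Government Licence v3.0",
    "last_updated": "2017-10-21",
    "update_frequency": "daily",
    "description": "Hygiene ratings for food businesses, updated in near real time.",
}

for field, value in metadata.items():
    print(f"{field}: {value}")
```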

What is the ODI?

The Open Data Institute was founded five years ago as a charity to connect and inspire people around the world to use open data.

ODI Nodes are local groups of open data enthusiasts and advocates. They’re a bit like a franchise. There’s no trickle down funding, so the nodes have to raise their own funds and use volunteers.

What is an API?

An API is an application programming interface. It allows you to automate the extraction of data from a data source, via code. Basically, someone who owns data on a server has written some code that allows you to access that data. APIs are more interesting for realtime data, which is constantly changing. TfL publishes loads of realtime transport data about London via APIs. The CityMapper app uses that API.
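As a worked example of what calling an API looks like, here’s a sketch against TfL’s unified API for current tube line statuses – check the current documentation at api.tfl.gov.uk before relying on the exact endpoint or response fields:

```python
# Hedged sketch: calling TfL's unified API for tube line statuses.
# The endpoint and response fields are as documented at api.tfl.gov.uk
# at the time of writing; check the current docs before relying on them.
import requests

resp = requests.get("https://api.tfl.gov.uk/Line/Mode/tube/Status")
resp.raise_for_status()

for line in resp.json():
    statuses = ", ".join(s["statusSeverityDescription"] for s in line["lineStatuses"])
    print(f"{line['name']}: {statuses}")
```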

In Bristol there are air quality monitors that report every 24 hours via an API.

It’s a way of automating updated data access.

What is Linked Data?

Linked data is data a computer can read that allows referencing of data. So, if you’re citing data in a paper, you can provide a hyperlink to the original data so people can check the provenance.
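A minimal sketch of the idea using the rdflib package – the URIs below are invented placeholders; the point is that the citation is a resolvable link to the dataset rather than a bare text label:

```python
# Minimal linked data sketch with rdflib (pip install rdflib). The URIs
# are invented placeholders; the point is that every reference is a
# resolvable link rather than a bare label.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCTERMS

g = Graph()
paper = URIRef("https://example.org/papers/open-data-study")
dataset = URIRef("https://example.org/datasets/fhrs-ratings")

g.add((paper, DCTERMS.title, Literal("An open data study")))
g.add((paper, DCTERMS.source, dataset))  # the citation links to the data

print(g.serialize(format="turtle"))
```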

What are the five stars?

These were determined by Sir Tim Berners-Lee:

  1. Make it open
  2. Make it machine readable (tabular data in a spreadsheet, for example)
  3. Same as above, but in a non-proprietary format.
  4. Using a URI – uniform resource identifier
  5. Linking your data to other data.

More info on 5 star open data. Generally speaking, three stars is good enough.

What are registers?

These are definitive data sets. The Government Digital Service are building these for some key pieces of information such as the definitive list of countries in the world.

Do we have data standards?

A standard is everyone agreeing to do something in the same way. We don’t have a definitive list of standards for open data. They make things much easier – but are hard to agree and enforce. Standards make it much easier for machines to read data and connect different data sets. Humans make this worse by having preferences. There is a growing body of code snippets that allow data to be transformed into a preferred format, if it wasn’t supplied that way to start with.
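A typical example of such a snippet – normalising dates published in mixed formats into ISO 8601, with the input formats here purely illustrative:

```python
# Illustrative snippet: normalise dates published in mixed formats into
# ISO 8601 (YYYY-MM-DD). The input formats listed are examples only.
from datetime import datetime

KNOWN_FORMATS = ["%d/%m/%Y", "%d %B %Y", "%Y-%m-%d"]

def to_iso(date_string: str) -> str:
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(date_string, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {date_string!r}")

print(to_iso("21/10/2017"))       # -> 2017-10-21
print(to_iso("21 October 2017"))  # -> 2017-10-21
```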

There are standards bodies which think very hard about standards, agree them and publicise them. W3C, ISO and so on. It’s hard to enforce, but you can persuade.