Open Data Horror Stories: 2017 Edition

There’s a tendency to focus on personal data as the major risk of open data. But there has to be more than that.

Open Data: The Horror – by Drawnalism's Matthew Buck

ODI Devon has made a policy of holding its meetings around the county. This stops everything becoming Exeter-centric, but there is a cost to hiring the meeting rooms. Because ODI Devon publishes its spending as open data, that cost has led to some criticism.

There’s a lot of work going on around databases of GPs, which could be used to rank GPs on a simple scale. That might be too simplistic. And there isn’t much real consumer choice of GP, so how useful would a ranking be? Could you end up with property price effects, as you do with schools?

Fun fact: there is no such thing as a school catchment area; there are only derived areas that apply when a school is over-subscribed…

Trafford has a selective education system, with an exam splitting pupils between grammar and high schools. The net result? The grammars are full of children whose parents can afford tutors. So people started looking at the ward-by-ward data, using a visualisation people could explore, to move the discussion beyond anecdote. The Labour councillors could see that their wards were being discriminated against in favour of people from outside Trafford – but then nothing really happened.

Data does not come with intent, but it can enable dynamics that lead to inequality or to people gaming the system. Is it ethically right to withhold the data because of that? The instinct seems to be “no”, but the system itself needs to be looked at.

Personal data problems

If we cock up and release personal data, that’s on us. It’s not the fault of the open data system. It’s good that people examine how we spend money, because it’s their money! But make it a dialogue, not a broadcast: let them come back and discuss what they find in the data.

Does open data make accidental personal data releases more likely?

Well, possibly, if you put people under deadline and statutory pressure without giving them the resources and expertise to do it well.

Matching data sets is one concern: you can de-anonymise data by matching sets together. It’s very complex to deal with, because you don’t only have to think about your own data; you also have to be aware of what else is out there. That’s challenging. Pollution data, for example, is very easily connected with location and individual farms. The converse risk is aggregating the data upwards until it becomes meaningless.
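To make the matching risk concrete, here is a minimal sketch in Python. It assumes a toy “anonymised” pollution dataset that keeps area and farm type but drops the farm name, plus a separate public register of farms. All records, field names and functions are hypothetical. The sketch counts how many register entries share each combination of quasi-identifiers (fields that are not names but can still single someone out). Wherever a combination is unique, the reading links back to one named farm. The `small_groups` helper runs the related k-anonymity style check on the published data itself.

```python
from collections import Counter

# Entirely fictional toy records: readings with the farm name removed,
# but area and farm type retained.
pollution_readings = [
    {"area": "Area 1", "farm_type": "dairy", "nitrate_mg_l": 48.0},
    {"area": "Area 1", "farm_type": "dairy", "nitrate_mg_l": 52.5},
    {"area": "Area 2", "farm_type": "pig", "nitrate_mg_l": 71.2},
]

# A separate, hypothetical public register of farms.
farm_register = [
    {"farm": "Farm A", "area": "Area 1", "farm_type": "dairy"},
    {"farm": "Farm B", "area": "Area 1", "farm_type": "dairy"},
    {"farm": "Farm C", "area": "Area 2", "farm_type": "pig"},
]

# Fields present in both datasets that can be used to link them.
QUASI_IDENTIFIERS = ("area", "farm_type")


def key(record):
    """Tuple of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentified(readings, register):
    """Return (farm, reading) pairs where a reading links to exactly one farm."""
    register_counts = Counter(key(r) for r in register)
    names = {key(r): r["farm"] for r in register}  # only trusted when count == 1
    matches = []
    for reading in readings:
        k = key(reading)
        if register_counts[k] == 1:
            matches.append((names[k], reading["nitrate_mg_l"]))
    return matches


def small_groups(readings, k=2):
    """Quasi-identifier combinations shared by fewer than k published readings."""
    counts = Counter(key(r) for r in readings)
    return {combo: n for combo, n in counts.items() if n < k}


print(reidentified(pollution_readings, farm_register))  # [('Farm C', 71.2)]
print(small_groups(pollution_readings))  # {('Area 2', 'pig'): 1}
```

Coarsening the quasi-identifiers (for example, replacing area with county) would merge these small groups. That is exactly the trade-off the paragraph describes: each step of aggregation lowers the linkage risk and also removes detail.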

There’s also the risk of releasing data that harms people economically.

Analysing the extent of risk

Astrophysics is rarely front-page news; medical research is. Physicists can self-publish, but medical researchers can’t. Open data needs something similar: a sense of the potential damage a dataset can do. For some datasets it will be negligible; for others it will be serious.

There are two dimensions worth considering:

  • Likelihood of risk, from unlikely to almost certain
  • Severity of risk, from minor boo boo to full-scale zombie outbreak

In some places, no data is released until it has been analysed through that process. However, that assumes you have experts with the knowledge to do it well. There are also issues of impartiality: reputational risk shouldn’t be a factor, but it will be for some organisations. Innate bias, whether political, racial or sexual, could influence the person making the decisions or doing the scoring.
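The two dimensions above can be sketched as a simple likelihood × severity risk matrix. This is a minimal illustration, not a method anyone at the session described. The intermediate levels on each scale, the multiplication used to combine them and the score thresholds for each decision are all assumptions chosen for the example.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    # Endpoints follow the session notes; the middle levels are assumed.
    UNLIKELY = 1
    POSSIBLE = 2
    LIKELY = 3
    ALMOST_CERTAIN = 4


class Severity(IntEnum):
    # Endpoints follow the session notes; the middle levels are assumed.
    MINOR_BOO_BOO = 1
    MODERATE = 2
    SERIOUS = 3
    ZOMBIE_OUTBREAK = 4


def risk_score(likelihood: Likelihood, severity: Severity) -> int:
    """Combine the two dimensions into one score (1 to 16) by multiplication."""
    return int(likelihood) * int(severity)


def release_decision(likelihood: Likelihood, severity: Severity) -> str:
    """Map a score to a hypothetical release decision using assumed thresholds."""
    score = risk_score(likelihood, severity)
    if score <= 3:
        return "release"
    if score <= 8:
        return "release after mitigation"
    return "withhold pending review"


print(release_decision(Likelihood.UNLIKELY, Severity.MINOR_BOO_BOO))  # release
print(release_decision(Likelihood.LIKELY, Severity.SERIOUS))  # withhold pending review
```

Writing the scoring down does not remove the problems raised above. Someone still has to choose the likelihood and severity for each dataset, and that judgement is where expertise, reputational pressure and bias come in.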

How do you balance this against the opportunity cost of NOT releasing the data?

There are a small number of high-wall reservoirs that would cause catastrophic damage if they failed. The government won’t release which ones they are, because they could become terrorist targets. But equally, the people who live in the areas at risk have no idea and can’t prepare.


Can Open Data help Northern Ireland bring down its interfaces?

The Interface team in Northern Ireland is tasked with dealing with the peace walls, known as interfaces. These separate Protestant and Catholic areas of Belfast and elsewhere, and are due to come down by 2023. The programme has Twitter and Facebook accounts to increase engagement with the individuals and communities concerned.

Cupar Way is the largest of the interface structures.

The government has committed to removing them only with the consent of the communities involved, but actually reaching that point presents significant challenges. And some of these areas are among the most deprived in Northern Ireland.

The data accuracy problem

They have some data, but it’s not open yet. They’re developing mapping data, and have existing data on crime, health, bonfires and so on. Could there be an open data platform to bring this all together? There are some data sharing agreements with the various sources of data – and there are some problems surfacing because in some places the data sources aren’t accurate (or detailed) enough. That needs to be solved before it’s opened, because of the sensitivities involved.

It’s clearly very important to get this right. They need the best possible information before they can decide whether the walls are safe to come down.

Academia must have useful data for this process. Is there some? How would they get hold of it?

How can they ensure that the general public engage with the data? A portal would be ideal – but they’re a long way from that. There’s a lack of technical expertise in the team, but there’s a lot of interest that needs transforming into resources and actual help. They’re more than keen to add new people to the “dream team” behind it.

The definition problem

There’s some contention about the number and length of interfaces out there. How do you define communities for consultation purposes? Residents? Businesses? Churches?

Once you’ve done that, how do you consult?

Academics have been doing some interesting work mapping religion and communities around the walls. Some communities live right up to them; others don’t, because the nearby houses are now gone.

There’s some debate about what is or isn’t an interface. The DoJ is responsible for 59 structures, and the number has been reduced to 49 to date. There is a physical map of interfaces, but the team doesn’t own it. They have their own data, which they would like to publish. Something as simple as GPS co-ordinates linked together into lines representing the walls could serve. Postcodes are not open in Northern Ireland, which doesn’t help.
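The idea of GPS co-ordinates linked into walls could be published in a standard format such as GeoJSON (RFC 7946). In that format a wall becomes a `LineString` of longitude/latitude points. The Python sketch below assumes GeoJSON is the chosen format, which the session did not specify. The helper names and property fields (`id`, `length_m`) are hypothetical, and the co-ordinates are placeholders rather than the location of any real interface. Computing each line’s length from its points may also help with the contention over how long the interfaces are. The sketch uses the haversine formula on a spherical Earth, which is an approximation.

```python
import json
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def wall_feature(wall_id, coords):
    """Build a GeoJSON Feature with a LineString for one interface structure.

    coords is a list of (longitude, latitude) pairs in WGS 84, which is the
    axis order GeoJSON requires.
    """
    length = sum(
        haversine_m(*coords[i], *coords[i + 1]) for i in range(len(coords) - 1)
    )
    return {
        "type": "Feature",
        "geometry": {"type": "LineString", "coordinates": [list(c) for c in coords]},
        "properties": {"id": wall_id, "length_m": round(length, 1)},
    }


# Placeholder co-ordinates only; not the location of any real interface.
example = wall_feature("example-wall-1", [(0.0, 0.0), (0.0, 0.001), (0.001, 0.001)])
collection = {"type": "FeatureCollection", "features": [example]}
print(json.dumps(collection, indent=2))
```

A plain-text format like this can be opened by most mapping tools and published on a portal without needing postcodes.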

A need for more informed consultation

The Interface team engages with people day to day in a grassroots manner. But they’d like more data on service duplication, increased travel times and so on. That data could help persuade communities that they’d be better off without the interfaces, and help them understand the benefits and impact.

Current responses from these communities are genuinely mixed. The team has been building up its engagement over the last year, but it’s still not reaching local residents enough. There are also issues of power and control over the communities to deal with.

One attendee pointed out that the data shows many communities on either side of an interface are identical in terms of economic, health and crime data. The only difference is religion. Can that data be used to help reconcile people?

There is trans-generational trauma at work in some communities, which makes even trial openings of doors in the interfaces problematic. You can’t just go in with sledgehammers; you need to bring the communities along with the idea. Tech’s A/B testing doesn’t normally lead to petrol bombs…

In summary

They need assistance. Anyone who can help move the data portal idea forward, or who has other ideas, should get in touch.

A tale of two datasets

Controversially, Gavin Freeguard, head of data and transparency at the Institute for Government, was allowed a PowerPoint presentation at Open Data Camp 4. However, it was in a good cause.


His slides enabled him to give some concrete examples of the data in the Whitehall Monitoring Project, which he runs. The project monitors the shape and size of government, the morale of civil servants, and other factors.

Open Data: the policy problem

Owen Boswarva

There used to be a strategy board, an open data user group and many other groups steering open data at the policy level, but most of these have now gone away. The one that seems to have survived is the Data Steering Group. However, that group has a wide range of interests, and we don’t know how interested it is in open data. Other groups seem to have evaporated; none of them has met since 2013/14.

Some sector boards still seem to be in effect. Should these surviving groups be steered from inside or outside government? Some groups are clearly missing. There’s a good pool of practitioners, but how do people outside the community find out about open data now? And how do we push for more releases?

