Tag Archives: Open Data

Making Open Data Camp matter – to local economies and more

What is the value to the local economy of open data – and open data unconferences? The wider benefit of open data to local economies is hard to quantify. There’s no E=mc² equation of open data benefit yet.

So let’s talk about unconferences, and Open Data Camp in particular.

Local value of Open Data Camp

Some organisers have a sense that the event stimulates the economy, but no sense of how to measure that. There’s local sponsorship, so sponsors are expecting some return on that investment. For them, it might be an opportunity to meet potential customers, or to improve their operational intelligence.

Corporate social responsibility is one reason people sponsor: it’s a community benefit, but a thriving open data ecosystem also benefits the companies themselves.

Escaping the gravity of the capital

Just NOT having it in London is a good thing. Holding events away from London can be an incentive.

There’s a distinction between the benefit of open data to the city and the value of an open data conference in the city. Events bring a clear, basic financial benefit to the city in hotel rooms, food and entertainment, as long as people are prepared to travel to attend. One event succeeding in an area gives others the confidence to happen there.

Travel is not just about money, but also about the time it takes to get there.

Buy-in from the host city makes a big difference. The city saying “no” to investing in an event can kill it. There needs to be a clear vision of the benefit to the city, so you can sell it to them.

Unconferences can be woolly about what the benefit is. Open Data Camp deliberately avoids London, both because too many events happen there and because people can be resistant to it. You get local character and flavour, and you get people who might never come to an event if it wasn’t near them. And national organisations still come, because they don’t get out of London much, and they get to make connections with local projects.

Connections make benefits

Those connections can turn into valuable projects. You’re not just connecting geographies, but different forms of organisation. People who don’t do data can start to see it in a physical way – to understand the data that describes the city they can see around them.

Can we improve the outcomes by theming the event? Or would that corrupt the unconference format? People tend to take advantage of the location to discuss local issues, like the interface/divided communities session at this event. And that can be very valuable, giving people insights into unexpected uses of data.

The Queen’s University Belfast computing department is often empty at weekends. Why not use the space for events like this? Let people come in and find out new things. Being physically in different places gives opportunities to explore new technologies, like iBeacons or VR tech.

Look at the data you have, the data you can get, and the technologies that are coming along, and then make the space to think about how to combine them. Ideas start at those sorts of meetings, and we need those case studies.

Catalysing other fields

Bring in other kinds of people: English Lit students could find open data techniques useful for extracting what they need from books. You can avoid massive wastes of time and effort by bringing people together in a way that lets them realise what they can offer each other.

At the moment, Open Data Camps are open data people talking to open data people. Could we have a Friday where we open up our experts to other people? That would mean we could say we advised start-ups and students, and contributed to the economy.

Possible ideas include pre-activating people, for example by letting them plan for hacker spaces, or offering open data surgeries.

We’re trying to capture the sessions via Drawnalism, and we’re putting that on the blog. But should we be pushing further with it, telling case studies and stories about events or projects that spin out of Open Data Camp sessions and meetings?

But what about the wider benefit of open data to local economies? There’s no E=mc² equation of open data benefit yet.

Session Notes

Whose data is it anyway?

The question of who data belongs to, and whether individuals can have a say in what happens to their data, tends to come up very quickly in some areas. Health, for example.

But there is a concern that the whole issue of data collection and use could become much more fraught with the arrival of the General Data Protection Regulation. This is an EU regulation that is being incorporated into UK law at the moment via the Data Protection Bill.

The GDPR will require organisations to think about the impact of projects on data privacy at an early stage, and to appoint a data protection officer. It will introduce large fines for data breaches, tighten up rules on consent, and introduce some new rights, including a right to be forgotten.

The session heard that this last right, introduced following a court case involving Google, could have a big impact on open datasets, because if people remove themselves from datasets, those datasets become less complete.

As the session leader, Kim Moylan, said: “What happens if people pluck themselves out of data? Do we leave a blank line, or just take what is there?”

“You will have to remove information that is no longer relevant. But what is no longer relevant in a medical record? What about the census? That is a big open dataset. If people remove themselves from the census, what do we do then?”

Aggregate data

One participant felt the public debate needed to be recast. Instead of talking about ownership, he said, the discussion should be about legal rights and restrictions. “If I walk through Belfast, I know I will be filmed, but there are legal safeguards on that. Talking about ownership just confuses the issue.”

However, the GDPR is coming in. And the general feeling in the session was that if it is going to cause problems for the open data movement, they will arise at the aggregation stage.

As various participants pointed out, when data is released as open data it is anonymised; the open data movement doesn’t deal in identifiable patient data, so it doesn’t need explicit consent for the data it uses.

However, if people decide to opt-out of a particular data collection, or ask to be ‘forgotten’ and removed from a collection that has taken place, then that will affect the size, and potentially quality, of the dataset being released.

In which case, the big question is how many people will opt out or exercise their right to be forgotten. On this, opinions were divided. One participant pointed out that people already have the right to opt out of their medical data being used in shared care records and some data collections, and hardly anyone uses it.

The Caldicott Review of information governance and security in the NHS will give people new opt-out rights, but there is no reason to think many people will use them.

Practical problems

Still, the practical implications could be hard to deal with. One participant, who uses surveys to collect information, asked whether someone who came back and said they no longer wanted their data to be included could ask for it to be removed at every level: the original survey, the aggregation, and the anonymised release.

The answer seems to be yes. “But philosophically, I have a real problem with this, because a policy decision might have been made on that data, and now it has changed.”

Also, surveys might need to be larger in the future, to make sure they would still be statistically valid if a predictable number of people removed their data later on.
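The point about larger surveys can be made concrete with a little arithmetic. This is a minimal sketch, assuming withdrawals happen at a known, steady rate and that the analysis needs a fixed number of responses to stay valid: divide the required sample by the share of people expected to stay in, and round up. The function name and the figures in the example are hypothetical, not taken from the session.

```python
import math


def inflated_sample_size(required: int, expected_withdrawal_rate: float) -> int:
    """Return how many responses to collect so that, after the expected share
    of participants later withdraw their data, at least `required` remain.

    Assumes withdrawals are unrelated to the answers given; if they are not,
    a bigger sample will not remove the resulting bias.
    """
    if not 0 <= expected_withdrawal_rate < 1:
        raise ValueError("withdrawal rate must be in [0, 1)")
    retained_share = 1 - expected_withdrawal_rate
    return math.ceil(required / retained_share)


# Hypothetical example: 400 usable responses needed, 5% expected to withdraw later.
print(inflated_sample_size(400, 0.05))  # 422
```

The comment in the docstring matters as much as the arithmetic: topping up the sample only fixes the size problem, not any skew caused by who chooses to leave.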

Overall, though, the session was positive. It remains the case that most instances in which data is generated and used are covered by well-established legislation that will not be affected by the GDPR.

The Data Protection Bill builds on existing data protection legislation, which is reasonably well understood. Even the right to be forgotten is already two years old.

GDPR – good news?

Indeed, there is an argument that more debate about data protection, and more awareness of the new rules, can only be a good thing, because it will build public trust.

One participant said: “These conversations are happening more and more. We have privacy groups bringing cases. Privacy notices will have to be much more transparent. But I’m quite optimistic. I think once people understand their rights they will actually be more comfortable with uses of their data. The impact will be on companies that are not doing very well at the moment.”


Impact of the private sector on demand

Open Data Camp 5, day two, opened with a discussion of the impact of the private sector on open data.

The session was led by Shelby Switzer, who explained she was interested in the subject because she worked for a company in the US [Healthify] that uses a lot of open data about social issues and services.

Open Data in the private sector

The problem: “We find the data sucks and we have to put a lot of effort into making it better,” she said. “So, I want to talk about how we get the providers to do better.

“Also, how to prove to my company that it is better to help to improve the data at source, instead of spending so much time cleaning it.”

Small errors, big problems

When Shelby asked who worked for the private sector, a lot of hands went up. One participant said he was working on a project about energy sites and wind turbines.

“Some of the government data on that is wrong. So if someone wants to build a wind turbine, it comes out as being on a farm, or in the nearest village,” he said.

Asked whether he told the relevant government department about these problems, he said he did, and they did get corrected – eventually. But he felt more levers would help. “Sometimes I sell people the idea that if they ask to get it corrected, it can unblight their house.”

A participant working for a government department asked how he found the right person to tell. That is definitely a problem. “It’s very hard, because you are down to individual records. So one [turbine] record is 500 yards out. To the government, it’s ‘big deal’; but for an individual that can matter a lot.”
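A check like the one below could flag records that are “500 yards out”, but only if you have a trusted reference position to compare against (for example, one surveyed on site). This is a minimal sketch: the record layout, IDs and coordinates are invented for illustration, and it uses the standard haversine great-circle formula with a mean Earth radius, which is accurate enough at this scale.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres
YARD_IN_M = 0.9144          # one international yard in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_misplaced(records, tolerance_m=100):
    """Return (id, distance in yards) for records whose published position is
    more than tolerance_m from a surveyed reference position."""
    flagged = []
    for rec in records:
        d = haversine_m(*rec["published"], *rec["surveyed"])
        if d > tolerance_m:
            flagged.append((rec["id"], round(d / YARD_IN_M)))
    return flagged


# Hypothetical turbine records: (latitude, longitude) in degrees.
records = [
    {"id": "T1", "published": (54.6000, -5.9300), "surveyed": (54.6000, -5.9300)},
    {"id": "T2", "published": (54.6041, -5.9300), "surveyed": (54.6000, -5.9300)},
    {"id": "T3", "published": (54.6005, -5.9300), "surveyed": (54.6000, -5.9300)},
]
print(flag_misplaced(records))
```

The tolerance is a judgement call: too tight and every rounding error gets reported; too loose and exactly the kind of error that can blight a house slips through.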

Another participant, also working in the sustainable energy sector but in Northern Ireland, where the government is trying to encourage improvements in housing, said she had come across similar problems.

It could be hard to be sure, for a house or flat, who the letting agent was, or the insurer. “We found it matters a lot who you feed back to. You need someone who really wants to meet the original [policy] objective.”

Solutions – or partial solutions

Another participant asked whether companies that wanted to use open data had tried building a model for the open data providers to work to. Shelby said her company was looking at that: “If we build a model, can we get agreement they will keep the data up to date?”

What, she asked, would encourage a government department to respond? The session had a simple answer: money.

However, participants also flagged up some problems. For example, one said: “You start getting people saying well, if people want this data they should pay for it, and we don’t want that.”

A suggested solution was to work out non-monetary deals: for example, time-limited, exclusive access to a new or improved dataset.

However, a further participant, who had worked for a council, said it often received FoI requests from a particular company for data on parking spaces. The council would have liked this dataset, but didn’t have it, and it would have cost a lot to generate.

It talked to the company about a deal to generate it, but the business wanted exclusivity, and the council wasn’t comfortable with this. So it didn’t happen.


What about private open data?

At this point, the discussion changed track, when an ODCamp volunteer asked why the debate about open data invariably focused on government open data. Why weren’t private companies releasing their information?

Which raised the obvious question: “Why should they?” A company’s data is valuable to that company. However, one participant said the French government is looking at getting companies to release information on, for example, farming practices that affect efficiency.

“They think that information should be available, and not the property of [a tractor maker],” he said. “But that’s quite French. It doesn’t respect private property: it wouldn’t fly in the US.”

One lever might be to require companies that get public money to release information about their contracts and their impacts as open data. One participant said this had been done, with the Buses Bill; companies must release information on issues like routes, “so you can tell where the buses are.”

However, this would require most government departments to get much better at writing contracts. Another suggestion was that large companies which use open data but don’t declare it should be required to do so – if only to show how much use and value it has.

Overall, the session felt that while the journey to open government data had been a long one, the journey to open public data was going to be even longer. Not least because there were fewer levers to use.

A participant working for a government department said: “The argument that open data is ‘right’ has not been persuasive; instead, the idea that has been persuasive is that it has already been paid for by the public, so the public should be able to use it.

“And that isn’t there for the private sector.” However, an earlier speaker argued that ideas of reciprocal value might work: companies that release data as open data can then work with other companies looking to get further value from it, to mutual benefit.

Open Data Horror Stories: 2017 Edition

There’s a tendency to focus on personal data as the major risk of open data. But there must be other risks as well.

Open Data: The Horror – by Drawnalism's Matthew Buck

ODI Devon has made a policy of holding its meetings around the county. This avoids everything becoming Exeter-centric, but there is a cost to hiring the meeting rooms, and as they publish their spending as open data, it’s led to some criticism.

There’s lots of work going on around databases of GPs, which could be used to rank GPs on a simple scale. That could be too simplistic. And there’s not really consumer choice in GPs, so how useful would that be? Could you end up with property price issues, as you do with schools?

Fun fact: there is no such thing as a school catchment area; there are only derived areas, which arise when a school is over-subscribed…

Trafford has a selective education system, with an exam splitting pupils between grammar and high schools. The net result? The grammars are full of children whose parents can afford tutors. So, people started looking at the ward by ward data, to move the discussion beyond anecdote, through use of a visualisation people could explore. The Labour councillors could see that their wards were being discriminated against in favour of people from outside Trafford – but then nothing really happened.

Data does not come with intent. But it can enable dynamics that lead to inequality or gaming of the system. Is it ethically right to withhold the data because of that? The instinct seems to be “no”, but the system needs to be looked at.

Personal data problems

If we cock up and release personal data, that’s on us. It’s not the fault of the open data system. It’s good that people examine how we spend money, because it’s their money! But make it a dialogue, not a broadcast: let them come back and discuss what they find in the data.

Does open data make accidental personal data releases more likely?

Well, possibly, if you put deadline and statutory pressure on people, without the resources and expertise to do it well.

Matching datasets is one concern: you can de-anonymise data by matching sets together. It’s very complex to deal with. You don’t only have to think about your own data, but also be aware of what else is out there. That’s challenging. Pollution data is very easily connected with location and individual farms, for example. The converse risk is aggregating data upwards until it becomes meaningless.
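To make the matching risk concrete, here is a toy sketch of a linkage check. Two releases that each look harmless are joined on the fields they share (here a hypothetical postcode district and herd-size band), and any combination that matches exactly one record in each release is flagged, because that is where a named record can be tied to a supposedly anonymous one. All field names and records are invented for illustration.

```python
from collections import Counter

# Release A: hypothetical pollution permits, with no names but some location detail.
release_a = [
    {"district": "BT1", "herd_band": "100-199", "permit": "slurry store"},
    {"district": "BT1", "herd_band": "100-199", "permit": "silage clamp"},
    {"district": "BT9", "herd_band": "200-299", "permit": "slurry store"},
]

# Release B: a hypothetical public register that does carry names.
release_b = [
    {"district": "BT1", "herd_band": "100-199", "name": "Farm A"},
    {"district": "BT9", "herd_band": "200-299", "name": "Farm B"},
]

QUASI_IDENTIFIERS = ("district", "herd_band")


def key(row):
    """The combination of shared fields used to line the two releases up."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)


def unique_matches(a, b):
    """Return (row_from_a, row_from_b) pairs whose shared-field combination
    occurs exactly once in each release, i.e. a one-to-one re-identification."""
    counts_a = Counter(key(r) for r in a)
    counts_b = Counter(key(r) for r in b)
    b_by_key = {key(r): r for r in b}
    return [
        (r, b_by_key[key(r)])
        for r in a
        if counts_a[key(r)] == 1 and counts_b.get(key(r), 0) == 1
    ]


for permit_row, register_row in unique_matches(release_a, release_b):
    print(register_row["name"], "->", permit_row["permit"])  # Farm B -> slurry store
```

Farm A is protected only because two permits share its combination. Coarsening fields until every combination covers several records is one common defence, which is exactly the trade-off above: push it too far and the data stops being useful.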

There’s also the risk of releasing data that harms people economically.

Analysing the extent of risk

Astrophysics is rarely front-page news; medical research is. Medical researchers can’t self-publish, whereas in physics you can. Open data needs this: a sense of the potential damage a dataset can do. For some datasets it will be negligible; for others it will be serious.

There are two dimensions worth considering:

  • Likelihood of risk, from unlikely to almost certain
  • Severity of risk, from minor boo boo to full-scale zombie outbreak
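The two dimensions above are often combined into a simple risk matrix. This is a minimal sketch, assuming four-point scales for each dimension, a score made by multiplying them, and two cut-off thresholds; the scale labels (apart from the two ends quoted above), the thresholds and the decision wording are all illustrative assumptions, not something the session agreed.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    UNLIKELY = 1
    POSSIBLE = 2
    LIKELY = 3
    ALMOST_CERTAIN = 4


class Severity(IntEnum):
    MINOR = 1         # "minor boo boo"
    MODERATE = 2
    MAJOR = 3
    CATASTROPHIC = 4  # "full-scale zombie outbreak"


def risk_score(likelihood: Likelihood, severity: Severity) -> int:
    """Combine the two dimensions by multiplication (scores run from 1 to 16)."""
    return int(likelihood) * int(severity)


def release_decision(likelihood, severity, review_threshold=6, block_threshold=12):
    """Map a score to an illustrative release decision using assumed thresholds."""
    score = risk_score(likelihood, severity)
    if score >= block_threshold:
        return "withhold pending expert review"
    if score >= review_threshold:
        return "release with mitigation"
    return "release"


print(release_decision(Likelihood.UNLIKELY, Severity.MINOR))
print(release_decision(Likelihood.LIKELY, Severity.MAJOR))
print(release_decision(Likelihood.ALMOST_CERTAIN, Severity.CATASTROPHIC))
```

The code only does the multiplication. Choosing where a dataset sits on each scale is where the expertise, and the risk of bias, comes in.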

Some places release no data until it has been analysed through that process. However, this assumes that you have experts with the knowledge to do it well. There are also issues of impartiality: reputational risk shouldn’t be a factor, but it will be for some organisations. Innate bias, whether political, racial or sexual, could influence the person making the decisions or doing the scoring.

How do you balance this against the opportunity cost of NOT releasing the data?

There are a small number of high-wall reservoirs that pose a high risk of catastrophic damage if they fail. The government won’t release which ones they are, because they could become terrorist targets; but equally, the people who live in the areas at risk have no idea and can’t prepare.

Session Notes

The EveryPolitician project

Open Data Camp 5 is taking place in the Computer Science Department of Queen’s University. It’s a modern institution that wants to make sure its students are ready for work.

So there are rooms that are carpeted with artificial turf, filled with trees, and furnished with garden benches. Of course there are.

The second session of the morning gathered in the garden room to discuss the EveryPolitician project [everypolitician.org], a bid to collect information about every politician in the world.

Not an easy job…

The EveryPolitician site says that it has information on 74,939 politicians from 233 countries – and counting. However, Tony Bowden from mySociety, who started the project with Lucy Chambers, explained that this has been very difficult to collect.

The project started by “running a lot of scrapers on a lot of sites” but, because of licensing issues, “we couldn’t quite tell anybody they could use it” and “we weren’t sure it was sustainable.”

So, the project is moving to Wikidata, in the hope that this will become “the canonical source of information about politicians”.

Why Wikidata?

Bowden explained that Wikidata is connected to Wikipedia. There is no single Wikipedia; there is one for every language. So Wikidata collects information from all the different language editions in a structured or semi-structured form, because otherwise they get out of sync with each other.

On the political front, for instance, Bowden said that if there is an election in an African country, local people will update that quite quickly, but the page in Welsh might not be updated for some time. The idea of Wikidata is to keep them aligned.

Still, collecting information “in a structured or semi-structured form” is not as easy as it sounds. For example, a session participant asked how EveryPolitician defined a political party, given that the idea will be fluid across different political systems.

Bowden acknowledged that, for EveryPolitician: “We came up with a simplified view of the world. We felt it was more important to have something good enough for every country to use, than to capture all the nuances.

“If you want to do comparative analysis across the world, you can’t start with a two-year analysis of systems. It’s OK to say there will be some kind of political grouping.”

An evolving project

Bowden added that he thought these kinds of issues would be resolved, as people started to use the data. “We think it will be something like OpenStreetMap,” he said. “So, initially, there will be some broad concepts, but as it goes along, there will be people who go along and do every tree in their area, and the nuance will start to come through.”

Another issue is that there may not be a ‘single source of truth’ about politicians in some countries. For example, the Electoral Office of Northern Ireland knows who has been elected to a seat, but may not log changes – for example, if someone stands down and somebody else is co-opted.

Or there might be official sites, but they might not be very good. Kenya only lists politicians whose names start with the letters A to M (and one person starting with P). Nigeria’s politician pages are up to date, but its index is three months behind.

Bowden said EveryPolitician is building tools so that individuals can scrape official sites, and then upload the information to Wikidata, and fill in gaps or correct errors.
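The gap-filling step can be pictured as a reconciliation: compare what was scraped from an official site with what is already recorded, and list what is missing and what disagrees. This is a minimal sketch only; it does not use Wikidata’s actual API, the IDs and fields are hypothetical, and EveryPolitician’s own tools may work quite differently.

```python
def reconcile(official, existing):
    """Compare a scraped official list with existing records.

    Both arguments map a (hypothetical) politician ID to a dict of fields.
    Returns the IDs missing from the existing data, and, for IDs present in
    both, the fields whose values differ as (official, existing) pairs.
    """
    missing = sorted(set(official) - set(existing))
    conflicts = {}
    for pid in sorted(set(official) & set(existing)):
        diffs = {
            field: (value, existing[pid].get(field))
            for field, value in official[pid].items()
            if value != existing[pid].get(field)
        }
        if diffs:
            conflicts[pid] = diffs
    return missing, conflicts


official = {
    "p1": {"name": "A. Smith", "party": "X"},
    "p2": {"name": "B. Jones", "party": "Y"},
}
existing = {"p1": {"name": "A. Smith", "party": "Z"}}

missing, conflicts = reconcile(official, existing)
print(missing)    # ['p2']
print(conflicts)  # {'p1': {'party': ('X', 'Z')}}
```

A conflict is a prompt for a human, not an automatic fix: as the Kenya and Nigeria examples show, the official side may well be the stale one.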

What is interesting, he said, is that once a country gets a good list, with people committed to maintaining it, that information tends to be much better – and better used – than official sites.

“If there is an election, the Parliament site might not update until Parliament sits, but the Wikidata pages will update overnight,” Bowden explained. “Then journalists can mine that for stories. So they can instantly tell people things like: ‘the youngest ever MP for Wales has just been elected’.”

Find out more:

The EveryPolitician website has collected 2.8 million pieces of information so far. Tony Bowden has explained the move to Wikidata and its benefits in a blog post on the mySociety website. He also tweets about the EveryPolitician project.

Open Data GP Registers

Northern Ireland has always needed to keep registers of GPs and other health providers. Now, at least some people in its government and its health and social care service are looking to release the GP register as open data.

The aim is a single list of GPs in Northern Ireland, available in a machine-readable format.

Why? Session leader Steven Barry explained: “Lots of government departments have lots of service information, but it is often collected manually, so when somebody leaves it stops, or people do it differently.

“Working in the health service, our statistics guys were spending a lot of time presenting data instead of putting it out in a standard profile and letting the community do things with it.

“Simon Hampton, our minister of finance, is a very young guy, and he was very interested in what is happening in Estonia. So he has been right behind this.”

Technical challenges

There are technical challenges with creating registers. Most obviously, how are they going to be populated? If there is a register of practices and a register of GPs, how are they going to be aligned?

And if there is more information, such as how big a population the practice serves, how is this going to be kept up to date?
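One way to picture the alignment question: if both registers share a practice code, you can join them and report the two kinds of mismatch, GPs pointing at a practice the practice register doesn’t know about, and practices with no GPs attached. This is a minimal sketch under that assumption; the codes, IDs and field names are invented, and it says nothing about the harder problem of keeping figures like list size up to date.

```python
# Hypothetical practice register, keyed by an assumed shared practice code.
practices = {
    "Z001": {"name": "Example Surgery"},
    "Z002": {"name": "Sample Health Centre"},
}

# Hypothetical GP register, each GP pointing at a practice code.
gps = [
    {"gp_id": "G1", "practice_code": "Z001"},
    {"gp_id": "G2", "practice_code": "Z001"},
    {"gp_id": "G3", "practice_code": "Z003"},  # no such practice in the register
]


def align(practices, gps):
    """Link GPs to practices and report orphans and empty practices."""
    linked = {code: [] for code in practices}
    orphans = []
    for gp in gps:
        code = gp["practice_code"]
        if code in linked:
            linked[code].append(gp["gp_id"])
        else:
            orphans.append(gp["gp_id"])
    empty = [code for code, ids in linked.items() if not ids]
    return linked, orphans, empty


linked, orphans, empty = align(practices, gps)
print(linked)   # {'Z001': ['G1', 'G2'], 'Z002': []}
print(orphans)  # ['G3']
print(empty)    # ['Z002']
```

This only works if the shared identifier is documented and used consistently, which is the gap participants picked up on in the register itself.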

One session participant said that in England and Wales, at least, there is no agreement on what a GP is. Yet at least six organisations hold lists of GPs, each with different categories of information.

Barry’s register tackles some of these issues. But having seen it, some session participants pointed out that it still has limitations. There are identifiers, for example, but no information about what they are or who assigns them or how they can be used.

And the register is only available in a limited number of formats. One participant pointed out, though, that the technical problems can be solved. What is needed is people who want to solve them. Projects need stakeholders and users.

Finding uses and users

So, with a register in place, what might people be able to do with it? “In England, NHS Digital has been allowing people to rate their GPs,” Barry said.

“So if you use Bing, you can search for a GP, and see practices by rating. I looked at the GPs in Paddington and they were all two stars. I thought that must be a bit demoralising for the GP.

“And what I wanted to know was things like how many people does the GP look after, are they male or female, when and where were they trained, and what services do they offer?”

UK governments all want to give people more choice of GP, so other people are going to want the same information. Even though finding a GP with an open list can be a whole new challenge…

And health and care services have been investing in chief clinical information officers; doctors and other health professionals with an interest in IT and data. So they might be advocates for this kind of development. There are champions for change out there, the session concluded. The trick will be finding and using them.

Open Data Camp 4: bigger, better, wetter

This post is a repost of Giuseppe’s Medium blog post

I am slowly coming back to life after Open Data Camp. Being in Cardiff was amazing, apart from the weather, which is partly to blame for my current heavy cold. I have not been that wet since I walked up Snowdon six years ago. Despite the weather, this Open Data Camp has probably been the most amazing we have run since starting in Winchester two years ago, with some caveats. Here are some stats from the participants who shared their data (more than 50%).

The highest participation ever recorded at an Open Data Camp

The most mind-boggling figure from this camp is the total number of attendees: we checked in 125 people on day one, and 103 people on day two (most of them, but not all, returners from day one). To put things in perspective, the highest participation on any day one had been at Open Data Camp 3 in Bristol, with a total tally of 93.

What’s more striking is the very low drop-out rate. We counted 134 unique attendees out of 147 tickets sold. In the unconference industry, a drop-out rate of 30% is considered normal, and ours was only 9%. A 91% attendance level for a free event is something I would never have expected. It is testament not just to Open Data Camp being a great event (hey, I’m blowing my own trumpet here!) but to the community being very committed to attending.
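The headline percentages follow directly from the two counts in the paragraph above; this short calculation just makes the rounding visible and compares against the 30% “normal” drop-out rate quoted there. Nothing here goes beyond those figures.

```python
tickets_sold = 147
unique_attendees = 134

no_shows = tickets_sold - unique_attendees
drop_out_rate = no_shows / tickets_sold
attendance_rate = unique_attendees / tickets_sold

# What a "normal" 30% drop-out would have meant for the same number of tickets.
typical_no_shows = round(tickets_sold * 0.30)

print(f"{no_shows} no-shows, drop-out {drop_out_rate:.1%}, attendance {attendance_rate:.1%}")
# 13 no-shows, drop-out 8.8%, attendance 91.2%
print(f"At a 30% drop-out rate we would have expected about {typical_no_shows} no-shows")
# At a 30% drop-out rate we would have expected about 44 no-shows
```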

Most of the UK was covered

Look at the pin map on the left (or play with Angharad’s interactive map). People who declared their travel origin came from all over the place: Sunderland to the North, Norwich to the East, Hastings to the South. However, the map on the right suggests that attendance is higher somewhere in the South of the country. Let’s make a chart…

Participant origin in the UK

We can still do better, location wise

Looking at the data, it is evident that most attendees come from the South of England or Wales. Is transport an issue? Potentially. However, one of the ideas behind Open Data Camp is in fact to bring Open Data around the country, rather than getting people to travel to it, so I would not be too negative about it. If anything, these charts suggest where to take Open Data Camp next. If almost 70% of the attendees come from Wales and the South (including London), we should focus on holding the next event where we have few attendees: Yorkshire, the North East, Scotland, Northern Ireland.

Attendance by town and region of origin

Some people travelled a long distance

I had a huge grin on my face when I realised we had attendees from 4 continents. The non-European attendees were just 3, but their contribution was really useful. Hearing about Open Data in Seattle and Bangalore is certainly something that can make UK Open Data better.

The median distance travelled was 86 miles, which is more or less how far Southampton, Oxford, or Plymouth are from Cardiff.

 People travelled some distance to get to Open Data Camp

Diversity

Of course, some thought needs to be given to the diversity of Open Data Camp. The organising team did relatively well on gender balance, with over half of the members being women. So I was a bit disappointed upon realising that overall the event saw twice as many men as women (I leave those who did not declare their gender here in the chart, as I think this might be another symptom we need to address). What was your feeling as an attendee?

Open Data Camp seems to be pretty well received among people of a diverse range of ages, but if there is anything we can do to improve please let us know.

We have no data about ethnic background at this stage, but it might be something we would need to monitor in the future.

Gender and age data

On the way to Open Data Camp 5

It’s going to be difficult to beat Open Data Camp 4:

  • the biggest Open Data Camp so far
  • the first in a (former) parliament building
  • the first with armed police
  • the first in which I pitched a session (pushed by Jeni — and actually this was the first session I ever pitched at an unconference…)

When we started organising Open Data Camp in late 2014, I was skeptical: I thought this would be a one-off event and I was resigned that interest would move elsewhere. Instead, if I can summarise Open Data Camp 4 in one key learning point, I can say that interest in Open Data is getting hotter and hotter.

There were many users of data, new to the community, who are extremely keen on data releases and clear, open processes; there was an incredibly well attended session, “Open Data for beginners”, which had to be repeated due to demand; we had, for the first time, data journalists attending, interested in keeping pressure on the government to publish data in a timely and accurate way; we had professionals who work in fact-checking, now using government data to partially automate their fact-checking processes.

Equally, there is a demand for better data, open standards, and clear processes by veterans of Open Data. Open Data shouldn’t come at a massive expense to the taxpayer, but I still think it is beneficial to the efficiency of the public sector if processes that generate data are made clear, de-duplicated, and documented — and the Open Data agenda has clearly been pushing in this direction.

It seems evident to me that Open Data needs to be consolidated — and, preferably, approach releases from a problem-driven perspective (as I somewhat suggest here) — but it is also evident that the community is becoming richer thanks to people belonging to different areas of expertise, interest, and activism, starting to join in. I look forward to continuing the discussion at the next camp.

What makes for a good API?

One of the first questions to come up on day two of Open Data Camp was “what is an API?” One of the last issues to be discussed was “what makes a good API?”


Participants were asked for examples of application programming interfaces that they actually liked. The official postcode release site got a thumbs up: “It was really clear how to use it and what I’d get, and I can trust that the data will come back in the same way each time.”
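“I can trust that the data will come back in the same way each time” is something you can test for. This is a minimal sketch of a response-shape check: the excerpt doesn’t say which postcode API was meant, so the field names and types below are hypothetical, loosely modelled on a postcode lookup.

```python
# Hypothetical expected shape of one response: field name -> Python type.
EXPECTED_SCHEMA = {"postcode": str, "latitude": float, "longitude": float}


def shape_problems(response):
    """List the ways a decoded JSON response differs from the expected shape:
    missing fields, fields of the wrong type, and fields nobody asked for."""
    problems = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(response[field]).__name__}"
            )
    for field in sorted(response.keys() - EXPECTED_SCHEMA.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


good = {"postcode": "AB1 2CD", "latitude": 54.58, "longitude": -5.93}
bad = {"postcode": "AB1 2CD", "latitude": "54.58", "lng": -5.93}
print(shape_problems(good))  # []
print(shape_problems(bad))
```

The type check is deliberately strict: an integer latitude would be reported too, which may or may not be what you want from a given API.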
