All posts by Lyn Whitfield

Open data careers

What is the professional background of the people who have found themselves working in open data? And how are their careers likely to develop in the future?

The answer to the first question is: it’s very diverse. A session at Open Data Camp 5 heard from people who had started out as foresters, commercial underwriters and architects. And from people who had begun their careers in large DIY chains and councils.

Just one participant had been recruited to an open data project from university. And he had studied history while he was there.

Become an expert – then do open data

If there was a common thread, it was that people had found themselves working with data, picked up technical skills, and then worked out that open data was a better way of doing what they wanted to do with data than other approaches.

One or two had even started open data projects as a hobby; and then been able to use their open data skills to impress their bosses and get a better job or pay package.

Will that be sustainable in the future? When the session moved on to discussing the skills that organisations need from open data experts, the consensus was that it might be.

You want a data scientist. Are you sure about that?

 
While many government departments, public bodies and even charities advertise for people with data skills, they often advertise for data scientists. But data scientists don’t do spreadsheets, and might not like – or be good at – communicating the results to policy makers, stakeholders and the public.

Someone with sector expertise, who picks up an open data interest and skill-set to support their policy, communication or core interests, may be a better fit.

Organiser Henrick Grothuis did a quick search on LinkedIn for open data jobs. Companies were looking for lots of different skills. “This camp looks perfect for people who want to develop diverse careers,” he said. “Open data is a route to picking up skills that a lot of people want.” 

Whose data is it anyway?

The question of who data belongs to, and whether individuals can have a say in what happens to their data, tends to come up very quickly in some areas. Health, for example.

But there is a concern that the whole issue of data collection and use could become much more fraught with the arrival of the General Data Protection Regulation. This is an EU regulation that is currently being incorporated into UK law via the Data Protection Bill.

The GDPR will require organisations to think about the impact of projects on data privacy at an early stage and to appoint a data protection officer. It will introduce large fines for data breaches, tighten up rules on consent, and introduce some new rights, including a right to be forgotten.

The session heard that this last right, introduced following a court case involving Google, could have a big impact on open datasets: if people remove themselves from datasets, those datasets become less complete.

As the session leader, Kim Moylan, said: “What happens if people pluck themselves out of data? Do we leave a blank line, or just take what is there?”

“You will have to remove information that is no longer relevant. But what is no longer relevant in a medical record? What about the census? That is a big open data set. If people remove themselves from the census, what do we do then?”

Aggregate data

One participant felt the public debate needed to be recast. Instead of talking about ownership, he said, the discussion should be about legal rights and restrictions. “If I walk through Belfast, I know I will be filmed, but there are legal safeguards on that. Talking about ownership just confuses the issue.”

However, the GDPR is coming in. And the general feeling in the session was that if it is going to cause problems for the open data movement, they will arise at the aggregation stage.

As various participants pointed out, when data is released as open data it is anonymised; the open data movement doesn’t deal in identifiable personal data, so it doesn’t need explicit consent to use it.

However, if people decide to opt-out of a particular data collection, or ask to be ‘forgotten’ and removed from a collection that has taken place, then that will affect the size, and potentially quality, of the dataset being released.

In which case, the big question is how many people will opt out or exercise their right to be forgotten. On this, opinions were divided. One participant pointed out that people already have rights to opt out of their medical data being used in shared care records and some data collections; and hardly any use it.

The Caldicott Review of information governance and security in the NHS will give people new opt-out rights, but there is no reason to think a lot of people will use them.

Practical problems

Still, the practical implications could be hard to deal with. One participant, who uses surveys to collect information, asked whether someone who came back and said they no longer wanted their data to be included could ask for it to be removed at every level: the original survey, the aggregation, and the anonymised release.

The answer seems to be yes. “But philosophically, I have a real problem with this, because a policy decision might have been made on that data, and now it has changed.”

Also, surveys might need to be larger in the future, to make sure they would still be statistically valid if a predictable number of people removed their data later on.
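As a rough illustration of the arithmetic involved (the figures below are invented for illustration, not from the session), over-sampling by the expected withdrawal rate keeps a survey valid even after removals:

```python
# Rough worked example (illustrative figures): how much larger a survey needs
# to be if a predictable share of respondents later withdraw their data.
import math

def required_sample(target_valid_responses: int, expected_withdrawal_rate: float) -> int:
    """Responses to collect now so that enough remain after withdrawals."""
    return math.ceil(target_valid_responses / (1 - expected_withdrawal_rate))

# If 1,000 responses are needed for validity and ~5% later opt out,
# collect about 1,053 up front.
print(required_sample(1000, 0.05))  # -> 1053
```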

Overall, though, the session was positive. It remains the case that most instances in which data is generated and used are covered by well-established legislation that will not be affected by the GDPR.

The Data Protection Bill builds on existing data protection legislation, which is reasonably well understood. Even the right to be forgotten is already two years old.

GDPR – good news?

Indeed, there is an argument that more debate about data protection, and more awareness of the new rules, can only be a good thing, because it will build public trust.

One participant said: “These conversations are happening more and more. We have privacy groups bringing cases. Privacy notices will have to be much more transparent. But I’m quite optimistic. I think once people understand their rights they will actually be more comfortable with uses of their data. The impact will be on companies that are not doing very well at the moment.”

 

Impact of the private sector on demand

Open Data Camp 5, day two, opened with a discussion of the impact of the private sector on open data.

The session was led by Shelby Switzer, who explained she was interested in the subject because she worked for a company in the US [Healthify] that uses a lot of open data about social issues and services.

Open Data in the private sector

The problem: “We find the data sucks and we have to put a lot of effort into making it better,” she said. “So, I want to talk about how we get the providers to do better.

“Also, how to prove to my company that it is better to help to improve the data at source, instead of spending so much time cleaning it.”

Small errors, big problems

When Shelby asked who worked for the private sector, a lot of hands went up.  One participant said he was working on a project about energy sites, and wind turbines.

“Some of the government data on that is wrong. So if someone wants to build a wind turbine, it comes out as being on a farm, or in the nearest village,” he said.

Asked whether he told the relevant government department about these problems, he said he did, and they did get corrected – eventually. But he felt more levers would help. “Sometimes I sell people the idea that if they ask to get it corrected, it can unblight their house.”

A participant working for a government department asked how he found the right person to tell. Which is definitely a problem. “It’s very hard, because you are down to individual records. So one [turbine] record is 500 yards out. To the government, it’s ‘big deal’; but for an individual that can matter a lot.”

Another participant, who is also working in the sustainable energy sector, but in Northern Ireland, where the government is trying to encourage improvements in housing, said she had come across similar problems.

It could be hard to be sure, for a house or flat, who the letting agent was, or the insurer. “We found it matters a lot who you feed back to. You need someone who really wants to meet the original [policy] objective.”

Solutions – or partial solutions

Another participant asked whether companies that wanted to use open data had tried building a model for the open data providers to work to. Shelby said her company was looking at that: “If we build a model, can we get agreement they will keep the data up to date?”
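As a hedged sketch of what ‘building a model for providers to work to’ could mean in practice – an agreed record format, plus a freshness check the data consumer can run – see below. The field names and thresholds are illustrative assumptions, not Healthify’s actual schema.

```python
# Hedged sketch of one thing "building a model for providers to work to" could
# mean: an agreed record format plus a freshness check the data consumer can
# run. Field names and thresholds are illustrative, not Healthify's schema.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ServiceRecord:
    service_id: str               # stable identifier agreed with the publisher
    name: str                     # e.g. "Community food bank"
    address: str
    phone: Optional[str] = None
    last_verified: str = ""       # ISO date the publisher last checked the record

def is_stale(record: ServiceRecord, today: str, max_age_days: int = 90) -> bool:
    """True if the publisher has not re-verified the record recently enough."""
    if not record.last_verified:
        return True
    age = date.fromisoformat(today) - date.fromisoformat(record.last_verified)
    return age.days > max_age_days

record = ServiceRecord("svc-001", "Community food bank", "1 Main Street",
                       last_verified="2017-09-01")
print(is_stale(record, today="2018-01-04"))  # True: past the agreed 90-day window
```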

What, she asked, would encourage a government department to respond? To which the session had a simple solution: money.

However, participants also flagged up some problems. For example, one said: “You start getting people saying well, if people want this data they should pay for it, and we don’t want that.”

A suggested solution was to work out non-monetary deals; for example, time-limited, exclusive access to a new or improved data set.

However, a further participant, who had worked for a council, said the council often received FoI requests from a particular company for data on parking spaces. The council would have liked this dataset, but didn’t have it, and it would have cost a lot to generate.

It talked to the company about a deal to generate it, but the business wanted exclusivity, and the council wasn’t comfortable with this. So it didn’t happen.

 

What about private open data?

At this point, the discussion changed track, when an ODCamp volunteer asked why the debate about open data invariably focused on government open data. Why weren’t private companies releasing their information?

Which raised the obvious question: “Why should they?” A company’s data is valuable – to it. However, one participant said the French government is looking at getting companies to release information on, for example, farming practices that impact on efficiency.

“They think that information should be available, and not the property of [a tractor maker],” he said. “But that’s quite French. It doesn’t respect private property: it wouldn’t fly in the US.”

One lever might be to require companies that get public money to release information about their contracts and their impacts as open data. One participant said this had been done, with the Buses Bill; companies must release information on issues like routes, “so you can tell where the buses are.”

However, this would require most government departments to get much better at writing contracts. Another suggestion was that large companies that use open data, but don’t declare this, should be required to do so – if only to show how much use and value it has.

Overall, the session felt that while the journey to open government data had been a long one, the journey to open private-sector data was going to be even longer. Not least because there were fewer levers to use.

A participant working for a government department said: “The argument that open data is ‘right’ has not been persuasive: instead, the idea that has been persuasive is that it has already been paid for, by the public, so the public should be able to use it.

“And that isn’t there for the private sector.” However, an earlier speaker argued that ideas of reciprocal value might work; companies that release data as open data can then work with other companies looking to get further value from it, to mutual benefit.

Playing with open data in virtual worlds for real benefits

Christopher Gutteridge and Lucy Knight

Open data can be fun and educational. That was the message of the final session of Open Data Camp 5, day one, as Christopher Gutteridge explained how he came to combine his twin passions of Minecraft and open data.

Playing with data

“The story of this goes back quite a way. I kept going to an art gallery on the Isle of Wight, and I wanted to join in. So, I decided to build the seafront in Minecraft,” he said.

“I got OpenStreetMap, and traced it, and then modelled it in the Minecraft world. I printed it out in 3D and put the prints in a gallery. And people paid for them! You can still buy them if you go to Ventnor.”

Lidarcraft

Since then, Gutteridge, who works at the University of Southampton, has used increasingly sophisticated data sets to underpin his Minecraft models. So, he obtained Lidar data from Hampshire county council, to model its trees and identify those with tree preservation orders.  

Then he used detailed coastal information from Southampton’s Oceanographic Institute to model the coast, and very sophisticated datasets of London to create 3D Minecraft models that can be manipulated in real time.
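As a rough illustration of what underpinning a Minecraft model with Lidar data involves, the sketch below turns a gridded surface model (heights in metres) into columns of blocks. It is a minimal sketch under assumed inputs – the actual pipeline, data formats and block choices Gutteridge uses will differ.

```python
# Minimal sketch: turning a gridded Lidar surface model into Minecraft-style
# block columns. Illustrative only -- it assumes heights arrive as a simple
# 2D grid of metres, which is not necessarily the format the real project used.

def heights_to_blocks(grid, metres_per_block=1.0, base_level=0.0):
    """Yield (x, y, z) block coordinates for each column of terrain.

    grid             -- 2D list of surface heights in metres
    metres_per_block -- vertical size of one Minecraft block
    base_level       -- height (in metres) treated as the world floor
    """
    for z, row in enumerate(grid):          # grid row -> world z
        for x, height in enumerate(row):    # grid column -> world x
            column_height = max(0, round((height - base_level) / metres_per_block))
            for y in range(column_height):  # fill the column from the floor up
                yield (x, y, z)

# Tiny worked example: a 2x3 patch of terrain between 2m and 5m above base level.
sample = [[2.0, 3.5, 5.0],
          [2.5, 4.0, 4.5]]
blocks = list(heights_to_blocks(sample))
print(len(blocks), "blocks; tallest column:", max(b[1] for b in blocks) + 1)
```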

“We have given this to children, and one little girl came in and spent all her time correcting the things I had got wrong,” Gutteridge said. “I had built the white cliffs of Dover in green blocks, and she turned them into white blocks with green carpet. Which was just awesome.”

He is now building a model of Plymouth. Lucy Knight, from Devon county council, explained that this has 20 builders, who are coming up with wilder and wilder ideas. For example: “If you stand in a bus stop, it will teleport you to where the bus is going.”

From virtual reality to real reality

It’s not just about fun. These projects have real-world benefits: “It gives people a real connection with place,” Knight said. And taken into schools and other organisations, it encourages people to come up with great ideas for improving those places.

Gutteridge is also getting into Minecraft archaeology; and reviving another idea – open data gamification, or using some of the common tropes of games to add further layers to the Minecraft models.

For instance, Knight said, areas that people felt were dangerous might be built in darker colours; while popular trees might have tree spirits pop out of them. Or tokens might be scattered around a city, to see how people would navigate it while collecting them.

The key point, Gutteridge concluded, is that by using accurate mapping data to underpin the models, it is possible both to encourage people to interact with open data and to start making changes in the real world that it represents.

Also, he added, he can easily generate models of towns in England. Just get in touch. If nothing else: “They make impressive but very cheap Christmas presents for difficult friends and relatives.”

Follow Chris on Twitter at: @CGutteridge

Maximum Open Data impact for minimum effort

A very popular session at Open Data Camp 5 discussed how to measure the benefit of open data.

Session leader Deirdre Lee, the founder and chief executive of Derilinx, which works with the Republic of Ireland and city of Dublin on open data projects, argued that in the early days, people were focused on publishing data sets.

Now that a lot of data is available, debate is moving onto getting people to use the data – and to realise benefit. So, she said, the questions now are: how do you measure the impact of open data, how do you prioritise which data sets to release, and how do you get government departments to embed this into their everyday work?

Irritating problems

Impact:

Participants said it was easy to measure how many people downloaded something, but very hard to get anecdotes about how they were using it.

One suggestion was to put applications or services on top that add perceived value. This may go against the principles of some open data advocates, who don’t think that the people who release open data should build their own services around it.

But the counter argument is that it is more important to make things useful to people than it is to worry about driving downloads.

Another suggestion was just to release more data, and to make sure that more people know about it. Also, to carry out proper benefit evaluations.

Transport for London was in the news this week, after calculating that releasing its timetable and other information as open data was generating something like £130 million of benefit for the capital, via companies building new services with it, or people using it to avoid delays.

Lee agreed that evaluations were vital, and needed to include qualitative as well as quantitative benefits. One session participant suggested this could be done quickly and easily, by asking people to fill in a quick survey or comment about how they were going to use the information when they downloaded it.

Lee said something like this was about to be launched onto Ireland’s data portal. Another participant suggested feedback should be compulsory: if people didn’t value the service enough to rate it, then the government shouldn’t spend taxpayers’ money on it.

Demand driven open data?

Moving on to her second question, Lee said it boiled down to whether data releases should be demand driven. Should departments continue with the current approach, and publish the data they had available, or release datasets they were asked for?

Essentially, the feeling of the session was – both. If organisations thought information was going to be useful, they should release it. But if there was a clear demand for a certain type of data, they should see if they could meet it – if they could get a business case together.

Open by default

How does publishing open data become business as usual for public bodies? Participants said the first problem is that lots of government departments don’t know what information they hold. So, the first step is to make sure they have good registers of data assets.

Northern Ireland has a strategy for doing this, and will build terms and conditions into future contracts so that information about them can be logged and released.

Several participants argued that departments also need good, solid arguments. For example, releasing data as open data can reduce the number of Freedom of Information Act requests that they receive.

A participant from a government department that has become open by default said it examined FoI requests, to see if data releases could become regular, open data publications. The same participant flagged another big driver: to make sure that departments that released information as open data were also using that data.

Belfast’s Low Power Wide Area Network: how to use it?

Led by: Mark McCann: smart technology team, Belfast council.

Background: There was a competition to fund a low power wide area network outside London, which already has an LP WAN.

A consortium led by Ulster University won the competition and will pay for an LP WAN that can be used by universities and companies for research. Councils have provided pots of money for challenges in the city that an LP WAN might help to address.

NB: A low power network can be used for small amounts of data, intermittently. 4G connects all the time, but this uses up power very quickly. Low power networks allow, for example, sensors to transmit small amounts of data at set times, so they retain their power for much longer.
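To make that contrast concrete, here is a purely illustrative sketch of the duty-cycle pattern described above: a device wakes on a schedule, packs a reading into a few bytes, hands it to the radio and goes back to sleep. The payload layout and the send function are invented for illustration; real LPWAN stacks (LoRaWAN, Sigfox, NB-IoT) each have their own APIs.

```python
# Illustrative sketch of the LPWAN duty-cycle pattern described above.
# The payload layout and send_over_lpwan() stand in for a real radio stack;
# nothing here is the Belfast deployment's actual code.
import struct

REPORT_INTERVAL_S = 15 * 60   # wake every 15 minutes rather than staying connected

def read_sensor():
    """Pretend to read a temperature (deci-degrees C) and battery level (%)."""
    return 215, 87            # e.g. 21.5 C, 87% battery

def send_over_lpwan(payload: bytes):
    """Placeholder for the radio driver's transmit call."""
    print(f"TX {len(payload)} bytes: {payload.hex()}")

def report_once(device_id: int):
    temperature, battery = read_sensor()
    # Pack device id, reading and battery into 5 bytes -- LPWAN payloads are
    # tiny by design, which is why the radio can spend most of its life asleep.
    payload = struct.pack(">HhB", device_id, temperature, battery)
    send_over_lpwan(payload)

if __name__ == "__main__":
    report_once(device_id=42)
    # In a real deployment, a deep-sleep timer would wake the device every
    # REPORT_INTERVAL_S seconds to call report_once() again.
```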


Care and feeding of the LP-WAN

Questions: What could the network be used for in the city? And what could the data generated from such projects be used for, taking account of privacy issues and of other datasets that might be put alongside it?

An example (from a session participant): Dublin wanted to know about pollution at a very granular level. They decided to deploy sensors to pick up on that, so they could make changes, to the bus lanes or whatever.

Ideas (Mark McCann): There are three use cases we have in mind. One is transport: if we could deploy sensors on bicycles, or pedestrians, then we could find out where people go. Another is tourism: we know people come to Belfast and they leave, but we don’t know much about where they go.

And the other idea is logistics: lorries, and the supply chain for retail. Belfast has a sea-port that is a hub for the rest of the country, so we could deploy sensors to find out where things are going. But we would like to run a citizen challenge, to open up the data to people to run projects.

Getting practical

Practical issues (participants): One area that tried this found that companies refused to take the sensors. Planning permission can also be a problem: even though the council is behind this, it may still need permission.

Another city that wanted to do this wanted to put sensors on street lights, but then discovered it didn’t own the lights anymore, so it had to put up poles. Also, you have to be able to maintain and calibrate sensors. What do you do with low quality data?

Privacy: With data that involves people, there is also a real issue with privacy. With the General Data Protection Regulation coming in, you have to be aware of the basis for collecting information, and think about how you are going to be able to release it. There is IoT information on the ICO website.

Dealing with results (participants): Cities like Cambridge that have run bike projects have found that tensions can arise when the data reveals how bikes are used. Are they used by visitors or residents? That can change the contracting basis on which facilities are provided, so you need to be ready for challenge on that.

Similarly, traffic and pollution sensors often reveal problems around areas like schools, as people drop children off. How are you going to handle that?

Getting the right projects (participants): You need to get companies involved, but also arts groups and local communities. Sometimes, cities do data and IoT projects, and it seems great, but nobody actually uses it. Sensors are very, very cheap. You can go to communities to find out what they want.

Changing policy: Cities that have put sensors on libraries have found that people go in for the day, and that’s because they are looking for community – they don’t want to borrow a book. So you have to be ready for projects like this to change things.
 

The EveryPolitician project

Open Data Camp 5 is taking place in the Computer Science Department of Queen’s University. It’s a modern institution that wants to make sure its students are ready for work.

So there are rooms that are carpeted with artificial turf, filled with trees, and furnished with garden benches. Of course there are.

The second session of the morning gathered in the garden room to discuss the EveryPolitician project [everypolitician.org], a bid to collect information about every politician in the world, anywhere.

Not an easy job…

The EveryPolitician site says that it has information on 74,939 politicians from 233 countries – and counting. However, Tony Bowden from mySociety, who started the project with Lucy Chambers, explained that this has been very difficult to collect.

The project started by “running a lot of scrapers on a lot of sites” but, because of licensing issues, “we couldn’t quite tell anybody they could use it” and “we weren’t sure it was sustainable.”

So, the project is moving to Wikidata, in the hope that this will become “the canonical source of information about politicians”.

Why Wikidata?

Bowden explained that Wikidata is connected to Wikipedia. There is no single Wikipedia; there is one for every language. So, Wikidata collects information from all the different pages in a structured or semi-structured form, because otherwise they get out of sync with each other.

On the political front, for instance, Bowden said that if there is an election in an African country, local people will update that quite quickly, but the page in Welsh might not be updated for some time. The idea of Wikidata is to keep them aligned.
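Part of what makes Wikidata attractive as a canonical source is that its structured data can be queried directly through its public SPARQL endpoint. The sketch below shows the general shape of such a query from Python; the position Q-id is a placeholder to fill in, and this is only an illustration of the kind of query involved, not EveryPolitician’s own tooling.

```python
# Minimal sketch of pulling structured politician data out of Wikidata's
# public SPARQL endpoint. Illustrative only -- not EveryPolitician's tooling.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"
POSITION_QID = "Q00000"  # placeholder: the 'position held' item you care about

QUERY = f"""
SELECT ?person ?personLabel WHERE {{
  ?person wdt:P39 wd:{POSITION_QID} .   # P39 = position held
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT 20
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "open-data-camp-example/0.1"},
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row["personLabel"]["value"], row["person"]["value"])
```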

Still, collecting information “in a structured or semi-structured form” is not as easy as it sounds. For example, a session participant asked how EveryPolitician defined a political party, given that the idea will be fluid across different political systems.

Bowden acknowledged that, for EveryPolitician: “We came up with a simplified view of the world. We felt it was more important to have something good enough for every country to use, than to capture all the nuances.

“If you want to do comparative analysis across the world, you can’t start with a two-year analysis of systems. It’s ok to say there will be some kind of political grouping.”

An evolving project

Bowden added that he thought these kinds of issues would be resolved, as people started to use the data. “We think it will be something like OpenStreetMap,” he said. “So, initially, there will be some broad concepts, but as it goes along, there will be people who go along and do every tree in their area, and the nuance will start to come through.”

Another issue is that there may not be a ‘single source of truth’ about politicians in some countries. For example, the Electoral Office of Northern Ireland knows who has been elected to a seat, but may not log changes  – for example, if someone stands down and somebody else is co-opted.

Or there might be official sites, but they might not be very good. Kenya only lists politicians whose names start with the letters A to M (and one person starting with P). Nigeria’s politician pages are up to date, but its index is three months behind.

Bowden said EveryPolitician is building tools so that individuals can scrape official sites, and then upload the information to Wikidata, and fill in gaps or correct errors.

What is interesting, he said, is that once a country gets a good list, with people committed to maintaining it, that information tends to be much better – and better used – than official sites.

“If there is an election, the Parliament site might not update until Parliament sits, but the Wikidata pages will update overnight,” Bowden explained. “Then journalists can mine that for stories. So they can instantly tell people things like: ‘the youngest ever MP for Wales has just been elected’.”

Find out more:

The EveryPolitician website has collected 2.8 million pieces of information so far. Tony Bowden has explained the move to Wikidata and its benefits in a blog post on the mySociety website. He also tweets about the EveryPolitician project.

Open Data GP Registers

Northern Ireland has always needed to keep registers of GPs and other health providers. Now, at least some people in its government and health and social care service are looking to release the GP register as open data.

The aim: a single list of GPs in Northern Ireland that is available in a machine readable format.

Why? Session leader Steven Barry explained: “Lots of government departments have lots of service information, but it is often collected manually, so when somebody leaves it stops, or people do it differently.

“Working in the health service, our statistics guys were spending a lot of time presenting data instead of putting it out in a standard profile and letting the community do things with it.

“Simon Hampton, our minister of finance, is a very young guy, and he was very interested in what is happening in Estonia. So he has been right behind this.”

Technical challenges

There are technical challenges with creating registers. Most obviously, how are they going to be populated? If there is a register of practices and a register of GPs, how are they going to be aligned?

And if there is more information, such as how big a population the practice serves, how is this going to be kept up to date?

One session participant said that in England and Wales, at least, there is no agreement on what a GP is – but at least six organisations hold lists of GPs, which have different categories of information within them.

Barry’s register tackles some of these issues. But having seen it, some session participants pointed out that it still has limitations. There are identifiers, for example, but no information about what they are or who assigns them or how they can be used.

And the register is only available in a limited number of formats. One participant pointed out, though, that the technical problems can be solved. What is needed is people who want to solve them. Projects need stakeholders and users.
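One way to see what solving the identifier problem looks like: if the GP register and a companion practice register both publish a shared, documented identifier, anyone can join them. The sketch below is purely illustrative – the field names, identifiers and figures are assumptions, not the actual Northern Ireland register format.

```python
# Illustrative sketch of a well-documented, machine-readable GP register entry.
# Field names, identifiers and figures are invented for illustration -- they
# are not the actual Northern Ireland register format.
import csv
import io

GP_REGISTER_CSV = """gp_id,name,practice_id,registered_from
GP-0001,Dr A Example,PR-042,2015-04-01
GP-0002,Dr B Example,PR-042,2018-09-17
"""

PRACTICE_REGISTER_CSV = """practice_id,practice_name,postcode,list_size
PR-042,Example Medical Centre,BT1 1AA,8450
"""

def load(csv_text):
    return list(csv.DictReader(io.StringIO(csv_text)))

gps = load(GP_REGISTER_CSV)
practices = {p["practice_id"]: p for p in load(PRACTICE_REGISTER_CSV)}

# Because both registers share a documented identifier (practice_id), anyone
# can join them -- e.g. to answer "how many people does this GP's practice serve?"
for gp in gps:
    practice = practices[gp["practice_id"]]
    print(f'{gp["name"]} works at {practice["practice_name"]} '
          f'(list size {practice["list_size"]})')
```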

Finding uses and users

So, with a register in place, what might people be able to do with it? “In England, NHS Digital has been allowing people to rate their GPs,” Barry said.

“So if you use Bing, you can search for a GP, and see practices by rating. I looked at the GPs in Paddington and they were all two stars. I thought that must be a bit demoralising for the GP.

“And what I wanted to know was things like how many people does the GP look after, are they male or female, when and where were they trained, and what services do they offer?”

UK governments all want to give people more choice of GP, so other people are going to want the same information. Even though finding a GP with an open list can be a whole new challenge…

And health and care services have been investing in chief clinical information officers; doctors and other health professionals with an interest in IT and data. So they might be advocates for this kind of development. There are champions for change out there, the session concluded. The trick will be finding and using them.

What makes for a good API?

One of the first questions to come up on day two of Open Data Camp was “what is an API?” One of the last issues to be discussed was “what makes a good API?”

 

Participants were asked for examples of application programming interfaces that they actually liked. The official postcode release site got a thumbs up: “It was really clear how to use it and what I’d get, and I can trust that the data will come back in the same way each time.”
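The session didn’t name the service, but the qualities being praised – a clear request, a predictable response – are easy to illustrate with a postcode lookup API. The sketch below uses postcodes.io, a widely used open postcode API, purely as a stand-in; it may not be the site the participant had in mind.

```python
# Illustrative example of the qualities praised above: a clear request and a
# predictable JSON response. Uses postcodes.io as a stand-in -- the session
# participant did not say which postcode service they meant.
import requests

def lookup_postcode(postcode: str) -> dict:
    """Return the data record for a single UK postcode."""
    url = f"https://api.postcodes.io/postcodes/{postcode.replace(' ', '')}"
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()["result"]

if __name__ == "__main__":
    record = lookup_postcode("BT7 1NN")  # the Queen's University Belfast area
    print(record["postcode"], record["latitude"], record["longitude"])
```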
