Open data careers

What is the professional background of the people who have found themselves working in open data? And how are their careers likely to develop in the future?

The answer to the first question is that it’s very diverse. A session at Open Data Camp 5 heard from people who had started out as foresters, commercial underwriters and architects. And from people who had begun their careers in large DIY chains and councils.

Just one participant had been recruited to an open data project from university. And he had studied history while he was there.

Become an expert – then do open data

If there was a common thread, it was that people had found themselves working with data, picked up technical skills, and then worked out that open data was a better way of doing what they wanted to do with data than other approaches.

One or two had even started open data projects as a hobby, and then been able to use their open data skills to impress their bosses and get a better job or pay package.

Will that be sustainable in the future? When the session moved on to discussing the skills that organisations need from open data experts, the consensus was that it might be.

You want a data scientist. Are you sure about that?

 
While many government departments, public bodies and even charities advertise for people with data skills, they often advertise for data scientists. But data scientists don’t do spreadsheets, and might not like – or be good at – communicating the results to policy makers, stakeholders and the public.
Someone with sector expertise, who picks up an open data interest and skill-set to support their policy, communication or core interests may be a better fit.

Organiser Henrick Grothuis did a quick search on LinkedIn for open data jobs. Companies were looking for lots of different skills. “This camp looks perfect for people who want to develop diverse careers,” he said. “Open data is a route to picking up skills that a lot of people want.” 

Open Data Case Study: How Belfast found £350,000 in rates revenues using open FHRS data

How do you prove the value of open data?

Here’s how.

FSA/SBRI case study

The Food Standards Agency’s Food Hygiene Rating Scheme (FHRS) data is released as open data in near real-time, and the Department for the Economy in Northern Ireland found a use for it.

Like every authority in the country, Belfast has a ratings shortfall – there are business rates that should be being collected, but aren’t for various reasons. And a bunch of smart people across various parts of the government and city council had a feeling that they could use datasets to improve the collection rate within the city.

To test the theory, the Small Business Research Initiative, which one of those people, Eoin McFadden of the Department for the Economy, describes as “public procurement of R&D to fix wicked issues”, invested in four early stage proof of concept projects that could solve the problem of identifying formerly empty premises that were back in use.

One project used footfall data, another Wi-Fi and Bluetooth signals, but the other two applied machine learning techniques to a range of public datasets, both open and closed. The project, which ran from July 2016 to March 2017, ran in two stages – and the two machine learning projects made it through to the second phase.

Massive return on investment

Over the two-week test period, they identified around £350,000 of uncollected rates – from a total project cost of £130,000. If you want to prove the benefit of open data, money talks…

So, how did the FHRS open data contribute to this?

Well, the normal method of checking for missing rates was to target high value empty properties, and manually inspect them. That had a success rate of around 20%. But what if you could work out which properties were likely to be occupied, and target your inspections on them?

The machine learning projects used a mix of closed public datasets, including the ratings data and water rates, and two open data sets:

  • FHRS
  • Companies House data

All those data sets are good indicators of a property in use, but matching them is hard. Hence the application of machine learning derived fuzzy logic to identify properties which are likely to be back in use.
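To give a feel for why matching is hard, here is a minimal sketch of fuzzy address matching in Python. It is not the method the two projects used (the session did not go into that detail); it simply scores string similarity with the standard library's `difflib`. The function names, the sample addresses and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalise(address: str) -> str:
    """Lower-case the address, replace punctuation with spaces, collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in address.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two normalised addresses."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_occupied(empty_properties, trading_records, threshold=0.85):
    """Pair each supposedly empty property with its closest trading record.

    Only pairs scoring at or above the (assumed) threshold are returned.
    """
    matches = []
    for prop in empty_properties:
        best = max(trading_records, key=lambda rec: similarity(prop, rec), default=None)
        if best is None:
            continue
        score = similarity(prop, best)
        if score >= threshold:
            matches.append((prop, best, round(score, 2)))
    return matches


# Hypothetical records: one list of "empty" rated properties, one list of
# addresses taken from an open dataset such as food hygiene ratings.
empty_properties = ["12 High St., Belfast", "3 Mill Lane, Belfast"]
trading_records = ["12 High St Belfast", "48 Castle Road, Belfast"]

matches = likely_occupied(empty_properties, trading_records)
for prop, record, score in matches:
    print(f"{prop!r} looks like {record!r} (score {score})")
```

A real system would need far more than this: postcode handling, unit and flat numbers, business names, and a way of weighing evidence from several datasets at once. That is where the machine learning came in.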

Making it work

There was a fair amount of persuasion needed to make this work. Public bodies needed to be persuaded to provide closed data sets to private companies, under tight non-disclosure agreements, but the effect was remarkable.

Once the systems were up and running, inspection visits based on a list of probable occupation had a 51% success rate – about 2.5 times the average before. Given that rates make up 75% of the councils’ revenue, the potential for economic benefit from these systems has only just begun.
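For anyone who wants to check the sums, the figures quoted in this case study can be put through a few lines of Python. The variable names are just labels chosen for this sketch; the numbers are the ones reported in the session.

```python
# Figures as quoted in the case study.
baseline_success = 0.20      # manual inspections of high-value empty properties
targeted_success = 0.51      # inspections driven by the "probably occupied" list
rates_identified = 350_000   # pounds of uncollected rates found in the test period
project_cost = 130_000       # pounds, total project cost

# How much better the targeted inspections performed.
uplift = targeted_success / baseline_success

# Pounds of uncollected rates identified for every pound spent on the project.
return_ratio = rates_identified / project_cost

print(f"Inspection success uplift: {uplift:.2f}x")
print(f"Rates identified per pound spent: {return_ratio:.2f}")
```

Note that the second figure compares rates identified during a short test with the cost of the whole project, so it is a rough indicator rather than a formal return-on-investment calculation.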

The Food Hygiene data is valuable in this, both because it is updated in near real time (and updates hit daily), and because it indicates businesses that have begun trading associated with an address. A month’s delay in publication would make it significantly less useful to the project.

Of course, it’s not only hospitality and food businesses that get missed – two examples of businesses that got found were a hostel and, to everyone’s amusement, a firm of accountants.

Future revenue potential

The two successful projects – one based in Belfast and one in Southampton – are both continuing to develop out their systems, and Belfast is in the process of going forwards with a procurement process to make this a routine part of their ratings work.

That’s a pretty good return on some open data and £130,000 of up-front investment.

Don’t let anyone tell you that open data successes are only small scale and personal…

Making Open Data Camp matter – to local economies and more

What is the value to the local economy of open data – and open data unconferences? The wider benefit of open data to local economies is harder to quantify. There’s no E=mc^2 equation of open data benefit yet.

So let’s talk about unconferences, and Open Data Camp in particular.

Local value of Open Data Camp

Some organisers have a sense that it stimulates the economy, but no sense of how to measure that. There’s local sponsorship – so they’re expecting some return on that investment. It might be an opportunity to meet potential customers, or to improve their operational intelligence.

Corporate social responsibility is one reason people sponsor: it’s a community benefit, but it also benefits companies to have a thriving open data ecosystem.

Escaping the gravity of the capital

Just NOT having it in London is a good thing. Holding events away from London can be an incentive.

There’s a distinction between the benefit of open data to the city, versus the value of an open data conference in the city. There’s a clear basic financial benefit to the city in terms of hotel rooms, food and entertainment from events, as long as people are prepared to travel to attend the event. One event succeeding in an area gives others the confidence to go ahead.

Travel is not just about money, but also about time to travel there.

Buy in from the host city makes a big difference. The city saying “no” to some investment in an event can kill it. There needs to be some vision for the city of the benefit, so you can sell it to them.

Unconferences can be woolly as to what the benefit is. Open Data Camp is deliberately avoiding London, both because too many events happen in London, and because people can be resistant to it. You get local character and flavour, and you get people who might never come to an event if it wasn’t near to them. And they still get national organisations coming – because they don’t get out of London much, and get to make connections with local projects.

Connections make benefits

Those connections can turn into valuable projects. You’re not just connecting geographies, but different forms of organisation. People who don’t do data can start to see it in a physical way – to understand the data that describes the city they can see around them.

Can we improve the outcomes by theming the event? Or would that corrupt unconferences? People tend to take advantage of the location to discuss local issues – like the interface/divided communities session at this event. And that can be very valuable – giving people insights into unexpected uses of data.

The Queen’s University, Belfast computing department is often empty of a weekend. Why not use the space for events like this? Let people come in and find out new things. Being physically in different places gives opportunities to explore new technologies, like iBeacons or VR tech.

Look at the data you have, the data you can get, and the technologies that are coming along – and then make the space to think about how to combine them. Ideas start at those sorts of meetings – and we need those case studies.

Catalysing other fields

Bring in other kinds of people – English Lit students could find open data techniques useful in extracting what they need from books. You can avoid massive wastes of time and effort by bringing people together in a way that allows them to realise what they can offer to each other.

At the moment, Open Data Camps are open data people talking to open data people. Could we have a Friday where we open up our experts to other people? That means we could say we advised start-ups and students, and contributed to the economy.

Pre-activation of people – letting them plan for hacker spaces, or offering open data surgeries would be possible ideas.

We’re trying to capture the sessions via Drawnalism, and we’re putting that on the blog. But should we be pushing onwards with it, telling case studies and stories around events or projects that spin out of Open Data Camp sessions and meetings?

But what about the wider benefit of open data to local economies? There’s no E=mc^2 equation of open data benefit yet.

Session Notes

Whose data is it anyway?

The question of who data belongs to, and whether individuals can have a say in what happens to their data, tends to come up very quickly in some areas. Health, for example.

But there is a concern that the whole issue of data collection and use could become much more fraught with the arrival of the General Data Protection Regulation. This is an EU regulation that is being incorporated into UK law at the moment, via the Data Protection Bill.

The GDPR will require organisations to think about the impact of projects on data privacy at an early stage and to appoint a data protection officer. It will introduce large fines for data breaches, tighten up rules on consent, and introduce some new rights, including a right to be forgotten.

The session heard that this last right, introduced following a court case involving Google, could have a big impact on open data sets, because if people remove themselves from datasets, those datasets become less complete.

As the session leader, Kim Moylan, said: “What happens if people pluck themselves out of data? Do we leave a blank line, or just take what is there?”

“You will have to remove information that is no longer relevant. But what is no longer relevant in a medical record? What about the census? That is a big open data set; if people remove themselves from the census, then what do we do then?”

Aggregate data

One participant felt the public debate needed to be recast. Instead of talking about ownership, he said, the discussion should be about legal rights and restrictions. “If I walk through Belfast, I know I will be filmed, but there are legal safeguards on that. Talking about ownership just confuses the issue.”

However, the GDPR is coming in. And the general feeling in the session was that if it is going to cause problems for the open data movement, they will arise at the aggregation stage.

As various participants pointed out, when data is released as open data it is anonymised; the open data movement doesn’t deal in identifiable patient data, so it doesn’t need explicit consent to use the data it uses.

However, if people decide to opt-out of a particular data collection, or ask to be ‘forgotten’ and removed from a collection that has taken place, then that will affect the size, and potentially quality, of the dataset being released.

In which case, the big question is how many people will opt out or exercise their right to be forgotten. On this, opinions were divided. One participant pointed out that people already have rights to opt out of their medical data being used in shared care records and some data collections, and hardly anyone uses them.

The Caldicott Review of information governance and security in the NHS will give people new opt-out rights; but there is no reason to think a lot of people will use them.

Practical problems

Still, the practical implications could be hard to deal with. One participant, who uses surveys to collect information, asked whether someone who came back and said they no longer wanted their data to be included could ask for it to be removed at every level: the original survey, the aggregation, and the anonymised release.

The answer seems to be yes. “But philosophically, I have a real problem with this, because a policy decision might have been made on that data, and now it has changed.”

Also, surveys might need to be larger in the future, to make sure they would still be statistically valid if a predictable number of people removed their data later on.
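As a rough illustration of that last point, the usual way to allow for drop-out is to divide the number of responses you need by the share of people you expect to keep. The sketch below assumes a hypothetical survey that needs 1,000 usable responses and expects 5% of respondents to ask for removal later; both figures and the function name are invented for the example.

```python
import math


def inflated_sample_size(required: int, expected_withdrawal_rate: float) -> int:
    """Return how many responses to collect so that `required` remain
    after an expected fraction of respondents withdraw their data."""
    if not 0 <= expected_withdrawal_rate < 1:
        raise ValueError("expected_withdrawal_rate must be in the range [0, 1)")
    return math.ceil(required / (1 - expected_withdrawal_rate))


# Hypothetical example: 1,000 usable responses needed, 5% expected to withdraw.
target = inflated_sample_size(1000, 0.05)
print(f"Collect at least {target} responses")
```

The hard part in practice is the withdrawal rate itself, which nobody in the session could predict with any confidence.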

Overall, though, the session was positive. It remains the case that most instances in which data is generated and used are covered by well-established legislation that will not be affected by the GDPR.

The Data Protection Bill builds on existing data protection legislation, which is reasonably well understood. Even the right to be forgotten is already two years old.

GDPR – good news?

Indeed, there is an argument that more debate about data protection, and more awareness of the new rules, can only be a good thing, because it will build public trust.

One participant said: “These conversations are happening more and more. We have privacy groups bringing cases. Privacy notices will have to be much more transparent. But I’m quite optimistic. I think once people understand their rights they will actually be more comfortable with uses of their data. The impact will be on companies that are not doing very well at the moment.”

 

Impact of the private sector on demand

Open Data Camp 5, day two, opened with a discussion of the impact of the private sector on open data.

The session was led by Shelby Switzer, who explained she was interested in the subject because she worked for a company in the US [Healthify] that uses a lot of open data about social issues and services.

Open Data in the private sector

The problem: “We find the data sucks and we have to put a lot of effort into making it better,” she said. “So, I want to talk about how we get the providers to do better.

“Also, how to prove to my company that it is better to help to improve the data at source, instead of spending so much time cleaning it.”

Small errors, big problems

When Shelby asked who worked for the private sector, a lot of hands went up. One participant said he was working on a project about energy sites, and wind turbines.

“Some of the government data on that is wrong. So if someone wants to build a wind turbine, it comes out as being on a farm, or in the nearest village,” he said.

Asked whether he told the relevant government department about these problems, he said he did, and they did get corrected – eventually. But he felt more levers would help. “Sometimes I sell people the idea that if they ask to get it corrected, it can unblight their house.”

A participant working for a government department asked how he found the right person to tell. Which is definitely a problem. “It’s very hard, because you are down to individual records. So one [turbine] record is 500 yards out. To the government, it’s ‘big deal’; but for an individual that can matter a lot.”

Another participant, who is also working in the sustainable energy sector, but in Northern Ireland, where the government is trying to encourage improvements in housing, said she had come across similar problems.

It could be hard to be sure, for a house or flat, who the letting agent was, or the insurer. “We found it matters a lot who you feed back to. You need someone who really wants to meet the original [policy] objective.”

Solutions – or partial solutions

Another participant asked if companies that wanted to use open data had tried building a model for the open data providers to work to? Shelby said her company was looking at that: “if we build a model, can we get agreement they will keep the data up to date?”

What, she asked, would encourage a government department to respond? To which the session had a simple solution: money.

However, participants also flagged up some problems. For example, one said: “You start getting people saying well, if people want this data they should pay for it, and we don’t want that.”

A suggested solution was to work out non-monetary deals; for example, time-limited, exclusive access to a new or improved data set.

However, a further participant, who had worked for a council, said it often had FoI requests for data on parking spaces from a particular company. The council would have liked this dataset, but didn’t have it. It would have cost a lot to generate.

It talked to the company about a deal to generate it, but the business wanted exclusivity, and the council wasn’t comfortable with this. So it didn’t happen.

 

What about private open data?

At this point, the discussion changed track, when an ODCamp volunteer asked why the debate about open data invariably focused on government open data. Why weren’t private companies releasing their information?

Which raised the obvious question: “Why should they?” A company’s data is valuable; to it. However, one participant said the French government is looking at getting companies to release information on, for example, farming practices that impact on efficiency.

“They think that information should be available, and not the property of [a tractor maker],” he said. “But that’s quite French. It doesn’t respect private property: it wouldn’t fly in the US.”

One lever might be to require companies that get public money to release information about their contracts and their impacts as open data. One participant said this had been done, with the Buses Bill; companies must release information on issues like routes, “so you can tell where the buses are.”

However, this would require most government departments to get much better at writing contracts. Another suggestion was that large companies, which use open data, but don’t declare this, should be required to do so – if only to show how much use and value it has.

Overall, the session felt that while the journey to open government data had been a long one, the journey to open private data was going to be even longer. Not least because there were fewer levers to use.

A participant working for a government department said: “The argument that open data is ‘right’ has not been persuasive: instead, the idea that has been persuasive is that it has already been paid for, by the public, so the public should be able to use it.

“And that isn’t there for the private sector.” However, an earlier speaker argued that ideas of reciprocal value might work; companies that release data as open data can then work with other companies looking to get further value from it, to mutual benefit.

Maps, Maps, Maps: good maps, bad maps and accessible maps

What do you do if you find QGIS too easy (and like pain)? You start mapping in R.

But what do people in the room do with mapping, and what data sets do they use?

Putting Open Data on Maps. A good thing?

In Birmingham they used Edubase to plot previous ‘catchment’ areas for schools. Some schools measure it from the centre of the school, some from the school gates. And some schools have more than one gate… Some were basing it on distance to the nearest train station. It was about creating boundaries, and then you could set up a tool based on postcodes to see if people are within the boundaries or not.
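To show how much the choice of measurement point matters, here is a small sketch of a distance-based catchment check. It is not the Birmingham tool; it assumes the postcode has already been turned into a latitude and longitude, uses the standard haversine formula for straight-line distance, and the coordinates, the 500 m radius and the function names are all made up for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def in_catchment(home, measurement_point, radius_m):
    """True if `home` lies within `radius_m` of the school's measurement point."""
    return haversine_m(*home, *measurement_point) <= radius_m


# Hypothetical coordinates: the gate is a few hundred metres north of the
# centre of the school site, and the home is further north again.
school_centre = (52.4800, -1.9000)
school_gate = (52.4830, -1.9000)
home = (52.4870, -1.9000)
radius_m = 500

print("Measured from the gate:  ", in_catchment(home, school_gate, radius_m))
print("Measured from the centre:", in_catchment(home, school_centre, radius_m))
```

With these made-up numbers the same home is inside the catchment when distance is measured from the gate and outside it when measured from the centre of the site, which is exactly the kind of difference that upsets parents.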

Mapping challenges

Newer Crossrail stations might positively benefit house prices – so it was an interesting thing to map. But what is a railway station? That matters if you are determining distances – is it the station or the platform or the car park? These things matter.

The Ordnance Survey released a green spaces map – which seems to have quite a few holes in it. There are some open tree and green spaces databases. There hasn’t been any analysis done on the relationship between house prices and green spaces. The research tends to focus on negative factors, not positive ones. There’s research about well-being and green spaces, but not house prices.

Tom Smith and the Data Science Campus have started using machine learning to identify areas – and then layer other data on top of that, like house price data. Scottish LiDAR is available, as is Northern Ireland’s, but it’s somewhat patchy. The centre of Belfast isn’t covered! It’s an issue with how the data was collected.

There’s been too much of companies just shoving data on a map. Often totally useless visualisations on a map are created, with too much data. Should it be four different maps? Or do we need to segment by seriousness? What’s the use case for the map? There are some cases where putting data on a map is actively unhelpful. Wales has such massive variations in population density that just putting data on a map is often not useful.

You need to ask yourself what the narrative is of the data you’re putting on a map.

Mapping disasters

All the Trafford Centre crime was moved to a nearby street, having a huge impact on the perception of crime there. There have been other examples, where all the crime without a geo reference was assigned to the police station – which has a massive impact on the surrounding area, which was suddenly a massive crime blackspot, according to the figures.

In the 1920s the American petrol companies took all the railways off the maps they sold, changing people’s perceptions of travel.

A small GIS change made at a national level can have huge ramifications at a local level – like school applications. And that creates real anger. One school deliberately changed their measurement point to the middle of their playing fields to exclude a council estate. It’s about more than points on maps: it’s about spatial analysis – and patterns and correlations might not be readily available.

Mapping Tools

Pub data from OpenStreetMap can be used for fun maps – like pub densities. The hotspot is in Islington. One person recommended ScapeToad to create cartograms.

There’s plenty of debate on the value of R in mapping. Some people (the Python fans) love it. Some really don’t. There are options to integrate statistics and mapping with Shiny. QGIS is great for spatial analysis, but R Shiny is great for pushing out tools that people can explore data with. It can have problems with scaling, though. Choose the context for use carefully.

R is good for learning to code – Jupyter with R is a great combination.

But what about paper maps?

There is an issue of access – 20% of people still aren’t digitally connected, and you need paper to reach them. And there’s a role for paper maps in exhibitions. People will screenshot interactive tools to put them in reports – we’re not great at producing aesthetically pleasing maps for those uses. If they screenshot, things like dates and copyright statements are missing. We should make automatic “print this” options.

One attendee has a printed paper map of a county in the 17th century, used for writing. Big, printed maps have their uses. They’re great tools for community interaction in groups.

The French OpenStreetMap community used Field Maps to engage with people, getting them to colour in where, for example, schools are.

Map accessibility

This raises the question: are pop-ups in an interactive map a bad thing? Paper maps appear to show everything. But they don’t really – there’s a real art to what they leave out, to make sure nothing undermines the purpose of the map.

Maps really are the ultimate visualisation tool, but you have to ask what the map is for. But can you really guess what people are actually going to use maps for? Probably not – they will always surprise you.

Do you think about red/green colour blindness? It’s such a simple thing to miss, but it can destroy the usefulness of your map for a percentage of the population or, worse, lead to them misinterpreting it.

3D Maps? Depends on the context. You can do that in Excel 2016. It can be really effective. Physical maps are really useful for blind people – but consult with the people you’re targeting. Not all blind people are the same. Acuity Design are good at this. UCL are experimenting with soundscapes for blind people.

ODI Nodes: a state of the nation discussion

One of the first sessions on Sunday morning at Open Data Camp 5 gave people from the ODI Nodes network the chance to meet and discuss progress, under the Chatham House rule.

ODI Node Map

There’s some tension between the ODI’s suggestion that the nodes might become more commercial, and the fact that some nodes aren’t really keen on that direction. Some – including Bristol – have reorganised in a way that would allow the work to continue even if they are no longer a node.

On the other hand, ODI Leeds was founded to be commercialised. It’s unique in that. Any money made is reinvested – it’s a not-for-profit. That allows things like making pretty much all events free. They have a huge network of data scientists, agencies, developers – and that allows them to put project teams together. They’ve been working with Adobe (on a better PDF…). They also work a lot with the sponsoring councils, including on Yorkshire flooding.

ODI HQ put out open calls on their website, based on people who approach them. Either nodes or private organisations can bid for those tenders. There tend to be a lot of them in the autumn, to be completed by the spring. Probably something to do with budgets.

Open data in Northern Ireland

What about ODI Belfast? They’ve been running for two and a half years. There’s a team of three, down from four. Their impression is that open data is a bit behind in Northern Ireland compared to the rest of the UK, so a lot of their work is campaigning. The government came out with a strategy to establish a node – after the node was established!

But they need more data release to kickstart the data supply chain. They’re a community-based, storytelling node. They had a 120-person conference that had a huge impact. The connections made there have led to more projects between commercial organisations and government.

They have a strategy, they have a government commitment, they have infrastructure going into place – and no government at the moment. They’re all hoping for something positive to happen with the government.

This does emphasise how important senior buy-in is – some people feel that government commitment has waned since the changes at the top of the government there.

The Open Data view from elsewhere

ODI Cardiff is run by a commercial organisation. They were banging on about open data before that, but the node status has definitely helped open doors. Wales is a bit like Northern Ireland, a little behind. But a positive debate in the Welsh parliament gave a sense that both sides of the house were behind open data. And ODI Cardiff were cited in that debate.

ODI Birmingham is hosted by Birmingham Innovation, and it has started using the logo on its footer. ODI HQ is going to be doing some training in Birmingham. Several Nodes expressed a desire to offer more training. The commitment needed for the five-day train-the-trainers course in London is a big one – and you need an ODI-accredited trainer to deliver ODI-branded courses.

There’s a distinction between awareness-raising sessions and non-branded training, though; some people have found the flexibility to deliver what is needed.

There’s also a need to recognise income disparities in different parts of the country, and the prices people there can reasonably be expected to pay.

Relationship with ODI HQ

Some Nodes feel that it’s better to ask for forgiveness than permission, because people in London don’t always have a perfect understanding of local political context.

Everyone seems to want a big meetup of Nodes with the ODI in London. But there’s still a strong sense that affiliation with the ODI is valuable and desirable, both practically and emotionally. It has both incredible lineage and some significant power. There’s a commercial value in the ODI name.

Money flows from the Nodes to the ODI, not the other way around. There is discussion about non-commercial nodes not paying, and commercial nodes only paying after an income threshold. And that money may be ring-fenced for the benefit of the Node network.

There’s some disappointment that there’s no official ODI HQ representation at Open Data Camp – but also a recognition that the Nodes can self-organise, and arrange their own meetings and communication.

Open Data Camp Belfast: Day Two Pitches

A good turnout for a Sunday morning, as we get ready for the pitches. But we have someone significant here…

And we’re off…

The Pitches

ODCamp 5 Pitches

  • How do you effectively engage the tech community to use open government data?
  • Accessible file formats. Is there life beyond the five stars of open?
  • ODI Nodes – what are they and why should you care?
  • Open data and art – storytelling.
  • The impact of private sector demand on open data.
  • Why map open data? What do you get out of it? What data sources do you use?
  • Create a shared doc about the reasons people don’t use open data – and solutions
  • How do you overcome data quality concerns? Good enough – is it good enough?
  • Who owns the data? How do you get it open, how do you share it – and how do you deal with people withdrawing?

  • How you use open data for public good…
  • Dog-fooding brain-picking. How? What’s the best way?
  • Case Studies of NI open data: what have we done for you lately?
  • Data Catalogues – what are they and why do you publish them?

  • Data Ethics Canvas – ethics before compliance!
  • Open Data Challenges – how can we do them better?
  • Gender data in the UK
  • What do we do about wrong or inaccurate open data? Where are the 100m trees in Belfast?
  • Meta-metadata – why do we publish it? Why should we publish it? What annoys us?
  • Open data careers – there’s a lot of public or ex-public sector people here. How do you get involved – and what’s next?
  • Building APIs for open data
  • How can the community overcome arcane attitudes to data?

  • FSA – the open data journey (with a focus on reference data)
  • What’s the value of Open Data Camp to the local economy?
  • Chatham House session on government data in the face of GDPR
  • How can we help charities do more with open data?

Playing with open data in virtual worlds for real benefits

Christopher Gutteridge and Lucy Knight

Open data can be fun and educational. That was the message of the final session of Open Data Camp 5, day one, as Christopher Gutteridge explained how he came to combine his twin passions of Minecraft and open data.

Playing with data 1

“The story of this goes back quite a way. I kept going to an art gallery on the Isle of Wight, and I wanted to join in. So, I decided to build the seafront in Minecraft,” he said.

“I got OpenStreetMap, and traced it, and then modelled it in the Minecraft world. I printed it out in 3D and put the prints in a gallery. And people paid for them! You can still buy them if you go to Ventnor.”

Lidarcraft

Since then, Gutteridge, who works at the University of Southampton, has used increasingly sophisticated data sets to underpin his Minecraft models. So, he obtained Lidar data from Hampshire County Council, to model its trees and identify those with tree preservation orders.

Then detailed coastal information from Southampton’s Oceanographic Institute to model the coast. Then, very sophisticated datasets of London to create 3D Minecraft models that can be manipulated in real time.

“We have given this to children, and one little girl came in and spent all her time correcting the things I had got wrong,” Gutteridge said. “I had built the white cliffs of Dover in green blocks, and she turned them into white blocks with green carpet. Which was just awesome.”

He is now building a model of Plymouth. Lucy Knight, from Devon County Council, explained that this has 20 builders, who are coming up with wilder and wilder ideas. For example: “If you stand in a bus stop, it will teleport you to where the bus is going.”

From virtual reality to real reality

It’s not just about fun. These projects have real-world benefits: “It gives people a real connection with place,” Knight said. And taken into schools and other organisations, it encourages them to come up with great ideas for improving it.

Gutteridge is also getting into Minecraft archaeology; and reviving another idea – open data gamification, or using some of the common tropes of games to add further layers to the Minecraft models.

For instance, Knight said, areas that people felt were dangerous might be built in darker colours; while popular trees might have tree spirits pop out of them. Or tokens might be scattered around a city, to see how people would navigate it while collecting them.

The key point, Gutteridge concluded, is that by using accurate mapping data to underpin the models, it is possible both to encourage people to interact with open data and to start making changes in the real world that it represents.

Also, he added, he can easily generate models of towns in England. Just get in touch. If nothing else: “They make impressive but very cheap Christmas presents for difficult friends and relatives.”

Follow Chris on Twitter at: @CGutteridge