Open Data Camp 5 is taking place in the Computer Science Department of Queen’s University. It’s a modern institution that wants to make sure its students are ready for work.
So there are rooms that are carpeted with artificial turf, filled with trees, and furnished with garden benches. Of course there are.
The second session of the morning gathered in the garden room to discuss the EveryPolitician project [everypolitician.org], a bid to collect information about every politician in the world, anywhere.
Not an easy job…
The EveryPolitician site says that it has information on 74,939 politicians from 233 countries – and counting. However, Tony Bowden from mySociety, who started the project with Lucy Chambers, explained that this information has been very difficult to collect.
The project started by “running a lot of scrapers on a lot of sites” but, because of licensing issues, “we couldn’t quite tell anybody they could use it” and “we weren’t sure it was sustainable.”
So, the project is moving to Wikidata, in the hope that this will become “the canonical source of information about politicians”.
Bowden explained that Wikidata is connected to Wikipedia. There is no single Wikipedia; there is a separate edition for each language. So, Wikidata holds information shared across all of those editions, in a structured or semi-structured form, because otherwise the pages get out of sync with each other.
On the political front, for instance, Bowden said that if there is an election in an African country, local people will update that quite quickly, but the page in Welsh might not be updated for some time. The idea of Wikidata is to keep them aligned.
Still, collecting information “in a structured or semi-structured form” is not as easy as it sounds. For example, a session participant asked how EveryPolitician defined a political party, given that the idea will be fluid across different political systems.
Bowden acknowledged that, for EveryPolitician: “We came up with a simplified view of the world. We felt it was more important to have something good enough for every country to use, than to capture all the nuances.
“If you want to do comparative analysis across the world, you can’t start with a two-year analysis of systems. It’s ok to say there will be some kind of political grouping.”
An evolving project
Bowden added that he thought these kinds of issues would be resolved, as people started to use the data. “We think it will be something like OpenStreetMap,” he said. “So, initially, there will be some broad concepts, but as it goes along, there will be people who go along and do every tree in their area, and the nuance will start to come through.”
Another issue is that there may not be a ‘single source of truth’ about politicians in some countries. For example, the Electoral Office of Northern Ireland knows who has been elected to a seat, but may not log changes – for example, if someone stands down and somebody else is co-opted.
Or there might be official sites, but they might not be very good. Kenya’s site only lists politicians whose names start with the letters A to M (and one person whose name starts with P). Nigeria’s politician pages are up to date, but its index is three months behind.
Bowden said EveryPolitician is building tools so that individuals can scrape official sites, and then upload the information to Wikidata, and fill in gaps or correct errors.
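The "fill in gaps or correct errors" step can be pictured as a comparison between two lists. The sketch below is not EveryPolitician's tooling: the `Member` record, the `compare` function, and the sample names are all hypothetical, and it assumes that a name plus an area is enough to match a person across the two sources, which will not always hold in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Hypothetical simplified record: one person, one seat, one grouping."""
    name: str
    area: str
    group: str  # "some kind of political grouping"


def normalise(name: str) -> str:
    """Lower-case a name and collapse runs of whitespace for matching."""
    return " ".join(name.lower().split())


def compare(official, existing):
    """Return (missing, changed) for official records versus existing ones.

    missing: official records with no counterpart in the existing list.
    changed: (existing, official) pairs whose grouping differs.
    """
    existing_by_key = {(normalise(m.name), m.area): m for m in existing}
    missing, changed = [], []
    for record in official:
        match = existing_by_key.get((normalise(record.name), record.area))
        if match is None:
            missing.append(record)
        elif match.group != record.group:
            changed.append((match, record))
    return missing, changed


# Made-up sample data: one scraped list, one list that is already held.
official = [
    Member("Ann  Example", "North", "Group A"),
    Member("Bob Sample", "South", "Group B"),
]
existing = [Member("ann example", "North", "Group C")]

missing, changed = compare(official, existing)
```

Here `missing` holds the record that would need to be added, and `changed` holds the pair where the two sources disagree, which a person would still need to check before correcting anything.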
What is interesting, he said, is that once a country gets a good list, with people committed to maintaining it, that information tends to be much better – and better used – than official sites.
“If there is an election, the Parliament site might not update until Parliament sits, but the Wikidata pages will update overnight,” Bowden explained. “Then journalists can mine that for stories. So they can instantly tell people things like: ‘the youngest ever MP for Wales has just been elected’.”
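To illustrate the kind of question structured data makes easy, here is a minimal sketch of finding the youngest person in a list of newly elected members. The record layout, the function names, and the sample people are invented for the example; real data would also need historical records to support a claim such as "youngest ever".

```python
from datetime import date


def age_on(birth: date, day: date) -> int:
    """Age in whole years on a given day."""
    years = day.year - birth.year
    # Take one year off if the birthday has not yet happened that year.
    if (day.month, day.day) < (birth.month, birth.day):
        years -= 1
    return years


def youngest_elected(records):
    """Return the record with the latest known birth date, or None."""
    known = [r for r in records if r.get("birth_date") is not None]
    if not known:
        return None
    return max(known, key=lambda r: r["birth_date"])


# Made-up sample data; birth dates are often missing in practice.
election_day = date(2017, 6, 8)
elected = [
    {"name": "Ann Example", "birth_date": date(1970, 3, 2)},
    {"name": "Bob Sample", "birth_date": date(1995, 6, 15)},
    {"name": "Cat Placeholder", "birth_date": None},
]

youngest = youngest_elected(elected)
youngest_age = age_on(youngest["birth_date"], election_day)
```

Records without a birth date are skipped rather than guessed at, so the answer is only as complete as the underlying list.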
Find out more:
The EveryPolitician website has collected 2.8 million pieces of information so far. Tony Bowden has explained the move to Wikidata and its benefits in a blog post on the mySociety website. He also tweets about the EveryPolitician project.