The Open Data Delusion

Jun 6, 2025

Open Data is about the “Data” as much as it is about the “Open.” Some stories from my experience as an Open Data activist and adviser illustrate this.

This article first appeared on 20 May 2016 in the now-defunct online magazine Broken Toilets. It’s still available on the Web Archive.

I first met Gail Ramster in 2010 at an event about the release of London-wide Open Data by the Greater London Authority. A researcher on “toilet usability,” she was trying to gather public data to compile a list of toilets accessible to elderly people. Six years later I met Gail again, this time at her office at the Royal College of Art, to discuss her experience. For me, the past 5 years have been a whirlwind of Open Data advocacy: first working to increase awareness of Open Data in academia, then as a ministerial adviser in the now-defunct Open Data User Group (ODUG), an advisory panel at the UK Government Cabinet Office. ODUG operated from 2012 to 2015 to help the Government prioritize data releases, assign funding, and produce policy recommendations.

…

Read more ⟶

QGIS – An ATLAS of buildings by council

Jan 23, 2025

Another little QGIS step-by-step note to my future self.

Prerequisites: This time I wanted to create a visualization of London councils showing the buildings from OpenStreetMap, using the handy downloads from Geofabrik to get the buildings shapefile into QGIS. To add the “by council” element, I also downloaded an official shapefile of the London council boundaries (e.g. from here) and then joined the two layers using the QGIS geoprocessing tools. I’ll leave this to you as an exercise, but the final output is a layer with the buildings and a unique local authority ID for each.
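
For anyone curious about what that join does conceptually, here is a minimal pure-Python sketch. It is not how QGIS does it internally, and it is not the workflow from this post: it simply assigns each building to the council whose boundary contains the building's (approximate) centroid. The council IDs, building IDs and coordinates are all made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # a simple polygon as a list of (x, y) vertices


def centroid(ring: Ring) -> Point:
    """Average of the vertices: a rough stand-in for a true polygon centroid."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def contains(ring: Ring, point: Point) -> bool:
    """Ray-casting test: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def join_by_council(
    buildings: Dict[str, Ring], councils: Dict[str, Ring]
) -> Dict[str, Optional[str]]:
    """Map each building ID to the ID of the council containing its centroid."""
    joined: Dict[str, Optional[str]] = {}
    for building_id, footprint in buildings.items():
        c = centroid(footprint)
        joined[building_id] = next(
            (cid for cid, boundary in councils.items() if contains(boundary, c)),
            None,  # no council contains this building
        )
    return joined


# Hypothetical data: two square "councils" side by side and three buildings.
councils = {
    "council_A": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "council_B": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
buildings = {
    "b1": [(1, 1), (2, 1), (2, 2), (1, 2)],
    "b2": [(15, 5), (16, 5), (16, 6), (15, 6)],
    "b3": [(30, 30), (31, 30), (31, 31), (30, 31)],
}
result = join_by_council(buildings, councils)
print(result)
```

Real boundary files have holes, multipart polygons and buildings that straddle two councils, which is exactly why it is easier to let the QGIS tools handle the join.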

…

Read more ⟶

Playing With Gpx Data from Strava

Jan 11, 2025

I’ve been wanting to try using GPX data for a while. You can record your runs with a variety of apps, and even edit the recordings with a text editor, as GPX is a relatively human-readable format. What I realised, though, is that I’d been using Strava quite a lot during the 2020 lockdown, my daily hour of air often being a jog. Personal best after personal best (which didn’t take much, as I’d never really been a runner before), I got to know areas close to where I lived that were suitable for running, and also ventured out on longer runs, including some over 10k.
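
To give an idea of how readable GPX is, here is a small sketch that parses a hand-written GPX 1.1 snippet with Python's standard library and adds up the distance between track points. The track is invented (it is not one of my Strava runs), and the haversine formula with a mean Earth radius is my assumption for the distance; apps may well compute it differently.

```python
import math
import xml.etree.ElementTree as ET

# GPX 1.1 elements live in this XML namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# A made-up three-point track, small enough to edit by hand.
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk>
    <name>Made-up jog</name>
    <trkseg>
      <trkpt lat="51.5000" lon="-0.1000"><ele>10.0</ele></trkpt>
      <trkpt lat="51.5010" lon="-0.1000"><ele>11.0</ele></trkpt>
      <trkpt lat="51.5020" lon="-0.1000"><ele>12.0</ele></trkpt>
    </trkseg>
  </trk>
</gpx>"""


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres, assuming a spherical Earth."""
    radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


root = ET.fromstring(SAMPLE_GPX)

# Each <trkpt> carries its coordinates as lat/lon attributes.
points = [
    (float(pt.get("lat")), float(pt.get("lon")))
    for pt in root.findall(".//gpx:trkpt", GPX_NS)
]

# Sum the distance between each pair of consecutive points.
total_m = sum(
    haversine_m(lat1, lon1, lat2, lon2)
    for (lat1, lon1), (lat2, lon2) in zip(points, points[1:])
)
print(f"{len(points)} points, {total_m:.0f} m")
```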

…

Read more ⟶

Three things about data...

Dec 6, 2024

First published on LinkedIn

Quite a few interesting discussions at Think Data yesterday, both on and off stage.

Three main thoughts from me:

The “future of data” is… overrated. There is a lot of work needed to get the basics right: data quality, seamless data pipelines, and an understanding of the strengths and weaknesses of data. We need to move away from the mentality of blind faith in data as “the solution”. I recently read a fantastic article on how any data pipeline, taken uncritically, introduces dangerous bias at every step: selection bias when gathering data, recency bias when interpreting it, and confirmation bias when drawing insight from it. And bias means ineffective data. We need to apply critical thinking when building solid data foundations, or the future of data will be broken, disappointing, and potentially dangerous.
…

Read more ⟶

Calculating the average face from a set of photos using OpenCV on Colab

Dec 16, 2023

TL;DR Here is the Colab notebook.

A few years back, the UK Parliament released photo portraits of each Member of Parliament. So, I thought, it would be cool to do something data-driven with that image set. I had worked before with algorithms that find reference points on a face, using Terence Eden’s code that found the most similar painting to a face.

My first thought was: I could calculate the similarity of each face, and pick the face with the median similarity as the “average face” of the dataset. This, although a potentially good approach, had two shortcomings:

…

Read more ⟶

Making a map of the closest capital using QGIS

Aug 18, 2023

I saw this map on Facebook. It asks which capital is closest to each Italian town, creating what looks like a continuous map to display the result. I like this kind of thing, so I set out to replicate it using QGIS, and to make the process as replicable as possible.

The TL;DR is as follows:

I first got the data and boundaries
I found a formula to calculate the closest capital by distance
I found a way to apply a parametric colouring to the map.
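
The second step can be sketched outside QGIS too. The snippet below is a Python stand-in, not the QGIS formula used in the post: it picks the closest capital to a point by great-circle (haversine) distance. The list of capitals is a small illustrative subset with approximate coordinates, and the towns are just examples.

```python
import math
from typing import Dict, Tuple

LatLon = Tuple[float, float]

# Illustrative subset of capitals, approximate (latitude, longitude).
CAPITALS: Dict[str, LatLon] = {
    "Rome": (41.9028, 12.4964),
    "Paris": (48.8566, 2.3522),
    "Bern": (46.9480, 7.4474),
    "Vienna": (48.2082, 16.3738),
    "Ljubljana": (46.0569, 14.5058),
    "Tunis": (36.8065, 10.1815),
    "Valletta": (35.8989, 14.5146),
}


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres, assuming a spherical Earth."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dphi = phi2 - phi1
    dlmb = math.radians(b[1] - a[1])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def closest_capital(point: LatLon, capitals: Dict[str, LatLon] = CAPITALS) -> str:
    """Name of the capital with the smallest haversine distance to the point."""
    return min(capitals, key=lambda name: haversine_km(point, capitals[name]))


# Example towns, approximate coordinates.
towns = {
    "Milan": (45.4642, 9.1900),
    "Trieste": (45.6495, 13.7768),
    "Naples": (40.8518, 14.2681),
    "Palermo": (38.1157, 13.3615),
}
nearest = {town: closest_capital(coords) for town, coords in towns.items()}
print(nearest)
```

With more capitals in the list the answers change, of course, which is half the fun of the map.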

In fact, I first did this by colouring provinces. This is because I am familiar with polygons and how to apply colouring to them, even parametrically. But this wasn’t the intended final result, which is at a much finer level. In order to achieve a quasi-replica of the Facebook map, I ended up creating a 5 km grid in the shape of Italy, then colouring each element of the grid using the same function.
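
As a rough illustration of the grid idea, here is a Python sketch rather than the QGIS steps themselves. It generates cell centres roughly 5 km apart over a bounding box and derives a stable colour from a label, such as the name of the closest capital. The bounding box, the degrees-per-kilometre approximation and the hash-based colouring are all my assumptions, and clipping the grid to the shape of Italy is left out.

```python
import colorsys
import hashlib
import math
from typing import Iterator, Tuple

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def grid_centres(
    min_lat: float, min_lon: float, max_lat: float, max_lon: float, step_km: float = 5.0
) -> Iterator[Tuple[float, float]]:
    """Yield (lat, lon) centres of cells roughly step_km wide over a bounding box."""
    dlat = step_km / KM_PER_DEG_LAT
    lat = min_lat + dlat / 2
    while lat < max_lat:
        # A degree of longitude shrinks with the cosine of the latitude.
        dlon = step_km / (KM_PER_DEG_LAT * math.cos(math.radians(lat)))
        lon = min_lon + dlon / 2
        while lon < max_lon:
            yield lat, lon
            lon += dlon
        lat += dlat


def colour_for(label: str) -> str:
    """Deterministic hex colour for a label: same label, same colour."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    hue = digest[0] / 256.0  # first byte of the hash picks the hue
    r, g, b = colorsys.hsv_to_rgb(hue, 0.65, 0.9)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


# Hypothetical small bounding box, just to keep the output short.
cells = list(grid_centres(45.0, 9.0, 45.2, 9.3))
print(len(cells), "cells; colour for 'Bern':", colour_for("Bern"))
```

Each cell centre would then be passed to whatever function returns the closest capital, and the resulting label to `colour_for`, so that neighbouring cells with the same capital end up with the same colour.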

…

Read more ⟶