data science hackday

On parliamentary language analysis


picture credits: Tracy Green

If you know me, you also know that I never miss the UK Parliament Hackday. This year it turned into a joint event between Parliament, the National Audit Office and the Office for National Statistics: AccHack14 was run over two days at the superbly located NAO offices in Victoria. And this year, I won a prize for “Best Parliamentary App”!

I have long been fascinated by Hansard, the archive of Parliamentary debates.

I’m obsessed by Hansard. Hansard keeps me awake at night.

With those words, I opened my presentation. The app I developed is essentially a tool to search and analyse Hansard in an uncommon way, using an n-gram viewer. N-grams are simply sequences of N words: 1-grams are single words (like “fox”), 2-grams are two words in sequence (like “quick fox”), and so on. The tool I developed, Parli-N-Grams, allows the user to search n-grams in the Hansard corpus, inspired by the Google Books Ngram Viewer.
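To make the idea concrete, here is a minimal sketch of how n-grams can be extracted and counted (in Python; this is an illustration, not the actual Parli-N-Grams code, which is written in PHP):

```python
import re
from collections import Counter

def extract_ngrams(text, n):
    """Tokenise text into lowercase words and count the n-grams."""
    words = re.findall(r"[a-z']+", text.lower())
    # zip over n staggered copies of the word list to get the n-grams
    ngrams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in ngrams)

counts = extract_ngrams("The quick brown fox jumps over the lazy dog", 2)
print(counts["quick brown"])  # → 1
```

Run this over every debate in a given year, and the per-year counts are exactly what the viewer plots.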

You might ask: why is this a good way to search?

The best kind of search is a search that lets you discover by the simple action of searching.

Searching is about finding a result – in this case, finding a certain debate where a word or sequence of words was mentioned. However, by showing graphically the distribution of those n-grams over the years, Parli-N-Grams also lets the user navigate through the data to discover more:

  • how language evolves
  • how topics become more important in certain historical periods
  • how certain words replace certain other words

and so on.

MPs love to talk about benefits

Try plotting the word benefits:


MPs are seemingly using the word benefits in debates more and more. For comparison, plot both benefits and welfare:


You see how they started at pretty much the same level and then slowly diverge over the years? This is utterly fascinating. Another great example is the word war:


War enters the political discourse abruptly in the late 1930s, when the Second World War became increasingly inevitable, peaks in 1944 and slowly declines. What is really interesting is that not even the repeated military efforts in Iraq (see the peaks in 1991 and 2003) have made the frequency of the word return to the levels seen in the 1940s. Are MPs consciously keeping the word war out of the debates? You will remember how heated the debate became over whether the Iraq War should be defined as a “peacekeeping mission”. But it’s not just Iraq: not even the Falklands War (1982) produces a very high peak. If you’re curious, see also “terrorism”.

The way MPs refer to politics is seemingly changing, too. See what happens when plotting the word party:


Here we have a massive spike in 1983, a General Election year. Is it perhaps because of the Labour–SDP split? After peaking in 1998 (the year after Labour got back into power) the word seems to decline all the way until it peaks again in 2009, just before the General Election that returned a hung parliament. Another great use I can envision for Parli-N-Grams is analysing the evolution of language. See, for example, the distribution of frequencies for basically:


How many other such patterns could we discover?

I need to fix a couple of things…

Parli-N-Grams is intended for social and political historians, and passionate language researchers, but it’s not a stable product yet.

I’m working on it, but be aware that:

  • the scripting is a bit rusty, so the website might crash here and there; reload the page if things don’t show up
  • my harvesting procedures were done in a rush, in PHP, and are not particularly efficient: as a consequence, at the moment Parli-N-Grams only works for 1-grams (i.e. single words)
  • during the demo I also showed a nice visualiser for the debate transcripts, but I’ve now disabled this feature; I will re-enable it as soon as I decide on a way to search the files efficiently (ideally using ElasticSearch or a similar product)
  • I haven’t normalised the results (Google’s Ngram Viewer doesn’t either), and I’m still deciding whether it’s more interesting this way; I’ll blog about it soon
  • more bad stuff that has likely escaped me (if you like sed, please have a look at the filter and laugh at me).
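For reference, normalising would mean dividing each year’s raw count by the total number of words spoken that year, so that a chart reflects relative rather than absolute usage. A sketch with entirely made-up figures:

```python
# Hypothetical yearly counts for one word, and total words per year.
# The numbers are invented purely for illustration.
raw_counts  = {1938: 1200, 1944: 5400, 1991: 900}
total_words = {1938: 2_000_000, 1944: 2_500_000, 1991: 3_000_000}

# Normalised frequency: occurrences per million words, so years with
# longer sittings don't dominate the chart just by sheer volume.
per_million = {year: raw_counts[year] / total_words[year] * 1_000_000
               for year in raw_counts}
print(per_million[1944])  # → 2160.0
```

The open question mentioned above is whether the raw or the normalised series tells the more interesting story.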

Would you like to know which party mentions “benefits” more?

As I’ve said, I’m working to make Parli-N-Grams stable and usable, so that it can be enjoyed by historians, journalists, and whoever shares my obsession with Hansard. The roadmap is as follows:

  1. get Parli-N-Grams to be stable, reactive, working on most browsers
  2. add 2-grams, 3-grams, 4-grams and 5-grams (I’ll likely stop here)
  3. add optional segmentation controls; for example, split by political party, to answer questions like which party mentions “benefits” more?
  4. make the harvesting an ongoing procedure; I would like Parli-N-Grams to update automatically every time we receive new transcripts from Hansard.
  5. add an API to allow people to embed the charts.
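Point 3, splitting by party, is essentially a grouped count. A toy sketch with invented records (not the real Hansard data model, whose structure is richer):

```python
from collections import defaultdict

# Hypothetical per-speech records: (year, party, text) tuples,
# invented here purely for illustration.
speeches = [
    (2009, "Labour", "benefits must be protected"),
    (2009, "Conservative", "benefits reform is overdue"),
    (2009, "Labour", "welfare and benefits for all"),
]

def mentions_by_party(speeches, word):
    """Count occurrences of a word, segmented by the speaker's party."""
    counts = defaultdict(int)
    for _, party, text in speeches:
        counts[party] += text.lower().split().count(word)
    return dict(counts)

print(mentions_by_party(speeches, "benefits"))
# → {'Labour': 2, 'Conservative': 1}
```

The same grouping key could just as easily be the speaker, the chamber, or the type of debate.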


If you have any comment/request/idea, please get in touch. Meanwhile, these are the relevant links:

Let me conclude by thanking Nick (National Audit Office), Tracy (UK Parliament) and Matt (Office for National Statistics): you’ve done a great job 🙂 Big credit to the super-smart folks at DXW, who won the overall “Best In Show” prize with a very smart and elegant website investigating housing data, Right-to-Buy-Bye. They’ve also blogged about their experience.

geo open data open source

Open Addresses: a great opportunity

The privatisation of Royal Mail came with a massive Open Data defeat: the Postcode Address File was left among the assets of the sold company. The Open Data User Group – and many other players in this space – voiced their dissent, but the decision had been taken: PAF had to go with the rest of Royal Mail.

It’s time to move on.

The newly funded Open Addresses project is a great opportunity in this context, and a symposium run by the ODI on 8th August has just reinforced my impression that some of the smartest people in the “Open” community are working incessantly to create a credible alternative to PAF (and to Ordnance Survey’s Address-Point). Let’s call this OAF.

The Open Addresses Project represents a great opportunity.

First of all, it’s an opportunity to show that crowdsourcing can be as good as a top-down approach, if not better.

We’ve seen this happening with OpenStreetMap. In many parts of the world, the coverage and accuracy of OSM is not just way beyond that of commercial solutions, but it is kept constantly in check by an army of volunteers and users.

Why hasn’t OSM gone mainstream to show the power of crowdsourcing? Primarily because Average Joe doesn’t get that map means archive of location-based data points. Average Joe reads map and thinks Google Maps. “You can’t use OSM” is a common objection, and this misses the point about OSM.

OAF would incur this danger considerably less. Addresses are readily understandable by people, and there would be no confusion about what the database represents.

In an address file, location is an attribute of the address; whereas in a map, address is an attribute of the location.

There is, of course, another problem with OSM: it is not perceived as authoritative. I remember a discussion with a well-travelled contact of mine heading to India and looking for maps of a constantly changing area. “Use OpenStreetMap”, I said; “it will never be good enough!”, he replied. Except that when he checked, he found that OSM had more data than any other map available to him.

Open Addresses certainly risks a similar perceived lack of authority. Advocacy will play an important role, together with case studies, independent evaluations, and early adopters, in showing that the Open Address File can become the authority in this space.

Secondly, this is an opportunity for Royal Mail and Ordnance Survey to review their practices around addresses and to improve their own products, by providing users with an independent way to assess their quality. It is perhaps an opportunity for them to finally and uncontroversially understand that openness is a way to increase their business influence – and revenues – not a way to jeopardise it.

There are unquestionably several challenges ahead, both technical and non-technical.

The technical challenges can be easily enumerated: what is an address? What is the minimum unit of space we want to represent? Where do we stop – street level, floors, flats, units, rooms? None of these will be easy to solve, but the beauty of this project lies in a process that intends to bring addressing experts together with users of addresses.
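To make the granularity question concrete, here is one hypothetical shape an address record might take; the field names are my own illustration, not the Open Addresses schema:

```python
from dataclasses import dataclass
from typing import Optional

# A sketch of one possible record shape, invented for illustration.
@dataclass
class Address:
    street: str
    number: str
    town: str
    postcode: str
    # Finer-grained units are optional: this is where the
    # "where do we stop?" question bites.
    floor: Optional[str] = None
    flat: Optional[str] = None
    # Location as an attribute of the address, not the reverse.
    lat: Optional[float] = None
    lon: Optional[float] = None

a = Address(street="High Street", number="12", town="London",
            postcode="SW1A 1AA")
print(a.flat)  # → None
```

Every optional field here is a design decision the project will have to argue out between experts and users.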

However, considerably more challenging will be the cultural problems born of the monopoly of Royal Mail and Ordnance Survey in this space, the myths about what PAF can deliver, and the reliance on restrictively licensed products, all of which might hinder a smooth transition to an open product.

Let me mention some of these cultural challenges:

  • PAF was built as a way to allow postmen to deliver post; it’s a collection of delivery points, not a way to identify buildings, houses, or business premises, although some users have come to treat it that way; OAF will need to state its goals clearly and make them easy to understand for its users
  • PAF may be authoritative in the minds of its users, but it’s not as accurate as they generally believe: duplications and errors are rather common; OAF has the opportunity to be more accurate and have greater coverage, and will need to live up to this for at least a number of clear use cases. Let me quote ODUG’s response to the PAF consultation:

Royal Mail usually states the completeness of PAF (the principal measure of quality) as being in excess of 98%. However, as Royal Mail determines what a delivery point is, no external body can identify missing delivery points to confirm that measure.

  • evidently, there is no benchmark to assess the quality of PAF; OAF will need to be built in a way that makes this assessment possible and desirable to its users
  • feedback loops will need to be clear, i.e. how to allow third parties to add addresses into OAF
  • the NLPG is often hailed as a solution, when in fact it is just another face of the same problem, coming as it does with restrictive licensing

All of these issues are difficult, but not unsolvable. I believe that early adopters will play an important part in advocating this new data product, showing that it’s not just as good as the existing commercial not-really-open solutions, but better in terms of reliability, accuracy, coverage.

Most use cases of an address file revolve around the function of address lookup. As such, they come with a great feature: they immediately detect if an address is missing or incorrect. Feedback will play a relevant part in the process envisioned to build Open Addresses.

Hence, let me close with an appeal: if you use addresses in your business, for your job, for marketing purposes, keep an eye on this project and start building your services in a way that allows you to use an Open Addresses file.

social media

IRC journalism in 1999: a nostalgic take on the evolution of the Internet

In 1999 I was 17, and I had been “on Internet”, as we used to say, for about three years. I was still at school – in Southern Italy – and as part of our curriculum we had to write a monthly essay, with a choice of topics and styles including that of a newspaper article. The only reason I chose this style was that it was the only one for which our teacher would pre-announce the topic (all the others being a surprise).

Back then, the Yugoslav Wars had “ended”, pretty much, in 1995. No one had ever heard of Kosovo, a small region of what became Serbia, with a sizeable Albanian minority. After allegations of genocide and the international community’s condemnation of Serbia, NATO decided to bomb Yugoslavia, without the authorisation of the UN Security Council.

In those years there was no Twitter, no Facebook, not even Orkut. Actually, not even Google. Blogging was, maybe, something people would write on their home page (mine was provided by my local ISP, and it had a grey background). But the world’s fascination with the Internet was getting bigger and bigger. People would talk to random people in public and private chats. Those were the years of ICQ and the beginnings of MP3 as a music medium – when MP3 files could be searched for on Altavista.

My essay about the Belgrade bombing was born on IRC. I joined the channels #kosovo and #belgrad. I started messaging random people, declaring my intentions: I wanted to ask about the war, what it was like being there, and how they felt about NATO, Kosovo, Serbia.

Most chatters, on both sides, were happy that someone was asking about them. Some were expats trying to get in touch with relatives; others had managed to get a dial-up Internet connection in their garage, turned bomb shelter. I spent a number of nights mostly trying to collect balanced views, but more importantly establishing a direct, sometimes emotional connection with people who were living not that far away – my town being about 600 km and a short flight from Pristina – and experiencing a terrible situation, in a very polarised way.

IRC helped me gather those views and write my essay. The Internet became to me, in the 90s, a superb means of getting to know people who were far away, understanding them, perceiving their experience; despite a limited, text-based medium, that connection could happen. Real-time journalism was a reality; it was by its nature a one-to-one experience, not a collection of tweets on a given hashtag.

We’re moving away from that, and let me be nostalgic: something has been lost. We’re now damn good at aggregating content, but with all this aggregation we’re probably failing to make that one-to-one connection. The early Internet facilitated that connection; the current, social Internet, paradoxically, cannot.

gov open data

In response to comments to ODUG’s GP benefits case

I would like to respond to Owen Boswarva’s comments about ODUG’s GP benefits case.

Most of his comments are appropriate, but I think – and I say this very respectfully – that they are slightly missing the point by taking in isolation a series of remarks that, in ODUG’s view, make sense when viewed as a whole.

The short version is: I think a single and authoritative GP dataset is needed because:

  1. there is user demand for this dataset (several data requests on
  2. it would bring improved accuracy and quality to existing datasets
  3. it would streamline an already existing process, unifying the procedures of several entities
  4. it is a natural candidate for the National Information Infrastructure.

Some direct comments to Owen’s points follow.

Those datasets are all reusable under the Open Government Licence, i.e. they are open data.

Some of the datasets are indeed OGL-licensed, but the HSCIC dataset, currently the most complete of those provided, specifically does not permit the use of data “for the purpose of promoting commercial products or services to the public”.
Following from the points above, the goal of the benefits case is to generate a single dataset under a single licence, and from our demand-led point of view we cannot but ask for more open licensing.

The same applies to Owen’s point number 3:

ODUG maintains that the GP practices data on the HSCIC site is not open data, and points to a page about “responsibilities in using the ODS data”. However HSCIC has recorded that dataset (EGPCUR) on as reusable under the OGL. (The ODS “responsibilities” page seems to be written for NHS users. A literal reading only permits use of the data in connection with NHS-related activities, which is obviously not the actual licensing position.)

We are asking for clarity. If explaining all of this requires several searches and a blog post, I believe we are in the right asking for an improvement to the current situation.

The ODUG criticises the NHS Choices dataset as follows:


“the branding of the NHS Choices dataset as a ‘Freedom Of Information’ dataset is troubling from an Open Data perspective, mainly for its “on demand” nature: an FOI data release, being a reactive response to a request, does not establish an ongoing process; while data released under an Open licence often comes proactively from the publishing entity, which in doing so creates a sustainable data update procedure”.


I think this is rather over the top. NHS Choices hasn’t “branded” the data as a FOI dataset. It has merely made it available, along with a number of other useful data files, in the FOI section of its site. It would be nice if the NHS Choices site also had a dedicated open data landing page. However it’s perfectly sensible to draw users’ attention to existing datasets that they may want to know about before submitting a FOI request. NHS Choices says the data files are updated daily, so they are clearly not being published as a “reactive response” to FOI requests.

I’m sorry if Owen takes offence at the wording of this; we are not being critical of NHS Choices. Our engagement on this topic with NHS England, moreover, has been entirely collaborative and so far positive.

Our point can only be understood by remembering that it’s in ODUG’s DNA to approach Open Data from the user’s point of view. FOI and Open Data cover different aspects of the transparency agenda – both immensely important, but coming with different expectations from users. Releasing Open Data through an FOI portal is confusing and, in my personal opinion, semantically incorrect. “Branding” here is used in this sense; it is not intended to be controversial.

Owen’s point that it’s good to direct users to existing datasets before they submit an FOI request is absolutely spot on, and I totally agree with it. ODUG is not against better engagement between FOI and Open Data – quite the opposite. It’s just that in this specific case we question the user experience.

There’s nothing wrong with arguing that existing datasets could be made more useful by improving the quality, or updating them more frequently, or appending data from other sources.

That’s the point of this benefits case: showing the way to an improved quality of the dataset.

But we can have those arguments about most of the nation’s information infrastructure. A dataset doesn’t need to be ideal to be authoritative in practice.

The HSCIC and NHS Choices datasets are produced by the relevant official body, they are in wide use, and there are currently no better equivalents. The datasets are therefore, on the face of it, authoritative.

We need to start somewhere 🙂

“Authoritative” goes in conjunction with “to whom”. Yes, the datasets are authoritative to the relevant official body; they are not perceived as sufficiently authoritative by the end users we aim to represent. We wish to make sure that the final authoritative dataset is the sum of the accuracies of the several datasets that currently exist.

ODUG proposes that DoH establishes “an ongoing process to build, update and maintain on an authoritative dataset of medical practices and operating practitioners, drawing on the datasets made available by HSCIC and NHS Choices”.


I’m not sure how ODUG expects DoH to build an authoritative dataset by drawing on datasets it has dismissed as non-authoritative. ODUG’s call is to DoH, but in practice DoH would surely delegate any such new process to HSCIC. So what is ODUG proposing HSCIC should do differently?

The datasets have not been dismissed as non-authoritative. What we are arguing is that the three datasets only partially overlap, and this is not controversial. We are calling for an “official” process of aggregating these datasets, so that the result can be referred to as the authoritative source.

So the question is not what DoH or HSCIC could do differently; it’s about collaboration. The best outcome, the most accurate and reliable dataset would come from an integration of the various processes that collect such data, each of which aims at different objectives and follows different procedures. Collaboration and integration would create efficiency and build capacity in the bodies involved. It would be a win-win situation.

Maintaining the new dataset on is also unlikely to add credibility, given the current state of the DGU catalogue and other functionality. HSCIC already has its own platforms and they seem serviceable for the publication of data. What in the ODUG proposal requires the involvement of

DGU has certainly some problems, but in our view we need to ask for more. We want DGU to be the index – not necessarily, in all cases, the repository – of all public open data. Hence, it makes sense for us to argue for having these datasets on DGU. Whether they end up being hosted on it, or just linked from it, it’s not for us to decide: we are confident that initiating the process of data integration is more important, and once the procedure has been identified, linking to or hosting the dataset will just come as a due consequence.

I’ve never been entirely on board with the idea of submitting “benefits cases” for release of open data, because it seems to conflict with the principle of “open by default”.

Why see these as contrasting goals? There is of course an ongoing struggle between the ideal situation (everything open by default) and the practicalities of actually establishing procedures for the release of datasets. We use benefits cases to prioritise such releases or, in this case, to make sure the data is aggregated in a way that meets users’ requests.

In this instance ODUG seems to be arguing for creation of a new data product, combining the existing HSCIC/NHS Choices datasets with data from other sources such as GMC’s Medical Register and patient acceptance criteria for each GP practice.


That last source in particular would probably involve quite a bit of ongoing administration and processing, as patient acceptance criteria are not held centrally or in a standard format.

The fact that data are not held centrally doesn’t mean that the data do not exist, or that they should not be, in some form, available to the public. Electoral datasets are a clear example of this. Whether it’s new data or newly collected data makes little difference to our view of demand-led prioritisation of data releases.

Arguing for release of existing data is one thing. Arguing for the creation of new data products and new processes is something more.


I have no doubt there is room for improvement in the existing open data that HSCIC publishes on GP and dental practices. However public datasets are mainly produced to support a public task. I will be surprised if DoH takes up these ODUG recommendations without a more detailed demonstration of why the existing data and processes are inadequate to meet the requirements of the agencies and public bodies it supports.

Not even the official terms of reference ask ODUG to stop at simply arguing for the release of existing data; moreover, we aim at sustainable and efficient data releases. Sustainable data releases require an established process; where this process already exists, we can help identify inefficiencies and overlaps and act as a bridge between the several organisations involved. We believe that a new dataset, collected in the way discussed, would greatly improve the public tasks of the agencies involved; preliminary discussions with representatives of these organisations suggested that the conversations between them would benefit from a single data source.

My last point: we are contributing to the definition of what the NII should be. A GP dataset is for many reasons an obvious candidate for the NII. Given we are at this stage, and given the possibly great impact the NII might have on public tasks, it is in my opinion the right time to argue for a review of the way the GP data is collected and made available, and achieving results in this area would be an encouraging blueprint to follow in other contexts.

data science my projects research

Husband and wife: analysing gender issues through literary big data

Some time ago a friend made me realise the peculiar distribution of the word gay in English literature: relatively common in the 1800s, then in decline, then in massive recovery after the 1970s. Of course, the word here is used with two different meanings, the first (“light-hearted, carefree”) more common in past centuries, with the second (“homosexual”) going mainstream in the latter part of the 20th century. All of this can be easily visualised using Google Ngrams.

I became rather curious about this because I realised that gender issues have often been written about in literature; also, the ways in which familiar scenes have been depicted could easily be a proxy for understanding the relationship between the genders, especially in the strict, unchanging view often promoted by traditionalists in our society.

So I charted four words: man, woman, husband, wife. The result is enlightening.

You see, it’s not just that “man” dominates. This can be explained in many ways, especially by the common use of “man” as a synonym of “human being”. The sudden growth in the latter part of the 1700s points to several phenomena happening in those years, from the Enlightenment to the French Revolution.

Some data points:

  • “husband” is rarely used compared with “man”; the ratio is about 1 to 10
  • conversely, “woman” and “wife” follow a similar trend, with a much smaller ratio
  • “wife” was used more than “woman” until the late 1800s
  • “woman” becomes increasingly more important than “wife” after the 1970s.

Isn’t that a rather accurate description of what happens not just in the English literary corpus but, more widely, in society?

open data

On the open ended nature of Openness

Some days ago I was joking with a friend about making t-shirts with an “Open Data is my mission” slogan. The problem with that mission is that its object is not particularly well defined.

I was involved in a couple of interesting discussions about this via Twitter, with a couple of people whose opinion I really value. On the day TfL announced their new API, I tweeted my happiness about their Open Data licence. My happiness was not shared by Adrian Short:


Adrian suggested their API was anything but an open one; that since it had restrictions, especially the requirement to register for a key, it could not be associated with the adjective “Open”. (The whole conversation can be accessed here.)

In a similar direction went a quick exchange with Aral Balkan:


These two conversations highlight a curious problem in the “Open” communities, whether they are -Source, -Data, -Whatever: we’re talking about some very loosely defined concepts. As I say in my response to Aral: Open Data is just a phrase – what really matters is the licence attached to the data.

Openness is measured on a continuous scale. If there is a threshold below which we shouldn’t call some data “open”, that threshold has not been defined yet. It’s relative (to the data, to the context, to the country, to the user), it’s flexible, it’s got several possible meanings.

My personal position is to call open data whatever comes with no use restrictions (i.e.: you can use the data for whatever purpose you like). In legal terms, however, this gets complicated because we need to assign a licence to the data. When working with ODUG, for example, I always make a point of not accepting data releases with anything less than an Open Government Licence (or its Creative Commons / Open Database Licence equivalents).

Furthermore, in the not-so-public sector (which is what I generally call TfL), things are clearly complex, especially given the expectation (which I do not personally agree with, but this has no bearing on this discussion) that a transport agency in a metropolis should be profit-making. TfL’s licence is probably not the best-worded ever, but it is an open licence:


Yes, as Adrian notes, TfL can revise the licence at any point. But until they do, they allow users to copy, adapt and exploit the information, with the only requirement being attribution. This is not much different from the OGL.

Does the requirement to sign up for an API key justify the critique? This is clearly a complication that comes from the real-time nature of this data. A system with such a huge amount of data generated in a short time needs provisioning, and the best provisioning comes from knowing how many users can access the system. In this case I don’t think that having to register for a key affects the openness of the data, because there is no restriction on who can register. Of course, an improvement would be to allow anonymous registrations, and I would support this; however, the SLA might still give priority to users who are not anonymous, simply because more is known about their requirements. Openness is a compromise, one that comes from opposing needs clashing.

The non-real-time datasets could be distributed without registration – this is where I agree with Adrian – but I don’t think this justifies the negativity towards this data release, a step that goes in the right direction. Does anyone want to bring this up with TfL?

On a similar note, Aral initiated a somewhat long and inflamed thread about a similar issue: the use of Open Data and the expectation that from something open should descend something open. In this case the focus was on my friends at @transportapi, whom I think are doing a great job of showing how Open Data can create business.


(Full conversation here).

Some interesting questions emerge from the thread:

  • Aral Balkan: “How’s this not closing off open data via a proprietary system only to license it commercially via an API?”
  • Emer Coleman: “We are DaaS provider. 1,000 hits a day for free then charge per hit with SLA’s once exceeded but also don’t have any IP on downstream products or services and more open licensing”.

The thread goes on and on with similarly opposing views. One question emerges: is there a (moral) obligation for Open Data users to be as open as the data they start from?

I will take the “risk” of being seen as an Open Data moderate: my view is that this question doesn’t have a straight answer; it all depends on the level of maturity of the Open Data movement in that specific context and product. Once again, as in TfL’s case, we’re talking about a significant amount of real-time data. In this specific case, the data is heavily modified by Transport API to make it cleaner. That is a substantial chunk of work. It would be unsustainable to provide it for free, and without registration the service level would soon degrade. Hence, once again, we need a compromise. Building sustainable businesses on top of Open Data is still something new. But sticking to the legal side: the licence does not place limitations on the use of the data. This can be open enough for some and not for others. “Open Data is a broad church”, says Jonathan Raper in the same thread. Sustainable Open Data-powered businesses create a virtuous circle that encourages more data releases, and I think we should welcome them.

One final note: we should probably stop capitalising the words “open data” and accept that multiple views will always be possible. Once again, open data is a compromise, as this debate shows. By keeping the debate going we can make that compromise produce useful results and advance the openness agenda.

smart devices Web 2.0

The smart thermostat is hot

“Of all possible devices”, remarks a friend, “a thermostat is a curious choice for the first mass-marketed smart device”.

His analysis is about Nest, recently acquired by Google and now widely advertised on billboards everywhere, including in the Tube. “This is not the kind of object you use every day; it’s also too simple – you just switch it on when it’s cold and off when it’s warm. People know how to programme a thermostat, my grandma does it.”

One cannot but agree with the observation about simplicity, except I think this is exactly what makes a thermostat a great choice for a smart device.

First of all, it’s obvious that a thermostat makes your life better by allowing you to pre-heat your home at given times of the day. A standard home thermostat is not particularly flexible, though. Some thermostats allow different programming for weekdays and weekends, but any such complication is seen as clunky and requires the user to operate a somewhat tricky interface. Curiously, mechanical thermostats (those where you just push a little lever up and down) are amazingly simple to use, but have you ever had to programme one of those digital thermostats? I still find it difficult to do without a couple of failed attempts.

Nest goes just one step further: after what is, after all, a short learning period, it removes the need to programme the thermostat at all, because it learns its users’ preferences. By doing this, it has further streamlined an already simple process. That’s what makes it a winner: it makes your life better and easier.
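To get an intuition for how little is needed, here is a minimal sketch of the idea – purely illustrative, and certainly not Nest’s actual algorithm: log every manual adjustment the user makes, then average the chosen temperatures per weekday and time slot to build a default weekly schedule.

```python
from collections import defaultdict


def learn_schedule(adjustments, bucket_minutes=60):
    """Infer a weekly heating schedule from manual adjustments.

    adjustments: iterable of (weekday, minutes_since_midnight, temperature)
    tuples, one per time the user turned the dial.
    Returns a dict mapping (weekday, time_bucket) to the average
    temperature the user chose in that slot.
    """
    buckets = defaultdict(list)
    for weekday, minutes, temp in adjustments:
        buckets[(weekday, minutes // bucket_minutes)].append(temp)
    return {slot: sum(temps) / len(temps) for slot, temps in buckets.items()}


# A few days of manual tweaks: warm in the morning, cool at night.
events = [
    (0, 7 * 60, 21.0),       # Monday 07:00 -> 21 °C
    (0, 22 * 60, 16.0),      # Monday 22:00 -> 16 °C
    (1, 7 * 60 + 15, 20.0),  # Tuesday 07:15 -> 20 °C
]
schedule = learn_schedule(events)
# Monday's 07:00-08:00 slot now defaults to 21 °C
```

A real device would of course weight recent adjustments more heavily and smooth between slots, but the core loop – observe, aggregate, replay – really is this simple.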

Another simple observation is that Nest is not a dangerous device. It replaces a well-understood process, and it’s hard to operate it in a way that causes real damage. Compare it with the other possible “smart” devices and it’s pretty obvious that the balance between danger and functionality is another win for Nest. The fact that it learns also means it will correct any statistically abnormal configuration pretty quickly. (Now, please, don’t use it to kill your great-grandmother by overheating her room.)

Needless to say, using a smart thermostat can also have a big impact on your heating bills. I think this will be a positive impact for most users, resulting in savings. Other smart devices cannot make a similar claim, and this is another reason why Nest makes sense as the first smart device to go mass-market.

A smart thermostat is not a curious choice at all, because it is all about simplification, improving life, and allowing savings. Not many other smart devices could do the same, and I reckon that Nest is the trojan horse that will finally make the general public appreciate the need for smart devices.

my projects

2013 wrap-up

I like to pretend I have this tradition of doing a year-end blog post on 31st December listing some of the best things that happened in my life as a techie. This year, I’d just like to list a couple of highlights. Here you go:

  • last February I re-launched, for the third year in a row, my Live Rugby app. Unlike the previous versions, it shipped without Opta data. As the Opta data didn’t really help with user acquisition (I’m not blaming them – rather the way I packaged the data), I decided to invest my own time and delivered a free app with my live commentary on the matches. Results: over 10,000 downloads, once again some great coverage in the press, and at one point the app was more popular than the official one, reaching #13 on the App Store and bringing in some revenue via ads;
  • meanwhile, I kept working on my “day job’s” Linked Open Data portal, which I launched in the first half of 2013. A project that started last year amid some lack of interest – if not open opposition – was finally welcomed as a way to deliver FOI requests and increase transparency;
  • I took part in a research project investigating death rates in the Hospital Episodes Statistics dataset, helping with the geographical analysis, and co-authored a research paper which was published in PLOS ONE; I also co-authored a paper about geographical reporting tools for international cooperation, presented at FOSS4G;
  • partially as a result of and in recognition of my experience with data-related projects, I was appointed to Cabinet Office’s Open Data User Group in September;
  • after some months of market research and trials, I helped an Italian indie comics publisher develop and launch their digital books distribution platform, Digitail. I became CTO of Digitail last December;
  • I was a judge at Young Rewired State and at the Big Bang Fair this year. Both experiences really got me enthusiastic about the future of tech and science: there are so many talented young people around!

Many of these projects will be continued in 2014. What I’d love to add to the table:

  • cheese-making: despite some success making mozzarella, I’d still like to get more experience and share it; stay tuned!
  • photography: I’m starting to find some assignments to take professional portraits. Let’s see how it goes.
  • more apps: starting from Digitail, of course.

Hence, I’m really looking forward to the new year and its challenges 🙂


Mr Speaker, hire a hacker

Inspired by the flow of tweets coming from the #eParliament event by Hansard Society, and just on the way back from being a judge at the Land Registry Open Data Challenge, I tweeted about an old idea of mine: that of a hacker-in-residence at the Houses of Parliament.

I had some interesting responses to this, and someone even asked Mr Speaker what he would think about it.

However, some of the most common questions and comments I received asked

  • if a hacker in the houses is going to break anything
  • if a hacker in the houses is going to replace or, worse, be in competition with the current IT staff
  • if a hacker in the houses will have to deliver specific results under direction from the parliamentary ICT management

and so on.

These questions suggest a general failure to understand the concept of a hacker-in-residence (or, more likely, my failure to explain it), a concept that has been experimented with by many other institutions in the past few years. Medway Council are hiring a geek-in-residence, for example; the Australian Government offers funding to arts organisations that want to hire one.

To me, these questions are simply missing the point: a hacker-in-residence is not your standard IT person. The goals of hiring a hacker-in-residence are, in fact, the opposite, and only a blind management would want to get just-another-IT-guy.

A hacker-in-residence is akin to an artist.

The idea is to get a person that responds to ideals of independence, creativity, tinkering, and liberal experimentation; this person should be hired for a limited period of time, and be left to play with things. This is what a hacker is meant to do.

A hacker-in-residence, in other words, should not be a corporate animal; “reporting to management” shouldn’t be part of his or her vocabulary; competition with current IT staff should not take place, as their goals and targets are different: a hacker-in-residence, unlike IT staff, is not mandated to return any deliverable at the end of their tenure.

The hacker-in-residence is tasked, instead, with approaching ordinary problems freely and creatively, and with suggesting alternative ways to attack them.

When I say this role is similar to that of an artist, I’m actually talking from experience. During my time as a research fellow at UCL, the institution hired an artist-in-residence.

The artist-in-residence would simply walk around the building, talk to researchers, academics and students; watch over their daily problems, learn about their research projects, see their outcomes; and, eventually, produce art inspired by these experiences.

A hacker-in-residence should just do the same.



My 40 days without a mobile phone

I’m often clumsy with my hands. If I call someone and that person does not respond, I keep calling, and I often get increasingly nervous. And increasingly clumsy.

In about 13 years, I dropped many phones, as you can imagine. This was not a massive problem when mobiles were as big as a speakerphone. Early Nokias were impressively sturdy. I dropped phones from 2 floors and they survived. Some of them ended up on the treadmill while I was exercising, flew away, and survived.

Unfortunately, my Nexus 4 didn’t survive its very first drop, not even two months after I bought it. It wasn’t a particularly strong hit; I’ve read it’s a particularly fragile phone.

It was the third phone I had destroyed in 12 months, so I decided it was time to try living without one for a while. Today marks 40 days.

What’s life like without a smartphone, you ask? Let me give you some examples.

What time is it? The first shock is that you no longer have a way to tell the time. I mean, many people still carry wrist watches, but most actually don’t and rely on their mobile to tell the time. The first time I realised this, it was roughly… erm, I have no idea what time it was! But definitely early on. I had to go into a shop to check the wall-mounted clocks.

Your train is cancelled Checking the status of public transport was something I had got used to. I’m a big user of National Rail trains, much more than the Tube. I would simply check my favourite/closest stations and trains and plan a walk or a suitable journey to get there in time. Cancellations would show up on the screen and I would adapt accordingly. Without a phone, you have no choice. You get to the station in time (provided you have resolved your lack of a watch) and hope for the train to be confirmed and on time.

Let’s meet around I lived my youth in the 90s and early noughties. Back then mobile phones hadn’t penetrated the market so deeply, especially in my age group (my parents strongly opposed buying me a mobile until I was old enough to buy one myself, in 2000). You would arrange to meet your friends at a certain time, at a given meeting point, or even have group meet-ups in the same spot every time. In my home town the tradition of a comitiva (a 20+ strong group of friends) meeting for years in the same spot is still going on. Living without a phone is a bit like going back to my youth. Most people are sympathetic and accept the idea of having a meeting place and time, rather than a generic “See you around”. You become less inclined to be late if you know the other person might be waiting. You also accept that, after waiting for some time, you can assume the other person won’t be showing up and do something else.

The world around you This gives you so much more time to think, walk, look around, see shops, look at people; to read while travelling on the Tube without that urge to take your phone out and check your e-mail at every stop, when a connection becomes briefly available. It gives you time to discover and experience the world around you.

Hand-drawn maps My favourite bit, being a geo geek and cartography fan, is that if I need to go somewhere I haven’t been before, I can no longer use Google Maps/Bing Maps/Nokia Maps/Apple Maps/OpenStreetMap. The alternatives are buying a London A-Z or another printed map, or… drawing one myself. And, as you can probably imagine, I tend to do the latter. It’s a great experience that helps you relate to the space around you, appreciate distances, and think in terms of routes and points of interest along them.

Wrist pain As a long-time sufferer of tendonitis (which luckily never developed into RSI), one of the biggest effects of not using a mobile is that I no longer have that constant sharp wrist pain that used to follow me everywhere. And now I know the reason why!

Bye bye mayorships On a lighter note, I’ve also lost all of my Foursquare mayorships, starting with my gym and local cafe. Seriously appalling. I’m no longer spamming my friends using Path about my caffeinic whereabouts. But… does that really affect my life? Not really. I can actually tell my friends about my favourite places, and discuss their own.

You might be wondering if I’ve become a luddite. I’m definitely not, and if you know me you probably appreciate how intertwined my life is with technology and gadgets. I’m not saying I will go without a phone forever. But I used to think I could never, ever live without a phone; without that constant stream of information; without being constantly online. I thought finding myself in that situation would be unthinkable and very hard. Well, I’m here to say it wasn’t that difficult.