gov, open data, policy

20 UkGovCamp thoughts

1. One of the best camps I’ve attended recently
2. I want @davebriggs’ shirt
3. “Only thing that makes you special is tax payer funding” is the most stupid thing I’ve read
4. “We are special because we’re here on a Saturday, in our own time, trying to solve the taxpayer’s problems” is the best reply I managed to give
5. So many local government officers around is a very good sign
6. So few councillors/politicians around is a not-so-good sign
7. Three macro-areas for discussion and action: Transparency, Participatory democracy, Data geekism
8. I’ve finally met Baskers!
9. In the LocalGov/PublicSector communities I know more people than I thought
10. Social Media strategy evaluation is difficult (not just in the public sector): how can you evaluate a conversation?
11. Defining the goals of that strategy is the most interesting part – and the outcome of that evaluation is not necessarily a number
12. Kudos to @LinkedGov and @danpaulsmith for an amazing service and session showing how Linked Data can become interesting and useful to everyone
13. UkGovCamp is political, but it involves people with very different ideological backgrounds
14. I’d like to see more people from the public sector taking part in this: many problems are similar, as is resource availability
15. Wow, I can present myself to an audience with a microphone. I used to go bright red doing that!
16. Never catch flu on the last day of such a great event
17. Next time: get the speaker/organiser’s name on the agenda. It helps to identify each session.
18. I might accept @the_anke’s offer of conversations in German, next time ;-)
19. Open Data is great but we need to define what it is, how to share it, and how to get people engaged.
20. Government (GDS) involvement is a great and exciting thing, but open data (and the movement) will succeed only with citizens/developers/activists maintaining ownership of action

gov, open data, policy, Web 2.0

Making Open Data Real, episode 3: the corporation

I have submitted my views to the Public Data Corporation consultation. Here are the answers.

Charging

Q1 How do you think Government should best balance its objectives around increasing access to data and providing more freely available data for re-use year on year within the constraints of affordability? Please provide evidence to support your answer where possible.

I strongly believe that the Government should do its best to keep as much data free as possible. In all honesty, I believe that all data should be kept free, as there are two possible situations:

– data are already available, or refer to processes that already produce data, in which case the cost of publishing can be kept relatively low;

– data are not available, in which case one should ask why this dataset is required.

In the second case, I would suggest that the agency releasing such a dataset could gain in efficiency by producing it, which would justify releasing the data to the public for free.

There is also a consideration of what a data-based business model should look like. I think companies and individuals using public data as a basis for their business are finding it very hard to generate ongoing profit based on data only. This brings me to the idea that charging for such data might actually make such companies lose interest in using them, with a loss of business and service to the community.

A good example of this point is real-time transport-related mobile apps: they provide, often for a very low price, an invaluable service to the public. These are data that are already available to some agencies, as they are generated by the process of driving the transport business to higher efficiency and effectiveness by knowing the location of the transport agents (buses, trains, etc.). Although in some cases this incurs server costs to support high demand, in absolute and relative terms we are talking about limited resources. Such limited resources create a great service to the public, effectiveness for the transport company, and possibly some profit for the entity releasing the software. The wider benefit of releasing these data for free is much more important than the recovery of costs through a charge. That’s why I question in the first place the need for a Public Data Corporation, if its goal is just that of charging for access to data.

Q2 Are there particular datasets or information that you believe would create particular economic or social benefits if they were available free for use and re-use? Who would these benefit and how? Please provide evidence to support your answer where possible.

Surely, transport and location-based datasets are the most important: they allow careful planning by the public and, as a result, a more efficient society. But I would not talk about specific datasets. I would rather suggest that the Government maintain an ongoing relationship with the data community: hear what developers, activists, volunteers and charities ask for, and see if such requests can be satisfied by issuing an appropriate dataset.

Q3 What do you think the impacts of the three options would be for you and/or other groups outlined above? Please provide evidence to support your answer where possible.

As I outlined in Question 1, I think data should be kept free. Hence, the best option is Option 1, provided that there is a genuine commitment to release more data for free. As I said, the real question is whether data are available or not. When data are available, publishing them and managing their updates is a marginal cost on top of the initial process. When data are not available, the focus should be moved to understanding whether their publication can improve ongoing processes.

The freemium model works on the assumption that there is a big gap between a basic version of the data and a more advanced service. I do not believe that this assumption holds for most of the datasets in the public domain.

Q4 A further variation of any of the options could be to encourage PDC and its constituent parts to make better use of the flexibility to develop commercial data products and services outside of their public task. What do you think the impacts of this might be?

I think that organisations involved in the PDC should keep to their public task. 

The risk in letting them develop commercial data products outside the public task is that the quality of the free portion of the data would plummet.

Q5 Are there any alternative options that might balance Government’s objectives which are not covered here? Please provide details and evidence to support your response where possible. 

I cannot see any other viable alternative, unless we consider the very unpopular idea of asking the developers for part of their profit, if any, in a way that mirrors the mobile apps market. However, I think that the overhead of setting up such a system would not be worth it.

Licensing

Q1 To what extent do you agree that there should be greater consistency, clarity and simplicity in the licensing regime adopted by a PDC?

I think that, realistically, developers and other people interested in getting access to public data want to have clear and simple terms and conditions. I am not a legal expert and cannot possibly comment on the content of such a licensing regime, but I would like it to be clear, short, and understandable to people who are not lawyers. The Open Government Licence, and any Creative Commons derivative, are good examples.

Q2 To what extent do you think each of the options set out would address those issues (or any others)? Please provide evidence to support your comments where possible.

Once again, I would like to stress that the Open Government Licence is the ideal licence for any open data. This would suit Option 3: creating a single PDC licence agreement, with a simple, clear, short licence to cover all situations. Option 2, an overarching PDC licence agreement that groups all commonalities of a number of licences, is possibly a second best, but it comes with a great risk of confusion and lack of simplicity.

Option 1, a use-based portfolio of standard licences, would possibly make sense in terms of clarity, but it greatly complicates the management of legal issues for the licensees. The consultation highlights that “rights and associated charges [would be] tailored to specific markets”, making it very difficult to understand such licences.

Naturally, if these licences need to be more restrictive than the Open Government Licence, I still think that a single restrictive licence, on the model of what the State of Queensland in Australia has done, would be the best idea for maintaining clarity and simplicity.

Q3 What do you think the advantages and disadvantages of each of the options would be? Please provide evidence to support your comments

It’s very hard to tell at this stage, but I think that overcomplicated licences would greatly slow down access to the data and, consequently, delay the development of services to the community and the possibility of creating sustainable businesses. That’s why my choice goes to a single PDC licence agreement, possibly the Open Government Licence itself, in order to get services developed and available quickly.

Q4 Will the benefits of changing the models from those in use across Government outweigh the impacts of taking out new or replacement licences?

I reckon there will be situations in which changing the models will have a positive impact as well as some cases in which there will be a local negative impact. We need to look at the overall benefit to society.

Oversight

Q1 To what extent is the current regulatory environment appropriate to deliver the vision for a PDC?

I would say the current regulatory environment is appropriate and ready to deliver the vision for a PDC, having already produced a very effective OGL. The problem is not in delivering the PDC; it is rather in questioning the need for the corporation tout court.

Q2 Are there any additional oversight activities needed to deliver the vision for a PDC and if so what are they?

The only oversight activity needed at this stage is a deep analysis questioning the need for a PDC. I would strongly recommend questioning the need for charging and for using licences other than the OGL. A PDC charging for data risks destroying the thriving open data ecosystem and depriving the community of great services. The development of a rich ecosystem will generate, at some point, an income for the Government through taxation. It’s just not the moment to think about directly charging for data.

Q3 What would be an appropriate timescale for reviewing a PDC or its constituent parts public task(s)?

I would recommend an ongoing review, held no more often than every 7-8 months and no less often than every 18 months.

gov, open data

Making Open Data Real, episode 1: the gathering

It’s not every day you get the opportunity to attend an event at the Cabinet Office. Moreover, it’s not every day that you’re invited to an event you actually care about. Hence, here I am at 22 Whitehall for a discussion about the Open Data Consultation with their Transparency Team.

The people attending this kind of event usually belong to the following tribes:

  • developers want data to be released as quickly as possible and have in mind possible applications/visualisations/uses of the data; they tend not to care much about the legal implications
  • openness campaigners push for data to be released no matter whether they can be useful or not; their only concern is transparency (“you’ve got nothing to hide, right?”)
  • privacy campaigners are not necessarily against data release, but are over-worried about big-brotheresque implications (where Big Brother is in this case your car insurer, rather than the Government)
  • policymakers, which is a cool description of the average Civil Servant involved in this: they support data being released in moderation, and are usually worried. And they don’t know what about.

Such a diverse gathering easily runs the risk of over-generalising the discussion, which is technically what happened. However, I guess this was exactly the goal of the Cabinet Office Transparency Team: to see how these different people tend to perceive the Open Data issue, and what common ground can be found. Necessarily, such common ground is generic and tends to involve a discussion about fears, hopes, effects of data releases, and what the groups want from each other.

The workshop was very interactive and helped each person interact with others and come into contact with, sometimes, a completely different point of view. Evidently there are many hopes about Open Data: that it can be better, quicker, machine readable, and most importantly linked. Many people attending the workshop also stressed that they would like the process of data release to be more transparent. Some fears were also made explicit, especially about the possibility of low-quality, meaningless data being released.

I think, however, that the two most important points made in this respect were

  • sustainability of the data infrastructure: we don’t want Open Data to be released and go offline the day after because the server is overwhelmed by excessive demand; sustainability also in the sense that we want the agency releasing the data to define a process for updating the data.
  • engagement: the agency releasing Open Data needs to set up a way to interact with developers and campaigners to respond to their queries about the data released, and possibly some kind of “customer service” structure.

I strongly believe these two points to be the key to making the Open Data movement successful, and I was frankly surprised to hear someone dismissing them with “we just want the data”. Although I agree that some data is better than no data, we shouldn’t be driving the system to the frustrating situation in which we can’t affect the Open Data release process because that process hasn’t been defined properly. Moreover, although sometimes low-quality data is acceptable if there’s no alternative, I wouldn’t push the agencies to release data whose quality hasn’t been assessed: we don’t want to drive the whole quality down.

I fully understand that in the view of the Government and of some campaigners Open Data release can be a way to deal with Freedom Of Information requests in a more automatic way, and this surely means that data must be released as and when available. However, we have a historical chance to define the way data should be made public and what kind of added value we expect from them. This is an opportunity not to be missed.

Some interesting points were made when discussing what to expect from the Government and from the other actors. For example, the idea of re-sharing seems to be finally part of the common culture of data: most users are ready to be both producers and consumers of open data, and push for everyone to make their data available. These data can be, in turn, derived from the original agency’s data: a process that can enrich and empower the final users.

I do not particularly agree with those saying that Government should set up the data release and step out of the game: I think that there is a need for a central assessment of the quality of data in order to prevent “crap data” from becoming mainstream, and I can’t see many alternatives to a central agency, such as Ofcom for communications or Ofsted for education. What the Government needs to do is to make such procedures simple, to help other actors release Open Data through simple legislation, and to extend access to procurement for SMEs, who currently struggle to satisfy the financial requirements even though they might offer better services than bigger companies. I believe that the Government should maintain its regulatory powers in this context in order to make data more relevant, accessible, democratic and genuinely open.

There is some concern about privacy, of course. One of the main points is that once you start releasing data you don’t know how these will be used and by whom. Worryingly, data don’t need to refer directly to a person to identify them. Identification is not a binary function. The classic example is how a car insurance company (yes, I pick on them easily!) can alter its prices after analysing crime rate data. This is something they couldn’t do before. In a way, where I live now identifies me more strongly than before, and the car insurer can change their behaviour towards me because of Open Data although they don’t have perfectly identifiable information about me.

Should this prevent crime data from being released? I don’t think so. I would rather call for more regulation and for punishing this kind of behaviour, but I also think this concern shouldn’t be part of the Open Data movement: we only need to care about transparency and, in my case, efficiency of the systems that will be used to release the data. Concerns about privacy need to be addressed, but abuse of data is a widespread problem that does not affect only the Open Data context, so it should be tackled by another, more general, task-force.

I will be commenting on the points of the Open Data Consultation in a later post. For the time being, I would recommend reading what Chris Taggart has written about his response to the ODC.

geo, gov, Web 2.0

Free data: utility, risks, opportunities

Some random thoughts after the “The possibilities of real-time data” event at City Hall.

Free your location: you’re already being photographed
I was not surprised to hear the typical objection (or rant, if you don’t mind) of institutions’ representatives when requested to release data: “We must comply with the Data Protection Act!”. Although this is technically true, I’d like to remind these bureaucrats that in the UK it is legal for a photographer to take your picture in a public place. In other words, if I’m in Piccadilly Circus and someone wants to take a portrait of me, and possibly use it for profit, he is legally allowed to do so without my authorization.
Hence, if we’re talking about releasing Oyster data, I can’t really see bigger problems than those related to photographs: where Oyster data make public where you are and, possibly, when, a photograph might give insight into where you are and what you are doing. I think that where+what is intrinsically more dangerous (and misleading, in most cases) than where+when, so what’s the fuss about?

Free our data: you will benefit from it!
Bryan Sivak, Chief Technology Officer of Washington DC (yes, they have a CTO!), has clearly shown it with an impressive talk: freeing public data improves service levels and saves public money. This is a powerful concept: if an institution releases data, developers and businesses will start creating enterprises and applications on top of them. But more importantly, the institution itself will benefit from better accessibility, data standards, and fresh policies. That’s why the OCTO has released data and facilitated competition by offering money prizes to developers: the government gets expertise and new ways of looking at data in return for technological free speech. It’s something the UK (local) government should seriously consider.

Free your comments: the case for partnerships between companies and users
Jonathan Raper, Twitter’s @MadProf, is sure that partnerships between companies and users will become more and more popular. Companies, in his view, will let the cloud generate and manage a flow of information about their services and possibly integrate it into their reputation management strategy.
I wouldn’t be too optimistic, though. Although it’s true that many far-sighted companies have started engaging with the cloud and welcome autonomous, independently run Twitter service updates, most of them will try to dismiss any reference to bad service. There are also issues with data covered by licences (see the case of FootyTweets).
I don’t know why I keep thinking about trains as an example, but do you really think that, say, Thameslink would welcome the cloud tweeting about constant delays on their Luton services? Not to mention the fact that NationalRail forced a developer to stop offering a free iPhone application with train schedules – to start selling their own, non-free one (yes, charging £4.99 for data you can get from their own mobile web-site for free, with the same ease of use, is indeed a stupid commercial strategy).

Ain’t it beautiful, that thing?
We’ve seen many fascinating visualizations of free data, both real-time and not. Some of these require a lot of work to develop. But are they useful? What I wonder is not just whether they carry any commercial utility, but whether they can actually be useful to people, by improving their life experience. I have no doubt, for example, that itoworld’s visualizations of transport data, and especially those about Congestion Charging, are a great tool to help people understand policies and authorities plan better. But I’m not sure that MIT Senseable City Lab’s graphs of phone calls during the World Cup Final, despite being beautiful to see, fun to think about, and technically accurate, bring any improvement to user experience. (Well, this may be the general difference between commercial and academic initiatives – but I believe this applies more generally in the area of data visualization.)

Unorthodox uses of locative technologies
MIT Senseable City Lab’s Carlo Ratti used GSM cell association data to approximate people density in streets. This is an interesting use of technology. Nonetheless, unorthodox uses of technologies, especially locative technologies, must be approached carefully. Think about using the same technique to calculate road traffic density: you would have to consider single and multiple occupancy vehicles, and this can have different meanings on city roads and motorways. Using technology in unusual ways is fascinating and potentially useful, but the matching of the appropriate technique to the right problem must be carefully gauged.

Risks of not-so-deep research
This is a general risk in research, but I would say it’s becoming more evident in location-based services research and commercial activity: targeting marginally interesting areas of knowledge and enterprise. Ratti’s words: “One PhD student is currently looking at the correlations between Britons and parties in Barcelona… no results yet”. Of course, this was said as a half-joke. But in many contexts, it’s still a half-truth.
