geo, open data, open source

Open Addresses: a great opportunity

The privatisation of Royal Mail came with a massive Open Data defeat, as the Postcode Address File (PAF) was left among the assets of the sold company. The Open Data User Group – and many other players in this space – voiced their dissent, but the decision had been taken: PAF had to go with the rest of Royal Mail.

It’s time to move on.

The newly funded Open Addresses project is a great opportunity in this context and a Symposium run by the ODI on August 8th has just reinforced my impression that some of the smartest people in the “Open” community are working incessantly to create a credible alternative to PAF (and to Ordnance Survey’s Address-point). Let’s call this OAF.

The Open Addresses Project represents a great opportunity.

First of all, it’s an opportunity to show that crowdsourcing can be as good as a top-down approach, if not better.

We’ve seen this happening with OpenStreetMap. In many parts of the world, the coverage and accuracy of OSM are not just way beyond those of commercial solutions, but are kept constantly in check by an army of volunteers and users.

Why hasn’t OSM gone mainstream to show the power of crowdsourcing? Primarily because the average Joe doesn’t get that “map” means an archive of location-based data points. The average Joe reads “map” and thinks Google Maps. “You can’t use OSM” is a common objection, and it misses the point about OSM.

OAF would run this risk to a considerably lesser extent. Addresses are readily understandable by people, and there would be no confusion about what the database represents.

In an address file, location is an attribute of the address; whereas in a map, address is an attribute of the location.

There is of course another problem with OSM: it is not perceived as authoritative. I remember a discussion with a well-travelled contact of mine who was heading to India and looking for maps of a constantly changing area. “Use OpenStreetMap”, I said; “it will never be good enough!”, he replied. Except that when he checked, he found out that OSM had more data than any other map available to him.

Open Addresses could definitely risk a perceived lack of authority. Advocacy will play an important role, together with case studies, independent evaluations, and early adopters, in showing that the Open Address File can become the authority in this space.

Secondly, this is an opportunity for Royal Mail and Ordnance Survey to review their practices about addresses and to improve their own products by providing users with an independent way to assess their quality. It’s maybe an opportunity for them to finally and uncontroversially understand that Openness is a way to increase their business influence – and revenues – and not a way to jeopardise it.

There are unquestionably several challenges ahead, both technical and non-technical.

The technical challenges can be easily enumerated: what is an address? What is the minimum unit of space we want to represent? Where do we stop – street level, floors, flats, units, rooms? None of these will be easy to solve, but the beauty of this project is in its process that intends to bring together addressing experts with users of addresses.

However, considerably more challenging will be the cultural problems born of the monopoly of Royal Mail and Ordnance Survey in this space: the myths about what PAF could deliver, and the reliance on restrictively licensed products, which might hinder a smooth transition to an open product.

Let me mention some of these cultural challenges:

  • PAF was built as a way to allow postmen to deliver post; it’s a collection of delivery points, not a way to identify buildings, houses, or business premises, although some users have come to understand it this way; OAF will need to re-state its goals and make them easy to understand for its users
  • PAF can be authoritative in the minds of its users, but it’s not as accurate as they generally believe: duplications and errors are rather common; OAF has the opportunity to be more accurate and have more coverage and will need to be up to this for at least a number of clear use cases. Let me quote ODUG’s response to the PAF consultation:

Royal Mail usually states the completeness of PAF (the principal measure of quality) as being in excess of 98%. However, as Royal Mail determines what a delivery point is, no external body can identify missing delivery points to confirm that measure.

  • evidently, there is no benchmark to assess the quality of PAF; OAF will need to be built in a way that makes this assessment possible and desirable to its users
  • feedback loops will need to be clear, i.e. how to allow third parties to add addresses into OAF
  • the NLPG is often hailed as a solution, when in fact it is just another face of this problem, coming with restrictive licensing

All of these issues are difficult, but not unsolvable. I believe that early adopters will play an important part in advocating this new data product, showing that it’s not just as good as the existing commercial not-really-open solutions, but better in terms of reliability, accuracy, coverage.

Most use cases of an address file revolve around the function of address lookup. As such, they come with a great feature: they immediately detect if an address is missing or incorrect. Feedback will play a relevant part in the process envisioned to build Open Addresses.

Hence, let me close with an appeal: if you use addresses in your business, for your job, for marketing purposes, keep an eye on this project and start building your services in a way that allows you to use an Open Addresses file.

gov, open data

In response to comments on ODUG’s GP benefits case

I would like to respond to Owen Boswarva’s comments about ODUG’s GP benefits case.

Most of his comments are appropriate, but I think – and I say this very respectfully – that they are slightly missing the point by taking in isolation a series of remarks that, in ODUG’s view, make sense when viewed as a whole.

The short version is: I think that a single and authoritative GP dataset is needed because
1 – there is user demand for this dataset (several data requests on data.gov.uk)
2 – it would bring improved accuracy and quality to existing datasets
3 – it would streamline an already existing process, unifying the procedures of several entities
4 – it is a natural candidate for the National Information Infrastructure.

Some direct comments to Owen’s points follow.

Those datasets are all reusable under the Open Government Licence, i.e. they are open data.

Some datasets are OGL, but the HSCIC dataset, currently the most complete of those provided, specifically does not permit the use of data “for the purpose of promoting commercial products or services to the public”.
Following from the points above, the goal of the benefits case is to generate a single dataset under the same licence, and from our demand-led point of view we cannot but ask for more open licensing.

The same applies to Owen’s point number 3:

ODUG maintains that the GP practices data on the HSCIC site is not open data, and points to a page about “responsibilities in using the ODS data”. However HSCIC has recorded that dataset (EGPCUR) on Data.gov.uk as reusable under the OGL. (The ODS “responsibilities” page seems to be written for NHS users. A literal reading only permits use of the data in connection with NHS-related activities, which is obviously not the actual licensing position.)

We are asking for clarity. If explaining all of this requires several searches and a blog post, I believe we are in the right asking for an improvement to the current situation.

The ODUG criticises the NHS Choices dataset as follows:

“the branding of the NHS Choices dataset as a ‘Freedom Of Information’ dataset is troubling from an Open Data perspective, mainly for its “on demand” nature: a FOI data release, being a reactive response to a request, does not establish an ongoing process; while data release under an Open licence often comes proactively from the publishing entity, which in doing so creates a sustainable data update procedure”.

I think this is rather over the top. NHS Choices hasn’t “branded” the data as a FOI dataset. It has merely made it available, along with a number of other useful data files, in the FOI section of its site. It would be nice if the NHS Choices site also had a dedicated open data landing page. However it’s perfectly sensible to draw users’ attention to existing datasets that they may want to know about before submitting a FOI request. NHS Choices says the data files are updated daily, so they are clearly not being published as a “reactive response” to FOI requests.

I’m sorry if Owen takes offence at the wording of this; we are not critical of NHS Choices. Our engagement on this topic with NHS England, moreover, has been totally collaborative and so far positive.

Our point can only be understood by remembering that it’s in ODUG’s DNA to approach Open Data from the user’s point of view. FOI and Open Data cover different aspects of the transparency agenda – both aspects are immensely important but come with different expectations from the users. Releasing Open Data under a FOI portal is confusing and, in my personal opinion, semantically incorrect. “Branding” here is used in this sense; it is not meant to be controversial.

Owen’s point that it’s good to attract users to existing datasets before they submit a FOI request is absolutely spot on and I totally agree with it. ODUG is not against a better engagement between FOI and Open Data, actually the opposite. It’s just that in this specific case we question the user experience.

There’s nothing wrong with arguing that existing datasets could be made more useful by improving the quality, or updating them more frequently, or appending data from other sources.

That’s the point of this benefits case. Showing the way for an improved quality of the dataset.

But we can have those arguments about most of the nation’s information infrastructure. A dataset doesn’t need to be ideal to be authoritative in practice.

The HSCIC and NHS Choices datasets are produced by the relevant official body, they are in wide use, and there are currently no better equivalents. The datasets are therefore, on the face of it, authoritative.

We need to start somewhere :)

“Authoritative” goes in conjunction with “to whom”. Yes, they are authoritative to the relevant official body; they are not perceived as sufficiently authoritative by the end users we aim to represent. We wish to make sure that the final authoritative dataset is a sum of the accuracies of the several datasets currently existing.

ODUG proposes that DoH establishes “an ongoing process to build, update and maintain on data.gov.uk an authoritative dataset of medical practices and operating practitioners, drawing on the datasets made available by HSCIC and NHS Choices”.

I’m not sure how ODUG expects DoH to build an authoritative dataset by drawing on datasets it has dismissed as non-authoritative. ODUG’s call is to DoH, but in practice DoH would surely delegate any such new process to HSCIC. So what is ODUG proposing HSCIC should do differently?

The datasets have not been dismissed as non-authoritative. What we are arguing is that the three datasets are only partially overlapping, and this is not controversial. We are calling for an “official” process of aggregation of such datasets so that the result can be referred to as the authoritative source.

So the question is not what DoH or HSCIC could do differently; it’s about collaboration. The best outcome, the most accurate and reliable dataset would come from an integration of the various processes that collect such data, each of which aims at different objectives and follows different procedures. Collaboration and integration would create efficiency and build capacity in the bodies involved. It would be a win-win situation.

Maintaining the new dataset on Data.gov.uk is also unlikely to add credibility, given the current state of the DGU catalogue and other functionality. HSCIC already has its own platforms and they seem serviceable for the publication of data. What in the ODUG proposal requires the involvement of Data.gov.uk?

DGU certainly has some problems, but in our view we need to ask for more. We want DGU to be the index – not necessarily, in all cases, the repository – of all public open data. Hence, it makes sense for us to argue for having these datasets on DGU. Whether they end up being hosted on it, or just linked from it, is not for us to decide: we are confident that initiating the process of data integration is more important, and once the procedure has been identified, linking to or hosting the dataset will simply follow.

I’ve never been entirely on board with the idea of submitting “benefits cases” for release of open data, because it seems to conflict with the principle of “open by default”.

Why see these as contrasting goals? There is of course an ongoing struggle between the ideal situation (everything open by default) and the practicalities of actually establishing procedures for the release of datasets. We use benefits cases to prioritise such releases or, in this case, to make sure the data is aggregated in a way that meets users’ requests.

In this instance ODUG seems to be arguing for creation of a new data product, combining the existing HSCIC/NHS Choices datasets with data from other sources such as GMC’s Medical Register and patient acceptance criteria for each GP practice.

That last source in particular would probably involve quite a bit of ongoing administration and processing, as patient acceptance criteria are not held centrally or in a standard format.

The fact that data are not held centrally doesn’t mean that the data do not exist or that they should not be, in some form, available to the public. Electoral datasets are a clear example of this. Whether it’s new data or newly collected data doesn’t make a massive difference in our view of demand-led prioritisation of data releases.

Arguing for release of existing data is one thing. Arguing for the creation of new data products and new processes is something more.

I have no doubt there is room for improvement in the existing open data that HSCIC publishes on GP and dental practices. However public datasets are mainly produced to support a public task. I will be surprised if DoH takes up these ODUG recommendations without a more detailed demonstration of why the existing data and processes are inadequate to meet the requirements of the agencies and public bodies it supports.

Not even the official terms of reference ask ODUG to stop at simply arguing for the release of existing data; moreover, we aim at sustainable and efficient data releases. Sustainable data releases require an established process; where this process is already established, we can help identify inefficiencies and overlaps and act as a bridge between the several organisations involved. We believe that a new dataset, collected in the way discussed, would greatly improve the public tasks of the agencies involved; preliminary discussions with representatives of such organisations suggested that the conversations between them would benefit from a single data source.

My last point: we are contributing to the definition of what the NII should be. A GP dataset is for many reasons an obvious candidate for the NII. Given we are at this stage, and given the possibly great impact the NII might have on public tasks, it is in my opinion the right time to argue for a review of the way the GP data is collected and made available, and achieving results in this area would be an encouraging blueprint to follow in other contexts.

open data

On the open ended nature of Openness

Some days ago I was joking with a friend about making t-shirts with an “Open Data is my mission” slogan. The problem with that mission is that its object is not particularly well defined.

I was involved in a couple of interesting discussions via Twitter about this, with a couple of people whose opinion I really value. On the day TfL announced their new api.tfl.gov.uk I tweeted my happiness about their Open Data licence. My happiness was not shared by Adrian Short:

[Image: tweet by Adrian Short]

Adrian suggested their API was anything but an open one; that as it had restrictions, especially the requirement to register for a key, it could not be linked to the adjective “Open”. (The whole conversation can be accessed here).

In a similar direction went a quick exchange with Aral Balkan:

[Image: tweet by Aral Balkan]

These two conversations highlight a curious problem in the “Open” communities, whether they are -Source, -Data, -Whatever: we’re talking about some very loosely defined concepts. As I say in my response to Aral: Open Data is just a phrase – what really matters is the licence attached to the data.

Openness is measured on a continuous scale. If there is a threshold below which we shouldn’t call some data “open”, that threshold has not been defined yet. It’s relative (to the data, to the context, to the country, to the user), it’s flexible, it’s got several possible meanings.

My personal position is to call open data whatever comes with no use restrictions (i.e.: you can use the data for whatever purpose you like). In legal terms, however, this gets complicated because we need to assign a licence to the data. When working with ODUG, for example, I always make a point of not accepting data releases with anything less than an Open Government Licence (or its Creative Commons / Open Database Licence equivalents).

Furthermore, in the not-so-public sector (which is what I generally call TfL), things are clearly complex, especially given the expectation (which I do not personally agree with, but this has no bearing on this discussion) that a transport agency in a metropolis should be profit-making. TfL’s licence is probably not the best worded ever, but it is an Open Licence:

[Image: excerpt from TfL’s licence]

Yes, as Adrian notes, TfL can revise the Licence at any point. But until they do, they allow you to copy, adapt and exploit the information, with attribution as the only requirement. This is not much different from OGL.

Does the requirement to sign up for an API key justify the critique? This is clearly a complication that comes from the real-time nature of this data. A system with such a huge amount of data generated in a short time needs provisioning, and the best provisioning comes from knowing how many users can access the system. In this case I don’t think that having to register for a key is affecting the openness of the data because there is no restriction on who can register. Of course an improvement would be to have the possibility of anonymous registrations and I would support this; however, the SLA might still give priority to users who are not anonymous, simply because it knows more about their requirements. Openness is a compromise, one that comes from opposing needs clashing.

The non-real-time datasets could be distributed without registration; this is where I agree with Adrian. But I don’t think this can justify the negativity towards this data release, which is a step in the right direction. Does anyone want to bring this up with TfL?

On a similar note, Aral initiated a somewhat long and heated thread about a related issue: the use of Open Data and the expectation that anything derived from something open should itself be open. In this case the focus was on my friends at @transportapi, who I think are doing a great job of showing how Open Data can create business.

[Image: tweets from the thread about @transportapi]

(Full conversation here).

Some interesting questions emerge from the thread:

  • Aral Balkan: “How’s this not closing off open data via a proprietary system only to license it commercially via an API?”
  • Emer Coleman: “We are DaaS provider. 1,000 hits a day for free then charge per hit with SLA’s once exceeded but also don’t have any IP on downstream products or services and more open licensing”.

The thread goes on and on with similarly opposing views. The question emerging is one: is there a (moral) obligation for Open Data users to be as open as the starting data?

I will take the “risk” of being seen as an Open Data moderate: my view is that this question doesn’t have a straight answer; it all depends on the product and on the level of maturity of the Open Data movement in that specific context. Once again, as in TfL’s case, we’re talking about a significant amount of real-time data. In this specific case, the data is heavily modified by Transport API to make it cleaner. It is a substantial chunk of work. It would be unsustainable to provide it for free, and without registration the service level would soon degrade. Hence, once again we need a compromise. Building sustainable businesses on top of Open Data is still something new. But sticking to the legal side: the licence does not place limitations on the use of the data. This can be open enough for some and not for others. “Open Data is a broad church”, says Jonathan Raper in the same thread. Sustainable Open Data-powered businesses create a virtuous circle that encourages more data releases, and I think we should welcome it.

One final note: we should probably stop capitalising the words “open data” and accept that multiple views will always be possible. Once again, open data is a compromise, as this debate shows. By keeping the debate going we can make that compromise produce useful results and advance the openness agenda.

gov, hackday, my projects, open data, Work in IT

chaMPion

I never thought hackdays could be so much fun that I would end up attending not just one but two in about ten days, getting flu in between. Oh, and that my team would end up winning the overall Best in show award over 27 other hacks and almost 100 people! Which is what this blog post is about…

First of all: credit where credit is due

The folks from Rewired State deserve a massive thank you for setting up such events, and for showing me that no matter the age and background of the people you work with there is room for great results, because geeks more often than not work well together, in teams, despite what stereotypes like to say.

ChaMPion: what is it?

The idea behind chaMPion is rather simple: you want to find MPs who care about what you care about. Often their “declared interests” are not particularly meaningful or up to date, so we decided we would mine the content of their speeches.

ChaMPion is a tool that allows the user to enter a given topic and returns a list of MPs who have spoken about that topic, ranked by relevance.

How does it work?

In easy steps:

  1. we downloaded the extract of the Commons debates for all the sessions of Parliament since the first sitting in May 2010 following the General Election to the latest in November 2012
  2. we parsed these extracts and aggregated the speeches by MP – as a result we obtained a map associating any given MP to all of his or her speeches
  3. for each MP we ran an algorithm that calculates their keyword distribution; specifically we used Topia.Termextract which, given a text, determines its important terms and their strength
  4. we calculated a ratio for each word over the total of terms extracted for that MP and used this as a basis for our rank
  5. we built an API that searches by keyword and a captivating UI that displays the results graphically, together with other data for the MP and his or her constituency harvested from other sources.
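Steps 2 to 4 can be sketched roughly as follows. This is only an illustration, not the code from the hack: the sample speeches, the stopword list and the function names are all made up, and a naive word counter stands in for Topia.Termextract, whose scoring is more sophisticated.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample data: (MP name, speech text) pairs standing in for
# the parsed Commons debates extract.
SPEECHES = [
    ("MP A", "Phone hacking is a scandal. The phone records must be published."),
    ("MP A", "The press and the phone companies must answer."),
    ("MP B", "The budget supports housing. Housing matters more than any phone."),
]

# Tiny illustrative stopword list (an assumption, not from the original hack).
STOPWORDS = {"the", "is", "a", "and", "must", "be", "more", "than", "any"}


def extract_terms(text):
    """Naive stand-in for a term extractor: lowercase words minus stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def build_index(speeches):
    """Aggregate speeches by MP, then compute each term's share of that MP's terms."""
    terms_by_mp = defaultdict(Counter)
    for mp, text in speeches:
        terms_by_mp[mp].update(extract_terms(text))
    index = {}
    for mp, counts in terms_by_mp.items():
        total = sum(counts.values())
        index[mp] = {term: n / total for term, n in counts.items()}
    return index


def search(index, keyword):
    """Return (MP, ratio) pairs for MPs who used the keyword, highest ratio first."""
    keyword = keyword.lower()
    hits = [(mp, ratios[keyword]) for mp, ratios in index.items() if keyword in ratios]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


index = build_index(SPEECHES)
ranking = search(index, "phone")
```

With this toy data, "MP A" comes out ahead of "MP B" for the keyword phone, because the word accounts for a larger share of MP A's extracted terms. Using a ratio rather than a raw count keeps MPs who simply speak a lot from dominating every search.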

Did you find anything interesting?

Yes! For example, if you search for phone the winner is Tom Watson; if you search for rape, it’s Caroline Flint.

Why didn’t you use X, Y, Z?

YES, you are right, this is not perfect, but it was meant to be just a quick hack that received much more interest than we were anticipating :)

For example, using Topia.Termextract was not my first choice. For a semantic analysis of this kind a beautiful mathematical tool called Latent Dirichlet Allocation (LDA) is generally the natural choice. LDA runs a statistical analysis over a corpus of text, assuming that a document is about a collection of topics. It then returns the distribution of such topics. It’s not difficult to understand. For example, it might say that a speech by Tom Watson is 30% about phones, 30% about news and 40% about crime.
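To give a feel for what LDA returns, here is a minimal sketch of a collapsed Gibbs sampler in plain Python. It is not part of chaMPion and not a replacement for a proper library: the function name, the hyperparameters `alpha` and `beta`, the number of iterations and the toy documents are all assumptions chosen for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA; returns per-document topic shares."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]  # words of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word w in topic k
    topic_total = [0] * num_topics  # all words in topic k
    assignments = []

    # Start from a random topic for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assignments[d][i]
                # Take the word out of its current topic...
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # ...and resample a topic given all the other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed share of each topic in each document; every row sums to 1.
    return [
        [(n + alpha) / (len(doc) + num_topics * alpha) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


# Toy "speeches", already split into words.
docs = [
    "phone hacking phone news press".split(),
    "crime police crime court".split(),
    "phone news crime press police".split(),
]
shares = lda_gibbs(docs, num_topics=2)
```

Each row of `shares` is the topic mix of one document, which is the kind of "30% phones, 30% news, 40% crime" breakdown described above. The topics themselves are unlabelled, and on a corpus this small the result is not meaningful; it only shows the shape of the output.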

Unfortunately, I didn’t manage to find a library for LDA that worked on my laptop.

Will you keep developing it?

Given we received some pretty heart-warming feedback the answer is yes. For example, I’m going to try and find (or develop) an LDA library to finally have a proper topic model.

We also plan to introduce more statistics, possibly at a single MP level, and to try and work out a temporal component as well, in order to display how interests change over time. This might not make sense for all the MPs, as most of them will give a speech very rarely, but there is certainly a subset for which this analysis is meaningful.

Starting next week, the website will be updated with data from the coming sittings.

Code?

The code for this hack is all on my GitHub account. Feel free to download it, modify it, run your services on top of it. I’ll keep uploading changes and the most recent stable version will always be found running at http://www.champion.puntofisso.net. Feedback is also very welcome, but beware that the code is very dirty until I manage to tidy it up a little. Requests for functionality are encouraged and will be considered :)

Another round of thanks

To wrap up, I gave Mark the input of “look there’s an interesting hackday” but I will never thank him enough for actually taking me seriously, setting up one of the best teams ever, and facilitating our conversations and work. Lewis has been a great partner in crime, giving his best on a simple but effective UI which has certainly been überimportant in conveying our idea and letting us win.

Sharon has provided invaluable knowledge of the works of the Parliament and some incredibly good mock-ups of the final interface, while Hadley has helped with a great understanding of the datasets.

Together with our chats with Glyn, Sheila and Brett, we had some good fun discussing ideas and saving ourselves the burden of having to go through a set of certainly wrong hacks during the day.

A big recommendation: Cards Against Humanity is the best team-building tool ever conceived.

gov, open data, policy

20 UkGovCamp thoughts

1. One of the best camps I’ve attended recently
2. I want @davebriggs’ shirt
3. “Only thing that makes you special is tax payer funding” is the most stupid thing I’ve read
4. “We are special because we’re here on a Saturday, in our own time, trying to solve the taxpayer’s problems” is the best reply I managed to give
5. So many local government officers around is a very good sign
6. So few councillors/politicians around is a not so very good sign
7. Three macro-areas for discussion and action: Transparency, Participatory democracy, Data geekism
8. I’ve finally met Baskers!
9. In the LocalGov/PublicSector communities I know more people than I thought
10. Social Media strategy evaluation is difficult (not just in the public sector): how can you evaluate a conversation?
11. Defining the goals of that strategy is the most interesting part – and the outcome of that evaluation is not necessarily a number
12. Kudos to @LinkedGov and @danpaulsmith for an amazing service and session showing how Linked Data can become interesting and useful to everyone
13. UkGovCamp is political, but it involves people with very different ideological backgrounds
14. I’d like to see more people from the public sector taking part in this: many problems are similar, as is resource availability
15. Wow, I can present myself to an audience with a microphone. I used to go piping red doing that!
16. Never catch flu the last day of such a great event
17. Next time: get the speaker/organiser’s name on the agenda. It helps to identify the session.
18. I might accept @the_anke’s offer of conversations in German, next time ;-)
19. Open Data is great but we need to define what it is, how to share it, and how to get people engaged.
20. Government (GDS) involvement is a great and exciting thing, but open data (and the movement) will succeed only with citizens/developers/activists maintaining ownership of action

gov, open data, policy, Web 2.0

Making Open Data Real, episode 3: the corporation

I have submitted my views to the Public Data Corporation consultation. Here are the answers.

Charging

Q1 How do you think Government should best balance its objectives around increasing access to data and providing more freely available data for re-use year on year within the constraints of affordability? Please provide evidence to support your answer where possible.

I strongly believe that the Government should do its best to keep as much data free as possible. In all honesty, I believe that all data should be kept free as there are two possible situations:

– data are already available, or refer to processes that already produce data, in which case the cost of publishing can be kept relatively low;

– data are not available, in which case one should ask why this dataset is required.

In the second case, I would suggest that the agency releasing such dataset could gain in efficiency, justifying the release of the data for free to the public.

There is also a consideration of what a data-based business model should look like. I think companies and individuals using public data as a basis for their business are finding it very hard to generate ongoing profit based on data only. Which brings me to the idea that charging for such data might actually make such companies lose their interest in using them, with a loss of business and service to the community. 

A good example of this point is represented by real-time transport-related mobile apps: they provide, often for a price that is very low, an invaluable service to the public. These are data that are already available to some agencies, as they are generated by a process of driving the transport business to higher efficiency and effectiveness by knowing the location of the transport agents (buses, trains, etc…). Although in some cases this requires costs for servers to support a high demand, in absolute and relative terms we are talking about limited resources. Such limited resources create a great service to the public, effectiveness for the transport company, and possibly some profit for the entity releasing the software. The wider benefit of the release of these data for free is much more important than the recovery of costs through a charge. That’s why I question in the first place the need for a Public Data Corporation, if its goal is just that of charging for access to data.

 Q2 Are there particular datasets or information that you believe would create particular economic or social benefits if they were available free for use and re-use? Who would these benefit and how? Please provide evidence to support your answer where possible.

Surely, transport and location-based datasets are the most important: they allow careful planning by the public and, as a result, a more efficient society. But I would not talk about specific datasets. I would rather suggest that the Government maintain an ongoing relationship with the data community: hear what developers, activists, volunteers and charities ask for, and see if such requests can be satisfied by issuing an appropriate dataset.

Q3 What do you think the impacts of the three options would be for you and/or other groups outlined above? Please provide evidence to support your answer where possible.

As I outlined in Question 1, I think data should be kept free. Hence, the best option is Option 1, provided that there is a genuine commitment to release more data for free. As I said, the real question is whether data are available or not. When data are available, publishing them and managing their updates is a marginal cost on top of the initial process. When data are not available, the focus should move to understanding whether their publication can improve ongoing processes.

The freemium model works on the assumption that there is a big gap between a basic version of the data and a more advanced service. I do not believe that this assumption holds for most of the datasets in the public domain.

Q4 A further variation of any of the options could be to encourage PDC and its constituent parts to make better use of the flexibility to develop commercial data products and services outside of their public task. What do you think the impacts of this might be?

I think that organisations involved in the PDC should keep to their public task. 

The risk in letting them develop commercial data products outside the public task is that the quality of the free portion of the data would plummet.

Q5 Are there any alternative options that might balance Government’s objectives which are not covered here? Please provide details and evidence to support your response where possible. 

I cannot see any other viable alternative, unless we consider the very unpopular idea of asking developers for part of their profit, if any, in a way that mirrors the mobile app market. However, I think the overhead of setting up such a system would not be worth it.

 

Licensing

Q1  To what extent do you agree that there should be greater consistency, clarity and simplicity in the licensing regime adopted by a PDC? 

I think that, realistically, developers and other people interested in getting access to public data want clear and simple terms and conditions. I am not a legal expert and cannot possibly comment on the content of such a licensing regime, but I would like it to be clear, short, and understandable to people who are not lawyers. The Open Government Licence, and any Creative Commons derivative, are good examples.

Q2  To what extent do you think each of the options set out would address those issues (or any others)? Please provide evidence to support your comments where possible.

Once again, I would like to stress that the Open Government Licence is the ideal licence for any open data. This would suit Option 3: creating a single PDC licence agreement, with a simple, clear, short licence to cover all situations. Option 2, an overarching PDC licence agreement that groups the commonalities of a number of licences, is possibly a second best, but it comes with a great risk of complexity and confusion.

Option 1, a use-based portfolio of standard licences, would possibly make sense in terms of clarity, but it greatly complicates the management of legal issues for the licensees. The consultation highlights that “rights and associated charges [would be] tailored to specific markets”, making it very difficult to understand such licences.

Naturally, if these licences need to be more restrictive than the Open Government Licence, I still think that a single restrictive licence, on the model of what the State of Queensland in Australia has done, would be the best idea for maintaining clarity and simplicity.

Q3 What do you think the advantages and disadvantages of each of the options would be? Please provide evidence to support your comments

It’s very hard to tell at this stage, but I think that overcomplicated licences would greatly slow down access to the data and, consequently, delay the development of services to the community and the possibility of creating sustainable businesses. That’s why my choice is a single PDC licence agreement, possibly the Open Government Licence itself, in order to get services quickly developed and available.

 Q4 Will the benefits of changing the models from those in use across Government outweigh the impacts of taking out new or replacement licences?

I reckon there will be situations in which changing the models will have a positive impact as well as some cases in which there will be a local negative impact. We need to look at the overall benefit to society.

 

Oversight

Q1  To what extent is the current regulatory environment appropriate to deliver the vision for a PDC?

I would say the current regulatory environment is appropriate and ready to deliver the vision for a PDC, having already produced a very effective OGL. The problem is not in delivering the PDC; it is rather in questioning the need for the corporation tout court.

 Q2 Are there any additional oversight activities needed to deliver the vision for a PDC and if so what are they?

The only oversight activity needed at this stage is a deep analysis questioning the need for a PDC. I would strongly recommend questioning the need for charging and for using licences other than the OGL. A PDC charging for data risks destroying the thriving open data ecosystem and depriving the community of great services. The development of a rich ecosystem will generate, at some point, an income for the Government through taxation. It’s just not the moment to think about directly charging for data.

 Q3 What would be an appropriate timescale for reviewing a PDC or its constituent parts public task(s)?

I would recommend an ongoing review, held no more often than every 7-8 months and no less often than every 18 months.

gov, open data, open source, policy, Web 2.0

Making Open Data Real, episode 2: the consultation

This is my response to the Open Data Consultation run by the Cabinet Office:

My name is Giuseppe Sollazzo and I work as a Senior Systems Analyst at St. George’s, University of London, dealing with projects both as a consumer and a producer of Open Data. In a previous job I dealt with clinical databases, so I would say I have developed a certain feeling for the issues around the topic of this consultation from both a technical and a policy perspective.

 

An enhanced right to data

I believe this is the crucial point of the consultation: the Government and the Open Data community need to work side by side in developing a culture that fosters openness in data. The consultation asks specifically what can be done to ensure that Open Data standards are embedded in new ICT contracts and I think three important points need to be made:

1) Independent consultants/advisors need to be brought on board new ICT projects when the tendering process starts; such consultants need to be recognised leaders of the Open Data community, and their presence should ensure the project has enough drive in its Open Data aspects.

2) Open Source solutions need to be favoured over proprietary software. There are Open Source alternatives to virtually any software package. Should one not be available, a project should be initiated to develop such a solution in-house under an Open Source licence. Albeit not always free, Open Source solutions will offer a standard solution for a lower price, and will create possibilities for resource-sharing and business creation.

3) ICT procurement needs to be made easier. The current focus of ICT procurement in the public sector is mostly on the financial stability of the contractor. I argue it should rather be on the reliability and effectiveness of the solution proposed. Concentrating on financial stability is a serious mistake, and it stems mainly from the fact that contractors develop proprietary solutions: a bankruptcy becomes a terrible risk because the solution is closed and no other company would be able to pick it up where the former contractor left off; hence the need for strict financial requirements in the tenders. I object to this. In my view, relaxing the financial requirements and moving the focus to the quality of the solution, its openness, its capability to create an ecosystem and be shared, and its compatibility with open standards will improve the overall effectiveness of any ICT solution. Moreover, should the main contractor go bankrupt, someone else will be able to take their place, provided the solution was developed in the way I envision: consequently, there is no need for strict financial requirements.

 

Setting Open Data Standards

As I have already stressed in the previous paragraph, the Government will need to change its rules of access to ICT procurement. Refocusing attention on openness, standards, and the ability to re-share the software is the way to start setting a new model in the Open Data area. Web standards can be used, and they can serve as an example to follow in creating new data standards. Recognised community leaders can help in this process.

 

Corporate and personal responsibility

It is absolutely important that common sense rules are established and made into law. The goal of this is not to slow operations down, but to ensure that the right to data mentioned earlier on is actually enforced.

The consultation asks explicitly how to ensure the commitment to Open Data by public sector bodies. I believe that, despite many people feeling that the Government should “stay away”, there is a strong need for smart, effective regulation in this area. Think about the Data Protection Act and the Freedom of Information Act. Current legislation requires many public bodies to deal with data-sensitive operations, and most do so by having a Data Protection Officer and a Freedom of Information Officer. I believe that an Open Data Officer should operate in conjunction with these two, and that this would not require many more resources than are already allocated. The Open Data Officer should drive the publication of data, and inspire the institution they work for to embrace the Open Data culture.

The Government should devolve its regulatory powers in this area to an independent authority to be established to deal with such regulatory issues. I envision the creation of Ofdata on the model of Ofcom for communication and Ofsted for education.

 

Meaningful Open Data

A lot of discussion has been going on about the issue of data quality. Surely, the whole community aims for data to be informative, high-quality, meaningful and complete. Unfortunately, especially at the beginning of the process, this is hard to achieve.

I think that lack of quality should never be a reason for publication to be withheld: where data is available, it should be published. However, I also believe that quality is important, and that is why the Government should publish datasets in conjunction with a statistical analysis and independent review (maybe run by the authority I introduced in the previous paragraph) that assesses the quality of the dataset. This should serve two goals: firstly, it would allow open data consumers to deal with errors and with the interpretation of the data; secondly, it would help the open data producer to investigate problems in the process leading to publication and to set goals in its open data strategy.

The final outcome of this publish-and-assess procedure would be a refined publication process that informs the consumers and the public about what to expect. Setting a frequency of update should be part of this process. Polishing the data should not: data should always be made available as it is, and if deemed low quality it should be improved at the next iteration.

There are questions about how to prioritise the publication of data. I believe that in this respect, and without neglecting the requirements of the FoIA, the only prioritisation strategy should be the number of requests: the more often the public requests a dataset, the higher the priority it should be given in being published, improved, updated.

 

Government sets the example

I think the Government is already doing a good job with this Open Data consultation, and I hope it will be able to take the lessons learnt and develop legislation accordingly.

Unfortunately, in many areas of the public sector there is still a “no-culture” responsible for data not being released, Freedom of Information requests going unanswered, and general hostility towards transparency. I have heard a FoI officer respond to Open Data requests with “this is stuff for nerds, we don’t need to satisfy this kind of need”. This is a terrible cultural problem that prevents a lot of good from being done.

I believe that the Government should set the example by reviewing and refining its internal procedures for the release of data and by responding to FoI requests in a simpler, more compassionate way, stressing collaboration with the requestor rather than antagonism.

Moreover, it should be the Government’s mission to organise workshops and meetings with Open Data stakeholders in the public sector, to try and create a deeper perception of the issues around Open Data and its benefits. Being on http://data.gov.uk should be standard for any public sector institution, and represent an assessment of their engagement with the public.

 

Innovation with Open Data

The Government can stimulate innovation in the use of Open Data in some very simple ways. Surely it can speed up awards and access to funding for individuals and enterprises willing to build applications, services, and businesses around Open Data. This should apply to both for-profit and not-for-profit ventures, and the only discriminating factor should be the social benefit received by their communities or by the wider public.

The most important action the Government can take to stimulate innovation is, however, simplification of bureaucracy. Making Company Law requirements easier to satisfy, as we have already discussed for ICT procurement, is vital to bring ideas to life quickly. Limiting legal liability for non-profit ventures is also a big step ahead. Funding and organising “hackathons”, barcamps, unconferences, and any other kind of sponsored moment where developers, policy makers, charities, volunteers can work together, is also a very interesting way of pushing innovation and making it happen.

 

Open data offers an amazing opportunity to create “improvement-by-knowledge”. Informed choice, real-time analysis and accurate facts can all be part of a new way of understanding democracy and innovation, and the UK can lead the way if its leaders are able to understand the community and provide it with the appropriate rules that make its tools work and the results happen. This way, we will have a situation where services can be discussed and improved, and public bodies can have a chance to adjust their strategy; where citizens can develop their ideas, change the way they vote, hold their leaders to account; and, as a result, communities can work together, and society can be improved.

gov, open data

Making Open Data Real, episode 1: the gathering

It’s not every day you get the opportunity to attend an event at the Cabinet Office. Moreover, it’s not every day they invite you to an event you actually care about. Hence, here I am at 22 Whitehall for a discussion about the Open Data Consultation with their Transparency Team.

The people attending this kind of event usually belong to the following tribes:

  • developers: they want data to be released as quickly as possible and have in mind possible applications/visualisations/uses of the data; they tend not to care much about the legal implications
  • openness campaigners: they push for data to be released no matter whether they can be useful or not; their only concern is transparency (“you’ve got nothing to hide, right?”)
  • privacy campaigners: they are not necessarily against data release, but are over-worried about big-brotheresque implications (where Big Brother is in this case your car insurer, rather than the Government)
  • policymakers: a cool description of the average Civil Servant involved in this; they support releasing data in moderation, and are usually worried. And they don’t know about what.

Such a diverse gathering easily runs the risk of over-generalising the discussion, which is technically what happened. However, I guess this was exactly the goal of the Cabinet Office Transparency Team: to see how these different people tend to perceive the Open Data issue, and what common ground can be found. Necessarily, such common ground is generic and tends to involve a discussion about fears, hopes, effects of data releases, and what they want from each other.

The workshop was very interactive and helped each person interact with others and come into contact with, sometimes, a completely different point of view. Evidently there are many hopes about Open Data: that it can be better, quicker, machine readable, and most importantly linked. Many people attending the workshop also stressed they would like the process of data release to be more transparent. Some fears were also made explicit, especially about the possibility of low-quality, meaningless data being released.

I think, however, that the two most important points made in this respect were

  • sustainability of the data infrastructure: we don’t want Open Data to be released and go offline the day after because the server is engulfed by excessive demand; sustainability also in the sense that we want the agency releasing the data to define a process for updating the data.
  • engagement: the agency releasing Open Data needs to set up a way to interact with developers and campaigners to respond to their queries about the data released, and possibly some kind of “customer service” structure.

I strongly believe these two points to be the key to making the Open Data movement successful, and I was frankly surprised to hear someone dismiss them with “we just want the data”. Although I agree that some data is better than no data, we shouldn’t be driving the system to the frustrating situation in which we can’t affect the Open Data release process because that process hasn’t been defined properly. Moreover, although low-quality data is sometimes acceptable if there’s no alternative, I wouldn’t push the agencies to release data whose quality hasn’t been assessed: we don’t want to drive the whole quality down.

I fully understand that in the view of the Government and of some campaigners Open Data release can be a way to deal with Freedom Of Information requests in a more automatic way, and this surely means that data must be released as and when available. However, we have a historical chance to define the way data should be made public and what kind of added value we expect from them. This is an opportunity not to be missed.

Some interesting points were made when discussing what to expect from the Government and from the other actors. For example, the idea of re-sharing seems to be finally part of the common culture of data: most users are ready to be both producers and consumers of open data, and push for everyone to make their data available. These data can be, in turn, derivative data from the original agency: a process that can enrich and empower the final users.

I do not particularly agree with those saying that Government should set the data release and step out of the game: I think that there is a need for a central assessment of the quality of data in order to prevent “crap data” from becoming mainstream, and I can’t see many alternatives to a central agency, as Ofcom is for communications or Ofsted for education. What the Government needs to do is to make such procedures simple, to help other actors to release Open Data with simple legislation, and to extend access to procurement for SMEs who currently struggle to satisfy the financial requirements even though they might offer better services than bigger companies. I believe that the Government should maintain its regulatory powers in this context in order to make data more relevant, accessible, democratic, genuinely open.

There is some concern about privacy, of course. One of the main points is that once you start releasing data you don’t know how these will be used and by whom. Worryingly, data don’t need to refer directly to a person to identify them. Identification is not a binary function. The classical example is how a car insurance company (yes, I pick on them easily!) can alter its prices after analysing crime rate data. This is something they couldn’t do before. In a way, where I live now identifies me more strongly than before, and the car insurer can amend their behaviour towards me because of Open Data although they don’t have perfectly identifiable information about me.

Should this prevent crime data from being released? I don’t think so. I would rather call for more regulation and for punishing this kind of behaviour, but I also think this concern shouldn’t be part of the Open Data movement: we only need to care about transparency and, in my case, efficiency of the systems that will be used to release the data. Concerns about privacy need to be addressed, but abuse of data is a widespread problem that does not affect only the Open Data context, so it should be tackled by another, more general, task-force.

I will be commenting on the points of the Open Data Consultation in a following post. For the time being, I would recommend reading what Chris Taggart has written about his response to the ODC.

geo, gov, mobile, my projects, open data, open source, policy

Outreach and Mobile: opening institutions to their wider community

[Disclaimer: this post represents my own view and not that of my employer. As if you didn’t know that already.]

Do the words “mobile portal” appeal to you?

I have been working extensively, with a small team, to launch St. George’s University of London’s mobile portal since last January, after we decided to go down the road of a web portal rather than that of a mobile app. The reason for this choice is pretty clear: despite the big, and growing, success of mobile apps, we didn’t want to be locked in to a given platform or to waste resources on developing for several platforms. For a small institution it’s very difficult to get resources to develop on one platform, let alone on multiple ones. We also wanted to reach more and more users, and a mobile portal based on open, accessible resources made perfect sense.

Like many of the London-based academic institutions, St. George’s needs to account for two different driving forces: the first is that as an internationally renowned institution it needs to approach students and researchers all over the world; the second is that being based in a popular borough it is part of the local community, for which it needs to become a reference point, especially in times of crisis. Being a medical school, based in a hospital and a quality NHS health care structure, strongly emphasizes the local appeal of this institution.

This idea of St. George’s as an important local institution was one of the main drives behind our mobile portal development. We surely wanted to provide a good alternative service to our staff and students, by letting them access IT services when on the move. However, the idea of reaching out to people living and working around us, to get St George’s better known and integrated within its own local community, led us to a thriving experience developing and deploying this portal. “Can we provide the people living in Tooting, Wandsworth, and even London, with communication tools to meet their needs, while developing them for people within our institution?” we asked ourselves. “Can we help people find out more about their local community, give them ideas for places to go, or show them how to access local services?”

This coalition government had among its flagship policies that of a “Big Society”, with the aim “to create a climate that empowers local people and communities”. Surely a controversial topic, but nonetheless helpful for rediscovering a local role for institutions like ours and getting them back in touch with their own local community, which in some cases they had completely forgotten.

In any London borough there are hospitals, universities, schools, societies, authorities. No matter their political affiliation, if each of these could do something, they would massively improve the lives of the people living within their boundaries. Can IT be part of this idea? I think so. I believe that communication in this century can and does improve quality of life. If I can now just load my mobile portal and check for train and tube times, that will help me get home earlier and spend more time with my family. If I can look up the local shops, it will make my choices more informed. It might help me discover more local opportunities, and ultimately get me in touch with people.

Developing this kind of service doesn’t come without effort. It required work and technical resources. We thought that if we could do this within the boundaries of something useful to our internal users, that effort would be justified, especially if we tried to contain the costs. With this in mind, we looked for free, open-source solutions that we might deploy. Among many frameworks, we came across Mollyproject, a framework for the rapid development of information and service portals targeted at mobile internet devices, originally developed at Oxford University for their own mobile portal. When we tried it for the first time, it was still very unstable and could not run properly on our servers. But we found a developer community with very similar goals to ours, willing to serve their town and their institution. We decided to contribute to the development of the project. We provided documentation on how to run the Molly framework on different systems, and became contributors of code. Molly was released with its version 1 and shortly afterwards we went live.

Inter-academic collaboration has been a driving force of this project: originally developed for one single institution, with its peculiar structure and territorial diffusion, it was improved and adapted to serve different communities. The great developments in the London Open Data Store allowed us to add live transport data to the portal, which drew enthusiastic reactions from our students, and these were soon integrated into the Molly project framework with great help from the project community. I think this is a good example of how institutions should collaborate to get services running. A joint effort can lead to a quality product, as I believe the Molly project is.

The local community is starting to use and appreciate the portal, with some great feedback received and the Wandsworth Guardian reporting about a “site launched to serve the community”. I’m personally very happy to be leading this project as it is confirming my idea that the collaborative and transparent cultures of open source and open data can lead to improved services and better relationships with people around us, all things that will benefit the institutions we work for. The work is not complete and we are trying to extend the range of services we offer to both St. George’s and external users; but what we really care about, and are happy about, is that we’re setting an example to other institutions of how localism and a mission to provide better services can meet to help build better communities.
