July 2014 – Tech Blog

I would like to respond to Owen Boswarva’s comments about ODUG’s GP benefits case.

Most of his comments are appropriate, but I think – and I say this very respectfully – that they are slightly missing the point by taking in isolation a series of remarks that, in ODUG’s view, make sense when viewed as a whole.

The short version is: I think that a single and authoritative GP dataset is needed because
1 – there is user demand for this dataset (several data requests on data.gov.uk)
2 – it would bring improved accuracy and quality to existing datasets
3 – it would streamline an already existing process, unifying the procedures of several entities
4 – it is a natural candidate for the National Information Infrastructure.

Some direct comments to Owen’s points follow.

Those datasets are all reusable under the Open Government Licence, i.e. they are open data.

Some datasets are under the OGL, but the HSCIC dataset, currently the most complete of those provided, specifically does not permit the use of data “for the purpose of promoting commercial products or services to the public”.
Following from the points above, the goal of the benefits case is to generate a single dataset under a single licence, and from our demand-led point of view we cannot but ask for more open licensing.

The same applies to Owen’s point number 3:

ODUG maintains that the GP practices data on the HSCIC site is not open data, and points to a page about “responsibilities in using the ODS data”. However HSCIC has recorded that dataset (EGPCUR) on Data.gov.uk as reusable under the OGL. (The ODS “responsibilities” page seems to be written for NHS users. A literal reading only permits use of the data in connection with NHS-related activities, which is obviously not the actual licensing position.)

We are asking for clarity. If explaining all of this requires several searches and a blog post, I believe we are right to ask for an improvement to the current situation.

The ODUG criticises the NHS Choices dataset as follows:

“the branding of the NHS Choices dataset as a ‘Freedom Of Information’ dataset is troubling from an Open Data perspective, mainly for its ‘on demand’ nature: a FOI data release, being a reactive response to a request, does not establish an ongoing process; while data release under an Open licence often comes proactively from the publishing entity, which in doing so creates a sustainable data update procedure”.

I think this is rather over the top. NHS Choices hasn’t “branded” the data as a FOI dataset. It has merely made it available, along with a number of other useful data files, in the FOI section of its site. It would be nice if the NHS Choices site also had a dedicated open data landing page. However it’s perfectly sensible to draw users’ attention to existing datasets that they may want to know about before submitting a FOI request. NHS Choices says the data files are updated daily, so they are clearly not being published as a “reactive response” to FOI requests.

I’m sorry if Owen took offence at this wording; we are not critical of NHS Choices. Moreover, our engagement on this topic with NHS England has been entirely collaborative and, so far, positive.

Our point can only be understood by remembering that it’s in ODUG’s DNA to approach Open Data from the user’s point of view. FOI and Open Data cover different aspects of the transparency agenda: both are immensely important, but they come with different expectations from users. Releasing Open Data through a FOI portal is confusing and, in my personal opinion, semantically incorrect. “Branding” is used here in this sense; it is not intended to be controversial.

Owen’s point that it’s good to draw users to existing datasets before they submit a FOI request is absolutely spot on, and I fully agree with it. ODUG is not against better engagement between FOI and Open Data; quite the opposite. It’s just that in this specific case we question the user experience.

There’s nothing wrong with arguing that existing datasets could be made more useful by improving the quality, or updating them more frequently, or appending data from other sources.

That is precisely the point of this benefits case: showing the way to an improved quality of the dataset.

But we can have those arguments about most of the nation’s information infrastructure. A dataset doesn’t need to be ideal to be authoritative in practice.

The HSCIC and NHS Choices datasets are produced by the relevant official body, they are in wide use, and there are currently no better equivalents. The datasets are therefore, on the face of it, authoritative.

We need to start somewhere 🙂

“Authoritative” always raises the question “to whom?”. Yes, these datasets are authoritative to the relevant official body; they are not perceived as sufficiently authoritative by the end users we aim to represent. We wish to make sure that the final authoritative dataset combines the accuracy of each of the several datasets that currently exist.

ODUG proposes that DoH establishes “an ongoing process to build, update and maintain on data.gov.uk an authoritative dataset of medical practices and operating practitioners, drawing on the datasets made available by HSCIC and NHS Choices”.

I’m not sure how ODUG expects DoH to build an authoritative dataset by drawing on datasets it has dismissed as non-authoritative. ODUG’s call is to DoH, but in practice DoH would surely delegate any such new process to HSCIC. So what is ODUG proposing HSCIC should do differently?

The datasets have not been dismissed as non-authoritative. What we are arguing is that the three datasets only partially overlap, and this is not controversial. We are calling for an “official” process of aggregating these datasets, so that the result can be referred to as the authoritative source.

So the question is not what DoH or HSCIC could do differently; it’s about collaboration. The best outcome, the most accurate and reliable dataset, would come from integrating the various processes that collect such data, each of which serves different objectives and follows different procedures. Collaboration and integration would create efficiency and build capacity in the bodies involved. It would be a win-win situation.

Maintaining the new dataset on Data.gov.uk is also unlikely to add credibility, given the current state of the DGU catalogue and other functionality. HSCIC already has its own platforms and they seem serviceable for the publication of data. What in the ODUG proposal requires the involvement of Data.gov.uk?

DGU certainly has some problems, but in our view we need to ask for more. We want DGU to be the index (not necessarily, in all cases, the repository) of all public open data. Hence it makes sense for us to argue for having these datasets on DGU. Whether they end up hosted on it or just linked from it is not for us to decide: we are confident that initiating the process of data integration matters more, and once the procedure has been identified, linking to or hosting the dataset will follow naturally.

I’ve never been entirely on board with the idea of submitting “benefits cases” for release of open data, because it seems to conflict with the principle of “open by default”.

Why see these as conflicting goals? There is of course an ongoing tension between the ideal situation (everything open by default) and the practicalities of actually establishing procedures for releasing datasets. We use benefits cases to prioritise such releases or, as in this case, to make sure the data is aggregated in a way that meets users’ requests.

In this instance ODUG seems to be arguing for creation of a new data product, combining the existing HSCIC/NHS Choices datasets with data from other sources such as GMC’s Medical Register and patient acceptance criteria for each GP practice.

That last source in particular would probably involve quite a bit of ongoing administration and processing, as patient acceptance criteria are not held centrally or in a standard format.

The fact that data are not held centrally doesn’t mean that the data do not exist, or that they should not be made available to the public in some form. Electoral datasets are a clear example of this. Whether it’s new data or newly collected data makes little difference to our demand-led approach to prioritising data releases.

Arguing for release of existing data is one thing. Arguing for the creation of new data products and new processes is something more.

I have no doubt there is room for improvement in the existing open data that HSCIC publishes on GP and dental practices. However public datasets are mainly produced to support a public task. I will be surprised if DoH takes up these ODUG recommendations without a more detailed demonstration of why the existing data and processes are inadequate to meet the requirements of the agencies and public bodies it supports.

ODUG’s official terms of reference do not ask us to stop at simply arguing for the release of existing data; moreover, we aim for sustainable and efficient data releases. Sustainable data releases require an established process; where such a process already exists, we can help identify inefficiencies and overlaps and act as a bridge between the several organisations involved. We believe that a new dataset, collected in the way discussed, would greatly help the agencies involved carry out their public tasks; preliminary discussions with representatives of these organisations suggested that the exchanges between them would benefit from a single data source.

My last point: we are contributing to the definition of what the NII should be. A GP dataset is, for many reasons, an obvious candidate for the NII. Given the stage we are at, and the potentially great impact the NII might have on public tasks, it is in my opinion the right time to argue for a review of the way GP data is collected and made available. Achieving results in this area would provide an encouraging blueprint to follow in other contexts.