Notes from csv,conf,v9

Posted on Sep 15, 2025


I’ve just come back from a splendid week in Bologna, where I helped run csv,conf,v9 for the first time.

Videos will become available soon, but for now you can find a growing list of slide decks here.

Paola Masuzzo’s live commentary on Bluesky has now also been “unrolled” here.

Here’s a quick summary from some of the talks I attended.

Keynotes

I was really happy to have the opportunity to introduce two of the four keynotes:

The role of data in protecting information integrity

The Maldita.es team, made up of CEO Clara Jiménez Cruz and CTO David Fernández Sancho, spoke on how Maldita uses technology and data to empower communities to defend themselves from misinformation.

I had serendipitously heard Clara and David speak at TRGCON last year, and I immediately thought they’d be a great match for csv,conf,v9. Their talk was outstandingly well received, prompting over 20 minutes of questions :)

Slides.

Fifteen years into the open data movement

Italian open data legends Giorgia Lodi and Andrea Borruso of Associazione onData charted the journey of 15 years of Italian open data, with some successes and a lot of difficulties.

Slides.

The other two keynotes were also brilliant:

Make Mirrors, Not Windows

Professor Rahul Bhargava looked at how communities can use data to look at themselves rather than others.

Abstract: “The tools of data storytelling are primarily designed to observe and quantify other groups, as if looking through a window for measurement. To use data for community empowerment we need to reset and focus on building more data mirrors, creating ways for the subjects of data themselves to own the narratives and impact of associated data stories. This requires a broader toolbox of techniques; data beyond the visual. Across the globe, groups are working with artists and designers to create participatory data experiences that redefine what data storytelling can look like. We must all learn from, contribute to, and build on these off-screen examples in order to more effectively and appropriately work on community data. Together, we can fight the damaging power dynamics of datafication by working with communities to create more data mirrors.”

Slides.

Developing tech with the community: the example of Open Data Editor

Our very own Sara Petti, who was the power behind Bologna’s involvement in csv,conf,v9, told the story behind the great success of the OKFN’s Open Data Editor.

Abstract: “Data is messy. In the 21st century, this is still a sad but solid reality. A lot of people out there working with data every day (journalists, activists, small public administrations) spend much more time than they would like reviewing datasets to detect errors and getting them ready before they can finally move to the part of work they actually enjoy. Many of those people don’t have the programming skills that would help them automatise some of that work, and sometimes lack data literacy skills. Open Data Editor is a desktop application specifically designed to help people detect errors in tables. It has been developed in constant interaction with the community from a very early stage. These interactions helped us understand what was really helping the community and what not, and especially made us aware of how much the use of such a tool could actually be helpful in increasing data literacy. In this talk we will share our experience piloting the application with different organisations from around the world (Open Knowledge Nepal, City of Zagreb, Bioinformatics Hub of Kenya initiative, the Demography Project, Observatoire des Armements) and how these collaborations shaped the application, making it what it is today. We will also demonstrate that to have a tech product that really helps communities, you have to develop it with them, not only for them.”

Slides.

Talks

Building CSV-powered tools for social sciences

Researcher Guillaume Pique gave what was for me the most useful talk in terms of practical data wrangling (with a beautifully crafted slide deck). The tools he described process datasets on the command line and can generate text-based data visualizations for quick data exploration. They are aimed predominantly at the social sciences and digital humanities, but could easily be applied elsewhere.

Abstract: “CSV is ubiquitous in social sciences and in the humanities. CSV data is indeed the perfect bridge between social scientists, accustomed to dealing with tabular data, and research engineers needing to process the same data. That is why SciencesPo’s médialab has been building many of its Open-Source tools around CSV files, from well-designed web apps such as Table2Net to convert tabular data into graph data, down to powerful CLI tools such as minet to collect data from the web or xan to process tabular data using constrained resources. What’s more, the CSV format is aligned with an ethos of sobriety and does not require overpowered hardware to be processed. This is very important to us because our public is mostly comprised of researchers, students, data journalists and other members of civil society that do not have access to powerful machines & servers. This talk is therefore an occasion to tell the tale of 10 years of building social sciences tools around CSV data and to be a testament to our lab’s love for the format.”

Slides.

Semantic MediaWiki data description, transformation and integration tool

Marco Montanari, one of our local crowd members and Wikimedia Italia treasurer, spoke about using the semantic facilities in MediaWiki, in the context of his work implementing Bologna’s Digital Twin.

Abstract excerpt: “For the Bologna Digital Twin platform, we developed a MediaWiki-based Semantic Data Transformation tool. This enables us to describe, transform and integrate data in various forms as semantic structures and, via scripts interacting with it, we are able to semantically enhance and merge datasets to create new data keeping the data structures and data lineage aspects intact, while creating also specific new outputs to be reused.”

Slides.

Professor Monica Palmirani, Legal Informatics expert and one of Akoma Ntoso’s authors, gave a talk that, as per the abstract, “presents the core features of Akoma Ntoso (the LegalDocML standard endorsed by OASIS), demonstrating how its semantic annotation facilitates the unlocking of legal knowledge and its application in advanced legal AI systems.”

Akoma Ntoso is the legislative data standard on which Parli-N-Grams is ultimately based, so I’m particularly fond of it.

Slides.

OpenAQ: empowering communities with open air quality data and tools

Russ Biggs spoke about the OpenAQ project, which aims to “aggregate and harmonize open air quality data from across the globe onto an open-source, open-access data platform”.

Bridging Communities: The Carpentries and GREI Collaboration Creators

Dr Kari Jordan looked at the work done by the Carpentries, an organisation whose training materials are world-renowned for their effectiveness.

Abstract: “In this talk we share the outcomes of a collaboration between The Carpentries and the Generalist Repository Ecosystem Initiative (GREI), an NIH-led effort to enhance data-sharing practices across disciplines. Generalist repositories play a vital role in making research outputs more discoverable and reusable, but researchers need the right skills to maximize their potential. Through this partnership, we aimed to align The Carpentries’ training programs with GREI’s mission by integrating best practices in data management, refining educational resources, and fostering a culture of open science. Our collaboration includes workshops, curriculum development, and community engagement to support researchers in navigating generalist repositories effectively. Join us to explore how this initiative can empower communities, enhance data workflows, and advance open, FAIR-aligned research practices.”

Slides.

Rethinking Open Data: From Civic Participation to Democratic Defense

Luigi Reggi explored how communities take part in open government data as a means of empowerment and accountability. This was an incredibly well-structured talk, though it left me a little nostalgic: it’s over 10 years since the Open Government Partnership summit was held in London, and a lot of promises remain unfulfilled.

Abstract: “My talk critically explores the evolving relationship between open government data, community empowerment, and democratic accountability. Governments increasingly view data as tools of centralized control, efficiency, or even political repression, undermining community trust and participation. Reflecting on cases like the recent IRS-ICE data-sharing controversy in the U.S., I argue that community-driven data initiatives might be called to actively defend democratic values, turning civic monitoring of public action into a crucial safeguard against government misuse of data. Using examples from specific civic monitoring initiatives, I illustrate how practical tools such as data literacy, policy literacy, independent oversight, and organized citizen engagement can be employed as democratic practices to detect, resist, and counteract potential abuses of government data, such as selective disclosure, surveillance, or politically motivated targeting.”

Slides.

30 years in Italy through open data

My former next-door flatmate Enzo Alberto Candreva, an optical network engineer, told the evolving story of Italy through the lens of open government geographic data.

Abstract: “In this talk I will show the use of the Italian Census data and the Corine Land Cover data to obtain a quantitative understanding of the dynamics of the population and of the land usage in the last 30 years in Italy. Further, the integration of the two datasets allows to highlight how the increase of the population in a given area has affected the environment.”

A Toolkit for Community-Driven Data Governance

Jennifer Ding, with whom I worked on organising London Data Week, spoke about the interesting idea of data governance within communities. Of particular interest was that the case study concerned voice data governance in choirs.

Abstract: “Knowledge and creative communities generate large troves of data every day, but face challenges in governing this valuable resource, especially in the wake of large-scale scraping and extraction of online commons by AI technologies. Based on our studies of and contributions to online communities like Reddit and Stack Exchange, as well as data donation initiatives such as Mozilla Common Voice and the Serpentine’s Choral Data ‘Trust’ Experiment, we propose the development of a toolkit for Community-Driven Data Governance. The talk will cover primary categories for resources to support communities interested in establishing data governance, such as raising awareness and data literacy, synthesising group preferences, building capacity for collective data governance, and legal vehicles for data governance. Case studies on initial tool prototypes for the toolkit will also be shared, such as open source platforms for data collection, templates for licenses and other legal documents, and guidance for using tools like Polis to support group decision making. We hope to engage with the csv,conf community to prioritise areas of development to further community-driven data governance, identify additional tools in development by other organisations to integrate in the toolkit, and connect with data communities who may be interested in adopting or contributing to the toolkit.”

How did we get to OpenCitations: a brief history of open scholarly citations

A final shout-out to Silvio Peroni, whose talk I didn’t manage to attend but which is of particular interest to me, especially given my earlier work in library data.

Abstract: “In this talk, I will briefly introduce the key milestones that have led to the concept of open citations and subsequent initiatives aimed at making metadata, abstracts, and research information accessible and open. Such a historical introduction will be intertwined with the path that led to the creation of one of the current Open Science infrastructures dedicated to providing open research information, namely OpenCitations. OpenCitations’ mission, data, services, and governance will also be briefly introduced, showcasing possible scenarios that highlight different ways OpenCitations data have been used in previous years.”

Slides.

I’ll update the links to slides and videos as they become available.