AI lab TL;DR | Stefaan G. Verhulst - Are we entering a Data Winter?
🔍 In this TL;DR episode, Dr. Stefaan G. Verhulst (The GovLab & The Data Tank) joins the AI lab to discuss his Frontiers Policy Labs contribution on the urgent need to preserve data access for the public interest.
📌 TL;DR Highlights
⏲️[00:00] Intro
⏲️[01:13] Q1-‘Data Winter’:
Can you provide a brief overview of your concept of 'Data Winter' and why you believe we are on the brink of entering one?
⏲️[05:05] Q2-Generative AI-nxiety:
What are some of the most significant challenges currently hindering public access to social media and climate data, and the effects of Generative AI-nxiety?
⏲️[07:49] Q3-‘Decade for Data’:
Could you outline what the “Decade for Data” initiative entails and how it could transform data stewardship and collaboration?
⏲️[12:25] Wrap-up & Outro
💭 Q1-‘Data Winter’
🗣️ At the time of an AI summer, when everyone suddenly is excited about the potential of generative AI (...) for public interest purposes, (...) we are actually entering a data winter.
🗣️ What I’ve witnessed the last few months, and that’s mainly as a result of advances in artificial intelligence, is that we actually see a backtracking of the progress that we’ve made in society as it relates to opening up data for public interest purposes.
🗣️ Social media platforms such as X, but also Facebook, have closed down access to some of their data for research and for data journalism purposes as well.
🗣️ Science data, such as climate science data, which was typically open science, has now become commercialised and is becoming proprietary data enclosed for many in society.
🗣️ The initial data that was available for training data has now also become much harder to access, as a result of concerns that some of that data has been extracted without a return to the data holder.
💭 Q2-Generative AI-nxiety
🗣️ Some of the data that typically was available through APIs has now been closed off, and so some are calling this the post-API environment that we're currently in, where data that was easily available through an API is now actually much harder to access unless one pays for it.
🗣️ New licensing is being used to actually shield off the data for public interest purposes as well. So there is a whole range of vehicles that exist to enclose data, which actually makes it much harder to access it for reuse.
🗣️ We see a decline in access to Wikipedia, a decline in people accessing Wikipedia, and a decline in people contributing to Wikipedia, mainly because they fear that whatever they contribute will be used as training fodder for generative AI purposes.
🗣️ Initiatives like Wikipedia, which are to a large extent the main source of a lot of the training data of generative AI services, are currently also suffering from AI extraction because they are dependent on voluntary contributions by the audience and the participants.
🗣️ As a result, we are entering a data winter, which if we are not careful (...) may actually affect the AI summer that we currently have as well.
💭 Q3-‘Decade for Data’
🗣️ I’ve been calling for, together with others, such as the United Nations University, a Decade for Data, which is a typical way the United Nations often operates, to feature a problem and then have a well-defined strategy to address that problem.
🗣️ A Decade for Data would have multiple components, one being advancing data collaboration, where you actually have new models of data being shared, including data commons, which can be updated in the current AI environment.
🗣️ We need a new reimagined profession of data stewards that are individuals or teams who have the sophistication and competencies to provide access to data in a systematic, sustainable, and responsible manner.
🗣️ A Decade for Data would also involve rethinking data governance and embedding digital self-determination in data governance to go beyond the current paradox of consent, facilitating access in a way that aligns with perceptions, expectations, and preferences of communities.
🗣️ Establishing a social license for reuse is key, where you understand the preferences and expectations of communities and individuals, translating that into a social license so that data can be reused in a way that is trusted and aligned with community expectations.
📌 About Our Guest
🎙️ Dr. Stefaan G. Verhulst | Co-Founder, The GovLab & The Data Tank
🌐 Frontiers Policy Labs | Are We Entering a Data Winter?
🌐 The Data Tank
🌐 GovLab
🌐 Dr. Stefaan G. Verhulst
Dr. Stefaan G. Verhulst co-founded several research organisations, including The GovLab (New York) and The Data Tank (Brussels). He focuses on using advances in science and technology, including data and AI, to improve decision-making and problem-solving, and has been recognised as one of the 10 Most Influential Academics in Digital Government globally.