1:1 with Pamela Samuelson
In this podcast, Pamela Samuelson (UC Berkeley School of Law) and the AI lab ‘decrypt’ Artificial Intelligence from a policymaking point of view.
📌Episode Highlights
⏲️[00:00] Intro
⏲️[02:59] Q1 - The Deepdive: AI Decrypted | What significant practical obstacles in complying with a transparency obligation about copyrighted works in training data do you identify?
⏲️[10:50] Q2 - The Deepdive: AI Decrypted | Looking at the disassembly or tokenization in the training process, can you explain why “generative AI models are generally not designed to copy training data; they are designed to learn from the data at an abstract and uncopyrightable level”?
⏲️[18:58] Q3 - The Deepdive: AI Decrypted | On generative AI outputs: 1) why is the idea that an AI could or should be recognised as author problematic, and 2) could prompts be detailed enough to meet the threshold of authorship?
⏲️[26:30] Q4 - The Deepdive: AI Decrypted | On licensing AI input your submission states: “(...) it will be impossible under current technologies to calibrate payments made under a collective licensing arrangement to actual usage of individual authors’ works.” What’s at stake?
⏲️[35:37] Outro
🗣️ A rule that (...) you have to keep very, very accurate records about what your training datasets are (...) is just (...) impractical if you care about (...) a large number of people instead of a few big companies being able to participate in the (...) generative AI space.
🗣️ Data basically is in a certain form in the in-copyright works that are part of the training data but the model does not embody the training data in a recognisable way. (...) It's just not the way we think about the component elements of copyright works.
🗣️ If you think [licensing] will mean that authors will be able to continue to make a living, we're talking about really small change here in terms of each author's entitlement. It's not like you're going to get $10,000 or $50,000 a year.
🗣️ The collective license idea doesn't pay attention to (...) that we're talking about billions of works, (...) billions of authors, (...) a lot of things that essentially have no commercial value.
🗣️ [Collective licensing:] it's so impractical that it's just not really feasible. (...) No question that collecting societies would (...) be the big beneficiaries of this, not the authors.
🗣️ If a voluntary licensing regime works (...), I think that's fine. (...) [A] mandate that everything be licensed (...) is kind of unrealistic.
📌About Our Guest
🎙️ Pamela Samuelson | Richard M. Sherman Distinguished Professor of Law and Information, UC Berkeley School of Law
𝕏 https://twitter.com/PamelaSamuelson
🌐 Comments in Response to the U.S. Copyright Office’s Notice of Inquiry on Artificial Intelligence and Copyright by Pamela Samuelson, Christopher Jon Sprigman, and Matthew Sag (30 October 2023)
🌐 U.S. Copyright Office Issues Notice of Inquiry on Copyright and Artificial Intelligence
🌐 Allocating Ownership Rights in Computer-Generated Works (Pamela Samuelson, 1985)
🌐 Common Crawl
🌐 Shutterstock Expands Partnership with OpenAI, Signs New Six-Year Agreement to Provide High-Quality Training Data
🌐 Prof Pamela Samuelson
Pamela Samuelson is the Richard M. Sherman Distinguished Professor of Law and Information at UC Berkeley. She is recognized as a pioneer in digital copyright law, intellectual property, cyberlaw and information policy. Professor Samuelson is a director of the internationally renowned Berkeley Center for Law & Technology. She is co-founder and chair of the board of Authors Alliance, a nonprofit organization that promotes the public interest in access to knowledge. She also serves on the board of directors of the Electronic Frontier Foundation, as well as on the advisory boards for the Electronic Privacy Information Center, the Center for Democracy & Technology, and Public Knowledge. Professor Samuelson has written and published extensively in the areas of copyright, software protection and cyberlaw, with recent publications examining the intersections of generative AI and copyright.