The chardet open source library was relicensed from LGPL to MIT two days ago thanks to a Claude Code-assisted "clean room" rewrite - but original author Mark Pilgrim disputes that the way this was done justifies the change in license - my notes here: simonwillison.net/2026/Mar/5/c…
Can coding agents relicense open source through a “clean room” implementation of code?
Over the past few months it’s become clear that coding agents are extraordinarily good at building a weird version of a “clean room” implementation of code. The most famous version …
Simon Willison’s Weblog

Matt May
in reply to Simon Willison

AUSTRALOPITHECUS 🇺🇦🇨🇿
in reply to Simon Willison

Julian Andres Klode 🏳️🌈
in reply to Simon Willison

Gordon Messmer
in reply to Simon Willison

"There are several twists that make this case particularly hard to confidently resolve:"
I really expected one of them to be that LLM output isn't subject to copyright under US law. Since a license is a grant of permissions that would not otherwise exist due to copyright, applying a license to LLM output doesn't make any sense.
No one needs explicit permission to use LLM output.
penguin42
in reply to Simon Willison

Florian
in reply to Simon Willison

On the legal side, I am not an expert. But I understand the concerns about moving to a more permissive license as regards the user's freedom.
And my general feeling is: well, generative AI is technically impressive, but it's really making a mess of the planet and of human relations.
I am not entirely, stubbornly opposed (:p), otherwise following you would be masochism ;), but I struggle to find benefits in these tools for us, as a society.
Tom Bortels
in reply to Simon Willison

It's clearly not "clean-room" - but as has been pointed out, that may or may not be necessary to relicense. I'd call it a re-implementation, but again it's unclear how that affects licensing - these are very uncharted waters.
But here's a new wrinkle: at least in my current understanding, you can't copyright AI-generated code. Doesn't that imply you can't impose a license on it either? IANAL, but my take is that this re-implementation at best produces an unencumbered implementation - and even that is hinky: can I feed copyrighted code into an AI to strip the copyright? Probably not, but arguably that's what has happened here, albeit indirectly.
It seems to me the original license was clearly violated when the AI provider ingested the licensed code as part of the training corpus - the resulting AI data is *clearly* a derivative work. Put that in your pipe and smoke it!
(My gut says the real root issue here is that copyright started breaking the day it applied to something other than books, and each new medium breaks it more. It needs to be replaced by a better system or removed entirely, but so much money is involved by vested interests that it never will be...)
yoasif
in reply to Simon Willison

AI’s Unpaid Debt: How LLM Scrapers Destroy the Social Contract of Open Source
Youssuff Quipsdoboprobodyne
in reply to Simon Willison

Obviously! The source code is: that code required to produce the binary. That code was LGPL. It doesn't matter how many algorithms, nor the nature of the algorithms, it goes through to become those 1s and 0s.
#law #lawfare #computerScience #intellectualProperty #licensing #FOSS #GNU #LGPL #MIT #code #softwareEngineering #LLM #codeWashing
mirth
in reply to Simon Willison

> Claude itself was very likely trained on chardet as part of its enormous quantity of training data—though we have no way of confirming this for sure
A note on that: it would be easy to paste snippets of code from the original codebase into Claude and ask it to analyze, attribute, and fill in the next few lines. Depending on the answers, they may constitute near-certain confirmation.
Matěj Cepl 🇪🇺 🇨🇿 🇺🇦
in reply to Simon Willison

No right to relicense this project
a2mark (GitHub)

Andreas 🌈
in reply to Simon Willison

The situation is interesting and the questions are challenging. It gets even more complex and undefined if you create a new AI-based implementation based on an existing AI implementation:
I created a Rust implementation of chardet based on this particular chardet v7 version. I decided to pick the original LGPL licensing for this AI-based-on-AI implementation (which is by all numbers much, much faster than v7).
github.com/zopyx/chardet-rust
GitHub - zopyx/chardet-rust: Universal character encoding detector for Python — Rust-powered fork of chardet 7.0.
GitHub

Martin
in reply to Simon Willison

So it seems this talk from FOSDEM became reality almost in an instant:
fosdem.org/2026/schedule/event…
FOSDEM 2026 - Let's end open source together with this one simple trick
fosdem.org