Comments (145)
- pmarreck: I have successfully reproduced a few projects with LLM assistance via strict cleanroom rules, working only off public specifications.
- dathinab: The argument that a rewrite is a copyright violation because they are familiar with the code base is not fully sound. "Insider knowledge" is not relevant for copyright law; that is more in the space of patent law than copyright law. Otherwise an artist who has seen a picture of a sunset over an empty ocean wouldn't be allowed to paint another sunset over an empty ocean, because people could claim copyright violation.
  What is a violation, though, is placing the code side by side and trying to circumvent copyright law by just rephrasing the exact same code. This also means that if you give an AI access to a code base and tell it to produce a new code base doing the same (or similar) thing, it will most likely be ruled a copyright violation, as it's pretty much a side-by-side rewriting.
  But you very much can rewrite a project under a new license even if you have in-depth knowledge, IF you don't have the old project open or look at it while doing so. Rewrite it from scratch. And don't just rewrite the same code from memory; instead write fully new code producing the same/similar outputs.
  Though doing so is not per se illegal, it is legally very attackable, as you will have a hard time defending such a rewrite against copyright claims (unless it's internally so completely different that it defeats any claim of "being a copy", e.g. you use completely different algorithms, architecture, etc. to produce the same results in a different way). In the end, while technically "legally hard to defend" != "illegal", for companies it's most often best to treat it the same.
- antirez: I believe that Pilgrim here does not understand very well how copyright works:
  > Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code
  This is simply not true. The reason the "clean room" concept exists is precisely that the law recognizes that independent implementations ARE possible. The "clean room" thing is a trick to make litigation simpler; it is NOT required that you never be exposed to the original code. For instance, Linux was implemented even though Linus and other devs were well aware of Unix internals. What the law really asks is: does the new code copy something that was in the original one? The clean-room trick makes it simpler to say that copying was impossible, and that any similarities are accidental. But it is NOT a requirement.
- darkwater: It's not clear at all why the current maintainers wanted/needed this re-licensing. I guess that their employer, Monarch Money, wants to use derivative work in their application without releasing the changes?
- Roritharr: As part of my consulting, I've stumbled upon this issue in a commercial context. A SaaS company whose platform's mobile apps are open source approached me with the following concern.
  One of their engineers was able to recreate their platform by letting Claude Code reverse-engineer their apps and the web frontend, creating an API-compatible backend that is functionally identical. It took him a week after work. It's not as stable, the unit tests need more work, the code has some unnecessary duplication, and hosting isn't fully figured out, but the end-to-end test harness is even more stable than their own.
  "How do we protect ourselves against a competitor doing this?"
  Noodling on this at the moment.
- scosman: Sounds like they didn't build a proper clean-room setup: the agent writing the code could see the original code.
  Question: if they had built one using AI teams in both "rooms", one writing a spec and the other implementing it, would that be fine? You'd need to verify the spec doesn't include source code, but that's easy enough.
  It seems to mostly follow the IBM-era precedent. However, since the model probably had the original code in its training data, maybe not? Maybe valid for a closed-source project but not an open-source one? Interesting question.
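  The "verify the spec doesn't include source code" step described above could start with something as simple as a shingle-overlap check. A minimal sketch; the function names, sample strings, and the 4-token threshold are all my own assumptions, not anyone's actual process:

  ```python
  def ngrams(text: str, n: int = 4):
      """Set of n-token shingles from whitespace-tokenized text."""
      tokens = text.split()
      return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

  def spec_leaks_source(spec: str, source: str, n: int = 4) -> bool:
      """True if the spec shares any verbatim n-token run with the source."""
      return bool(ngrams(spec, n) & ngrams(source, n))

  # Toy inputs: a behavioral spec is fine, pasted source is flagged.
  source = "def detect(byte_str): detector = UniversalDetector() detector.feed(byte_str)"
  clean_spec = "The library exposes one function that guesses the character encoding of raw bytes."
  dirty_spec = "Implement def detect(byte_str): detector = UniversalDetector() detector.feed(byte_str) verbatim."

  print(spec_leaks_source(clean_spec, source))  # False
  print(spec_leaks_source(dirty_spec, source))  # True
  ```

  A real gate would also normalize whitespace and identifiers, since trivial renaming defeats exact shingle matching.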
- p0w3n3d: Wow, that's hot. I was not aware that you need to be "untainted" by the original LGPL code. This could mean that... all AI-generated code is tainted with GPL/LGPL, because the LLMs might have been trained on it.
- hu3: I'm torn on where the line should be drawn. If the code is different but API-compatible, the Google v. Oracle case shows that if the implementation is different enough, it can be considered a new implementation, clean room or not.
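  A toy illustration of the interface-vs-implementation distinction that comment invokes: two functions satisfying the same declared API while sharing no expression. Both functions are hypothetical examples, not from any real codebase:

  ```python
  def sum_to(n: int) -> int:
      """One implementation of the API: iterate and accumulate."""
      total = 0
      for i in range(1, n + 1):
          total += i
      return total

  def sum_to_reimplemented(n: int) -> int:
      """An independent implementation of the same API: Gauss's closed form."""
      return n * (n + 1) // 2

  # Same interface, same observable behavior, no shared expression.
  print(sum_to(100), sum_to_reimplemented(100))  # 5050 5050
  ```

  Whether the same reasoning scales from a one-line signature to an entire package API is, of course, exactly what's contested here.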
- geenat: FastAPI's underlying library, Starlette, has been going through licensing shenanigans too lately: https://github.com/Kludex/starlette/issues/3042
  Be really careful who you give your project's keys to, folks!
- binaryturtle: Isn't the real issue here that tons of projects that depend on chardet now drag in some crappy, still-unverified AI slop? AI forgery poisoning, IMHO.
  Why did this new project need to replace the original in this dishonourable way? The proper way would have been to create a proper new project.
  Note: even Python's own pip seems to drag this in as a dependency (hopefully they'll stick to a proper version).
- Ardren: Huh, 7e25bf4 was a big commit: 2,305 files changed, +0 / −546,871 lines. https://github.com/chardet/chardet/commit/7e25bf40bb4ae68848...
- oytis: I wonder if LLMs will push the industry towards protecting its IP with patents, like the other branches of engineering, rather than with copyright. If you patent the general idea of how your software works, then no rewrite will be able to lift that protection.
- mytailorisrich:
  > Licensed code, when modified, must be released under the same LGPL license. Their claim that it is a "complete rewrite" is irrelevant, since they had ample exposure to the originally licensed code (i.e. this is not a "clean room" implementation).
  I don't think that the second sentence is a valid claim per se; it depends on what this "rewritten code" actually looks like (IANAL).
  Edit: my understanding of a "clean room implementation" is that it is a good defence against a copyright infringement claim, because there cannot be infringement if you don't know the original work. However, it does not follow that NOT being a clean-room implementation implies infringement; it's just that the rewrite is potentially harder to defend against a claim if the original work was known.
- charcircuit: Clean room implementations are not necessary to avoid copyright infringement.
- soulofmischief: The README has clearly been touched by an LLM. Count the idiosyncrasies:
  "chardet 7.0 is a ground-up, MIT-licensed rewrite of chardet. Same package name, same public API — drop-in replacement for chardet 5.x/6.x"
  Do people not write anymore?
- q3k:
  > 12-stage detection pipeline
  What is this recent (clanker-fueled?) obsession with giving everything fancy computer-y names with high numbers? It's not a "12-stage pipeline", it's just an algorithm.
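  For context on what one "stage" of such an algorithm typically amounts to: the cheapest check in encoding detection is usually BOM sniffing. A stdlib-only sketch, not chardet's actual code; real detectors fall back to statistical models when no byte-order mark is present:

  ```python
  import codecs

  # Order matters: check longer BOMs before their prefixes
  # (the UTF-32 LE BOM starts with the UTF-16 LE BOM bytes).
  BOMS = [
      (codecs.BOM_UTF32_LE, "UTF-32LE"),
      (codecs.BOM_UTF32_BE, "UTF-32BE"),
      (codecs.BOM_UTF8, "UTF-8-SIG"),
      (codecs.BOM_UTF16_LE, "UTF-16LE"),
      (codecs.BOM_UTF16_BE, "UTF-16BE"),
  ]

  def sniff_bom(data: bytes):
      """Return the encoding implied by a leading byte-order mark, or None."""
      for bom, name in BOMS:
          if data.startswith(bom):
              return name
      return None

  print(sniff_bom("hi".encode("utf-8-sig")))  # UTF-8-SIG
  print(sniff_bom(b"plain ascii"))            # None
  ```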
- myrmidon: I think Mark Pilgrim misrepresents the legal situation somewhat: the AI rewrite does not legally need to be a clean-room implementation (whatever exactly that would even mean here). That is just the easiest way to disambiguate the legal situation, i.e. the most reliable way to prevent it from being considered a derivative work by a court.
  I'm curious how this is gonna go.
- skeledrew: I feel like the author is missing a huge point by fighting this. The entire reason GPL and the other copyleft licenses exist in the first place is to ensure that a user's rights to modify, etc., a work can never be taken away. Before, relicensing as MIT (or any other fully permissive license) would have meant open doors to applying restrictions going forward, but with AI this is now a non-issue: code is now very cheap. So the way I see it, anyone who is for copyleft should be embracing, hard, the idea that AI-created things are not copyrightable (or that a rewrite is relicensable).
- imcritic: Licenses are cancer and the enemy of open source.