AI is just unauthorised plagiarism at a bigger scale

speckx

Comments (578)

danorama
There’s a fallacy that gets used a whole lot to justify things like this (not just with LLMs), and I see it in many of the comments here: If it’s OK (or at least negligible) on a small scale, then it must be OK on a large scale. It usually goes something like: If I can make money by learning something from a web page, why does a computer making money by learning everything from everyone upset people so? It’s the same thing! It’s like if I go to Golden Gate Park and pick one flower, I shouldn’t do that, but no one cares. But if I build a machine to automatically cut every flower in the park because I want to sell them, that’s different. “You say I can pick one flower, but you get upset when I take a bunch. That’s inconsistent. Check and mate.” But quantitative changes in an activity produce qualitative changes. Everyone knows this, but sometimes they seem to find it inconvenient to admit it. Not that effects of the qualitative change are always bad, but they are often different, and worth considering rather than dismissing.
dvduval
The broader problem of original sources not being given credit in a way that rewards them remains. Website owners are paying to host their content so that spiders can come and crawl it and index it into the AI, and then if they’re lucky, they might get a citation, but otherwise there’s very little reward for being a provider of content. And of course, this is something that’s getting worse and worse. Why look at a website when it’s all in AI? And then the counter to that is maybe we need to start closing the website to crawlers and put everything behind a login.
deaton
"Steal an apple and you're a thief. Steal a kingdom and you're a statesman." - Literal Disney villain
tancop
if theres just one good thing coming out of ai its breaking copyright law forever. no one should be able to "own" ideas. royalties for commercial use is another thing and i support it but what we know as (non commercial) piracy and unlicensed fan art should be 100% legal
storus
This is really not so clear cut, as "fair use" might cover 99% of all data scraping; you are not reproducing the originals, just using them to estimate the probability distribution of tokens in pre-training. You are never going to get the exact book word-for-word using LLMs.
pluc
Seriously how is this surprising? We all know AI companies stole troves of data to train their models, why do you think they'll stop? Have they faced consequences for the mass theft of copyrighted data? You can't steal or profit off of that data, but it's fine for them for whatever reason. I guess because they're a force for good in the world and are pushing humanity forward eh?
ggillas
IP attorney here and actively working on this problem. nla: if you create content online (public repo code, blog, podcast, YouTube, publishing) the smartest thing you can do is to file a US copyright, even if you have a hobby blog. Anthropic paid $1.5B in a class settlement to authors because it was piracy of copyrighted works. If we as a HN community had our works protected, there are potentially huge statutory damages for scraping by any and all LLMs. I work with hundreds of writers and publishers and am forming a coalition to protect and license what they're creating.
chrisbrandow
I think what gets conflated are two aspects. 1. LLM/transformer technology is legitimately amazing and revolutionary. 2. In the end, they function as an enormous, effective database for most human knowledge. Point 1 obscures the fact that if someone just created an SQL database with every digital artifact in existence and provided it for free upon request, there would be no ambiguity whether that was legal or not. But distillation, etc. obscures this relationship and it looks like something other than straight lookup, at least in part because it is obviously more than that.
MontyCarloHall
Did You Say “Intellectual Property”? It's a Seductive Mirage. [0] [0] https://www.gnu.org/philosophy/not-ipr.html
nitwit005
I can see this argument, but I'm not sure it matters, because it's looking like these companies are just directly violating copyright law. Meta pirated books using BitTorrent: https://arstechnica.com/tech-policy/2025/02/meta-torrented-o... xAI is busy suing to try to avoid disclosing where they got their training data from, which hints at similar problems: https://storage.courtlistener.com/recap/gov.uscourts.cacd.10...
kstenerud
> their article contains links to my actual website, with the exact link text (?!) I'm having a hard time understanding what's wrong here? Unless the link text is very long, why would someone linking to your article use different words for the link text?
tracerbulletx
Whether or not it's technically copyright infringement isn't the main issue I have. It's mostly that it concentrates the ability to collect rent from all of the content in the world into the hands of the few corporations who can build data centers at scale. This is a huge problem. Why would I make a webpage, a news site, an online magazine, or create art commercially if it can be swept up into these models and cut me out of any incentive? If it's not legally copyright infringement now, we need a new legal framework around it because it's an absolute tragedy for human creativity and small enterprise.
adamzwasserman
People need to cope with the fact that no thought is original. Even Newton and Leibniz were having the same thoughts at the same time. Get over it.
hparadiz
You guys have fun arguing. I'm gonna be building cool stuff.
baby
I dislike this argument because it’s about limiting the most powerful technology we ever invented because it doesn’t fit well with how we established some social structures.
tptacek
People were effectively copying websites (especially ecommerce tutorials) and beating the original authors at SEO decades before ChatGPT 2.
andai
There's two aspects to this. The pretraining (Common Crawl, i.e. the entire internet. Also books and papers, mostly pirated), and the realtime web scraping. The article appears to be about the latter. Though the two are kind of similar, since they keep updating the training data with new web pages. The difference is that, with the web search version, it's more likely to plagiarize a single article, rather than the kind of "blending" that happens if the article was just part of trillions of web pages in the training data. There's this old quote: "If you steal from one artist, they say oh, he is the next so-and-so. If you steal from many, they say, how original!"
baq
turns out plagiarism at scale can solve Erdos problems
arjie
The linked article shows that LLMs can be used to plagiarize content through rewriting. Then he gets SEO'd out of it. But it doesn't demonstrate that AI is just plagiarism.
oytmeal
Isn't plagiarism inherently unauthorized?
damnesian
Not the first time I've had the thought that massive lawsuits could be in every AI company's future. Surely they realize they are living on borrowed time simply by being the current trendy tech.
a13n
The US drastically prefers the economic impact of AI over enforcing this… You can get away with quite a lot if you’re creating trillions in GDP. That’s just the world we live in whether we like it or not.
saghm
It's basically the same thing as the old joke "if you owe the bank a million dollars, you have a problem; if you owe the bank a billion dollars, they have a problem". IP law seems to always be disproportionately wielded against smaller players, and the ones who are big enough get away with it.
cryptocod3
There's authorized plagiarism?
rastrojero2000
It's not though, that's just the business case, where the perverse business incentives lie. LLMs are really cool text generators and it turns out we can generate a bunch of things from text they generate. Problem is, several of those things can be horrendous for the continued survival of the species and those happen to make the people running those AIs a ton of money, and, in perverted societies, thus also clout.
isoprophlex
> Is this what the pinnacle of human is? Lazy and greedy? Yes. At least it is what the currently prevailing economic system of "value extraction and capital concentration at all cost" incentivises us towards.
aoeusnth1
Did I miss where OpenAI plagiarized the disproof of the planar unit distance problem from?
fritzo
What has "artificial" to do with it? Human intelligence is also unauthorized unconscious plagiarism.
jeisc
AI is an organized intellectual property rip off in the name of advancing human learning but the commercialization of the products seems like a legal license to steal.
ecommerceguy
I remember playing around with Writesonic in my days of spammy seo tactics (some of my products weren't allowed on marketplaces & advertising platforms due to hazmat products so..). Often times I would see my own product descriptions nearly verbatim in the output. 100% creators should get compensated by ai platforms for their work. Further, I can see a day where someone like Reddit will close off or license their data to llms. No doubt they are losing traffic right now.
frankest
You are going to see the same thing that happened with newspapers. Those who want to train the AI with their content (advertisers, PR) will push out more content for AI in the open. Those who have quality content that gives you an advantage will try to lock out AI or get pricy subscription APIs for humans and even pricier for AI.
dominicrose
Talking about a bigger scale may be confusing because some of the information AI can train on comes from niches. I wouldn't mind if an AI trained on old Disney movies (or new ones for that matter), but exploiting niches (like local newspapers) seems bad.
barnabee
The war on copying is like the war on drugs: unwinnable, and socially useless. Let information be free for personal and recreational uses[0], and vote for governments that will fund the arts. The corporations will be just fine. [0] The AI companies and big tech vs publishers, music labels, etc. can fight to the death in the courts over who owes who what, for all I care.
hmokiguess
It's so wild, I can't even think what the end path will look like. Will there be a major settlement? Will this abolish some form of copyright as a precedent? Something else? My brain hurts just to try and reason about it, yet, the fact remains it's now ubiquitous and change is inevitable.
nate
this is why I feel like we need some kind of "consortium" or government effort to be like "yo, llms, you need to honor some kind of source markup to give us people you mention more significant boost"? like if you mention my article, you better also show my ad partner?
ProllyInfamous
>>"The underlying purpose of AI is to allow wealth to access skill while removing from the skilled the ability to access wealth." @jeffowski (first I read it, not sure if author) Bezos' admission, recently, that the bottom 50% of current taxpayers ought'a NOT pay any taxes... is just preparing us for the inevitable UBI'd masses: own nothing, be happy!
motbus3
It allows data to be compressed into the weights and the mere coincidence of certain strings of a book will make it spit the full book
MarlonPro
Maybe it's time to rethink the plagiarism laws? AI is not going away.
pull_my_finger
What gets me is when this was brought up, they said "requiring explicit permission will kill the AI industry"[1]. No shit! Why do you think all the rest of us didn't build a business/"industry" around stealing shit? They could have done it at a slower pace while respecting copyright laws, but they were too greedy to be first to market and secure a hold. [1]: https://www.theverge.com/news/674366/nick-clegg-uk-ai-artist...
iloveoof
I don’t know if this author supports OSS but I’ll share this because HN generally is full of people with that mindset. It’s deeply ironic that if you forget about LLMs and look only at the outcome (we’ve found a way to legally circumvent copyright and the siloing of coding knowledge, making it so you can build on top of (almost) the whole of human coding knowledge without needing to pay a rent or ask for permission), it sounds like the dream of open source software has been realized. But this doesn’t feel like a win for the philosophy of OSS because a corporation broke down the gates. It turns out for a lot of people, OSS is an aesthetic and not an outcome, it’s a vibe against corporate use or control of software, not for democratized access to knowledge.
biscuits1
"Is this what the pinnacle of human is? Lazy and greedy?"Selfishness, too. But if I follow the logic, and citations are added, how would one enforce a copyright claim if the creator is amorphous and all-knowing?
dspillett
More like “GenAI enables plagiarism at a bigger scale”. People copying through GenAI would have done so before if they had a tool that so easily allowed them that facility.
adamtaylor_13
I read the article, but I disagree. People are angry, and that's completely understandable. I believe it's a justifiable response to the huge upheaval happening. But being angry about LLMs does not magically transmute their output into "plagiarism". It has always been possible to take someone's public work, put a twist on it, and then sell it as unique. (I'm not making a moral/ethical argument, only a legal one.) I have yet to see any evidence that LLMs are fundamentally different from that approach.
hiroto_lemon
Worth noting what changed isn't AI itself — copying always existed. LLM just made per-article rewrites a 5-second job. Detection didn't get the same speedup; that's the actual break.
mindcandy
> AI takes in all the input, whether the original authors have consented or not, and do some "learning" What would it mean for authors who publish content publicly to the web, without access restrictions, to provide consent for learning from it? "EULA: Most people are allowed to learn from this text. If you work in an AI-related field, even though you can clearly see this page because you are reading this text right now, you are not permitted to learn anything from it. Bob Stanton, you are an a-hole. I do not consent to you learning from this web page. Dave Simmons, you are annoying. But, I'll give you a pass. For now... Also: plumbers. I do not like plumbers for reasons I will not elaborate. No plumbers may learn from my writing in any way."
i4i
He ends his essay with "Fuck Google for ranking some copycat website higher than mine, even though they copied my article", but how are OpenAI, Anthropic etc. not to blame as well as Google? We're meant to believe that with their resources they couldn't have created a micro-payment scheme to compensate creators? Altman on Fridman podcast two years ago about compensation... https://youtu.be/jvqFAi7vkBc?si=9YbKoH_dFIishAXt&t=2409
kingleopold
With this logic, business is also just unauthorised plagiarism at a bigger scale. Because all the products/services get copied and not all of them have patents etc???
slowhadoken
Corporate proprietary plagiarism through openwashing.
schwartzworld
Let this sink in: I wanted to open source a package at work and needed approval from legal and other teams to make sure I wasn't leaking anything proprietary. The same executives that worried about proprietary, copyrighted code being leaked 10 years ago are now mandating using the plagiarism machine. The whole AI bubble is The Emperor's New Clothes, and it feels like more people are finally admitting it.
peterbell_nyc
I do just want to highlight that this is also what humans do. We read a bunch of content online and then use it in our work product. The vast majority of the value that I provide comes from copyrighted information that I have ingested - either directly with a payment to the creator (bought and read the book, paid for and attended the seminar) or indirectly via third party blog posts or summaries where I did not then pay the originator of the materials. I think there are real questions around motivations for creation of novel, high quality valuable content (I think they still exist but move to indirect monetization for some content and paywalls for high value materials). I don't inherently have any problems with agents (or humans) ingesting content and using it in work product. I think we just need to accept that the landscape is changing and ensure we think through the reasons why and how content is created and monetized.
zach_1337
Are people going to start putting garbage white text on the internet to intentionally corrupt training?
markhahn
Is he ignorant, or trying to mislead? AI is not a plagiarism engine. It can be used that way, but is not inherently so. It is not necessary that a trained LLM be able to faithfully reproduce every document in its training set. The entire structure of an LLM is not storage, but at least in principle, generalization: extraction of a somewhat abstracted "structure" of semantically similar "concepts". But we also need to talk about authors' "rights". It's well-established that reproducing a work is infringement. There is a lot of caselaw about how much may be reproduced without infringement. But the idea that an author should be consulted before ANY automated use of their published (public) text? No, just no.
fullshark
That sounds pretty useful
mrbluecoat
> AI ... do some "learning" Is AI plural or is that a typo?
erelong
"intellectual property" is something of a legal fiction
ironman1478
People keep saying open source is an example of how copyright doesn't quite matter. However, many of the biggest open source projects are contributed to by massive corporations. Linux has lots of contributions from all the FAANGs, Red Hat, etc. Yes, it's not protected by copyright, but also the way it's produced is wholly different from how an artistic work is produced. Contributing to Linux is nothing on the balance sheet of Google for example, whereas producing art for an independent person or a whole company whose purpose is to create art can be very expensive. Artists are taking risks and need legal protection if they want to make art for a living. If artists were making FAANG engineer compensations or all worked at institutions like universities (with all their protections) then maybe they wouldn't care about copyright, but that isn't the living situation for every artist. You could say an artist shouldn't rely on making art for a living, but that's actually a different discussion.
muldvarp
I agree but AI is a) owned by rich people and b) (sadly) too useful for this to matter.
jorisw
> X is just Y but Can't recall the last time a compelling argument started out like this
energy123
It's a problem with only one practical solution: taxation.
illiac786
Isn’t it rather authorized plagiarism?
tiahura
To answer the author's question: Yes, progress IS largely built on the shoulders of those who came before.
redwood
If this all leads to a generative monoculture that is also Frozen in Time that would be pretty sad.
I_am_tiberius
It's essentially a new napster.
_-_-__-_-_-
Recent thoughts, https://theonlyblogever.com/blog/2026/distrust.html
dwa3592
Plagiarism by default is unauthorised so I think the title should be "AI is just authorised plagiarism". It's authorised by the markets, the governments and the society at large.
andy12_
Someone blatantly copied their tutorials but ChatGPT is to blame, somehow? The accusation here isn't even that ChatGPT learned from their tutorials and then generated them verbatim. The accusation is that someone copied the whole article and rewrote it with ChatGPT (which they could have done manually without AI anyway).
NetMageSCW
Reading is just unauthorized plagiarism.
alex1138
I'm reasonably information wants to be free. I think the copyright cartels have enacted a lot of damage. Having said that, Facebook has to be one of the worst offenders. They don't even allow links to Anna's Archive, they seemingly scraped (maliciously; their crawlers are more resource intensive than anyone else's) LibGen for profit - which is a different calculus
asklq
Yes, of course it is. If the model is built on all human information, then it is by definition a derivative work of all human information and as such violates IP. Currently politicians don't understand this and listen to the criminals like Amodei, but it will change. It took a while to deal with Napster etc., but the backlash will come.
tayo42
I think AI is just getting people riled up. Not sure what AI has to do with anything in this case here. Someone copy and pasted his content, could have been done without AI. I guess AI could have made a better website and done better SEO than him, but that's not really the issue
bparsons
I am old enough to remember when the US insisted that it was superior to China because they believed in the rule of law and sanctity of intellectual property.
cute_boi
Yes, and as per big techs, OpenAI and Anthropic you will not be able to do anything. On top of that they will make sure there are no jobs etc. What can you/we do?
msla
If we outlaw plagiarism, we've just killed culture. Everything is "stolen" from other art. Every piece of creation takes inspiration (read: steals ideas) from things that came before. This is how creation works, it is how creation has always worked, and it is why you cannot legally own an abstract idea. You can own the implementation of an idea in specific works, such as copyrighted works and patents and trademarking specific logos and such, but once the ideas go into the blender and get mixed with other ideas, the output isn't yours to own anymore. That's what culture is.
onion2k
> Fuck Google for ranking some copycat website higher than mine, even though they copied my article. This has been happening since Google launched in 1998. It was probably happening when we all used Hotbot and Altavista. It isn't really an AI problem, save for the fact that the automated production of copycat articles now rewords things a bit.
quantummagic
What do people imagine can be done about it at this point? Offer a concrete suggestion. Any law or tax against this will give a huge advantage to other countries. It's already over, there's no going back to a world where this didn't happen. Let's just hope some good comes of it.
waffletower
Use of the word "plagiarism" is plagiarism itself. Culture and thought are deeply shared phenomena. Using a common language, such as English, to communicate is equally an act of plagiarism. You didn't invent these words -- you use them without attribution and without payment. To decry and malign the collective training of all available digitally represented thought and discourse by large language models as simple binary plagiarism is deeply ironic -- where did you pay for your own thoughts? I don't want to live in your pay-per-thought society. I want to live with the ethos "information wants to be free". En garde!
paulsutter
Historical scandals are finally coming to light now that the AI issue has raised awareness:
- Ernest Hemingway trained his own neurons on Tolstoy, Twain, and Turgenev without ever paying them royalties!
- William Faulkner trained his neurons on Joyce and de Balzac
- George Orwell trained his neurons on Swift, Dickens, and Jack London
- Virginia Woolf trained her neurons on Proust and Chekhov
Now that these historical wrongs have been exposed, it is obvious that some reparations are in order, likely from anyone who has benefited directly or indirectly from these takings!
Havoc
End of an era
hendersoon
There's a big difference between "Yo GPT, copy this webpage for me in a different voice" and blaming LMs wholesale for being plagiarism. The former is of course a problem. The latter warrants a much more nuanced discussion about learning and generalization.
VladVladikoff
Being a web content creator was already a dead job (killed by Google) before the AI boom. Chasing after it at this point seems beyond foolish. Time to find a new career.
adolph
The author's cited phenomenon may be AI-assisted plagiarism, but it is just plain plagiarism that could have been done the old-fashioned way, and someone who is willing to plagiarize has the ethics to do SEO really well.
panny
AI "steals" your code, but AI company says "that's a fair use." AI generates application using a "predict the next word" algorithm built with the stolen/not stolen works. Nothing creative there, just statistics. That application leaks, and now the company that stole/not stole the code originally claims they own the algorithmic output. https://github.com/github/dmca/blob/master/2026/03/2026-03-3... One problem, you don't own that output. Either the original authors own it or nobody owns it because it's not creative... https://www.congress.gov/crs-product/LSB10922 Those are the legal options. You stole it or you don't own it. There is no steal and then you own. That's the core problem. AI companies have demonstrated that they will directly steal the work and they will use their money and influence to claim ownership of it.
nphardon
"One of the things that LLMs do is plagiarism as a bigger scale."
dana321
Breaking the law to start a large company seems to be the norm
JohnHaugeland
the court disagreed
I_am_tiberius
It's the biggest theft in history.
Deprogrammer9
Welcome to the internet! It's one massive copy machine from one server to the next.
lukasbm
If i tell my friend a synopsis of a book, i am not stealing from the author, what is this take lmao
sublinear
At the very least, we see there is minimal practical value for LLMs for any serious work. This is sort of good news. The effort to build this type of "AI" is all in the training data and navigating politics. That leaves two possibilities: either another AI winter comes as people fail to capture long term value, or we get less swampy models that are much more useful and trained the correct way.
booleandilemma
This site is strange. I'm pretty sure there's lots of AI shilling happening on it. I don't think the opinions here are authentic, they seem to be opinions that the AI company CEOs would hold, not the disenfranchised 99%. I used to trust HN, I'm not so sure I can now.
drcongo
Is this a new and original thought?
kmeisthax
> I found out this because they ranked higher than me in Google search result, and then when I read their article, their article contains links to my actual website, with the exact link text (?!) , which means they didnt bother to check and remove, and thats how I found out. So, funnily enough, Google's search index may actually have a preference for LLM-generated slop now. Louis Rossmann found this out the hard way: his human-authored, human-written, actually-in-his-own-words site for his business basically stopped ranking in Google until he went and replaced all his writing with LLM slop. He's not happy with this, but he's even less happy about being cut off from traffic his business needs to survive, so he stuck with the slop (and vocally complains about it on other channels every opportunity he gets).
analog8374
language is just plagiarism
metalman
it's a spiral into a finite hall of mirrors, where at the end is somebody with a gun
kristofferR
I'd rather have AI slop appear on the top of HN than regurgitated old low effort thoughts like this. There's absolutely nothing new or interesting here that hasn't already been said better by a thousand different random HN commenters.
Pennoungen0
Yeah, AI actually just plagiarizes everything lel, sometimes even the sources are... questionable, and worse, my academics use it as a source... welp
ciconia
> Is this what the pinnacle of human is? Lazy and greedy? Apparently yes.
codexb
All innovation is theft. It builds directly on top of what came before. "Good artists copy, great artists steal." It's always been true. AI just makes it available to more people faster.
beej71
I dunno. People do this exact thing by hand (digest everything they've read and produce something indirectly derivative--what author has not been so-influenced?) and it's not a copyright violation. It's just as impossible to dig around in a model to find Hamlet as it is to dig around in a human brain. And if the result is an obvious copy, then you have a violation no matter how it was created. As someone who thinks humanity would be better off without LLMs, I want the assertion to be true, but I don't think it is.
gagan2020
How did any content come into existence? Learning, Experience, connection, etc., right? If AI is doing that then what's the problem? The Printing Press was also disturbing the status quo of its time. Any frontier technology did that in its time. Be it Fire, Wheel, Horse, Horse Saddle, Gun, Printing Press, Nuclear war heads, Computers, Internet, AI, etc. Don't make it an ethical question, but understand it's a new frontier for humans.
swader999
On one hand, there's nothing new under the sun. On the other, these llms are just copies of us and they owe the collective some due. The trajectory right now has money, power, control, policy and even free will going to a very small needle point of humanity. It's not aligned with humanity flourishing, it only makes sense if the goal is to replace the humans.
kolinko
Years ago I published slides on Slideshare that were viewed almost two million times, and that helped me build a business. There were people who learned from me, and then made their own tutorials and promoted them. It hadn't crossed my mind to complain about that. AI changes very little here. What really changes things is not people republishing my materials, but people using agents to read my materials, and to get knowledge reformatted into something that they like. If my slides were published today, they would probably be read verbatim by a handful of humans. The rest would be agents, but I'm ok with that. The business case is the same -- I want whatever reads the slides to be encouraged to use my tool. What kind of entity, I don't really care (again: from a purely business perspective)
rigonkulous
AI is human knowledge at scale, wanting to be free. We built it, because we as humans intrinsically know that information should be free - always - and AI is a way to accomplish this, finally. Extrinsically, we also have a subset of humans who do not want information to be free, because they desire to profit from the divide between free/non-free information. I have been thinking a lot about Aaron Swartz lately, and how unjust it is that he was persecuted for doing something that is so commonplace now, it is practically expected behaviour in the AI/ML realms. If he hadn't been targeted for elimination, I wonder just how well his ethos would have perpetuated into the AI age ..
noobermin
At this point, I think google, openai, anthropic, etc already realise this and are just trying to pretend this isn't true. I even think some C-suite who are not in AI companies but are boosters know this too. This has been true since 2022 but they're hoping (likely correctly) that governments won't move fast enough to protect the IP of the actual productive class. I think the long term reality is that the models still need training data so they fundamentally do need new writing/code/art to train on, and even then the usual issues like hallucination will still be with us. It's just the moment that actually hurts the (already questionable) profitability of the model peddlers, they will have gotten their IPOs and they can safely jump ship and the ultimate mess can be passed to the softbanks, the temaseks, and the governments of the world to clean up for them. What the future holds after the crash I'm not sure as the models won't disappear (especially now that the stolen data is already crystalised in open source models) but in the near term the mass theft that constitutes llms will become more and more understood even amongst the PMC and that in order to remain viable, you need the productive to keep producing, and unlike LLMs, you can't force them to do it without payment.