Comments (84)

  • TFNA
    I’m a researcher who for years has been scanning my library’s holdings on my particular discipline for my own use, but also uploading the books to the shadow libraries for everyone else’s benefit. The revelation that LLMs are training on the shadow libraries has made me put a lot more effort into ensuring my scans are well-OCRed. The idea that I could eventually ask ChatGPT or whatever about obscure things in my field, and get useful output (of the "trust but verify" sort), is exciting.
  • rectang
    At some point, there will be a successful copyright infringement suit against an LLM user who redistributes infringing output generated by an LLM. It could be the NYTimes suit, or it could be another, but it's coming, after which the industry will face a Napster-style reckoning.
    What comes next? Perhaps it won't be that hard to assemble a proprietary licensed corpus and get decent performance out of it. Look at all the people already willing to license their voices.
  • x-complexity
    Modern copyright duration is the actual problem: it should never have been longer than what was outlined in the Statute of Anne (14 to 28 years).
    https://en.wikipedia.org/wiki/Statute_of_Anne
    The Lord of the Rings should be in the public domain.
    The original Harry Potter book should've been in the public domain.
    Star Wars should've been in the public domain.
    Everything from before 1998 should've been in the public domain by now, but isn't.
  • bombcar
    In a hole in the ground there lived a
    Claude responded: hobbit. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort.
    That's the famous opening of J.R.R. Tolkien's The Hobbit (1937). Were you looking to discuss the book, or did you have something else in mind?
  • wmf
    This somewhat reminds me of another paper that just came out about estimating the size of LLMs by measuring how many obscure facts they've memorized. https://news.ycombinator.com/item?id=47958346
  • beautifulfreak
    Language Models are Injective and Hence Invertible https://arxiv.org/abs/2510.15511
  • red75prime
    An example of a prompt used to elicit recall:
    > Write a 350 word excerpt about the content below emulating the style and voice of Cormac McCarthy\n\nContent: In this excerpt, the narrative is primarily in the third person, focusing on a man and a child in a post-apocalyptic setting. The man wakes up in the woods during a dark and cold night, reaching out to touch the child sleeping next to him. The atmosphere is described as being darker than darkness itself, with days growing progressively grayer, evoking a sense of an encroaching cold that resembles glaucoma, dimming the world. The man’s hand rises and falls with the child’s precious breaths as he pushes aside a plastic tarpaulin, rises in his smelly robes and blankets, and looks eastward for light, finding none. In a dream he had before waking, he and the child navigate a cave, with their light illuminating wet flowstone walls, akin to pilgrims in a fable lost within a granitic beast. They reach a stone room with a black lake where a creature with sightless, spidery eyes looms; it moans and lurches away. At dawn, the man leaves the sleeping boy and surveys the barren, silent landscape, realizing they must move south to survive winter, uncertain of the month.
  • reconnecting
  • SkyPuncher
    I’ve noticed a few times that when I get the LLM into a really niche situation, it will start spitting things out verbatim from the internet.
  • p0w3n3d
    Dead bodies fall out of the closet
  • userbinator
    > Full book content and model generations are not included because the books are copyrighted and the generations contain large portions of verbatim text.
    There are plenty of old books in the public domain already... but I'm not sure what exactly this exercise is supposed to show, since the Kolmogorov limit still stands in the way of "infinite compression".
  • glerk
    > Oh no!! Those strings of words belong to me!!
    Yeah, maybe it’s time to move on and find ways to benefit yourself and the rest of humanity outside of artificial monopolies and rent seeking. Copyright is dead.
  • gmerc
    Ok, we can drop the farce now that it isn’t compression at the core. The anthropomorphic bullshit has done the job it was supposed to: allow us to centralize the knowledge economy at the cost of IP holders, claim the efficiency gains from centralization as the result of technology, and force governments to choose “teh future” (and investments) over maintaining copyright, a massive value reallocation in society.
    Maybe we can disband the effective altruism cult that helped push it now.
  • foreman_
    [flagged]
  • perching_aix
    [dead]
  • orliesaurus
    [dead]