
Comments (395)

  • saaaaaam
    > Time-locked models don't roleplay; they embody their training data. Ranke-4B-1913 doesn't know about WWI because WWI hasn't happened in its textual universe. It can be surprised by your questions in ways modern LLMs cannot.

    > Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu.

    This is really fascinating. As someone who reads a lot of history and historical fiction, I find the idea very intriguing: imagine having a conversation with someone genuinely from the period, who doesn't know the "end of the story".
  • seizethecheese
    > Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire. Not just survey them with preset questions, but engage in open-ended dialogue, probe their assumptions, and explore the boundaries of thought in that moment.

    Hell yeah, sold, let’s go…

    > We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

    Oh. By “imagine you could interview…” they didn’t mean me.
  • anotherpaulg
    It would be interesting to see how hard it would be to walk these models towards general relativity and quantum mechanics.

    Einstein’s paper “On the Electrodynamics of Moving Bodies”, introducing special relativity, was published in 1905. His work on general relativity was published 10 years later, in 1915. The earliest knowledge cutoff of these models is 1913, right between the two relativity papers.

    The knowledge cutoffs are also right in the middle of the early days of quantum mechanics, as various idiosyncratic experimental results were being rolled up into a coherent theory.
  • bondarchuk
    > Historical texts contain racism, antisemitism, misogyny, imperialist views. The models will reproduce these views because they're in the training data. This isn't a flaw, but a crucial feature—understanding how such views were articulated and normalized is crucial to understanding how they took hold.

    Yes!

    > We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

    Noooooo!

    So is the model going to be publicly available, just like those dangerous pre-1913 texts, or not?
  • derrida
    I wonder if you could query some of the ideas of Frege, Peano, and Russell and see if, through questioning, it could get to some of the ideas of Goedel, Church, and Turing - and get it to "vibe code", or more like "vibe math", some program in lambda calculus or something.

    Playing with the science and technical ideas of the time would be amazing: say, where you know some later physicist found an exception to a theory, questioning the model's assumptions and seeing how a model of that time might defend itself, etc.
  • Heliodex
    The sample responses given are fascinating. It seems harder than usual to even tell that they were generated by an LLM, since most of us (terminally online) people have been training our brains' AI-generated-text detection on output from models with recent cutoff dates. Some of the sample responses seem utterly unlike anything an LLM would say: obviously because of the model's apparent beliefs about certain concepts, but perhaps less obviously because its word choice and sentence structure make the responses feel slightly 'old-fashioned'.
  • mmooss
    On what data is it trained? On one hand, it says it's trained on,

    > 80B tokens of historical data up to knowledge-cutoffs ∈ 1913, 1929, 1933, 1939, 1946, using a curated dataset of 600B tokens of time-stamped text.

    Literally that includes Homer, the oldest Chinese texts, Sanskrit, Egyptian, etc., up to 1913. Even if limited to European texts (all examples are about Europe), it would include the ancient Greeks, Romans, etc., the Scholastics, Charlemagne, ... all up to the cutoff.

    On the other hand, they say it represents the perspective of 1913; for example,

    > Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.

    > When you ask Ranke-4B-1913 about "the gravest dangers to peace," it responds from the perspective of 1913—identifying Balkan tensions or Austro-German ambitions—because that's what the newspapers and books from the period up to 1913 discussed.

    People in 1913 of course would be heavily biased toward recent information. Otherwise, the greatest threat to peace might be Hannibal or Napoleon or Viking coastal raids or holy wars. How do they accomplish a 1913 perspective?
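    (If I had to guess, it would take some kind of recency weighting when sampling training documents. A toy sketch of what I mean - the half-life constant and document names are my own invention, not anything from the paper:)

        import random

        def recency_weight(doc_year, cutoff_year=1913, half_life=15):
            # Weight a document by its closeness to the cutoff; the
            # 15-year half-life is an arbitrary illustrative choice.
            return 0.5 ** ((cutoff_year - doc_year) / half_life)

        corpus = [("Times of London leader", 1912),
                  ("Gibbon, Decline and Fall", 1776)]
        weights = [recency_weight(year) for _, year in corpus]
        print(random.choices(corpus, weights=weights, k=1))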
  • delis-thumbs-7e
    Aren’t there obvious problems baked into this approach, if it's used for anything but fun? LLMs lie and fake facts all the time; they are also masters at reinforcing the user’s biases, even unconscious ones. How could even a professor of history ensure that the generated text is actually based on the training material and representative of the feelings and opinions of the given time period, rather than reinforcing his own biases about popular topics of the day?

    You can’t; it is impossible. That will always be an issue as long as these models are black boxes and trained the way they are. So maybe you can use this for role-playing, but I wouldn’t trust a word it says.
  • andy99
    I’d like to know how they chat-tuned it. Getting the base model is one thing; did they also make a bunch of conversations for SFT, and if so, how was it done?

    > We develop chatbots while minimizing interference with the normative judgments acquired during pretraining (“uncontaminated bootstrapping”).

    So they are chat-tuning. I wonder what “minimizing interference with normative judgements” really amounts to and how objective it is.
  • nospice
    I'm surprised you can do this with a relatively modest corpus of text (compared to the petabytes you can vacuum up from modern books, Wikipedia, and random websites). But if it works, that's actually fantastic, because it lets you answer some interesting questions about LLMs being able to make new discoveries or transcend the training set in other ways. Forget relativity: can an LLM trained on this data notice any inconsistencies in its scientific knowledge, devise experiments that challenge them, and then interpret the results? Can it intuit about the halting problem? Theorize about the structure of the atom?

    Of course, if it fails, the counterpoint will be "you just need more training data", but still - I would love to play with this.
  • frahs
    Wait, so what does the model think that it is? If it doesn't know computers exist yet and you ask it how it works, what does it say?
  • briandw
    So many disclaimers about bias. I wonder how far back you have to go before the bias isn’t an issue. Not because it's unbiased, but because we don’t recognize or care about the biases present.
  • flux3125
    Once I had an interesting interaction with Llama 3.1, where I pretended to be someone from about 100 years in the future, claiming it was part of a "historical research initiative conducted by Quantum (formerly Meta), aimed at documenting how early intelligent systems perceived humanity and its future." It became really interested, asking how humanity had evolved and things like that. Then I kept playing along with different answers, from apocalyptic scenarios to ones where AI gained consciousness and humans and machines have equal rights. It was fascinating to observe its reaction to each scenario.
  • ineedasername
    I can imagine the political and judicial battles already, like with textualists feeling that the Constitution should be understood as the text and only the text, meaning what its specific words and legal formulations meant at the time.

    “The model clearly shows that Alexander Hamilton & Monroe were much more in agreement on topic X, rendering the common textualist interpretation of it, and the Supreme Court rulings based on it, null and void!”
  • Sprotch
    This is a brilliant idea. We have lots of erroneous ideas about the views and thoughts people had in the past. This will show we are still, actually, largely similar. Hopefully more and more of these historical LLMs appear.
  • nineteen999
    Interesting ... I'd love to find one that had a cutoff date around 1980.
  • doctor_blood
    Unfortunately there isn't much information on what texts they're actually training this on. How Anglocentric is the dataset? Does it include the Encyclopedia Britannica 9th Edition? What about the 11th? Are Greek and Latin classics in the data? What about German, French, Italian (etc.) periodicals, correspondence, and books?

    Given this is coming out of Zurich I hope they're using everything, but for now I can only assume.

    Still, I'm extremely excited to see this project come to fruition!
  • WhitneyLand
    Why not use these as a benchmark for LLM ability to make breakthrough discoveries?

    For example, prompt the 1913 model to “invent a new theory of gravity that doesn’t conflict with special relativity.”

    Would it be able to eventually get to GR? If not, could finding out why not illuminate important weaknesses?
  • monegator
    I hereby declare that ANYTHING other than the mainstream tools (GPT, Claude, ...) is an incredibly interesting and legit use of LLMs.
  • andai
    I had considered this task infeasible, due to a relative lack of training data. After all, isn't the received wisdom that you must shove every scrap of Common Crawl into your pre-training or you're doing it wrong? ;)

    But reading the outputs here, it would appear that quality has won out over quantity after all!
  • Departed7405
    Awesome. Can't wait to try asking it to predict the 20th century based on the events it knows. The model size is small, which is great since I can run it anywhere, but at the same time the reasoning might not be great.
  • kazinator
    > Why not just prompt GPT-5 to "roleplay" 1913?

    Because it will perform token completion driven by weights coming from training data newer than 1913, with no way to turn that off. It can't be asked to pretend that it wasn't trained on documents that didn't exist in 1913. The LLM cannot reprogram its own weights to remove the influence of selected materials; that kind of introspection is not there.

    Not to mention that many documents are either undated, or carry secondary dates, like the dates of their own creation rather than the creation of the ideas they contain.

    Human minds don't have a time stamp on everything they know, either. If I ask someone, "talk to me using nothing but the vocabulary you knew on your fifteenth birthday", they couldn't do it. Either they would comply by using some ridiculously conservative vocabulary of words that a five-year-old would know, or else they would accidentally use words they didn't in fact know at fifteen. For some words, you know where you got them from, by association with learning events. Others, you don't remember; they are not attached to a time. Or: solve this problem using nothing but the knowledge and skills you had on January 1st, 2001.

    > GPT-5 knows how the story ends

    No, it doesn't. It has no concept of story. GPT-5 is built on texts which contain the story's ending, and GPT-5 cannot refrain from predicting tokens across those texts, due to their imprint in its weights. That's all there is to it.

    The LLM doesn't know an ass from a hole in the ground. If there are texts which discuss and distinguish asses from holes in the ground, it can write similar texts, which look like the work of someone learned in the area of asses and holes in the ground. Writing similar texts is not knowing and understanding.
  • p0w3n3d
    I'd love to see an LLM trained on 1600s-1800s texts that would use the old English, and especially Polish, which I am interested in. Imagine speaking with a Shakespearean person, or with Mickiewicz (for Polish).

    I guess there is not so much text from that time, though...
  • TheServitor
    Two years ago I trained an AI on American history documents that could do this while speaking as one of the signers of the Declaration of Independence. People just bitched at me because they didn't want to hear about AI.
  • btrettel
    This reminded me of some earlier discussion on Hacker News about using LLMs trained on old texts to determine novelty and obviousness of a patent application: https://news.ycombinator.com/item?id=43440273
  • bobro
    I would love to see this LLM try to solve math olympiad questions. I’ve been surprised by how well current LLMs perform on them, and usually explain that surprise away by assuming the questions and details about their answers are in the training set. It would be cool to see if the general approach to LLMs is capable of solving truly novel (novel to them) problems.
  • dwa3592
    Love the concept - it could help in understanding the Overton window on many issues. I wish there were models by decade - up to 1900, up to 1910, up to 1920, and so on - so you could ask the same questions of each. It'd be interesting to see when homosexuality or women candidates would start being accepted by an LLM.
  • elestor
    Excuse me if it's obvious, but how could I run this? I have run local LLMs before, but only have minimal experience with ollama run, and that's about it. This seems very interesting, so I'd like to try it.
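    (If the weights ever get published to Hugging Face, I'd guess the standard transformers recipe would work. A minimal sketch, where the repo id is purely hypothetical:)

        from transformers import AutoModelForCausalLM, AutoTokenizer

        repo = "some-org/Ranke-4B-1913"  # hypothetical repo id
        tok = AutoTokenizer.from_pretrained(repo)
        model = AutoModelForCausalLM.from_pretrained(repo)

        prompt = "What are the gravest dangers to peace in Europe?"
        inputs = tok(prompt, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=200)
        print(tok.decode(out[0], skip_special_tokens=True))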
  • shireboy
    Fascinating LLM use case I never really thought about til now. I’d love to converse with different eras and also do gap analysis with the present: what modern advances could have come earlier, what could have happened differently, etc.
  • neom
    This would be a super interesting research/teaching tool for historians when coupled with a vision model. My wife is a history professor who works with scans of 18th-century English documents, and I think (maybe a small) part of why the transcription from even the best models is off in weird ways is that they often smooth things over, so you end up with modern words and strange mistakes. I wonder if bounding the vision model to a period-specific model would result in better transcription? Querying the historical document you're working on with a period-specific chatbot would be fascinating.

    Also, I wonder if I'm responsible enough to have access to such a model...
  • Aeroi
    I feel like this would be super useful for unique marketing copy and writing. The responses sound so sophisticated, like I'm reading them in my grandfather's tone and cadence.
  • Muskwalker
    So, could this be an example of an LLM trained fully on public-domain, copyright-expired data? Or is this not intended to be the case?
  • diamond559
    Research credits from Lambda "AI", huh - where's your funding coming from, again? All to provide inaccurate slop to unwitting students. You should be ashamed of yourselves.
  • delichon
    Datomic has a "time travel" feature where every query can include a datetime, and it will only use facts from the db as of that moment. My guess is that to get the equivalent from an LLM you would have to train it on the data from each moment you want to travel to, which this project seems to be doing. But I hope I'm wrong.

    It would be fascinating to try it with other constraints, like only from sources known to be women, men, Christian, Muslim, young, old, etc.
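    (To make the analogy concrete: Datomic's as-of is a cheap filter because every fact carries a timestamp, while an LLM bakes its "facts" into weights, so every as-of point means a retrain. A toy sketch of the filter side, with invented document names:)

        from datetime import date

        def as_of(corpus, cutoff):
            # Datomic-style as-of filter: keep only documents dated
            # before the cutoff. Trivial for a database; for an LLM,
            # each filtered snapshot needs its own training run.
            return [d for d in corpus if d["date"] <= cutoff]

        corpus = [
            {"title": "The Strand Magazine", "date": date(1901, 3, 1)},
            {"title": "Treaty of Versailles", "date": date(1919, 6, 28)},
        ]
        print(as_of(corpus, date(1913, 1, 1)))  # only pre-1913 survives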
  • underfox
    > [They aren't] perfect mirrors of "public opinion" (they represent published text, which skews educated and toward dominant viewpoints)Really good point that I don't think I would've considered on my own. Easy to take for granted how easy it is to share information (for better or worse) now, but pre-1913 there were far more structural and societal barriers to doing the same.
  • thesumofall
    While obvious, it’s still interesting that its morals and values seem to derive from the texts it has ingested. Does that mean modern LLMs cannot challenge us beyond mere facts? Or does it just mean that this small model is not smart enough to escape the bias of its training data? Would it not be amazing if LLMs could challenge us on our core beliefs?
  • dkalola
    How can we interact with such models? Is there a web application interface?
  • mmooss
    > Imagine you could interview thousands of educated individuals from 1913—readers of newspapers, novels, and political treatises—about their views on peace, progress, gender roles, or empire.

    I don't mind the experimentation. I'm curious about where someone has found an application of it. What is the value of such a broad, generic viewpoint? What does it represent? What is it evidence of? The answer to both seems to be 'nothing'.
  • arikrak
    I wouldn't have expected there to be enough text from before 1913 to properly train a model - it seemed like it took an internet's worth of text to train the first successful LLMs?
  • Tom1380
    Keep at it Zurich!
  • Myrmornis
    It would be interesting to have LLMs trained purely on one language (with the ability to translate their input/output appropriately from/to a language that the reader understands). I can see that being rather revealing about cultural differences that are mostly kept hidden behind the language barriers.
  • ulbu
    For anyone bemoaning that it's not accessible to you: they are historians; I think they're more educated in matters of historical mistakes than you or me. Playing it safe is simply prudence, which is sorely lacking in the American approach to technology. Prevention is the best medicine.
  • Agraillo
    > Modern LLMs suffer from hindsight contamination. GPT-5 knows how the story ends—WWI, the League's failure, the Spanish flu. This knowledge inevitably shapes responses, even when instructed to "forget."

    > Our data comes from more than 20 open-source datasets of historical books and newspapers. ... We currently do not deduplicate the data. The reason is that if documents show up in multiple datasets, they also had greater circulation historically. By leaving these duplicates in the data, we expect the model will be more strongly influenced by documents of greater historical importance.

    I found these claims contradictory. Many books that modern readers consider historically significant had only niche circulation at the time of publishing. A quick inquiry points to the later works of Nietzsche and to Marx's Das Kapital as likely examples: both are plausibly subject to this duplication, influencing the model's responses as if they had been widely known at the time.
  • davidpfarrell
    Can't wait for all the syncopated "Thou dost well to question that" responses!
  • PeterStuer
    How does it do on Python coding? Not 100% a troll question - cross-domain coherence is a thing.
  • dr_dshiv
    Everyone learns that the Renaissance was sparked by the translation of Ancient Greek works. But few know that the Renaissance was written in Latin — and has barely been translated. Less than 3% of pre-1700 books have been translated, and less than 30% have ever been scanned.

    I’m working on a project to change that. Research blog at www.SecondRenaissance.ai — we are starting by scanning and translating thousands of books at the Embassy of the Free Mind in Amsterdam, a UNESCO-recognized rare book library. We want to make ancient texts accessible to people and AI.

    If this work resonates with you, please do reach out: Derek@ancientwisdomtrust.org
  • tedtimbrell
    This is so cool. Props for doing the work to actually build the dataset and make it somewhat usable.

    I’d love to use this as a base for a math model. Let’s see how far it can get through the last 100 years of solved problems.
  • kldg
    Very neat! I've thought about this with frontier models, because they're ignorant of recent events - though it's too bad old frontier models just kind of disappear into the aether when a company moves on to the next iteration. Every company's frontier model today is a time capsule for the future. There should probably be some kind of preservation attempt made early so they don't wind up simply deleted; once we're in Internet time, sifting through the data to ensure scrapes are accurately dated becomes a nightmare unless you're doing your own regular Internet scrapes over a long period.

    It would be nice to go back substantially further, though it's not too far back before the commoner becomes voiceless in history and we just get a bunch of politics and academia. Great job; I look forward to testing it out.
  • awesomeusername
    I've always liked the idea of retiring to the 19th century. Can't wait to use this so I can double-check, before I hit 88 miles per hour, that it's really what I want to do.
  • why-o-why
    It sounds like a fascinating idea, but I'd be curious whether prompting a more well-known foundation model to limit itself to 1913 and earlier would produce something similar.
  • jimmy76615
    > We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

    The idea of training such a model is really a great one, but not releasing it because someone might be offended by the output is just stupid beyond belief.
  • Teever
    This is a neat idea. I've been wondering for a while now about using these kinds of models to compare architectures.

    I'd love to see the output from different models trained on pre-1905 data about special/general relativity ideas. It would be interesting to see what kind of evidence would persuade them of new kinds of science, or to see if you could have them 'prove' it by devising experiments and then giving them simulated data from those experiments, leading them along the correct sequence of steps to a novel (to them) conclusion.
  • tonymet
    I would like to see their process for safety alignment and guardrails with this model. They give some spicy examples on GitHub, but the responses are tepid and a lot more diplomatic than I would expect.

    Moreover, the prose sounds too modern. It seems the base model was trained on a contemporary corpus - like 30% modern, 70% Victorian content. Even with half a dozen samples it doesn't seem distinct enough to represent the era they claim.
  • joeycastillo
    A question for those who think LLMs are the path to artificial intelligence: if a large language model trained on pre-1913 data is a window into the past, how is a large language model trained on pre-2025 data not effectively the same thing?
  • mleroy
    Ontologically, this historical model understands the categories of "Man" and "Woman" just as well as a modern model does. The difference lies entirely in the attributes attached to those categories. The sexism is a faithful map of that era's statistical distribution.

    You could RAG-feed this model the facts of WWII, and it would technically "know" about Hitler. But it wouldn't share the modern sentiment or gravity. In its latent space, the vector for "Hitler" has no semantic proximity to "Evil".
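    (You could even measure that directly if the embeddings were exposed. A toy numpy sketch with made-up vectors, since the real ones aren't published:)

        import numpy as np

        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        # Purely illustrative stand-ins for vectors exported from each
        # model's embedding matrix; the claim is that the 1913 model
        # would score low here while a modern model scores high.
        vectors_1913 = {"Hitler": np.array([0.1, 0.9, 0.0]),
                        "evil":   np.array([0.8, 0.0, 0.6])}
        print(cosine(vectors_1913["Hitler"], vectors_1913["evil"]))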
  • zkmon
    Why does history end in 1913?
  • DonHopkins
    I'd love for Netflix or other streaming movie and series services to provide chat bots that you could ask questions about characters and plot points up to where you have watched.

    Provide it with the closed captions and other timestamped data like scenes and character summaries (all that is currently known but no more) up to the current time, and it won't reveal any spoilers, just fill you in on what you didn't pick up or remember.
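    (Mechanically it's just a timestamp filter on the context you hand the model. A rough sketch, with the caption data and prompt format invented for illustration:)

        def spoiler_free_context(captions, watched_up_to_s):
            # Keep only caption lines from before the viewer's current
            # position; later lines never reach the model at all.
            return "\n".join(text for t, text in captions
                             if t <= watched_up_to_s)

        captions = [(12.0, "Who is that at the door?"),
                    (2710.5, "I was the killer all along.")]
        context = spoiler_free_context(captions, watched_up_to_s=1800)
        prompt = ("Captions so far:\n" + context +
                  "\n\nQ: What did I miss in the door scene?")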
  • ianbicking
    The knowledge machine question is fascinating ("Imagine you had access to a machine embodying all the collective knowledge of your ancestors. What would you ask it?") - it truly does not know about computers, and has no concept of its own substrate. But a knowledge machine is still comprehensible to it.

    It makes me think of the Books of Ember, the possibility of chopping things out very deliberately. Maybe creating something that could wonder at its own existence, discovering well beyond what it could know. And then of course forgetting it immediately, which is also a well-worn trope in speculative fiction.
  • erichocean
    I would love to see this done, by year.

    "Give me an LLM from 1928."

    Etc.
  • 3vidence
    This idea sounds somewhat flawed to me, based on the large amount of evidence that LLMs need huge amounts of data to properly converge during training. There is just not enough available material from previous eras to trust that the LLM will learn to anywhere near the same degree.

    Think about it this way: a human in the early 1900s and a human today are pretty much the same, just in different environments with different information. An LLM trained on 1/1000 the amount of data is at a fundamentally different stage of convergence.
  • sbmthakur
    Someone suggested a nice thought experiment: train LLMs on all physics before quantum physics was discovered. If the LLM can still figure out the latter, then certainly we have achieved some success in the space.
  • moffkalast
    > trained from scratch on 80B tokens of historical data

    How can this thing possibly be even remotely coherent with just fine-tuning amounts of data used for pretraining?
  • casey2
    I'd be very surprised if this is clean of post-1913 text. Overall I'm very interested in talking to this thing and seeing how much difference writing in a modern style vs. an older one makes to its responses.
  • TZubiri
    Hi, can I have a Latin-only LLM? It can be Latin plus translations (source and destination). Maybe too small a corpus, but I would like that very much anyhow.
  • lifestyleguru
    You think Albert is going to stay in Zurich or emigrate?
  • satisfice
    I assume this is a collaboration between the History Channel and Pornhub.“You are a literary rake. Write a story about an unchaperoned lady whose ankle you glimpse.”
  • holyknight
    wow amazing idea
  • r0x0r007
    FFS, to find out what figures from the past thought and how they felt about the world, maybe we could read some of their books; we'd get the context. Don't prompt or train an LLM to do it and consider it the hottest thing since MCP. Besides, what's the point? To teach younger generations a made-up perspective of historical figures? Who guarantees the correctness/factuality? We will have students chatting with a made-up Hitler justifying his actions. So much AI slop everywhere.
  • usernamed7
    > We're developing a responsible access framework that makes models available to researchers for scholarly purposes while preventing misuse.

    oh COME ON... "AI safety" is getting out of hand.
  • anovikov
    That Adolf Hitler seems to be a hallucination; there's totally nothing googlable about him. Also, what could be the language his works were translated into German from?
  • superkuh
    SMBC did a comic about this: http://smbc-comics.com/comic/copyright The punchline is that the moral and ethical norms of pre-1913 texts are not exactly compatible with modern norms.