EuroLLM: LLM made in Europe built to support all 24 official EU languages


NotInOurNames

Comments (486)

adzm
For those curious, the 24 official languages are Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, and Swedish.

Maltese, interestingly, is the only Afro-Asiatic derived language.

Hungarian, Finnish, and Estonian are the three Uralic languages.

All the others are Indo-European, Greek being the only Hellenic one, Irish the only Celtic, the rest are Baltic, Slavic, Italic, or Germanic.

(I originally used the term Balto-Slavic, though I was unaware of some of the connotations of that term until just now. Baltic and Slavic do share a common origin, but that was a very very long time ago)
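The grouping described above can be written down as a small lookup table and checked for completeness. This is only a sketch: the name `FAMILIES` is made up for illustration, and the assignment of individual languages to the Baltic, Slavic, Italic, and Germanic branches follows the standard classification rather than anything spelled out in the comment.

```python
# Hypothetical lookup table: branch/family -> official EU languages.
# Branch membership for Baltic/Slavic/Italic/Germanic is an assumption
# based on standard linguistic classification, not taken from the comment.
FAMILIES = {
    "Afro-Asiatic": ["Maltese"],
    "Uralic": ["Estonian", "Finnish", "Hungarian"],
    "Hellenic": ["Greek"],
    "Celtic": ["Irish"],
    "Baltic": ["Latvian", "Lithuanian"],
    "Slavic": ["Bulgarian", "Croatian", "Czech", "Polish", "Slovak", "Slovenian"],
    "Italic": ["French", "Italian", "Portuguese", "Romanian", "Spanish"],
    "Germanic": ["Danish", "Dutch", "English", "German", "Swedish"],
}

# Branches that belong to the Indo-European family.
INDO_EUROPEAN = ["Hellenic", "Celtic", "Baltic", "Slavic", "Italic", "Germanic"]

# Flatten the table and count languages overall and within Indo-European.
all_languages = [lang for langs in FAMILIES.values() for lang in langs]
total = len(all_languages)
indo_european_total = sum(len(FAMILIES[branch]) for branch in INDO_EUROPEAN)

print(total, indo_european_total)
```

Under these assumptions the table covers all 24 languages, 20 of them Indo-European.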
whimsicalism
Actually nuts to me the degree to which European policymakers do not even begin to understand how to kickstart technologically-intensive industry. Anyone who has seen close-up the results of a "pick the winners" grant-style approach to innovation knows what will go wrong here.

Also funny to read this narrative of how access to the European 'supercomputer' cluster is going. https://x.com/levelsio/status/1981485945745788969
Stagnant
Title is missing "(2024)". The 9B model was released last December [0].

[0]: https://sites.google.com/view/eurollm/home
fcanesin
Nice, congrats. But that O looks like an ass.
loandbehold
Aren't all frontier models already able to use all these languages? Support for specific languages doesn't need to be built in, LLMs support all languages because they are trained on multilingual data.
htrp
>The EuroLLM Team brings together some of the brightest minds in AI including Unbabel, Instituto Tecnico Lisbon, the University of Edinburgh, Instituto de Telecommunicacoes, Université Paris-Saclay, Aveni, Sorbonne University, Naver Labs, and the University of Amsterdam.

>Europe is the only continent in the world to have a large public network of supercomputers that are managed by the EuroHPC Joint Undertaking (EuroHPC JU). As soon as we received the EuroHPC JU access to the supercomputer, we were ready to roll up our sleeves and get to work. We developed the small model right away and in less than 6 months the second model was ready.

[1] https://www.eurohpc-ju.europa.eu/eurohpc-success-story-speak...

Repurposing some of that physics sim compute
adt
The EuroLLM-9B model release is from Dec/2024, and scores just above random chance for benchmarks like MMLU-Pro (17.6%, random chance is 10%).

Comparison with similar EU models + 600 other highlights: https://lifearchitect.ai/models-table/
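To put "just above random chance" in numbers, one common trick is to rescale accuracy so that guessing maps to 0 and a perfect score maps to 1. The sketch below assumes ten answer options per question, which is what a 10% chance rate implies; the function name `chance_adjusted` is made up for illustration.

```python
def chance_adjusted(accuracy: float, n_options: int) -> float:
    """Rescale accuracy so random guessing scores 0.0 and perfect scores 1.0."""
    chance = 1.0 / n_options
    return (accuracy - chance) / (1.0 - chance)


# 17.6% reported accuracy; 10 options assumed from the stated 10% chance rate.
adjusted = chance_adjusted(0.176, 10)
print(f"{adjusted:.3f}")
```

On that scale the reported 17.6% works out to roughly 0.084, i.e. about 8% of the distance between guessing and a perfect score.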
zyngaro
No GitHub link, a high-performance claim with 0 numbers. No technical details. 0 interesting links to learn more. But hey, a key people page is there, so that's OK.
hebejebelus
Some cursory clicking about didn't reveal to me the actual corpus they used, only that it is several trillion tokens 'divided across the languages'. I'm curious mainly because Irish (among some other similarly endangered languages on the list) typically has any large corpus come from legal/governmental texts that are required to be translated. There must surely be only a relatively tiny amount of colloquial Irish in the corpus. It would be interesting to see some evals in each language, particularly with native speakers.

I think LLMs may be on the whole very positive for endangered languages such as Irish, but before it becomes positive I think there's an amount of danger to be navigated (see the Scots Gaelic Wikipedia drama, for example).

In any case I think this is a great initiative.
srameshc
I was thinking the same: why are so many superior models coming only from countries like the US and China? And why are European countries not on the list, other than France with Mistral? Why are so few companies in India, Japan, or South Korea even close to a promising new model like what the Chinese companies produced?
extraduder_ire
From the EuroLLM-9B page on Hugging Face:

>You need to agree to share your contact information to access this model

Is this common? I've never seen it on the site before, and it isn't on the smaller model. What are they collecting this information for?
fbergen
EU should focus on making an attractive startup market and more European LLMs (and so many other things) will emerge
sireat
It is interesting how much traction this 9B model is getting, which is good.

Still, two months earlier a 30B-parameter model covering 19 European languages got almost no mention: https://huggingface.co/TildeAI/TildeOpen-30b

Mind you, that is another open model that is begging for fine-tuning (it is not very good out of the box).
DrNosferatu
1. It's a nice start, but the EU has to scale to Manhattan Project levels in order to properly compete with the US and China.
2. A credible, scaled-up effort on the EU's own silicon for AI compute wouldn't hurt either.
3. And this can only be achieved by vertical integration to combat fragmentation.
PeterStuer
Still not open weights?
sherinjosephroy
That’s a cool idea — training a multilingual model like that is ambitious. But I’m curious how well it’ll actually handle smaller EU languages compared to English or French. If it truly nails those, that’s a big win for accessibility.
supermatt
> It is fully open source and available via Hugging Face.

This model was released in 2024, and I couldn't find any links to the training data - is it just an open weights model?
sorenjan
If I want to use an LLM to do translation, should I use a base model or an instruction tuned version? I've had mixed results using the chat models and a simple "Translate this to <language>: "
KronisLV
Here are the models: https://huggingface.co/utter-project/models

I used the 9B Instruct version. Among the small models, it was the one with the best Latvian knowledge out there, bar none. GPT-OSS 20B, Qwen3 30B A3B, and similar ones weren't even close.

That said, the model itself was a little bit dumb and not something you'd really use for programming/autocomplete or tool calling or anything like that, which also presented some problems. Even for processing text, if you need RAG or tool server calls, you need to use something like Qwen3 for the actual logic and then pass the contents to EuroLLM for translation/formatting with the instructions, at which point your n8n workflow looks a bit messy and you also have to run those two models instead of only one.

Meanwhile, the best cloud model for Latvian that I've found so far was Google Gemini 2.5 Pro, but obviously you can't use cloud models in certain on-prem use cases.
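The two-model setup described above (one model for the logic, a second for translation/formatting) can be sketched as a plain function that chains two text generators. Everything here is hypothetical: `answer_in_language`, the prompt wording, and the stub generators are made up for illustration, and in practice `reason` and `translate` would wrap whatever local inference servers host the two models.

```python
from typing import Callable

# A generator is anything that maps a prompt string to a completion string.
Generate = Callable[[str], str]


def answer_in_language(
    question: str,
    target_language: str,
    reason: Generate,
    translate: Generate,
) -> str:
    """Stage 1: draft an answer with the reasoning model.
    Stage 2: hand the draft to the translation model."""
    draft = reason(question)
    prompt = (
        f"Translate the following text to {target_language}. "
        "Output only the translation.\n\n"
        f"{draft}"
    )
    return translate(prompt)


# Stub generators so the sketch runs without any model server.
def fake_reasoner(prompt: str) -> str:
    return "DRAFT: " + prompt


def fake_translator(prompt: str) -> str:
    # Pretend "translation" is upper-casing the last line of the prompt.
    return prompt.splitlines()[-1].upper()


result = answer_in_language("What is RAG?", "Latvian", fake_reasoner, fake_translator)
print(result)
```

Keeping both stages behind the same callable type makes it easy to swap either model out, though it does not remove the cost of running two models.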
nonethewiser
How does this work?

It seems like, in most ways, it would be bad to train on 24 separate languages. That's just 24 partitions of the data. It seems really inefficient, and better to simply train on the biggest language (English) and translate.

I do think this will introduce some biases that correlate with the English language. It would be interesting to see more specifically what this means. But regardless, I don't think you can produce a competitive model with such a large subdivision of training data.
ks2048
Their home page has a link labeled "Technical Report for EuroLLM", but it points to the same page as their other link, the release article on Hugging Face.

I suppose that's a typo; I found a technical report here: https://arxiv.org/abs/2506.04079
wildredkraut
Wow, this site, logo and everything are so ugly. But the fax-styled photos fit well with Europe's deficit.
danielam
Curiously, just came across this paper [0].

[0] https://arxiv.org/abs/2503.01996
dostick
What good does it do to include only the official languages? For example, there's no Russian, while there are now at least 8 million ethnic Russians living in Europe.
kreetx
I'm somewhat skeptical of taxpayer-funded innovation. I've seen a few Horizon grants from the side; as a citizen I'd prefer not to pay for them, but unfortunately I can't opt out.
trilogic
Great job, thank you.

We support your work and offer backup and distribution. Here is a copy just in case: https://hugston.com/uploads/llm_models/EuroLLM-22B-Instruct-...
rmoriz
Maybe we can call it "open weights" and not open source?
fulafel
See also Apertus: https://www.swiss-ai.org/apertus
aurintex
Is it planned to have a VLM or something comparable to Qwen3-VL in the future?
grigio
The benchmarks seem pretty bad. Why should somebody use it? Just because it's made in the EU?
Steen3S
If multi‑lang is the goal, why not translate the output of the big labs?
Zufriedenheit
EU officials should create an environment where abundant private companies can afford to put out many great open models instead of funding some selected individuals with taxpayer money.
elias_t
Are there any benchmarks that exist for those 24 languages?
tithos
I love the website design
ph4evers
How does it compare to Mistral’s model?
bogtog
They report benchmarks on the Hugging Face page (https://huggingface.co/utter-project/EuroLLM-9B).

They almost exclusively compare their model to prior models from 2024 or older and brag about "results comparable to Gemma-2-9B". I'm not sure what I expected. The eurollm.io homepage states "EuroLLM outperforms similar-sized models", which just seems like a lie for all practical purposes.

An overly charitable interpretation is that EuroLLM isn't a reasoning model and has minimal post-training, so they sought out comparisons to such models (they're still ignoring reasoning models that have non-reasoning modes).
websku
I'm looking to try this for ActorDO
memet_rush
Hopefully Albanian is added one day!
fodkodrasz
An excellent goal, I hope they make it a success!
cess11
In this vein there's also the recent Swiss Apertus.

https://www.swiss-ai.org/apertus
gyudin
How competitive is it, performance-wise, with other open source models, considering they took €50 million in funding?
zoobab
Can we add Gaumais to the list? I asked Llama3 questions on how to translate French to Gaumais, and it was pretty good at it.

https://fr.wikipedia.org/wiki/Gaumais
jagermo
Looks cool, I hope Kagi adds it to the assistant.
rob_c
This, I hope, is close to multi-modal in lingual terms. There's potentially a lot to learn from examining where this works/fails :D
seydor
It's just another Horizon2020 grant, people. Don't be overly harsh to a bunch of academics who are just earning their living.
rvz
As expected, Europe finally catches up to 2024 and launches an LLM that barely competes against the heavyweights.

The US and China are running rings around Europe.

Mistral is an exception as it was funded by US VCs, and they are a great example showing that without VC funding, Mistral would have been begging the EU for a microscopic grant to train an LLM worse than Llama.
johnjames87
I prefer proprietary LLMs that are actually good products - byproducts of free market competition (capitalism), instead of products created from govt initiatives that lead nowhere (good).
geretnal
Finally!
moralestapia
Benchmarks?

Edit: Thanks, @Bengalilol.

The 1.7B one looks meh.

But really solid numbers on the 9B! Props to the team!
nellyspageli
Could you adjust the title from "all official 24 EU languages" to "all 24 official EU languages"?
mezod
Of course Catalan isn't in the list. 10 million speakers that don't matter to the European Union. The EU likes our productivity but squanders our rights. We are 2nd class citizens.

Now let's wait for the people saying "Spain" could change this. Hypocrites.

Cultural genocide at its best.