The gap between open weights LLMs and closed source LLMs

kkm

Comments (88)

profsummergig
IMHO, the biggest problem for the future of open-weights models is that, currently, open-weights models are the result of philanthropy by some private org (e.g., DeepSeek). The spigot can be turned off at any time.
Until there's some sort of "community-owned hardware", open-weights models are always at risk of being discontinued.
cedws
I haven't seen it discussed anywhere, but closed models can essentially cheat on benchmarks, right? What Anthropic or OpenAI brand as a model doesn't have to be just weights; it can be a whole backend system that augments the model itself. With that, they can score better on benchmarks than an open-source model that is weights alone.
christina97
The Chinese models will not overtake the frontier US ones if things keep going the way they are. The US models derive their lead from incredible efforts to source more and higher-quality data (mostly synthetic) via great feats, e.g., generating it with humongous teacher models that could never feasibly serve interactive traffic. The Chinese models advance via heroic efforts to optimize models and great feats to secure more and higher-quality training data from the US frontier models.
For a (Chinese) open-weights model to surpass the (US lab) frontier models, this equation must flip: the Chinese labs must entirely retool from harvesting frontier-model data to building the data systems and efforts that produce novel data, as well as procuring latest-generation hardware en masse for this. That does not happen easily. Also, training a frontier-scale model is actually not such an unimaginable feat; doing all the inference with the teacher models is where the hardware goes.
gehsty
Interesting to consider this in line with recent US export bans. Could the US be squandering its lead by letting the open-source, largely Chinese, labs catch up (in terms of model quality available to the masses)? Will US labs be able to maintain the lead without users being able to use their latest models?
jacobgold
It would be interesting to know how much of a boost the closed-model companies are giving the open models.
If the closed models stop improving, will the progress of open models slow?
tzs
I wonder whether many of the companies and governments that seem to think it is essential to be at the forefront of applying leading-edge LLMs, to the point of becoming dependent on them, are going to find themselves in a situation like the one in the Arthur C. Clarke short story "Superiority" [1][2].
[1] The story: https://nob.cs.ucdavis.edu/classes/ecs153-2019-04/readings/s...
[2] Wikipedia: https://en.wikipedia.org/wiki/Superiority_(short_story)
samat
The article confuses open-source models with open-weights models. They're not the same thing. The right term is used in the article's body, but the title is misleading.
dabinat
I believe the open-model party will eventually end, perhaps because companies realize it's too much of a commercial advantage, because countries don't want to give other countries commercial or military help, or maybe even through an outright ban after someone uses an open model to guide them through making a bomb.
_pdp_
Frankly, it does not matter if there is a gap, because for most practical use cases the end user can barely perceive the difference in intelligence.
On paper, frontier models will stay ahead of the curve, but I think hardly anyone will be able to tell whether a piece of work, say a landing page, was created with Fable or GLM, and that is the point. Perceptible intelligence will reach a point beyond which it is no longer a consideration, except for some narrow use cases.
doctoboggan
If the Chinese government is as involved in LLM development strategy as many people claim, wouldn't you expect it to immediately stop releasing open-weights models and restrict access as soon as its labs start producing frontier models? I'm assuming this is what the USG thinks, and it's why they are trying to cut off the flow to foreign nationals ASAP.
LLMs are an undeniably valuable tool, and governments like to control those.
JumpCrisscross
Now let's look at the economics of buying versus renting. I've seen a lot of attention given to hardware capital costs, but a comment the other day got me thinking about power costs, too. At what performance differential do these factors intersect to make on-prem economically competitive with datacenters for businesses?
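One way to frame this question is as a break-even calculation: self-hosting wins while amortized hardware plus electricity per token stays below the rental price, and a weaker open model can be modeled as needing extra tokens (retries, rework) for the same useful output. This is a minimal sketch under those assumptions; the class name `OnPremSetup`, the helper `break_even_overhead`, and every number in it are hypothetical placeholders, not real prices, and it ignores cooling, staff, networking, and idle time.

```python
from dataclasses import dataclass


@dataclass
class OnPremSetup:
    """Hypothetical self-hosted deployment; all fields are illustrative assumptions."""
    hardware_cost_usd: float   # upfront capital cost of the hardware
    lifetime_hours: float      # hours over which the hardware is amortized
    power_kw: float            # average power draw while serving
    usd_per_kwh: float         # electricity price
    tokens_per_hour: float     # sustained output throughput

    def hourly_cost(self) -> float:
        # Straight-line amortization of capex plus electricity.
        # Assumes 100% utilization; cooling, staff, and networking are ignored.
        return self.hardware_cost_usd / self.lifetime_hours + self.power_kw * self.usd_per_kwh

    def usd_per_million_tokens(self) -> float:
        return self.hourly_cost() / self.tokens_per_hour * 1_000_000


def break_even_overhead(setup: OnPremSetup, rented_usd_per_million: float) -> float:
    """Fraction of extra tokens the self-hosted model can spend on retries or rework
    before it costs more than renting. Assumes the quality gap shows up purely as
    extra tokens for the same useful output. Negative means renting is already cheaper."""
    return rented_usd_per_million / setup.usd_per_million_tokens() - 1.0


if __name__ == "__main__":
    # Placeholder numbers, not real quotes.
    setup = OnPremSetup(
        hardware_cost_usd=30_000,
        lifetime_hours=3 * 8760,  # three years of continuous use
        power_kw=1.0,
        usd_per_kwh=0.15,
        tokens_per_hour=2_000_000,
    )
    print(f"self-hosted: ${setup.usd_per_million_tokens():.2f} per 1M tokens")
    print(f"break-even overhead vs. $3.00/1M rented: {break_even_overhead(setup, 3.0):.0%}")
```

With these made-up inputs the self-hosted cost comes to about $0.65 per million tokens, so the open model could need several times as many tokens before renting wins; real answers depend heavily on utilization, which the sketch assumes is perfect.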
jackconsidine
Achilles and the tortoise [0] is usually a fallacy: if the tortoise has a head start, Achilles will never catch it, because in the time it takes Achilles to reach the tortoise's location, the tortoise has moved some degree further, ad infinitum. Obviously that isn't real, because Achilles will pass the tortoise. I think it's a fallacy because the framing creates a fake asymptote (they will both pass the point the steps are approaching, where they're tied).
In this case, though, it may actually apply, no? Open models get better from closed-model distillation?
[0] https://en.wikipedia.org/wiki/Zeno%27s_paradoxes
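The usual resolution is that the infinitely many catch-up steps add up to a finite time. A worked sketch, assuming for illustration only that Achilles runs at constant speed $v$, the tortoise at $rv$ with $0 < r < 1$, and the head start is $d$; step $k$ covers the distance the tortoise gained during step $k-1$:

```latex
t_k = \frac{d\,r^{k}}{v}, \qquad
T = \sum_{k=0}^{\infty} t_k = \frac{d}{v}\sum_{k=0}^{\infty} r^{k} = \frac{d}{v\,(1-r)}
```

This matches the direct calculation, since the gap $d$ closes at rate $v - rv = v(1-r)$. Because $T$ is finite, Achilles draws level at time $T$ and passes the tortoise afterward; the infinite sequence only describes the moments before that.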
maxiniol
Am I the only one flagging inconsistencies across the evaluations on the 18 benchmarks? Why is the closed frontier model sometimes Grok and other times Opus 4.8? And why is it compared against GLM 5.2 once and Kimi 2.6 other times?
justindotdev
At first glance, these graphs are confusing.
llmslave
The gap is huge, and I'm tired of constantly reading these articles.