
Comments (229)

  • cedws
    The biggest differentiator for me: DeepSeek just does what I ask. I've tried using both GPT and Claude for reverse engineering recently, both refused. I even got a warning on my OpenAI account.
  • 0xkvyb
    It might be at the frontier, but DeepSeek is really struggling with compute. The amount of 429 Rate Limit responses I've been getting just testing this thing made me pause all my attempts at cross-comparing it to others. I'm gonna stick to GLM5.1 for now.
  • wg0
    DeepSeek V4 Pro feels like Claude Opus 4.6 in its personality, but here's what I found out about costs: I cut DeepSeek V4 loose on a decent-sized TypeScript codebase and asked it to focus on a single endpoint, go in depth on it layer by layer (API, DTOs, service, database models), form a complete picture of the types involved and introduced, and ensure no ad hoc types were being introduced. It developed a very brief but very to-the-point summary of the types being introduced and which of them were redundant etc. Then I asked it to simplify it all. It obviously went through lots of files in both prompts, but the total cost? Just $0.09 for the Pro version. On Claude Opus I think (from past experience before price hikes) these two prompts alone would have easily burned somewhere between $9 and $13, with not much benefit. Note - I didn't use OpenRouter; rather, I used the DeepSeek API directly, because OpenRouter itself was being rate limited by DeepSeek.
  • gyoridavid
    I've connected it with my VS Code Copilot and took it for a ride. I've tried both Flash and Pro. For a small POC, Flash was sufficient, quite fast, and dirt cheap. It did stop a few times (maybe a latency issue?) but it did a good job. I used Pro to do some heavy lifting, planning, etc., and it did a fantastic job. I paid ~10 cents for a small proof of concept that worked exactly how I prompted it. For me, this is a real alternative after I cancel my GitHub Copilot towards the end of the month.
  • antirez
    Related: live demo of DeepSeek v4 Flash running on my 128GB MacBook. Italian language with English subs. https://www.youtube.com/watch?v=todMmp6AGCE
  • cheshire_cat
    While the costs are lower than frontier models, there are two factors that make DS4 Pro and K2.6 not as cheap as they might look. For DS4 Pro there's a discount going on for the official API, which sometimes gets overlooked and mixed up in discussions. Simon uses the full price in the comparison, so that's not an issue here. The other issue is that DS4 Pro and K2.6 often use way more reasoning tokens than the frontier models. In my testing there are certain pathological cases where a request can cost the same as with a frontier model because they use so many more tokens. To be fair, I'm using DS and Kimi via 3rd-party providers, so they might have issues with their setups. But if you look at the Artificial Analysis pages of the models, you'll see that DSv4 Pro used 190M tokens and K2.6 170M tokens for their intelligence benchmark, while GPT 5.5 (high) only used 45M. [0][1][2] I recommend looking at the "Intelligence vs. Cost to Run Artificial Analysis Intelligence Index" ("Intelligence vs Cost" in the UI). The open source models are still cheaper to run, but not by as much as you'd think just looking at the token prices. [0] https://artificialanalysis.ai/models/deepseek-v4-pro [1] https://artificialanalysis.ai/models/kimi-k2-6 [2] https://artificialanalysis.ai/models/gpt-5-5-high
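As a rough sketch of the arithmetic in the comment above: with the quoted benchmark token counts, heavier reasoning-token use narrows the headline price gap. The per-million-token prices below are hypothetical placeholders for illustration, not real price-sheet numbers.

```python
# Token counts are the benchmark totals quoted above; prices are assumed.
runs = {
    "DSv4 Pro":       (190e6, 1.0),   # (tokens used, assumed $/M tokens)
    "K2.6":           (170e6, 1.2),
    "GPT 5.5 (high)": (45e6,  6.0),
}
for model, (tokens, price) in runs.items():
    cost = tokens / 1e6 * price
    print(f"{model}: ~${cost:,.0f} to run the benchmark")
```

Under these assumed prices, a 6x difference in per-token price shrinks to roughly a 1.4x difference in cost to actually run the benchmark.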
  • naaqq
    DeepSeek’s official API has a cache hit rate of over 99% if you use it continuously within the same codebase for long sessions, so it’s much cheaper than frontier models. I have an example of a 200M token session in Claude Code.
  • curioussquirrel
    V4 is definitely a step up from V3.2 on our multilingual benchmarks. Two caveats:
    - when inferring through OpenRouter, we've had a lot of issues with very slow speeds (TPS) and occasional instability. I just checked and it's still 10-30 TPS on all available providers, which is not a lot for a model that likes to think as much as DeepSeek does.
    - the official DeepSeek API makes no guarantees of data privacy, even for paying users.
    Both points could be moot when using it through Azure AI Foundry (the latter is, afaik); I have yet to test that. In any case, happy to see more open-weights models that are somewhat competitive with SOTA models!
  • deaux
    I'm surprised that people here don't care at all about these models openly training on your data, especially if you use them straight from the model developer. Whereas things like "GitHub now automatically opts everyone into using their code for model training" get hundreds of justifiably angry comments, I never see this brought up anymore on posts like these talking about using Chinese models through OpenRouter. This might be explained by "well, they're different people", but the difference is too stark for that to be the whole explanation.
  • Havoc
    This gives me hope that when the subsidization circus ends and everyone is on pure usage pricing, it won't be entirely exclusionary to mere mortals who don't have $200/month budgets.
  • jdasdf
    I've been using V4 Pro for the past few days and honestly, in terms of quality, it seems more or less on par with OpenAI's 5.4 or Opus 4.6 (I haven't tried 4.7). To be clear, I'm not doing state-of-the-art stuff. I mostly used it for frontend development, since I'm not great at that and just need a decent-looking prototype. But for my purposes it's a perfectly good model, and the price is decent. I can't wait for an open model small enough for me to run locally to come out, though. I hate having to rely on someone else's machines (and getting all my data exfiltrated that way).
  • mohsen1
    In my experience V4 is pretty good, but for very hard problems it burns so many tokens that it ends up not being so cheap anymore. I'm working on a compiler and the tasks are very involved. Tests won't pass unless it gets it absolutely right. 5.5 can achieve more in less time compared to V4 for me.
  • gertlabs
    DeepSeek V4 Flash is the most cost-effective model we've tested. We had to dig in to really understand why it outperformed DeepSeek V4 Pro (although even on unreliable model cards, Flash was very close to Pro). Pro is slower and smarter on one-shot reasoning problems, but less effective with tools and therefore less performant in long-horizon agentic tasks (especially with custom tools it was not trained on). Benchmarks at https://gertlabs.com/rankings
  • ghm2180
    I've been using the planning framework from Matt Pocock on very typical brownfield code. I use a harness over Claude Code; this is so cheap that I would be tempted to mirror my initial prompt to it and compare their responses to the task.
  • crakhamster01
    I realize this post is about the pelican test, but in regards to coding, has anyone tried out the advisor strategy with V4? [0] E.g. have V4 call out to Opus when it's uncertain, but otherwise handle execution. The results with Sonnet/Haiku in the blog post seemed promising, so I'm curious how it would go with these latest open models. [0] https://claude.com/blog/the-advisor-strategy
  • holysantamaria
    From the pricing page of deepseek:(3) The deepseek-v4-pro model is currently offered at a 75% discount, extended until 2026/05/31 15:59 UTC.Was this taken into account when reviewing the model?
  • wolttam
    DS V4 Pro has rocked. ~250 million tokens through their API, which has cost me about $10, and some of that was at the non-discount rate. So ~$40 at the non-discount rate. I have yet to have a single request feel slow or get rejected.I've used K2.6, GLM5.1, and DSV4 all a good amount. They're all very impressive, but DSV4 has taken the cake.
  • KronisLV
    I'm currently paying for Anthropic's Max subscription (the 100 USD one) and I quite often hit or approach the 5 hour limits, but usually get to around 60-80% of the weekly limits before they reset (Opus 4.7 with high thinking for everything, unless CC decides to spawn sub-agents with Haiku or something).Those tokens are heavily subsidized, but DeepSeek's API pricing is looking really good. For example, with an agentic coding setup (roughly 85% input, 15% output and around 90% cache reads) I'd get around 150M tokens per month for the same 100 USD. Even at more output tokens and worse cache performance, it'd still most likely be upwards of 100M.
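A sketch of the budget arithmetic above. Only the traffic mix (85% input, 15% output, ~90% cache reads) comes from the comment; the per-million-token prices are assumptions for illustration, not DeepSeek's actual price sheet.

```python
# Hypothetical prices, $ per 1M tokens (assumed, not real):
PRICE_CACHE_READ = 0.10  # input tokens served from cache
PRICE_INPUT      = 1.00  # uncached input tokens
PRICE_OUTPUT     = 3.00  # output tokens

def tokens_for_budget(budget, in_share=0.85, out_share=0.15, cache_hit=0.90):
    """Total tokens affordable for `budget` dollars under the given traffic mix."""
    # Blended cost in $ per 1M tokens, weighting input by cache hit rate.
    blended_per_m = (in_share * (cache_hit * PRICE_CACHE_READ
                                 + (1 - cache_hit) * PRICE_INPUT)
                     + out_share * PRICE_OUTPUT)
    return budget / blended_per_m * 1_000_000

print(f"~{tokens_for_budget(100) / 1e6:.0f}M tokens for $100")
```

Under these assumed prices, $100 buys roughly 160M tokens; dropping the cache hit rate to 50% still leaves it above 100M, consistent with the comment's estimate.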
  • alasano
    I tweeted about some implementation and review runs that used V4 Pro. Even without the currently discounted pricing, the value is incredible. It takes about twice as long to finish code reviews given an identical context compared to Opus 4.7/GPT 5.5, but at 1/10 the cost or less, there's just no comparison. https://twitter.com/aljosa/status/2049176528638902555
  • taffydavid
    I tried DeepSeek V4 through opencode at the weekend. I'm a daily Claude/Claude Code user. I tried to build something simple, and while it got the job done, the thinking displayed did not fill me with confidence. It was pages and pages of "actually no", "hang on", "wait, that makes no sense". It was like the model was having a breakdown. Bear in mind, opencode was also new to me, so I could just be seeing thinking where I usually don't.
  • myaccountonhn
    I recently switched from Claude to Opencode Go + pi.dev. It has Deepseek v4 pro along with Kimi K2.6, and it's performing quite well for basic coding, without hitting any limits.
  • bilsbie
    Dumb question: why does Pro make a worse pelican than Flash?
  • teruakohatu
    The pelican is really getting old as a standalone evaluation metric. By now it is certainly going to be in the training set, if not explicitly tuned for, given the press on HN alone. Keep the pelican, but isn’t it time to add something else more novel that all current and past models struggle with?
  • rsanek
    I'm not sure I'd call it "almost on the frontier," but I do think that v4 Pro is the most usable coding model I've seen out of China. I've used it via Ollama Cloud (coding) and OpenRouter (data processing). Feels Sonnet-level to me -- solid at implementation when given a specification, but falls a good bit short of Opus 4.7 max thinking when planning out larger changes or when given open-ended prompts.
  • piker
    Jensen has a point. I believe these were trained and run on Huawei chips. The Nvidia embargo may backfire on American leadership, as necessity breeds invention.
  • fagnerbrack
    I use it in readplace... oh boy, it's SO good and cheap for summaries!!
  • edg5000
    Has anybody used V4 hard, for the most challenging tasks (agentically, locally)? It's so hard to compare without putting serious time in it. Like spending a year daily with the model.
  • qekagn
    There are so many login-free models now that most people will not even try DeepSeek if access requires a login.
  • tomchui157
    Wanna see ppl fine-tuning it
  • chaosprint
    I suspect those models already knew about this pelican test...
  • sylware
    If I want to run coding prompts with the biggest DeepSeek model on CPU, what order of magnitude of wait am I looking at: hours, or days?
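One way to ballpark this: CPU decode speed is roughly memory-bandwidth bound, so time per token is about bytes read per token divided by RAM bandwidth. Every number below (active parameter count, quantization, bandwidth, session length) is an assumption, not a published spec for DeepSeek's models.

```python
# Back-of-envelope estimate of CPU decode speed for a MoE model.
active_params   = 40e9   # assumed active parameters per token (MoE)
bytes_per_param = 0.5    # ~4-bit quantization
bandwidth       = 80e9   # bytes/s, typical dual-channel DDR5

# Each decoded token must stream the active weights through RAM once.
sec_per_token  = active_params * bytes_per_param / bandwidth
session_tokens = 20_000  # a modest agentic coding session
hours = session_tokens * sec_per_token / 3600
print(f"~{1 / sec_per_token:.0f} tok/s, ~{hours:.1f} h per session")
```

Under these assumptions the answer is hours rather than days for a modest session; a dense model of the same total parameter count, or a heavy-reasoning run with far more output tokens, pushes toward days.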
  • alex1138
    Does it censor mentions of what happened in Tiananmen Square in 1989?
  • raincole
    The V3/R1 era and now are in such contrast. V3/R1 were hyped hard and barely usable for coding. V4 is much less hyped, but (anecdotally) it has completely demolished all the Flash/Lite/Spark models.
  • tomjuggler
    So I'm involved in an open-source AI CLI coding assistant called Cecli (cecli.dev), which is specifically designed to work well with DeepSeek. DeepSeek is a great model, and Cecli is all about efficiency. It works great for my purposes - agentic programming on a budget.
  • grassfedgeek
    The credit for DeepSeek, in part, goes to US companies such as OpenAI [1] and Anthropic [2]. Portions of DeepSeek are based on their products. [1] https://www.reuters.com/world/china/openai-accuses-deepseek-... [2] https://x.com/AnthropicAI/status/2025997928242811253