Kimi K2.6: Advancing Open-Source Coding

meetpateltech

Comments (125)

simonw
Accessed via OpenRouter, this one decided to wrap the SVG pelican in HTML with controls for the animation speed: https://gisthost.github.io/?ecaad98efe0f747e27bc0e0ebc669e94... Transcript and HTML here: https://gist.github.com/simonw/ecaad98efe0f747e27bc0e0ebc669...
game_the0ry
There is some humor in the fact that China (of all countries) is pioneering possibly the world's most important tech via open source, while we (the US) are doing the exact opposite.
XCSme
In my tests[0] it does only slightly better than Kimi K2.5. Kimi K2.6 seems to struggle most with puzzle/domain-specific and trick-style exactness tasks, where it shows frequent instruction misses and wrong-answer failures. It is probably a great coding model, but a bit less intelligent overall than SOTAs. [0]: https://aibenchy.com/compare/moonshotai-kimi-k2-6-medium/moo...
elfbargpt
I've always been surprised Kimi doesn't get more attention than it does. It's always stood out to me in terms of creativity and quality, and it has been my favorite model for a while (but I'm far from an authority).
candl
Are there any coding plans for this (i.e., no token limit, just an API call limit)? Recently my account failed to be billed for GLM on z.ai and my subscription expired because of this... the pricing for GLM went through the roof in recent months, though...
nickandbro
Wow, if the benchmarks check out with the vibes, this could almost be like a DeepSeek moment, with Chinese AI now being neck and neck with SOTA models from US labs.
kburman
Has anyone here used Kimi for actual work? I tried it once; although it looks amazing on benchmarks, my experience was just okay-ish. On the other hand, Qwen 3.6 is really good. It’s still not close to Opus, but it’s easily on par with Sonnet.
jauntywundrkind
I really wish some of these very-long-horizon runs were themselves open sourced (openly released, open access). Have the harness set up to automatically git-commit the transcript and code, and offload writing the commit messages. Release it all. This sounds so so so cool. It would be so amazing to see this unfurl:
> Kimi K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac. By implementing and optimizing model inference in Zig, a highly niche programming language, it demonstrated exceptional out-of-distribution generalization. Across 4,000+ tool calls, over 12 hours of continuous execution, and 14 iterations, Kimi K2.6 dramatically improved throughput from ~15 to ~193 tokens/sec, ultimately achieving speeds ~20% faster than LM Studio.
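The throughput figures in the quoted passage can be sanity-checked with a little arithmetic. This is a minimal sketch, assuming the approximate values (~15 and ~193 tokens/sec) are taken at face value and that "~20% faster" means a throughput ratio of 1.2. The function names are hypothetical, and the derived LM Studio figure is an inference from those assumptions, not a number from the post.

```python
def speedup(before_tps: float, after_tps: float) -> float:
    """Ratio of optimized throughput to starting throughput."""
    return after_tps / before_tps


def implied_baseline(tps: float, pct_faster: float) -> float:
    """Throughput of a reference system that `tps` beats by `pct_faster` percent.

    Assumes "X% faster" means tps == baseline * (1 + X / 100).
    """
    return tps / (1 + pct_faster / 100)


if __name__ == "__main__":
    # Approximate figures from the quoted passage.
    print(f"speedup: {speedup(15, 193):.1f}x")
    print(f"implied LM Studio throughput: {implied_baseline(193, 20):.1f} tokens/sec")
```

Under these assumptions the run amounts to roughly a 13x improvement, with the reference LM Studio setup landing somewhere around 160 tokens/sec.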
dygd
> Agent Swarms, Elevated: Match 100 Jobs and Generate 100 Tailored Resumes
Model seems quite capable, but this use-case is just yikes. As if interviewing isn't already a hellscape.
lbreakjai
I have a subscription through work and I've been trialing it. So far it looks on par with, if not better than, Opus.
m4rkuskk
I have been testing it in my app all morning, and the results line up with 4.6 Sonnet. This is just a "vibe" feeling with no real testing. I'm glad we have some real competition to the "frontier" models.
dmix
I'm pretty sure Kimi is what Cursor uses for their "composer 2" model. Works pretty well as a fallback when Claude runs out, but definitely a downgrade.
mariopt
Really excited to try this one. I've been using Kimi K2.5 for design and it's really good, but borderline useless on backend/advanced tasks. Also discovered that using OpenCode instead of the Kimi CLI really hurts the model's performance (K2.5).
pt9567
Wow - $0.95 input/$4 output. If it's anywhere near Opus 4.6 that's incredible.
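For a rough sense of scale, the prices quoted above can be turned into a per-request cost. This is a minimal sketch that assumes the figures are USD per million tokens (the comment does not state the unit); the function name and the example token counts are hypothetical.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 0.95,   # assumed: USD per 1M input tokens
    output_price_per_m: float = 4.00,  # assumed: USD per 1M output tokens
) -> float:
    """Estimate the cost of one request from its token counts."""
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical agentic request: large context in, long answer out.
    cost = request_cost_usd(input_tokens=200_000, output_tokens=50_000)
    print(f"estimated cost: ${cost:.2f}")
```

Output tokens dominate the bill at these rates, so long reasoning traces matter more for cost than large prompts of the same size.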
irthomasthomas
Beats Opus 4.6! They missed claiming the frontier by a few days.
Banditoz
If the benchmarks are private, how do we reproduce the results? I looked up Humanity's Last Exam (https://agi.safe.ai/), which this model is evaluated on, and I can't seem to access it.
antirez
Here I analyze the same linenoise PR with Kimi K2.6, Opus, GPT. https://www.youtube.com/watch?v=pJ11diFOjqo Unfortunately the generation of the English audio track is a work in progress and takes a few hours, but the subtitles can already be translated from Italian to English. TLDR: It works well for the use case I tested it against. Will do more testing in the future.
verdverm
https://huggingface.co/moonshotai/Kimi-K2.6 Is this the same model? Unsloth quants: https://huggingface.co/unsloth/Kimi-K2.6-GGUF (work in progress, no GGUF files yet, header message saying as much)
swingboy
Exciting benchmarks if true. What kind of hardware do they typically run these benchmarks on? Apologies if my terminology is off, but I assume they're using an unquantized version that wouldn't run on even the beefiest MacBook?
cassianoleal
If only their API wasn't tied to a Google or phone login...
nisegami
The choice of example task for Long-Horizon Coding is a bit spooky if you squint, since it's nearing the territory of LLMs improving themselves.
greenavocado
I pray the benchmark figures are true so I can stop paying Anthropic, after they screwed me over this quarter by dumbing down their models, making usage quotas ridiculously small, and demanding KYC paperwork.
esafak
K2.5 was already pretty decent so I would try this. Starting at $15/month: https://www.kimi.com/membership/pricing Edit: Note that you can run it yourself with sufficient resources (e.g., companies), or access it from other providers too: https://openrouter.ai/moonshotai/kimi-k2.6/providers
cmrdporcupine
Running it through opencode to their API and... it definitely seems like it's "overthinking" -- watching the thought process, it's been going for pages and pages and pages diagnosing and "thinking" things through... without doing anything. Sitting at 50k+ output tokens used now just going in thought circles, complete analysis paralysis. Might be a configuration or prompt issue. I guess I'll wait and see, but I can't get use out of this now.
oliver236
Isn't this better than Qwen?
XCSme
A bit weird to be comparing it to Opus-4.5 when 4.7 was released... EDIT: Wrong comment: they compared it with 4.6, my comment was for the Qwen-3.6 Max release blog post...