Well, this is certainly not benchmaxxed, I'll give it that. And props for being honest about how far behind Qwen 3.6 MoE this model is.
But yeah, it's not the best look to have to stretch and say it's "competitive" with other models in its weight class when it offers not much else that's useful or novel.

Are these models trained from scratch, or do they necessarily need distillation from bigger models to be competitive? Usually a small model like this belongs to a family that includes a bigger model. If it is trained from scratch, does anybody know how the economics of training this 30B-A3B model compare with training a model the size of DeepSeek V4 Pro or Flash (1.6T, 200-something B, fewer activated)?

> Hardware (minimum): 1× H100 @ FP8
Cool to see this, but it seems like it would be pretty expensive to run.

I was a fan of Cohere's general-purpose LLM. Command A, I think? Before they came out with their reasoning model.
More competition is better.

Strange, I already submitted the same URL 6 days ago:
https://news.ycombinator.com/item?id=48475095

Wasn't aware that Cohere was still around, but this release doesn't exactly instill confidence.

Looks like it's just Qwen 3.6 Coder.

> Our plan for becoming profitable is to give away mediocre stuff for free


Cohere's First Model for Developers
