Comments (33)
- montroser: Well, this is certainly not benchmaxxed, I'll give it that. And props for being honest about how far behind Qwen 3.6 MoE this model is. But yeah, it's not the best look to have to stretch and say it's "competitive" with other models in its weight class, when it offers not much else that's useful or novel.
- amunozo: Are these models trained from scratch, or do they necessarily need distillation from bigger models to be competitive? Usually they're the small model in a family that has a bigger one. In the first case, does anybody know how the economics of training this 30B-A3B model compare with training a model the size of DeepSeek V4 Pro or Flash (1.6T, 200-something B, fewer activated)?
- matt_daemon: > Hardware (minimum): 1× H100 @ FP8
  Cool to see this, but it seems like it would be pretty expensive to run.
- moojacob: I was a fan of Cohere's general-purpose LLM. Command A, I think? Before they came out with their reasoning model. More competition is better.
- AbuAssar: strange, I already submitted the same url 6 days ago: https://news.ycombinator.com/item?id=48475095
- tonyrice: I'm excited to see more OSS models
- zuzululu: Wasn't aware that Cohere was still around but this release doesn't exactly instill confidence.
- chattermate: [flagged]
- cyanydeez: looks like it's just qwen 3.6 coder.
- moralestapia: > Our plan to being profitable is to give mediocre stuff for free