Comments (39)
- zbentley: The key part of the article is that token structure interpretation is a training-time concern, not just an input/output processing concern (which leads to plenty of inconsistency and fragmentation on its own!). That means two things. First, training stakeholders at model-development shops need to be closely involved in the tool/syntax design process, which creates friction and slowdowns. Second, any current improvements or standardizations in the way we do structured LLM I/O will necessarily be adopted on the training side only after a lag of months or years, given the time it takes to do new-model development and training. That makes for a pretty thorny mess ... and that's before we get into the disincentives for standardization (standardization risks big AI labs' moat/lock-in).
- evelant: I guess I fail to see why this is such a problem. Yes, it would be nice if the wire format were standardized or had a standard schema description, but is writing a parser that handles several formats actually a difficult problem? Modern models could probably whip up a "libToolCallParser" with bindings for all popular languages in an afternoon, and you could probably automate a workflow for adding any new formats with minimal fuss. An annoyance, yes, but it doesn't seem like a genuinely "hard" problem. It seems more like a social problem: open source hasn't coalesced around a library that handles this easily yet. Or am I missing something?
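The "libToolCallParser" evelant imagines could start as something like the sketch below: try each known emission style in turn and normalize to a single (name, arguments) pair. The two styles shown (bare JSON, and an XML-ish `<tool_call>` wrapper) are illustrative simplifications; real model outputs vary far more than this.

```python
import json
import re

def parse_tool_call(text: str):
    """Best-effort extraction of a (name, arguments) pair from one of
    several tool-call emission styles. Returns None if nothing matches."""
    # Style 1: a bare JSON object, e.g. {"name": ..., "arguments": {...}}
    try:
        obj = json.loads(text)
        if isinstance(obj, dict) and "name" in obj:
            return obj["name"], obj.get("arguments", {})
    except json.JSONDecodeError:
        pass
    # Style 2: an XML-ish wrapper around JSON, e.g. <tool_call>{...}</tool_call>
    m = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", text, re.S)
    if m:
        obj = json.loads(m.group(1))
        return obj["name"], obj.get("arguments", {})
    return None

parse_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}')
# -> ('get_weather', {'city': 'Oslo'})
```

The hard part isn't this dispatch loop; it's keeping the list of styles current as each new model family invents its own, which is the lag the thread is complaining about.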
- airstrike: One of the most relevant posts about AI on HN this year. It's not hype-y, but it's imperative to discuss. I find it strange that the industry hasn't converged on at least a somewhat standardized format, but I guess despite all the progress we're still in the very early days...
- all2: I wonder if stuffing tool-call formatting into an engram layer (see DeepSeek's engram paper) that could be swapped at runtime would be a useful solution here. The idea would be to encode tool-calling semantics once on a single layer and inject it as needed. Harness providers could then give users their bespoke tool-calling layer, injected at model load time. Dunno, seems like it might work. I think most open-source models can have an engram layer injected (some testing would be required to see where the layer fits best).
- jonathanhefner: Does anyone know why there hasn't been more widespread adoption of OpenAI's Harmony format? Or will it just take another model generation to see adoption?
- Witty0Gore: Useful article. I was fighting with GLM's tool-calling format just last night; stripping and sanitizing its output to keep it consistently compatible with my UI has been... fun.
- hashmap: The native way to skip all that is to train a small probe mapping hidden state -> the token/structure you care about, once per model family. Or just do it once, and Procrustes over the states from the model you're using to whatever you built the map for.
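A toy numpy sketch of what hashmap describes: fit a linear probe on one model family's hidden states, then align a second family's states to the first with an orthogonal Procrustes transform so the same probe transfers. All data here is synthetic; real hidden states wouldn't be an exact rotation of each other, so the transfer would be approximate rather than exact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic hidden states for "model A" and binary labels
# (e.g. "is this position inside a tool call?").
H_a = rng.normal(size=(100, 8))
w_true = rng.normal(size=8)
y = (H_a @ w_true > 0).astype(float)

# 1. Train a small linear probe on model A via least squares.
w_probe, *_ = np.linalg.lstsq(H_a, y, rcond=None)

# 2. Model B's states are (here) an unknown rotation of A's. Solve
#    orthogonal Procrustes: R = U V^T from the SVD of H_b^T H_a,
#    minimizing ||H_b R - H_a||_F over orthogonal R.
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # the unknown rotation
H_b = H_a @ Q
U, _, Vt = np.linalg.svd(H_b.T @ H_a)
R = U @ Vt

# 3. Reuse model A's probe on model B's aligned states.
preds = ((H_b @ R) @ w_probe > 0.5).astype(float)
print("agreement with labels:", (preds == y).mean())
```

In this idealized setting the Procrustes map recovers the rotation exactly, so the probe transfers with no extra training on model B.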
- R00mi: MCP is the wire format between agent and tool, not the format the model itself uses to emit the call. That part (Harmony, JSON, XML-ish) is still model-specific. So the M×N the article describes is really two problems stacked, and MCP only solves the lower half. Also, in practice Claude Code, Cursor, and Codex handle the same MCP tool differently: required params, tool descriptions, response truncation. So MCP gives you the contract, but the client UX still leaks.
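R00mi's "two problems stacked" can be made concrete: however the model chooses to emit a call, the agent must translate it into the one standardized shape MCP defines, a JSON-RPC 2.0 `tools/call` request. The model-side strings below are illustrative stand-ins, not exact renderings of any model's actual format.

```python
import json

# Upper half: model-specific emissions (illustrative, not exact specs).
harmony_ish = 'to=get_weather {"city": "Oslo"}'
xml_ish = '<tool_call>{"name": "get_weather", "arguments": {"city": "Oslo"}}</tool_call>'

# Lower half: the same call as an MCP tools/call request (JSON-RPC 2.0).
# This shape is standardized regardless of which emission produced it.
mcp_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "Oslo"}},
}
print(json.dumps(mcp_request, indent=2))
```

The agent's job is the arrow between the two halves: parse whichever emission its current model produces, then build the standard request. MCP fixes the target; the source stays fragmented.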
- kleton: Don't inference servers like vLLM or SGLang just translate these things into OpenAI-compatible API shapes?
- seamossfet: Great article, but your site background had me trying to clean my laptop screen, thinking I'd splashed coffee on it.
- Nevermark: Feedback: I don't usually comment on formatting, but that fat indent is too much. I applied "hide distracting items" to the graphic, and the indent is still there. Reader mode works.
- ikidd: This sounds like a problem that LLMs were built to solve.
- 0xnadr: This is a real problem. The function-calling format fragmentation across models makes it painful to build anything provider-agnostic.
- jiehong: Am I misunderstanding, or isn't this supposed to be the point of MCP?
- jeremie_strand: [dead]
- agent-kay: [dead]