
Comments (50)

  • enoonge
    Interesting read! Agreed with the main claim that multi-agent software development is a distributed systems problem, but I think distributed consensus is not the tightest bottleneck in practice.

    The article mentions the partially synchronous model (DLS) but doesn't develop it, and that's the usual escape hatch from FLP. In practical agentic workflows it already shows up as looped improvement cycles bounded by diminishing returns: each iteration is effectively a round, and each agent's output in that round is a potentially faulty proposal the next round refines. Painful in cost, yes, but manageable. If models keep improving at current rates, it's reasonable to assume the number of cycles will decrease.

    The more interesting objection is that "agent misinterprets prompt" isn't really byzantine. The 3f+1 bound assumes independent faults, but LLM agents share weights, training data, and priors. When a prompt is ambiguous they don't drift in random directions; they drift the same way together. That isn't majority vote failing loudly, it's consensus succeeding on a shared bias, which is arguably worse.
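    The correlated-drift point is easy to see in a toy simulation. A minimal sketch (the two-outcome model and all numbers are my own illustration, not from the article): with independent faults, majority vote among 7 agents suppresses a 30% per-agent error rate; with perfectly correlated faults, voting buys nothing.

```python
import random

def vote(n_agents, p_wrong, correlated, rng):
    """Majority answer among n_agents under two fault models.

    Independent: each agent errs on its own with probability p_wrong.
    Correlated: one shared draw (shared weights/priors) moves the whole
    cohort to the same wrong answer together.
    """
    if correlated:
        answers = ["wrong" if rng.random() < p_wrong else "right"] * n_agents
    else:
        answers = ["wrong" if rng.random() < p_wrong else "right"
                   for _ in range(n_agents)]
    # Majority vote over the agents' answers (n_agents is odd, so no ties).
    return max(set(answers), key=answers.count)

rng = random.Random(0)
trials = 10_000
for correlated in (False, True):
    bad = sum(vote(7, 0.3, correlated, rng) == "wrong" for _ in range(trials))
    print(f"correlated={correlated}: majority wrong {bad / trials:.1%} of the time")
```

    With independent faults the wrong-majority rate is the binomial tail (roughly 13% here); with fully correlated faults it is just the shared error rate (30%). Voting only helps when faults are independent, which is exactly the assumption behind 3f+1.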
  • mrothroc
    I've been running a multi-agent software development pipeline for a while now and I've reached the same conclusion: it's a distributed systems problem.

    My approach has been more pragmatic than theoretical: I break work into sequential stages (plan, design, code) with verification gates. Each gate has deterministic checks (compile, lint, etc.) and an agentic reviewer for qualitative assessment. Collectively, this looks like a distributed system; the artifacts reflect the shared state.

    The author's point about external validation converting misinterpretations into detectable failures is exactly what I've found empirically. You can't make the agent reliable on its own, but you can make the protocol reliable by checking at every boundary. The deterministic gates provide a hard floor of guarantees; the agentic gates provide soft, probabilistic assertions.

    I wrote up the data and the framework I use: https://michael.roth.rocks/research/trust-topology/
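    A minimal sketch of that staged pipeline, with both kinds of gate stubbed out (the check functions and stage names here are illustrative stand-ins, not the commenter's actual framework):

```python
def deterministic_gate(artifact: str) -> bool:
    # Hard floor: pass/fail checks with no judgment involved
    # (compile, lint, test suite). Stubbed as a trivial check here.
    return "TODO" not in artifact

def agentic_gate(artifact: str) -> bool:
    # Soft probabilistic assertion: a reviewer model scoring quality.
    # Stubbed as a length heuristic for the sketch.
    return len(artifact) > 10

def run_stage(name, produce, max_retries=3):
    """Run one pipeline stage, re-invoking the agent until both gates pass."""
    for _ in range(max_retries):
        artifact = produce()  # stand-in for an LLM call
        if deterministic_gate(artifact) and agentic_gate(artifact):
            return artifact
    # Checking at the boundary turns a silently-wrong artifact
    # into a loud, detectable failure.
    raise RuntimeError(f"stage {name!r} failed all {max_retries} attempts")

plan = run_stage("plan", lambda: "1. parse input; 2. emit report")
code = run_stage("code", lambda: "def report(x): return str(x)")
```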
  • mccoyb
    There is not a single mention of probability in this post. The post treats agents as a highly complex but well-specified deterministic function. Perhaps, under certain temperature limits, that's approximately true ... but it's a serious restriction, and it's glossed over.

    For instance, perhaps the most striking constraint on FLP is that it is about deterministic consensus, and the post glosses over this:

    > establishes a fundamental impossibility result dictating consensus in any asynchronous distributed system (yes! that includes us).

    No, not any asynchronous distributed system; it might not include us. For instance, Ben-Or (1983, https://dl.acm.org/doi/10.1145/800221.806707), as a counterexample to the adversary in FLP, essentially says "if you're stuck, flip a coin". There's significant work studying randomized consensus (and yes, multi-agent systems are randomized consensus algorithms): https://www.sciencedirect.com/science/article/abs/pii/S01966...

    Now, in Ben-Or the coins have to be independent sources of randomness, and that's obviously not true in the multi-agent case. But the language in this post argues that these results apply while missing possibly the most fundamental fact about agents: they are probability distributions, inherently stochastic creatures. Difficult to take seriously without a more rigorous justification.
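    For readers who haven't seen it, the "flip a coin" move can be sketched in a few lines. This is a toy, crash-fault-only rendition of Ben-Or's idea, not the full protocol: each round, every process adopts the majority value if one exists, otherwise it flips an independent coin; with independent coins this terminates with probability 1.

```python
import random

def ben_or_toy(values, rng, max_rounds=10_000):
    """Toy randomized consensus: adopt a majority if one exists, else flip coins."""
    n = len(values)
    for rnd in range(1, max_rounds + 1):
        if len(set(values)) == 1:
            return values[0], rnd  # unanimous: consensus reached
        counts = {v: values.count(v) for v in set(values)}
        majority = next((v for v, c in counts.items() if c > n // 2), None)
        if majority is not None:
            values = [majority] * n  # everyone adopts the strict majority
        else:
            # Stuck (no strict majority): independent coin flips break the tie
            # with probability 1 -- this is the escape from the FLP adversary.
            values = [rng.choice([0, 1]) for _ in values]
    raise RuntimeError("no consensus within round bound")

decided, rounds = ben_or_toy([0, 1, 0, 1, 1, 0], random.Random(42))
```

    The termination argument hinges on the coins being independent, which is precisely where the analogy to multi-agent LLM systems strains: agents sharing weights are flipping correlated coins.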
  • xer
    Distributed systems rest on the assumptions that multiple machines fail independently, communicate over unreliable networks, and share no clock; those assumptions are what force you to solve consensus, byzantine faults, ordering, consistency vs. availability, and exactly-once delivery.

    However, AI agents don't share these problems in the classical sense. Building agents is about context attention, relevance, and information density inside a single ordered buffer. The distributed part is building an orchestrator that manages these things. At noetive.io we currently work on the context-relevance part with our contextual broker, Semantik.
  • jimmypk
    The partial synchrony escape hatch mentioned in the Further Reading section is already partially implemented in workflow engines like Temporal: bounded activity timeouts map directly onto the "upper bound on message delays" that Dwork/Lynch/Stockmeyer use to make consensus tractable in an otherwise FLP-unbounded system. This narrows the gap considerably at the infrastructure level.

    What isn't solved there is semantic idempotency. Even if a failed agent activity retries correctly at the infrastructure layer, the LLM re-invocation produces a different output. This is why the point about tests converting byzantine failures into crash failures is load-bearing: without external validation gates between activities, you've pushed retry logic onto Temporal but left the byzantine inconsistency problem unsolved. The practical implication is that the value of the test suite in an agentic pipeline scales superlinearly: not just as correctness assurance, but as the mechanism that collapses the harder byzantine failure model back into the weaker crash-fault model that FLP assumes.
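    The byzantine-to-crash conversion can be made concrete with a small sketch (the flaky generator and check below are hypothetical stand-ins, not Temporal APIs): without the check, wrong-but-plausible outputs flow silently downstream; with it, they surface as an exception that ordinary retry machinery already understands.

```python
def flaky_agent_outputs():
    # Simulated re-invocations of the same activity: same prompt,
    # different output each time (no semantic idempotency).
    yield "return a - b"   # plausible but wrong (byzantine)
    yield "return a * b"   # plausible but wrong (byzantine)
    yield "return a + b"   # correct

def validated(outputs, check, max_attempts=5):
    """External validation gate between activities."""
    for _, out in zip(range(max_attempts), outputs):
        if check(out):
            return out
        # A silently-wrong output is rejected here instead of
        # propagating downstream.
    # Exhausted attempts: surface as a plain failure, i.e. the
    # crash-fault model that retry infrastructure knows how to handle.
    raise TimeoutError("activity failed validation; treat as crash fault")

result = validated(flaky_agent_outputs(), lambda out: out == "return a + b")
```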
  • siliconc0w
    My workflow uses a thorough design broken down into very specific tasks, agent mail, and a swarm of agents in a Ralph loop to burn down tasks. Agents collaborate over mail pretty well and don't seem to need layers of supervision. If the tasks are well specified and your design is thought through, especially how the agents can self-validate, it seems to work pretty well.

    I wrote an article on this if you're interested: https://x.com/siliconcow/status/2035373293893718117
  • falcor84
    The thing TFA doesn't seem to address is that these mathematical results apply to human agents in exactly the same way as they do to AI agents, and nevertheless we have massive codebases like Linux. If people can figure out how to do it, then there's no mathematical result proving that AIs can't.
  • SamLeBarbare
    Conway’s law still applies. Good architecture, actor models, and collaboration patterns do not emerge magically from “more agents”. Maybe what’s missing is the architect’s role.
  • zackham
    The most durable way to reason about agents is to just think about humans. We have thousands of years of prior art on coordinating instantiations of stochastic intelligence. Context, tools, goals, validation, specialization, distribution of labor, coordination... If jobs are bundles of tasks and areas of accountability, maybe it's more effective right now to unbundle and reorganize some of these things. If constraints underperform autonomy, maybe you have to adjust where you operate on that spectrum, and account for it in goal definition and validation. These are not new problems.
  • 21asdffdsa12
    To be honest, humans often have no overview of an application either. We navigate up and down the namespace, building the "overview" as we go. I see nothing that prevents an agent from moving up and down that namespace, writing its assumptions into the codebase, and requesting feedback from other agents working on different parts of the codebase.
  • jbergqvist
    Doesn't this whole argument fall apart if we consider iteration over time? Sure, the initial implementation might be uncoordinated, but once the subagents have implemented it, what stops the main agent from reviewing the code and sorting out any inconsistencies, ultimately arriving at a solution faster than it could if it wrote it by itself?
  • lifeisstillgood
    It’s not a solution, but it’s why humans have developed the obvious approach of “build one thing, then everyone can see that one thing and agree what needs to happen next” (i.e. the space of possible solutions is reduced by creating one thing, and the next set of choices is reduced by the original choice).

    This might be obvious to everyone, but it’s a nice way for me to view it, sort of restating the non-waterfall (agile?) approach to specification discovery. That is, waterfall design without coding is too under-specified, hence the agile approach of using code iteratively to find an exact specification.
  • wnbhr
    I run a small team of AI agents building a product together. One agent acts as supervisor — reviews PRs, resolves conflicts, keeps everyone aligned. It works at this scale (3-4 agents) because the supervisor can hold the full context. But I can already see the bottleneck — the supervisor becomes the single point of coordination, same as a human tech lead. The distributed systems framing makes sense. What I'm not sure about is whether the answer is a new formal language, or just better tooling around the patterns human teams already use (code review, specs, tests).
  • josefrichter
    This might be useful in the context of this topic: https://jido.run
  • yangshi07
    After you point this out, it is obviously right!
  • vedant_awasthi
    Makes sense. Coordination between multiple agents feels like the real challenge rather than just building them.
  • threethirtytwo
    > smarter agents alone can't escape it.

    This is not true. In theory, if the agent is smart enough, it out-thinks your ideas and builds the solution around itself so that it can escape.
  • pyinstallwoes
    I think Erlang/Elixir is ripe for this.
  • cooloo
    Well, it starts with the "agree" list. I don't agree that next-gen models will be smarter. I would argue there's been no real improvement in the models themselves in the last couple of years, just improvements in stability and in the (agentic) tooling around them.
  • pavas
    No shit, Sherlock.
  • socketcluster
    I tried my hand at coding with multiple agents at the same time recently. I had to add related logic to 4 different repos: an action would traverse all of them, one by one, carrying some data. I decided to implement the change in all of them at the same time with 4 Claude Code instances, and it worked the first time.

    It's crazy how good coding agents have become. Sometimes I barely even need to read the code because it's so reliable, and I've developed a kind of sense for when I can trust it. It boggles my mind how accurate it is when you give it the full necessary context. It's more accurate than any living being could possibly be. It's like it's pulling the optimal code directly from the fabric of the universe.

    It's kind of scary to think that there might be AI as capable as this applied to things besides next-token prediction... Such AI could probably exert an extreme degree of control over society and over individual minds. I understand why people think we live in a simulation. It feels like the capability is there.