Parallel coding agents with tmux and Markdown specs

schipperai

Comments (107)

gas9S9zw3P9c
I'd love to see what is being achieved by these massive parallel agent approaches. If it's so much more productive, where is all the great software that's being built with it? What is the OP building? Most of what I'm seeing is AI influencers promoting their shovels.
synergy20
I use claude-code. It now spins up many agents on its own, sometimes switches models to save costs, can easily use 200+ tools concurrently, and uses multiple skills at the same time when needed. Its automation gets smarter and more parallel by the day, so do we still need to outwit what claude-code probably already does? I still use tmux, not for multiple agents but to poke around at will. I let claude-code fully manage and parallelize the plan/code/review/whatever itself; it's massively impressive.
funerr
I really like cmux for this (https://github.com/manaflow-ai/cmux)
jlongo78
The key insight with tmux agent parallelism is giving each pane a dedicated markdown spec file rather than sharing context. Agents stay focused and you avoid prompt contamination across sessions. Name your windows after the spec, not the task, so you can resume cold sessions without re-reading logs. Also worth adding a status pane that tails a shared cost log, otherwise parallel runs burn budget fast without you noticing.
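A minimal sketch of that layout, assuming a hypothetical `my-agent` CLI that takes a spec path and a hypothetical shared `costs.log`. It only builds the tmux commands (one window per spec, named after the spec file, plus a status window tailing the cost log) so you can inspect them before running anything:

```python
import shlex
from pathlib import Path


def tmux_plan(session, spec_paths, agent_cmd="my-agent", cost_log="costs.log"):
    """Return tmux argv lists: one window per spec file, plus a status window.

    `agent_cmd` and `cost_log` are placeholders, not real tool names.
    """
    specs = [Path(p) for p in spec_paths]
    if not specs:
        raise ValueError("need at least one spec file")

    commands = []
    for i, spec in enumerate(specs):
        name = spec.stem  # window is named after the spec, e.g. "auth"
        if i == 0:
            # The first spec creates the detached session and its first window.
            commands.append(["tmux", "new-session", "-d", "-s", session, "-n", name])
        else:
            commands.append(["tmux", "new-window", "-t", f"{session}:", "-n", name])
        # Each agent is started with its own spec file and nothing else.
        launch = f"{agent_cmd} {shlex.quote(spec.as_posix())}"
        commands.append(["tmux", "send-keys", "-t", f"{session}:{name}", launch, "Enter"])

    # Status window that tails the shared cost log.
    commands.append(["tmux", "new-window", "-t", f"{session}:", "-n", "status"])
    tail = f"tail -f {shlex.quote(cost_log)}"
    commands.append(["tmux", "send-keys", "-t", f"{session}:status", tail, "Enter"])
    return commands


if __name__ == "__main__":
    for argv in tmux_plan("agents", ["specs/auth.md", "specs/billing.md"]):
        print(" ".join(shlex.quote(a) for a in argv))
```

To actually start the session, each argv list could be passed to `subprocess.run(argv, check=True)`. Printing first keeps the sketch side-effect free.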
v_CodeSentinal
The deny list section hit home. I keep seeing agents use unlink instead of rm, or spawn a python subprocess to delete files. Every new rule just taught the agent a new workaround. Ended up flipping the model: instead of blocking bad actions, require proof of safety before any action runs. No proof, no action. Much harder to route around. Curious if you've tried anything similar.
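One way to read "no proof, no action" is a default-deny gate. The sketch below is an assumption about what that could look like, not the commenter's actual setup: the `ALLOWED` set and the `vet` helper are hypothetical, and "proof" here just means the program is on a read-only allow-list and every path argument stays inside the workspace.

```python
import shlex
from pathlib import Path

# Hypothetical allow-list of read-only programs; anything else is refused.
ALLOWED = {"ls", "cat", "grep", "wc"}


def vet(command, workspace):
    """Return an argv list if the command passes every check, else None.

    Default-deny: unknown programs (unlink, python, ...) never need their own
    rule, because they are simply not on the list.
    """
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar
        return None
    if not argv or argv[0] not in ALLOWED:
        return None

    root = Path(workspace).resolve()
    for arg in argv[1:]:
        if arg.startswith("-"):
            continue  # treat as a flag, not a path
        target = (root / arg).resolve()
        if target != root and root not in target.parents:
            return None  # path escapes the workspace
    return argv
```

The returned list is meant to be run without a shell (for example `subprocess.run(argv, cwd=workspace)`), so characters like `;` or `|` in the input are never interpreted as shell syntax.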
ramoz
I did a sort of bell curve with this type of workflow over summer.
- Base Claude Code (released)
- Extensive, self-orchestrated, local specs & documentation; i.e. waterfall for many features/longer-term project goals (summer)
- Base Claude Code (today)
Claude Code is getting better at orchestrating its own subagents for divide/conquer type work.
My problem with these extensive self-orchestrated multi-agent / spec modes is the drift and rot across all the changes and integrated parts of an application, which a lot of the time end up in merge conflicts. Aside from my own decision cognitive space, it's also a lot to just generally orchestrate and review. I spent a ton of time forcing Claude to use the system I put in place, including documentation updates and continuous logging of work.
I feel extremely productive with a single Claude Code for a project. Maybe for minor features, I'll launch Claude Code in the web so that it can operate in an isolated space to knock them out and create a PR.
I will plan and annotate extensively for large features, but not many features or broad project specs all at the same time. Annotation and better planning UX, I think, are going to be increasingly important for now. The only augment of Claude Code I have is a hook for plan mode review: https://github.com/backnotprop/plannotator
logicprog
For major, in-depth refactors and large-scale architectural work, it's really important to keep the agents on-track, to prevent them from assuming or misunderstanding important things, or whatever. I can't imagine what it'd be like doing parallel agents. I don't see how that's useful. And I'm a massive fan of agentic coding! It's like OpenClaw for me: I love the idea of agentic computer use, but I just don't see how something so unsupervised and unsupervisable is remotely a useful or good idea.
jasonjmcghee
I certainly don't run 6 at a time, but even with just 1 - if it's doing anything visual - how are folks hooking up screenshots to self-verify? And how do you keep an eye on it? The only solution I've seen on a Mac is doing it on a separate monitor. I couldn't find a solution here and have built similar things in the past, so I took a crack at it using CGVirtualDisplay. Ended up adding a lot of productivity features and polished until it felt good. Curious if there are similar solutions out there I just haven't seen. https://github.com/jasonjmcghee/orcv
CloakHQ
We ran something similar for a browser automation project - multiple agents working on different modules in parallel with shared markdown specs. The bottleneck wasn't the agents, it was keeping their context from drifting. Each tmux pane has its own session state, so you end up with agents that "know" different versions of reality by the second hour. The spec file helps, but we found we also needed a short shared "ground truth" file the agents could read before taking any action - basically a live snapshot of what's actually done vs what the spec says. Without it, two agents would sometimes solve the same problem in incompatible ways. Has anyone found a clean way to sync context across parallel sessions without just dumping everything into one massive file?
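A rough sketch of such a "ground truth" file, under loud assumptions: the spec tracks tasks as Markdown checkboxes, and something outside the agents (tests, a human) supplies the set of tasks that are verifiably done. The function names and file layout here are made up for illustration.

```python
import re

# Matches checklist lines such as "- [x] login form" or "* [ ] password reset".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*\S)\s*$")


def parse_tasks(markdown):
    """Map task text -> whether the spec claims it is done."""
    tasks = {}
    for line in markdown.splitlines():
        match = CHECKBOX.match(line)
        if match:
            tasks[match.group(2)] = match.group(1).lower() == "x"
    return tasks


def render_ground_truth(spec_md, done):
    """Render a short snapshot of actual status, flagging where the spec disagrees.

    `done` is the set of task texts verified complete, not merely claimed.
    """
    lines = ["# Ground truth", ""]
    for task, claimed in parse_tasks(spec_md).items():
        actual = task in done
        note = "" if claimed == actual else "  (spec disagrees)"
        box = "x" if actual else " "
        lines.append(f"- [{box}] {task}{note}")
    return "\n".join(lines) + "\n"
```

Regenerating this file after each verified change and having every agent read it before acting keeps the shared state to a few lines rather than one massive file.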
servercobra
This is a really cool design, pretty similar to what I've built for implementation planning. I like how iterative it is and that the whole system lives just in markdown. The verify step is a great idea that I hadn't made a command for yet, thank you! This seems like it'd be great for solo projects but starts to fall apart for a team with a lot more PRs and distributed state. Heck, I run almost everything in a worktree, so even there the state is distributed. Maybe moving some of the state/plans/etc to Linear et al solves that though.
aceelric
I’ve been experimenting with a similar pattern but wrapping it in a “factory mode” abstraction (we’re building this at CAS [1]): you define the spec once, after careful planning with a supervisor agent, then let it go and spin up parallel workers against it automatically. It handles task decomposition + orchestration so you’re not manually juggling tmux panes. [1] https://cas.dev
nferraz
I like how you bootstrap the agent from a single markdown file.
sluongng
Yeah, the 8-agent limit aligns well with my conversations with folks in the leading labs: https://open.substack.com/pub/sluongng/p/stages-of-coding-ag... I think we need very different tooling to go beyond a ratio of 1 human to 10 agents, and much more different tooling still to achieve a higher ratio than that.
hinkley
These setups pretty much require the top-tier subscription, right?
kledru
I think you should have a reviewer as well.
zwilderrr
I just can’t get over the fact that your Anglicized name sounds like manual shipper.
philipp-gayret
Is there a place where people like you go to share ideas around these new ways of working, other than HN? I'm very curious how these new ways of working will develop. In my system, I use voice memos to capture thoughts, and they become more or less what you have as feature designs. I notice I have a lot of ideas throughout the day (Claude chews through them some time later, and when they are worked out I review its plans in Notion; I use Notion because I can upload memos into it from my phone, so it's more or less what you call the index). But ideas... I can only capture them as they come, otherwise they are lost, and I don't want to spend time typing them out.