
Comments (115)

  • gtirloni
    I was using this and superpowers, but eventually Plan mode became enough and I prefer to steer Claude Code myself. These frameworks are great for fire-and-forget tasks, especially when there is some research involved, but they burn 10x more tokens in my experience. I was always hitting the Max plan limits for no discernible benefit in the outcomes I was getting. But this will vary a lot depending on how people prefer to work.
  • loveparade
    "I am a super productive person that just wants to get shit done." Looked at profile: hasn't done or published anything interesting other than promoting products to "get stuff done". This is like the TODO-list book gurus writing about productivity.
  • vinnymac
    I tried this for a week and gave up. It required far too much back and forth, ate too many tokens, and required too much human in the loop. For this reason I don’t think it’s actually a good name. It should be called planning-shit instead, since that’s seemingly 80%+ of what I did while interacting with this tool. And when it came to getting things done, I didn’t need this at all, and the plans were just alright.
  • AndyNemmity
    I have an AI system I use. I'd like to release it so others can benefit, but at the same time it's all custom to myself and what I do and work on. If I fork out a version for others that is public, then I have to maintain that variation as well. Is anyone in a similar situation? I think most of the ones I see released are not particularly complex compared to my system, but at the same time I don't know how to convey how to use my system as someone who just uses it alone. It feels like I don't want anyone to run my system; I just want people to point their AI system at mine and ask it what there is that's valuable to potentially add to their own. I don't want to maintain one for people. I don't want to market it as some magic cure. Just show patterns that others can use.
  • DamienB
    I've compared this to superpowers and the classic PRD → task generator, and I came away convinced that less is more, at least at the moment. gsd performed well, but took hours instead of minutes. Having a simple explanation of how to create a PRD, followed by a slightly more technical task list, performed much better. It wasn't that gsd or superpowers couldn't find a solution; it's just that they did it much slower and with a lot more help. For me, the lesson was that the workflow has changed, and that we can't apply old project-dev paradigms to this new/alien technology. There's a new instruction manual and it doesn't build on the old one.
  • Frannky
    I tried it once; it was incredibly verbose, generating an insane amount of files. I stopped using it because I was worried it would not be possible to rapidly, cheaply, and robustly update things as interaction with users generated new requirements. The best way I have today is to start with a project requirements document, then ask for a step-by-step implementation plan, and then go do the thing at each step, but only after I greenlight the strategy of the current step. I also specify minimal, modular, and functional stateless code.
  • maccam912
    I've had a good experience with https://github.com/obra/superpowers. At first glance this looks similar. Has anyone tried both who can offer a comparison?
  • gbrindisi
    I like openspec; it lets you tune the workflow to your liking and doesn’t get in the way. I started with all the standard spec flow and, as I got more confident and opinionated, I simplified it to my liking. I think the point of any spec-driven framework is that you want to eventually own the workflow yourself, so that you can constrain code generation on your own terms.
  • galexyending
    I gave it a shot, but won't be using it going forward. It requires a waterfall process. And, I found it difficult, and in some cases impossible, to adjust phases/plans when bugs or changes in features arise. The execution prompts didn't do a good job of steering the code to be verified while coding and relies on the user to manually test at the end of each phase.
  • recroad
    I use openspec and love it. I’m doing 5-7x with close to 100% of code AI generated, and shipping to production multiple times a day. I work on a large SaaS app with hundreds of customers. Wrote something here: https://zarar.dev/spec-driven-development-from-vibe-coding-t...
  • melvinroest
    If you want some context about spec-driven development and how it could be used with LLMs, I recommend [1]. Having some background like this helps me understand tools like this a bit more. [1] https://www.riaanzoetmulder.com/articles/ai-assisted-program...
  • yoaviram
    I've been using GSD extensively over the past 3 months. I previously used speckit, which I found lacking. GSD consistently gets me 95% of the way there on complex tasks. That's amazing. The last 5% is mostly "manual" testing. We've used GSD to build and launch a SaaS product including an agent-first CMS (whiteboar.it). It's hard to say why GSD worked so much better for us than other similar frameworks, because the underlying models also improved considerably during the same period. What is clear is that it's a huge productivity boost over vanilla Claude Code.
  • jankhg
    Apart from GSD and superpowers, there's another system, called PAUL [1]. It apparently requires fewer tokens than GSD, as it does not use subagents but keeps everything in one session. A detailed comparison with GSD is part of the repo [2]. [1] https://github.com/ChristopherKahler/paul [2] https://github.com/ChristopherKahler/paul/blob/main/PAUL-VS-...
  • theodorewiles
    I think the research / plan / execute idea is good, but it feels like you would be outsourcing your thinking. Gotta review the plan and spend your own thinking tokens!
  • dfltr
    GSD has a reputation for being a token burner compared to something like Superpowers. Has that changed lately? Always open to revisiting things as they improve.
  • obsidianbases1
    > If you know clearly what you want — This is the real challenge. The people I know that jump around to new tools have a tough time explaining what they want, and thus how the new tool is better than the last one.
  • arjie
    I could not produce useful output from this. It was useful as a rubber duck because it asks good motivating questions during the plan phase, but the actual implementation was lacklustre and not worth the effort. In the end, I just have Claude Opus create plans, and then I have it write them to memory and update it as it goes along and the output is better.
  • Andrei_dev
    250K lines in a month — okay, but what does review actually look like at that volume? I've been poking at security issues in AI-generated repos and it's the same thing: more generation means less review. Not just logic — checking what's in your .env, whether API routes have auth middleware, whether debug endpoints made it to prod. You can move that fast. But "review" means something different now. Humans make human mistakes. AI writes clean-looking code that ships with hardcoded credentials because some template had them and nobody caught it. All these frameworks are racing to generate faster. Nobody's solving the verification side at that speed.
  • yoavsha1
    How come we have all these benchmarks for models, but none whatsoever for harnesses, or whatever you'd call this? While I understand assigning "scores" is more nuanced, I'd love to see some website that has a catalog of prompts and outputs as produced with different configurations of model+harness in a single attempt.
  • smusamashah
    There should be an "Examples" section in projects like this one to show what has actually been made using it. I scrolled to the end and was really expecting an example, given the way it's being advertised. If it were a game engine or a new web framework, for example, there would be demos or example projects linked somewhere.
  • chrisss395
    I'm curious if anyone has used this (or similar) to build a production system?I'm facing increasing pressure from senior executives who think we can avoid the $$$ B2B SaaS by using AI to vibe code a custom solution. I love the idea of experimenting with this but am horrified by the first-ever-case being a production system that is critical to the annual strategic plan. :-/
  • DIVx0
    I’ve tried GSD several times. I actually like the verbosity, and it’s a simple chore for Claude to refresh project docs from GSD planning docs. Like most spec-driven development tools, GSD works well for greenfield or the first few rounds of “compound engineering.” However, like all the others, the project gets too big and GSD can’t manage to deliver working code reliably. Agents working GSD plans will start leaving orphans all over; it won’t wire them up properly, because verification stages use simple lexical tools to search code for implementation facts. I tried giving GSD some AST-aware tools, but good luck getting Claude to reliably use them. Ultimately I put GSD back on the shelf and developed my own “property graph” based planner that is closer to Claude’s plan mode, but the design source of truth is structured properties, not markdown. My system will generate docs from the graph as user docs. Agents only get tasked as my graph closes nodes and re-sorts around invariants; then agents are tasked directly.
  • LoganDark
    This seems like something I'd want to try but I am wholly opposed to `npx` being the sole installation mechanism. Let me install it as a plugin in Claude Code. I don't want `npx` to stomp all over my home directory / system configuration for this, or auto-find directories or anything like that.
  • MeetingsBrowser
    I've tried it, and I'm not convinced I got measurably better results than just prompting Claude Code directly. It absolutely tore through tokens, though. I don't normally hit my session limits, but with GSD I hit the 5-hour limits in ~30 minutes and my weekly limits by Tuesday.
  • jatora
    Another heavily overengineered AND underengineered abomination. I'm convinced anyone who advocates for these types of tools would find just as much success prompting Claude Code normally and taking a little time to plan first. Such a waste of time to bother with these tools that solve a problem that never existed in the first place.
  • dhorthy
    It is very hard for me to take seriously any system that is not proven for shipping production code in complex codebases that have been around for a while. I've been down the "don't read the code" path and I can say it leads nowhere good. I am perhaps talking my own book here, but I'd like to see more tools that brag about "shipped N real features to production" or "solved Y problem in a large 10-year-old codebase". I'm not saying that coding agents can't do these things or that such tools don't exist; I'm just afraid that counting 100k+ LOC that the author didn't read fuels the "this is all hype-slop" argument rather than helping people discover the ways coding agents can solve real and valuable problems.
  • thr0waway001
    At the risk of sounding stupid what does the author mean by: “I’m not a 50-person software company. I don’t want to play enterprise theatre.” ?
  • Relisora
    Did anyone compare it with everything-claude-code (ECC)?
  • canadiantim
    I use Oh-My-Opencode (Now called Oh-My-OpenAgent), but it's effectively the same as GSD, but better imo
  • ibrahim_h
    The README recommends --dangerously-skip-permissions as the intended workflow. Looking at gsd-executor.md you can see why — subagents run node gsd-tools.cjs, git checkout -b, eslint, test runners, all generated dynamically by the planner. Approving each one kills autonomous mode. There is a gsd-plan-checker that runs before execution, but it only verifies logical completeness — requirement coverage, dependency graphs, context budget. It never looks at what commands will actually run. So if the planner generates something destructive, the plan-checker won't catch it, because that's not what it checks for. The gsd-verifier runs after execution, checking whether the goal was achieved, not whether anything bad happened along the way. In /gsd:autonomous this chains across all remaining phases unattended. The granular-permissions fallback in the README only covers safe reads and git ops, but the executor needs far more than that to actually function. It feels like there should be a permission profile scoped to what GSD actually needs without going full skip.
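    A minimal sketch of what such a scoped profile could look like, using Claude Code's project-level settings-file permission rules. The specific command patterns below are assumptions inferred from the tools the comment lists (gsd-tools.cjs, branch checkout, eslint, test runners), not an audited allowlist:

    ```json
    {
      "permissions": {
        "allow": [
          "Bash(node gsd-tools.cjs:*)",
          "Bash(git checkout:*)",
          "Bash(git add:*)",
          "Bash(git commit:*)",
          "Bash(npx eslint:*)",
          "Bash(npm test:*)",
          "Edit(src/**)"
        ],
        "deny": [
          "Bash(git push:*)",
          "Bash(rm:*)"
        ]
      }
    }
    ```

    Saved as `.claude/settings.json` in the project, this would let the executor run its build/test loop unprompted while deny rules (which take precedence over allow rules) still block dynamically generated destructive commands that match them.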
  • prakashrj
    With GSD, I was able to write 250K lines of code in less than a month, without prior knowledge of Claude.
  • seneca
    I've tried several of these sorts of things, and I keep coming away with the feeling that they are a lot of ceremony and complication for not much value. I appreciate that people are experimenting with how to work with AI and get actual value, but I think pretty much all of these approaches are adding complexity without much, or often any, gain. That's not a reason to stop trying. This is the iterative process of figuring out what works.
  • hermanzegerman
    For me it was awesome. I needed a custom pipeline for preprocessing some lab data, including visualization and manipulation, and it got me exactly what I wanted, as opposed to Codex Plan Mode, which just burned my weekly quota and produced garbage.
  • greenchair
    terrible name, DOA
  • tkiolp4
    The whole gsd/agents folder is hilarious. Like a bunch of MD files that can never break. How do you know it is minimally correct? Subjective prose. Sad to see this on the front page.