
Comments (50)

  • beshrkayali
    > long contexts are still expensive and can also introduce additional noise (if there is a lot of irrelevant info)

    I think spec-driven generation is the antithesis of chat-style coding for this reason. With tools like Claude Code, you are the one tracking what was already built, what interfaces exist, and why something was generated a certain way.

    I built Ossature [1] around the opposite model. You write specs describing behavior, it audits them for gaps and contradictions before any code is written, then produces a build-plan TOML where each task declares exactly which spec sections and upstream files it needs. The LLM never sees more than that, and there is no accumulated conversation history to drift from. Every prompt and response is saved to disk, so traceability is built in rather than something you reconstruct by scrolling back through a chat.

    I used it over the last couple of days to build a CHIP-8 emulator entirely from specs [2]. I have some more example projects on GitHub [3].

    1: https://github.com/ossature/ossature
    2: https://github.com/beshrkayali/chomp8
    3: https://github.com/ossature/ossature-examples
  • IceWreck
    > This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code.

    People have been doing that for over a year already? GLM officially recommends plugging into Claude Code (https://docs.z.ai/devpack/tool/claude), and any model can be plugged into Codex CLI (it's open source, and the model can be set via config file).
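    The swap the comment describes can be sketched as below. This is a minimal sketch, assuming Claude Code's documented `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` environment-variable overrides; the endpoint URL follows z.ai's linked guide, and the token value is a placeholder:

    ```shell
    # Point Claude Code at an Anthropic-compatible third-party endpoint so the
    # same harness drives a different backend model (here, z.ai's GLM service).
    export ANTHROPIC_BASE_URL="https://api.z.ai/api/anthropic"  # endpoint from the linked z.ai docs
    export ANTHROPIC_AUTH_TOKEN="your-zai-api-key"              # placeholder; substitute a real key
    claude  # launch Claude Code as usual; requests now go to the configured backend
    ```

    The harness, tools, and prompts stay the same; only the model endpoint behind the API calls changes.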
  • Yokohiii
    The example is really lean and straightforward. I don't use coding agents, but this is a good overview and should help everyone understand that coding agents may produce sophisticated outcomes, but the raw interaction isn't magical at all.

    It's also a good example of how you can turn any useful code component that requires 1k LOC into a mess of 500k LOC.
  • armcat
    I still find it incredible, the power that was unleashed by surrounding an LLM with a simple state machine and giving it access to bash.
  • zbyforgotpass
    Isn't there a better word than harness? I understand the metaphor of leading and constraining a raw power - but I don't like it.
  • MrScruff
    > This is speculative, but I suspect that if we dropped one of the latest, most capable open-weight LLMs, such as GLM-5, into a similar harness, it could likely perform on par with GPT-5.4 in Codex or Claude Opus 4.6 in Claude Code.

    Unless I'm misunderstanding what's being described here, running Claude Code with different backend models is pretty common: https://docs.z.ai/scenario-example/develop-tools/claude

    It doesn't perform on par with Anthropic's models in my experience.
  • crustycoder
    A timely link - I've just spent the last week failing to get a ChatGPT Skill to produce a reproducible management-reporting workflow. I've figured out why, and this article pretty much confirms my conclusions about the strengths and weaknesses of "pure" LLMs and how to work around them. The article addresses a slightly different problem domain, but the general problems, and the architecture needed to address them, seem very similar.