Comments (134)

  • skerit
    > Then a brick hits you in the face when it dawns on you that all of our tools are dumping crazy amounts of non-relevant context into stdout thereby polluting your context windows.

    I've found that letting the agent write its own optimized script for dealing with some things can really help with this. Claude is now forbidden from using `gradlew` directly, and can only use a helper script we made. It clears, recompiles, publishes locally, tests, ... all with a few extra flags. And when a test fails, the stack trace is printed.

    Before this, Claude had to do A TON of different calls, all messing up the context. And when tests failed, it started to read gradle's generated HTML/XML files, which damaged the context immensely, since they contain a bunch of inline javascript.

    I've also been implementing this "LLM=true"-like behaviour in most of my applications. When an LLM is using one, logging is less verbose, and it's deduplicated so it doesn't show the same line a hundred times.

    > He sees something goes wrong, but now he cut off the stacktraces by using tail, so he tries again using a bigger tail. Not satisfied with what he sees HE TRIES AGAIN with a bigger tail, and … you see the problem. It’s like a dog chasing its own tail.

    I've had the same issue. Claude was running the 5+ minute test suite MULTIPLE TIMES in succession, just with a different `| grep something` tacked on at the end. Now, the scripts I made always log the entire (simplified) output and just print the path to the temporary file. This works so much better.
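    The log-to-a-file pattern can be sketched in a few lines of POSIX sh. This is an illustration, not the actual helper script; the `run_quiet` name, the 40-line tail, and the output format are all made up:

```shell
# Hypothetical wrapper: run a command, keep the full log in a temp file,
# and show only the outcome, the log path, and (on failure) the tail
# containing the stack trace. All names and formats are illustrative.
run_quiet() {
  log=$(mktemp) || return 1
  if "$@" >"$log" 2>&1; then
    echo "OK: $* (full log: $log)"
  else
    status=$?
    echo "FAILED ($status): $* (full log: $log)"
    tail -n 40 "$log"   # surface the error once, instead of retrying with ever-bigger tails
    return "$status"
  fi
}

run_quiet sh -c 'echo compiling...; echo done'
```

    On success the agent sees one line; on failure it sees the exit code plus the last 40 lines, and can grep the cached log file for anything else without re-running the build.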
  • lucumo
    > Then a brick hits you in the face when it dawns on you that all of our tools are dumping crazy amounts of non-relevant context into stdout thereby polluting your context windows.

    Not just context windows. Lots of that crap is completely useless for humans too. It's not rare for warnings to be hidden in so much irrelevant output that they sit there for years before someone notices.
  • thrdbndndn
    Something related to this article, but not related to AI:

    As someone who loves coding pet projects but is not a software engineer by profession, I find the paradigm of maintaining all these config files and environment variables exhausting, and there seem to be more and more of them for any non-trivial project.

    Not only do I find it hard to remember which is which or to locate any specific setting, their mechanisms often feel mysterious too: I often have to test them manually to see whether, and how exactly, they work. This is not the case for actual code, where I can understand the logic just by reading it, since it has a clearer flow.

    And I just can’t make myself blindly copy other people's config/env files without knowing what each switch does. This makes building projects, and especially copying or imitating other people's projects, a frustrating experience.

    How do you deal with this, my fellow professionals?
  • Lerc
    I think the concept has value, but targeting today's LLMs like this is short-sighted. It's making what is likely to be a permanent change to fix a temporary problem.

    What would have value in the long term is an option to be concise, accurate, and unambiguous. This isn't something that should be considered only for LLMs. Sometimes humans want readability: when you're trying to understand something quickly, added context helps a great deal. But sometimes accuracy and unambiguity are paramount (like when doing an audit), and if you're dealing with a batch of similar things, the same repeated context adds nothing and limits how much you can see at once.

    So there can be a benefit when a human can request output like this to read directly. On top of this is the broad range of output-processing tools that we have (some people still awk).

    So yes, this is needed, but LLMs will probably not need it in a few years. The other uses will remain.
  • moritonal
    It feels wild to have to keep reminding people, but AI changes very little here. Tools have always had a variety of output, and ways to control it; bad tools output a lot by default, whilst good tools hide it behind some version of "-v" or easy greps. Don't add a --LLM flag or whatever; do add cleaner and consistent verbosity controls.
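    One sketch of what "cleaner and consistent verbosity controls" could look like inside a tool, assuming a simple 0/1/2 level scheme (the flag handling and level numbers are illustrative, not any standard):

```shell
# Illustrative -q/-v handling: one knob that serves humans, scripts, and
# agents alike, instead of a special --LLM mode. Levels are an assumption.
VERBOSITY=1                         # 0 = quiet, 1 = normal, 2 = debug
case "${1-}" in
  -q) VERBOSITY=0 ;;
  -v) VERBOSITY=2 ;;
esac

log() {                             # usage: log LEVEL MESSAGE...
  level=$1; shift
  [ "$VERBOSITY" -ge "$level" ] && echo "$@"
  return 0
}

log 2 "resolving 312 dependencies..."   # debug chatter: shown only with -v
log 1 "build finished"                  # normal summary: hidden by -q
log 0 "error: compilation failed"       # always shown
```

    An agent (or a human in a hurry) passes -q and gets errors only; nothing about the tool needs to know who is reading.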
  • caerwy
    The UNIX philosophy of tools that handle text streams, stay "quiet" unless something goes wrong, do one thing well, etc. is still so well suited to the modern age of AI coding agents.
  • exitb
    Also an acceptable solution - create a "runner" subagent on a cheap model, that's tasked with running a command and relaying the important parts to the main agent.
  • vidarh
    Rather than an LLM=true flag, this is better handled by standardizing quiet/verbose settings, since it's a question of verbosity: an LLM is one case where you usually want things quieter, but not always.

    Secondly, we need a helper to capture and cache output; frankly, a tool (or just options to the regular shell/bash tools) to cache output and allow filtered retrieval of it. More than the context and tokens, my frustration with the patterns shown is that the agent will often re-execute time-consuming tasks just to retrieve a different set of lines from the output.

    A lot of the time it might even be best to run the tool with verbose output, but it'd be nice if tools had a more uniform way of producing output that was easy to systematically filter down to essentials on the first run (while caching the rest).
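    The capture-and-cache helper could be as small as this; `cached_run` and `cached_grep` are hypothetical names, invented here for illustration:

```shell
# Run the slow command once, cache all of its output, and let later
# "queries" grep the cache instead of re-executing the command.
CACHE=${TMPDIR:-/tmp}/last-run.log

cached_run()  { "$@" >"$CACHE" 2>&1; echo "exit=$? (output cached: $CACHE)"; }
cached_grep() { grep -n -- "$1" "$CACHE"; }   # re-filter without re-running

cached_run sh -c 'echo step one; echo ERROR: widget broke; echo step three'
cached_grep ERROR     # a second look costs nothing: no re-execution
```

    An agent that wants a different slice of the output calls `cached_grep` again rather than re-running the five-minute test suite.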
  • burkaman
    Why can't the agent harness dynamically decide whether outputs should be put into the context or not? It could check with an LLM to determine if the verbatim output seems important, and if not, store the full output locally but replace it in the prompt with a brief summary and unique ID. Then make a tool available so the full output can be retrieved later if necessary. That's roughly how humans do it, you scroll through your terminal and make quick decisions about what parts you can ignore, and then maybe come back later when you realize "oh I should probably read that whole stack trace".It wouldn't even need to send the full output to make a decision, it could just send "npm run build output 500 lines and succeeded, do we need to read the output?" and based on the rest of the conversation the LLM can respond yes or no.
  • robkop
    We’ve got a long way to go in optimising our environments for these models. Our perception of a terminal is much closer to feeding a video into Gemini than reading a textbook of logs, but we don’t make that AX affordance at the moment.

    I wrote a small game for my dev team to experience what it’s like interacting through these painful interfaces over the summer: www.youareanagent.app. Jump to the agentic coding level or the MCP level to experience true frustration (call it empathy). I also wrote up a lot more thinking here: www.robkopel.me/field-notes/ax-agent-experience/
  • DoctorOetker
    Beginners on the Linux command line frequently complain about the irregularity or redundancy in command-line tool conventions (sometimes actual command parameters: -h, --help, or /h, or ?; other times: man vs info; etc.).

    When the first transformers that did more than poetry or rough translation appeared, everybody noticed their flaws, but I observed that a dumb enough LLM (or one smart enough to be dangerous?) could be useful in regularizing parameter conventions. I would ask an LLM how to do this or that, and it would "helpfully" generate non-functional command invocations that otherwise appeared very 'conformant', to the point that sometimes my opinion was that, even though the invocation was wrong given the current calling convention for a specific tool, it would actually improve the tool if it accepted that human-machine ABI or calling convention.

    Now take the example of man vs info. I am not proposing to let AI decide we should all settle on man, nor that we should all use info instead. But with AI we could have each tool's documentation made whole in the missing half, and then it's up to the user whether they prefer man or info to fetch it.

    Similarly for calling conventions: we could ask LLMs to assemble parameter styles, analyze command calling conventions and parameters, and then find one or more canonical ways to communicate this, perhaps consulting an environment variable to figure out what calling convention the user declares to use.
  • rel_ic
    > The environment wins (less tokens burned = less energy consumed)This is understandable logic, but at a systemic level it's not how things always go. Increasing efficiency can lead to increased consumption overall. You might save 50% in energy for your workload, but maybe now you can run it 3 times as much, or maybe 3 times more people will use it, because it's cheaper. The result might be a 50% INCREASE in energy consumed.https://en.wikipedia.org/wiki/Jevons_paradox
  • rustybolt
    Surprisingly often people refuse to document their architecture or workflow for new hires. However, when it's for an LLM some of these same people are suddenly willing to spend a lot of time and effort detailing architecture, process, workflows.I've seen projects with an empty README and a very extensive CLAUDE.md (or equivalent).
  • googlielmo
    I like the gist of this, however LLM may not be the best name for this: what if a new tech (e.g., SLM) takes over? AGENT may be a more faithful name until something better is standardized.
  • hrpnk
    Looks like the blog could use a HN=True. Hope the author won't get banned...> Error: API rate limit exceeded for app ID 7cc6c241b6e6762bf384. If you reach out to GitHub Support for help, please include the request ID E9FC:7BEBA:6CDB3B4:6485458:699EE247 and timestamp 2026-02-25 11:51:35 UTC. For more on scraping GitHub and how it may affect your rights, please review our Terms of Service (https://docs.github.com/en/site-policy/github-terms/github-t...).
  • sirk390
    I would use this as a human. That npm output is crazy. Maybe a better variable would be "CONCISE=1". For LLMs, there are a few easier solutions, like outputting to a file (and then tailing it), or running a subagent.
  • troethe
    On a lot of Linux distros there is the `moreutils` package, which contains a command called `chronic`. Originally intended for crontabs, it executes a command and only shows its output if the command fails. I think it could find another use case here.
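    For reference, `chronic`'s core behaviour is roughly the following (a simplified sketch, not the real implementation; the actual tool has more options, such as triggering on stderr output):

```shell
# Approximation of what moreutils' chronic does: buffer a command's
# output and replay it only when the command exits non-zero.
quiet_unless_fail() {
  buf=$(mktemp) || return 1
  "$@" >"$buf" 2>&1
  status=$?
  [ "$status" -ne 0 ] && cat "$buf"
  rm -f "$buf"
  return "$status"
}

quiet_unless_fail true                                # prints nothing
quiet_unless_fail sh -c 'echo oops; exit 1' || true   # fails, so "oops" is shown
```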
  • TobTobXX
    Many unix tools already print less logging when used in a script, i.e. non-interactively. (I don't know how they detect that.) For example, `ls` has formatting/coloring and `ls | cat` does not. This solution seems like it would fit the problem from the article?
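    The detection is usually `isatty(3)`: the tool checks whether its stdout is attached to a terminal, and drops color and chatter when it isn't. The shell's `-t` test exposes the same check:

```shell
# [ -t 1 ] asks whether file descriptor 1 (stdout) is a terminal.
# When output is piped or redirected, the test fails, which is exactly
# how `ls | cat` ends up uncolored.
if [ -t 1 ]; then
  echo "stdout is a terminal: color, progress bars, human formatting"
else
  echo "stdout is piped: plain, machine-friendly output"
fi
```

    Since an agent harness usually captures output through a pipe, tools that do this already behave more quietly for LLMs with no extra flag.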
  • tacone
    For Claude, the most pollution usually comes from Claude itself. It's worth noting that just by setting the right tone of voice, choosing the right words, and instructing it to be concise and surgical in what it says and writes, things change drastically, like night and day. It then starts obeying, CRITICALs are barely needed anymore, and the docs it produces are tidy and pretty.
  • m0rde
    I think about what I do in these verbose situations; I learn to ignore most of the output and only take forward the important piece. That may be a success message or error. I've removed most of the output from my context window / memory.I see some good research being done on how to allow LLMs to manage their own context. Most importantly, to remove things from their context but still allow subsequent search/retrieval.
  • canto
    This is merely scratching the surface. LLMs (Claude Code in particular) will explicitly create token-intensive steps, plans, and responses ("just to be sure", "need to check", "verify no leftovers"), will do a git diff even though not asked for, create Python scripts for simple tasks, etc. Absolutely no cache (except the memory feature, which is meh) nor indexing whatsoever. Because of this, the Pro plan for 20 bucks per month is essentially worthless, and we are entering a new era: one where a $100+ monthly single subscription is something normal and natural.
  • bearjaws
    This is basically what RTK "Rust Token Killer" does.Removes all the fluff around commands that agents use frequently.https://github.com/rtk-ai/rtk
  • isoprophlex
    Huh. I've noticed CC running build or test steps piped into greps, to cull useless chatter. It did this all by itself, without my explicit instructions.

    Also, I just restart when the context window starts filling up. Small focused changes work better anyway, IMO, than single god-prompts that try to do everything but eventually exceed context and capability...
  • mohsen1
    Funny! I built an entire cli and ecosystem around this:https://github.com/bodo-run/stop-nagging
  • vorticalbox
    Could we not instruct the LLM to run build commands in a subagent, which could then just return a summary of what happened? This avoids having to update everything to support LLM=true and keeps your current context window free of noise.
  • eptcyka
    If the output of your build tool is too verbose for a mechanical brain to keep on top of, did the meat brain ever stand a chance? Why was the output so verbose in the first place, then?
  • yoz-y
    All of this because we only have stdout and stderr and nothing in between. I wish there was a stdlog or stddebug or something
  • fergie
    Given that most of the utility of Typescript is to make VSCode play nice for its human operator, _should_ we be using Typescript for systems that are written by machines?
  • titzer
    Seeing a JSON configuration file that stores environment variables makes me want to cry. Just to think that somewhere, somehow, it's going to launch an entire JavaScript VM (tens of megabytes) just to parse a file with 12 lines in it, then extract the fields from a JavaScript object, munge them, and eventually turn them into more or less an array of VAR=val C strings which get passed to a forked shell....
  • gormen
    Most of what helps LLMs here is exactly what helps humans: less noise, clearer signals, predictable output.
  • subhajeet2107
    Can the TOON format help with this? With "LLM=true" we could reduce the noise which pollutes the context.
  • bigblind
    I never considered the volume of output tokens from dev tools, but yeah, I like this idea a lot.
  • Gertig
    I've been using CODING_AGENT=true
  • jascha_eng
    I noticed this on my Spring Boot side project. Successful test runs produce thousands of log lines in default mode, because I like to e.g. log every executed SQL statement during development. It gives me visibility into what my ORM is actually doing (yeah, yeah, I know I should just write SQL myself). For me it's just a bit of scrolling and cmd+f if I need to find something specific, but Claude actually struggles a lot with this massive output. Especially when it then tries to debug things, finding the actual error message in the haystack of logs is suddenly very hard for the LLM. So I spent some time cleaning up my logs locally to improve the "agentic ergonomics", so to say.

    In general, I think good DevEx needs to be dialed to 11 for successful agentic coding. Clean software architecture and interfaces, good docs, etc. are all extremely valuable for LLMs, because any bit of confusion, weird patterns, or inconsistency can be learned over time by a human as a "quirk" of the code base. But for LLMs that don't have memory, they are utterly confusing and will lead the agent down the wrong path eventually.
  • exabrial
    We’ve reinvented exit codes…
  • mark_l_watson
    This seems like a really solid idea: using an environment variable in command line tools and small apps to control output for AI vs. human digestion. Even given efficient attention mechanisms, slop tokens in the context window are bad.I also like a discussion in this thread: using custom tools to reduce the frequency of tool calls in general, that is, write tool wrappers specific for your applications or agents.
  • block_dagger
    Interesting idea but a bad spec. A better approach would be a single env var (DEV_MODE, perhaps) with “agent” and “human” as values (and maybe “ci”).
  • cubefox
    I wonder whether attention-free architectures like Mamba or Gated DeltaNet are distracted less by irrelevant context, because they don't recall every detail inside their context window in the first place. Theoretically it should be fairly easy to test this via a dedicated "context rot benchmark" (standard benchmarks but with/without irrelevant context).
  • Bishonen88
    great idea. thought about the waste of tokens dozens of times when I saw claude code increase the token count in the CLI after a build. I was wondering if there's a way to stop that, but not enough to actually look into it. I'd love for popular build tools to implement something along those lines!
  • scotty79
    I think I noticed LLMs doing >/dev/null on routine operations.
  • pelasaco
    `Humans=True`The best friend isn't a dog, but the family that you build. Wife/Husband/kids. Those are going to be your best friends for life.
  • philipwhiuk
    MCP as an env-var ;)
  • user3939382
    Actually, what we have is an entire stack, starting with the Von Neumann architecture, then the kernel, the browser, auth: it is optimized for the intuition of neither humans nor agents. All the legacy cruft that we glibly told people to RTFM on is now choking your agent and burning your tokens.

    I have a solution to all this, of course, but why should I tell anyone.
  • haarlemist
    Can we just instruct the agents to redirect output streams to files, and then use grep to retrieve the necessary lines?
  • keybored
    Speaking of obvious questions. Why are you counting pennies instead of getting the LLM to do it? (Unless the idea was from an LLM and the executive decision was left to the operator, as well as posting the article)So much content about furnishing the Markdown and the whatnot for your bots. But content is content?
  • deafpolygon
    Or, stop outputting crap and use a logger. Make an LLM-only logger for output LLMs need and use stdout for HUMAN things.
  • Peritract
    This all seems like a lot of effort so that an agent can run `npm run build` for you.I get the article's overall point, but if we're looking to optimise processing and reduce costs, then 'only using agents for things that benefit from using agents' seems like an immediate win.You don't need an agent for simple, well-understood commands. Use them for things where the complexity/cost is worth it.