Comments (57)
- NitpickLawyer
  The paper is here: https://arxiv.org/pdf/2603.19461

  This, IMO, is the biggest insight into where we're at and where we're going:

  > Because both evaluation and self-modification are coding tasks, gains in coding ability can translate into gains in self-improvement ability.

  There's a thing I noticed early on with LLMs: once they unlock one capability, you can use that capability to compose things and improve other capabilities, related or not. For example, "reflexion" goes into coding: hey, this didn't work, let me try ... Then "tools". Then "reflexion" + "tools". And so on.

  You can take workflows whose individual parts aren't very precise and make them better by composing them, letting one component influence the other. E2e coding gets better by checking with "gof" tools (linters, compilers, etc.). Then it gets even better by adding a code review stage. Then it gets even better by adding a static analysis phase.

  Now we're seeing this all converge on "self improving" by combining "improving" components. And so on. This is really cool.
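  The staged composition described above can be sketched in a few lines. This is a toy illustration, not anything from the paper: the stage names and checks are invented stand-ins for a linter and a review step.

  ```python
  # Toy sketch of composing imperfect checkers: each stage can catch
  # mistakes the others miss, so the chained workflow is stricter
  # than any single component. Stages and checks are illustrative.

  def pipeline(candidate, stages):
      """Run a candidate through successive check stages; reject on
      the first failing stage, reporting which stage failed and why."""
      for name, check in stages:
          ok, feedback = check(candidate)
          if not ok:
              return False, f"{name}: {feedback}"
      return True, "all stages passed"

  # stand-ins for a linter stage and a review stage
  stages = [
      ("lint",   lambda code: ("eval(" not in code, "eval is forbidden")),
      ("review", lambda code: (len(code) < 200, "too long to review")),
  ]

  print(pipeline("print('hi')", stages))   # passes both stages
  print(pipeline("eval('2+2')", stages))   # rejected at the lint stage
  ```

  Adding another stage is just another tuple in the list, which is the point of the comment: each component can be individually weak, but the composition improves.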
- JerrrrrrrryNo matter how far we go, we end up with generation / discrimination architecture.Its is the core of any and all learning/exellency; exposure to chaotic perturbations allow selection of solutions that are then generalized to further, ever more straining problems; producing increasingly applicable solutions.This is the core of evolution, and is actually derivable from just a single rule.
- kordlessagain
  The loop on this is basically: tweak your prompt until you score better on a contrived test.
- mifydev
  I've been experimenting with a similar concept myself. The linter loop is the only thing that can keep the agent sane, in my opinion, and if anyone can generalize the bun+tsc loop to other tasks, this would finally be a way to trust LLM output.

  I was annoyed at how Claude Code ignores my CLAUDE.md and skills, so I was looking for ways to extend type checking to them. So I wrote a wrapper on top of claude-agents-sdk that reads my CLAUDE.md and skills and compiles them into rules, which could be linter rules or custom checking scripts. Then it hooks into all tools and runs the checks. The self-improving part kicks in when some rule doesn't work: I run the tool with the session id in review mode, and it proposes fixes and improves the rule checkers (not the md files). So it's kind of like vibe coding the rules; it definitely lowers the bar for me to maintain them. Repo: https://github.com/chebykinn/agent-ruler
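  The compile-prose-rules-into-checks idea might look roughly like this. A minimal sketch with an invented `forbid: <regex>` rule format; none of these names are agent-ruler's actual API.

  ```python
  # Hedged sketch: turn prose rules into executable checks that can
  # run after every tool call. The rule syntax is invented here.
  import re

  def compile_rule(rule: str):
      """Compile a rule of the form 'forbid: <regex>' into a checker
      over tool output; rules we can't parse pass by default."""
      m = re.match(r"forbid:\s*(.+)", rule)
      if not m:
          return lambda output: True
      pattern = re.compile(m.group(1))
      return lambda output: not pattern.search(output)

  # rules that might be distilled out of a CLAUDE.md file
  checks = [compile_rule(r) for r in [r"forbid: console\.log", "forbid: TODO"]]

  def tool_output_ok(output: str) -> bool:
      """Hook this after each tool call: all compiled checks must pass."""
      return all(check(output) for check in checks)
  ```

  The attraction of this shape is that a failing rule is just a function you can regenerate in review mode, rather than prose the agent may ignore.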
- supermdguy
  It's surprising that this works so well, considering that AI-generated AGENTS.md files have been shown to be not very useful. I think the key difference here is that the real-world experience helps the agent reach regions of its latent space that wouldn't occur naturally through autoregression.

  I wonder how much of the improvement is due to the agent actually learning new things vs. reaching parts of its latent space that let it recall things it already knows. Did the agent come up with novel RL reward design protocols based on trial and error? Or did the tokens in the environment cause it to "act smarter"?
- agrishin
  I found that running an agent in a ralph loop, showing it the agent text and saying "run this; if it fails, identify the reason and modify the agent instructions to avoid it; the acceptance criteria are this and that" worked surprisingly well. Not sure if it qualifies as self-referential self-improvement, but it was something.
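  That loop reads roughly like the sketch below. `run_agent` and `revise` here are toy stand-ins, not a real agent API: the "agent" only passes once its instructions mention running the tests, and "revise" appends that lesson.

  ```python
  # Hedged sketch of the ralph-loop idea: run the agent against
  # acceptance criteria; on failure, feed the failure transcript back
  # and let the agent revise its own instructions, then retry.

  def self_improving_loop(instructions, task, run_agent, revise, max_iters=5):
      for _ in range(max_iters):
          ok, transcript = run_agent(instructions, task)
          if ok:
              return instructions, True
          instructions = revise(instructions, transcript)
      return instructions, False

  # toy stand-ins so the sketch is runnable end to end
  def toy_agent(instructions, task):
      passed = "run the tests" in instructions
      return passed, "ok" if passed else "failure: tests were never run"

  def toy_revise(instructions, transcript):
      return instructions + "\nAlways run the tests before finishing."

  final, ok = self_improving_loop("Fix the bug.", "bugfix",
                                  toy_agent, toy_revise)
  ```

  Whether this counts as "self-referential" is debatable, but the mechanism is just a retry loop where the retried artifact is the instruction text itself.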
- flockonus
  The readme seems very unclear about what it does. Does anyone have a practical example of it?
- kordlessagain
  Uses LiteLLM. Lovely.
- measurablefunc
  That's great, but how about UltraAgents: meta-referential, meta-improving, self-referential hyperagents?
- sonu27
  Can someone add this to OpenClaw :)
- NoToP
  "So, what do you see as your greatest weakness?"
- jauntywundrkind
  Pi is self-modifying and self-aware. https://lucumr.pocoo.org/2026/1/31/pi/

  But this idea of having a task agent and a meta agent maybe has wings. Neat submission.
- llmslave
  I think even code bases will have self-improving agents. Software is moving from just the product code to the agent code that maintains the product. Engineering teams and companies that move in this direction will vastly out-produce others.

  I've had to really shift how I think about building code bases; a lot of logic can go into Claude skills and sub-agents. It requires essentially relearning software engineering.