
Comments (138)

  • disconcision
    I've yet to be convinced by any article, including this one, that attempts to draw boxes around what coding agents are and aren't good at in a way that is robust on a 6-to-12-month horizon. I agree that the examples listed here are relatable, and I've seen similar in my uses of various coding harnesses, including, to some degree, ones driven by Opus 4.5. But my general experience with using LLMs for development over the last few years has been that they progressed:

    1. From, initially, at best assembling simple procedural or compositional sequences of commands or functions to accomplish a basic goal, perhaps meeting tests or type checking, but with no overall coherence,
    2. to being able to structure small functions reasonably,
    3. to being able to structure large functions reasonably,
    4. to being able to structure medium-sized files reasonably,
    5. to being able to structure large files, and small multi-file subsystems, somewhat reasonably.

    So the idea that they are now falling down at the multi-module, multi-file, or multi-microservice level is both not particularly surprising to me and not particularly indicative of future performance. There is a hierarchy of scales at which abstraction can be applied, and it seems plausible to me that the march of capability improvement is a continuous push upwards in the scale at which agents can reasonably abstract code.

    Alternatively, it could be that there is a legitimate discontinuity here, at which anything resembling current approaches will max out, but I don't see strong evidence for that in this article.
  • woeirua
    It's just amazing to me how fast the goal posts are moving. Four years ago, if you had told someone that a LLM would be able to one-shot either of those first two tasks they would've said you're crazy. The tech is moving so fast. I slept on Opus 4.5 because GPT 5 was kind of an air ball, and just started using it in the past few weeks. It's so good. Way better than almost anything that's come before it. It can one-shot tasks that we never would've considered possible before.
  • maxilevi
    LLMs are just really good search. Ask one to create something and it's searching within the pretrained weights. Ask it to find something and it's semantically searching within your codebase. Ask it to modify something and it will do both. Once you understand it's just search, you can get really good results.
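    A toy sketch of what "semantically searching within your codebase" means mechanically: snippets and a query are mapped to embedding vectors, and results are ranked by cosine similarity. The two-dimensional vectors and snippet names below are invented for illustration; a real agent would obtain high-dimensional vectors from an embedding model.

    ```typescript
    // Rank code snippets against a query by cosine similarity of
    // (hypothetical) embedding vectors.
    type Embedded = { snippet: string; vector: number[] };

    // Cosine similarity: dot product divided by the product of norms.
    function cosine(a: number[], b: number[]): number {
      let dot = 0, na = 0, nb = 0;
      for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        na += a[i] * a[i];
        nb += b[i] * b[i];
      }
      return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Return the corpus sorted from most to least similar to the query.
    function rank(query: number[], corpus: Embedded[]): Embedded[] {
      return [...corpus].sort(
        (x, y) => cosine(query, y.vector) - cosine(query, x.vector)
      );
    }
    ```

    Nothing here is specific to LLM agents; it is ordinary nearest-neighbor search, which is the commenter's point.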
  • mikece
    In my experience Claude is like a "good junior developer" -- it can do some things really well and FUBARs other things, but on the whole it is something to which tasks can be delegated if things are well explained. If/when it gets to the ability level of a mid-level engineer, it will be revolutionary. Typically a mid-level engineer can be relied upon to do the right thing with minimal oversight, can figure out incomplete instructions, and can deliver quality results (and even train up the juniors on some things). At that point the only reason to have human junior engineers is so they can learn their way up the ladder to being an architect, responsible for coordinating swarms of Claude agents to develop whole applications and complete complex tasks and initiatives.

    Beyond that, what can Claude do... analyze the business and market as a whole and decide on product features, industry inefficiencies, and gap analysis, then define projects to address those and coordinate fleets of agents to change or even radically pivot an entire business?

    I don't think we'll get to the point where all you have is a CEO and a massive Claude account, but it's not completely science fiction the more I think about it.
  • lordnacho
    By and large, I agree with the article. Claude is great and fast at doing low-level dev work: getting the syntax right in some complicated mechanism, executing an edit-execute-read-log loop, making multi-file edits. This is exactly why I love it. It's smart enough to do my donkey work.

    I've revisited the idea that typing speed doesn't matter for programmers. I think it's still an odd thing to judge a candidate on, but appreciate it in another way now. Being able to type quickly and accurately reduces frustration, and people who foresee less frustration are more likely to try the thing they are thinking about.

    With LLMs, I have been able to try so many things that I never tried before. I feel that I'm learning faster because I'm not tripping over silly little things.
  • ChicagoDave
    I have several projects that counter this article. Not sure why, but I've extracted clean, readable, well-constructed, and well-tested code.

    I might write something up at some point, but I can share this: https://github.com/chicagodave/devarch/

    New repo with guides for how I use Claude Code.
  • simonw
    I'm not entirely convinced by the anecdote here where Claude wrote "bad" React code:

    > But in context, this was obviously insane. I knew that key and id came from the same upstream source. So the correct solution was to have the upstream source also pass id to the code that had key, to let it do a fast lookup.

    I've seen Claude make mistakes like that too, but then the moment you say "you can modify the calling code as well" or even ask "any way we could do this better?" it suggests the optimal solution.

    My guess is that Claude is trained to bias towards making minimal edits to solve problems. This is a desirable property, because six months ago a common complaint about LLMs was that you'd ask for a small change and they would rewrite dozens of additional lines of code.

    I expect that adding a CLAUDE.md rule saying "always look for more efficient implementations that might involve larger changes and propose those to the user for their confirmation if appropriate" might solve the author's complaint here.
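    A plain-TypeScript sketch of the two solutions the anecdote contrasts (all names here are invented, not from the article): the "minimal edit" recovers the id by scanning the whole list for the record matching key on every call, while the wider change lets the upstream source, which already knows both key and id, hand down a precomputed lookup.

    ```typescript
    // Hypothetical records where `key` and `id` come from the same
    // upstream source.
    type Item = { key: string; id: number; label: string };

    // "Minimal edit": the downstream code only has `key`, so it scans
    // the whole list to recover the id -- O(n) per lookup.
    function idViaScan(key: string, records: Item[]): number | undefined {
      return records.find(r => r.key === key)?.id;
    }

    // Wider change: the upstream source builds a Map once and passes it
    // down, making each lookup O(1).
    function buildIndex(records: Item[]): Map<string, number> {
      return new Map(records.map(r => [r.key, r.id]));
    }
    ```

    The scan is the smaller diff, which is consistent with a minimal-edits bias; the Map is the better design but requires touching the calling code.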
  • alphazard
    This sounds suspiciously like the average developer, which is what the transformer models have been trained to emulate.Designing good APIs is hard, being good at it is rare. That's why most APIs suck, and all of us have a negative prior about calling out to an API or adding a dependency on a new one. It takes a strong theory of mind, a resistance to the curse of knowledge, and experience working on both sides of the boundary, to make a good API. It's no surprise that Claude isn't good at it, most humans aren't either.
  • michalsustr
    This article resonates exactly how I think about it as well. For example, at minfx.ai (a Neptune/wandb alternative), we cache time series that can contain millions of floats for fast access. Any engineer worth their title would never make a copy of these and would pass around pointers for access. Opus, when stuck in a place where passing the pointer was a bit more difficult (due to async and Rust lifetimes), would just make the copy, rather than rearchitect or at least stop and notify user. Many such examples of ‘lazy’ and thus bad design.
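    The commenter's codebase is Rust; here is the share-vs-copy point sketched in TypeScript instead, where a typed array is naturally passed by reference. The function names and tiny buffer are invented for illustration.

    ```typescript
    // Imagine millions of cached floats; three stand in for them here.
    const cached = new Float64Array([1.5, 2.5, 3.5]);

    // Shared access: operates directly on the caller's buffer, no copy.
    function sumShared(series: Float64Array): number {
      let s = 0;
      for (const x of series) s += x;
      return s;
    }

    // The "lazy" fix: duplicates the entire buffer on every access,
    // which is the copy the commenter objects to.
    function sumCopied(series: Float64Array): number {
      const copy = series.slice(); // allocates and copies every element
      let s = 0;
      for (const x of copy) s += x;
      return s;
    }
    ```

    Both return the same result; the difference is the per-access allocation, which at millions of floats is exactly the cost a cache exists to avoid.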
  • joshcsimmons
    IDK, I've been using Opus 4.5 to create a UI library and it's been doing pretty well: https://simsies.xyz/ (still early days)

    Granted, it was building on top of Tailwind (shifting over to Radix after the layoff news). Which begs the question: what is a Lego?
  • Scrapemist
    Eventually you can show Claude how you solve problems, and explain the thought process behind it. It can apply these learnings but it will encounter new challenges in doing so. It would be nice if Claude could instigate a conversation to go over the issues in depth. Now it wants quick confirmation to plough ahead.
  • iamacyborg
    Here’s an example of a plan I’m working on in CC. It’s very thorough, though it required a lot of handholding and fact-checking on a number of points, as its first few passes didn’t properly anonymise data.

    https://docs.google.com/document/u/0/d/1zo_VkQGQSuBHCP45DfO7...
  • doug_durham
    Did the author ask it to make new abstractions? In my experience, when it produces output that I don't like, I ask it to refactor it. These models have an understanding of all modern design patterns. Just ask them to adopt one.
  • jondwillis
    Regardless, yet another path to the middle class is closing for a lot of people. RIP (probably me too)
  • esafak
    > Claude doesn’t have a soul. It doesn't want anything.

    Ha! I don't know what that has to do with anything, but this is exactly what I thought while watching Pluribus.
  • lxe
    Eh. This is yet another "I tried AI to do a thing, and it didn't do it the way I wanted, therefore I'm convinced that's just how it is... here's a blog post about it" article.

    "Claude tries to write React, and fails" -- how many times? What's the rate of failure? What have you tried to guide it to perform better?

    These articles are similar to HN posts 15 years ago when people wrote "Node.js is slow and bad".