
Comments (122)

  • lmeyerov
    Most of this rings true for us for the same reasons. We have been moving large old projects in this direction, and new ones start there. It's easier to enforce these via tool checks than to trust skills files. I wouldn't say the resulting code is good, which is what folks are stumbling on, but the setup does reward better code: predictable, boring, tested, pure, and fast to iterate on, which are all indeed part of our SDLC principles.

    Some of the advice is a bit more extreme than what we do; for example, I haven't found value in 100% code coverage, but 90% is fine. Other points miss nuance: we have to work hard to prevent the AI from subverting the type checks, since by default it works around type errors by using getattr/cast/type: ignore/Any everywhere.

    One thing I'm hoping AI coders get better at is using static analysis and verification tools. My experiments here have been lukewarm to bad: adding an Alloy model checker for some parts of GFQL (GPU graph query language) took a lot of prodding and found no bugs, but straight-up asking Codex to do test amplification on our unit test suite, based on our code and past bugs, works great. Likewise, it's easy to have it port conformance tests from standards and help make our docs executable to prevent drift.

    A new area we are starting to look at is automatic bug patches based on production logs. This is practical for the areas we set up for vibe coding, which in turn are the areas we care about more and work on most heavily. We never trusted automated dependency-update bots, but this kind of thing gets much more trustworthy and reviewable. Another thing we are eyeing is new 'teleport' modes so we can shift PRs to remote async development, which previously we didn't think was worth supporting.
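    A minimal sketch of the type-check subversion problem mentioned above, translated from the Python escape hatches (getattr/cast/type: ignore/Any) into rough TypeScript analogues. The rule names in the comments are real typescript-eslint rules; the snippet itself is illustrative and not from the commenter's codebase.

```ts
// Hypothetical example: escape hatches an agent reaches for when a type error
// gets hard, and the typescript-eslint rules that flag them.

interface Metrics { latencyMs: number }

function fetchMetrics(): unknown {
  return JSON.parse('{"latencyMs": 12}'); // shape unknown to the compiler
}

// Shortcut #1: cast the problem away.
// Flagged by @typescript-eslint/no-explicit-any.
// const m = fetchMetrics() as any;

// Shortcut #2: suppress the checker.
// Flagged by @typescript-eslint/ban-ts-comment.
// // @ts-expect-error "works at runtime"
// const m: Metrics = fetchMetrics();

// What the guardrails should reward instead: making the uncertainty explicit.
function asMetrics(value: unknown): Metrics | undefined {
  if (typeof value === "object" && value !== null && "latencyMs" in value) {
    const latencyMs = (value as { latencyMs: unknown }).latencyMs;
    if (typeof latencyMs === "number") return { latencyMs };
  }
  return undefined;
}

console.log(asMetrics(fetchMetrics())?.latencyMs ?? "unparseable");
```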
  • tempodox
    This is hallucination. Or maybe a sales pitch. If production bugs and the requirement to retain a workable code base don’t get us to write “good” code, then nothing will. And at the current state of the art, “AI” will tend to make it worse.
  • tombert
    Something I just started doing yesterday, and I'm hoping it catches on: I've been writing the spec for what I want in TLA+/PlusCal at a pretty high level, and then I tell Codex to implement exactly to the spec. I tell it not to deviate from the spec at all and to be as uncreative as possible.

    Since it sticks pretty close to the spec, and since TLA+ is about modifying state, the code it generates is pretty ugly, but ugly-and-correct code beats beautiful code that's not verified.

    It's not perfect. Something that naively adheres to a spec is rarely optimized, and I've had to go in and replace stuff with Tokio or Mio, or optimize a loop because the resulting code was too slow to be useful, and sometimes the code is just too ugly for me to put up with so I need to rewrite it. But the time to do that is generally considerably lower than if I were doing the translation myself entirely.

    The reason I started doing this: the stuff I've been experimenting with lately is lock-free data structures, and I guess what I am doing is novel enough that Codex doesn't really generate what I want; it will still use locks and lock files, and when I complain it will do the traditional "You're absolutely right", and then proceed to do everything with locks anyway.

    In a sense, this is close to the ideal case that I actually wanted: I can focus on the high-level, mathy logic while I let my metaphorical AI intern deal with the minutiae of actually writing the code. Not that I don't derive any enjoyment from writing Rust or something, but the code is mostly an implementation detail to me. This way, I'm kind of doing what I'm supposed to be doing, which is "formally specify first, write code second".
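    The spec-first loop described above is TLA+-specific, but its shape can be sketched without TLA+. Below is a toy explicit-state check in TypeScript (the two-incrementer "spec" and its invariant are invented for illustration, not the commenter's lock-free work) that enumerates every reachable state and checks an invariant, which is roughly the job TLC does for a PlusCal spec before any implementation exists.

```ts
// Toy explicit-state exploration in the spirit of a TLC run: define states,
// a next-state relation, and an invariant, then enumerate everything reachable.
type State = { counter: number; pcA: "inc" | "done"; pcB: "inc" | "done" };

const init: State = { counter: 0, pcA: "inc", pcB: "inc" };

// Two processes, each of which atomically increments the counter once.
function next(s: State): State[] {
  const successors: State[] = [];
  if (s.pcA === "inc") successors.push({ ...s, counter: s.counter + 1, pcA: "done" });
  if (s.pcB === "inc") successors.push({ ...s, counter: s.counter + 1, pcB: "done" });
  return successors;
}

// Invariant: once both processes have finished, the counter is exactly 2.
function invariant(s: State): boolean {
  return !(s.pcA === "done" && s.pcB === "done") || s.counter === 2;
}

// Breadth-first search over the reachable state space.
const seen = new Set<string>();
const queue: State[] = [init];
while (queue.length > 0) {
  const s = queue.shift()!;
  const key = JSON.stringify(s);
  if (seen.has(key)) continue;
  seen.add(key);
  if (!invariant(s)) throw new Error(`invariant violated in ${key}`);
  queue.push(...next(s));
}
console.log(`checked ${seen.size} states, invariant holds`);
```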
  • afro88
    Without having tried it (caveat), I worry that 100% coverage to an LLM will lock in bad assumptions and incorrect functionality. It makes it harder for it to identify something that is wrong.

    That said, we're not talking about vibe coding here, but properly reviewed code, right? So the human still goes "no, this is wrong, delete these tests and implement for these criteria"?
  • pgroves
    This is sort of why I think software development might be the only real application of LLMs outside of entertainment. We can build ourselves tight little feedback loops that other domains can't. I somewhat frequently agree on a plan with an LLM, find out a few minutes or hours later that it doesn't work, and then the LLM goes "that's why we shouldn't have done it like that!". Imagine building a house from scratch and finding out that it used some American websites to spec out your electrical system, and not noticing the problem until you're installing your Canadian dishwasher.
  • zmmmmm
    Very little here is about the code itself being good. A lot is about putting good guardrails around it and making it fast and safe to develop, which is good for sure. But I feel it's misconstruing things to say the actual code is "good". The whole reason the guardrails provide value is that the code is, by default, "not good", and how good the result is presumably sits on a spectrum between "the worst possible code that satisfies the guardrails" and "actually good".
  • nathan_f77
    This is exactly how I've been working with AI this year, and I highly recommend it. This kind of workflow was not feasible when I was working alone and typing every line of code; now it's surprisingly easy to achieve. In my latest project, I've enforced extremely strict linting rules and completely banned any ignore comments. No file over 500 lines, and I'm even keeping all the default settings that prevent complex functions (which I would normally have turned off a long time ago).

    Now I can leave an agent running, come back an hour or two later, and it's written almost perfect, typed, extremely well-tested code.
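    A sketch of the kind of configuration described above, assuming an ESLint version that accepts a TypeScript config file. The rule and option names (max-lines, complexity, max-depth, max-params, linterOptions.noInlineConfig) are real ESLint core settings, but the thresholds and structure here are guesses rather than the commenter's actual setup.

```ts
// eslint.config.ts -- illustrative only, not the commenter's real config.
export default [
  {
    files: ["src/**/*.ts"],
    linterOptions: {
      // Refuse inline /* eslint-disable */ comments entirely, so an agent
      // can't silence a rule instead of fixing the code.
      noInlineConfig: true,
    },
    rules: {
      // "No file over 500 lines."
      "max-lines": ["error", { max: 500, skipBlankLines: true, skipComments: true }],
      // Keep the default complexity/nesting/parameter ceilings on instead of
      // relaxing them as one might be tempted to for hand-written code.
      complexity: "error",
      "max-depth": "error",
      "max-params": "error",
    },
  },
];
```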
  • mkozlows
    I like this. "Best practices" are always contingent on the particular constellation of technology out there; with tools that make it super easy to write code, I can absolutely see 100% coverage paying off in a way it doesn't for human-written code: it maximizes what LLMs are good at (cranking out code) while giving them easy targets to aim for with little judgement required.

    (A thing I think is under-explored is how much LLMs change where the value of tests lies. Back in the artisan hand-crafted code days, unit tests were mostly useful as scaffolding: almost all the value I got from them was during the writing of the code. If I'd deleted the unit tests before merging, I'd've gotten 90% of the value out of them. Whereas now, the AI doesn't necessarily need unit tests as scaffolding as much as I do, _but_ having them in there makes future agentic interactions safer, because they act as reified context.)
  • danieka
    I thought the article would be about how, if we want AI to be effective, we should write good code.

    What I notice is that Claude stumbles more on code that is illogical, unclear, or has bad variable names. For example, if a variable is named "iteration_count" but actually contains a sum, that will "fool" the AI.

    So keeping the code tidy gives the AI clearer hints about what's going on, which gives better results. But I guess that's equally true for humans.
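    A tiny invented illustration of the misleading-name problem described above:

```ts
const orders = [{ total: 19.99 }, { total: 5.0 }];

// Misleading: the name promises a count, but the value is a running sum, so a
// model (or a human) reading it later will extend the code on a false premise.
let iteration_count = 0;
for (const order of orders) iteration_count += order.total;

// Clearer: the name says what the variable actually holds.
let order_total = 0;
for (const order of orders) order_total += order.total;

console.log(iteration_count, order_total); // both 24.99
```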
  • bwhiting2356
    I agree with this. 100% test coverage for the front end is harder; I don't know if I'm going to reach for that yet. So far I've been making my linting rules stricter.
  • sandblast2
    The level of software engineering expertise typical of these promptfondling companies shines through in this blog post.

    Surely they know 100% code coverage is not a magic bullet, because the code flow and the behavior can differ depending on the input. Just because you found a few examples that happen to hit every line of code doesn't mean you hit every possible combination. You are living in a fool's paradise, which is not a surprise, because only fools believe in LLMs. What you are really after is a formal proof of the codebase, which of course no one does because the costs would be astronomical (and LLMs are useless for it, which is not at all unique, because they are useless for everything software related, but they are particularly unusable for this).
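    The underlying technical point (100% line coverage does not mean every input combination was exercised) is easy to demonstrate with a toy example; the snippet below is invented for illustration and is not from the post.

```ts
// Every line (and even every branch) of describeOrder is hit by the first two
// asserts, so coverage reports 100%, yet the combination (total === 0 && vip)
// is never exercised and misbehaves.
function describeOrder(total: number, vip: boolean): string {
  let label = total === 0 ? "free order" : `order of $${total}`;
  if (vip) label += " with VIP discount applied";
  return label;
}

// A "test suite" that achieves 100% coverage:
console.assert(describeOrder(0, false) === "free order");
console.assert(describeOrder(50, true) === "order of $50 with VIP discount applied");

// The uncovered combination still slips through:
console.assert(describeOrder(0, true) === "free order"); // fails: a free order gets a discount line
```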
  • altmanaltman
    Wouldn't a better title be "How we're forcing AI to write good code (because it's normally not that good in general, which is crazy, given how many resources it's sucking, that we need to add an extra layer on top of it and use it to get anything decent)"?
  • cube00
    I can't reconcile how the CEO of an AI startup is, on the one hand, pushing "100% Percent [sic] Code Coverage" while also selling the idea of "Less than 60 seconds to production" on their product (which is linked in the first screenful of the blog post, so it's not like these are just personal thoughts).

    If 100% code coverage is a good thing, you can't tell me anyone (including parallel AI bots) is going to do it correctly and completely for a given use case in 60 seconds.

    I don't mind it being fast, but selling it as 60-seconds fast while trying to give the appearance of supporting high-quality, correct code isn't possible.
  • the_king
    I think good names and a good file structure are the most important thing to get right here.
  • brynary
    Strong agreement with everything in this post. At Qlty, we are going so far as to rewrite hundreds of thousands of lines of code to ensure full test coverage and end-to-end type checking (including database-generated types).

    I'll add a few more:

    1. Zero thrown errors. These effectively disable the type checker and act as goto statements. We use neverthrow for Rust-like Result types in TypeScript.

    2. Fast auto-formatting and linting. An AI code review is not a substitute for a deterministic result in sub-100ms that guarantees consistency. The auto-formatter is set up as a post-tool-use Claude hook.

    3. Side-effect-free imports and construction. You should be able to load all the code files and construct an instance of every class in your app without spawning a network connection. This is harder than it sounds, and without it you run into all sorts of trouble with the rest.

    4. Zero mocks and shared global state. By mocks, I mean mocking frameworks which override functions on existing types or globals. These effectively inject lies into the type checker.

    Shout out to tsgo, which has dramatically lowered our type checking latency. As the tok/sec of models keeps going up, all the time is going to get bottlenecked on tool calls (read: type checking and tests). With this approach we now have near-100% coverage with a test suite that runs in under 1,000ms.
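    A small sketch of the "zero thrown errors" point above: ok/err/Result and .match are neverthrow's real API, but the parsing example itself is made up rather than taken from Qlty's code.

```ts
import { err, ok, Result } from "neverthrow";

type User = { id: string; name: string };
type ParseError = { reason: string };

// The failure mode is part of the signature, so the type checker (and the
// agent) cannot pretend it doesn't exist the way a thrown exception allows.
function parseUser(raw: unknown): Result<User, ParseError> {
  if (typeof raw !== "object" || raw === null) {
    return err({ reason: "not an object" });
  }
  const r = raw as Record<string, unknown>;
  if (typeof r.id !== "string" || typeof r.name !== "string") {
    return err({ reason: "missing id or name" });
  }
  return ok({ id: r.id, name: r.name });
}

// Callers are forced to handle both arms; there is no invisible goto.
const greeting = parseUser({ id: "1", name: "Ada" }).match(
  (user) => `hello ${user.name}`,
  (e) => `bad input: ${e.reason}`,
);
console.log(greeting);
```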
  • jillesvangurp
    This goes in the right direction, but it could go further. Types are indeed nice, so why use a language where using them is optional? There are many reasons, but many of them have to do with people and their needs and wants rather than with tool requirements. AI agents benefit from good tool feedback, so maybe switch to languages and frameworks that provide plenty of that, and quickly. Switching used to be expensive because you had to do a lot of the work manually. That's no longer true: we can make LLMs do all of the tedious stuff, including using more rigidly typed languages, making sure things are covered with tests, using code analysis tools to spot anti-patterns, and addressing all the warnings. That was always a good idea, but we now have even fewer excuses to skip it.
  • AuthAuth
    > Statement about how AI is actually really good and we should rely on it more. Doesn't cover any downsides.

    > CEO of an AI company

    Many such cases.
  • user____name
    I've been wondering if AI startups are running bots to downvote negative AI sentiment on HN. The hype is sort of ridiculous at times.
  • jennyholzer3
    I don't know about all this AI stuff. How are LLMs going to stay on top of new design concepts, new languages, really anything new? Can LLMs be trained to operate "fluently" with regard to a genuinely new concept?

    I think LLMs are good for writing certain types of "bad code", i.e. if you're learning a new language or trying to quickly create a prototype. However, to me it seems like a security risk to try to write "good code" with an LLM.
  • jaredcwhite
    I'm sad programmers lacking a lot of experience will read this and think it's a solid run-down of good ideas.
  • firemelt
    Yeah, beekeeping. I think about it a lot. I mean, the agents should be isolated in their own environment; it's dangerous to give them your whole PC. Who knows, they could silently be putting some rootkit or backdoor on your PC, like appending allowed SSH keys.
  • pizlonator
    Why would I write code that makes it easier for a clanker to compete with me?
  • mrits
    The author should ask AI to write a small app with 100% code coverage that breaks on every path except what is covered in the tests.
  • badgersnake
    I'm increasingly finding that the type of engineer who blogs is not the type of engineer anyone should listen to.
  • bgwalter
    https://logic.inc/"Ship AI features and tools in minutes, not weeks. Give Logic a spec, get a production API—typed, tested, versioned, and ready to deploy."
  • sublinear
    What? We're already so far down the list of things to try with AI that we're saying hallucinated tests are better than no tests at all? Seems actively harmful, and the AI hype died out faster than I thought it would.

    > Agents will happily be the Roomba that rolls over dog poop and drags it all over your house

    There it is, folks!
  • block_dagger
    I stopped reading at “static typing.” That is not what “good code” always looks like.