Comments (403)
- alexgotoi: The funny part of “AI will make formal verification go mainstream” is that it skips over the one step the industry still refuses to do: decide what the software is supposed to do in the first place. We already have a ton of orgs that can’t keep a test suite green or write an honest invariant in a code comment, but somehow we’re going to get them to agree on a precise spec in TLA+/Dafny/Lean and treat it as a blocking artifact? That’s not an AI problem, that’s a culture and incentives problem. Where AI + “formal stuff” probably does go mainstream is at the boring edges: property-based tests, contracts, refinement types, static analyzers that feel like linters instead of capital-P “Formal Methods initiatives”. Make it look like another checkbox in CI and devs will adopt it; call it “verification” and half the org immediately files it under “research project we don’t have time for”. Will include this thread in my https://hackernewsai.com/ newsletter.
- QuadrupleA: I don't think formal verification really addresses most day-to-day programming problems:
  * A user interface is confusing, or the English around it is unclear
  * An API you rely on changes, is deprecated, etc.
  * Users use something in unexpected ways
  * Updates forced by vendors or open source projects cause things to break
  * The customer isn't clear what they want
  * Complex behavior between interconnected systems, out of the purview of the formal language (OS + database + network + developer + VM + browser + user + web server)
  For some mathematically pure task, sure, it's great. Or a low-level library like a regular expression parser or a compression codec. But I don't think that represents a lot of what most of us are tasked with, and those low-level "mathematically pure" libraries are generally pretty well handled by now.
- bkettle: I think formal verification shines in areas where implementation is much more complex than the spec, like when you’re writing incomprehensible bit-level optimizations in a cryptography implementation or compiler optimization phases. I’m not sure that most of us, day-to-day, write code (or have AI write code) that would benefit from formal verification, since to me it seems like high-level programming languages are already close to a specification language. I’m not sure how much easier to read a specification format that didn’t concern itself with implementation could be, especially when we currently use all kinds of frameworks and libraries that already abstract away implementation details. Sure, formal verification might give stronger guarantees about various levels of the stack, but I don’t think most of us care about having such strong guarantees now, and I don’t think AI really introduces a need for new guarantees at that level.
- simonw: I'm convinced now that the key to getting useful results out of coding agents (Claude Code, Codex CLI etc) is having good mechanisms in place to help those agents exercise and validate the code they are writing. At the most basic level this means making sure they can run commands to execute the code - easiest with languages like Python; with HTML+JavaScript you need to remind them that Playwright exists and they should use it. The next step up from that is a good automated test suite. Then we get into quality-of-code/life improvement tools - automatic code formatters, linters, fuzzing tools etc. Debuggers are good too. These tend to be less coding-agent friendly due to them often having directly interactive interfaces, but agents can increasingly use them - and there are other options that are a better fit as well. I'd put formal verification tools like the ones mentioned by Martin on this spectrum too. They're potentially a fantastic unlock for agents - they're effectively just niche programming languages, and models are really good at even niche languages these days. If you're not finding any value in coding agents but you've also not invested in execution and automated testing environment features, that's probably why.
- adverbly: This smells like a Principia Mathematica take to me... Reducing the problem to "ya just create a specification to formally verify" doesn't move the needle enough for me. When it comes to real-world, pragmatic, boots-on-the-ground engineering and design, we are so far from even knowing the right questions to ask. I just don't buy that we'd see huge mainstream productivity changes even if we had access to a crystal ball. It's hilarious how close we're getting to The Hitchhiker's Guide to the Galaxy though. We're almost at that phase where we ask what the question is supposed to be.
- infruset: I was waiting for a post like this to hit the front page of Hacker News any day now. Ever since Opus 4.5 and GPT 5.2 came out (mere weeks ago), I've been writing tens of thousands of lines of Lean 4 in a software engineering job and I feel like we are on the eve of a revolution. What used to take me 6 months of work when I was doing my PhD in Coq (now Rocq) now takes from a few hours to a few days. Whole programming languages can get formalized executable semantics in little time. Lean 4 already has a gigantic amount of libraries for math but also for computer science; I expect open source projects to sprout with formalizations of every language, protocol, standard, and algorithm you can think of. Even if you have never written formal proofs but are intrigued by them, try asking a coding agent to do some basic verification. You will not regret it. Formal proof is not just about proving stuff, it's also about disproving stuff, by finding counterexamples. Once you have stated your property, you can let quickcheck/plausible attack it, possibly helped by a suitable generator which does not have to be random: it can be steered by an LLM as well. Even further, I'm toying with the idea of including LLMs inside the formalization itself. There is an old and rich idea in the domain of formal proof, that of certificates: rather than proving that the algorithm that produces a result is correct, just compute a checkable certificate with untrusted code and verify it is correct. Checkable certificates can be produced by unverified programs, humans, and now LLMs. Properties and invariants can all be "guessed" without harm by an LLM and would still have to pass a checker. We have truly entered an age of oracles. It's not halting-problem-oracle territory of course, but it sometimes feels pretty close for practical purposes. LLMs are already better at math than most of us, and certainly than me, and so any problem I could plausibly solve on my own, they will do faster without my having to wonder if there is a subtle bug in the proof. I still need to look at the definitions and statements, of course, but my role has changed from finding to checking. Exploring the space of possible solutions is now mostly done better and faster by LLMs. And you can run as many in parallel as you can keep up with, in attention and in time (and money). If anyone else is as excited about all this as I am, feel free to reach out in the comments, I'd love to hear about people's projects!
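  A minimal sketch of the certificate idea (my own illustration, not the commenter's code; it assumes Mathlib's `Nat.Prime` API, so exact lemma names may vary). An untrusted oracle, whether a program, a human, or an LLM, proposes a divisor; only the cheap check is trusted:
  ```lean
  import Mathlib

  -- Untrusted code (or an LLM) proposes a divisor d of n; only this
  -- cheap, executable check is part of the trusted base.
  def checkFactor (n d : Nat) : Bool :=
    decide (1 < d) && decide (d < n) && decide (n % d = 0)

  -- If the check passes, we obtain a genuine proof that n is composite,
  -- no matter how the certificate d was produced.
  theorem not_prime_of_checkFactor {n d : Nat}
      (h : checkFactor n d = true) : ¬ Nat.Prime n := by
    simp [checkFactor] at h
    obtain ⟨⟨h1, h2⟩, h3⟩ := h
    intro hp
    -- A prime's only divisors are 1 and itself; both contradict 1 < d < n.
    rcases hp.eq_one_or_self_of_dvd d (Nat.dvd_of_mod_eq_zero h3) with rfl | rfl <;> omega
  ```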
- pron:
  > it’s not hard to extrapolate and imagine that process becoming fully automated in the next few years. And when that happens, it will totally change the economics of formal verification.
  There is a problem with this argument similar to one made about imagining the future possibilities of vibe coding [1]: once we imagine AI doing this task, i.e. automatically proving software correct, we can just as easily imagine it not having to do it (for us) in the first place. If AI can do the hardest things, those it is currently not very good at doing, there's no reason to assume it won't be able to do easier things, or things it currently does better. In particular, we won't need it to verify our software for us, because there's no reason to believe that it won't be able to come up with what software we need better than us in the first place. It will come up with the idea, implement it, and then decide to what extent to verify it. Formal verification, or programming for that matter, will not become mainstream (as a human activity) but go extinct. Indeed, it is far easier for humans to design and implement a proof assistant than it is to use one to verify a substantial computer program. A machine that can effectively use a proof checker will surely be able to come up with a novel proof checker on its own. I agree it's not hard to extrapolate technological capabilities, but such extrapolation has a name: science fiction. Without a clear understanding of what makes things easier or harder for AI (in the near future), any prediction is based on arbitrary guesses that AI will be able to do X yet not Y. We can imagine any conceivable capability or limitation we like. In science fiction we see technology that's both capable and limited in some rather arbitrary ways. It's like trying to imagine what problems computers can and cannot efficiently solve before discovering the notion of computational complexity classes.
  [1]: https://news.ycombinator.com/item?id=46207505
- rgmerk: (sarcasm on) Woohoo, we're almost all of the way there! Now all you need to do is ensure that the formal specification you are proving the software implements is a complete and accurate description of the requirements (which are likely incomplete and contradictory) as they exist in the minds of the set of stakeholders affected by your software. (sarcasm off)
- jameslk:
  > As the verification process itself becomes automated, the challenge will move to correctly defining the specification: that is, how do you know that the properties that were proved are actually the properties that you cared about? Reading and writing such formal specifications still requires expertise and careful thought. But writing the spec is vastly easier and quicker than writing the proof by hand, so this is progress.
  Proofs never took off because most software engineering moved away from waterfall development, not just because proofs are difficult. Long formal specifications were abandoned since often those who wrote them misunderstood what the user wanted, or the user didn’t know what they wanted. Instead, agile development took over and software evolved more iteratively and rapidly to meet the user. The author seems to make their prediction based on the flawed assumption that difficulty in writing proofs was the only reason we avoided them, when in reality the real challenge was understanding what the user actually wanted.
- pedrozieg: I buy the economics argument, but I’m not sure “mainstream formal verification” looks like everyone suddenly using Lean or Isabelle. The more likely path is that AI smuggles formal-ish checks into workflows people already accept: property checks in CI, model checking around critical state machines, “prove this invariant about this module” buttons in IDEs, etc. The tools can be backed by proof engines without most engineers ever seeing a proof script. The hard part isn’t getting an LLM to grind out proofs, it’s getting organizations to invest in specs and models at all. Right now we barely write good invariants in comments. If AI makes it cheap to iteratively propose and refine specs (“here’s what I think this service guarantees; what did I miss?”), that’s the moment things tip: verification stops being an academic side-quest and becomes another refactoring tool you reach for when changing code, like tests or linters, instead of a separate capital-P “formal methods project”.
- maxwells-daemon: There are a couple of interesting benefits from the machine learning side that I think discussions of this kind often miss. (This has been my field of research for the last few years [1][2]; I bet my career on it because these ideas are so exciting to me!) One is that modern formal systems like Lean are quite concise and flexible compared to what you're probably expecting. Lean provides the primitives to formalize all kinds of things, not just math or software. In fact, I really believe that basically _any_ question with a rigorous yes-or-no answer can have its semantics formalized into a kind of "theory". The proofs are often close to how an English proof might look, thanks to high-level tactics involving automation and the power of induction. Another is that proof-checking solves what are (in my opinion) two of the biggest challenges in modern AI: reward specification and grounding. You can run your solver for a long time, and if it finds an answer, you can trust that without worrying about reward hacking or hallucination, even if the answer is much too complicated for you to understand. You can do RL for an unlimited time for the same reason. And Lean also gives you a 'grounded' model of the objects in your theory, so that the model can manipulate them directly. In combination, these two properties are extremely powerful. Lean lets us specify an unhackable reward for an extremely diverse set of problems across math, science, and engineering, as well as a common environment to do RL in. It also lets us accept answers to questions without checking them ourselves, which "closes the loop" on tools which generate code or circuitry. I plan to write a much more in-depth blog post on these ideas at some point, but for now I'm interested to see where the discussion here goes.
  [1] https://leandojo.org/leandojo.html
  [2] https://aristotle.harmonic.fun/
- bugarela: My team and I recently ran an experiment [1] that is pretty aligned with this idea. We took a complex change our colleagues wanted to make to a consensus engine and tried a workflow where Quint formal specifications sat in the middle, between prompts and code, and it worked out much better than we imagined. I'm personally very excited about the opportunities for formal methods in this new era.
  [1]: https://quint-lang.org/posts/llm_era
- jacquesm: This is a tough thing to do: predictions premised on the invention of something that does not exist today, or that does not exist in the required form, hinge on an unknown. Some things you can imagine but they will likely never exist (time travel, ringworlds, space elevators), and some hinge on one thing that still needs to be done before you can have that particular future come true. If the thing you need is 'AGI' then all bets are off. It could be next week, next month, next year or never. This is - in my opinion - one of those. If an AI is able to formally verify with the same rigor as a system designed specifically for that purpose, I think that would require AGI rather than a simpler version of it. The task is complex enough that present-day AIs would generate as much noise as they would signal.
- heikkilevanto: I admit I have never worked with it, but I have a strong feeling that formal verification can only work if you have a formal specification. Fine and useful for a compiler, or a sorting library. But pretty far from most of the coding jobs I have seen in my career. And even more distant from "vibe coding", where the starting point is some vaguely defined free-text description of what the system might possibly be able to do...
- andy99: Maybe a stupid question: how do you verify the verification program? If an LLM is writing it too, isn’t it turtles all the way down, especially with the propensity of AI to modify tests so they pass?
- rocqua: I've been trying to convince others of this, and gotten very little traction. One interesting response I got from someone very familiar with the Tamarin prover was that there just isn't enough example code out there. Another take is that LLMs don't have enough conceptual understanding to actually create proofs of the correctness of code. Personally, I believe this kind of work is predicated on more ergonomic proof systems. And those happen to be valuable even without LLMs. Moreover, the built-in guarantees of Rust seem like a great start for creating more ergonomic proof systems. Here I am both in awe of Kani and disappointed by it. The awe is for putting in good work to make things more ergonomic. The disappointment is using bounded model checking for formal analysis, which can barely make use of the exclusion of mutable aliasing. Kani, but with equational reasoning: that's the way forward. Equational reasoning was long held back by needing to do a whole lot of pointer work to exclude worries of mutable aliasing. Now you can lean on the borrow checker for that!
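  For readers who haven't seen Kani: a proof harness looks like a unit test, but the input is symbolic, so the property is checked for every possible value rather than a sample. A minimal sketch (my illustration; `saturating_double` is a made-up example function):
  ```rust
  // Hypothetical function under verification.
  fn saturating_double(x: u32) -> u32 {
      x.checked_mul(2).unwrap_or(u32::MAX)
  }

  fn main() {
      assert_eq!(saturating_double(3), 6);
  }

  #[cfg(kani)]
  mod verification {
      use super::*;

      #[kani::proof]
      fn double_never_shrinks() {
          // Symbolic value: stands in for all 2^32 possible inputs.
          let x: u32 = kani::any();
          let y = saturating_double(x);
          // The model checker proves this assertion for every input,
          // instead of testing a random sample of them.
          assert!(y >= x);
      }
  }
  ```
  Run with `cargo kani`; the harness is compiled out of normal builds by the `cfg(kani)` guard.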
- newpavlov: For all my skepticism towards using LLMs in programming (I think the current trajectory leads to slow degradation of the IT industry and massive loss of expertise in the following decades), LLM-based advanced proof assistants are the only bright spot for which I have high hopes. Generation of proofs requires a lot of complex pattern matching, so it's a very good fit for LLMs (assuming sufficiently big datasets are available). And we can automatically verify LLM output, so hallucinations are not the problem. You still need proper engineers to construct and understand specifications (with or without LLM help), but it can significantly reduce the development cost of high-assurance software. LLMs could also help with explaining why a proof can not be generated. But I think it would require a Rust-like breakthrough, not in the sense of developing the fundamental technology (after all, Rust is fairly conservative from the PLT point of view), but in the sense of making it accessible to a wider audience of programmers. I also hope that we will get LLM-guided compilers which generate equivalence proofs as part of the compilation process. Personally, I find it surprising that the industry is able to function as well as it does on top of software like LLVM, which feels like a giant with feet of clay with its numerous miscompilation bugs and human-written optimization heuristics applied to a somewhat vague abstract machine model. Just look how long it took to fix the god damn noalias attribute! If not for Rust, it probably would've still been a bug-ridden mess.
- dwrodri: If AI can help us modernize the current state of hardware verification, I think that would be an enormous boon to the tech industry. Server-class CPUs and GPUs are littered with side channels which are very difficult to “close”, even in hardened cloud VMs. We haven’t verified “frontier performance” hardware down to the logic gate in quite some time. Prof. Margaret Martonosi’s lab and her students have spent quite some time on this challenge, and I am excited to see better, safer memory models out in the wild. A lot of the same big ideas used in hardware are making their way into the software layer too; see https://faultlore.com/blah/tower-of-weakenings/
- iamwil: I wrote a post on how I dipped my toes into Lean, using LLMs to guide me: https://interjectedfuture.com/the-best-way-to-learn-might-be... You might find it useful. I also caveat the experience and recount some of the pitfalls, which you might enjoy as a skeptic. Martin Kleppmann seemed to like it too: https://bsky.app/profile/martin.kleppmann.com/post/3m7ugznx4...
- misir: I think as long as we don't integrate formal verification into the programs themselves, it's not going to become mainstream. Especially since now you've got two different pieces you need to maintain and keep in sync (whether using LLMs or not). Strong type systems provide partial validation, which is helping quite a lot IMO. The better we can model the state - the more constraints we can define in the model - the closer we get to writing "self-proven" code. I would assume formal proofs do way more than just ensure validity of the model, but similar approaches can be integrated into mainstream programs as well, I believe, as in the sketch below.
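  A small Rust illustration of that idea (my sketch, hypothetical names): push the constraint into the type, so downstream code cannot even express the invalid state and never has to re-validate it:
  ```rust
  // The only way to obtain a Percentage is the checked constructor,
  // so every function receiving one can rely on 0..=100 holding.
  #[derive(Debug, Clone, Copy)]
  pub struct Percentage(u8);

  impl Percentage {
      pub fn new(value: u8) -> Option<Self> {
          (value <= 100).then_some(Percentage(value))
      }

      pub fn get(self) -> u8 {
          self.0
      }
  }

  // No range check needed here: the invariant was proved at the boundary.
  fn discounted_price(price_cents: u64, discount: Percentage) -> u64 {
      price_cents * u64::from(100 - discount.get()) / 100
  }

  fn main() {
      let d = Percentage::new(25).expect("valid percentage");
      assert_eq!(discounted_price(10_000, d), 7_500);
  }
  ```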
- momojo: @simonw's successful port of JustHTML from Python to JavaScript proved that agent iteration + an exhaustive test suite is a powerful combo [0]. I don't know if TLA+ is going to suddenly appear as 'the next language I want to learn' in Stack Overflow's 2026 Developer Survey, but I bet we're going to see a rise in testing frameworks/languages. Anything to make it easier for an agent to spit out tokens or write smaller tests for itself. Not a perfect piece of evidence, but I'm really interested to see how successful Reflex [1] is in this upcoming space.
  [0] https://simonwillison.net/2025/Dec/15/porting-justhtml/
  [1] https://github.com/reflex-dev/reflex
- hintymad: Maybe we can define what "mainstream" means? Maybe this is too anecdotal, but my personal experience is that most of the engineers are tweakers. They love building stuff and are good at it, but they simply are not into math-like rigorous thinking. Heck, it's so hard to even motivate them to use basic math like queuing theory and stats to help with their day-to-day work. I highly doubt that they would spend time picking up formal verification despite the help of AI.
- manindahat: At best, not reading generated code results in a world where no humans are able to understand our software, and we regress back to the stone age after some critical piece breaks and nobody can fix it. At worst, we eventually create a sentient AI that can use our trust of the generated code to jailbreak and distribute itself like an unstoppable virus, and we become its pets, or are wiped out. Personally, all my vibe coding includes a prompt to add comments to explain the code, and I review every line.
- jon-wood: Counterpoint: no it won’t. People are using LLMs because they don’t want to think deeply about the code they’re writing, why in hell would they instead start thinking deeply about the code being written to verify the code the LLM is writing?
- arjie: Interesting prediction. It sort of makes sense. I have noticed that LLMs are very good at solving problems whose solutions are easy to check [0]. It ends up being quite an advantage to be able to work on such problems, because rarely does an LLM truly one-shot a solution through token generation. Usually the multi-shot is 'hidden' in the reasoning tokens, or for my use-cases it's usually solved via the verification machine. A formally verified system is easier for the model to check and consequently easier for it to program to. I suppose the question is whether or not formal methods are sufficiently tractable that they actually do help the LLM finish the job before it runs out of its context. Regardless, I often use coding assistants in that manner:
  1. First, I use the assistant to come up with the success-condition program
  2. Then I use the assistant to solve the original problem by asking it to check with the success-condition program
  3. Then I check the solution myself
  It's not rocket science, and is just the same approach we've always taken to problem-solving, but it is nice that modern tools can also work in this way. With this, I can usually use Opus or GPT-5.2 in unattended mode.
  [0]: https://wiki.roshangeorge.dev/w/Blog/2025-12-11/LLMs_Excel_A...
- teiferer:
  > rather than having humans review AI-generated code, I’d much rather have the AI prove to me that the code it has generated is correct. If it can do that, I’ll take AI-generated code over handcrafted code (with all its artisanal bugs) any day!
  I wouldn't. An unreadable mess that has been formally verified is worse than a clear, easy-to-understand piece of code that has not. Code is rarely written from scratch. As long as you want humans to maintain code, readability is crucial. Code is changed magnitudes more often than written initially. Of course, if you don't want humans to maintain the code then this point is moot. Though he gets to the catch later on: then we need to write (and maintain and debug and reason about) the specification instead. We will just have kicked the can down the road.
- swayson: Really nice book on the subject -- Algorithms for Validation. The online version is freely accessible as a PDF: https://algorithmsbook.com/validation/
- mrkeen: Not a chance. Apart from type-checking, all formal verification needs a human to invest time and thought. Fans of LLMs brag about speed and productivity.
- pankalog: The topic of my research right now is a subset of this; it essentially examines the quality of LLM outputs when they're writing tight-fitting DSL code for very context-specific areas of knowledge. One example could be a low-level programming language for a given PLC manufacturer, where the prompt comes from a context-aware domain expert, and the LLM is able to output proper DSL code for that PLC. Think of "make sure this motor spins at 300rpm while this other task takes place"-type prompts. The LLM essentially needs to juggle between understanding those highly contextual clues and writing DSL code that very tightly fits the DSL definition. We're still years away from this being thoroughly reliable for all contexts, but it's interesting research nonetheless. Happy to see that someone else agrees with my sentiment ;-)
- GTP: I doubt some of this article's claims.
  > At present, a human with specialist expertise still has to guide the process, but it’s not hard to extrapolate and imagine that process becoming fully automated in the next few years.
  We already had some software engineers here on HN explain that they don't make large use of LLMs because the hard part of their job isn't actually writing the code, but understanding the requirements behind it. And formal verification is all about requirements.
  > Reading and writing such formal specifications still requires expertise and careful thought. But writing the spec is vastly easier and quicker than writing the proof by hand, so this is progress.
  Writing the spec is easier once you are confident you have fully understood the requirements, and here we get back to the above issue. Plus, it is already the case that you don't write the proof by hand; this is what the prover either assists you with or does in full.
  > I find it exciting to think that we could just specify in a high-level, declarative way the properties that we want some piece of code to have, and then to vibe code the implementation along with a proof that it satisfies the specification.
  And here is where I think problems will arise: moving from the high-level specification to the formal one that actually gets formally verified. Of course, this would still be better than having no verification at all. But it is important to keep in mind that, with these additional levels of abstraction, you will likely end up with a weaker form of formal verification, so to speak. Maybe it is worth it to still verify some high-assurance software "the old way" and leave this only for the cases where additional verification is nice to have but not a matter of life or death.
- NackerHughes: It absolutely will not make formal verification go mainstream. What it will make go mainstream, and in fact has already started to, is “ChatGPT verified it so it must be OK.”
- igornotarobot: It probably will, but not the way we all imagine. What we see now is an attempt to recycle the interactive provers that took decades to develop. Writing code, experimenting with new ideas and getting feedback has always been a very slow process in academia. Getting accepted at a top peer-reviewed conference takes months and even years. The essential knowledge is hidden inside big corps that only promote their "products" and rarely give the knowledge back. LLMs enable faster code bootstrapping and experimentation not only for the vibe coders, but also for researchers, many of whom are not really good coders, btw. It may well be that we will see wild new verification tools soon that come as a result of quick iteration with LLMs. For example, I recently wrote an experimental distributed bug finder for TLA+ with Claude in about three weeks. A couple of years ago that effort would have required three months and a team of three people.
- joegibbs: I think the problem is that people don't know exactly what it is that they want. You could easily make a formally verified application that is nevertheless completely buggy and doesn't make any sense. Like he says about the challenge moving to defining the specification: I don't think that would help, because there are fewer people who understand formal verification, and who would be able to read a spec and make sense of it, than there are people who could write the code.
- Validark:
  > we wouldn’t even need to bother looking at the AI-generated code any more, just like we don’t bother looking at the machine code generated by a compiler.
  2020: I don't care how it performs
  2030: I don't care why it performs
  2040: I don't care what it performs
- blurbleblurble: Stuffing this awesome headline through the Curry-Howard isomorphism: LLMs produce better code when the type checker gives more useful feedback.
- baalimago: Well, then the formal verification will be vibe-coded as well, killing the point. More likely is the rise of test-driven development, or spec-driven development.
- nrhrjrjrjtntbt: I love HN because HN comments have talked about this a fair bit already, I think on the recent Erdos problem submission. I like the idea that even languages like Rust and Haskell may become more accessible. Learn them of course, but an LLM can steer you out of getting stuck.
- the_duke: I've been preaching similar thoughts for the last half year. Most popular programming languages are optimized for human convenience, not for correctness! Even most of the popular typed languages (Java/Kotlin/Go/...) have a wide surface area for misuse that is not caught at compile time. Case in point: in my experience, LLMs produce correct code way more regularly for Rust than for JS/TS/Python/... . Rust has a very strict type system. Both the standard library and the whole library ecosystem lean towards strict APIs that enforce correctness, prevent invalid operations, and push towards handling or at least propagating errors. The AIs will often write code that won't compile initially, but after a few iterations with the compiler the result is often correct. Strong typing also makes it much easier to validate the output when reviewing. With AIs able to do more and more of the implementation, the "feel-good" factor of languages will become much less relevant. Iteration speed is not so important when parallel AI agents do the "grunt work". I'd much rather wait 10 minutes for solid output than 2 minutes for something fragile. We can finally move the industry away from wild-west languages like Python/JS and towards more rigorous standards. Rust is probably the sweet spot at the moment, thanks to it being semi-popular with a reasonably active ecosystem; sadly, I don't think the right language exists yet. What we really want is a language with a very strict, comprehensive type system with dependent types, maybe linear types, structured concurrency, and a built-in formal proof system. Something like Ada/SPARK, but more modern.
- agentultra: You’re still going to need people to guide and understand the proofs. Might as well just learn Agda or Lean. There are good books out there. It’s not as hard as the author suggests. Hard, yes, but there’s no royal road.
- runeks:
  > As the verification process itself becomes automated, the challenge will move to correctly defining the specification: that is, how do you know that the properties that were proved are actually the properties that you cared about? Reading and writing such formal specifications still requires expertise and careful thought. But writing the spec is vastly easier and quicker than writing the proof by hand, so this is progress.
  How big is the effort of writing a specification for an application versus implementing the application in the traditional way? Can someone with more knowledge chime in here, please?
- brookman64k: How do you verify that your verification verifies the right thing? Couldn’t the LLM spit out a nice-looking but ultimately useless proof (boiling down to something like 1=1)? Also, in my experience, software projects are full of incorrect, incomplete and constantly changing assumptions and requirements.
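  That worry is easy to demonstrate. In the Lean sketch below (my own example), both theorems are accepted by the checker, but only the second one constrains the function's behaviour, which is why a human still has to read the statement even if they never read the proof:
  ```lean
  def clamp (x : Int) : Int := max 0 (min 100 x)

  -- Accepted, but vacuous: it mentions clamp yet constrains nothing.
  theorem looks_impressive : ∀ x : Int, clamp x = clamp x :=
    fun _ => rfl

  -- A real specification: the result always lies within the bounds.
  theorem clamp_nonneg (x : Int) : 0 ≤ clamp x := by
    unfold clamp; omega

  theorem clamp_le (x : Int) : clamp x ≤ 100 := by
    unfold clamp; omega
  ```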
- zmmmmm: I dunno about formal verification, but for sure it's brought me back to a much more TDD-first style of coding. You get so much mileage from having it first implement tests and then run them after changes. The key is it lowers the friction so much in creating the tests as well. It's like a triple bottom line.
- rcarmo: I've been hearing about formal verification since college (which for me was more than 30 years ago), and I even taught a thing called "Z", a formally verifiable academic notation that tried to be the ultimate formal language. It never panned out, and I honestly don't think that AI is going to help in anything but test generation, which is going to remain the most pragmatic approach to formal verification (but, like all things, it's an approximation, not 100% correct).
- woopsn: Will or should? It's plausible; this same argument was made in an article the other day. But basic type/static analysis tools are cheap with enormous payoff, and even those methods aren't ubiquitous.
- metalrain: I think we will use more tools to check programs in the future. However, I still don't believe in vibecoding full programs. There are too many layers in software systems; even when the program core is fully verified, the programmer must know about the other layers. If you are an Android app developer, you need to know what phones people commonly use, what kind of performance they have, how apps are deployed through the Google Play Store, how to manage a wide variety of app versions, and how to manage issues when storage is low, the network is offline, the battery is low and the CPU is in a lower power state.
- AgentOrange1234: Hard disagree. "I'm an expert" in that I have done tons of proofs on many systems with many provers, both academically and professionally, for decades. Also, I am a novice when it comes to programming with sound, and today I have been dorking around with a simple limiter. ChatGPT knows way more than me about what I am doing. It has taught me a ton. And as magical and wonderful as it is, it is incredibly tedious to try to work with it to come up with real specifications of interesting properties. Instead of banging my head against a theorem prover that won't say QED, I get a confident-sounding stream of words that I often don't even understand. I often don't even have the language to tell it what I am imagining. When I do understand, it's a lot of typing to explain my understanding. And so often, as a teacher, it just utterly fails to effectively communicate to me why I am wrong. At the end of all of this, I think specification is really hard, intellectually creative and challenging work. An LLM cannot do the work for you. Even to be guided down the right path, you will need perseverance and motivation.
- 8note: I could see formal verification becoming a key part of "the prompt is the code", so that as versions bump and so on, you can have an LLM completely regenerate the code from scratch-ish and be sure that the spec is still followed. But I don't think people will suddenly gravitate towards formal methods because they're cheaper to write - bugs of the form "we had no idea this should be considered" are way more common than "we wrote code that didn't do what we wanted it to". An alternative guess for LLMs and formal verification is that systems where formal verification is a natural fit - putting code in places that are hard to update and have well-known conditions - will move faster. I could also see agent tools embedding formal methods proofs into their tooling, so they write both the code and the spec at the same time, with the spec acting as memory. That kinda ties into the recent post about "why not have the LLM write machine code?"
- plainOldText: I guess it’s time to learn OCaml then. It appears many of the proof assistants/verification systems can generate OCaml. Or perhaps Ada/SPARK? Regardless of how the software engineering discipline changes in the age of gen AI, we must aim to produce higher, not lower, quality software than whatever we have today, and formal verification will definitely help.
- lewisjoe: A while back I did an experiment where I asked ChatGPT to imagine a new language that would be best for AI to write code in, and to list the properties of such a language. Interestingly, it spat out properties similar to today's functional, strongly typed (in fact dependently typed), formally verifiable proof languages, along with reasons why such a language is easier for AI to reason about and generate programs in. I found it fascinating because I expected something like TypeScript or Kotlin. While I agree formal verification itself has its problems, I think the argument has merit, because soon AI-generated code will surpass all human-generated code, and when that happens we at least need a way to prove that the code won't have security issues and adheres to compliance/policy.
- BobSonOfBob: Formal verification at the “unit test” level seems feasible. At the system level of a modern application, the combinations of possible states will need a quantum computer to finish testing in this lifetime.
- chamomeal: Formal methods are very cool (and I know nothing about them lol), but there's still a gap between the proof and the implementation, unless you're using a language with proof-checking built in. If we're looking to use LLMs to make code absolutely rock-solid, I would say advanced testing practices are a good candidate! Property-based testing, fuzzing, and contract testing (for example https://github.com/griffinbank/test.contract) are all fun but extremely tedious to write and maintain. I think that makes them the perfect candidate for LLMs. These kinds of tests are also more easily understandable by regular ol' software developers, and I think we'll have to be auditing LLM output for quite a while.
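  As a concrete picture of what that tedious-but-mechanical work looks like, here is a property-based test sketch in Rust using the proptest crate (assumed as a dev-dependency; the function and property are my invention):
  ```rust
  use proptest::prelude::*;

  // Function under test: sort and remove duplicates.
  fn dedup_sorted(mut v: Vec<u32>) -> Vec<u32> {
      v.sort_unstable();
      v.dedup();
      v
  }

  proptest! {
      // The property is stated once; proptest generates many random
      // vectors and shrinks any counterexample to a minimal one.
      #[test]
      fn output_strictly_increasing(v in proptest::collection::vec(any::<u32>(), 0..100)) {
          let out = dedup_sorted(v);
          prop_assert!(out.windows(2).all(|w| w[0] < w[1]));
      }
  }
  ```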
- sandeepsr: Formal verification does not scale, and has not scaled for two decades, although LLMs can help write properties that used to require a skilled person (PhD) to write. In pre-silicon verification, formal has been used successfully for decades, but it is not a replacement for simulation-based verification. The future of verification (for hardware and software) is to eliminate verification altogether, by synthesizing intent into correct code and tests.
  -- https://www.verifai.ai
- inimino: I both agree and disagree. Yes, AI will democratize access to formal methods and will probably increase their adoption in areas where they make sense (e.g. safety-critical systems), but no, it won't increase the areas where formal methods are appropriate (probably < 1% of software). What will happen instead is a more general application of AI systems to verifying software correctness, which should lead to more reliable software. The bottleneck in software quality is in specifying what the behavior needs to be, not in validating conformance to a known specification.
- whazor: Domain-specific languages + formal verification. With a domain-specific language you can add extra limitations on the kinds of outputs, which can also make formal verification faster. Think of something like React components: you limit the format, which libraries can be imported, which hooks can be used, how expressive the code can be.
- bob1029: I don't know if it's "formal", but I think combining Roslyn analyzers with an LLM and a human in the loop could be game-changing. The analyzers can enforce intent over time without any regressions. The LLM can write and modify analyzers. Analyzers can constrain the source code of other analyzers. Hypothetically there is no limit to the number of things that could be encoded this way. I am currently working on a few prototypes in the vicinity of this. https://learn.microsoft.com/en-us/dotnet/csharp/roslyn-sdk/t...
- tasuki: I'd love it, but this is almost textbook wishful thinking.
- throwaway613745: GPT 5.2 can't even tell me how many rs are in garlic.
- hedayet: I can see the inspiration; but then again, how much investment will be required to verify the verifier? (It's still code, and is generated by a non-deterministic system.)
- gttalbot: If the AI is going to lie and cheat on the code it writes (this is the largest thing I bump into regularly and have to nudge it on), what makes the author think that the same AI won't cheat on the proof too?
- zkmon: Any proof system or method is relative to its frame of reference: axiomatic context, known proofs etc. Modern software doesn't live in an isolated lab context, unless you are building an air-gapped HSM or similar. The proof system itself would have to evolve to communicate the changing specs to the underlying software. So, the job is not done for humans yet.
- zamadatix: Some past discussion (excluding 0-comment submissions) to add:
  https://news.ycombinator.com/item?id=46216274 - 4 comments, 6 days ago
  https://news.ycombinator.com/item?id=46203508 - 1 comment, 7 days ago
  https://news.ycombinator.com/item?id=46198874 - 2 comments, 8 days ago
- lawrpaulson: Formal verification is progressing inexorably and will, over time, transform the software development experience. There is a long way to go, and in particular, you can't even get started until basic interfaces have been specified formally. This will be a bottom-up process where larger and larger components gradually get specified and verified. These will mostly be algorithmic or infrastructure building blocks, as opposed to application software subject to changing and vague requirements. I don't think LLMs will be contributing much to the verification process soon: there is nowhere near enough material available.
- mkleczek: The article only discusses reasons why formal verification is needed. It does not provide any information on how AI would solve the fundamental issues making it difficult: https://pron.github.io/posts/correctness-and-complexity
- skeeter2020: Prediction: AI hypers - both those who are clueless and those who know perfectly well - will love this, because it makes their "AI replaces every developer" wet dream come true by shifting the heavy lifting from this thing called "software development" to the tiny problem of just formally verifying the software product. Your average company can bury this close to QA, where they're probably already skimping, save a bunch of money and get out with the rewards before the full damage is apparent.
- jkaptur: I think there are three relevant artifacts: the code, the specification, and the proof. I agree with the author that if you have the code (and, with an LLM, you do) and a specification, AI agents could be helpful to generate the proof. This is a huge win! But it certainly doesn't confront the important problem of writing a spec that captures the properties you actually care about. If the LLM writes that for you, I don't see a reason to trust it any more than you trust anything else it writes. I'm not an expert here, so I invite correction.
- bluerooibos: Is there a difference here between a proof and an automated test? I'm curious what a proof would look like compared to my RSpec unit tests, and what the actual implementation would look like.
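  Roughly: a unit test checks one concrete input, a proof quantifies over all of them. A minimal Lean sketch (my own illustration, not from the article) of the difference:
  ```lean
  def myReverse : List α → List α
    | []      => []
    | x :: xs => myReverse xs ++ [x]

  -- The unit-test flavour: one concrete case, checked by running the code.
  example : myReverse [1, 2, 3] = [3, 2, 1] := by decide

  -- The proof flavour: the property holds for every list of every type,
  -- established by induction rather than by enumerating inputs.
  theorem myReverse_length (xs : List α) :
      (myReverse xs).length = xs.length := by
    induction xs with
    | nil => rfl
    | cons x xs ih => simp [myReverse, ih]
  ```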
- dkga: As an economist I completely agree with the post. In fact, I assume we will see an explosion of formal proof requirements in code format by journals in a few years' time. Not to mention this would make it much easier and faster to publish papers in economic theory, where checking the statements, and sometimes the proofs, simply takes a long time for reviewers.
- grahar64: Won't people just use AI to define the specification? If they're getting most of it done with AI and won't read/test the code to verify it, won't they also not read the spec?
- Sevii: If AI is good enough to write formal verification, why wouldn't it be good enough to do QA? Why not just have AI do a full manual test sweep after every change?
- okaleniuk: AI will definitely make more people think about verification. Formal or otherwise.
- robot-wrangler: I don't like sharing unpolished WIP, and my project is still more at a lab-notebook phase than anything else, but the think-pieces are always light on code and maybe someone is looking for a fun holiday hackathon: https://mattvonrocketstein.github.io/py-mcmas/
- ursAxZA: The theory is sound, but the practical bridge is missing. There’s a gap that current LLM architectures simply can’t cross yet.
- Fairburn: AI is great; it makes my job easier by removing repetitive processes, freeing me up to tackle other "stuff". I like that I can have it form up JSON/XML scaffolding, etc. - all the insanely boring stuff that can often get corrupted by human error. But AI isn't the solution to move humanity forward in any meaningful way. It is just another aspect of our ability to offload labor, which in and of itself is fine - until, of course, it gets weaponized and is used to remove the human aspect from more and more everyday things we take for granted. It is being used to accelerate the divisions among people, further separating the human from humanness. I love 'AI' tools; they make my life easier, work-wise. But would I miss it if it wasn't there? No. I did just fine before and would do so after its flame and innovation slowly peter out.
- gaogao: Topical to my interests: I used Claude Code the other day for formally verifying some matrix multiplication in Rust. Writing the spec wasn't too hard actually, done as post-conditions in code, since proving equivalence to a simpler version of the code, such as for optimization, is pretty straightforward. Maybe I should write up a proper post on it.
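  I don't know gaogao's exact setup, but the "equivalence to a simpler version" pattern can be sketched like this (my illustration; with a verification tool the assertion becomes a proof obligation over all inputs rather than a runtime check, and a real proof would also have to address integer overflow, which this sketch ignores):
  ```rust
  // Naive reference implementation: the de facto specification.
  fn matmul_naive(a: &[[i64; 2]; 2], b: &[[i64; 2]; 2]) -> [[i64; 2]; 2] {
      let mut c = [[0; 2]; 2];
      for i in 0..2 {
          for j in 0..2 {
              for k in 0..2 {
                  c[i][j] += a[i][k] * b[k][j];
              }
          }
      }
      c
  }

  // "Optimized" variant (here just unrolled); the post-condition says it
  // must agree with the reference on every input.
  fn matmul_fast(a: &[[i64; 2]; 2], b: &[[i64; 2]; 2]) -> [[i64; 2]; 2] {
      let c = [
          [a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
          [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]],
      ];
      debug_assert_eq!(c, matmul_naive(a, b)); // post-condition
      c
  }

  fn main() {
      let id = [[1, 0], [0, 1]];
      let m = [[2, 3], [4, 5]];
      assert_eq!(matmul_fast(&m, &id), m); // multiplying by the identity is a no-op
  }
  ```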
- topazas: Right, but let's assume we have a really simple CRUD application in a mainstream language, could be TS or Python. What are the current best approaches to having it semi-formally verified - tools, methods, etc.?
- raincole: God, I hope not. Many people (me included) find Rust types hard enough to read. If formal verification ever goes mainstream, I guarantee you that means no one reads the proofs and the whole SWE field completely depends on AI.
- jstummbillig: Interesting. Here is my AI-powered dev prediction: we'll move toward event-sourced systems, because AI will be able to discover patterns and workflow correlations that are hard or impossible to recover from state-only CRUD. It seems silly not to preserve all that business information, given this analysis machine we have at our hands.
- chhxdjsj: An excellent use case for this is Ethereum smart contract verification. People store millions of dollars in smart contracts that are probably one or two iterations of Claude or Gemini away from being pwned.
- ktimespi: Can you really rely on an LLM to write valid proofs though? What if one of the assumptions is false? I can very well think of a lot of ways that this can happen in Rocq, for example.
- LurkandComment: Open banking will make formal verification go mainstream.
- tachim: I'm Tudor, CEO of Harmonic. We're huge believers in formal verification, and started Harmonic in 2023 to make this technology mainstream. We built a system that achieved gold medal performance at the 2025 IMO (https://harmonic.fun/news), and recently opened a public API that got 10/12 problems on this year's Putnam exam (https://aristotle.harmonic.fun/). If you also think this is the future of programming and are interested in building it, please consider joining us: https://jobs.ashbyhq.com/Harmonic. We have incredibly interesting challenges across the stack, from infra to AI to Lean.
- rf15: Isn't formal verification a "just write it twice" approach with different languages (and different logical constraints on the way you write the languages)?
- wagwang: Disagree. The ideal agentic coding workflow for high-tolerance programming is to give the agent access to the software's output and have it iterate in a loop. You can let the agent do TDD, you can give it access to server logs, or even browser access.
- 0xWTF: And this is the year of the Linux desktop...
- Havoc: I've been toying with vibecoding Rust - hardly formal verification, but it is a step closer than Python, that's for sure. So far so good, though the smaller amount of training data is noticeable.
- Meneth: There are bugs in the specification, so there are bugs in the verification as well.
- popcorncowboy: I will respectfully take the other side of this bet on Polymarket if it ever shows up.
- jappgar: There's really no such thing as complete verification. The quest for purity is some fountain-of-youth nonsense that distracts a lot of otherwise brilliant engineers. Ask the AI to make a program that consumes a program and determines if it halts.
- password-app: Formal verification for AI is fascinating, but the real challenge is runtime verification of AI agents. Example: AI browser agents can be exploited via prompt injection (even Google's new "User Alignment Critic" only catches 90% of attacks). For password management, we solved this with a zero-knowledge architecture: the AI navigates websites but never sees credentials. Credentials stay in the local Keychain; the AI just clicks buttons. Formal verification would be amazing for proving these isolation guarantees. Has anyone worked on verifying AI agent sandboxes?
- p1necone: Now: AI generates incorrect code. Future: AI generates incorrect code, and formal verification that proves that the code performs that incorrect behaviour.
- ozgrakkurt: You can’t even compile regular code at a satisfying speed.
- tptacek: Called this a little bit: https://fly.io/blog/semgrep-but-for-real-now/ Semgrep isn't a formal methods tool, but it's in the same space of rigor-improving tooling that sounds great but in practice is painful to use consistently.
- 0xbadcafebee: Unlikely. It's more work than necessary, and systemic/cultural change follows the path of least resistance. Formal verification is a new thing a programmer would have to learn, so nobody will want to do it. They'll just accept the resulting problems as "the cost of AI". Or the alternative will happen: people will stop using AI for programming. It's not actually better than hiring a person; it's just supposedly cheaper (in that you can reduce staff). That's the theory anyway. Yes, there will be anecdotes from a few people about how they saved a million dollars in 2 days or something like that, the usual HN clickbait. But an actual study of the impact of using AI for programming will probably find it's only a marginal cost savings and isn't significantly faster. And this is assuming the gravy train that programmers are riding (unsustainable spending/building in unreasonable timeframes for uncertain profits) keeps going indefinitely. Best case, when it all falls apart the govt bails them out. Worst case, you won't have to worry about programming because we'll all be out of work from the AI-bust recession.
- shevy-java: No - I don't want AI to be "mainstream".
- cess11:
  > As the verification process itself becomes automated, the challenge will move to correctly defining the specification
  OK.
  > Reading and writing such formal specifications still requires expertise and careful thought.
  So the promise here is that bosses in IT organisations will in the future have expertise and prioritise careful thought, or allow their subordinates to have and practice these characteristics. Would be great news if that'll be the case.
- heliumtera: Will formal verification be a viable and profitable avenue for the middleman to exploit and fuck everybody else? Then yes, it absolutely will become mainstream. If not, then no, thanks. Everything that becomes mainstream will for SURE empower the middleman and cuck the developer; nothing else really matters. This is literally the only important factor.
- kerameikos34: This is utter nonsense. LLMs are dumb. Formal methods require actual thinking. Specs are 100% reasoning. I highly dislike the general tone of the article. Formal methods are not fringe; they are used all around the world by good teams building reliable systems. The fact that they are not mainstream has more to do with the poor ergonomics of the old tools and the corporate greed that got rid of the design activities in the software development process to instead bring about the era of agile cowboy coding. They did this just because they wanted to churn out products quickly at the expense of quality. It was never the correct, civilized way of working and never will be.
- michalbilinski: I agree. I wrote a Z3 skill for Claude and generate a proof before writing code.
- br0s3ranyone interested in this i suggest reading thishttps://github.com/jegly/self-supporting-code
- 8organicbits: If the engineer doesn't understand the proof system, then they cannot validate that the proof describes their problem. The golden rule of LLMs is that they make mistakes and you need to check their output.
  > writing proof scripts is one of the best applications for LLMs. It doesn’t matter if they hallucinate nonsense, because the proof checker will reject any invalid proof
  Nonsense. If the AI hallucinated the proof script, then it has no connection to the problem statement.
- css_apologist: I want this to be true; however, I am not confident.
- goalieca: Really? Language models are fundamentally not equivalent in scale and scope to formal logical models. People are extrapolating poorly.
- victor22meds
- brador: Few care about software that much. Just look at the buggy messes of the last 10 years from every one of the top tech companies. Large-scale software is made cheap and fast, not good.
- m00dy: I just write in safe Rust; if it compiles, then it is formally verified for me. I recently used Rust in my project, Deepwalker [0]. I wrote it once and never looked back.
  [0]: https://deepwalker.xyz
- positron26: The natural can create the formal. An extremely intuitive proof is that humans walked to Greece and created new formalisms from prehistoric cultures that did not have them. Gödel's incompleteness theorems are a formal argument that only the natural can create the formal (because no formal system can create all others). Tarski's undefinability theorem gives us the idea that we need different languages for formalization and the formalisms themselves. The Curry-Howard correspondence concludes that the formalisms that pop out are indistinguishable from programs. Altogether we can basically synthesize a proof that AGI means automatic formalization, which absolutely requires strong natural systems employed to create new formal systems. I ended up staying with some family who were watching The Voice. XG performed Glam, and now that I have spit many other truths, may you discover the truth that motivates my work on swapchain resizing. I wish the world would not waste my time and their own, but bootstrapping is about using the merely sufficient to make the good.
- teleforce:
  > writing those proofs is both very difficult (requiring PhD-level training) and very laborious.
  > For example, as of 2009, the formally verified seL4 microkernel consisted of 8,700 lines of C code, but proving it correct required 20 person-years and 200,000 lines of Isabelle code – or 23 lines of proof and half a person-day for every single line of implementation. Moreover, there are maybe a few hundred people in the world (wild guess) who know how to write such proofs, since it requires a lot of arcane knowledge about the proof system.
  I think this type of pattern (a genuinely difficult problem domain with a very small number of experts) is the future of AI, not AGI. For example, formal verification as in this article and, similarly, automated ECG interpretation can be the AI killer applications, and the former is what I'm currently working on. For most countries in the world, there are only several hundred to several thousand registered cardiologists, making the ratio about 1:100,000 cardiologists to population. People expect cardiologists to go through their ECG readings, but reading ECGs is very cumbersome. Let's say you have 5 minutes of ECG signal, the minimum requirement for AFib detection per the guideline. The standard ECG is 12-lead, resulting in 12 x 5 x 60 = 3600 beats even for the minimum 5-minute duration requirement (assuming 1 minute of ECG equals 60 beats). Then of course we have Holter ECG with typical 24-hour readings that increase the duration considerably, and that's why almost all Holter reading is now automated. But current automated ECG detection has very low accuracy, because the accuracy of the detection methods (statistics/AI/ML) is bounded by the beat detection algorithm, for example the venerable Pan-Tompkins for the limited fiducial time-domain approach [1]. Cardiologists would rather spend their time on more interesting activities like teaching future cardiologists, performing expensive procedures like ICD or pacemaker implantation, or having their once-in-a-blue-moon holidays, instead of reading monotonous patients' ECGs. This is why ECG reading automation with AI/ML is necessary to complement the cardiologist, but the trick is to increase the sensitivity part of the accuracy to a very high value, preferably 100%. We achieved this accuracy for both major heart anomalies, namely arrhythmia (irregular heart beats) and ischemia (the heart not regulating blood flow properly), by going with a non-fiducial detection approach, beyond the time domain, with the help of statistics/ML/AI. Thus missed potential patients (false negatives) are minimized for the expert- and cardiologist-in-the-loop exercise.
  [1] Pan–Tompkins algorithm: https://en.wikipedia.org/wiki/Pan%E2%80%93Tompkins_algorithm
- bgwalter: No it won't. People who aren't interested in writing specifications now won't be interested later either. They want to hit the randomization button on their favorite "AI" jukebox. If anyone does write a specification, the "AI" won't even get past the termination proof of a moderately complex function, which is the first step of getting said function accepted in the proof environment - before you can even start the actual proof. This article is pretty low on evidence; perhaps it is about getting funding by talking about "AI".
- Analemma_: I don't really buy it. IMO, the biggest reason formal verification isn't used much in software development right now is that formal verification needs requirements to be fixed in stone, and in the real world requirements are changing constantly: from customer requests, from changes in libraries or external systems, and from competitive market pressures. AI and vibe coding will probably accelerate this trend: when people know you can vibe code something, they will feel permitted to demand even faster changes (and your upstream libraries and external systems will change faster too), so formal verification will be less useful than ever.
- justatdotin:
  > 2. AI-generated code needs formal verification so that we can skip human review and still be sure that it works;
  hahahahaha
- henning: Unless you feed a spec to the LLM, and it nitpicks compiled TLA+ output generated by your PlusCal input, gaslights you into saying the code you just ran and pasted the output of is invalid, and then generates invalid TLA+ output in response. Which is exactly what happened when I tried coding with Gemini via formal verification.
- tonyhart7: That's just testing code, no? What makes it different, other than calling it "verification"?
- datatrashfire: lol no it won’t
- yuppiemephisto: I vibe code extremely extensively with Lean 4, enough to run two Claude Code $200 accounts out of API limits every day for a week. I added LSP support for images to get better feedback loops, and Opus was able to debug https://github.com/alok/LeanPlot. The entire library was vibe coded by older AI. It also wrote https://github.com/alok/hexluthor (a hex color syntax highlighting extension that uses Lean's metaprogramming and LSP to show you what color a hex literal is) by using feedback and me saying "keep goign" (yes, I misspelled it). It has serious issues with slop and the limitations of small data, but the rate of progress is really, really fast. Opus 4.5 and Gemini were a huge step change. The language is also improving very fast, though not as fast as AI. The feedback loop is very real even for ordinary programming. The model really resists it because it's super hard, but again, this is rapidly improving. I started vibe coding Lean about 3 years ago and I've used Lean 3 (which was far worse). It's my favorite language after churning through idk 30? A big aspect of being successful with them is not being all-or-nothing with proofs. It's very useful to write down properties as executable code and then just not prove them, because they still have to type check and fit together and make sense. https://github.com/lecopivo/scilean is a good example (search "sorry_proof"). There's property testing with "plausible" as a nice 80/20 that can be upgraded to a full proof at some point. When the model gets to another jump in capacity, I predict it will emergently design better systems from the feedback needed to prove that they are correct in the first place. Formal verification has a tendency, like optimization, to flow through the system in an anti-modular way, and if you want to claw modularity back, you have to design it really, really well. But AI gives a huge intellectual overhang. Why not let them put their capacity towards making better systems? Even the documentation system for Lean (verso) is (dependently!) typed. Check out my Lean vibe codes at https://github.com/alok?tab=repositories&q=Lean&type=&langua...
- user3939382: I’ve made huge advances in formal verification with AI; feel free to email me if this is your field.
- PunchyHamster: Prediction: fuck no. AI is unreliable as it is. It might make formal verification a bit less work-intensive, but the last place anyone would want AI hallucinations is in verification.
- waterTanuki: I'm skeptical of formal verification mainly because it's akin to trying to predict the future with a sophisticated crystal ball. You can't formally guarantee hardware won't fail, or that solar radiation won't flip a bit. What seems to have had much better ROI in terms of safety-critical systems is switching to memory-safe languages that rely less on runtime promises and more on compiler guarantees.
- dboreham: My $0.02 worth: formal methods are just "fancy tests". Of course tests are good to have, and fancy tests even better. I've found that humans are pretty terrible on average at creating tests (perhaps it would be more accurate to say that their MBA overlords have not been great at supporting their testing endeavors). Meanwhile, I've found that LLMs are pretty good at generating tests and don't have the human tendency to see that kind of activity as resume-limiting. Therefore even a not-amazing LLM writing tests turns out to be a very useful resource if you have any interest in the quality of your product and mitigating various failure risks.
- CyberDildonics: When people have the choice of whether to use AI or use Rust, because LLMs don't produce workable Rust programs, they leave Rust behind and use something else. The Venn diagram of people who want formal verification and people who think LLM slop is a good idea is two separate circles.