A new era for software testing

Chrisszz

Comments (54)

rglover
> I have the feeling that the introduction of automatic QA may raise the bar of quality for new releases of software, and maybe partially compensate for the lower quality of the code produced at high speed with the use of automatic programming.
In theory. The only difference between today and "the aughts" is that we have machines that can spit out a ton of code very quickly.
Nothing has changed about the discipline or honesty around testing (you can skip automated tests even faster now if you wish). You can and should work with AI to write tests, but you have to know the difference between a good test and a "looks good on paper" test in order for it to truly be effective and raise the quality of what you're building.
jason_s
Please use a more readable variable-width font
mlmonkey
Writing unit tests used to be the bane of my existence. I used to hate them. Oftentimes, the LoC for unit tests was 3X the LoC of the actual code.
But not any more! Now I point the LLM to the code and order it to write unit tests, covering all edge cases, etc. I'd rather spend 3 hours arguing with the LLM than writing unit tests! :-D
bob1029
If you are working with a web application, playwright + frontier LLM is incredibly capable. They added some recent features to make this sort of use case go a lot smoother:
https://playwright.dev/docs/release-notes#version-159
If you set this up correctly, you can have a main agent issue natural language testing instructions to this playwright agent, which returns a natural language summary of what it experiences. This is the sort of thing where I begin to get interested in the idea of agents working while I sleep.
avensec
> The idea is to create a markdown file where an AI agent is asked to work as a QA engineer
Given your code-base is mature enough, please don't have a single Skill/Steering/Persona/Ruleset (or whatever) for your "QA Engineer." This is just the same "my behavioral file can one-shot the entire system build" kind of thinking that will give you expensive, marginal results as the system grows.
If you want to have success in this space, get really fine-grained. Every single test scope needs its own behavioral files.
Have your core behavioral file define some simple specifics around Test Pyramid, Test Purposes, checks for tautological tests, etc. Then get _really_ specific:
- <test-type>-architect (plan)
- <test-type>-engineer (execute)
- <test-type>-resolver (problem solver, maintenance, how to manage a failure, etc.)
e.g., playwright-architect, etc.
Then create additional ones for Unit tests, API tests, contract tests, or any other required test layer for the SUT.
Overengineered? Maybe, given the size of your codebase. But for anything significant, you are codifying what humans and their skillsets do.
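A rough sketch of the naming scheme described above, in case it helps to see it spelled out. The `.md` extension, the `testing-core.md` file name, and the exact list of test types are assumptions for illustration; only the architect/engineer/resolver roles come from the comment.

```python
from itertools import product

# Assumed test layers, based on the ones named in the comment.
TEST_TYPES = ["playwright", "unit", "api", "contract"]

# Roles and their purpose, as described in the comment.
ROLES = {
    "architect": "plan",
    "engineer": "execute",
    "resolver": "problem solving, maintenance, failure handling",
}


def behavioral_files(test_types, roles, core="testing-core.md"):
    """Return the hypothetical core file, then one file per (test type, role)."""
    files = [core]
    files += [f"{test_type}-{role}.md" for test_type, role in product(test_types, roles)]
    return files


if __name__ == "__main__":
    for name in behavioral_files(TEST_TYPES, ROLES):
        print(name)
```

With four test types and three roles this yields twelve scoped files plus the core one, which gives a sense of how quickly the set grows for a larger SUT.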
kulahan
Isn’t this explicitly the one place you’d never want to use AI? Like, the only actual problem with AI is that it sometimes ignores errors in output like it has a PhD in Blindness To Problems. I always figured the path forward was strictly enforced and managed tests written by hand, because who gives a shit about the code behind it as long as you can prove that the output is real?
Ten million blackboxes with ten billion tests or whatever. Otherwise it’s literally the blind leading the blind.
marshalhq
I ran mutation testing on a side project recently and found a test that passed even if the production method returned an empty string. AI-generated tests at scale will have exactly this problem. High coverage, confident test names, zero actual verification.
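To make the failure mode concrete, here is a minimal hand-rolled sketch of the idea behind mutation testing. It is not the output of any particular mutation testing tool; `format_greeting`, both tests, and the mutant are hypothetical names invented for illustration.

```python
def format_greeting(name: str) -> str:
    """Hypothetical production function."""
    return f"Hello, {name}!"


def weak_test(fn) -> bool:
    """Looks like a test, but only checks the return type."""
    return isinstance(fn("Ada"), str)


def strong_test(fn) -> bool:
    """Checks the actual value, so a broken implementation fails."""
    return fn("Ada") == "Hello, Ada!"


def empty_string_mutant(name: str) -> str:
    """Hand-written mutant: same signature, always returns an empty string."""
    return ""


def survives(test, mutant) -> bool:
    """A mutant 'survives' when the test still passes against it."""
    return test(mutant)


if __name__ == "__main__":
    print("mutant survives weak test:", survives(weak_test, empty_string_mutant))
    print("mutant survives strong test:", survives(strong_test, empty_string_mutant))
```

Both tests pass against the real function, so coverage alone cannot tell them apart. Only the mutant shows that the weak test verifies nothing about the returned value.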
simianwords
Scenario testing is the new word for it and I think this is a game changer.
Two of the reasons I never liked writing tests are:
- they didn’t seem to usually assert much internal logic
- they would have to be maintained along with the original code
I think scenario testing is much better instead because the actual way a person uses a feature hardly changes but the internals might change a lot.
So imagine I’m making an e-commerce website. There are lots of internal mechanisms. I’ll have an agent testing all the functionalities as if it were a customer. This gives me much, much more confidence while writing code because it is more uncorrelated with the code.
Tomorrow I can change a lot of internals but the testing agent stays the same.
There’s something to note though: not all code can be scenario tested. Like data engineering and other things where the feedback time is huge.
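A small deterministic sketch of the same principle, with plain code standing in for the agent: the scenario touches only what a customer could do, so the internals can change freely. The `Shop` class, its methods, and the prices are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Shop:
    """Hypothetical shop; only the public methods are part of the scenario."""

    prices: dict[str, int]  # item name -> price in cents
    _cart: dict[str, int] = field(default_factory=dict)  # internal detail

    def add_to_cart(self, item: str, quantity: int = 1) -> None:
        if item not in self.prices:
            raise KeyError(f"unknown item: {item}")
        self._cart[item] = self._cart.get(item, 0) + quantity

    def checkout(self) -> int:
        """Return the total in cents and empty the cart."""
        total = sum(self.prices[item] * qty for item, qty in self._cart.items())
        self._cart.clear()
        return total


def scenario_customer_buys_two_items(shop) -> None:
    """Acts like a customer: never reads _cart or any other internal."""
    shop.add_to_cart("mug", 2)
    shop.add_to_cart("tea")
    assert shop.checkout() == 2 * 900 + 450
    assert shop.checkout() == 0  # the cart is empty after checkout


if __name__ == "__main__":
    scenario_customer_buys_two_items(Shop(prices={"mug": 900, "tea": 450}))
    print("scenario passed")
```

If `_cart` were later replaced by a list of line items or a database table, the scenario function would not need to change, which is the maintenance argument made above.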
wrxd
I believe this can work if done on top of traditional testing. I would feel very uneasy replacing deterministic (OK, not always, but mostly) test suites with something that is not deterministic at all.
wesselbindt
The idea of injecting more indeterminacy in pipelines is beyond me.
npodbielski
What is the point of asking an LLM to do manual testing? IMHO it would be much better to have it write automated tests, so you can just rerun them.