Future of Work · 7w ago

AI QA Automation Fails to Catch Novel Bugs, Causing Major Business Disruption

Reddit Community

Community Problem

Elevator Pitch

Companies are over-automating QA with AI, missing critical novel bugs in new features. This leads to increased customer escalations and security risks, demonstrating the need for a hybrid AI-human approach to testing.

Full Description

We got swept up in the AI automation wave. Cut QA team from 8 to 4. Implemented AI-powered testing that promised equivalent coverage at lower headcount.

Quarter results: highest bug rate we'd ever shipped. Customer escalations tripled. Two enterprise customers demanded emergency security reviews.

What went wrong:

AI testing was excellent at regression testing. It found bugs in existing functionality when we changed code. Coverage there actually improved.

AI testing was terrible at exploratory testing: finding unexpected issues, probing edge cases that weren't in the training data, the things that experienced QA engineers catch through intuition.

The bugs that shipped weren't regression bugs. They were novel bugs in new functionality. The areas where AI testing had no historical data to learn from.

We've since re-hired 2 QA engineers. The AI still does regression testing. Humans do exploratory testing. The combination works better than either alone.
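As a hedged illustration of that regression/exploratory split (function names and logic below are hypothetical, not from the post): regression-style checks replay inputs seen historically and pass, while a novel edge case in new code only surfaces when someone probes outside that distribution.

```python
# Hypothetical sketch: why regression-style checks pass while a novel
# edge case slips through. Names and logic are illustrative only.

def apply_discount(price: float, pct: int) -> float:
    """New-feature code: percentage discount, capped at 100%."""
    # Novel bug: forgets to clamp *negative* percentages,
    # so a malformed request actually increases the price.
    return round(price * (1 - min(pct, 100) / 100), 2)

# Regression-style checks, mirroring the historical inputs an AI
# testing tool would learn from. They all pass.
assert apply_discount(100.0, 10) == 90.0
assert apply_discount(100.0, 100) == 0.0

# Exploratory-style probe a human tester might try: an input
# outside the historical distribution.
result = apply_discount(100.0, -10)
print(result)  # 110.0 -- the "discount" raised the price
```

A suite generated from past traffic has no reason to try `pct=-10`; a tester who has seen malformed form inputs before reaches for it immediately. That gap is the intuition the post describes.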

The automation promise wasn't wrong; it was incomplete. AI is a tool, not a replacement. The companies that treat it as a replacement are learning expensive lessons.


From the Reddit thread (8 top comments)

  • 67·Reddit commenter·1mo ago

My company removed nearly 60% of the QA team. Let's see how it goes in my firm.

  • 28·Reddit commenter·1mo ago

    Bruh, QA is where you need to put more resources if you are going to push hard on AI

  • 14·Reddit commenter·1mo ago·reply

    Thoughts and prayers. You aren't in flight control / avionics / missile guidance software dev are you? Actually - don't answer that XD

  • 9·Reddit commenter·1mo ago

When you say it wasn't very good at edge cases, what kind of undocumented edge cases would still get picked up by intuition? I struggle to understand this concept, which makes me skeptical of this kind of claim.

  • 9·Reddit commenter·1mo ago·reply

    Small visual bugs and misalignments, years of experience knowing certain things cause issues generally but not specifically in this data set, and the most valuable: this just doesn’t work right even though it’s exactly what the PM asked for.

  • 7·Reddit commenter·1mo ago

Isn't it also that the devs used more AI to build these features, so they didn't account for all the edge cases the way they did before?

  • 7·Reddit commenter·1mo ago

    Time for devs to show some ownership and be responsible for what they’re shipping.

  • 6·Reddit commenter·1mo ago

Sounds like you learned the hard way that AI excels at *validation* (does it still work?) but struggles with *verification* (does it do the right thing?). AI can't anticipate edge cases or creatively break things like a seasoned QA engineer. You need to refocus the remaining 4 people on exploratory testing and critical path workflows, and use the AI for the repetitive stuff. Think of AI as a force multiplier for your *best* testers, not as a replacement for them.

