AI coding agents are transforming how we build software. What once took days now takes hours — or minutes. But there’s a growing problem: the faster we ship code, the harder it becomes to verify it actually works.
In this post, we’ll look at how teams traditionally ensure software quality, where each approach falls short, and why the AI agent era demands a new kind of QA.
Three Pillars of Quality Assurance
Most teams rely on some combination of three approaches to ensure their software works correctly: test code, manual QA, and production monitoring. Each has clear strengths — and equally clear limits.
Test Code
Automated tests are the foundation of modern software development. Unit tests, integration tests, and end-to-end tests let you validate logic at the module level with precision.
Strengths:
- Verify detailed logic at the module and function level
- Run automatically in CI/CD pipelines with clear pass/fail output
- Catch regressions early in the development cycle
Limitations:
- Realistic data is hard to prepare — test fixtures rarely match the complexity of production
- Scenario-based testing across multiple user interactions creates a combinatorial explosion that’s impossible to fully cover
- External service integrations (social login, payment gateways) are difficult to test authentically
Test code excels at answering “does this function return the right value?” but struggles with “does this workflow actually work for a real user?”
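To make that contrast concrete, here is a minimal sketch (all names hypothetical, in Python) of what a unit test verifies — and what it doesn't:

```python
# Hypothetical module-level function and its unit test.
def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by the given percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    assert apply_discount(100.0, 10) == 90.0
    assert apply_discount(19.99, 0) == 19.99
    # This pins down the math precisely — but says nothing about whether
    # the checkout flow actually applies the discount for a real user.
```

The test is fast, deterministic, and CI-friendly, which is exactly the strength listed above; the gap is everything outside the function boundary.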
Manual QA
Manual QA fills the gaps that automated tests can’t reach. A human tester interacts with the application as a user would, validating behavior from the outside in.
Strengths:
- Validates the application from the user’s perspective before release
- Can work with realistic, production-like data
- Catches regressions in areas unrelated to the current change
- Handles complex, stateful scenarios — specific account states, multi-step workflows, edge cases that require careful setup
Limitations:
- External service dependencies (e.g., social login, third-party APIs) may be limited or unavailable in test environments
- Ad-hoc by nature — the test plan changes with every release, making automation difficult
- Slow and expensive to scale as the product grows
- Knowledge lives in people’s heads, not in a system
Manual QA is powerful but inherently difficult to systematize. Every release requires a fresh assessment of what to test, and that assessment is hard to reuse.
Production Monitoring and Error Tracking
The final safety net: watching the application in production and catching issues as they happen.
Strengths:
- Observes actual user behavior with real data
- Covers external service integrations that can’t be tested in staging
- Catches issues that no pre-release testing anticipated
Limitations:
- Some actions are impractical or risky in production — payment flows, bulk data creation, destructive operations
- By the time you find a bug here, it’s already affecting users — this is a reactive approach, not a preventive one
Production monitoring is essential, but it’s not QA — it’s damage control.
The AI Development Speed Problem
AI coding agents have dramatically accelerated the development side of the equation. Features that took a week to implement can now be built in a day. Pull requests are larger, more frequent, and sometimes touch parts of the codebase the human reviewer hasn’t deeply studied.
This creates an asymmetry:
- Development speed: Often an order of magnitude faster with AI agents
- QA speed: Still largely manual, still the same pace
Automated tests can’t cover everything. Production monitoring catches problems too late. Manual QA is thorough but can’t keep up with the pace of AI-assisted development.
QA is the bottleneck. And it’s a bottleneck that’s getting worse every day.
Why QA Automation Is Harder Than It Looks
If QA is the bottleneck, why not just automate it? Because QA automation poses several distinct challenges that neither fixed test scripts nor ad-hoc AI agents handle well:
Ad-Hoc Test Content
QA plans are inherently ad-hoc. Every pull request changes something different, and the test plan must adapt accordingly. Fixed test scripts break down quickly — you need something that can generate context-aware test plans on the fly.
This is where AI can help with planning, but simply asking an AI agent to “test this PR” introduces its own problems.
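One way to picture "context-aware" planning — purely as an illustration, not aqua's actual mechanism — is a mapping from the files a PR touches to the scenarios worth re-testing, rather than a fixed script that runs identically every time:

```python
# Hypothetical sketch: derive a test focus from a PR's changed files.
# The scenario map and paths are invented for illustration.
SCENARIOS = {
    "auth/": ["login", "logout", "password-reset"],
    "billing/": ["checkout", "refund"],
    "profile/": ["edit-profile"],
}

def plan_for_changes(changed_files: list[str]) -> list[str]:
    """Return the scenarios affected by the changed files, de-duplicated."""
    picked: list[str] = []
    for path in changed_files:
        for prefix, scenarios in SCENARIOS.items():
            if path.startswith(prefix):
                picked += [s for s in scenarios if s not in picked]
    return picked
```

A static map like this breaks down quickly in practice — which is precisely where an AI planner that reads the diff and the codebase earns its keep.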
Reproducibility
When you ask an AI agent to test something, it might do a great job — once. But can you run the same test again in a different environment? Against staging? Against production after deploy?
Ad-hoc AI-driven testing lacks reproducibility. Without structured, reusable plans, every test run is a one-off.
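The fix for one-off runs is to make the plan itself data. A minimal sketch of the idea (not aqua's actual plan format — names and step types are invented): once a plan is a structured value, the same plan replays against any environment by swapping the base URL.

```python
# Hypothetical sketch: a QA plan as immutable structured data.
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    action: str   # e.g. "visit", "expect_text"
    target: str

@dataclass(frozen=True)
class Plan:
    name: str
    steps: tuple[Step, ...]

login_plan = Plan(
    name="login-happy-path",
    steps=(
        Step("visit", "/login"),
        Step("expect_text", "Welcome back"),
    ),
)

def run(plan: Plan, base_url: str) -> list[str]:
    """Resolve each step against a concrete environment's base URL."""
    return [
        f"{s.action} {base_url}{s.target}" if s.action == "visit"
        else f"{s.action} {s.target!r}"
        for s in plan.steps
    ]

# The identical plan runs unchanged against local, staging, or production:
staging_run = run(login_plan, "https://staging.example.com")
```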
Consistent Recording
QA results need to be recorded in a consistent, reviewable format. When AI agents handle testing ad-hoc, the output format varies wildly — different levels of detail, different structures, different information captured.
Teams need a unified record of what was tested, what passed, and what failed — not a collection of inconsistent AI chat logs.
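What "a unified record" means in practice is a single schema that every run produces, regardless of which agent planned it. A sketch under invented field names (not aqua's actual result format):

```python
# Hypothetical sketch: one consistent record shape for every test run,
# instead of free-form AI chat logs.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class StepResult:
    step: str
    status: str            # "pass" | "fail"
    detail: str = ""

@dataclass
class RunRecord:
    plan: str
    environment: str
    steps: list[StepResult] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return all(s.status == "pass" for s in self.steps)

record = RunRecord(plan="login-happy-path", environment="staging")
record.steps.append(StepResult("visit /login", "pass"))
record.steps.append(StepResult("expect 'Welcome back'", "fail", "text not found"))

# Every run serializes to the same JSON shape, so results are
# reviewable, comparable, and diffable across runs.
report = json.dumps(asdict(record), indent=2)
```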
Credential Security
QA often requires authentication — logging in as specific users, using API keys, accessing admin panels. Passing these credentials directly to an AI agent is a security risk. Credentials could end up in logs, in context windows, or worse.
Any serious QA automation solution must handle secrets without exposing them to the AI.
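The standard pattern here is indirection: the agent reads and writes only an opaque reference, and the executor resolves it at run time. A minimal sketch (the `{{secret:NAME}}` syntax is invented for illustration, with the local environment standing in for a real secret provider):

```python
# Hypothetical sketch: the AI agent only ever sees an opaque reference;
# resolution happens locally, just before execution, so the credential
# never enters the agent's context window or logs.
import os
import re

SECRET_REF = re.compile(r"\{\{secret:(\w+)\}\}")

def resolve(template: str) -> str:
    """Replace {{secret:NAME}} references with values from the local env."""
    return SECRET_REF.sub(lambda m: os.environ[m.group(1)], template)

# The plan — the only thing the agent authors — contains just the reference:
plan_step = "login as admin with password {{secret:ADMIN_PASSWORD}}"
```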
How aqua Solves These Challenges
aqua was built specifically for this moment — when AI agents make development fast, but QA can’t keep up.
AI-Powered Planning, Structured Execution
aqua separates planning from execution. Your AI coding agent creates structured QA plans by analyzing your codebase and PR changes. These plans are then executed by aqua’s dedicated execution engine — not by the AI agent itself.
This means you get the intelligence of AI for planning what to test, with the reliability of a purpose-built engine for actually running the tests.
Reusable, Versioned Plans
Every QA plan in aqua is versioned and immutable. Write a plan once, run it across environments — local, staging, production. When you push a fix, re-run the same plan to verify. No need to re-explain or regenerate.
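One common way to get immutable versioning — shown here as an illustration of the concept, not as aqua's actual scheme — is to identify each plan version by a hash of its content, so any edit produces a new version id while an unchanged plan keeps the same one:

```python
# Hypothetical sketch: content-addressed plan versions.
import hashlib
import json

def plan_version(plan: dict) -> str:
    """Stable content hash: the same plan always yields the same version id."""
    canonical = json.dumps(plan, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:12]

v1 = plan_version({"name": "login", "steps": ["visit /login"]})
v2 = plan_version({"name": "login", "steps": ["visit /login", "click #submit"]})
# v1 != v2: editing the steps creates a distinct new version, while
# re-running the unchanged plan reuses the same id across environments.
```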
Automatic Result Capture
aqua automatically records HTTP request/response pairs, screenshots, DOM snapshots, and assertion results in a consistent format. Every team member can review results in the web dashboard — no more pasting screenshots in Slack.
Secrets Never Leave Your Machine
aqua integrates with secret providers like 1Password, AWS Secrets Manager, GCP Secret Manager, and HashiCorp Vault. Credentials are resolved locally and never sent to AI agents or the aqua service. Sensitive data in test artifacts is automatically masked before reaching the server.
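The masking half of this is conceptually simple: before any artifact leaves the machine, known secret values are scrubbed from it. A minimal sketch of the idea (invented data, not aqua's implementation):

```python
# Hypothetical sketch: mask known secret values in captured artifacts
# before anything is uploaded.
def mask(artifact: str, secrets: list[str]) -> str:
    """Replace every occurrence of a known secret value with a placeholder."""
    for value in secrets:
        artifact = artifact.replace(value, "***MASKED***")
    return artifact

captured = 'POST /login {"password": "hunter2"} -> 200 OK'
safe = mask(captured, ["hunter2"])
```

Because resolution happens locally, the executor knows exactly which values are secrets and can mask them deterministically — no pattern-guessing required.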
Project Memory
aqua’s project memory accumulates knowledge across QA sessions — reliable selectors, authentication flows, API patterns. Your AI agent gets smarter about your application over time, reducing the need to re-explain your app’s structure with every test cycle.
Getting Started
aqua works with any MCP-compatible AI coding agent — Claude Code, Cursor, Windsurf, and more. Install the CLI, connect it to your agent, and start creating QA plans in minutes.
Check out the quickstart guide to get started, or visit aquaqa.com to learn more.
The AI agent era demands a new approach to QA — one that’s as intelligent and fast as the development tools we now use. aqua brings that approach to life.
This post covered the big picture — why QA is the bottleneck and what it takes to fix it. In upcoming posts, we’ll dig into the specifics: real workflows, CI/CD integration patterns, environment management strategies, and lessons learned from teams using aqua in production. Stay tuned.