AI coding agents are transforming how we build software. What once took days now takes hours — or minutes. But there’s a growing problem: the faster we ship code, the harder it becomes to verify it actually works.
In this post, we’ll look at how teams traditionally ensure software quality, where each approach falls short, and why the AI agent era demands a new kind of QA.
Three Pillars of Quality Assurance
Most teams rely on some combination of three approaches to ensure their software works correctly: test code, manual QA, and production monitoring. Each has clear strengths — and equally clear limits.
Test Code
Automated tests are the foundation of modern software development. Unit tests, integration tests, and end-to-end tests let you validate logic at the module level with precision.
Strengths:
- Verify detailed logic at the module and function level
- Run automatically in CI/CD pipelines with clear pass/fail output
- Catch regressions early in the development cycle
Limitations:
- Realistic data is hard to prepare — test fixtures rarely match the complexity of production
- Scenario-based testing across multiple user interactions creates a combinatorial explosion that’s impossible to fully cover
- External service integrations (social login, payment gateways) are difficult to test authentically
Test code excels at answering “does this function return the right value?” but struggles with “does this workflow actually work for a real user?”
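To make that contrast concrete, here is a minimal sketch (all names hypothetical, in Python) of what a unit test verifies — and what it doesn't:

```python
# Hypothetical module-level function and its unit test.
def apply_discount(price: float, percent: float) -> float:
    """Return price reduced by the given percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

def test_apply_discount():
    assert apply_discount(100.0, 10) == 90.0
    assert apply_discount(19.99, 0) == 19.99
    # This pins down the math precisely — but says nothing about whether
    # the checkout flow actually applies the discount for a real user.
```

The test is fast, deterministic, and CI-friendly, which is exactly the strength listed above; the gap is everything outside the function boundary.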
Manual QA
Manual QA fills the gaps that automated tests can’t reach. A human tester interacts with the application as a user would, validating behavior from the outside in.
Strengths:
- Validates the application from the user’s perspective before release
- Can work with realistic, production-like data
- Catches regressions in areas unrelated to the current change
- Handles complex, stateful scenarios — specific account states, multi-step workflows, edge cases that require careful setup
Limitations:
- External service dependencies (e.g., social login, third-party APIs) may be limited or unavailable in test environments
- Ad-hoc by nature — the test plan changes with every release, making automation difficult
- Slow and expensive to scale as the product grows
- Knowledge lives in people’s heads, not in a system
Manual QA is powerful but inherently difficult to systematize. Every release requires a fresh assessment of what to test, and that assessment is hard to reuse.
Production Monitoring and Error Tracking
The final safety net: watching the application in production and catching issues as they happen.
Strengths:
- Observes actual user behavior with real data
- Covers external service integrations that can’t be tested in staging
- Catches issues that no pre-release testing anticipated
Limitations:
- Some actions are impractical or risky in production — payment flows, bulk data creation, destructive operations
- By the time you find a bug here, it’s already affecting users — this is a reactive approach, not a preventive one
Production monitoring is essential, but it’s not QA — it’s damage control.
The AI Development Speed Problem
AI coding agents have dramatically accelerated the development side of the equation. Features that took a week to implement can now be built in a day. Pull requests are larger, more frequent, and sometimes touch parts of the codebase the human reviewer hasn’t deeply studied.
This creates an asymmetry:
- Development speed: Often an order of magnitude faster with AI agents
- QA speed: Still largely manual, still the same pace
Automated tests can’t cover everything. Production monitoring catches problems too late. Manual QA is thorough but can’t keep up with the pace of AI-assisted development.
QA is the bottleneck. And it’s a bottleneck that’s getting worse every day.
Why QA Automation Is Harder Than It Looks
If QA is the bottleneck, why not just automate it? Because QA automation poses several distinct challenges that neither fixed test scripts nor ad-hoc AI agents handle well:
Ad-Hoc Test Content
QA plans are inherently ad-hoc. Every pull request changes something different, and the test plan must adapt accordingly. Fixed test scripts break down quickly — you need something that can generate context-aware test plans on the fly.
This is where AI can help with planning, but simply asking an AI agent to “test this PR” introduces its own problems.
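One way to picture "context-aware" planning — purely as an illustration, not aqua's actual mechanism — is a mapping from the files a PR touches to the scenarios worth re-testing, rather than a fixed script that runs identically every time:

```python
# Hypothetical sketch: derive a test focus from a PR's changed files.
# The scenario map and paths are invented for illustration.
SCENARIOS = {
    "auth/": ["login", "logout", "password-reset"],
    "billing/": ["checkout", "refund"],
    "profile/": ["edit-profile"],
}

def plan_for_changes(changed_files: list[str]) -> list[str]:
    """Return the scenarios affected by the changed files, de-duplicated."""
    picked: list[str] = []
    for path in changed_files:
        for prefix, scenarios in SCENARIOS.items():
            if path.startswith(prefix):
                picked += [s for s in scenarios if s not in picked]
    return picked
```

A static map like this breaks down quickly in practice — which is precisely where an AI planner that reads the diff and the codebase earns its keep.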
Reproducibility
When you ask an AI agent to test something, it might do a great job — once. But can you run the same test again in a different environment? Against staging? Against production after deploy?
Ad-hoc AI-driven testing lacks reproducibility. Without structured, reusable plans, every test run is a one-off.
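The fix for one-off runs is to make the plan itself data. A minimal sketch of the idea (not aqua's actual plan format — names and step types are invented): once a plan is a structured value, the same plan replays against any environment by swapping the base URL.

```python
# Hypothetical sketch: a QA plan as immutable structured data.
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    action: str   # e.g. "visit", "expect_text"
    target: str

@dataclass(frozen=True)
class Plan:
    name: str
    steps: tuple[Step, ...]

login_plan = Plan(
    name="login-happy-path",
    steps=(
        Step("visit", "/login"),
        Step("expect_text", "Welcome back"),
    ),
)

def run(plan: Plan, base_url: str) -> list[str]:
    """Resolve each step against a concrete environment's base URL."""
    return [
        f"{s.action} {base_url}{s.target}" if s.action == "visit"
        else f"{s.action} {s.target!r}"
        for s in plan.steps
    ]

# The identical plan runs unchanged against local, staging, or production:
staging_run = run(login_plan, "https://staging.example.com")
```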
Consistent Recording
QA results need to be recorded in a consistent, reviewable format. When AI agents handle testing ad-hoc, the output format varies wildly — different levels of detail, different structures, different information captured.
Teams need a unified record of what was tested, what passed, and what failed — not a collection of inconsistent AI chat logs.
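What "a unified record" means in practice is a single schema that every run produces, regardless of which agent planned it. A sketch under invented field names (not aqua's actual result format):

```python
# Hypothetical sketch: one consistent record shape for every test run,
# instead of free-form AI chat logs.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class StepResult:
    step: str
    status: str            # "pass" | "fail"
    detail: str = ""

@dataclass
class RunRecord:
    plan: str
    environment: str
    steps: list[StepResult] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        return all(s.status == "pass" for s in self.steps)

record = RunRecord(plan="login-happy-path", environment="staging")
record.steps.append(StepResult("visit /login", "pass"))
record.steps.append(StepResult("expect 'Welcome back'", "fail", "text not found"))

# Every run serializes to the same JSON shape, so results are
# reviewable, comparable, and diffable across runs.
report = json.dumps(asdict(record), indent=2)
```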
Credential Security
QA often requires authentication — logging in as specific users, using API keys, accessing admin panels. Passing these credentials directly to an AI agent is a security risk. Credentials could end up in logs, in context windows, or worse.
Any serious QA automation solution must handle secrets without exposing them to the AI.
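The standard pattern here is indirection: the agent reads and writes only an opaque reference, and the executor resolves it at run time. A minimal sketch (the `{{secret:NAME}}` syntax is invented for illustration, with the local environment standing in for a real secret provider):

```python
# Hypothetical sketch: the AI agent only ever sees an opaque reference;
# resolution happens locally, just before execution, so the credential
# never enters the agent's context window or logs.
import os
import re

SECRET_REF = re.compile(r"\{\{secret:(\w+)\}\}")

def resolve(template: str) -> str:
    """Replace {{secret:NAME}} references with values from the local env."""
    return SECRET_REF.sub(lambda m: os.environ[m.group(1)], template)

# The plan — the only thing the agent authors — contains just the reference:
plan_step = "login as admin with password {{secret:ADMIN_PASSWORD}}"
```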
How aqua Solves These Challenges
aqua was built specifically for this moment — when AI agents make development fast, but QA can’t keep up.
AI-Powered Planning, Structured Execution
aqua separates planning from execution. Your AI coding agent creates structured QA plans by analyzing your codebase and PR changes. These plans are then executed by aqua’s dedicated execution engine — not by the AI agent itself.
This means you get the intelligence of AI for planning what to test, with the reliability of a purpose-built engine for actually running the tests.
Reusable, Versioned Plans
Every QA plan in aqua is versioned and immutable. Write a plan once, run it across environments — local, staging, production. When you push a fix, re-run the same plan to verify. No need to re-explain or regenerate.
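One common way to get immutable versioning — shown here as an illustration of the concept, not as aqua's actual scheme — is to identify each plan version by a hash of its content, so any edit produces a new version id while an unchanged plan keeps the same one:

```python
# Hypothetical sketch: content-addressed plan versions.
import hashlib
import json

def plan_version(plan: dict) -> str:
    """Stable content hash: the same plan always yields the same version id."""
    canonical = json.dumps(plan, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()[:12]

v1 = plan_version({"name": "login", "steps": ["visit /login"]})
v2 = plan_version({"name": "login", "steps": ["visit /login", "click #submit"]})
# v1 != v2: editing the steps creates a distinct new version, while
# re-running the unchanged plan reuses the same id across environments.
```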
Automatic Result Capture
aqua automatically records HTTP request/response pairs, screenshots, DOM snapshots, and assertion results in a consistent format. Every team member can review results in the web dashboard — no more pasting screenshots in Slack.
Secrets Never Leave Your Machine
aqua integrates with secret providers like 1Password, AWS Secrets Manager, GCP Secret Manager, and HashiCorp Vault. Credentials are resolved locally and never sent to AI agents or the aqua service. Sensitive data in test artifacts is automatically masked before reaching the server.
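The masking half of this is conceptually simple: before any artifact leaves the machine, known secret values are scrubbed from it. A minimal sketch of the idea (invented data, not aqua's implementation):

```python
# Hypothetical sketch: mask known secret values in captured artifacts
# before anything is uploaded.
def mask(artifact: str, secrets: list[str]) -> str:
    """Replace every occurrence of a known secret value with a placeholder."""
    for value in secrets:
        artifact = artifact.replace(value, "***MASKED***")
    return artifact

captured = 'POST /login {"password": "hunter2"} -> 200 OK'
safe = mask(captured, ["hunter2"])
```

Because resolution happens locally, the executor knows exactly which values are secrets and can mask them deterministically — no pattern-guessing required.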
Project Memory
aqua’s project memory accumulates knowledge across QA sessions — reliable selectors, authentication flows, API patterns. Your AI agent gets smarter about your application over time, reducing the need to re-explain your app’s structure with every test cycle.
Getting Started
aqua works with any MCP-compatible AI coding agent — Claude Code, Cursor, Windsurf, and more. Install the CLI, connect it to your agent, and start creating QA plans in minutes.
Check out the quickstart guide to get started, or visit aquaqa.com to learn more.
The AI agent era demands a new approach to QA — one that’s as intelligent and fast as the development tools we now use. aqua brings that approach to life.
This post covered the big picture — why QA is the bottleneck and what it takes to fix it. In upcoming posts, we’ll dig into the specifics: real workflows, CI/CD integration patterns, environment management strategies, and lessons learned from teams using aqua in production. Stay tuned.