How QA Agent Works (Architecture)
Before contributing, understand what runs when you type `python cli.py run --suite smoke`.
What you'll learn
- The execution pipeline from CLI command to HTML report
- What each module in `qa_agent/` does
- How the agent makes pass/fail decisions
- Which integration points are extension-friendly
Execution pipeline
CLI command
│
▼
cli.py (entry point)
│ parses --suite / --file / --tag flags
│
▼
runner.py (RunnerService)
│ resolves test files, manages parallelism (asyncio + semaphore)
│
▼
parser.py (parse_test_case)
│ reads markdown, extracts title, persona, priority, tags, steps, expected
│
▼
agent_factory.py (build_agent)
│ creates browser-use Agent with the configured LLM
│ loads persona session cookies if persona != unauthenticated
│
▼
browser-use Agent (executes steps)
│ each step → LLM call → browser action
│ screenshots captured at every step
│
▼
hooks.py (on_step_end, on_done)
│ captures screenshots, GIFs, video via Playwright
│
▼
reporter.py (generate_report)
│ builds HTML report from run results
│
▼
linear_reporter.py / supabase_uploader.py (optional)
│ creates Linear issues for failures, uploads report
│
▼
HTML report on disk
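
The parallelism stage in `runner.py` can be pictured with a minimal sketch like the one below. It assumes nothing about the real `RunnerService` beyond the asyncio-plus-semaphore description in the diagram; `run_one`, `run_suite`, and `max_parallel` are hypothetical names, and the short sleep stands in for the actual parse, build, and execute stages.

```python
import asyncio


async def run_one(test_file: str, semaphore: asyncio.Semaphore) -> dict:
    """Hypothetical stand-in for parse -> build agent -> execute."""
    async with semaphore:  # at most `max_parallel` tests run at once
        await asyncio.sleep(0.01)  # placeholder for the real agent run
        return {"file": test_file, "passed": True}


async def run_suite(test_files: list[str], max_parallel: int = 2) -> list[dict]:
    """Run all test files concurrently, bounded by a semaphore."""
    semaphore = asyncio.Semaphore(max_parallel)
    tasks = [run_one(path, semaphore) for path in test_files]
    # gather() returns results in the same order as the input list
    return await asyncio.gather(*tasks)


results = asyncio.run(run_suite(["login.md", "checkout.md", "search.md"]))
```

The semaphore caps how many browser sessions are open at the same time, while `gather` keeps the results aligned with the input order for later aggregation.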
Module responsibilities
| File | Responsibility |
|---|---|
| `cli.py` | Argument parsing, command dispatch (run, auth, report, release) |
| `runner.py` | Parallel test execution, result aggregation |
| `parser.py` | Markdown test case parsing → structured TestCase dataclass |
| `agent_factory.py` | Constructs browser-use Agent with LLM config and persona session |
| `hooks.py` | Step-level callbacks: screenshots, GIF/video recording |
| `reporter.py` | HTML report generation from RunResult objects |
| `auth.py` | Interactive persona session capture and persistence |
| `linear_reporter.py` | Linear issue creation for failing tests |
| `sentry_reporter.py` | Sentry event correlation (links test failures to error events) |
| `supabase_uploader.py` | Report upload to Supabase Storage |
| `release_runner.py` | CSV-driven release pipeline: reads a test manifest, runs specified suites, generates a consolidated report |
| `csv_parser.py` | Parses release CSV format into runnable test batches |
| `custom_tools.py` | Custom browser-use tools (extended actions available to the agent) |
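
To make the `parser.py` row concrete, here is a rough sketch of turning a markdown test case into a dataclass. The field names follow the list in the pipeline diagram, but the file layout is an assumption: a `# Title` heading, `key: value` metadata lines, and bulleted or numbered items under `## Steps` and `## Expected`. The real `parse_test_case` may read a different layout.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TestCase:
    title: str = ""
    persona: str = "unauthenticated"
    priority: str = ""
    tags: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    expected: list[str] = field(default_factory=list)


def parse_test_case(text: str) -> TestCase:
    """Parse an assumed markdown layout into a TestCase (illustrative only)."""
    case = TestCase()
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            section = line[3:].strip().lower()
        elif line.startswith("# "):
            case.title = line[2:].strip()
        elif section in ("steps", "expected"):
            # Drop a leading "- ", "* ", or "1. " list marker
            item = re.sub(r"^(?:[-*]|\d+\.)\s+", "", line)
            getattr(case, section).append(item)
        elif ":" in line:
            key, value = (part.strip() for part in line.split(":", 1))
            if key.lower() == "tags":
                case.tags = [t.strip() for t in value.split(",") if t.strip()]
            elif key.lower() in ("persona", "priority"):
                setattr(case, key.lower(), value)
    return case


SAMPLE = """# Login works
persona: admin
priority: high
tags: smoke, auth

## Steps
1. Open the login page
2. Submit valid credentials

## Expected
- The dashboard is visible
"""
case = parse_test_case(SAMPLE)
```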
How pass/fail works
QA Agent does not use assertions in the traditional sense. Instead:
- The agent executes every step in the `## Steps` section
- At the end of the run, the agent evaluates the `## Expected` section against what it observed
- If all expected outcomes are satisfied, the test passes. If any are not, the test fails with a structured failure note explaining which expectation was not met and why
This means the AI model is both the executor and the judge. For most UI flows this works well. For precise numeric or time-based assertions ("price must be exactly $29.00"), be explicit in the Expected section to reduce ambiguity.
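
The all-or-nothing rule can be sketched as follows. This is an illustration of the decision logic only: in QA Agent the per-expectation judgments come from the model, and `ExpectationResult` and `judge` are hypothetical names, not the project's actual types.

```python
from dataclasses import dataclass


@dataclass
class ExpectationResult:
    expectation: str  # one line from the Expected section
    met: bool         # the model's judgment for this line
    reason: str = ""  # explanation, used when the expectation is not met


def judge(results: list[ExpectationResult]) -> tuple[bool, list[str]]:
    """Pass only if every expectation is met; otherwise collect failure notes."""
    failures = [f"{r.expectation}: {r.reason}" for r in results if not r.met]
    return (not failures, failures)


passed, notes = judge([
    ExpectationResult("Dashboard is visible", True),
    ExpectationResult("Price is exactly $29.00", False, "page showed $29.99"),
])
```

A single unmet expectation fails the whole test, which is why an explicit expectation such as an exact price gives the judge less room for interpretation.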
LLM configuration
The model is set in `config.yaml`:

    llm:
      model: google/gemini-3.1-flash-lite-preview
      base_url: https://openrouter.ai/api/v1

`agent_factory.py` constructs the ChatOpenAI-compatible client from these values. To swap models, change `model` in `config.yaml`. Any OpenRouter-compatible model ID works.
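
As a rough illustration of that mapping, the sketch below turns the parsed config into keyword arguments for an OpenAI-compatible chat client. It is not the code in `agent_factory.py`: `llm_client_kwargs` is a hypothetical helper, the `OPENROUTER_API_KEY` environment variable name is an assumption, and the config is written as a plain dict in place of the loaded YAML.

```python
import os


def llm_client_kwargs(config: dict) -> dict:
    """Map the `llm` config section to client keyword arguments (illustrative)."""
    llm = config["llm"]
    return {
        "model": llm["model"],
        "base_url": llm["base_url"],
        # Assumed env var name; the real project may read the key elsewhere
        "api_key": os.environ.get("OPENROUTER_API_KEY", ""),
    }


# Stand-in for the parsed contents of config.yaml
config = {
    "llm": {
        "model": "google/gemini-3.1-flash-lite-preview",
        "base_url": "https://openrouter.ai/api/v1",
    }
}
kwargs = llm_client_kwargs(config)
```

Under these assumptions, swapping models touches only the `model` value; the base URL and key handling stay the same.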
Integration points
The cleanest places to extend QA Agent:
- New integrations (Jira, GitHub Issues, Slack). Add a module alongside `linear_reporter.py`, implement a `report(run_result)` function, and call it from `runner.py` after the run completes
- Custom agent tools. Add browser actions to `custom_tools.py` (e.g., drag-and-drop, file upload helpers)
- Custom report formats. Replace or extend `reporter.py` for JSON output, CSV summaries, or custom HTML templates
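
A new integration module might look roughly like the sketch below, which builds a chat-style message for a failing test. Only the `report(run_result)` entry point comes from the list above. The `RunResult` fields shown here are assumptions for illustration, and the sketch returns the payload instead of sending it, so no network call is involved.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    # Hypothetical fields; the real RunResult may differ
    title: str
    passed: bool
    failure_note: str = ""


def report(run_result: RunResult) -> Optional[str]:
    """Return a JSON message payload for a failing test, or None if it passed."""
    if run_result.passed:
        return None
    text = f"QA failure: {run_result.title}\n{run_result.failure_note}"
    return json.dumps({"text": text})


payload = report(RunResult("Checkout", False, "Price was $29.99, expected $29.00"))
```

Keeping the function free of side effects until the final send step makes it easy to unit-test before wiring it into `runner.py`.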
Quick answers
What do I get from this cable?
You get a dated field note that explains how we handle this architecture workflow in real QA Agent projects.
How much time should I budget?
Typical effort is 20 min. The cable is marked intermediate.
How do I install the artifact?
This cable is guidance-only and does not ship an installable artifact.
How fresh is the guidance?
The cable is explicitly last verified on 2026-04-17, and includes source links for traceability.