How QA Agent Works (Architecture)

Verified 2 months ago · 20 min

Before contributing, understand what runs when you type `python cli.py run --suite smoke`.

What you'll learn

The execution pipeline from CLI command to HTML report
What each module in qa_agent/ does
How the agent makes pass/fail decisions
Which integration points are extension-friendly

Execution pipeline

CLI command
    │
    ▼
cli.py (entry point)
    │  parses --suite / --file / --tag flags
    │
    ▼
runner.py (RunnerService)
    │  resolves test files, manages parallelism (asyncio + semaphore)
    │
    ▼
parser.py (parse_test_case)
    │  reads markdown, extracts title, persona, priority, tags, steps, expected
    │
    ▼
agent_factory.py (build_agent)
    │  creates browser-use Agent with the configured LLM
    │  loads persona session cookies if persona != unauthenticated
    │
    ▼
browser-use Agent (executes steps)
    │  each step → LLM call → browser action
    │  screenshots captured at every step
    │
    ▼
hooks.py (on_step_end, on_done)
    │  captures screenshots, GIFs, video via Playwright
    │
    ▼
reporter.py (generate_report)
    │  builds HTML report from run results
    │
    ▼
linear_reporter.py / supabase_uploader.py (optional)
    │  creates Linear issues for failures, uploads report
    │
    ▼
HTML report on disk
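
The "asyncio + semaphore" parallelism in the runner stage can be sketched as follows. This is a minimal illustration, not the actual RunnerService code: the `RunResult` fields, the `run_one` stand-in, and the `max_parallel` parameter are all assumptions made for the example.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RunResult:
    # Hypothetical result shape; the real RunResult may carry more fields.
    test_file: str
    passed: bool
    failure_note: str = ""


async def run_one(test_file: str) -> RunResult:
    # Stand-in for the real work: parse, build the agent, execute, judge.
    await asyncio.sleep(0.01)
    return RunResult(test_file=test_file, passed=True)


async def run_suite(test_files: list[str], max_parallel: int = 3) -> list[RunResult]:
    # The semaphore caps how many tests (and browsers) run at the same time.
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(test_file: str) -> RunResult:
        async with semaphore:
            return await run_one(test_file)

    # gather preserves input order, so results line up with test_files.
    return await asyncio.gather(*(guarded(f) for f in test_files))


if __name__ == "__main__":
    results = asyncio.run(run_suite(["login.md", "checkout.md", "search.md"]))
    print([r.test_file for r in results])
```

A semaphore, rather than a fixed worker pool, keeps the code close to a plain `gather` call while still bounding how many browser sessions are open at once.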

Module responsibilities

File	Responsibility
`cli.py`	Argument parsing, command dispatch (`run`, `auth`, `report`, `release`)
`runner.py`	Parallel test execution, result aggregation
`parser.py`	Markdown test case parsing → structured `TestCase` dataclass
`agent_factory.py`	Constructs browser-use `Agent` with LLM config and persona session
`hooks.py`	Step-level callbacks: screenshots, GIF/video recording
`reporter.py`	HTML report generation from `RunResult` objects
`auth.py`	Interactive persona session capture and persistence
`linear_reporter.py`	Linear issue creation for failing tests
`sentry_reporter.py`	Sentry event correlation (links test failures to error events)
`supabase_uploader.py`	Report upload to Supabase Storage
`release_runner.py`	CSV-driven release pipeline: reads a test manifest, runs specified suites, generates a consolidated report
`csv_parser.py`	Parses release CSV format into runnable test batches
`custom_tools.py`	Custom browser-use tools (extended actions available to the agent)
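
To make the parser's job concrete, here is a rough sketch of turning a markdown test case into a `TestCase` dataclass. The field names come from the pipeline above, but the file layout is an assumption: a `# Title` heading, `key: value` lines for persona, priority, and tags, then `## Steps` and `## Expected` lists. The real format and the real `parse_test_case` signature may differ.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TestCase:
    title: str = ""
    persona: str = "unauthenticated"  # assumed default
    priority: str = ""
    tags: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    expected: list[str] = field(default_factory=list)


def parse_test_case(markdown: str) -> TestCase:
    case = TestCase()
    section = None  # lowercased name of the current "## ..." section
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            section = line[3:].strip().lower()
        elif line.startswith("# "):
            case.title = line[2:].strip()
        elif section in ("steps", "expected"):
            # Drop list markers such as "1. " or "- ".
            item = re.sub(r"^(\d+\.|[-*])\s+", "", line)
            getattr(case, section).append(item)
        elif ":" in line:
            # Metadata lines (assumed "key: value" layout) above the sections.
            key, value = (part.strip() for part in line.split(":", 1))
            if key.lower() == "tags":
                case.tags = [t.strip() for t in value.split(",") if t.strip()]
            elif key.lower() in ("persona", "priority"):
                setattr(case, key.lower(), value)
    return case
```

Under these assumptions, a file that starts with `# Login works`, then `persona: admin`, followed by a `## Steps` list, yields a `TestCase` whose `steps` holds the list items without their numbering.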

How pass/fail works

QA Agent does not use assertions in the traditional sense. Instead:

The agent executes every step in the ## Steps section
At the end of the run, the agent evaluates the ## Expected section against what it observed
If all expected outcomes are satisfied, the test passes. If any are not, the test fails with a structured failure note explaining which expectation was not met and why

This means the AI model is both the executor and the judge. For most UI flows this works well. For precise numeric or time-based checks, state the exact value in the ## Expected section (for example, "price must be exactly $29.00") so the model has less room for interpretation.
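
The pass/fail rule described above (pass only if every expected outcome is satisfied, otherwise fail with a note) can be sketched like this. The `ExpectationResult` type and the note format are hypothetical; the source does not show how the agent's judgment is actually represented.

```python
from dataclasses import dataclass


@dataclass
class ExpectationResult:
    expectation: str  # text taken from the ## Expected section
    satisfied: bool   # the model's judgment after the run
    reason: str       # the model's explanation for that judgment


def verdict(results: list[ExpectationResult]) -> tuple[bool, str]:
    """Return (passed, failure_note); the note is empty when the test passes."""
    unmet = [r for r in results if not r.satisfied]
    note = "\n".join(f"Not met: {r.expectation} ({r.reason})" for r in unmet)
    return (not unmet, note)
```

Because the judgment comes from the model, a vague expectation gives it more room to decide `satisfied` either way. An exact one such as "price must be exactly $29.00" leaves little to interpret.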

LLM configuration

The model is set in config.yaml:

llm:
  model: google/gemini-3.1-flash-lite-preview
  base_url: https://openrouter.ai/api/v1

agent_factory.py constructs a ChatOpenAI-compatible client from these values. To swap models, change the model value in config.yaml. Any OpenRouter-compatible model ID works.
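
As a rough sketch of why swapping models is only a config change, the function below maps the parsed `llm` block onto keyword arguments for an OpenAI-compatible client. The function name, the `OPENROUTER_API_KEY` variable, and the returned keys are assumptions for illustration. The source does not show how agent_factory.py reads the key or builds the client.

```python
import os
from typing import Mapping, Optional


def llm_client_kwargs(config: dict, env: Optional[Mapping[str, str]] = None) -> dict:
    """Turn the parsed config.yaml contents into client keyword arguments."""
    env = os.environ if env is None else env
    llm = config["llm"]
    return {
        "model": llm["model"],
        "base_url": llm["base_url"],
        # Assumed variable name; the source does not say where the key lives.
        "api_key": env.get("OPENROUTER_API_KEY", ""),
    }


if __name__ == "__main__":
    config = {
        "llm": {
            "model": "google/gemini-3.1-flash-lite-preview",
            "base_url": "https://openrouter.ai/api/v1",
        }
    }
    print(llm_client_kwargs(config)["model"])
```

Nothing in this function names a specific model, so changing the `model` string in the config is enough to point the client at a different one.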

Integration points

The cleanest places to extend QA Agent:

New integrations (Jira, GitHub Issues, Slack). Add a module alongside linear_reporter.py, implement a report(run_result) function, and call it from runner.py after the run completes.
Custom agent tools. Add browser actions to custom_tools.py (e.g., drag-and-drop, file upload helpers).
Custom report formats. Replace or extend reporter.py for JSON output, CSV summaries, or custom HTML templates.
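
A new integration module built around a report(run_result) function might look roughly like the sketch below. The `RunResult` fields and the payload format are hypothetical, and the network call is left out on purpose. A real module would send the payload to the target service's API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    # Hypothetical fields; check the real RunResult before relying on them.
    test_file: str
    passed: bool
    failure_note: str = ""


def report(run_result: RunResult) -> Optional[str]:
    """Build a JSON payload for a failing test; return None for a pass."""
    if run_result.passed:
        return None
    text = f"QA Agent failure: {run_result.test_file}\n{run_result.failure_note}"
    # A real integration would POST this payload to the service's API.
    return json.dumps({"text": text})
```

Returning the payload instead of sending it keeps the function easy to test without network access. The caller in runner.py can then decide whether and where to deliver it.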

What's next

Cable 7: Contributing to QA Agent →

Quick answers

What do I get from this cable?

You get a dated field note that explains how we handle this architecture workflow in real QA Agent projects.

How much time should I budget?

Typical effort is 20 min. The cable is marked intermediate.

How do I install the artifact?

This cable is guidance-only and does not ship an installable artifact.

How fresh is the guidance?

The cable is explicitly last verified on 2026-04-17, and includes source links for traceability.
