How QA Agent Works (Architecture)

Verified 1 month ago · 20 min

Before contributing, understand what runs when you type python cli.py run --suite smoke.

What you'll learn

  • The execution pipeline from CLI command to HTML report
  • What each module in qa_agent/ does
  • How the agent makes pass/fail decisions
  • Which integration points are extension-friendly

Execution pipeline

CLI command
    │
    ▼
cli.py (entry point)
    │  parses --suite / --file / --tag flags
    │
    ▼
runner.py (RunnerService)
    │  resolves test files, manages parallelism (asyncio + semaphore)
    │
    ▼
parser.py (parse_test_case)
    │  reads markdown, extracts title, persona, priority, tags, steps, expected
    │
    ▼
agent_factory.py (build_agent)
    │  creates browser-use Agent with the configured LLM
    │  loads persona session cookies if persona != unauthenticated
    │
    ▼
browser-use Agent (executes steps)
    │  each step → LLM call → browser action
    │  screenshots captured at every step
    │
    ▼
hooks.py (on_step_end, on_done)
    │  captures screenshots, GIFs, video via Playwright
    │
    ▼
reporter.py (generate_report)
    │  builds HTML report from run results
    │
    ▼
linear_reporter.py / supabase_uploader.py (optional)
    │  creates Linear issues for failures, uploads report
    │
    ▼
HTML report on disk
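
The runner stage above bounds parallelism with asyncio and a semaphore. A minimal sketch of that pattern follows. The names `run_one`, `run_suite`, `max_parallel`, and the `RunResult` fields are hypothetical, chosen for illustration; the real RunnerService internals are not shown in this cable.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class RunResult:
    # Hypothetical result shape; the real RunResult fields may differ.
    test_file: str
    passed: bool


async def run_one(test_file: str) -> RunResult:
    # Stand-in for the real work: parse, build agent, execute steps, judge.
    await asyncio.sleep(0)
    return RunResult(test_file=test_file, passed=True)


async def run_suite(test_files: list[str], max_parallel: int = 3) -> list[RunResult]:
    # The semaphore caps how many tests (and browsers) run at the same time.
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(test_file: str) -> RunResult:
        async with semaphore:
            return await run_one(test_file)

    # gather returns results in the same order as the input files.
    return await asyncio.gather(*(guarded(f) for f in test_files))


if __name__ == "__main__":
    results = asyncio.run(run_suite(["login.md", "checkout.md", "search.md"], max_parallel=2))
    for r in results:
        print(r.test_file, "PASS" if r.passed else "FAIL")
```

Capping concurrency this way keeps the number of simultaneous browser sessions and LLM calls predictable, which matters more than raw speed for a suite like this.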

Module responsibilities

File                    Responsibility
cli.py                  Argument parsing, command dispatch (run, auth, report, release)
runner.py               Parallel test execution, result aggregation
parser.py               Markdown test case parsing → structured TestCase dataclass
agent_factory.py        Constructs browser-use Agent with LLM config and persona session
hooks.py                Step-level callbacks: screenshots, GIF/video recording
reporter.py             HTML report generation from RunResult objects
auth.py                 Interactive persona session capture and persistence
linear_reporter.py      Linear issue creation for failing tests
sentry_reporter.py      Sentry event correlation (links test failures to error events)
supabase_uploader.py    Report upload to Supabase Storage
release_runner.py       CSV-driven release pipeline: reads a test manifest, runs specified suites, generates a consolidated report
csv_parser.py           Parses release CSV format into runnable test batches
custom_tools.py         Custom browser-use tools (extended actions available to the agent)
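
To make the parser.py row concrete, here is a rough sketch of turning a markdown test case into a dataclass. Only the field names (title, persona, priority, tags, steps, expected) and the `## Steps` / `## Expected` headings come from this cable. The `key: value` metadata lines and the parsing rules below are assumptions, not the actual parse_test_case implementation.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TestCase:
    title: str
    persona: str = "unauthenticated"
    priority: str = ""
    tags: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    expected: list[str] = field(default_factory=list)


def parse_test_case(markdown: str) -> TestCase:
    case = TestCase(title="")
    section = None  # name of the current "## " section, lowercased
    for raw in markdown.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# ") and not case.title:
            case.title = line[2:].strip()
        elif line.startswith("## "):
            section = line[3:].strip().lower()
        elif section in ("steps", "expected"):
            # Drop list markers such as "1.", "2)", "-" or "*".
            item = re.sub(r"^(\d+[.)]|[-*])\s+", "", line)
            getattr(case, section).append(item)
        elif ":" in line:
            # Assumed metadata format: "persona: admin", "tags: smoke, auth".
            key, value = (part.strip() for part in line.split(":", 1))
            key = key.lower()
            if key == "tags":
                case.tags = [t.strip() for t in value.split(",") if t.strip()]
            elif key in ("persona", "priority"):
                setattr(case, key, value)
    return case
```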

How pass/fail works

QA Agent does not use assertions in the traditional sense. Instead:

  1. The agent executes every step in the ## Steps section
  2. At the end of the run, the agent evaluates the ## Expected section against what it observed
  3. If all expected outcomes are satisfied, the test passes. If any are not, the test fails with a structured failure note explaining which expectation was not met and why

This means the AI model is both the executor and the judge. For most UI flows this works well. For precise numeric or time-based assertions ("price must be exactly $29.00"), be explicit in the Expected section to reduce ambiguity.
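
The all-or-nothing rule above can be sketched as a small data structure. `ExpectationResult` and `judge` are hypothetical names used for illustration. In QA Agent the per-expectation judgment comes from the model, and the real failure note format is not shown here.

```python
from dataclasses import dataclass


@dataclass
class ExpectationResult:
    expectation: str   # one line from the Expected section
    satisfied: bool    # the model's judgment for that line
    reason: str = ""   # the model's explanation when not satisfied


def judge(results: list[ExpectationResult]) -> tuple[bool, list[str]]:
    # A test passes only if every expectation is satisfied.
    # Note that an empty list passes vacuously.
    failures = [
        f"{r.expectation}: {r.reason or 'not satisfied'}"
        for r in results
        if not r.satisfied
    ]
    return (not failures, failures)


if __name__ == "__main__":
    passed, notes = judge([
        ExpectationResult("Cart shows 1 item", True),
        ExpectationResult("Total is exactly $29.00", False, "page showed $29.99"),
    ])
    print("PASS" if passed else "FAIL", notes)
```

An expectation phrased as "Total is exactly $29.00" gives the judge something checkable. "Price looks right" leaves the verdict to interpretation.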

LLM configuration

The model is set in config.yaml:

llm:
  model: google/gemini-3.1-flash-lite-preview
  base_url: https://openrouter.ai/api/v1

agent_factory.py constructs the ChatOpenAI-compatible client from these values. To swap models, change model in config.yaml. Any OpenRouter-compatible model ID works.
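
As a rough illustration of how those two config values might reach an OpenAI-compatible client, the sketch below maps a parsed config to constructor keyword arguments. It assumes the YAML has already been loaded into a dict, that the API key comes from an environment variable named `OPENROUTER_API_KEY`, and that the client accepts `model`, `base_url`, and `api_key`. The function name `llm_client_kwargs` is hypothetical, and the actual code in agent_factory.py may differ.

```python
import os


def llm_client_kwargs(config: dict, api_key_env: str = "OPENROUTER_API_KEY") -> dict:
    # `config` is the parsed config.yaml, e.g. {"llm": {"model": ..., "base_url": ...}}.
    llm = config["llm"]
    return {
        "model": llm["model"],
        "base_url": llm["base_url"],
        # Assumption: the key is read from the environment, not from config.yaml.
        "api_key": os.environ.get(api_key_env, ""),
    }


if __name__ == "__main__":
    config = {
        "llm": {
            "model": "google/gemini-3.1-flash-lite-preview",
            "base_url": "https://openrouter.ai/api/v1",
        }
    }
    print(llm_client_kwargs(config)["model"])
```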

Integration points

The cleanest places to extend QA Agent:

  • New integrations (Jira, GitHub Issues, Slack). Add a module alongside linear_reporter.py, implement a report(run_result) function, and call it from runner.py after the run completes
  • Custom agent tools. Add browser actions to custom_tools.py (e.g., drag-and-drop, file upload helpers)
  • Custom report formats. Replace or extend reporter.py for JSON output, CSV summaries, or custom HTML templates
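
A new integration module could look roughly like the sketch below, which builds a Slack-style webhook payload for a failing test. The `RunResult` fields here are hypothetical stand-ins, since the real object's attributes are not listed in this cable, and actually sending the payload (an HTTP POST to a webhook URL) is left out.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    # Hypothetical fields for illustration only.
    title: str
    passed: bool
    failure_note: str = ""


def report(run_result: RunResult) -> Optional[str]:
    """Return a JSON payload for a failing test, or None if there is nothing to report."""
    if run_result.passed:
        return None
    text = f"QA Agent failure: {run_result.title}\n{run_result.failure_note}"
    # Slack incoming webhooks accept a JSON body with a "text" field.
    return json.dumps({"text": text})


if __name__ == "__main__":
    print(report(RunResult("Checkout total", False, "Expected $29.00, page showed $29.99")))
```

Keeping payload construction separate from delivery makes the module easy to test without network access.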

What's next

Cable 7: Contributing to QA Agent →

Quick answers

What do I get from this cable?

You get a dated field note that explains how we handle this architecture workflow in real QA Agent projects.

How much time should I budget?

Typical effort is 20 min. The cable is marked intermediate.

How do I install the artifact?

This cable is guidance-only and does not ship an installable artifact.

How fresh is the guidance?

The cable was last verified on 2026-04-17 and includes source links for traceability.
