Running Tests & Reading Reports

Read the field note below to see how we apply this pattern in real QA Agent projects.

FRE|Nxt Labs

verified 2 months ago · 20 min

A green result is only useful if you trust it. A red result is only useful if you can debug it fast.

What you'll learn

The full set of run command options
What the HTML report contains and how to read it
How to share reports with your team

The `run` command

Run a full suite

python cli.py run --suite smoke

Runs every .md file inside tests/smoke/. Test cases run in parallel, with up to 3 concurrent browser sessions by default (configurable under parallel in config.yaml).
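The Python sketch below makes the concurrency cap concrete. It discovers a suite's .md files and runs them through a pool of at most 3 sessions. It assumes test files sit directly in the suite folder (no subfolders). The names discover_suite, simulate_session and run_suite are hypothetical, and each "session" is simulated with a short sleep rather than a real browser.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable

# Hypothetical default mirroring the documented 3 concurrent browser sessions.
DEFAULT_PARALLEL = 3


def discover_suite(suite_dir: Path) -> list[Path]:
    """Return every .md test case directly inside a suite folder, in a stable order."""
    return sorted(suite_dir.glob("*.md"))


async def simulate_session(path: Path) -> str:
    """Stand-in for one browser session; it only sleeps briefly."""
    await asyncio.sleep(0.01)
    return f"{path.name}: done"


async def run_suite(
    suite_dir: Path,
    parallel: int = DEFAULT_PARALLEL,
    worker: Callable[[Path], Awaitable[str]] = simulate_session,
) -> list[str]:
    # The semaphore caps how many test cases hold a session at the same time.
    gate = asyncio.Semaphore(parallel)

    async def guarded(path: Path) -> str:
        async with gate:
            return await worker(path)

    # gather() returns results in discovery order, regardless of finish order.
    return list(await asyncio.gather(*(guarded(p) for p in discover_suite(suite_dir))))


if __name__ == "__main__":
    print(asyncio.run(run_suite(Path("tests/smoke"))))
```

Raising the cap shortens large suites but means more browser processes running at once. The semaphore is simply the smallest way to express that trade-off.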

Run a single file

python cli.py run --file tests/smoke/homepage-loads.md

Useful when writing or debugging a new test case.

Run by tag

python cli.py run --tag critical

Runs every test case, across all suites, whose metadata includes the matching tag.
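You can picture the tag filter as a scan over every suite folder. Purely for illustration, the sketch below assumes each test case starts with a metadata block delimited by --- lines that contains a line such as tags: [critical, checkout]. QA Agent's real metadata format may differ, and read_tags and select_by_tag are hypothetical helpers.

```python
from pathlib import Path


def read_tags(md_text: str) -> set[str]:
    """Parse a hypothetical 'tags:' line from a leading metadata block fenced by --- lines."""
    lines = md_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the metadata block; no tags were declared
        key, sep, value = line.partition(":")
        if sep and key.strip() == "tags":
            cleaned = value.strip().strip("[]")
            return {t.strip() for t in cleaned.split(",") if t.strip()}
    return set()


def select_by_tag(tests_root: Path, tag: str) -> list[Path]:
    """Collect matching test cases from every suite folder under tests_root."""
    return sorted(
        p for p in tests_root.glob("*/*.md")
        if tag in read_tags(p.read_text(encoding="utf-8"))
    )
```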

Run in headed mode

python cli.py run --suite smoke --headed

Opens a visible browser window. Use this when developing test cases or investigating failures.

View the latest report

python cli.py report --latest

Opens the most recently generated HTML report in your default browser.
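You may want the same behaviour from a script, for example to open the newest report after a custom pipeline step. The minimal sketch below assumes each run writes its own folder under reports/ containing an index.html. It picks the most recently modified folder rather than sorting by name, because names like 2026-04-17-checkout-01 order by date but not by time within a day. latest_report and open_latest are hypothetical helpers, not part of cli.py.

```python
import webbrowser
from pathlib import Path
from typing import Optional


def latest_report(reports_dir: Path) -> Optional[Path]:
    """Return index.html of the most recently modified report folder, or None."""
    if not reports_dir.is_dir():
        return None
    candidates = [p for p in reports_dir.iterdir() if (p / "index.html").is_file()]
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: p.stat().st_mtime)
    return newest / "index.html"


def open_latest(reports_dir: Path = Path("reports")) -> None:
    report = latest_report(reports_dir)
    if report is None:
        raise FileNotFoundError(f"no reports found under {reports_dir}")
    # as_uri() needs an absolute path, hence resolve().
    webbrowser.open(report.resolve().as_uri())
```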

Reading the HTML report

The report has three sections:

Summary bar

Shows total tests, pass count, fail count, skip count, total duration, and the timestamp. This is what you screenshot for a release sign-off.
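The summary numbers are easy to derive from per-test results, with one subtlety. Because tests run in parallel, the total duration is best read as wall-clock time from start to finish, not as the sum of individual test durations. This is a minimal sketch: CaseResult and summarize are hypothetical, and whether QA Agent's summary bar reports wall-clock time is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CaseResult:
    name: str
    status: str  # "PASS", "FAIL", or "SKIP"
    duration_s: float


def summarize(results: list[CaseResult], started_at: datetime, finished_at: datetime) -> dict:
    """Aggregate per-test results into summary-bar numbers."""
    counts = {s: sum(r.status == s for r in results) for s in ("PASS", "FAIL", "SKIP")}
    return {
        "total": len(results),
        "passed": counts["PASS"],
        "failed": counts["FAIL"],
        "skipped": counts["SKIP"],
        # Wall-clock time: with parallel sessions this can be less than the per-test sum.
        "duration_s": (finished_at - started_at).total_seconds(),
        "timestamp": started_at.isoformat(timespec="seconds"),
    }
```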

Per-test timeline

Each test gets its own card with:

Status badge. PASS (green), FAIL (red), or SKIP (grey)
Duration. How long the test took
Step log. Each step the agent executed, in order, with its interpretation
Screenshots. One per step, shown inline
Failure note (on FAIL). Structured text: which step failed, the agent's reasoning, and the screenshot path

A typical failure note:

Step failed: "Click 'Proceed to checkout'"
Reason: Element matched selector but returned pointer-events: none
Screenshot: reports/2026-04-17-checkout-01/step-4.png
Reproduction: Load /cart with free-user persona, add item, proceed to checkout
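Because the failure note uses one "Key: value" line per field, tooling can produce or parse it mechanically, for example when forwarding it to a tracker. The round-trip sketch below assumes that exact layout. FailureNote is a hypothetical class, not QA Agent's internal type.

```python
from dataclasses import dataclass


@dataclass
class FailureNote:
    """Hypothetical structured form of a QA Agent failure note."""
    step: str
    reason: str
    screenshot: str
    reproduction: str

    def render(self) -> str:
        return "\n".join([
            f'Step failed: "{self.step}"',
            f"Reason: {self.reason}",
            f"Screenshot: {self.screenshot}",
            f"Reproduction: {self.reproduction}",
        ])

    @classmethod
    def parse(cls, text: str) -> "FailureNote":
        fields = {}
        for line in text.splitlines():
            # Split on the first ": " only, so values like "pointer-events: none" survive.
            key, sep, value = line.partition(": ")
            if sep:
                fields[key.strip()] = value.strip()
        return cls(
            step=fields["Step failed"].strip('"'),
            reason=fields["Reason"],
            screenshot=fields["Screenshot"],
            reproduction=fields["Reproduction"],
        )
```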

GIF / video evidence

At the bottom of each test card is a GIF of the full browser session. Share it directly in a Slack thread or Linear issue: nobody has to reproduce the failure locally to understand what happened.

Sharing reports

Reports are static HTML files in the reports/ directory. Options for sharing:

For team review: Commit the report directory to a qa-reports branch and share the GitHub URL, or upload the folder to any static host.

For release pipelines: Use the built-in Supabase uploader (python cli.py report --upload) if you have SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY set in .env. This uploads the report and returns a public URL.
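In a pipeline, it helps to fail fast when either variable is missing before you call report --upload. The minimal preflight sketch below uses deliberately simple .env parsing (no multiline values or export prefixes), which is not how QA Agent itself loads .env. It also accepts the variables from the process environment, an assumption about how your pipeline injects secrets. read_dotenv and missing_upload_keys are hypothetical helpers.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

REQUIRED = ("SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY")


def read_dotenv(path: Path) -> dict[str, str]:
    """Very small KEY=VALUE reader: skips blanks and comments, strips surrounding quotes."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_upload_keys(
    env_file: Path = Path(".env"),
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """List required keys that are empty or absent in both .env and the environment."""
    environ = os.environ if environ is None else environ
    file_values = read_dotenv(env_file)
    return [k for k in REQUIRED if not (file_values.get(k) or environ.get(k))]


if __name__ == "__main__":
    missing = missing_upload_keys()
    if missing:
        raise SystemExit(f"Set {', '.join(missing)} in .env before running report --upload")
```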

For Linear integration: Set linear.create_issues: true in config.yaml. QA Agent will automatically open a Linear issue for each failing test, attaching the failure note and GIF.
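Conceptually, the Linear integration maps each FAIL to one issue that carries its failure note and GIF. The sketch below only builds the payloads and sends nothing. The CaseOutcome type and the title, description and attachments keys are hypothetical and reflect neither Linear's API nor QA Agent's internals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaseOutcome:
    name: str
    status: str  # "PASS", "FAIL", or "SKIP"
    failure_note: Optional[str] = None
    gif_path: Optional[str] = None


def issue_payloads(outcomes: list[CaseOutcome]) -> list[dict]:
    """Build one hypothetical issue payload per failing test; PASS and SKIP are ignored."""
    return [
        {
            "title": f"QA failure: {o.name}",
            "description": o.failure_note or "(no failure note captured)",
            # Attach the session GIF only when one was recorded.
            "attachments": [o.gif_path] if o.gif_path else [],
        }
        for o in outcomes
        if o.status == "FAIL"
    ]
```

Keeping payload construction separate from any API call makes it easy to dry-run the mapping in CI before turning on create_issues.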

What's next

Cable 5: Organizing Suites & Personas →

Quick answers

What do I get from this cable?

You get a dated field note explaining how we run tests and read reports in real QA Agent projects.

How much time should I budget?

Typical effort is 20 min. The cable is marked intermediate.

How do I install the artifact?

This cable is guidance-only and does not ship an installable artifact.

How fresh is the guidance?

The cable was last verified on 2026-04-17 and includes source links for traceability.

