Writing Your First Test Case
Read the field note below to see how we apply this pattern in real QA Agent projects.
A good test case reads like a conversation between a QA engineer and a developer. Not like code.
What you'll learn
- The anatomy of a QA Agent test case file
- How to write steps that survive UI changes
- How to run a single test and read its output
The test case format
Every test case is a markdown file inside a suite folder. Create your first one:
mkdir -p tests/my-suite
touch tests/my-suite/homepage-loads.md
A test case has three required sections:
# Homepage loads correctly
- **persona**: unauthenticated
- **priority**: high
- **tags**: smoke, homepage
## Steps
1. Navigate to /
2. Verify the main heading is visible
3. Verify no console errors appear
## Expected
- Page renders without a blank screen
- Primary CTA button is visible and clickable
Title
The first line (# Title) is both the test name shown in reports and what the AI uses to understand the test's intent. Be specific: "Homepage loads correctly" is better than "Test 1".
Metadata
- persona. The user context for this test. Options: unauthenticated, free-user, pro-user. QA Agent loads the corresponding credentials from personas/. Start with unauthenticated for public flows.
- priority. One of critical, high, medium, or low. Used to filter runs and prioritise triage.
- tags. Comma-separated. Used to run targeted subsets (python cli.py run --tag smoke).
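If you want to catch metadata typos before a run, a small check along these lines can help. This is a minimal sketch, not part of QA Agent: the function name validate_metadata is hypothetical, and the allowed values are simply the ones listed above.

```python
# Allowed values copied from the metadata description above.
ALLOWED_PERSONAS = {"unauthenticated", "free-user", "pro-user"}
ALLOWED_PRIORITIES = {"critical", "high", "medium", "low"}


def validate_metadata(meta: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the metadata looks fine.

    Hypothetical helper for illustration, not a QA Agent API.
    """
    problems = []

    persona = meta.get("persona")
    if persona not in ALLOWED_PERSONAS:
        problems.append(f"unknown persona: {persona!r}")

    priority = meta.get("priority")
    if priority not in ALLOWED_PRIORITIES:
        problems.append(f"unknown priority: {priority!r}")

    # Tags are comma-separated; flag empty entries such as "smoke,,homepage".
    tags_raw = meta.get("tags")
    if tags_raw is not None:
        tags = [tag.strip() for tag in tags_raw.split(",")]
        if any(not tag for tag in tags):
            problems.append("tags contains an empty entry")

    return problems
```

Calling validate_metadata({"persona": "unauthenticated", "priority": "high", "tags": "smoke, homepage"}) returns an empty list for the example above.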
Steps
Write steps in plain imperative language. The AI model interprets them:
- Good: Click "Sign in". References visible UI text.
- Good: Verify the dashboard heading shows "Welcome back".
- Avoid: Click button[data-testid="login-btn"]. Selectors break on refactors and the AI doesn't need them.
- Avoid: Wait 2 seconds. The agent handles timing automatically.
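The two "Avoid" patterns above are easy to spot mechanically. Here is a rough sketch of a step linter, assuming you keep steps as plain strings. The regexes are heuristics and lint_step is a hypothetical name, not something QA Agent ships.

```python
import re

# Heuristic: attribute selectors such as button[data-testid="login-btn"].
SELECTOR_PATTERN = re.compile(r"\[[\w-]+=")

# Heuristic: fixed waits such as "Wait 2 seconds" or "wait 500 ms".
FIXED_WAIT_PATTERN = re.compile(
    r"\bwait\s+\d+(?:\.\d+)?\s*(?:ms|s|sec|seconds?|milliseconds?)\b",
    re.IGNORECASE,
)


def lint_step(step: str) -> list[str]:
    """Return warnings for a single prose step (illustrative helper only)."""
    warnings = []
    if SELECTOR_PATTERN.search(step):
        warnings.append("references a DOM selector; describe the visible text instead")
    if FIXED_WAIT_PATTERN.search(step):
        warnings.append("uses a fixed wait; the agent handles timing")
    return warnings
```

A step like Click "Sign in" produces no warnings, while the selector and fixed-wait examples each produce one.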
Expected
List what success looks like. These become the pass/fail criteria the agent checks at the end of the run.
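To make the anatomy concrete, the sketch below splits a test case file into its title, metadata, steps, and expected outcomes. It assumes the layout shown in the example above and is only an illustration; QA Agent's own parsing may differ, and parse_test_case is a hypothetical name.

```python
import re

SAMPLE = """# Homepage loads correctly
- **persona**: unauthenticated
- **priority**: high
- **tags**: smoke, homepage
## Steps
1. Navigate to /
2. Verify the main heading is visible
3. Verify no console errors appear
## Expected
- Page renders without a blank screen
- Primary CTA button is visible and clickable
"""


def parse_test_case(text: str) -> dict:
    """Split a test case into its parts (illustrative, not QA Agent's parser)."""
    case = {"title": None, "metadata": {}, "steps": [], "expected": []}
    section = None  # None until the first "## " heading is seen
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            section = line[3:].strip().lower()
        elif line.startswith("# "):
            case["title"] = line[2:].strip()
        elif section is None:
            # Metadata bullets look like: - **persona**: unauthenticated
            match = re.match(r"-\s*\*\*(\w+)\*\*:\s*(.+)", line)
            if match:
                case["metadata"][match.group(1)] = match.group(2).strip()
        elif section == "steps":
            match = re.match(r"\d+\.\s+(.+)", line)
            if match:
                case["steps"].append(match.group(1))
        elif section == "expected" and line.startswith("- "):
            case["expected"].append(line[2:].strip())
    return case


case = parse_test_case(SAMPLE)
```

After this runs, case["steps"] holds the three numbered steps and case["expected"] holds the two pass/fail criteria.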
Writing steps that survive refactors
The biggest advantage of prose steps is resilience. When a button's label changes from "Get started" to "Start free trial", update one line. When a nav item moves, update the step description. You don't hunt for selectors.
Practical rules:
- Reference what the user sees, not what's in the DOM
- Describe intent: "Complete the sign-in flow" rather than "Click the submit button on line 3 of the form"
- Keep each step to one action
Run your test
python cli.py run --file tests/my-suite/homepage-loads.md --headed
--headed opens a visible browser window so you can watch what the agent does on the first run. Once you're confident the steps are interpreted correctly, drop --headed for faster headless runs.
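If you launch runs from a script, you can toggle --headed with a flag instead of editing the command by hand. A minimal sketch, assuming cli.py sits in the current working directory; build_run_command is a hypothetical helper, and only the run, --file, and --headed arguments shown above are used.

```python
import sys


def build_run_command(test_file: str, headed: bool = False) -> list[str]:
    """Build the argument list for a single-file run (illustrative helper)."""
    cmd = [sys.executable, "cli.py", "run", "--file", test_file]
    if headed:
        # Visible browser window; drop it for faster headless runs.
        cmd.append("--headed")
    return cmd


cmd = build_run_command("tests/my-suite/homepage-loads.md", headed=True)
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```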
Output:
Running: tests/my-suite/homepage-loads.md
[homepage-loads] PASS (15.2s)
Report saved: reports/2026-04-17-my-suite-01/index.html
Open the HTML report. You'll see a timeline of browser actions with screenshots at each step.
Common first-run issues
The agent gets stuck on a cookie banner. Add a step: "Dismiss any cookie consent banner if present". The AI handles conditional UI well.
A step fails because an element loads slowly. Add "Wait for the page to fully load" as the first step after navigation. The agent will wait for the visible state before proceeding.
The test passes but the report shows a step differently than expected. Read the agent's reasoning in the report. Rewrite the step to be more explicit if the interpretation was wrong.
Quick answers
What do I get from this cable?
You get a dated field note that explains how we handle this test-authoring workflow in real QA Agent projects.
How much time should I budget?
Typical effort is 20 min. The cable is marked beginner.
How do I install the artifact?
This cable is guidance-only and does not ship an installable artifact.
How fresh is the guidance?
The cable is explicitly last verified on 2026-04-17, and includes source links for traceability.