v0.4.0 specification

Agent QC v0.4.0 is a portable draft standard for evidence-driven quality control of Agent projects.

An Agent project can be a runtime CLI, SDK, tool server, MCP/ACP gateway, multi-channel bot, GUI/TUI/desktop client, skills or plugin ecosystem, background scheduler, distribution package, or evaluation suite. Agent QC does not assume one product shape. It starts by classifying the project profile and then selects gates that match its risk.

Scope

Agent QC standardizes:

  1. Project profiles for Agent systems.
  2. Test plan, case, gate, run, evidence, verdict, and report objects.
  3. Gate taxonomy from static checks to live provider and release smoke.
  4. Evidence-backed pass/fail semantics.
  5. qcloop-compatible batch QC for repeated independent cases.
  6. Case-study mapping for representative runtime, TUI, gateway, scheduler, UI, skills, release, and eval projects.

Agent QC does not standardize any single programming language, CI vendor, test framework, browser driver, model protocol, storage backend, or UI skin.

Document set

The latest standard is split by use:

| Page | Purpose |
| --- | --- |
| Quickstart | fastest path to a QC plan |
| Best practices | authoring rules and anti-patterns |
| Project classification | profile taxonomy and mixed-profile rules |
| Gate matrix | profile/surface/risk to gate mapping |
| Interaction surface testing | CLI/TUI/WebUI/desktop/browser/channel/eval UI evidence |
| Evidence contract | portable evidence, verdict, waiver fields |
| Performance and reliability metrics | timing, flake, cleanup, scheduler, release metrics |
| Flow and taxonomy | complete lifecycle and taxonomy reference |
| Star project testing systems | representative Agent project testing-system case studies |

Project profiles

A qc_plan.project_profiles array declares which project shapes apply.

| Profile | Typical risks | Example gates |
| --- | --- | --- |
| agent-runtime-cli | tool execution, sandboxing, permission, streams, resume, subprocess cleanup | unit, protocol, fake model server, CLI e2e, sandbox tests |
| agent-sdk-api | public API compatibility, generated contracts, fake server behavior, async cancellation | signature tests, generated contract diff, fake server integration |
| agent-tool-mcp-gateway | tool declaration drift, stdio/http transport, recovery, resource access, audit refs | protocol conformance, mock server, transport recovery, contract tests |
| multi-channel-agent-gateway | channel adapters, auth, secrets, webhook verification, provider drift, media routing | channel contract tests, secret isolation, live opt-in, docker smoke |
| agent-ui-tui-desktop | rendering, terminal/browser state, user controls, screenshots, accessibility, bridge readiness | UI unit, snapshot, Playwright, terminal fixtures, GUI smoke |
| agent-skills-plugins | manifest shape, loader, package boundary, trust, marketplace or registry drift | schema, discovery, package export, fixture install, security scan |
| background-agent-scheduler | cron, queues, leases, retries, concurrency, idempotency, stuck-loop recovery | deterministic scheduler tests, race tests, stress tests, checkpoint/reclaim |
| agent-distribution-release | install, package contents, Docker, cross-platform, lockfiles, supply-chain | install smoke, package dry run, Docker smoke, OS matrix, lock checks |
| agent-evals-quality | model behavior regressions, prompt drift, rubric quality, answer grounding | eval suite, baseline comparison, LLM/human judge, qcloop batch |

A project MAY combine profiles. For example, OpenClaw combines channel gateway, tool gateway, distribution, live provider, and plugin profiles.
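
As an illustration of a mixed-profile declaration, the sketch below checks a qc_plan.project_profiles array against the profile names in the table above. The helper name unknown_profiles and the example plan are hypothetical; only the project_profiles field and the profile names come from this specification.

```python
# Profile names copied from the project profile table.
KNOWN_PROFILES = {
    "agent-runtime-cli",
    "agent-sdk-api",
    "agent-tool-mcp-gateway",
    "multi-channel-agent-gateway",
    "agent-ui-tui-desktop",
    "agent-skills-plugins",
    "background-agent-scheduler",
    "agent-distribution-release",
    "agent-evals-quality",
}


def unknown_profiles(qc_plan: dict) -> list:
    """Return declared profiles that are not in the profile table, in order."""
    declared = qc_plan.get("project_profiles", [])
    return [name for name in declared if name not in KNOWN_PROFILES]


# Hypothetical plan for a project that mixes gateway, plugin, and release profiles.
example_plan = {
    "project_profiles": [
        "multi-channel-agent-gateway",
        "agent-tool-mcp-gateway",
        "agent-skills-plugins",
        "agent-distribution-release",
    ],
}

print(unknown_profiles(example_plan))  # prints []
```

A check like this could run before gate selection, so that a misspelled profile does not silently drop the gates it was meant to select.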

Interaction surfaces

A project profile says what the project owns. An interaction surface says where users or operators observe the Agent. qc_case.surface is optional in the JSON schema but SHOULD be present for user-visible gates.

| Surface | Applies to | Extra evidence required |
| --- | --- | --- |
| cli-stream | command output, JSONL/NDJSON, stdout/stderr | exit status, stdout/stderr transcript, structured event sample |
| tui | terminal UI, Ink, ratatui, curses | terminal snapshot, viewport size, key sequence, runtime transcript |
| webui | browser dashboard, extension UI, admin/QA console | screenshot/trace, console log, route state, browser-only assertion |
| desktop-gui | Tauri, Electron, native shell | shell start evidence, bridge health, workspace/session readiness, OS note |
| browser-automation | CDP, Playwright, browser-use, remote browser providers | screenshot, DOM/a11y snapshot, console/network log, cleanup evidence |
| channel-ui | mobile, QR, chat apps, webhook surfaces | channel transcript, media fixture, auth/webhook replay, device/emulator log |
| eval-ui | QA dashboards and semantic evaluation reports | rubric, judge output, baseline delta, reviewer note |

A ui-interaction gate SHOULD name one of these surfaces. A pass without surface-specific evidence is incomplete. Surface proof SHOULD connect entrypoint, user action, visible frame, runtime backing, and cleanup evidence.
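
The two surface rules above can be sketched as a small lint over a case record. This is a minimal sketch, not a normative validator: the surface names come from the table, but the field names required_gates, status, and evidence, and the function name surface_warnings, are assumptions made for illustration.

```python
# Surface names copied from the interaction surface table.
SURFACES = {
    "cli-stream",
    "tui",
    "webui",
    "desktop-gui",
    "browser-automation",
    "channel-ui",
    "eval-ui",
}


def surface_warnings(case: dict) -> list:
    """Return warnings for a qc_case dict (hypothetical field layout)."""
    warnings = []
    if "ui-interaction" in case.get("required_gates", []):
        surface = case.get("surface")
        if surface is None:
            warnings.append("ui-interaction gate does not name a surface")
        elif surface not in SURFACES:
            warnings.append("unknown surface: " + surface)
        # A pass without surface-specific evidence is incomplete.
        if case.get("status") == "passed" and not case.get("evidence"):
            warnings.append("passed without surface-specific evidence")
    return warnings


# Hypothetical TUI case with a terminal snapshot as evidence.
tui_case = {
    "required_gates": ["ui-interaction"],
    "surface": "tui",
    "status": "passed",
    "evidence": ["artifacts/tui-snapshot.txt"],
}
print(surface_warnings(tui_case))  # prints []
```

The function returns warnings rather than raising, because naming a surface is a SHOULD in this specification, not a MUST.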

Core objects

| Object | Purpose |
| --- | --- |
| qc_plan | A test plan for one change, release, investigation, or regression sweep. |
| qc_case | One behavior-level item with steps, expected result, required gates, and evidence. |
| qc_gate | A validation boundary such as static, unit, contract, integration, e2e, live, stress, release, or review. |
| qc_run | One execution attempt with command, executor, environment, output refs, duration, and result. |
| qc_evidence | A reference to logs, reports, traces, screenshots, fixtures, qcloop attempts, CI runs, or review notes. |
| qc_verdict | A judgment over evidence: passed, failed, blocked, exhausted, waived, or needs-review. |
| qc_report | The aggregate result, remaining risk, waivers, and next action. |
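
To show how these objects might nest, here is a minimal sketch using Python dataclasses. The class names, attribute names, and the example values are assumptions for illustration; the normative field list lives in the schema, not in this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QcEvidence:
    kind: str  # e.g. "log", "trace", "screenshot"
    ref: str   # path or URL of the artifact


@dataclass
class QcRun:
    command: str
    executor: str
    duration_s: float
    result: str
    output_refs: List[QcEvidence] = field(default_factory=list)


@dataclass
class QcVerdict:
    judgment: str  # passed, failed, blocked, exhausted, waived, needs-review
    evidence: List[QcEvidence] = field(default_factory=list)


@dataclass
class QcCase:
    case_id: str
    steps: List[str]
    expected: str
    required_gates: List[str]
    runs: List[QcRun] = field(default_factory=list)
    verdict: Optional[QcVerdict] = None


@dataclass
class QcPlan:
    purpose: str
    project_profiles: List[str]
    cases: List[QcCase] = field(default_factory=list)


# Hypothetical plan with one case, one run, and a verdict backed by the run log.
log = QcEvidence(kind="log", ref="artifacts/unit.log")
run = QcRun(command="pytest -q", executor="ci", duration_s=12.5,
            result="passed", output_refs=[log])
case = QcCase(case_id="resume-after-crash", steps=["start", "kill", "resume"],
              expected="session resumes", required_gates=["unit"],
              runs=[run], verdict=QcVerdict(judgment="passed", evidence=[log]))
plan = QcPlan(purpose="regression sweep",
              project_profiles=["agent-runtime-cli"], cases=[case])
```

In this sketch the verdict refers to the same evidence object that the run produced, which keeps the judgment traceable to a concrete artifact.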

Gate families

| Family | Purpose | Evidence examples |
| --- | --- | --- |
| static | format, lint, type, dependency and policy hygiene | command logs, SARIF, lockfile check output |
| unit | deterministic local behavior | test report, coverage, fixture output |
| property-fuzz | invariants and generated input | seed, corpus, failing case artifact |
| contract-protocol | schemas, APIs, generated clients, command/tool surfaces | contract report, schema diff, mock transcript |
| fake-integration | integration against fake servers or local adapters | fake server log, request/response transcript |
| runtime-e2e | real CLI/runtime/task flow without external provider risk | CLI transcript, process cleanup evidence, state snapshot |
| ui-interaction | GUI/TUI/browser/terminal behavior | screenshot, trace, video, accessibility report |
| live-provider | opt-in real provider or network path | redacted transcript, credentials-scope note, cost/budget |
| stress-concurrency | races, leases, retries, long-running loops | stress report, worker timeline, seed, benchmark |
| distribution-release | install, package, Docker, cross-platform release readiness | tarball manifest, Docker smoke, OS matrix, release check |
| semantic-eval | model output quality, grounding, policy, user intent | eval result, rubric, judge output, baseline delta |
| review | human or LLM review | reviewer decision, rubric, evidence refs |

Status values

qc_case.status, qc_gate.status, and qc_report.status use:

  • planned
  • running
  • passed
  • failed
  • blocked
  • exhausted
  • waived
  • skipped
  • needs-review

A waived gate MUST include waiver.reason, waiver.approver, and waiver.expires when the project has a waiver process.
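
The waiver rule can be sketched as a check over a gate record. The waiver.reason, waiver.approver, and waiver.expires fields come from this specification; the dict layout, the has_waiver_process flag, the function names, and the assumption that expires is an ISO 8601 date string are all hypothetical.

```python
from datetime import date

REQUIRED_WAIVER_FIELDS = ("reason", "approver", "expires")


def missing_waiver_fields(gate: dict, has_waiver_process: bool = True) -> list:
    """Return required waiver fields that a waived gate is missing."""
    if not has_waiver_process or gate.get("status") != "waived":
        return []
    waiver = gate.get("waiver") or {}
    return [name for name in REQUIRED_WAIVER_FIELDS if not waiver.get(name)]


def waiver_expired(gate: dict, today: date) -> bool:
    """True when waiver.expires (assumed ISO date string) is before today."""
    expires = (gate.get("waiver") or {}).get("expires")
    return expires is not None and date.fromisoformat(expires) < today


# Hypothetical waived live-provider gate.
waived_gate = {
    "status": "waived",
    "waiver": {
        "reason": "provider sandbox unavailable",
        "approver": "qa-lead",
        "expires": "2030-01-01",
    },
}
print(missing_waiver_fields(waived_gate))  # prints []
```

The expiry helper is an extra convenience beyond the rule itself: a waiver that carries an expires value is only useful if something eventually compares it to the current date.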

Evidence rules

A passed verdict MUST include evidence. A failed verdict MUST include the smallest actionable failure. A blocked verdict MUST identify the missing environment fact. An exhausted verdict MUST preserve attempts and verifier feedback.

Self-report is not evidence. The sentence "the agent checked it" is only valid when it links to command output, test report, transcript, trace, screenshot, or review record.
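
A minimal sketch of these rules as a verdict check follows. The field names judgment, evidence, ref, failure, missing_environment_fact, attempts, and verifier_feedback are assumptions chosen for illustration; the evidence contract page defines the portable fields.

```python
def verdict_problems(verdict: dict) -> list:
    """Return evidence-rule violations for a verdict dict (hypothetical layout)."""
    problems = []
    judgment = verdict.get("judgment")
    # Only entries that link to an artifact count; a bare note is self-report.
    linked = [e for e in verdict.get("evidence", []) if e.get("ref")]

    if judgment == "passed" and not linked:
        problems.append("passed verdict has no linked evidence")
    if judgment == "failed" and not verdict.get("failure"):
        problems.append("failed verdict has no actionable failure")
    if judgment == "blocked" and not verdict.get("missing_environment_fact"):
        problems.append("blocked verdict does not name the missing environment fact")
    if judgment == "exhausted":
        if not verdict.get("attempts"):
            problems.append("exhausted verdict does not preserve attempts")
        if not verdict.get("verifier_feedback"):
            problems.append("exhausted verdict does not preserve verifier feedback")
    return problems


# A self-report with no artifact link does not satisfy the passed rule.
self_report = {
    "judgment": "passed",
    "evidence": [{"note": "the agent checked it"}],
}
print(verdict_problems(self_report))
# prints ['passed verdict has no linked evidence']
```

Filtering on a ref field is one way to make the self-report rule mechanical: a statement counts only when it points at a command output, report, transcript, trace, screenshot, or review record.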

qcloop mapping

A qc_case can become a qcloop item_value. A qcloop attempt maps to a qc_run, a qcloop qc_round maps to a qc_verdict, and a qcloop exhausted result maps to Agent QC exhausted, not to a generic failure.

Use qcloop when cases are repeated, independent, and verifier-friendly. Do not use qcloop to replace required project gates or to hide live-provider risk.
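
The mapping can be sketched as a translation function. The input record layout here (item_value, attempts, qc_rounds, status, and their inner keys) is a hypothetical shape chosen to mirror the terms above; it is not qcloop's actual schema, and the function name is an assumption.

```python
def qcloop_item_to_agent_qc(item: dict) -> dict:
    """Translate one qcloop-style item (hypothetical layout) into Agent QC objects."""
    # Each qcloop attempt becomes one qc_run.
    qc_runs = [
        {
            "attempt": number,
            "result": attempt.get("result"),
            "output_refs": attempt.get("output_refs", []),
        }
        for number, attempt in enumerate(item.get("attempts", []), start=1)
    ]
    # Each qcloop qc_round becomes one qc_verdict.
    qc_verdicts = [
        {"judgment": rnd.get("judgment"), "feedback": rnd.get("feedback")}
        for rnd in item.get("qc_rounds", [])
    ]
    # "exhausted" is carried over as-is and never collapsed into "failed".
    return {
        "qc_case": item.get("item_value"),
        "qc_runs": qc_runs,
        "qc_verdicts": qc_verdicts,
        "status": item.get("status"),
    }


# Hypothetical item that ran out of attempts without a passing round.
exhausted_item = {
    "item_value": "resume-after-crash",
    "attempts": [{"result": "failed"}, {"result": "failed"}],
    "qc_rounds": [{"judgment": "failed", "feedback": "state file missing"}],
    "status": "exhausted",
}
print(qcloop_item_to_agent_qc(exhausted_item)["status"])  # prints exhausted
```

Keeping every attempt and every round in the output matches the evidence rule that an exhausted verdict preserves attempts and verifier feedback.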