Specification

Agent QC v0.2.0 is a portable draft standard for evidence-driven quality control of Agent projects.

An Agent project can be a runtime CLI, SDK, tool server, MCP/ACP gateway, multi-channel bot, GUI/TUI/desktop client, skills or plugin ecosystem, background scheduler, distribution package, or evaluation suite. Agent QC does not assume one product shape. It starts by classifying the project profile and then selects gates that match its risk.

Scope

Agent QC standardizes:

Project profiles for Agent systems.
Test plan, case, gate, run, evidence, verdict, and report objects.
Gate taxonomy from static checks to live provider and release smoke.
Evidence-backed pass/fail semantics.
qcloop-compatible batch QC for repeated independent cases.
Case-study mapping for representative runtime, gateway, scheduler, UI, release, and eval projects.

Agent QC does not standardize any single programming language, CI vendor, test framework, browser driver, model protocol, storage backend, or UI skin.

Project profiles

A qc_plan.project_profiles array declares which project shapes apply.

Profile	Typical risks	Example gates
`agent-runtime-cli`	tool execution, sandboxing, permission, streams, resume, subprocess cleanup	unit, protocol, fake model server, CLI e2e, sandbox tests
`agent-sdk-api`	public API compatibility, generated contracts, fake server behavior, async cancellation	signature tests, generated contract diff, fake server integration
`agent-tool-mcp-gateway`	tool declaration drift, stdio/http transport, recovery, resource access, audit refs	protocol conformance, mock server, transport recovery, contract tests
`multi-channel-agent-gateway`	channel adapters, auth, secrets, webhook verification, provider drift, media routing	channel contract tests, secret isolation, live opt-in, docker smoke
`agent-ui-tui-desktop`	rendering, terminal/browser state, user controls, screenshots, accessibility, bridge readiness	UI unit, snapshot, Playwright, terminal fixtures, GUI smoke
`agent-skills-plugins`	manifest shape, loader, package boundary, trust, marketplace or registry drift	schema, discovery, package export, fixture install, security scan
`background-agent-scheduler`	cron, queues, leases, retries, concurrency, idempotency, stuck-loop recovery	deterministic scheduler tests, race tests, stress tests, checkpoint/reclaim
`agent-distribution-release`	install, package contents, Docker, cross-platform, lockfiles, supply-chain	install smoke, package dry run, Docker smoke, OS matrix, lock checks
`agent-evals-quality`	model behavior regressions, prompt drift, rubric quality, answer grounding	eval suite, baseline comparison, LLM/human judge, qcloop batch

A project MAY combine profiles. For example, OpenClaw combines channel gateway, tool gateway, distribution, live provider, and plugin profiles.
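As an illustration, a tool could check a plan's declared profiles against the table above. This is a minimal sketch, not part of the standard. The helper name `unknown_profiles`, the plan's `id` field, and the use of a plain dict for `qc_plan` are assumptions made for the example. Only the `project_profiles` array and the profile identifiers come from this specification.

```python
# Profile identifiers copied from the table above.
KNOWN_PROFILES = frozenset({
    "agent-runtime-cli",
    "agent-sdk-api",
    "agent-tool-mcp-gateway",
    "multi-channel-agent-gateway",
    "agent-ui-tui-desktop",
    "agent-skills-plugins",
    "background-agent-scheduler",
    "agent-distribution-release",
    "agent-evals-quality",
})


def unknown_profiles(qc_plan: dict) -> list[str]:
    """Return declared profiles that are not in the table, in declaration order."""
    declared = qc_plan.get("project_profiles", [])
    return [profile for profile in declared if profile not in KNOWN_PROFILES]


# Hypothetical plan for a project that is both a CLI runtime and a release package.
example_plan = {
    "id": "plan-001",  # assumed field, not defined by the text above
    "project_profiles": ["agent-runtime-cli", "agent-distribution-release"],
}
```

A plan that combines profiles passes the check as long as every entry appears in the table. An entry such as `"web-app"` would be returned as unknown.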

Core objects

Object	Purpose
`qc_plan`	A test plan for one change, release, investigation, or regression sweep.
`qc_case`	One behavior-level item with steps, expected result, required gates, and evidence.
`qc_gate`	A validation boundary such as static, unit, contract, integration, e2e, live, stress, release, or review.
`qc_run`	One execution attempt with command, executor, environment, output refs, duration, and result.
`qc_evidence`	A reference to logs, reports, traces, screenshots, fixtures, qcloop attempts, CI runs, or review notes.
`qc_verdict`	A judgment over evidence: passed, failed, blocked, exhausted, waived, or needs-review.
`qc_report`	The aggregate result, remaining risk, waivers, and next action.
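The objects above could be modeled roughly as follows. This is a hedged sketch covering three of the seven objects. The class names, the exact field names (for example `output_refs` and `duration_seconds`), and the field types are assumptions derived from the purpose column, not a schema defined by Agent QC.

```python
from dataclasses import dataclass, field


@dataclass
class QcEvidence:
    kind: str  # assumed discriminator, e.g. "log", "report", "trace", "screenshot"
    ref: str   # path or URL pointing at the artifact


@dataclass
class QcRun:
    command: str
    executor: str
    environment: dict[str, str]
    output_refs: list[str]
    duration_seconds: float
    result: str


@dataclass
class QcCase:
    title: str  # assumed field, used here only as a label
    steps: list[str]
    expected_result: str
    required_gates: list[str]
    evidence: list[QcEvidence] = field(default_factory=list)


# Hypothetical case for a CLI runtime project.
case = QcCase(
    title="CLI resumes an interrupted session",
    steps=["start a session", "kill the process", "resume the session"],
    expected_result="the resumed session replays prior state",
    required_gates=["unit", "runtime-e2e"],
)
case.evidence.append(QcEvidence(kind="log", ref="artifacts/cli-e2e.log"))
```

Keeping evidence as references rather than inline content matches the `qc_evidence` description, which points at logs, reports, and traces instead of embedding them.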

Gate families

Family	Purpose	Evidence examples
`static`	format, lint, type, dependency and policy hygiene	command logs, SARIF, lockfile check output
`unit`	deterministic local behavior	test report, coverage, fixture output
`property-fuzz`	invariants and generated input	seed, corpus, failing case artifact
`contract-protocol`	schemas, APIs, generated clients, command/tool surfaces	contract report, schema diff, mock transcript
`fake-integration`	integration against fake servers or local adapters	fake server log, request/response transcript
`runtime-e2e`	real CLI/runtime/task flow without external provider risk	CLI transcript, process cleanup evidence, state snapshot
`ui-interaction`	GUI/TUI/browser/terminal behavior	screenshot, trace, video, accessibility report
`live-provider`	opt-in real provider or network path	redacted transcript, credentials-scope note, cost/budget
`stress-concurrency`	races, leases, retries, long-running loops	stress report, worker timeline, seed, benchmark
`distribution-release`	install, package, Docker, cross-platform release readiness	tarball manifest, Docker smoke, OS matrix, release check
`semantic-eval`	model output quality, grounding, policy, user intent	eval result, rubric, judge output, baseline delta
`review`	human or LLM review	reviewer decision, rubric, evidence refs
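One way to keep family names consistent across plans is to treat the first column of the table above as a closed set. The sketch below assumes that set is closed. The `GateFamily` enum and the `parse_gate_families` helper are illustrative names, and a project may reasonably choose an open vocabulary instead.

```python
from enum import Enum


class GateFamily(str, Enum):
    """Gate family names, in the order they appear in the table above."""

    STATIC = "static"
    UNIT = "unit"
    PROPERTY_FUZZ = "property-fuzz"
    CONTRACT_PROTOCOL = "contract-protocol"
    FAKE_INTEGRATION = "fake-integration"
    RUNTIME_E2E = "runtime-e2e"
    UI_INTERACTION = "ui-interaction"
    LIVE_PROVIDER = "live-provider"
    STRESS_CONCURRENCY = "stress-concurrency"
    DISTRIBUTION_RELEASE = "distribution-release"
    SEMANTIC_EVAL = "semantic-eval"
    REVIEW = "review"


def parse_gate_families(names: list[str]) -> list[GateFamily]:
    """Convert family names to enum members; raises ValueError on an unknown name."""
    return [GateFamily(name) for name in names]
```

With this approach a misspelled family such as `"live_provider"` fails when the plan is loaded rather than being silently skipped at run time.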

Status values

qc_case.status, qc_gate.status, and qc_report.status each take one of the following values:

planned
running
passed
failed
blocked
exhausted
waived
skipped
needs-review

A waived gate MUST include waiver.reason, waiver.approver, and waiver.expires when the project has a waiver process.
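The status list and the waiver rule can be checked mechanically. The following is a minimal sketch that assumes the project has a waiver process and represents a gate as a plain dict with a nested `waiver` mapping. The function name `gate_status_problems` and the message wording are illustrative.

```python
# Status values listed above, shared by qc_case, qc_gate, and qc_report.
STATUS_VALUES = frozenset({
    "planned", "running", "passed", "failed", "blocked",
    "exhausted", "waived", "skipped", "needs-review",
})

REQUIRED_WAIVER_FIELDS = ("reason", "approver", "expires")


def gate_status_problems(gate: dict) -> list[str]:
    """Return human-readable problems with a gate's status and waiver fields."""
    problems = []
    status = gate.get("status")
    if status not in STATUS_VALUES:
        problems.append(f"unknown status: {status!r}")
    if status == "waived":
        # Assumption: the project has a waiver process, so all three fields apply.
        waiver = gate.get("waiver") or {}
        for name in REQUIRED_WAIVER_FIELDS:
            if not waiver.get(name):
                problems.append(f"waiver.{name} is missing")
    return problems
```

For example, a gate with `status` set to `waived` and only `waiver.reason` filled in would be reported as missing `waiver.approver` and `waiver.expires`.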

Evidence rules

A passed verdict MUST include evidence. A failed verdict MUST include the smallest actionable failure. A blocked verdict MUST identify the missing environment fact. An exhausted verdict MUST preserve attempts and verifier feedback.

Self-report is not evidence. The sentence "the agent checked it" is valid only when it links to command output, a test report, a transcript, a trace, a screenshot, or a review record.
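The four MUST rules above can be expressed as a small checker. This is a sketch under stated assumptions: the verdict is a plain dict, and the keys `evidence`, `failure`, `missing_environment_fact`, `attempts`, and `verifier_feedback` are hypothetical names chosen for the example, since this specification does not fix field names for them.

```python
def verdict_problems(verdict: dict) -> list[str]:
    """Return the evidence rules that a verdict violates (empty list if none)."""
    status = verdict.get("status")
    problems = []
    if status == "passed" and not verdict.get("evidence"):
        problems.append("passed verdict has no evidence")
    if status == "failed" and not verdict.get("failure"):
        problems.append("failed verdict has no actionable failure")
    if status == "blocked" and not verdict.get("missing_environment_fact"):
        problems.append("blocked verdict does not name the missing environment fact")
    if status == "exhausted" and not (
        verdict.get("attempts") and verdict.get("verifier_feedback")
    ):
        problems.append("exhausted verdict must preserve attempts and verifier feedback")
    return problems
```

The checker only confirms that the required material is present. Whether an evidence reference actually resolves to a log, report, or trace is a separate check that depends on where the project stores artifacts.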

qcloop mapping

A qc_case can become a qcloop item_value. qcloop attempt maps to qc_run; qcloop qc_round maps to qc_verdict; qcloop exhausted maps to Agent QC exhausted, not generic failure.

Use qcloop when cases are repeated, independent, and verifier-friendly. Do not use qcloop to replace required project gates or to hide live-provider risk.
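The mapping can be summarized in code. The sketch below is illustrative only: it does not use any real qcloop API, it assumes qcloop reports a status as a plain string, and the fallback to `needs-review` for unrecognized statuses is a choice made for this example rather than a rule from this specification.

```python
# Concept mapping stated above: qcloop term -> Agent QC object.
QCLOOP_TO_AGENT_QC = {
    "item_value": "qc_case",
    "attempt": "qc_run",
    "qc_round": "qc_verdict",
}

AGENT_QC_STATUSES = frozenset({
    "planned", "running", "passed", "failed", "blocked",
    "exhausted", "waived", "skipped", "needs-review",
})


def agent_qc_status(qcloop_status: str) -> str:
    """Translate a qcloop status string into an Agent QC status."""
    if qcloop_status == "exhausted":
        # Exhausted stays exhausted; it is never folded into "failed".
        return "exhausted"
    if qcloop_status in AGENT_QC_STATUSES:
        return qcloop_status
    # Assumption: anything unrecognized is routed to a reviewer.
    return "needs-review"
```

Keeping `exhausted` distinct preserves the information that retries ran out, which calls for a different follow-up than a single definite failure.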
