流程与分类

本页是 Agent QC 的完整生命周期和分类参考。它沿用 Agent UI 的规范写法：明确维度、字段、约束、生命周期阶段和验证案例。

核心契约

Agent QC 是 Agent 项目质量的证据协议。兼容的 QC 计划会分类拥有的风险、选择门禁、执行检查、保存证据并输出 verdict，而不是把模型文字当证明。

兼容 QC 报告必须：

分类一个或多个 project profiles；
涉及用户可见行为时命名 interaction surfaces；
把每个 required gate 映射到本地命令、CI job、qcloop item 或 review step；
为每个 pass/fail/blocked/exhausted/waived verdict 保留可检查 evidence refs；
分离 deterministic、runtime、surface、live-provider、release 和 semantic-eval 声明；
明确写出限制和 waivers。

兼容 QC 报告不能：

把最终 assistant answer 当作证据，除非它链接了证据工件；
从 UI 文本推断 runtime 成功；
把 live-provider 调用藏进默认确定性测试；
把 screenshot、trace、terminal snapshot、protocol transcript 混成一句“UI 已检查”；
在 required evidence 缺失时宣称 gate 通过。

生命周期概览

text

change or release scope
  -> classify profiles
  -> identify touched surfaces
  -> assign fact owners and risk owners
  -> select gate lanes
  -> write behavior-level cases
  -> execute deterministic gates
  -> execute runtime and surface gates
  -> opt into live/release/eval gates when required
  -> collect evidence refs
  -> issue verdicts
  -> publish report, waivers, and next action

这个流程适用于 CLI agents、SDK、MCP/tool gateways、channel bots、TUI/GUI/WebUI products、browser automation systems、schedulers、skills/plugins、distribution packages 和 eval suites。

分类维度

Project profile

Profile 描述项目拥有的形态。

Profile	拥有	默认风险
`agent-runtime-cli`	agent loop、CLI、task execution、sandbox、tools、resume	stream drift、permissions、subprocess cleanup、resume consistency
`agent-sdk-api`	public SDK、generated client、API wrappers	signature drift、async cancellation、fake-server behavior
`agent-tool-mcp-gateway`	tool declarations、MCP/ACP bridge、connector runtime	protocol conformance、stdio/http recovery、resource permission
`multi-channel-agent-gateway`	chat/channel adapters、webhooks、auth、media	identity、webhook verification、media routing、secret redaction
`agent-ui-tui-desktop`	GUI、TUI、desktop shell、browser-visible flows	projection drift、stale success、bridge readiness、screenshots/traces
`agent-skills-plugins`	skills、plugins、manifests、loaders、marketplace	manifest drift、package boundary、trust policy、fixture install
`background-agent-scheduler`	cron、queues、workers、retries、long-running agents	duplicate work、lost checkpoints、race、stuck loop
`agent-distribution-release`	package、Docker、installers、cross-platform release	missing files、broken clean install、lock drift、supply chain
`agent-evals-quality`	task quality、model behavior、rubrics、generated outputs	prompt drift、judge instability、baseline regression、grounding gap

Interaction surface

Surface 描述行为在哪里被观察到。

Surface	使用场景	必需证据
`cli-stream`	stdout/stderr、JSONL/NDJSON、command UI	command、exit status、transcript、structured sample
`tui`	terminal UI、Ink、ratatui、curses	viewport、key sequence、terminal snapshot、runtime transcript
`webui`	browser dashboard、extension UI、QA/admin console	screenshot/trace、console log、route/state assertion
`desktop-gui`	Tauri、Electron、native shell	shell start、bridge health、workspace/session readiness、OS note
`browser-automation`	CDP、Playwright、browser-use、remote browser	DOM/a11y、screenshot、console/network、cleanup proof
`channel-ui`	chat app、QR、mobile、webhook-visible flows	channel transcript、media fixture、auth/webhook replay、redaction
`eval-ui`	QA dashboards 和 eval reports	rubric、judge output、baseline delta、reviewer note

Gate family

Gate family 描述验证方式，不是框架名称。

Family	默认用途	升级条件
`static`	format、lint、type、schema、dependency hygiene	generated files 或 policy boundaries 变化
`unit`	deterministic local behavior	algorithms、parsers、reducers、adapters 变化
`property-fuzz`	invariants 和 generated input	parser、sandbox、path、protocol、serializer 风险高
`contract-protocol`	schema/API/command/tool surfaces	wire shape、manifest、command 或 SDK shape 变化
`fake-integration`	local fake server 或 adapter flow	外部 API 行为被模拟
`runtime-e2e`	无 live provider 风险的真实 CLI/task/session	loop、tool、permission、resume、subprocess flow 变化
`ui-interaction`	GUI/TUI/WebUI/browser/channel 可见行为	用户或运维会看到改动
`live-provider`	显式 opt-in 的真实网络/model/channel path	provider/channel 行为属于声明的一部分
`stress-concurrency`	races、queue、leases、retries、long runs	scheduler、parallel agents、workers 或 locks 变化
`distribution-release`	package/install/Docker/OS matrix	有任何对外发布物变化
`semantic-eval`	task quality、prompt、rubric、judge	model 行为或输出质量是产品本身
`review`	human/LLM review	需要 safety、policy、UX 或语义判断

Evidence kind

Kind	例子	必须包含
`command-log`	shell output、CI step、cargo/npm/pytest/vitest output	command、exit status、environment note
`test-report`	JUnit、JSON、coverage、HTML report	suite id、failing ids、artifact path 或 URL
`protocol-transcript`	fake server、MCP/ACP、WebSocket、HTTP transcript	request/response refs、redaction note
`runtime-transcript`	CLI JSONL、TUI-linked events、session state	run/session ids、event order、cleanup
`surface-artifact`	screenshot、video、Playwright trace、terminal snapshot	viewport/device/OS、action sequence
`browser-diagnostic`	console、network、DOM/a11y snapshot	route、selector 或 accessibility assertion
`release-artifact`	package manifest、tarball list、Docker smoke	version、platform、install command
`eval-artifact`	rubric、judge output、baseline diff	dataset、model/judge、threshold
`review-note`	human 或 LLM review	reviewer、scope、evidence refs、decision
`qcloop-run`	attempt 和 QC round refs	item value、attempt id、verifier feedback

Verdict status

Status	含义	必需字段
`passed`	证据证明所有 required expectations	evidence refs 和 scope
`failed`	证据否定 expectation 或 gate 失败	最小可行动失败和 evidence
`blocked`	缺环境、凭证、依赖、fixture 或 binary，无法判断	blocker 和 owner
`exhausted`	尝试或预算结束但没有证明	attempt refs 和 remaining uncertainty
`waived`	负责人接受已知缺口	approver、reason、scope、expiry
`needs-review`	有证据但还需要语义/安全 review	reviewer 或 review queue
`skipped`	当前 scope 不适用	reason 和 scope

Fact owners

Agent QC 应说明每个事实由谁拥有，而不是让报告拥有一切。

Owner	拥有	QC 职责
Runtime	task/session/tool/permission state	采集 transcript 和 state refs
Protocol/SDK	schemas、generated clients、adapters	采集 contract diff 和 fake transcript
UI projection	visible rendering 和 user controls	采集 surface artifact 并连接 runtime
Evidence service	durable traces、replay、reviews	链接 evidence ids 和 export jobs
Policy/security	approvals、waivers、credentials、retention	记录 risk decision 和 scope
Artifact/release	deliverables、package contents、versions	采集 manifest 和 install proof
Scheduler	leases、checkpoints、retries、workers	采集 timeline 和 duplicate-work proof
Eval system	rubrics、judge outputs、baselines	采集 dataset、threshold 和 deltas

标准 case envelope

即使 JSON schema 允许扩展，可移植的 qc_case 也应携带这些字段。

Field	Required	Purpose
`id`	yes	stable case id
`project_profile`	yes	taxonomy 中的一个 profile
`surface`	visible case 推荐	observation surface
`target`	yes	file、command、package、flow、API 或 release target
`risk_owner`	recommended	runtime、protocol、UI、scheduler、release、eval、policy
`required_gates`	yes	要满足的 gate families
`steps`	yes	可复现命令或交互
`expected`	yes	行为级期望
`required_evidence`	yes	verdict 需要的工件
`live_policy`	conditional	opt-in、credential scope、redaction、budget
`waiver_policy`	conditional	owner、reason、expiry rules
`verdict`	after run	status 和 evidence refs

标准 report envelope

可移植 QC 报告应回答：

Field	Question
Scope	判断的是哪个 change、release 或 regression sweep？
Profiles	哪些 project profiles 适用？
Surfaces	哪些用户/运维表面被触碰？
Required gates	哪些 gates 必须运行，为什么？
Executed gates	跑了哪些 commands、CI jobs、qcloop runs 或 reviews？
Evidence refs	logs、traces、screenshots、transcripts、reports、reviews 在哪里？
Verdicts	哪些 case passed、failed、blocked、exhausted、waived 或 needs review？
Remaining risk	还有什么不应宣称完成？
Next action	修复、重跑、review、release 还是 waive？

标准自身的验证案例

项目只有能表达以下案例，才能宣称兼容 Agent QC：

Codex-like runtime permission denial，带 CLI transcript、protocol event 和 TUI row。
Claude Code-like remote permission request，带 WebSocket/control transcript 和 TUI prompt。
OpenClaw-like channel webhook replay，带 media fixture 和 redacted credential policy。
Hermes-like scheduler restart，带 deterministic time、checkpoint 和 duplicate-work proof。
Desktop GUI native-bridge change，带 bridge health、workspace readiness、screenshot 和 command-contract proof。
Browser automation flow，带 DOM/a11y、screenshot、console/network 和 cleanup evidence。
Release smoke，带 package manifest、clean install 和 platform note。
Semantic eval regression，带 rubric、judge output、baseline delta 和 reviewer note。

流程与分类 ​

核心契约 ​

生命周期概览 ​

分类维度 ​

Project profile ​

Interaction surface ​

Gate family ​

Evidence kind ​

Verdict status ​

Fact owners ​

标准 case envelope ​

标准 report envelope ​

标准自身的验证案例 ​