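Process and Taxonomy

This page is the complete lifecycle and taxonomy reference for Agent QC. It follows the normative style of Agent UI: it states dimensions, fields, constraints, lifecycle stages, and validation cases explicitly.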

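Core contract

Agent QC is an evidence protocol for the quality of Agent projects. A conforming QC plan classifies the risks it owns, selects gates, runs checks, stores evidence, and outputs a verdict, rather than treating model-written text as proof.

A conforming QC report must:

  • classify one or more project profiles;
  • name the interaction surfaces when user-visible behavior is involved;
  • map every required gate to a local command, CI job, qcloop item, or review step;
  • keep inspectable evidence refs for every pass/fail/blocked/exhausted/waived verdict;
  • separate deterministic, runtime, surface, live-provider, release, and semantic-eval claims;
  • state limitations and waivers explicitly.

A conforming QC report must not:

  • treat the final assistant answer as evidence, unless it links to evidence artifacts;
  • infer runtime success from UI text;
  • hide live-provider calls inside default deterministic tests;
  • collapse screenshot, trace, terminal snapshot, and protocol transcript into a single "UI checked" statement;
  • claim that a gate passed while required evidence is missing.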
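
The last rule, that a gate cannot be reported as passed while required evidence is missing, lends itself to a mechanical check. The following is a minimal sketch, assuming a hypothetical in-memory record per gate; `GateResult`, `missing_evidence`, and `check_gate` are illustrative names and not part of Agent QC.

```python
from dataclasses import dataclass, field


@dataclass
class GateResult:
    """Hypothetical record of one gate run; field names are illustrative."""
    gate: str
    claimed_status: str                  # e.g. "passed", "failed", "blocked"
    required_evidence: list[str]         # evidence kinds the gate needs
    evidence_refs: dict[str, list[str]] = field(default_factory=dict)


def missing_evidence(result: GateResult) -> list[str]:
    """Return the required evidence kinds that have no ref attached."""
    return [kind for kind in result.required_evidence
            if not result.evidence_refs.get(kind)]


def check_gate(result: GateResult) -> list[str]:
    """Return rule violations; an empty list means the claim is admissible."""
    problems = []
    missing = missing_evidence(result)
    if result.claimed_status == "passed" and missing:
        problems.append(
            f"{result.gate}: claims 'passed' but lacks evidence for {missing}")
    return problems


# Example: a run that claims success with only one of two evidence kinds.
result = GateResult(
    gate="runtime-e2e",
    claimed_status="passed",
    required_evidence=["runtime-transcript", "command-log"],
    evidence_refs={"command-log": ["logs/run-1.txt"]},  # hypothetical path
)
for problem in check_gate(result):
    print(problem)
```

A real implementation would likely also verify that each ref resolves to a stored artifact; this sketch only checks that a ref is present.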

Lifecycle overview

```text
change or release scope
  -> classify profiles
  -> identify touched surfaces
  -> assign fact owners and risk owners
  -> select gate lanes
  -> write behavior-level cases
  -> execute deterministic gates
  -> execute runtime and surface gates
  -> opt into live/release/eval gates when required
  -> collect evidence refs
  -> issue verdicts
  -> publish report, waivers, and next action
```
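
The ordering in the diagram matters: verdicts come only after evidence refs are collected. As a rough sketch, the stages can be held in an ordered list and checked for skipped or reordered steps. The list form and the `next_stage` helper are assumptions made for illustration. The sketch also treats the opt-in stage as always visited, even when the decision is that no live, release, or eval gate is required.

```python
from typing import Optional

# Stage names copied from the lifecycle diagram; the list form is an
# assumption made for this sketch.
LIFECYCLE = [
    "classify profiles",
    "identify touched surfaces",
    "assign fact owners and risk owners",
    "select gate lanes",
    "write behavior-level cases",
    "execute deterministic gates",
    "execute runtime and surface gates",
    "opt into live/release/eval gates when required",
    "collect evidence refs",
    "issue verdicts",
    "publish report, waivers, and next action",
]


def next_stage(completed: list[str]) -> Optional[str]:
    """Return the next stage to run, or None when the lifecycle is done.

    Raises ValueError if `completed` is not a prefix of LIFECYCLE, which
    catches skipped or reordered stages (e.g. verdicts before evidence).
    """
    if completed != LIFECYCLE[:len(completed)]:
        raise ValueError(f"stages out of order: {completed}")
    if len(completed) == len(LIFECYCLE):
        return None
    return LIFECYCLE[len(completed)]


print(next_stage([]))
print(next_stage(LIFECYCLE[:8]))
```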

This flow applies to CLI agents, SDKs, MCP/tool gateways, channel bots, TUI/GUI/WebUI products, browser automation systems, schedulers, skills/plugins, distribution packages, and eval suites.

Classification dimensions

Project profile

A profile describes the kind of system the project owns.

| Profile | Owns | Default risks |
| --- | --- | --- |
| agent-runtime-cli | agent loop, CLI, task execution, sandbox, tools, resume | stream drift, permissions, subprocess cleanup, resume consistency |
| agent-sdk-api | public SDK, generated client, API wrappers | signature drift, async cancellation, fake-server behavior |
| agent-tool-mcp-gateway | tool declarations, MCP/ACP bridge, connector runtime | protocol conformance, stdio/http recovery, resource permission |
| multi-channel-agent-gateway | chat/channel adapters, webhooks, auth, media | identity, webhook verification, media routing, secret redaction |
| agent-ui-tui-desktop | GUI, TUI, desktop shell, browser-visible flows | projection drift, stale success, bridge readiness, screenshots/traces |
| agent-skills-plugins | skills, plugins, manifests, loaders, marketplace | manifest drift, package boundary, trust policy, fixture install |
| background-agent-scheduler | cron, queues, workers, retries, long-running agents | duplicate work, lost checkpoints, race, stuck loop |
| agent-distribution-release | package, Docker, installers, cross-platform release | missing files, broken clean install, lock drift, supply chain |
| agent-evals-quality | task quality, model behavior, rubrics, generated outputs | prompt drift, judge instability, baseline regression, grounding gap |

Interaction surface

A surface describes where behavior is observed.

| Surface | Use case | Required evidence |
| --- | --- | --- |
| cli-stream | stdout/stderr, JSONL/NDJSON, command UI | command, exit status, transcript, structured sample |
| tui | terminal UI, Ink, ratatui, curses | viewport, key sequence, terminal snapshot, runtime transcript |
| webui | browser dashboard, extension UI, QA/admin console | screenshot/trace, console log, route/state assertion |
| desktop-gui | Tauri, Electron, native shell | shell start, bridge health, workspace/session readiness, OS note |
| browser-automation | CDP, Playwright, browser-use, remote browser | DOM/a11y, screenshot, console/network, cleanup proof |
| channel-ui | chat app, QR, mobile, webhook-visible flows | channel transcript, media fixture, auth/webhook replay, redaction |
| eval-ui | QA dashboards and eval reports | rubric, judge output, baseline delta, reviewer note |

Gate family

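A gate family describes how something is verified; it is not the name of a framework.

| Family | Default use | Escalate when |
| --- | --- | --- |
| static | format, lint, type, schema, dependency hygiene | generated files or policy boundaries change |
| unit | deterministic local behavior | algorithms, parsers, reducers, or adapters change |
| property-fuzz | invariants and generated input | parser, sandbox, path, protocol, or serializer risk is high |
| contract-protocol | schema/API/command/tool surfaces | wire shape, manifest, command, or SDK shape changes |
| fake-integration | local fake server or adapter flow | external API behavior is simulated |
| runtime-e2e | real CLI/task/session with no live-provider risk | loop, tool, permission, resume, or subprocess flow changes |
| ui-interaction | visible GUI/TUI/WebUI/browser/channel behavior | users or operators will see the change |
| live-provider | explicitly opt-in real network/model/channel path | provider/channel behavior is part of the claim |
| stress-concurrency | races, queue, leases, retries, long runs | scheduler, parallel agents, workers, or locks change |
| distribution-release | package/install/Docker/OS matrix | any externally published artifact changes |
| semantic-eval | task quality, prompt, rubric, judge | model behavior or output quality is the product itself |
| review | human/LLM review | safety, policy, UX, or semantic judgment is needed |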

Evidence kind

| Kind | Examples | Must include |
| --- | --- | --- |
| command-log | shell output, CI step, cargo/npm/pytest/vitest output | command, exit status, environment note |
| test-report | JUnit, JSON, coverage, HTML report | suite id, failing ids, artifact path or URL |
| protocol-transcript | fake server, MCP/ACP, WebSocket, HTTP transcript | request/response refs, redaction note |
| runtime-transcript | CLI JSONL, TUI-linked events, session state | run/session ids, event order, cleanup |
| surface-artifact | screenshot, video, Playwright trace, terminal snapshot | viewport/device/OS, action sequence |
| browser-diagnostic | console, network, DOM/a11y snapshot | route, selector, or accessibility assertion |
| release-artifact | package manifest, tarball list, Docker smoke | version, platform, install command |
| eval-artifact | rubric, judge output, baseline diff | dataset, model/judge, threshold |
| review-note | human or LLM review | reviewer, scope, evidence refs, decision |
| qcloop-run | attempt and QC round refs | item value, attempt id, verifier feedback |

Verdict status

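| Status | Meaning | Required fields |
| --- | --- | --- |
| passed | evidence proves all required expectations | evidence refs and scope |
| failed | evidence contradicts an expectation, or the gate failed | smallest actionable failure and evidence |
| blocked | environment, credentials, dependency, fixture, or binary is missing, so no judgment is possible | blocker and owner |
| exhausted | attempts or budget ran out without proof | attempt refs and remaining uncertainty |
| waived | an owner accepts a known gap | approver, reason, scope, expiry |
| needs-review | evidence exists but semantic/safety review is still required | reviewer or review queue |
| skipped | not applicable to the current scope | reason and scope |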
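
The required fields per status can be enforced with a small lookup table. This is a sketch under stated assumptions: the key names (`evidence_refs`, `attempt_refs`, `review_queue`, and so on) are hypothetical spellings of the table entries, and `validate_verdict` is an illustrative helper, not a defined Agent QC API.

```python
# Each status maps to a list of requirements; a requirement is a tuple of
# acceptable keys, so ("reviewer", "review_queue") means "either one".
# Key names are assumptions made for this sketch.
REQUIRED = {
    "passed": [("evidence_refs",), ("scope",)],
    "failed": [("failure",), ("evidence_refs",)],
    "blocked": [("blocker",), ("owner",)],
    "exhausted": [("attempt_refs",), ("remaining_uncertainty",)],
    "waived": [("approver",), ("reason",), ("scope",), ("expiry",)],
    "needs-review": [("reviewer", "review_queue")],
    "skipped": [("reason",), ("scope",)],
}


def validate_verdict(verdict: dict) -> list[str]:
    """Return a list of missing-field errors for one verdict record."""
    status = verdict.get("status")
    if status not in REQUIRED:
        return [f"unknown status: {status!r}"]
    errors = []
    for alternatives in REQUIRED[status]:
        # A requirement is met when any acceptable key has a non-empty value.
        if not any(verdict.get(key) for key in alternatives):
            errors.append(f"{status}: missing {' or '.join(alternatives)}")
    return errors


# A waiver without an expiry is rejected.
print(validate_verdict({
    "status": "waived",
    "approver": "alice",          # hypothetical approver
    "reason": "flaky fixture",
    "scope": "case-7",
}))
```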

Fact owners

Agent QC should state who owns each fact, rather than letting the report own everything.

| Owner | Owns | QC responsibility |
| --- | --- | --- |
| Runtime | task/session/tool/permission state | collect transcript and state refs |
| Protocol/SDK | schemas, generated clients, adapters | collect contract diff and fake transcript |
| UI projection | visible rendering and user controls | collect surface artifact and link it to runtime |
| Evidence service | durable traces, replay, reviews | link evidence ids and export jobs |
| Policy/security | approvals, waivers, credentials, retention | record risk decision and scope |
| Artifact/release | deliverables, package contents, versions | collect manifest and install proof |
| Scheduler | leases, checkpoints, retries, workers | collect timeline and duplicate-work proof |
| Eval system | rubrics, judge outputs, baselines | collect dataset, threshold, and deltas |

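Standard case envelope

Even where the JSON schema permits extensions, a portable qc_case should carry these fields.

| Field | Required | Purpose |
| --- | --- | --- |
| id | yes | stable case id |
| project_profile | yes | one profile from the taxonomy |
| surface | recommended for visible cases | observation surface |
| target | yes | file, command, package, flow, API, or release target |
| risk_owner | recommended | runtime, protocol, UI, scheduler, release, eval, policy |
| required_gates | yes | gate families to satisfy |
| steps | yes | reproducible commands or interactions |
| expected | yes | behavior-level expectation |
| required_evidence | yes | artifacts the verdict needs |
| live_policy | conditional | opt-in, credential scope, redaction, budget |
| waiver_policy | conditional | owner, reason, expiry rules |
| verdict | after run | status and evidence refs |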
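
A case envelope of this shape can be checked before any gate runs. The sketch below is illustrative only: `validate_case` is a hypothetical helper, the example values are invented, and the rule that `live_policy` becomes required when `live-provider` is among the required gates is an assumption about what "conditional" means here.

```python
# Gate family names as listed in the Gate family table.
GATE_FAMILIES = {
    "static", "unit", "property-fuzz", "contract-protocol",
    "fake-integration", "runtime-e2e", "ui-interaction", "live-provider",
    "stress-concurrency", "distribution-release", "semantic-eval", "review",
}

# Fields marked "yes" in the case envelope table.
REQUIRED_CASE_FIELDS = (
    "id", "project_profile", "target", "required_gates",
    "steps", "expected", "required_evidence",
)


def validate_case(case: dict) -> list[str]:
    """Return envelope errors; an empty list means the case is well-formed."""
    errors = [f"missing required field: {name}"
              for name in REQUIRED_CASE_FIELDS if not case.get(name)]
    gates = case.get("required_gates") or []
    unknown = sorted(set(gates) - GATE_FAMILIES)
    if unknown:
        errors.append(f"unknown gate families: {unknown}")
    # Assumption: live_policy is required once a live-provider gate is named.
    if "live-provider" in gates and not case.get("live_policy"):
        errors.append("live-provider gate requires live_policy")
    return errors


case = {
    "id": "perm-denial-001",  # hypothetical id
    "project_profile": "agent-runtime-cli",
    "surface": "cli-stream",
    "target": "permission denial flow",
    "risk_owner": "runtime",
    "required_gates": ["runtime-e2e", "contract-protocol"],
    "steps": ["run the CLI against a fixture that requests a denied tool"],
    "expected": "the task reports the denial and does not execute the tool",
    "required_evidence": ["runtime-transcript", "protocol-transcript"],
}
print(validate_case(case))
```

Keeping the check separate from any JSON schema validation leaves room for schema extensions while still enforcing the portable core.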

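Standard report envelope

A portable QC report should answer:

| Field | Question |
| --- | --- |
| Scope | Which change, release, or regression sweep is being judged? |
| Profiles | Which project profiles apply? |
| Surfaces | Which user or operator surfaces are touched? |
| Required gates | Which gates must run, and why? |
| Executed gates | Which commands, CI jobs, qcloop runs, or reviews were run? |
| Evidence refs | Where are the logs, traces, screenshots, transcripts, reports, and reviews? |
| Verdicts | Which cases passed, failed, were blocked, exhausted, or waived, or need review? |
| Remaining risk | What should still not be claimed as done? |
| Next action | Fix, rerun, review, release, or waive? |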
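
The Verdicts and Remaining risk rows can be derived from per-case statuses. The following sketch assumes a simple mapping from case id to status; `summarize` is a hypothetical helper, and treating only `passed` and `skipped` as leaving no open risk is an assumption of this example, chosen so that waivers stay visible in the report.

```python
from collections import Counter

# Assumption for this sketch: only these statuses leave no open risk.
SETTLED = {"passed", "skipped"}


def summarize(verdicts: dict[str, str]) -> dict:
    """Summarize case-id -> status pairs for a report envelope."""
    counts = Counter(verdicts.values())
    remaining = sorted(case_id for case_id, status in verdicts.items()
                       if status not in SETTLED)
    return {"counts": dict(counts), "remaining_risk": remaining}


# Hypothetical case ids and outcomes.
verdicts = {
    "perm-denial-001": "passed",
    "webhook-replay-002": "blocked",
    "release-smoke-003": "waived",
}
summary = summarize(verdicts)
print(summary["counts"])
print(summary["remaining_risk"])
```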

Validation cases for the standard itself

A project may claim Agent QC conformance only if it can express the following cases:

  1. Codex-like runtime permission denial, with CLI transcript, protocol event, and TUI row.
  2. Claude Code-like remote permission request, with WebSocket/control transcript and TUI prompt.
  3. OpenClaw-like channel webhook replay, with media fixture and redacted credential policy.
  4. Hermes-like scheduler restart, with deterministic time, checkpoint, and duplicate-work proof.
  5. Desktop GUI native-bridge change, with bridge health, workspace readiness, screenshot, and command-contract proof.
  6. Browser automation flow, with DOM/a11y, screenshot, console/network, and cleanup evidence.
  7. Release smoke, with package manifest, clean install, and platform note.
  8. Semantic eval regression, with rubric, judge output, baseline delta, and reviewer note.

Draft standard for evidence-driven quality control of Agent projects.