
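Gate matrix

The gate matrix maps Agent project profiles, surfaces, and risky changes to verification gates. It defines the minimum evidence required before a report may claim a pass.

Gate names are families, not framework commands. Each project must map every family to a local script, CI job, qcloop item, or review workflow.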
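One way to make the family-to-runner mapping explicit is a small lookup table kept in the project. The sketch below is illustrative only: `GATE_RUNNERS`, `unmapped_families`, and every command or job name in it are hypothetical placeholders, not part of this standard.

```python
# Hypothetical mapping from gate family to a project-local runner.
# The commands and job names are placeholders chosen for illustration.
GATE_RUNNERS = {
    "static": {"kind": "script", "run": "make lint"},
    "unit": {"kind": "script", "run": "make test-unit"},
    "contract-protocol": {"kind": "ci-job", "run": "contract-check"},
    "review": {"kind": "review-workflow", "run": "two-reviewer-signoff"},
}


def unmapped_families(required, runners=GATE_RUNNERS):
    """Return the required gate families that have no local runner yet."""
    return sorted(family for family in required if family not in runners)


print(unmapped_families(["static", "unit", "runtime-e2e"]))
# ['runtime-e2e']
```

A family that shows up in this list is a gate the project cannot yet run, so a report should not claim it as passed.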

Profile defaults

| Profile | Minimum gate families | Optional escalation gates |
| --- | --- | --- |
| agent-runtime-cli | static, unit, contract-protocol, runtime-e2e | property-fuzz, stress-concurrency, live-provider, distribution-release |
| agent-sdk-api | static, unit, contract-protocol, fake-integration | distribution-release, live-provider, semantic-eval |
| agent-tool-mcp-gateway | contract-protocol, fake-integration, runtime-e2e | stress-concurrency, live-provider, review, property-fuzz |
| multi-channel-agent-gateway | static, unit, contract-protocol, fake-integration | live-provider, distribution-release, semantic-eval, stress-concurrency |
| agent-ui-tui-desktop | static, unit, ui-interaction | runtime-e2e, contract-protocol, live-provider, review, stress-concurrency |
| agent-skills-plugins | static, contract-protocol, fake-integration | distribution-release, review, semantic-eval, live-provider |
| background-agent-scheduler | unit, fake-integration, stress-concurrency | runtime-e2e, live-provider, review, distribution-release |
| agent-distribution-release | static, distribution-release | runtime-e2e, live-provider, review, stress-concurrency |
| agent-evals-quality | semantic-eval, review | live-provider, stress-concurrency, distribution-release |
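The profile defaults can be treated as data and checked mechanically. The following is a minimal sketch covering two of the profiles above; the names `PROFILE_DEFAULTS` and `missing_minimum` are assumptions made for this example.

```python
# Two rows copied from the profile defaults; the structure itself is illustrative.
PROFILE_DEFAULTS = {
    "agent-sdk-api": {
        "minimum": ["static", "unit", "contract-protocol", "fake-integration"],
        "optional": ["distribution-release", "live-provider", "semantic-eval"],
    },
    "agent-evals-quality": {
        "minimum": ["semantic-eval", "review"],
        "optional": ["live-provider", "stress-concurrency", "distribution-release"],
    },
}


def missing_minimum(profile, passed_gates):
    """Return minimum gate families for the profile that have not passed."""
    minimum = PROFILE_DEFAULTS[profile]["minimum"]
    return [gate for gate in minimum if gate not in passed_gates]


print(missing_minimum("agent-sdk-api", ["static", "unit"]))
# ['contract-protocol', 'fake-integration']
```

An empty result means the profile's minimum is met; optional escalation gates are deliberately not enforced here.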

Surface add-ons

If a case names a surface, add surface evidence on top of the profile default.

| Surface | Minimum add-on | Stronger proof |
| --- | --- | --- |
| cli-stream | command log, exit status, stdout/stderr transcript | structured event assertion, malformed stream fixture, cleanup proof |
| tui | terminal snapshot, viewport, key sequence | multi-viewport, ANSI/Unicode, interrupt, approval, runtime transcript |
| webui | screenshot or browser trace, console log | Playwright trace, a11y/DOM snapshot, reload/resume, network log |
| desktop-gui | shell start, bridge health, screenshot | workspace readiness, native command contract, OS matrix, trace |
| browser-automation | screenshot and DOM/a11y snapshot | console/network, SSRF/navigation safety, orphan cleanup, trace/video |
| channel-ui | webhook/channel transcript, auth proof | media fixture, replay, device/emulator log, live opt-in lane |
| eval-ui | rubric, judge output, report export | baseline delta, reviewer annotation, failing examples, dashboard screenshot |
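Surface add-ons mix "all of" and "one of" requirements (for example, webui accepts a screenshot or a browser trace). A rough sketch of one way to encode that, assuming each surface is a list of requirement groups where any one item per group suffices; `SURFACE_MINIMUM` and `missing_surface_evidence` are hypothetical names.

```python
# Each surface lists requirement groups; one item from every group must be present.
SURFACE_MINIMUM = {
    "cli-stream": [{"command log"}, {"exit status"}, {"stdout/stderr transcript"}],
    "webui": [{"screenshot", "browser trace"}, {"console log"}],
}


def missing_surface_evidence(surface, evidence):
    """Return the requirement groups that no collected evidence satisfies."""
    collected = set(evidence)
    return [
        sorted(group)
        for group in SURFACE_MINIMUM[surface]
        if not group & collected
    ]


print(missing_surface_evidence("webui", ["browser trace"]))
# [['console log']]
```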

Change-risk escalation

Escalate gates when a change touches any of the following risks:

| Risk touched | Add gates |
| --- | --- |
| permission, sandbox, credential, or secret handling | contract-protocol, runtime-e2e, review; add property-fuzz for path/parser boundaries |
| protocol, schema, generated client, command, or manifest shape | contract-protocol, fake-integration, generated artifact drift check |
| persistent state, migration, queue, or scheduler | unit, runtime-e2e, stress-concurrency, recovery evidence |
| user-visible GUI/TUI/WebUI/desktop behavior | ui-interaction, surface evidence, stable regression |
| browser automation or remote browser provider | browser-automation surface proof, cleanup, console/network, safety fixtures |
| webhook, chat channel, mobile, QR, or media flow | channel-ui, auth/media replay, redaction, optional live-provider |
| package/install/release metadata | distribution-release, clean install, manifest, version/lock consistency |
| live provider, external network API, or model backend | explicit live-provider, credential scope, budget, redaction |
| model prompt, rubric, eval, or judge behavior | semantic-eval, review, baseline delta, examples |
| multi-agent, subagent, background, or remote teammate work | runtime-e2e, stress-concurrency, surface/task evidence |
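Escalation can be sketched as a union of the profile's gates with the gates each touched risk adds. The risk keys below (`secret-handling`, `persistent-state`, `release-metadata`) are shorthand invented for this example, and only the gate-family part of each row is modeled, not the extra evidence items.

```python
# Hypothetical risk keys mapped to the gate families from the escalation table.
RISK_GATES = {
    "secret-handling": ["contract-protocol", "runtime-e2e", "review"],
    "persistent-state": ["unit", "runtime-e2e", "stress-concurrency"],
    "release-metadata": ["distribution-release"],
}


def escalate(base_gates, risks_touched):
    """Return base gates plus any gates added by touched risks, without duplicates."""
    gates = list(base_gates)
    for risk in risks_touched:
        for gate in RISK_GATES[risk]:
            if gate not in gates:
                gates.append(gate)
    return gates


print(escalate(["static", "unit"], ["persistent-state"]))
# ['static', 'unit', 'runtime-e2e', 'stress-concurrency']
```

Keeping insertion order makes the result stable across runs, which helps when the gate list is pasted into a report.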

Minimal and strong gates

| Claim | Minimal gate | Stronger gate |
| --- | --- | --- |
| "Runtime command works" | command log and exit status | fake provider transcript, structured events, cleanup proof |
| "Tool/MCP bridge works" | schema/contract check | fake server recovery, permission denial, stdio/http disconnect |
| "TUI approval works" | terminal snapshot | key sequence, runtime action request/response transcript, cancel/reconnect variants |
| "WebUI flow works" | component assertion | browser trace, console/network, a11y, reload/resume |
| "Desktop app works" | shell start | bridge health, workspace readiness, native command contract, screenshot |
| "Browser automation works" | screenshot | DOM/a11y, console/network, cleanup, safety fixtures |
| "Channel adapter works" | contract fixture | webhook replay, media, redaction, live opt-in |
| "Scheduler works" | deterministic unit | restart/reclaim, duplicate-work proof, race/stress |
| "Package is releasable" | build output | clean install, package manifest, Docker/OS matrix, supply-chain |
| "Model quality improved" | one rubric pass | baseline delta, judge output, human review, failing examples |

Evidence minimums

  • static gates need command logs, CI URLs, or SARIF-style reports.
  • contract-protocol gates need schema/contract reports, transcript refs, or failing ids.
  • runtime-e2e gates need CLI/runtime transcripts, state snapshots, or process-cleanup proof.
  • ui-interaction gates need stable assertions plus screenshots, traces, videos, terminal snapshots, or accessibility output.
  • live-provider gates need redacted request/response refs, credential scope, and budget/cost notes.
  • distribution-release gates need package manifests, install output, Docker smoke, or OS matrix proof.
  • semantic-eval gates need a rubric, model/judge outputs, a baseline delta, and a waiver threshold.
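The evidence minimums above can be checked before a report claims a pass. The sketch below covers three gates and reads "or" lists as "at least one of" and "and" lists as "all of"; that reading, and the names `EVIDENCE_RULES` and `evidence_gaps`, are assumptions of this example.

```python
# Illustrative rules: "all_of" items are mandatory, "any_of" needs at least one match.
EVIDENCE_RULES = {
    "static": {
        "all_of": [],
        "any_of": ["command log", "CI URL", "SARIF-style report"],
    },
    "ui-interaction": {
        "all_of": ["stable assertions"],
        "any_of": ["screenshot", "trace", "video", "terminal snapshot",
                   "accessibility output"],
    },
    "live-provider": {
        "all_of": ["redacted request/response refs", "credential scope",
                   "budget/cost notes"],
        "any_of": [],
    },
}


def evidence_gaps(gate, evidence):
    """Return a list of what is still missing before the gate can be claimed."""
    rule = EVIDENCE_RULES[gate]
    have = set(evidence)
    gaps = [item for item in rule["all_of"] if item not in have]
    if rule["any_of"] and not have.intersection(rule["any_of"]):
        gaps.append("one of: " + ", ".join(rule["any_of"]))
    return gaps


print(evidence_gaps("ui-interaction", ["screenshot"]))
# ['stable assertions']
```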

Framework mapping examples

| Ecosystem | Gate mapping |
| --- | --- |
| Rust/Codex-like | cargo nextest, targeted crate tests, Bazel test/build, schema fixture writers, fake model server, ratatui snapshots |
| JS/OpenClaw-like | Vitest projects, changed-test router, contract configs, live configs, Docker smoke, QA Lab report lanes |
| Python/Hermes-like | pytest markers, xdist, integration excluded by default, credential blanking, e2e directory, ruff/ty |
| Desktop GUI / native bridge | local verify, command contracts, bridge health, GUI smoke, Playwright continuation, native backend tests |

Anti-patterns

| Anti-pattern | Why it fails |
| --- | --- |
| A single npm test checkbox covering all profiles | Hides surface/live/release risk |
| screenshot-only UI pass | No runtime backing |
| contract-only tool pass | No runtime recovery proof |
| live provider in the default unit lane | Flaky by default and unsafe |
| release build without install smoke | Package may be unusable |
| waiver without owner/expiry | Risk is unbounded |
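The last anti-pattern, a waiver without owner or expiry, is simple to reject automatically. A minimal sketch follows, assuming a waiver record with `gate`, `reason`, `owner`, and `expires` fields; this document does not define a waiver schema, so the record shape is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Waiver:
    # Hypothetical waiver record; field names are assumptions for illustration.
    gate: str
    reason: str
    owner: Optional[str] = None
    expires: Optional[date] = None


def waiver_problems(waiver, today):
    """Return reasons the waiver leaves risk unbounded; empty means acceptable."""
    problems = []
    if not waiver.owner:
        problems.append("missing owner")
    if waiver.expires is None:
        problems.append("missing expiry")
    elif waiver.expires < today:
        problems.append("expired")
    return problems


print(waiver_problems(Waiver("live-provider", "no budget yet"), date(2025, 1, 1)))
# ['missing owner', 'missing expiry']
```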

Draft standard for evidence-driven quality control of Agent projects.