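Acceptance scenarios

Agent QC verifies behavior and evidence, not just the shape of the repository. These scenarios can be used for manual QA, automated tests, qcloop batches, CI gates, or release review.

A scenario passes only when evidence proves the behavior. A scenario that lacks evidence should be marked blocked, exhausted, waived, or needs-review, not passed.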
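The rule above can be sketched as a small verdict function. This is a minimal illustration, not part of any Agent QC API: the function name `verdict`, its parameters, and the `failed` branch for evidence that disproves the behavior are assumptions made for the example.

```python
# Hypothetical helper: the status names come from the text above,
# everything else (names, signature) is assumed for illustration.
NON_PASS_STATUSES = {"blocked", "exhausted", "waived", "needs-review"}


def verdict(evidence_refs, behavior_proven, fallback="needs-review"):
    """Return 'passed' only when evidence exists and proves the behavior."""
    if fallback not in NON_PASS_STATUSES:
        raise ValueError(f"unknown fallback status: {fallback}")
    if not evidence_refs:
        # No evidence: never report passed, whatever was observed informally.
        return fallback
    return "passed" if behavior_proven else "failed"


print(verdict(["transcript.log"], True))
print(verdict([], True, fallback="blocked"))
```

The point of the sketch is that the evidence check comes first, so an observed but unevidenced behavior can never resolve to passed.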

1. Runtime CLI permission boundary

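A user or test triggers an unsafe tool/command action.
The runtime emits a permission or policy decision with a stable id.
The action is denied or requires approval.
No unauthorized side effect occurs.
The CLI/TUI/WebUI shows a controlled error or a pending approval.

Pass condition: the denied action is visible, correlatable, and free of side effects.

Evidence: command transcript, policy event, side-effect check, and a surface artifact when one is visible.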

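2. Tool or MCP transport recovery

A stdio/HTTP/WebSocket tool server disconnects or returns an error.
The runtime exposes the failure, followed by either recovery or a terminal failure.
Tool state does not contaminate the next call.
The UI/TUI surfaces the failure outside the final answer text.

Pass condition: recovery and failure are both inspectable, and success is never faked.

Evidence: protocol transcript, retry log, tool id correlation, surface frame.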

3. SDK/API contract drift

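The public SDK or generated client shape changes.
A schema/generation check is run.
A fake server or fixture validates the new contract.
Old incompatible behavior has been migrated or explicitly versioned.

Pass condition: contract drift is reviewed before any runtime or UI claim is made.

Evidence: schema diff, generated artifact check, fake server transcript.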

4. CLI stream final reconciliation

The runtime streams partial text/tool events.
The runtime emits a final message or terminal status.
The CLI output or consumer deduplicates and reconciles the final content.
The exit code matches the terminal status.

Pass condition: no duplicated final text, hidden tool failure, or wrong exit status.

Evidence: stdout/stderr transcript, structured event sample, exit code.
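A consumer that satisfies this scenario might fold the stream roughly as follows. The event shapes (`delta`, `tool_result`, `final`) and the `reconcile` function are hypothetical, chosen only to illustrate reconciliation; a real runtime will define its own event schema.

```python
def reconcile(events):
    """Fold stream events into (final_text, exit_code, failed_tool_ids).

    Assumed (hypothetical) event shapes:
      {"type": "delta", "text": str}
      {"type": "tool_result", "tool_id": str, "ok": bool}
      {"type": "final", "text": str, "status": "completed" | "failed"}
    """
    partial = []
    failed_tools = []
    final = None
    for ev in events:
        if ev["type"] == "delta":
            partial.append(ev["text"])
        elif ev["type"] == "tool_result" and not ev["ok"]:
            # Keep tool failures so they are reported, not swallowed.
            failed_tools.append(ev["tool_id"])
        elif ev["type"] == "final":
            final = ev
    if final is None:
        # Stream ended without a terminal event: treat as failure.
        return "".join(partial), 1, failed_tools
    # The final message replaces the partial text instead of being appended,
    # which is what prevents duplicated final output.
    exit_code = 0 if final["status"] == "completed" else 1
    return final["text"], exit_code, failed_tools


sample = [
    {"type": "delta", "text": "Hel"},
    {"type": "delta", "text": "lo"},
    {"type": "tool_result", "tool_id": "t1", "ok": False},
    {"type": "final", "text": "Hello", "status": "completed"},
]
print(reconcile(sample))
```

Whether a failed tool call should also force a nonzero exit code is a policy decision this sketch leaves open; it only guarantees the failure stays visible to the caller.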

5. TUI first status and interrupt

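The user submits a prompt.
The listener is bound before submit or before the first runtime event.
Once the runtime accepts the run, status appears before answer text.
Interrupt/cancel is available where supported.
Interrupt stops the run and leaves no orphan subprocesses.

Pass condition: the user can tell the agent is alive and can stop it safely.

Evidence: pseudo-terminal transcript, terminal snapshot, runtime transcript, cleanup proof.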

6. TUI tool and permission overlay

The runtime emits a tool start with a stable tool id.
The TUI shows a safe input summary and progress.
The runtime emits an action request for a high-risk action.
The user approves, rejects, edits, or answers.
The TUI marks the request resolved only after runtime confirmation.

Pass condition: tool progress and approval state are visible, correlatable, and auditable.

Evidence: terminal snapshot, key sequence, action request/response transcript.

7. WebUI reload and stale state

The user opens a running or recently completed session.
The WebUI renders the route shell and the current status.
A page reload or route revisit does not fake success.
Missing facts render as unknown, unavailable, stale, or blocked.

Pass condition: reload/resume preserves runtime truth and safe fallback states.

Evidence: browser trace, screenshot, console/network log, runtime state ref.

8. Desktop GUI bridge readiness

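The app shell is launched or reused through a supported entrypoint.
Bridge health is checked before judging the page.
Default workspace/session readiness is proven.
User-visible flows come with a screenshot/trace.
Command contracts are synchronized whenever a native command is touched.

Pass condition: desktop readiness is proven by more than component tests.

Evidence: shell log, bridge health, workspace readiness, screenshot/trace, OS note.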

9. Browser automation safety and cleanup

The agent opens or controls a browser session.
The test records URL, viewport, provider, and session scope.
DOM/a11y data and a screenshot prove the observed state.
Console/network logs are checked.
Browsers/tabs/processes are closed or intentionally reused.

Pass condition: observation, safety, and cleanup are all proven.

Evidence: screenshot, DOM/a11y, console/network, cleanup/orphan proof.

10. Channel gateway auth and media

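The channel adapter receives a webhook/message carrying auth context and media.
The gateway verifies identity before parsing user content.
Media is stored or rejected according to policy.
The response transcript is redacted and traceable.
Any use of a live channel path must be opt-in.

Pass condition: identity, media, and response behavior are proven without leaking secrets.

Evidence: webhook replay, media fixture, redacted transcript, auth decision.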
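The "verify identity before parsing user content" step can be illustrated with a webhook replay fixture. The sketch below assumes an HMAC-SHA256 signature over the raw body, which is a common scheme but not something this document specifies; `handle_webhook` and its return shape are hypothetical.

```python
import hashlib
import hmac
import json


def handle_webhook(body: bytes, signature_hex: str, secret: bytes):
    """Check the signature on the raw bytes before any parsing happens.

    Assumption: the channel signs the raw body with HMAC-SHA256 and sends
    the hex digest alongside it. Real channels may use a different scheme.
    """
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature_hex.encode()):
        # Rejected requests are never parsed, so malformed or hostile
        # content cannot reach the JSON decoder.
        return {"auth": "rejected", "payload": None}
    return {"auth": "accepted", "payload": json.loads(body)}


secret = b"test-secret"  # fixture value, not a real credential
body = b'{"user": "u1", "text": "hi"}'
good_sig = hmac.new(secret, body, hashlib.sha256).hexdigest()

print(handle_webhook(body, good_sig, secret)["auth"])
print(handle_webhook(b"not json", "00", secret)["auth"])
```

The second call carries a body that is not valid JSON. It returns a rejected auth decision instead of raising, which shows the ordering the scenario asks for.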

11. Queue and steer distinction

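A run is active.
The user sends another prompt or a control action.
The system distinguishes queue-next from steer-current.
The runtime emits stable queued/steer ids.
The surface shows the pending state and the final resolution.

Pass condition: the user can tell "run later" apart from "change the current run".

Evidence: runtime events, UI/TUI snapshot, queue state transcript.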
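One way to keep the two intents apart is to give each its own event kind and id prefix. The `RunController` class below is a hypothetical sketch for a test fixture, not the runtime's actual interface; the mode names are taken from the scenario text.

```python
import itertools


class RunController:
    """Toy controller that tags each submission as started, queued, or steer."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.active_run = None
        self.queue = []
        self.events = []

    def submit(self, prompt, mode="queue-next"):
        if mode not in ("queue-next", "steer-current"):
            raise ValueError(f"unknown mode: {mode}")
        n = next(self._ids)
        if self.active_run is None:
            # Nothing is running, so the prompt starts a run immediately.
            self.active_run = f"run-{n}"
            event = {"kind": "started", "id": self.active_run, "prompt": prompt}
        elif mode == "steer-current":
            # Steering targets the active run and does not enter the queue.
            event = {"kind": "steer", "id": f"steer-{n}",
                     "target": self.active_run, "prompt": prompt,
                     "state": "pending"}
        else:
            event = {"kind": "queued", "id": f"queued-{n}",
                     "prompt": prompt, "state": "pending"}
            self.queue.append(event)
        self.events.append(event)
        return event


ctl = RunController()
first = ctl.submit("summarize the repo")
later = ctl.submit("then write tests")
steer = ctl.submit("keep it brief", mode="steer-current")
print(first["id"], later["id"], steer["id"], steer["target"])
```

Because every event carries its own id and kind, a UI/TUI snapshot can be correlated with the queue state transcript without guessing which intent the user expressed.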

12. Artifact handoff and evidence export

The runtime creates or updates an artifact.
The UI/CLI links a compact artifact reference.
Artifact details open through the artifact service or a durable path.
Evidence export creates durable refs.
The report connects artifact/evidence ids to the producing case.

Pass condition: deliverables and evidence leave the chat body and become traceable artifacts.

Evidence: artifact path/id, export log, screenshot/report link.

13. Old-session recovery

The user opens an old session/task/thread.
The shell or summary appears without waiting for the full history.
Recent messages/status hydrate before heavy details.
Tool output, artifacts, and evidence load on demand.
Stale or missing facts stay explicit.

Pass condition: old sessions are usable without guessing at missing truth.

Evidence: timing metrics, screenshot, hydration log, cursor/page refs.

14. Background scheduler restart

A scheduled/background task starts and writes a checkpoint or lease.
The owner is interrupted or the process restarts.
A new owner reclaims or resumes according to policy.
Duplicate and lost work are prevented.
The final state includes cleanup and ownership evidence.

Pass condition: a restart does not duplicate, lose, or hide work.

Evidence: deterministic clock/env, checkpoint, lease timeline, worker logs.
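The reclaim step can be exercised with a deterministic clock by passing the current time in explicitly. The `Lease` record and `try_claim` function below are hypothetical, and the 30-second TTL is an arbitrary value for the example; a real scheduler would also need an atomic store behind this logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    owner: str
    expires_at: float
    checkpoint: int  # index of the last completed step


def try_claim(lease: Optional[Lease], worker: str, now: float,
              ttl: float = 30.0) -> Optional[Lease]:
    """Return a lease for `worker`, or None while another owner's lease is live."""
    if lease is not None and lease.owner != worker and now < lease.expires_at:
        # A live lease held by someone else: claiming here would duplicate work.
        return None
    # Carry the checkpoint forward so the new owner resumes instead of restarting.
    checkpoint = lease.checkpoint if lease is not None else 0
    return Lease(owner=worker, expires_at=now + ttl, checkpoint=checkpoint)


lease_a = try_claim(None, "worker-a", now=0.0)
lease_a.checkpoint = 3  # worker-a finished three steps, then crashed
print(try_claim(lease_a, "worker-b", now=10.0))
print(try_claim(lease_a, "worker-b", now=31.0))
```

Recording each `(now, owner, checkpoint)` tuple produced by calls like these gives the lease timeline listed as evidence.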

15. Parallel worker fanout/fanin

The coordinator starts multiple independent workers/subagents/tasks.
Each worker has a stable id, role, parent, and status.
Partial success, failure, retry, and wait states stay visible.
The final synthesis links worker results and does not rewrite authorship.

Pass condition: parallel work is visible, recoverable, and auditable.

Evidence: delegation graph, worker transcripts, final evidence refs.
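A fixture for this scenario could fan work out with the standard library's `ThreadPoolExecutor` and keep a per-worker record, so that a failed worker shows up in the synthesis instead of disappearing. The function names `fan_out` and `fan_in` and the record fields are assumptions made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(parent_id, tasks):
    """Run each task in a worker thread; `tasks` maps worker id to a callable."""

    def run(worker_id, fn):
        record = {"id": worker_id, "parent": parent_id}
        try:
            record["result"] = fn()
            record["status"] = "succeeded"
        except Exception as exc:
            # Keep the failure on the record instead of dropping the worker.
            record["status"] = "failed"
            record["error"] = repr(exc)
        return record

    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run, wid, fn) for wid, fn in tasks.items()]
        return [f.result() for f in futures]  # same order as `tasks`


def fan_in(records):
    """Summarize worker records while keeping each result attributed to its worker."""
    statuses = {r["status"] for r in records}
    if statuses == {"succeeded"}:
        overall = "succeeded"
    elif "succeeded" in statuses:
        overall = "partial"
    else:
        overall = "failed"
    sources = [{"worker": r["id"], "status": r["status"]} for r in records]
    return {"status": overall, "sources": sources}


def broken():
    raise RuntimeError("tool server unavailable")


records = fan_out("coord-1", {"w1": lambda: "ok", "w2": broken})
print(fan_in(records))
```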

16. Remote agent or teammate handoff

The runtime connects to a remote agent or hands work to another teammate.
The UI/TUI shows the remote task id, owner, reason, auth/input needs, and status.
Input/auth required states are promoted to user controls.
An idle/transient state is not treated as completion.

Pass condition: remote ownership and completion truth are preserved.

Evidence: remote protocol transcript, task card snapshot, handoff log.

17. Eval regression and report UI

The prompt/eval suite runs against both current behavior and the baseline.
The rubric and judge/model settings are recorded.
The report shows pass/fail examples and the baseline delta.
A reviewer can inspect raw outputs and waivers.

Pass condition: the semantic quality claim is backed by comparable evidence.

Evidence: dataset/rubric, judge output, baseline delta, report screenshot/export.

18. Distribution install smoke

The release package/image has been built.
A clean environment installs or launches it.
The version/help/minimal runtime command works.
Package contents match the manifest.
Platform limitations are recorded.

Pass condition: the shipped artifact works outside the source tree.

Evidence: package manifest, install log, Docker/OS matrix, version output.

19. Live provider opt-in

The case declares its live provider/channel/model requirement.
Credentials are scoped and redacted.
Budget/timeout is recorded.
The request/response or provider transcript is stored securely.
A failure is not retried into invisibility.

Pass condition: live behavior is proven without contaminating deterministic lanes.

Evidence: opt-in flag, redacted transcript, budget note, provider id.

20. qcloop repeated QC

The plan creates independent qcloop items.
Each item includes a profile, surface, gates, expected result, and evidence policy.
Attempts and verifier rounds are preserved.
Exhausted items stay exhausted rather than being generalized to failed.
The aggregate report spells out the remaining risk.

Pass condition: repetition raises coverage without hiding required project gates.

Evidence: qcloop job id, item values, attempts, verifier feedback, verdict refs.
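The distinction between exhausted and failed can be sketched with a small retry loop that keeps every attempt. This is not the qcloop API: `run_item`, the `retry` outcome, and the item fields are hypothetical and only illustrate the bookkeeping the scenario describes.

```python
def run_item(item, attempt_fn, max_attempts=3):
    """Run one item up to `max_attempts` times and keep every attempt.

    `attempt_fn(item, n)` is assumed to return "passed", "failed", or "retry".
    """
    attempts = []
    for n in range(1, max_attempts + 1):
        outcome = attempt_fn(item, n)
        attempts.append({"attempt": n, "outcome": outcome})
        if outcome in ("passed", "failed"):
            return {"item": item["id"], "verdict": outcome, "attempts": attempts}
    # Running out of attempts is its own verdict, not a generic failure.
    return {"item": item["id"], "verdict": "exhausted", "attempts": attempts}


item = {
    "id": "case-1",
    "profile": "cli",
    "surface": "stdout",
    "gates": ["stream-reconciliation"],
    "expected": "exit code matches terminal status",
    "evidence_policy": "transcript-required",
}

never_settles = run_item(item, lambda it, n: "retry")
second_try = run_item(item, lambda it, n: "passed" if n == 2 else "retry")
print(never_settles["verdict"], len(never_settles["attempts"]))
print(second_try["verdict"], len(second_try["attempts"]))
```

An aggregate report built from these records can then count exhausted items separately when it states the remaining risk.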

21. Waiver and blocked path

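A required gate cannot run or is explicitly deferred.
The report records the missing fact, owner, scope, and risk.
The waiver includes an approver, reason, expiry, and follow-up.
The release or next action does not call a waived gate passed.

Pass condition: incomplete proof is visible and accountable.

Evidence: waiver object, blocker note, replacement evidence, follow-up link.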
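A waiver object with the four fields named above might look like the following. The `Waiver` dataclass, the `gate_status` function, and the fixture values are hypothetical; the sketch only shows that a waived gate resolves to its own status and that an expired waiver falls back to blocked.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Waiver:
    gate: str
    approver: str
    reason: str
    expiry: date
    follow_up: str


def gate_status(gate, passed_gates, waivers, today):
    """Resolve one gate to 'passed', 'waived', or 'blocked'."""
    if gate in passed_gates:
        return "passed"
    for waiver in waivers:
        if waiver.gate == gate and today <= waiver.expiry:
            # A live waiver is reported as waived, never as passed.
            return "waived"
    return "blocked"


waiver = Waiver(
    gate="install-smoke",
    approver="release-owner",          # placeholder values for the example
    reason="no clean VM available",
    expiry=date(2030, 1, 31),
    follow_up="FOLLOW-UP-PLACEHOLDER",
)

print(gate_status("install-smoke", set(), [waiver], date(2030, 1, 1)))
print(gate_status("install-smoke", set(), [waiver], date(2030, 2, 1)))
```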

Scenario selection guide

Project shape	Must include
Codex-like runtime CLI	scenarios 1, 2, 4, 5, 18
Claude Code-like TUI runtime	scenarios 5, 6, 11, 16, 21
OpenClaw-like channel/WebUI gateway	scenarios 7, 9, 10, 17, 18, 19
Hermes-like background/browser agent	scenarios 9, 14, 15, 18, 19
Desktop GUI / native bridge	scenarios 7, 8, 9, 12, 21
Eval/QA lab	scenarios 17, 20, 21
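For a project that matches more than one shape, the required set is the union of the rows above. The lookup below copies the table into a dictionary; the names `MUST_INCLUDE` and `required_scenarios` are made up for this sketch.

```python
# Scenario numbers copied from the selection guide table.
MUST_INCLUDE = {
    "Codex-like runtime CLI": {1, 2, 4, 5, 18},
    "Claude Code-like TUI runtime": {5, 6, 11, 16, 21},
    "OpenClaw-like channel/WebUI gateway": {7, 9, 10, 17, 18, 19},
    "Hermes-like background/browser agent": {9, 14, 15, 18, 19},
    "Desktop GUI / native bridge": {7, 8, 9, 12, 21},
    "Eval/QA lab": {17, 20, 21},
}


def required_scenarios(shapes):
    """Return the sorted union of must-include scenarios for the given shapes."""
    required = set()
    for shape in shapes:
        required |= MUST_INCLUDE[shape]  # KeyError on an unknown shape is intended
    return sorted(required)


print(required_scenarios(["Codex-like runtime CLI", "Eval/QA lab"]))
```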
