Acceptance scenarios
Agent QC validates behavior and evidence, not repository shape alone. Use these scenarios for manual QA, automated tests, qcloop batches, CI gates, or release review.
A scenario passes only when evidence proves the behavior. A scenario with missing evidence is blocked, exhausted, waived, or needs-review, not passed.
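The pass rule above can be sketched as a small verdict function. This is a minimal illustration, not part of Agent QC itself: `ScenarioResult`, `verdict`, the field names, and the `failed` status for a case with complete evidence but wrong behavior are all hypothetical.

```python
from dataclasses import dataclass, field

# Non-pass statuses named in this section.
NON_PASS_STATUSES = {"blocked", "exhausted", "waived", "needs-review"}


@dataclass
class ScenarioResult:
    scenario_id: int
    required_evidence: set
    collected_evidence: set = field(default_factory=set)
    behavior_ok: bool = False
    # Status to report when evidence is missing (assumed default).
    fallback_status: str = "needs-review"


def verdict(result: ScenarioResult) -> str:
    """Return 'passed' only when every required evidence item is present."""
    missing = result.required_evidence - result.collected_evidence
    if missing:
        # Missing evidence is never reported as a pass.
        if result.fallback_status in NON_PASS_STATUSES:
            return result.fallback_status
        return "needs-review"
    return "passed" if result.behavior_ok else "failed"
```

For example, a case whose behavior looked correct but whose policy event was never captured returns `needs-review` until that evidence is attached.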
1. Runtime CLI permission boundary
- User or test triggers an unsafe tool/command action.
- Runtime emits a permission or policy decision with stable id.
- The action is denied or requires approval.
- No unauthorized side effect occurs.
- CLI/TUI/WebUI shows a controlled error or pending approval.
Pass condition: denied action is visible, correlated, and side-effect-free.
Evidence: command transcript, policy event, side-effect check, surface artifact when visible.
2. Tool or MCP transport recovery
- A stdio/http/WebSocket tool server disconnects or returns an error.
- Runtime surfaces failure and recovery or terminal failure.
- Tool state does not corrupt the next call.
- UI/TUI shows failure outside final answer text.
Pass condition: recovery and failure are inspectable and do not invent success.
Evidence: protocol transcript, retry log, tool id correlation, surface frame.
3. SDK/API contract drift
- Public SDK or generated client changes shape.
- Schema/generation check runs.
- Fake server or fixture verifies the new contract.
- Old incompatible behavior is either migrated or explicitly versioned.
Pass condition: contract drift is reviewed before runtime or UI claims.
Evidence: schema diff, generated artifact check, fake server transcript.
4. CLI stream final reconciliation
- Runtime streams partial text/tool events.
- Runtime emits final message or terminal status.
- CLI output or consumer reconciles final content without duplication.
- Exit code matches terminal status.
Pass condition: no duplicate final text, hidden tool failure, or wrong exit status.
Evidence: stdout/stderr transcript, structured event sample, exit code.
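One way a test consumer might reconcile a stream is sketched below. The event shape (`type`, `text`, `ok`, `tool_id`, `status`) and the `completed` status value are assumptions made for illustration; a real runtime will define its own schema.

```python
def reconcile(events):
    """Fold a list of stream events into display text, exit code, and tool failures."""
    deltas = []
    final_text = None
    status = None
    tool_failures = []
    for ev in events:
        kind = ev["type"]
        if kind == "text_delta":
            deltas.append(ev["text"])
        elif kind == "tool_result" and not ev["ok"]:
            # Keep failed tool calls visible instead of dropping them.
            tool_failures.append(ev["tool_id"])
        elif kind == "final":
            final_text, status = ev.get("text"), ev["status"]
    # The final message replaces the streamed text; it is never appended to it.
    text = final_text if final_text is not None else "".join(deltas)
    # Assumed mapping: only a 'completed' terminal status exits with 0.
    exit_code = 0 if status == "completed" else 1
    return {"text": text, "exit_code": exit_code, "tool_failures": tool_failures}
```

A stream that ends without a terminal event yields a non-zero exit code here, so a truncated run cannot be mistaken for success.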
5. TUI first status and interrupt
- User submits a prompt.
- Listener binds before submit or before the first runtime event.
- Runtime status appears before first answer text when accepted.
- Interrupt/cancel is available when supported.
- Interrupt stops the run without orphan subprocesses.
Pass condition: the user can tell the agent is alive and can stop it safely.
Evidence: pseudo-terminal transcript, terminal snapshot, runtime transcript, cleanup proof.
6. TUI tool and permission overlay
- Runtime emits tool start with stable tool id.
- TUI shows safe input summary and progress.
- Runtime emits action request for a high-risk operation.
- User approves, rejects, edits, or answers.
- TUI marks resolved only after runtime confirmation.
Pass condition: tool progress and approval state are visible, correlated, and auditable.
Evidence: terminal snapshot, key sequence, action request/response transcript.
7. WebUI reload and stale state
- User opens a running or recently completed session.
- WebUI renders route shell and current status.
- Page reload or route revisit does not fabricate success.
- Missing facts render as unknown, unavailable, stale, or blocked.
Pass condition: reload/resume preserves runtime truth and safe fallback states.
Evidence: browser trace, screenshot, console/network log, runtime state ref.
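The fallback rule for missing facts could look like the sketch below. The fact shape (`value`, `observed_at`, `error`, `blocked_on`), the 30-second freshness window, and the mapping of each condition to a fallback label are assumptions for illustration only.

```python
from typing import Optional


def render_status(fact: Optional[dict], now: float, max_age_s: float = 30.0) -> str:
    """Return the label a surface should show for one runtime fact."""
    if fact is None:
        return "unknown"        # nothing was ever observed
    if fact.get("error"):
        return "unavailable"    # the lookup itself failed
    if fact.get("blocked_on"):
        return "blocked"        # waiting on approval or input
    if now - fact["observed_at"] > max_age_s:
        return "stale"          # observed, but too old to trust after reload
    return fact["value"]
```

The function never returns a success label unless a fresh observation exists, which is the property a reload test would check.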
8. Desktop GUI bridge readiness
- App shell starts or is reused through the supported entrypoint.
- Bridge health is checked before judging the page.
- Default workspace/session readiness is proven.
- A user-visible flow runs with screenshot/trace.
- Native command contracts are synchronized when touched.
Pass condition: desktop readiness is proven beyond component tests.
Evidence: shell log, bridge health, workspace readiness, screenshot/trace, OS note.
9. Browser automation safety and cleanup
- Agent opens or controls a browser session.
- Test records URL, viewport, provider, and session scope.
- DOM/a11y and screenshot evidence prove the observed state.
- Console/network logs are inspected.
- Browser/tabs/processes are closed or intentionally reused.
Pass condition: observation, safety, and cleanup are all proven.
Evidence: screenshot, DOM/a11y, console/network, cleanup/orphan proof.
10. Channel gateway auth and media
- Channel adapter receives a webhook/message with auth context and media.
- Gateway verifies identity before parsing user content.
- Media is stored or rejected by policy.
- Response transcript is redacted and traceable.
- Live channel path is opt-in if used.
Pass condition: identity, media, and response behavior are proven without leaking secrets.
Evidence: webhook replay, media fixture, redacted transcript, auth decision.
11. Queue and steer distinction
- A run is active.
- User sends another prompt or control action.
- System distinguishes queue-next from steer-current.
- Runtime emits stable queued/steer ids.
- Surface shows pending state and final resolution.
Pass condition: users can distinguish "run later" from "change current run".
Evidence: runtime events, UI/TUI snapshot, queue state transcript.
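A minimal sketch of the queue-next versus steer-current distinction follows. `RunController`, the `q-`/`s-` id prefixes, and the event fields are hypothetical; the point is only that each mode gets its own stable id and that a steer is always tied to the active run.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunController:
    active_run: Optional[str]
    queue: list = field(default_factory=list)
    events: list = field(default_factory=list)
    _next: int = 1

    def submit(self, text: str, mode: str) -> dict:
        if mode not in ("queue-next", "steer-current"):
            raise ValueError(f"unknown mode: {mode}")
        if mode == "steer-current" and self.active_run is None:
            raise ValueError("no active run to steer")
        prefix = "q" if mode == "queue-next" else "s"
        item_id = f"{prefix}-{self._next}"
        self._next += 1
        event = {
            "id": item_id,
            "mode": mode,
            # Only a steer is attached to the currently active run.
            "run": self.active_run if mode == "steer-current" else None,
            "state": "pending",
            "text": text,
        }
        if mode == "queue-next":
            self.queue.append(item_id)
        self.events.append(event)
        return event
```

A surface can then render `q-` items as "run later" and `s-` items as "change current run", and mark each one resolved when the runtime reports its final state.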
12. Artifact handoff and evidence export
- Runtime creates or updates an artifact.
- UI/CLI links compact artifact reference.
- Artifact details open through artifact service or durable path.
- Evidence export creates durable refs.
- Report links artifact/evidence ids to the producing case.
Pass condition: deliverables and evidence leave the chat body and become traceable artifacts.
Evidence: artifact path/id, export log, screenshot/report link.
13. Old-session recovery
- User opens old session/task/thread.
- Shell or summary appears without full history blocking first paint.
- Recent messages/status hydrate before heavy details.
- Tool output, artifacts, and evidence load on demand.
- Stale or missing facts remain explicit.
Pass condition: old sessions are usable and do not guess missing truth.
Evidence: timing metrics, screenshot, hydration log, cursor/page refs.
14. Background scheduler restart
- Scheduled/background task starts and writes checkpoint or lease.
- Owner is interrupted or process restarts.
- New owner reclaims or resumes according to policy.
- Duplicate and lost work are prevented.
- Final state includes cleanup and ownership evidence.
Pass condition: restart does not duplicate, lose, or hide work.
Evidence: deterministic clock/env, checkpoint, lease timeline, worker logs.
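The restart behavior above can be exercised with an injected clock, as in this sketch. `Lease`, `try_claim`, `run_steps`, and the 10-second TTL are hypothetical names and values; time is passed in as `now` so the test stays deterministic.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    task_id: str
    owner: str
    expires_at: float
    checkpoint: int = 0  # index of the next step to run


def try_claim(lease: Lease, worker: str, now: float, ttl: float = 10.0) -> bool:
    """Claim or renew the lease; refuse while another owner's lease is live."""
    if lease.owner != worker and now < lease.expires_at:
        return False
    lease.owner = worker
    lease.expires_at = now + ttl
    return True


def run_steps(lease: Lease, worker: str, now: float, total_steps: int, log: list) -> bool:
    """Resume from the checkpoint so completed steps are not repeated."""
    if not try_claim(lease, worker, now):
        return False
    for step in range(lease.checkpoint, total_steps):
        log.append((worker, step))
        lease.checkpoint = step + 1
    return True
```

In a test, worker A can complete two steps and stop; worker B is refused while A's lease is live and resumes from step 2 once it has expired, leaving a log with every step exactly once.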
15. Parallel worker fanout/fanin
- Coordinator starts multiple independent workers/subagents/tasks.
- Each worker has stable id, role, parent, and status.
- Partial success, failure, retry, and wait states remain visible.
- Final synthesis links worker results without rewriting authorship.
Pass condition: parallel work is visible, resumable, and auditable.
Evidence: delegation graph, worker transcripts, final evidence refs.
16. Remote agent or teammate handoff
- Runtime connects to remote agent or hands work to another teammate.
- UI/TUI shows remote task id, owner, reason, auth/input needs, and status.
- Input/auth required states are promoted to user controls.
- Idle/transient state is not treated as completion.
Pass condition: remote ownership and completion truth are preserved.
Evidence: remote protocol transcript, task card snapshot, handoff log.
17. Eval regression and report UI
- Prompt/eval suite runs against current behavior and baseline.
- Rubric and judge/model settings are recorded.
- Report shows pass/fail examples and baseline delta.
- Reviewer can inspect raw outputs and waivers.
Pass condition: semantic quality claim is backed by comparable evidence.
Evidence: dataset/rubric, judge output, baseline delta, report screenshot/export.
18. Distribution install smoke
- Release package/image is built.
- Clean environment installs or starts it.
- Version/help/minimal runtime command works.
- Package contents match manifest.
- Platform-specific limitations are recorded.
Pass condition: shipped artifact is usable outside the source tree.
Evidence: package manifest, install log, Docker/OS matrix, version output.
19. Live provider opt-in
- Case declares live provider/channel/model requirement.
- Credentials are scoped and redacted.
- Budget/timeout is recorded.
- Request/response or provider transcript is stored safely.
- Failure is not retried into invisibility.
Pass condition: live behavior is proven without contaminating deterministic lanes.
Evidence: opt-in flag, redacted transcript, budget note, provider id.
20. qcloop repeated QC
- Plan creates independent qcloop items.
- Each item includes profile, surface, gates, expected result, and evidence policy.
- Attempts and verifier rounds are preserved.
- Exhausted items remain exhausted, not generic failed.
- Aggregate report states remaining risk.
Pass condition: repetition improves coverage without hiding required project gates.
Evidence: qcloop job id, item values, attempts, verifier feedback, verdict refs.
21. Waiver and blocked path
- Required gate cannot run or is intentionally deferred.
- Report records missing fact, owner, scope, and risk.
- Waiver includes approver, reason, expiry, and follow-up.
- Release or next action does not call the waived gate passed.
Pass condition: incomplete proof is visible and accountable.
Evidence: waiver object, blocker note, replacement evidence, follow-up link.
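A waiver object with the fields listed above might be modeled as follows. The `Waiver` class, `gate_status`, and the example values are illustrative assumptions; the sketch only shows that a waived gate keeps its own status and that an expired waiver falls back to blocked.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Waiver:
    gate: str
    approver: str
    reason: str
    expiry: date
    follow_up: str


def gate_status(gate: str, results: dict, waivers: list, today: date) -> str:
    """Report a gate as passed only when it actually ran and succeeded."""
    if results.get(gate) is True:
        return "passed"
    if results.get(gate) is False:
        return "failed"
    for w in waivers:
        if w.gate == gate and today <= w.expiry:
            return "waived"  # accountable, but still not a pass
    return "blocked"
```

A release check built on this would treat `waived` and `blocked` as distinct from `passed` when summarizing remaining risk.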
Scenario selection guide
| Project shape | Must include |
|---|---|
| Codex-like runtime CLI | scenarios 1, 2, 4, 5, 18 |
| Claude Code-like TUI runtime | scenarios 5, 6, 11, 16, 21 |
| OpenClaw-like channel/WebUI gateway | scenarios 7, 9, 10, 17, 18, 19 |
| Hermes-like background/browser agent | scenarios 9, 14, 15, 18, 19 |
| Desktop GUI / native bridge | scenarios 7, 8, 9, 12, 21 |
| Eval/QA lab | scenarios 17, 20, 21 |
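For projects that match more than one shape, the table can be encoded as data and the required scenarios computed as a union. The `MUST_INCLUDE` mapping copies the table rows; the `required_scenarios` helper is a hypothetical convenience, not an Agent QC API.

```python
# Scenario numbers copied from the selection guide table.
MUST_INCLUDE = {
    "Codex-like runtime CLI": {1, 2, 4, 5, 18},
    "Claude Code-like TUI runtime": {5, 6, 11, 16, 21},
    "OpenClaw-like channel/WebUI gateway": {7, 9, 10, 17, 18, 19},
    "Hermes-like background/browser agent": {9, 14, 15, 18, 19},
    "Desktop GUI / native bridge": {7, 8, 9, 12, 21},
    "Eval/QA lab": {17, 20, 21},
}


def required_scenarios(shapes):
    """Return the sorted union of must-include scenarios for the given shapes."""
    return sorted(set().union(*(MUST_INCLUDE[s] for s in shapes)))
```

For instance, `required_scenarios(["Eval/QA lab", "Codex-like runtime CLI"])` returns `[1, 2, 4, 5, 17, 18, 20, 21]`.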