Acceptance scenarios

A compatible Agent Runtime should pass behavior-level checks.

Submit and first status

Given a client submits a turn, the runtime emits an accepted or queued fact before any long model output. The thread read model shows preparing, queued, or running instead of remaining unknown.

Tool call with large result

Given a tool returns a large payload, the event stream includes a safe preview and output ref. The final answer does not become the only storage location for the result.

Approval gate

Given a tool needs approval, the runtime emits action.required with a stable action id. Execution does not continue until respond_action resolves the request.

Queue and resume

Given a busy thread receives another turn, the runtime either rejects it with policy detail or stores it as a queued turn. After restart, the queue is still visible and can be resumed or reordered.

Context compaction

Given a long thread requires compaction, the runtime emits compaction start and completed events and keeps a boundary snapshot. Unresolved requests and evidence refs survive compaction.

Subagent delegation

Given a parent turn spawns a child agent, parent and child ids are linked. Waiting, sending input, failure, and close events are visible without parsing child prose.

Agent task retry and graph

Given a long-running task creates child work and then fails, the runtime preserves the original task_id, records the failed attempt with run_id, keeps parent/dependency edges, and starts a new attempt on retry instead of overwriting history.

Evidence export

Given a completed or failed turn, evidence export includes runtime summary, timeline, tool failures, artifact refs, and verification/review refs when available.

Old session recovery

Given an old session with many messages, the runtime returns a shell and recent window first, then exposes older history through cursors and details on demand.

Permission and sandbox fact

Given a shell or file tool requests a write, the runtime records permission.evaluated and sandbox.applied first. If denied or out of bounds, the thread read model explains whether a rule, mode, hook, host policy, or sandbox violation caused it.

Hook input rewrite

Given a pre_tool_use hook rewrites tool input or blocks the tool, the stream contains hook.started / hook.completed, updated input ref, or block reason. The final tool result must not hide the hook decision.

Process stdin and output spill

Given a long-running process needs stdin, write_process_stdin emits process.input. When stdout/stderr exceed budget, runtime emits output.spilled or output.truncated and preserves an output ref.

Model routing and single candidate

Given only one model candidate is available, runtime emits routing.single_candidate and the read model shows candidate count, decision reason, capability gap, cost state, and limit state.

Quota or rate limit

Given provider or host limits are hit, runtime emits rate_limit.hit, quota.low, or quota.blocked, and joins request logs to turn ids in evidence.

Remote channel reconnect

Given a remote client reconnects, runtime returns a snapshot first and continues from the last acknowledged sequence. If replay is unavailable, it emits snapshot.repaired and marks stale or unavailable facts.

Job item retry

Given one item in a batch job fails, the job keeps job/item status separately, allows retrying only that item, and preserves attempt count and last error.

Acceptance scenarios ​

Submit and first status ​

Tool call with large result ​

Approval gate ​

Queue and resume ​

Context compaction ​

Subagent delegation ​

Agent task retry and graph ​

Evidence export ​

Old session recovery ​

Permission and sandbox fact ​

Hook input rewrite ​

Process stdin and output spill ​

Model routing and single candidate ​

Quota or rate limit ​

Remote channel reconnect ​

Job item retry ​