
Replay case

A replay case describes what is needed to reconstruct or approximate an agent run.

Replay record

| Field | Purpose |
| --- | --- |
| replay_id | Stable identifier for this replay case. |
| scope | Scope the case covers: session, task, run, turn, artifact, review, or export. |
| input_refs | References to user input, attachments, context, model config, tool args, and policies. |
| snapshot_refs | Runtime, context, tool inventory, policy, source, and artifact snapshots. |
| trace_refs | Trace ids, span ids, logs, metrics, or external telemetry references. |
| determinism | Reproducibility level: deterministic, approximate, non_deterministic, or unavailable. |
| missing_facts | Facts that are needed but unavailable, expired, redacted, not collected, or not applicable. |
| expected_outputs | Claims, artifacts, checks, diffs, hashes, or summaries to compare against. |
| replay_steps | Optional ordered instructions or machine-readable steps for performing the replay. |
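The record can be sketched as a Python data structure. This is a minimal sketch, not a normative schema: the field names and the determinism values come from the table, while the Python types, the snake_case spellings of the scope values and missing-fact reasons, the hypothetical MissingFact shape, and the JSON layout are assumptions made for illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from enum import Enum


# Field names follow the replay record table; the Python types and the
# string spellings of Scope and MissingReason values are assumptions.
class Scope(Enum):
    SESSION = "session"
    TASK = "task"
    RUN = "run"
    TURN = "turn"
    ARTIFACT = "artifact"
    REVIEW = "review"
    EXPORT = "export"


class Determinism(Enum):
    DETERMINISTIC = "deterministic"
    APPROXIMATE = "approximate"
    NON_DETERMINISTIC = "non_deterministic"
    UNAVAILABLE = "unavailable"


class MissingReason(Enum):
    UNAVAILABLE = "unavailable"
    EXPIRED = "expired"
    REDACTED = "redacted"
    NOT_COLLECTED = "not_collected"
    NOT_APPLICABLE = "not_applicable"


@dataclass
class MissingFact:
    fact: str  # hypothetical: a short description of the missing fact
    reason: MissingReason


@dataclass
class ReplayRecord:
    replay_id: str
    scope: Scope
    determinism: Determinism
    input_refs: list[str] = field(default_factory=list)
    snapshot_refs: list[str] = field(default_factory=list)
    trace_refs: list[str] = field(default_factory=list)
    missing_facts: list[MissingFact] = field(default_factory=list)
    # Hypothetical shape: output name -> expected value (e.g. a hash or claim).
    expected_outputs: dict[str, str] = field(default_factory=dict)
    replay_steps: list[str] | None = None  # optional, per the table

    def to_json(self) -> str:
        """Serialize the record, writing enums as their string values."""
        return json.dumps(
            {
                "replay_id": self.replay_id,
                "scope": self.scope.value,
                "determinism": self.determinism.value,
                "input_refs": self.input_refs,
                "snapshot_refs": self.snapshot_refs,
                "trace_refs": self.trace_refs,
                "missing_facts": [
                    {"fact": m.fact, "reason": m.reason.value}
                    for m in self.missing_facts
                ],
                "expected_outputs": self.expected_outputs,
                "replay_steps": self.replay_steps,
            },
            sort_keys=True,
        )

    @classmethod
    def from_json(cls, text: str) -> ReplayRecord:
        """Parse a record; unknown enum values raise ValueError."""
        d = json.loads(text)
        return cls(
            replay_id=d["replay_id"],
            scope=Scope(d["scope"]),
            determinism=Determinism(d["determinism"]),
            input_refs=d.get("input_refs", []),
            snapshot_refs=d.get("snapshot_refs", []),
            trace_refs=d.get("trace_refs", []),
            missing_facts=[
                MissingFact(m["fact"], MissingReason(m["reason"]))
                for m in d.get("missing_facts", [])
            ],
            expected_outputs=d.get("expected_outputs", {}),
            replay_steps=d.get("replay_steps"),
        )
```

Using enums means a record with an unrecognized determinism or scope value fails when it is loaded instead of passing through silently.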

Replay cases SHOULD state plainly where model output is non-deterministic and where external services are unavailable. A replay case is evidence for reconstruction, not a guarantee that future output will match byte-for-byte.
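One way to act on this guidance is to let the determinism value decide how a differing hash is read. The sketch below assumes artifacts are compared by SHA-256, uses a hypothetical "sha256:<hex>" reference format, and applies a policy that this draft does not define: only cases declared deterministic are expected to reproduce byte-for-byte.

```python
import hashlib

# Assumed policy, not defined by the draft: only cases declared
# "deterministic" are expected to reproduce artifacts byte-for-byte.
BYTE_EXACT_LEVELS = {"deterministic"}


def sha256_ref(data: bytes) -> str:
    """Hash bytes into a hypothetical "sha256:<hex>" reference string."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def compare_artifact(determinism: str, expected_ref: str, actual: bytes) -> str:
    """Classify a replayed artifact against the hash recorded in the case.

    Returns "not_replayed", "match", "mismatch", or "divergence_expected".
    A differing hash under an approximate or non_deterministic case is
    reported as expected divergence, because such a case never promised
    byte-for-byte output.
    """
    if determinism == "unavailable":
        return "not_replayed"
    if sha256_ref(actual) == expected_ref:
        return "match"
    if determinism in BYTE_EXACT_LEVELS:
        return "mismatch"
    return "divergence_expected"
```

For example, compare_artifact("deterministic", sha256_ref(b"v1"), b"v2") returns "mismatch", while the same call with "non_deterministic" returns "divergence_expected".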

Replay outcomes

A replay attempt SHOULD record whether it matched expected claims, artifact hashes, verification results, or review conditions. A mismatch is evidence, not an automatic failure of the original pack.
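A replay attempt's outcome can be recorded per check so that mismatches remain available as evidence. In this sketch the check kinds mirror the four comparisons listed above; the CheckResult and ReplayOutcome names, the string-valued expected and observed fields, and the not_observed status for outputs the replay could not produce are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical check kinds mirroring the comparisons named in the text.
CHECK_KINDS = {"claim", "artifact_hash", "verification", "review_condition"}


@dataclass
class CheckResult:
    kind: str
    subject: str  # what was compared, e.g. an artifact name
    expected: str
    observed: str | None  # None if the replay could not produce this output

    def __post_init__(self) -> None:
        if self.kind not in CHECK_KINDS:
            raise ValueError(f"unknown check kind: {self.kind}")

    @property
    def status(self) -> str:
        if self.observed is None:
            return "not_observed"
        return "matched" if self.observed == self.expected else "mismatched"


@dataclass
class ReplayOutcome:
    replay_id: str
    checks: list[CheckResult] = field(default_factory=list)

    def record(self, kind: str, subject: str, expected: str, observed: str | None) -> None:
        self.checks.append(CheckResult(kind, subject, expected, observed))

    def summary(self) -> dict[str, int]:
        counts = {"matched": 0, "mismatched": 0, "not_observed": 0}
        for check in self.checks:
            counts[check.status] += 1
        return counts

    def mismatches(self) -> list[CheckResult]:
        # Mismatches are returned as evidence for review; this class never
        # marks the original pack as failed.
        return [c for c in self.checks if c.status == "mismatched"]
```

Keeping "not_observed" separate from "mismatched" avoids counting an output the replay never produced as a contradiction of the original run.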

Draft standard for portable agent evidence, provenance, review, and replay.