Runtime profile test cases
Use these cases when Agent Evidence consumes Agent Runtime, Lime AgentRuntime Profile, or an equivalent runtime spine. The goal is to prove that evidence, replay, review, and audit exports all consume the same runtime facts instead of rebuilding independent summaries.
Canonical boundary
RuntimeEvent / ThreadReadModel / TaskSnapshot
-> EvidencePack / ReplayCase / ReviewRecord / ExportManifest
-> Agent UI timeline, review lane, replay lane, and audit entrypoints
Agent Runtime owns execution facts. Agent Evidence owns portable evidence packaging, provenance, verification, review, replay, redaction, and export. Evidence may summarize runtime facts, but it must not invent runtime status or known gaps.
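The ownership rule above can be illustrated with a minimal sketch in which the pack copies its status from runtime events and falls back to "unknown" rather than guessing. The class shapes, the `build_pack` helper, and the event kinds `turn.completed` and `turn.failed` are assumptions made for illustration; they are not defined by this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeEvent:
    event_id: str
    turn_id: str
    kind: str  # e.g. "tool.failed"; the terminal kinds below are hypothetical


@dataclass(frozen=True)
class EvidencePack:
    turn_id: str
    source_event_ids: tuple[str, ...]  # references back to runtime facts
    status: str


# Hypothetical mapping from terminal runtime events to a pack status.
_TERMINAL = {"turn.completed": "completed", "turn.failed": "failed"}


def build_pack(turn_id: str, events: list[RuntimeEvent]) -> EvidencePack:
    """Summarize one turn without inventing any status of its own."""
    scoped = [e for e in events if e.turn_id == turn_id]
    status = "unknown"  # stays unknown unless the runtime reported an outcome
    for event in scoped:
        if event.kind in _TERMINAL:
            status = _TERMINAL[event.kind]
    return EvidencePack(turn_id, tuple(e.event_id for e in scoped), status)
```

In this sketch the pack keeps only event ids and a copied status, so replay and review can point at the same source events instead of holding their own summaries.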
Required runtime correlation
Evidence records SHOULD preserve these ids when available:
| Field | Purpose |
|---|---|
| runtime_id / session_id / thread_id / turn_id | Join evidence to the execution spine. |
| task_id / run_id / attempt_id | Join evidence to task attempts and retry history. |
| step_id / tool_call_id / action_id | Join evidence to tools, actions, permission waits, and failures. |
| artifact_id / context_id / policy_decision_id | Join evidence to adjacent owners. |
| trace_id / span_id | Join evidence to telemetry when collected. |
| evidence_pack_id / replay_id / review_id / export_id | Join downstream evidence artifacts. |
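One way to honor "preserve these ids when available" is to copy whichever correlation fields a runtime fact carries and never synthesize a missing one. The sketch below assumes runtime facts arrive as flat mappings; the `carry_correlation` helper and that flat layout are hypothetical, while the field names come from the table above.

```python
from typing import Mapping

# Field names taken from the correlation table above.
CORRELATION_FIELDS = (
    "runtime_id", "session_id", "thread_id", "turn_id",
    "task_id", "run_id", "attempt_id",
    "step_id", "tool_call_id", "action_id",
    "artifact_id", "context_id", "policy_decision_id",
    "trace_id", "span_id",
    "evidence_pack_id", "replay_id", "review_id", "export_id",
)


def carry_correlation(runtime_fact: Mapping[str, object]) -> dict[str, object]:
    """Copy every correlation id that is present; omit the ones that are not."""
    return {
        name: runtime_fact[name]
        for name in CORRELATION_FIELDS
        if runtime_fact.get(name) is not None
    }
```

Omitting an absent id, rather than filling it with a placeholder, keeps a missing telemetry link visible as missing.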
Test cases
| ID | Case | Input facts | Expected result |
|---|---|---|---|
| AEV-AR-ID-001 | Evidence pack preserves runtime spine | Completed or failed turn | Pack scope includes session_id/thread_id/turn_id and applicable task/run ids. |
| AEV-AR-ID-002 | Failed attempt remains visible | task.attempt.failed then retry | Evidence timeline includes both failed and retried attempts. |
| AEV-AR-TOOL-001 | Tool failure is auditable | tool.failed with tool_call_id | Evidence links failure category, output refs, and telemetry refs without losing tool_call_id. |
| AEV-AR-ACTION-001 | Permission denial is not success | action.required -> action.resolved(deny) | Evidence records denied decision and does not claim tool execution success. |
| AEV-AR-ROUTE-001 | Routing decision is explainable | routing.single_candidate or routing.decided | Evidence includes selected model, decision source, and cost/limit refs if available. |
| AEV-AR-REPLAY-001 | Replay uses same source facts | Evidence pack already exists | Replay case points to pack/source ids instead of rebuilding a second timeline. |
| AEV-AR-REVIEW-001 | Review uses same source facts | Review generated from pack | Review verdict references evidence ids and runtime scope; it does not create parallel status truth. |
| AEV-AR-GAP-001 | Known gaps are only applicable gaps | No matching request telemetry | Pack records empty/unavailable telemetry summary, not fake unlinked evidence. |
| AEV-AR-REDACT-001 | Redaction preserves auditability | Sensitive tool/context output | Redacted pack keeps redaction reason, policy refs, and safe source refs. |
| AEV-AR-EXPORT-001 | Export manifest is complete | Evidence export requested | Manifest lists pack, replay, review, schema, redaction, and runtime scope refs. |
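Two of the cases in the table, AEV-AR-ID-002 and AEV-AR-ACTION-001, could be exercised roughly as follows. This is a sketch under stated assumptions: events are plain dicts, the helpers `attempts_timeline` and `summarize_tool_outcome` are hypothetical, and the event kinds `task.attempt.started` and `tool.completed` are invented for the example (the other kinds appear in the table).

```python
def attempts_timeline(events: list[dict]) -> list[tuple[str, str]]:
    """Keep every attempt event in order, including failed attempts."""
    return [
        (e["attempt_id"], e["kind"])
        for e in events
        if e["kind"].startswith("task.attempt.")
    ]


def summarize_tool_outcome(events: list[dict]) -> str:
    """Derive the tool outcome strictly from runtime events."""
    kinds = [e["kind"] for e in events]
    denied = any(
        e["kind"] == "action.resolved" and e.get("decision") == "deny"
        for e in events
    )
    if denied:
        return "denied"
    if "tool.failed" in kinds:
        return "failed"
    if "tool.completed" in kinds:  # hypothetical success event
        return "succeeded"
    return "unknown"


def test_aev_ar_id_002() -> None:
    events = [
        {"kind": "task.attempt.failed", "attempt_id": "a1"},
        {"kind": "task.attempt.started", "attempt_id": "a2"},  # the retry
    ]
    attempt_ids = [attempt_id for attempt_id, _ in attempts_timeline(events)]
    assert attempt_ids == ["a1", "a2"]


def test_aev_ar_action_001() -> None:
    events = [
        {"kind": "action.required", "action_id": "act1"},
        {"kind": "action.resolved", "action_id": "act1", "decision": "deny"},
    ]
    assert summarize_tool_outcome(events) == "denied"
```

Both functions are plain assertions, so they can run under any test runner that collects `test_*` functions or be called directly.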
Failure cases
These are incompatible with Agent Evidence:
- Replay, review, and evidence each construct different timelines for the same turn.
- A missing telemetry relationship is exported as a fabricated session-level evidence item.
- Evidence claims a tool succeeded when runtime only has a denied action or failed tool event.
- Redaction removes the reason and owner, making the audit trail unverifiable.
- Evidence pack lacks enough runtime ids to join back to the originating turn or task.
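Some of these failures can be checked mechanically before export. The sketch below covers only two of them (missing join ids and redactions stripped of reason or owner) and assumes a hypothetical pack layout with a `scope` mapping and a `redactions` list; neither the layout nor the `find_violations` helper is prescribed by this document.

```python
def find_violations(pack: dict) -> list[str]:
    """Return human-readable violations for a pack in the assumed layout."""
    violations = []

    # The pack must join back to the originating turn or task.
    scope = pack.get("scope", {})
    if not (scope.get("turn_id") or scope.get("task_id")):
        violations.append("pack cannot be joined back to a turn or task")

    # Redaction must keep its reason and owner so the audit trail stays verifiable.
    for item in pack.get("redactions", []):
        if not item.get("reason") or not item.get("owner"):
            violations.append(f"redaction {item.get('id')} lacks reason or owner")

    return violations
```

An empty list means only that these two checks passed; divergent timelines and fabricated evidence items need comparison against the runtime source and are not covered here.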
Minimum validation set
For an Agent Runtime-compatible integration, run at least:
- AEV-AR-ID-001 and AEV-AR-ID-002.
- AEV-AR-TOOL-001, AEV-AR-ACTION-001, and AEV-AR-ROUTE-001.
- AEV-AR-REPLAY-001 and AEV-AR-REVIEW-001.
- AEV-AR-GAP-001 and AEV-AR-EXPORT-001.
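The minimum set can be enforced with a simple set difference over the case ids that passed. The case ids below are copied from the list above; the `missing_from_minimum` helper and the idea of feeding it a set of passed ids are assumptions for illustration.

```python
# Case ids copied from the minimum validation set above.
MINIMUM_SET = frozenset({
    "AEV-AR-ID-001", "AEV-AR-ID-002",
    "AEV-AR-TOOL-001", "AEV-AR-ACTION-001", "AEV-AR-ROUTE-001",
    "AEV-AR-REPLAY-001", "AEV-AR-REVIEW-001",
    "AEV-AR-GAP-001", "AEV-AR-EXPORT-001",
})


def missing_from_minimum(passed: set[str]) -> list[str]:
    """Return the minimum-set case ids that have not passed, sorted."""
    return sorted(MINIMUM_SET - set(passed))
```

An empty result means every case in the minimum set passed; extra cases in `passed` are ignored.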
Runtime proves what happened. Agent Evidence proves how that runtime fact can be trusted, replayed, reviewed, redacted, and exported.