Runtime profile test cases

Use these cases when Agent Evidence consumes Agent Runtime, Lime AgentRuntime Profile, or an equivalent runtime spine. The goal is to prove that evidence, replay, review, and audit exports all consume the same runtime facts instead of rebuilding independent summaries.

Canonical boundary

```text
RuntimeEvent / ThreadReadModel / TaskSnapshot
  -> EvidencePack / ReplayCase / ReviewRecord / ExportManifest
  -> Agent UI timeline, review lane, replay lane, and audit entrypoints
```

Agent Runtime owns execution facts. Agent Evidence owns portable evidence packaging, provenance, verification, review, replay, redaction, and export. Evidence may summarize runtime facts, but it must not invent runtime status or known gaps.

Required runtime correlation

Evidence records SHOULD preserve these ids when available:

| Field | Purpose |
| --- | --- |
| runtime_id / session_id / thread_id / turn_id | Join evidence to the execution spine. |
| task_id / run_id / attempt_id | Join evidence to task attempts and retry history. |
| step_id / tool_call_id / action_id | Join evidence to tools, actions, permission waits, and failures. |
| artifact_id / context_id / policy_decision_id | Join evidence to adjacent owners. |
| trace_id / span_id | Join evidence to telemetry when collected. |
| evidence_pack_id / replay_id / review_id / export_id | Join downstream evidence artifacts. |
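
As an illustration of the "preserve when available" rule, the following minimal sketch models the runtime-side correlation fields as optional values and reports any id that a runtime fact carried but an evidence record lost. The class `RuntimeCorrelation` and the helpers `preserved_ids` and `dropped_ids` are hypothetical names chosen for this example; they are not defined by Agent Evidence or Agent Runtime.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class RuntimeCorrelation:
    """Hypothetical container for the runtime-side ids in the table above.

    Downstream ids (evidence_pack_id, replay_id, review_id, export_id) are
    left out because they are assigned after the runtime fact exists.
    """
    runtime_id: Optional[str] = None
    session_id: Optional[str] = None
    thread_id: Optional[str] = None
    turn_id: Optional[str] = None
    task_id: Optional[str] = None
    run_id: Optional[str] = None
    attempt_id: Optional[str] = None
    step_id: Optional[str] = None
    tool_call_id: Optional[str] = None
    action_id: Optional[str] = None
    artifact_id: Optional[str] = None
    context_id: Optional[str] = None
    policy_decision_id: Optional[str] = None
    trace_id: Optional[str] = None
    span_id: Optional[str] = None


def preserved_ids(correlation):
    """Return only the ids that were actually available, keyed by field name."""
    return {
        f.name: getattr(correlation, f.name)
        for f in fields(correlation)
        if getattr(correlation, f.name) is not None
    }


def dropped_ids(runtime_fact, evidence_record):
    """List ids present on the runtime fact but missing or changed on the evidence record."""
    source = preserved_ids(runtime_fact)
    kept = preserved_ids(evidence_record)
    return sorted(name for name, value in source.items() if kept.get(name) != value)
```

Under these assumptions, an evidence record that keeps session_id, thread_id, and turn_id but omits a tool_call_id present on the runtime fact would be reported as having dropped `tool_call_id`.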

Test cases

| ID | Case | Input facts | Expected result |
| --- | --- | --- | --- |
| AEV-AR-ID-001 | Evidence pack preserves runtime spine | Completed or failed turn | Pack scope includes session_id/thread_id/turn_id and applicable task/run ids. |
| AEV-AR-ID-002 | Failed attempt remains visible | task.attempt.failed then retry | Evidence timeline includes both failed and retried attempts. |
| AEV-AR-TOOL-001 | Tool failure is auditable | tool.failed with tool_call_id | Evidence links failure category, output refs, and telemetry refs without losing tool_call_id. |
| AEV-AR-ACTION-001 | Permission denial is not success | action.required -> action.resolved(deny) | Evidence records denied decision and does not claim tool execution success. |
| AEV-AR-ROUTE-001 | Routing decision is explainable | routing.single_candidate or routing.decided | Evidence includes selected model, decision source, and cost/limit refs if available. |
| AEV-AR-REPLAY-001 | Replay uses same source facts | Evidence pack already exists | Replay case points to pack/source ids instead of rebuilding a second timeline. |
| AEV-AR-REVIEW-001 | Review uses same source facts | Review generated from pack | Review verdict references evidence ids and runtime scope; it does not create parallel status truth. |
| AEV-AR-GAP-001 | Known gaps are only applicable gaps | No matching request telemetry | Pack records empty/unavailable telemetry summary, not fake unlinked evidence. |
| AEV-AR-REDACT-001 | Redaction preserves auditability | Sensitive tool/context output | Redacted pack keeps redaction reason, policy refs, and safe source refs. |
| AEV-AR-EXPORT-001 | Export manifest is complete | Evidence export requested | Manifest lists pack, replay, review, schema, redaction, and runtime scope refs. |
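
Two of the cases above, AEV-AR-ACTION-001 and AEV-AR-ID-002, could be checked roughly as follows. This is a sketch under stated assumptions: runtime events and evidence items are plain dictionaries, and the `decision` and `outcome` keys and their values are hypothetical field names invented for the example. Only the event type names come from the table.

```python
def check_action_denial(runtime_events, evidence_items):
    """AEV-AR-ACTION-001: a denied action must be recorded as denied, never as success."""
    denied = {
        e["action_id"]
        for e in runtime_events
        if e["type"] == "action.resolved" and e.get("decision") == "deny"
    }
    problems = []
    # Any evidence item that reports success for a denied action is a violation.
    for item in evidence_items:
        if item.get("action_id") in denied and item.get("outcome") == "success":
            problems.append(f"{item['action_id']}: denied action reported as success")
    # The denial itself must also be present in the evidence.
    recorded = {i.get("action_id") for i in evidence_items if i.get("outcome") == "denied"}
    for action_id in sorted(denied - recorded):
        problems.append(f"{action_id}: denied decision not recorded")
    return problems


def check_failed_attempts_visible(runtime_events, evidence_items):
    """AEV-AR-ID-002: every failed attempt in runtime must stay in the evidence timeline."""
    failed = {e["attempt_id"] for e in runtime_events if e["type"] == "task.attempt.failed"}
    seen = {i.get("attempt_id") for i in evidence_items}
    return [f"{a}: failed attempt missing from evidence" for a in sorted(failed - seen)]
```

Both helpers return a list of problem descriptions, so an empty list means the case passed for the given inputs.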

Failure cases

These are incompatible with Agent Evidence:

  1. Replay, review, and evidence each construct different timelines for the same turn.
  2. A missing telemetry relationship is exported as a fabricated session-level evidence item.
  3. Evidence claims a tool succeeded when runtime only has a denied action or failed tool event.
  4. Redaction removes the reason and owner, making the audit trail unverifiable.
  5. Evidence pack lacks enough runtime ids to join back to the originating turn or task.
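
The first two failure cases can be made concrete with a small sketch. It assumes that each artifact exposes the list of runtime event ids it references and that telemetry records carry `turn_id` and `trace_id` keys; the function names, the dictionary shapes, and the `status` values are hypothetical and serve only to illustrate the rule.

```python
def timeline_divergence(pack_event_ids, replay_event_ids, review_event_ids):
    """Return, per artifact, the event ids it references that the evidence pack lacks.

    Replay and review may reference a subset of the pack, but an id outside the
    pack suggests the artifact built its own timeline.
    """
    pack = set(pack_event_ids)
    report = {}
    for name, ids in (("replay", replay_event_ids), ("review", review_event_ids)):
        extra = [event_id for event_id in ids if event_id not in pack]
        if extra:
            report[name] = extra
    return report


def telemetry_summary(turn_id, telemetry_records):
    """Summarize telemetry for one turn; report 'unavailable' rather than inventing a link."""
    refs = [r["trace_id"] for r in telemetry_records if r.get("turn_id") == turn_id]
    if not refs:
        return {"turn_id": turn_id, "status": "unavailable", "trace_refs": []}
    return {"turn_id": turn_id, "status": "linked", "trace_refs": refs}
```

The design choice in `telemetry_summary` is that a missing relationship yields an explicit empty summary, which keeps the gap visible without attaching unrelated telemetry to the turn.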

Minimum validation set

For an Agent Runtime-compatible integration, run at least:

  1. AEV-AR-ID-001 and AEV-AR-ID-002.
  2. AEV-AR-TOOL-001, AEV-AR-ACTION-001, and AEV-AR-ROUTE-001.
  3. AEV-AR-REPLAY-001 and AEV-AR-REVIEW-001.
  4. AEV-AR-GAP-001 and AEV-AR-EXPORT-001.
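
A test harness might track coverage of this minimum set as sketched below. The constant mirrors the case IDs exactly as listed above; the `validation_gaps` helper and the shape of `results` (case ID mapped to a pass/fail boolean) are assumptions made for the example.

```python
# Case IDs copied from the minimum validation set above, in the same order.
MINIMUM_VALIDATION_SET = (
    "AEV-AR-ID-001", "AEV-AR-ID-002",
    "AEV-AR-TOOL-001", "AEV-AR-ACTION-001", "AEV-AR-ROUTE-001",
    "AEV-AR-REPLAY-001", "AEV-AR-REVIEW-001",
    "AEV-AR-GAP-001", "AEV-AR-EXPORT-001",
)


def validation_gaps(results):
    """Split the minimum set into cases that were never run and cases that failed.

    `results` maps a case ID to True (passed) or False (failed).
    """
    not_run = [case for case in MINIMUM_VALIDATION_SET if case not in results]
    failed = [case for case in MINIMUM_VALIDATION_SET if results.get(case) is False]
    return {"not_run": not_run, "failed": failed}
```

An integration would count as having met the minimum set only when both lists come back empty.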

Runtime proves what happened. Agent Evidence proves how that runtime fact can be trusted, replayed, reviewed, redacted, and exported.

Draft standard for portable agent evidence, provenance, review, and replay.