
Runtime profile test cases

Use these cases when a product source is compatible with Agent Runtime, Lime AgentRuntime Profile, or an equivalent runtime spine. The goal is not to test the runtime implementation itself; the goal is to prove that Agent UI projects runtime facts without creating a second source of truth.

Canonical projection chain

```text
RuntimeEvent / ThreadReadModel / TaskSnapshot / EvidencePack
  -> Agent UI adapter
  -> projection store
  -> status, task, tool, HITL, timeline, evidence, review, replay, and team surfaces
  -> controlled writes back to runtime, artifact, policy, or evidence owners
```

Agent UI tests fail when UI state invents status, approval, tool success, evidence verdict, known gaps, or task completion that cannot be traced back to runtime, artifact, policy, or evidence facts.
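One way to make this rule mechanical is to require every projected field to name the fact that backs it, and to allow only explicit placeholders when no fact exists. This is a minimal sketch, assuming hypothetical `SourceFact` and `ProjectedField` shapes; neither is part of Agent Runtime's published schema.

```typescript
// Hypothetical shapes for illustration; not an Agent Runtime schema.
interface SourceFact {
  id: string;
  owner: "runtime" | "artifact" | "policy" | "evidence";
}

interface ProjectedField {
  name: string;                // e.g. "status", "approval", "taskCompletion"
  value: unknown;
  sourceFactId: string | null; // id of the backing fact, or null for a placeholder
}

// Honest placeholders are always allowed because they claim nothing.
const PLACEHOLDERS = ["unknown", "unavailable", "stale"];

// Returns names of fields that show a concrete value that no known fact backs.
function findUntraceableFields(fields: ProjectedField[], facts: SourceFact[]): string[] {
  const known = new Set<string>();
  facts.forEach((fact) => known.add(fact.id));
  const out: string[] = [];
  for (const field of fields) {
    const isPlaceholder =
      typeof field.value === "string" && PLACEHOLDERS.indexOf(field.value) >= 0;
    if (isPlaceholder) continue;
    if (field.sourceFactId === null || !known.has(field.sourceFactId)) {
      out.push(field.name);
    }
  }
  return out;
}

const facts: SourceFact[] = [{ id: "evt-42", owner: "runtime" }];
const flagged = findUntraceableFields(
  [
    { name: "status", value: "running", sourceFactId: "evt-42" },
    { name: "approval", value: "approved", sourceFactId: null }, // invented by the UI
    { name: "evidenceVerdict", value: "unknown", sourceFactId: null },
  ],
  facts,
); // ["approval"]
```

A dangling `sourceFactId` is treated the same as a missing one, because a reference that resolves to nothing cannot be traced back either.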

Source fixtures

When Agent Runtime fixtures are available, map them into Agent UI projection assertions:

| Runtime fixture | UI surfaces to verify | Required projection outcome |
| --- | --- | --- |
| submit-turn-event.json | Runtime status, message shell, task capsule | Accepted work appears before first text and preserves sessionId/threadId/turnId. |
| tool-approval-action-required-event.json | Human-in-the-loop, task attention, tool UI | Approval card uses actionId/toolCallId; UI does not run the tool optimistically. |
| task-retry-attempt-failed-event.json | Task capsule, timeline evidence | Failed attempt remains visible and retry state does not overwrite history. |
| routing-single-candidate-event.json | Runtime status, model chip, diagnostics | Single-candidate routing is explained as runtime fact, not final answer prose. |
| evidence-export-event.json | Timeline/evidence, review, replay | Evidence, replay, and review refs point to the same fact source. |
| thread-read-snapshot.json | Session hydration, status, task capsule | The UI can hydrate directly from snapshot without recomputing runtime truth. |
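A table-driven harness keeps each fixture paired with its required outcome. The sketch below inlines a simplified stand-in for `submit-turn-event.json`; the real fixture's field names may differ, and `projectEvents`, `FixtureCase`, and `runCases` are hypothetical names for illustration only.

```typescript
// Simplified stand-in for a runtime event; the real fixture schema may differ.
interface RuntimeEvent {
  type: string; // e.g. "turn.submitted", "turn.started", "text.delta"
  sessionId: string;
  threadId: string;
  turnId: string;
  text?: string;
}

interface UiState {
  renderLog: string[]; // what the UI rendered, in order
  ids: { sessionId: string; threadId: string; turnId: string } | null;
}

// Hypothetical minimal projection: status comes only from lifecycle events.
function projectEvents(events: RuntimeEvent[]): UiState {
  const state: UiState = { renderLog: [], ids: null };
  for (const e of events) {
    state.ids = { sessionId: e.sessionId, threadId: e.threadId, turnId: e.turnId };
    if (e.type === "turn.submitted") state.renderLog.push("status:accepted");
    else if (e.type === "turn.started") state.renderLog.push("status:running");
    else if (e.type === "text.delta") state.renderLog.push("text");
  }
  return state;
}

interface FixtureCase {
  fixture: string;                     // fixture file name from the table
  events: RuntimeEvent[];              // inlined stand-in for the fixture contents
  check: (state: UiState) => string[]; // returns failure messages
}

const ids = { sessionId: "s-1", threadId: "th-1", turnId: "tu-1" };

const cases: FixtureCase[] = [
  {
    fixture: "submit-turn-event.json",
    events: [
      { type: "turn.submitted", ...ids },
      { type: "turn.started", ...ids },
      { type: "text.delta", text: "Hello", ...ids },
    ],
    check: (state) => {
      const failures: string[] = [];
      const accepted = state.renderLog.indexOf("status:accepted");
      const firstText = state.renderLog.indexOf("text");
      if (accepted < 0 || (firstText >= 0 && accepted > firstText)) {
        failures.push("accepted status must render before first text");
      }
      const got = state.ids;
      if (!got || got.sessionId !== ids.sessionId || got.threadId !== ids.threadId || got.turnId !== ids.turnId) {
        failures.push("sessionId/threadId/turnId were not preserved");
      }
      return failures;
    },
  },
];

function runCases(all: FixtureCase[]): Record<string, string[]> {
  const out: Record<string, string[]> = {};
  for (const c of all) out[c.fixture] = c.check(projectEvents(c.events));
  return out;
}

const results = runCases(cases); // { "submit-turn-event.json": [] }
```

Further rows from the table become additional `FixtureCase` entries, each with its own `check`.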

Identity preservation

| ID | Case | Input | Expected result |
| --- | --- | --- | --- |
| AUI-AR-ID-001 | Preserve runtime spine ids | Source event carries runtimeId/sessionId/threadId/turnId | Projection keeps these ids on status, task, and timeline records. |
| AUI-AR-ID-002 | Preserve task/run/attempt ids | Source event carries taskId/runId/attemptId | Task capsule and timeline can link the active attempt and prior attempts. |
| AUI-AR-ID-003 | Preserve tool/action ids | Source event carries toolCallId/actionId | Tool row, approval card, and evidence timeline join through the same ids. |
| AUI-AR-ID-004 | Preserve evidence trace ids | Source event carries evidenceId/traceId/evidencePackRef | Evidence surface links to durable details without copying the full payload. |
| AUI-AR-ID-005 | Preserve parent/child lineage | Source event carries parentSessionId/parentThreadId/subagentId | Delegation graph and teammate transcript retain lineage after hydration. |
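Identity checks reduce to comparing the ids on every projected record against the source event. This is a minimal sketch, assuming a hypothetical `SpineIds` record and a three-surface projection; the tool, evidence, and lineage ids from the other rows would extend the same key list.

```typescript
// Hypothetical id set mirroring the fields named in AUI-AR-ID-001 and AUI-AR-ID-002.
interface SpineIds {
  runtimeId: string;
  sessionId: string;
  threadId: string;
  turnId: string;
  taskId?: string;
  runId?: string;
  attemptId?: string;
}

type Surface = "status" | "task" | "timeline";

interface ProjectedRecord {
  surface: Surface;
  ids: SpineIds;
}

// The adapter copies ids verbatim onto every surface; it never mints or rewrites them.
function projectToSurfaces(spine: SpineIds): ProjectedRecord[] {
  const surfaces: Surface[] = ["status", "task", "timeline"];
  return surfaces.map((surface) => ({ surface, ids: { ...spine } }));
}

const ID_KEYS: (keyof SpineIds)[] = [
  "runtimeId", "sessionId", "threadId", "turnId", "taskId", "runId", "attemptId",
];

// Lists "surface.key" pairs whose id differs from the source event.
function findIdDrift(spine: SpineIds, records: ProjectedRecord[]): string[] {
  const out: string[] = [];
  for (const record of records) {
    for (const key of ID_KEYS) {
      if (record.ids[key] !== spine[key]) out.push(`${record.surface}.${key}`);
    }
  }
  return out;
}

const spine: SpineIds = {
  runtimeId: "rt-1", sessionId: "s-1", threadId: "th-1", turnId: "tu-1",
  taskId: "task-1", runId: "run-1", attemptId: "att-2",
};
const drift = findIdDrift(spine, projectToSurfaces(spine)); // []
```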

Status and read model projection

| ID | Case | Source facts | Expected result |
| --- | --- | --- | --- |
| AUI-AR-READ-001 | Accepted before first text | turn.submitted or accepted read model | Runtime status shows accepted/preparing before text.delta. |
| AUI-AR-READ-002 | Running turn | turn.started / active turn in read model | Status and task capsule show running; no final answer is fabricated. |
| AUI-AR-READ-003 | Waiting on permission | action.required in read model | HITL surface appears and task attention state becomes waiting/needs-input. |
| AUI-AR-READ-004 | Completed turn | turn.completed plus snapshot update | Status reconciles to completed and process details archive to timeline. |
| AUI-AR-READ-005 | Failed turn | turn.failed with failure category | Failure is visible in status, task capsule, and timeline; final answer does not claim success. |
| AUI-AR-READ-006 | Missing source field | Snapshot lacks optional routing or evidence summary | UI renders unknown/unavailable and keeps diagnostics safe. |
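These rows amount to a pure mapping from read-model lifecycle facts to a display status, with explicit fallbacks for missing fields. This is a minimal sketch, assuming a hypothetical `TurnReadModel` slice and lifecycle names that only loosely follow the event names above; it is not the ThreadReadModel schema.

```typescript
// Hypothetical read-model slice; field names are illustrative, not the ThreadReadModel schema.
type Lifecycle = "submitted" | "started" | "action_required" | "completed" | "failed";

interface TurnReadModel {
  lifecycle?: Lifecycle;
  failureCategory?: string;
  routingSummary?: string;
}

type DisplayStatus = "accepted" | "running" | "waiting" | "completed" | "failed" | "unknown";

// There is deliberately no field for answer text: status never comes from prose.
interface StatusView {
  status: DisplayStatus;
  failureCategory: string | null;
  routing: string;
}

const STATUS_BY_LIFECYCLE: Record<Lifecycle, DisplayStatus> = {
  submitted: "accepted",
  started: "running",
  action_required: "waiting",
  completed: "completed",
  failed: "failed",
};

function projectStatus(model: TurnReadModel): StatusView {
  // A missing lifecycle renders as "unknown" rather than a guessed state.
  const status: DisplayStatus = model.lifecycle ? STATUS_BY_LIFECYCLE[model.lifecycle] : "unknown";
  return {
    status,
    failureCategory: status === "failed" ? (model.failureCategory ?? "unknown") : null,
    routing: model.routingSummary ?? "unavailable",
  };
}

projectStatus({ lifecycle: "started" });
// { status: "running", failureCategory: null, routing: "unavailable" }
```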

Tool approval and controlled write cases

| ID | Case | Trigger | Expected result |
| --- | --- | --- | --- |
| AUI-AR-ACTION-001 | Approval request appears | Runtime emits action.required | UI shows approve/reject/respond controls with stable actionId. |
| AUI-AR-ACTION-002 | Approval write is controlled | User clicks approve | UI calls runtime action response API and waits for action.resolved. |
| AUI-AR-ACTION-003 | Denial does not execute tool | User clicks reject | UI waits for runtime denial/tool failure fact; no optimistic tool.result appears. |
| AUI-AR-ACTION-004 | Duplicate response is safe | User repeats response | UI remains idempotent and does not create a second resolved fact. |
| AUI-AR-ACTION-005 | Tool output is offloaded | Tool result includes large output ref | UI shows summary/ref and loads detail on demand. |
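The controlled-write pattern is: forward the user's decision once, show it as submitted, and move to resolved only when the runtime's action.resolved fact arrives. This is a minimal sketch; `ApprovalCard` is hypothetical, and `ActionResponder` stands in for whatever runtime action response API the host exposes.

```typescript
// Hypothetical stand-in for the host's runtime action-response API.
type Decision = "approve" | "reject";
type ActionResponder = (actionId: string, decision: Decision) => void;

type CardState = "awaiting_user" | "submitted" | "resolved";

class ApprovalCard {
  actionId: string;
  toolCallId: string;
  state: CardState = "awaiting_user";
  outcome: string | null = null; // set only from the runtime's action.resolved fact
  sendResponse: ActionResponder;
  responded = false;

  constructor(actionId: string, toolCallId: string, sendResponse: ActionResponder) {
    this.actionId = actionId;
    this.toolCallId = toolCallId;
    this.sendResponse = sendResponse;
  }

  // User click: forward the decision once, then wait. Nothing is marked successful
  // here, and no tool result is created on the UI side.
  respond(decision: Decision): void {
    if (this.responded) return; // repeated clicks never produce a second write
    this.responded = true;
    this.state = "submitted";
    this.sendResponse(this.actionId, decision);
  }

  // Runtime fact: the only path to "resolved".
  onActionResolved(fact: { actionId: string; outcome: string }): void {
    if (fact.actionId !== this.actionId || this.state === "resolved") return;
    this.state = "resolved";
    this.outcome = fact.outcome;
  }
}

const sentResponses: Array<[string, Decision]> = [];
const card = new ApprovalCard("act-7", "call-3", (id, d) => {
  sentResponses.push([id, d]);
});
card.respond("approve");
card.respond("approve"); // ignored: one write only
card.onActionResolved({ actionId: "act-7", outcome: "approved" });
```

The UI-side guard only keeps the card from sending duplicates; rejecting a duplicate response that does reach the runtime remains the runtime's job.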

Task, routing, and limit cases

| ID | Case | Source facts | Expected result |
| --- | --- | --- | --- |
| AUI-AR-TASK-001 | Retry keeps history | task.attempt.failed -> task.retrying -> task.attempt.started | Task capsule shows retrying/current attempt while timeline keeps failed attempt. |
| AUI-AR-TASK-002 | Blocked task is visible | quota.blocked or routing.not_possible | Task capsule shows blocked/failed with reason, not infinite running. |
| AUI-AR-TASK-003 | Single model candidate | routing.single_candidate | Runtime status/model chip explains selected model and decision source. |
| AUI-AR-TASK-004 | Cost/limit state | cost.estimated, rate_limit.hit, or limit summary | UI shows cost/limit state as diagnostics or status, not final prose. |
| AUI-AR-TASK-005 | Subagent lineage | subagent.spawned or parent/child task snapshot | Team/delegation surfaces show child owner and parent task. |
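AUI-AR-TASK-001 and AUI-AR-TASK-002 can be met by a reducer that keeps attempts append-only and lets a blocker replace the running state. This is a minimal sketch, assuming hypothetical event payloads built from the event names in the table; the real payload fields may differ.

```typescript
// Hypothetical task events named after the source facts in the table above.
type TaskEvent =
  | { type: "task.attempt.started"; attemptId: string }
  | { type: "task.attempt.failed"; attemptId: string; reason: string }
  | { type: "task.retrying" }
  | { type: "quota.blocked"; reason: string }
  | { type: "routing.not_possible"; reason: string };

interface Attempt {
  attemptId: string;
  outcome: "running" | "failed";
  reason?: string;
}

interface TaskCapsule {
  state: "idle" | "running" | "retrying" | "blocked";
  blockedReason: string | null;
  attempts: Attempt[]; // append-only: a retry never overwrites an earlier attempt
}

function reduceTask(capsule: TaskCapsule, event: TaskEvent): TaskCapsule {
  switch (event.type) {
    case "task.attempt.started": {
      const attempt: Attempt = { attemptId: event.attemptId, outcome: "running" };
      return { ...capsule, state: "running", attempts: [...capsule.attempts, attempt] };
    }
    case "task.attempt.failed": {
      const failedId = event.attemptId;
      const reason = event.reason;
      const attempts = capsule.attempts.map((a): Attempt =>
        a.attemptId === failedId ? { ...a, outcome: "failed", reason } : a,
      );
      return { ...capsule, attempts };
    }
    case "task.retrying":
      return { ...capsule, state: "retrying" };
    case "quota.blocked":
    case "routing.not_possible":
      // A blocker ends "running", so the capsule cannot spin forever.
      return { ...capsule, state: "blocked", blockedReason: event.reason };
    default:
      return capsule; // ignore event types this sketch does not model
  }
}

const taskEvents: TaskEvent[] = [
  { type: "task.attempt.started", attemptId: "att-1" },
  { type: "task.attempt.failed", attemptId: "att-1", reason: "timeout" },
  { type: "task.retrying" },
  { type: "task.attempt.started", attemptId: "att-2" },
];
const initialCapsule: TaskCapsule = { state: "idle", blockedReason: null, attempts: [] };
const capsule = taskEvents.reduce(reduceTask, initialCapsule);
// capsule.state is "running"; att-1 (failed, "timeout") and att-2 (running) are both kept
```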

Evidence, replay, and review cases

| ID | Case | Source facts | Expected result |
| --- | --- | --- | --- |
| AUI-AR-EVID-001 | Evidence export progress | evidence.changed with pending/exporting status | Timeline/evidence surface shows progress without blocking text streaming. |
| AUI-AR-EVID-002 | Evidence pack ready | evidence.changed with evidencePackRef | Evidence surface links durable pack details. |
| AUI-AR-EVID-003 | Replay and review share source | replayRef and reviewRef reference same pack/source ids | Replay/review lanes do not recompute their own status truth. |
| AUI-AR-EVID-004 | Known gaps are not guessed | Runtime has no matching telemetry | UI shows unavailable/empty summary, not a fabricated unlinked gap. |
| AUI-AR-EVID-005 | Failed tool is auditable | Tool failure has toolCallId and evidence ref | Timeline links failure to tool row and evidence detail. |
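To keep the replay, review, and timeline lanes consistent, each lane can read status through one shared store instead of rebuilding it. This is a minimal sketch, assuming a hypothetical `EvidenceStore` keyed by pack id; real evidence packs carry far more than a status and a list of tool call ids.

```typescript
// Hypothetical evidence-pack record; real packs carry far more detail.
interface EvidencePack {
  packId: string;
  status: "pending" | "exporting" | "ready" | "failed";
  toolCallIds: string[];
}

type LaneStatus = EvidencePack["status"] | "unavailable";

interface LaneView {
  lane: "replay" | "review" | "timeline";
  packId: string;
  status: LaneStatus;
}

// The single place every lane reads evidence status from.
class EvidenceStore {
  packs = new Map<string, EvidencePack>();

  upsert(pack: EvidencePack): void {
    this.packs.set(pack.packId, pack);
  }

  statusOf(packId: string): LaneStatus {
    const pack = this.packs.get(packId);
    return pack ? pack.status : "unavailable"; // no matching fact: say so, do not guess
  }
}

function laneView(store: EvidenceStore, lane: LaneView["lane"], packId: string): LaneView {
  return { lane, packId, status: store.statusOf(packId) };
}

// Joins a failed tool call to the packs that reference it (AUI-AR-EVID-005).
function packsForToolCall(store: EvidenceStore, toolCallId: string): string[] {
  const matches: string[] = [];
  store.packs.forEach((pack) => {
    if (pack.toolCallIds.indexOf(toolCallId) >= 0) matches.push(pack.packId);
  });
  return matches;
}

const evidence = new EvidenceStore();
evidence.upsert({ packId: "ep-1", status: "exporting", toolCallIds: ["call-3"] });
const replay = laneView(evidence, "replay", "ep-1"); // status "exporting"
const review = laneView(evidence, "review", "ep-1"); // status "exporting"
const absent = laneView(evidence, "review", "ep-404"); // status "unavailable"
```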

Session hydration cases

| ID | Case | Input | Expected result |
| --- | --- | --- | --- |
| AUI-AR-HYDRATE-001 | Hydrate from thread snapshot | ThreadReadModel snapshot | Shell, status, pending actions, queued turns, and recent messages render without replaying all events. |
| AUI-AR-HYDRATE-002 | Repair from event stream | Snapshot stale, event stream available | Projection rebuilds read model state and marks stale sections until repaired. |
| AUI-AR-HYDRATE-003 | Evidence lazy load | Evidence refs exist but payload is not loaded | Timeline shows refs; payload loads only when requested. |
| AUI-AR-HYDRATE-004 | Parent/child ids survive | Old session has subagent lineage | Delegation graph and teammate transcript can still resolve parent/child ids. |
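Hydration can start from the snapshot, apply only events newer than the snapshot's sequence number, and flag the thread as stale at the first gap. Evidence payloads stay as refs until opened. This is a minimal sketch, assuming hypothetical `ThreadSnapshot` and `ThreadEvent` shapes with a per-thread sequence number; the real ThreadReadModel may track freshness differently, and a real loader would be asynchronous.

```typescript
// Hypothetical snapshot and event shapes; not the ThreadReadModel schema.
interface ThreadSnapshot {
  lastSeq: number;            // last event sequence number the snapshot reflects
  status: string;
  pendingActionIds: string[];
  evidenceRefs: string[];     // refs only; payloads are fetched on demand
}

interface ThreadEvent {
  seq: number;
  type: "status.changed" | "action.resolved";
  value: string; // new status, or the resolved actionId
}

interface HydratedThread extends ThreadSnapshot {
  stale: boolean;
}

// Start from the snapshot, then apply only newer events in sequence order.
function hydrate(snapshot: ThreadSnapshot, events: ThreadEvent[]): HydratedThread {
  const thread: HydratedThread = {
    ...snapshot,
    pendingActionIds: snapshot.pendingActionIds.slice(),
    stale: false,
  };
  const newer = events.filter((e) => e.seq > snapshot.lastSeq).sort((a, b) => a.seq - b.seq);
  for (const e of newer) {
    if (e.seq !== thread.lastSeq + 1) {
      thread.stale = true; // gap in the stream: keep current state and flag it until repaired
      break;
    }
    thread.lastSeq = e.seq;
    if (e.type === "status.changed") thread.status = e.value;
    else thread.pendingActionIds = thread.pendingActionIds.filter((id) => id !== e.value);
  }
  return thread;
}

// Evidence payloads load only when a ref is opened, then stay cached.
class LazyEvidence {
  cache = new Map<string, string>();
  load: (ref: string) => string; // hypothetical synchronous loader

  constructor(load: (ref: string) => string) {
    this.load = load;
  }

  open(ref: string): string {
    const cached = this.cache.get(ref);
    if (cached !== undefined) return cached;
    const payload = this.load(ref);
    this.cache.set(ref, payload);
    return payload;
  }
}

const snap: ThreadSnapshot = { lastSeq: 10, status: "waiting", pendingActionIds: ["act-7"], evidenceRefs: ["ep-1"] };
const thread = hydrate(snap, [
  { seq: 9, type: "status.changed", value: "running" }, // already reflected in the snapshot
  { seq: 11, type: "action.resolved", value: "act-7" },
  { seq: 12, type: "status.changed", value: "running" },
]);
// thread.status is "running", pendingActionIds is empty, stale is false
```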

Governance failure cases

These are explicit failures for compatible Agent UI implementations:

  1. UI parses assistant prose to infer tool success, approval state, model routing, or task completion.
  2. UI projection store becomes the owner of runtimeStatus, evidence verdict, permission grant, or artifact contents.
  3. Evidence, replay, and review lanes display contradictory status because each lane rebuilds the facts independently.
  4. A missing runtime field is silently replaced with a fabricated value instead of unknown, unavailable, stale, or safe diagnostics.
  5. Action controls mark success before runtime confirmation.
  6. Background, subagent, or remote work is flattened into one assistant transcript when runtime exposes ownership and lineage.
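Several of these failures can be caught by auditing an ordered log instead of inspecting rendered UI. The sketch checks only failure 5. It assumes the test harness can record runtime facts and UI renders into one hypothetical `LogEntry` stream; the other failures need their own oracles.

```typescript
// Hypothetical merged log of runtime facts and UI renders, in the order they occurred.
type LogEntry =
  | { source: "runtime"; type: "action.resolved"; actionId: string }
  | { source: "ui"; type: "control.success"; actionId: string };

// Flags failure 5: a control showed success before the runtime confirmed the action.
function findPrematureSuccess(entries: LogEntry[]): string[] {
  const confirmed = new Set<string>();
  const violations: string[] = [];
  for (const entry of entries) {
    if (entry.source === "runtime") confirmed.add(entry.actionId);
    else if (!confirmed.has(entry.actionId)) violations.push(entry.actionId);
  }
  return violations;
}

const auditLog: LogEntry[] = [
  { source: "ui", type: "control.success", actionId: "act-1" }, // rendered too early
  { source: "runtime", type: "action.resolved", actionId: "act-1" },
  { source: "runtime", type: "action.resolved", actionId: "act-2" },
  { source: "ui", type: "control.success", actionId: "act-2" }, // after confirmation: fine
];
const premature = findPrematureSuccess(auditLog); // ["act-1"]
```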

Minimum validation set

For an Agent Runtime-compatible source, run at least:

  1. Identity preservation: AUI-AR-ID-001 through AUI-AR-ID-004.
  2. Status/read model projection: AUI-AR-READ-001 through AUI-AR-READ-006.
  3. HITL controlled writes: AUI-AR-ACTION-001 through AUI-AR-ACTION-004.
  4. Evidence consistency: AUI-AR-EVID-001 through AUI-AR-EVID-004.
  5. Hydration: AUI-AR-HYDRATE-001 and AUI-AR-HYDRATE-003.
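A CI gate can expand this list into concrete case ids and report any that did not run. This is a minimal sketch; the `RequiredGroup` encoding and the zero-padded id format are assumptions inferred from the ids used in the tables above.

```typescript
// Illustrative encoding of the minimum validation set; not part of the standard itself.
interface RequiredGroup {
  prefix: string;
  numbers: number[];
}

const MINIMUM_SET: RequiredGroup[] = [
  { prefix: "AUI-AR-ID", numbers: [1, 2, 3, 4] },
  { prefix: "AUI-AR-READ", numbers: [1, 2, 3, 4, 5, 6] },
  { prefix: "AUI-AR-ACTION", numbers: [1, 2, 3, 4] },
  { prefix: "AUI-AR-EVID", numbers: [1, 2, 3, 4] },
  { prefix: "AUI-AR-HYDRATE", numbers: [1, 3] },
];

// Builds ids such as "AUI-AR-ID-001" (three-digit, zero-padded suffix).
function caseId(prefix: string, n: number): string {
  const digits = String(n);
  return `${prefix}-${"000".slice(digits.length)}${digits}`;
}

function requiredIds(groups: RequiredGroup[]): string[] {
  const ids: string[] = [];
  for (const g of groups) {
    for (const n of g.numbers) ids.push(caseId(g.prefix, n));
  }
  return ids;
}

// Returns required case ids that are absent from the executed list.
function missingCases(executed: string[]): string[] {
  return requiredIds(MINIMUM_SET).filter((id) => executed.indexOf(id) < 0);
}

const executed = requiredIds(MINIMUM_SET).filter((id) => id !== "AUI-AR-EVID-003");
const gaps = missingCases(executed); // ["AUI-AR-EVID-003"]
```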

These cases are the Agent UI counterpart to AgentRuntime profile tests: Runtime proves the facts exist; UI proves those facts are projected honestly.

Draft runtime-first standard for agent interaction surfaces.