
Specification

The latest Agent Runtime draft is a portable standard for agent execution. Its core contract is the boundary between execution facts and the consumers of those facts, such as UI, replay, review, telemetry, workflow, and remote channels.

Agent Runtime owns execution facts. It does not own the visual surface, provider API, external tool protocol, artifact bytes, evidence verdict, memory source, or host account model.

Scope

Agent Runtime standardizes these implementation concerns:

  1. Runtime identity and correlation ids.
  2. Event classes and event envelope fields.
  3. Control plane actions and required write boundaries.
  4. Durable snapshots and read models.
  5. Tool/context/model/policy orchestration facts.
  6. Human-in-the-loop requests and queue/resume semantics.
  7. Evidence, replay, and observability export boundaries.
  8. Permission, sandbox, hooks, process execution, remote channel recovery, and peer task mapping.
  9. Model routing, candidate sets, cost, quota, rate limit, and budget facts.
  10. Agent task lifecycle, attempts, task graphs, subagent graphs, background jobs, large output storage, and session reconstruction.

Agent Runtime does not standardize a UI component model, model provider protocol, tool registry format, workflow language, vector store, artifact format, or observability backend.

Pressure From Real Runtimes

Agent Runtime is not a wrapper around chat streaming. Real implementations show eleven facts that must be first-class:

  1. Tool calls have schema, progress, partial output, permission gates, hooks, result refs, and failure categories.
  2. Command execution has cwd, sandbox, network, stdin/stdout, exit code, output buffers, and long-running process state.
  3. Permission decisions come from modes, rules, hooks, classifiers, humans, and host policy. Deny/ask rules must be able to override automatic allow.
  4. Hooks are governance points and must write runtime facts, not create a side execution path.
  5. Context compaction, rollback, and reconstruction need explicit boundaries.
  6. Subagents need a parent-child graph, isolation, status, and recoverable child threads.
  7. Tasks need objective, owner, status, attempts, dependencies, progress, output refs, and delivery state; todo lists are not enough.
  8. Jobs need item status, attempts, assignment, and progress.
  9. Remote channels need identity, native peer ids, resume cursors, permission bridges, and disconnect semantics.
  10. Model routing needs task profiles, candidate sets, decisions, fallback, single-candidate, and no-candidate facts.
  11. Cost, quota, rate limits, request telemetry, and evidence must join through stable correlation ids.
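Item 3 says deny and ask must be able to override an automatic allow. The sketch below shows one way to combine decisions from several sources under that rule, assuming a simple strictness order (deny > ask > allow). The names `Decision`, `Vote`, and `evaluate_permission` are hypothetical, and falling back to ask when no source voted is an assumption, not something the draft requires.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Assumed precedence: a stricter decision always wins over a looser one.
_STRICTNESS = {Decision.ALLOW: 0, Decision.ASK: 1, Decision.DENY: 2}


@dataclass(frozen=True)
class Vote:
    source: str          # e.g. "mode", "rule", "hook", "classifier", "human", "host_policy"
    decision: Decision
    reason: str = ""


def evaluate_permission(votes: list[Vote]) -> tuple[Decision, list[Vote]]:
    """Combine votes so deny/ask override any automatic allow.

    Returns the final decision plus every input vote, so a
    permission.evaluated fact can record how the decision was reached.
    """
    if not votes:
        # No source decided anything: ask rather than silently allow.
        return Decision.ASK, []
    final = max((v.decision for v in votes), key=lambda d: _STRICTNESS[d])
    return final, list(votes)
```

Returning all votes, not just the winner, keeps the permission record auditable when a hook or classifier later turns out to be wrong.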

Execution architecture

The runtime may keep internal provider-native records, but external consumers SHOULD receive normalized runtime events and snapshots.

Required identity model

| Identity | Meaning | Required relationship |
|---|---|---|
| runtime_id | Runtime installation or service instance. | Stable enough for trace attribution. |
| session_id | Durable user-visible work container. | Owns one or more threads. |
| thread_id | Ordered execution context. | Belongs to one session. |
| turn_id | One submitted input cycle. | Belongs to one thread. |
| task_id | Unit of work with objective, lifecycle, attempts, relationships, and acceptance. | Belongs to a session, thread, or parent task. |
| run_id / attempt_id | One execution attempt for a task. | Belongs to one task and may bind a thread, worker, or job item. |
| step_id | Ordered runtime item, such as status, message, tool, artifact, or action. | Belongs to a turn, task, or run. |
| tool_call_id | One tool invocation. | Belongs to a step and may have result refs. |
| action_id | One pending human or policy decision. | Belongs to a turn, task, or tool call. |
| subagent_id | Child agent execution context. | Has parent session/thread/turn links. |
| artifact_id | Durable deliverable reference. | Owned by artifact service; referenced by runtime. |
| evidence_id | Trace, replay, verification, or review reference. | Owned by evidence system; referenced by runtime. |

A compatible implementation MUST NOT rely on a single message id to represent all runtime work.
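A minimal sketch of a correlation context that enforces some of the relationships in the identity table, assuming ids are minted by the runtime as each narrower scope starts. The `Correlation` class, `new_id`, and the `prefix_hex` id format are hypothetical; only the field names come from the table.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Optional


def new_id(prefix: str) -> str:
    # Hypothetical id format; the draft only requires stable, unique ids.
    return f"{prefix}_{uuid.uuid4().hex[:12]}"


@dataclass(frozen=True)
class Correlation:
    runtime_id: str
    session_id: str
    thread_id: Optional[str] = None
    turn_id: Optional[str] = None
    task_id: Optional[str] = None
    run_id: Optional[str] = None
    step_id: Optional[str] = None
    tool_call_id: Optional[str] = None

    def start_turn(self) -> "Correlation":
        # A turn belongs to exactly one thread.
        if self.thread_id is None:
            raise ValueError("turn_id requires a thread_id")
        return replace(self, turn_id=new_id("turn"), step_id=None, tool_call_id=None)

    def start_step(self) -> "Correlation":
        # A step belongs to a turn, task, or run.
        if not (self.turn_id or self.task_id or self.run_id):
            raise ValueError("step_id requires a turn_id, task_id, or run_id")
        return replace(self, step_id=new_id("step"), tool_call_id=None)

    def start_tool_call(self) -> "Correlation":
        # A tool call belongs to a step.
        if self.step_id is None:
            raise ValueError("tool_call_id requires a step_id")
        return replace(self, tool_call_id=new_id("call"))
```

Each scope carries every parent id, so a tool event can be joined to its session, thread, and turn without looking up a message id.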

Event envelope

Every emitted event SHOULD include:

| Field | Requirement |
|---|---|
| type | Required event class. |
| event_id | Required unique event id. |
| timestamp | Required producer timestamp. |
| sequence | Monotonic within a stream when possible. |
| schema_version | Runtime event schema version. |
| session_id, thread_id, turn_id | Present whenever the event belongs to a thread or turn. |
| task_id, run_id, attempt_id, step_id, tool_call_id, action_id, subagent_id | Present when applicable. |
| trace_id, span_id | Present when telemetry is available. |
| payload | Typed event payload. |
| refs | Stable references to large or owned external facts. |

Large tool outputs, artifacts, evidence packs, and raw provider payloads SHOULD be referenced, not copied into every event.
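The sketch below builds envelopes with the fields listed above and moves any payload value larger than a threshold into a store, leaving only a ref in the event. The 4096-byte threshold, the `blob://` ref scheme, and the in-memory dict standing in for the owning store are all assumptions for illustration.

```python
import json
import time
import uuid
from dataclasses import dataclass, field
from typing import Any

# Hypothetical threshold; the draft does not set a size limit.
INLINE_LIMIT_BYTES = 4096


@dataclass
class Envelope:
    type: str
    event_id: str
    timestamp: float
    sequence: int
    schema_version: str
    ids: dict[str, str]                                 # session_id, thread_id, tool_call_id, ...
    payload: dict[str, Any]                             # small typed fields kept inline
    refs: dict[str, str] = field(default_factory=dict)  # payload key -> stable ref


class EventStream:
    def __init__(self, blob_store: dict[str, bytes], schema_version: str = "draft-0"):
        self._blobs = blob_store    # stands in for an output or artifact store
        self._sequence = 0          # monotonic within this stream
        self._schema_version = schema_version

    def emit(self, type_: str, ids: dict[str, str], payload: dict[str, Any]) -> Envelope:
        self._sequence += 1
        inline: dict[str, Any] = {}
        refs: dict[str, str] = {}
        for key, value in payload.items():
            encoded = json.dumps(value).encode("utf-8")
            if len(encoded) > INLINE_LIMIT_BYTES:
                # Store the large value once and reference it from the event.
                ref = f"blob://{uuid.uuid4().hex}"
                self._blobs[ref] = encoded
                refs[key] = ref
            else:
                inline[key] = value
        return Envelope(
            type=type_,
            event_id=uuid.uuid4().hex,
            timestamp=time.time(),
            sequence=self._sequence,
            schema_version=self._schema_version,
            ids=dict(ids),
            payload=inline,
            refs=refs,
        )
```

A consumer that needs the full output resolves the ref; consumers that only render progress never pay for the large value.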

Standard event classes

| Class | Purpose |
|---|---|
| session.created / session.updated | Session metadata changed. |
| thread.started / thread.updated | Thread lifecycle or read-model relevant state changed. |
| turn.submitted / turn.started / turn.completed / turn.failed | User or system turn lifecycle. |
| task.created / task.accepted / task.queued / task.started / task.updated / task.progress / task.waiting / task.blocked / task.paused / task.resumed / task.retrying / task.cancel_requested / task.cancelled / task.timed_out / task.failed / task.lost / task.completed / task.archived | Agent task lifecycle, progress, waiting, retry, cancellation, loss, and terminal state. |
| run.status | Human-readable runtime status with phase, title, detail, checkpoints, and metadata. |
| model.requested / model.delta / model.completed / model.failed | Provider adapter lifecycle and text/structured output stream. |
| reasoning.delta / reasoning.summary | Reasoning or planning stream outside final text. |
| tool.catalog.resolved | Tool inventory or capability surface was selected for the turn. |
| tool.started / tool.args / tool.progress / tool.result / tool.failed | Tool invocation lifecycle. |
| action.required / action.resolved | Runtime paused for user, policy, or structured input decision. |
| queue.changed | Queued turns changed order, state, or policy. |
| context.resolved | Context, memory, knowledge, source, or policy refs selected for a turn. |
| context.compaction.started / context.compaction.completed / context.compaction.failed | Context compaction boundary lifecycle. |
| artifact.changed | Runtime observed or produced an artifact reference. |
| evidence.changed | Runtime observed or exported evidence/replay/review reference. |
| subagent.spawned / subagent.status / subagent.input / subagent.completed / subagent.failed / subagent.closed | Child agent coordination. |
| limit.changed | Cost, quota, rate limit, budget, or policy limit changed. |
| snapshot.updated | Durable snapshot or read model changed. |
| runtime.warning / runtime.error | Non-fatal warning or fatal runtime error. |

Implementations may add vendor-specific event types, but MUST keep the normalized classes available for portable consumers.
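One way to satisfy this is to emit the normalized class alongside the vendor event rather than instead of it. In this sketch the `acme.*` provider event kinds, the `vendor.` type prefix, and the `normalize` function are hypothetical; real provider stream formats differ.

```python
from typing import Any, Callable

Event = dict[str, Any]

# Hypothetical provider event kinds mapped to normalized classes.
NORMALIZERS: dict[str, Callable[[Event], Event]] = {
    "acme.text_chunk": lambda e: {"type": "model.delta", "payload": {"text": e["chunk"]}},
    "acme.thinking_chunk": lambda e: {"type": "reasoning.delta", "payload": {"text": e["chunk"]}},
    "acme.tool_use_start": lambda e: {"type": "tool.started", "payload": {"tool": e["tool"]}},
}


def normalize(vendor_event: Event) -> list[Event]:
    """Return the normalized event (when one is known) followed by the vendor event."""
    out: list[Event] = []
    to_normalized = NORMALIZERS.get(vendor_event["kind"])
    if to_normalized is not None:
        out.append(to_normalized(vendor_event))
    # The vendor-specific event is kept for consumers that understand it.
    out.append({"type": "vendor." + vendor_event["kind"], "payload": vendor_event})
    return out
```

Portable consumers can filter on the normalized classes and ignore anything under the vendor prefix.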

Expanded Event Families

Real coding, desktop, and remote runtimes SHOULD also expose these event families:

| Family | Events | Purpose |
|---|---|---|
| Permission | permission.evaluated / permission.requested / permission.resolved | Record how rules, modes, hooks, classifiers, humans, or host policy decided. |
| Sandbox | sandbox.applied / sandbox.violation | Record actual execution boundaries and violations. |
| Hook / policy | hook.started / hook.completed / hook.failed / policy.changed | Record governance inputs, outcomes, duration, and failure behavior. |
| Process | process.started / process.output / process.input / process.completed / process.failed / process.terminated | Record commands, PTY sessions, long-running processes, and output refs. |
| Routing | task.profile.resolved / routing.candidates.resolved / routing.decided / routing.fallback.applied / routing.not_possible / routing.single_candidate | Explain model candidates, selection, fallback, blocking, and single-candidate paths. |
| Task orchestration | task.delegated / task.dependency.updated / task.attempt.started / task.attempt.completed / task.attempt.failed | Record task graph edges, delegation, dependencies, and per-attempt execution history. |
| Cost / limits | cost.estimated / cost.recorded / rate_limit.hit / quota.low / quota.blocked | Make cost, limits, and quota runtime facts. |
| Channel | channel.connected / channel.disconnected / channel.resumed / channel.message / channel.permission_forwarded / channel.permission_returned | Record remote channels, recovery, and cross-channel approval. |
| Jobs | job.created / job.started / job.progress / job.item.started / job.item.completed / job.item.failed / job.completed / job.failed / job.cancelled | Record batch and background work. |
| Output | output.spilled / output.truncated / output.redacted / output.expired | Manage large output and auditable references. |
| History | history.window.loaded / history.reconstructed / history.rollback.started / history.rollback.completed / snapshot.repaired | Recover old sessions, compaction, and rollback. |
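The routing family is easiest to see as the sequence of facts one decision produces. This sketch assumes a candidate is usable only if it is healthy and within budget; the `Candidate` fields, the `route` function, and the event payload shapes are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Candidate:
    model: str
    healthy: bool         # provider reachable and not rate-limited
    within_budget: bool   # cost/quota allow this model for the task


def route(profile: str, candidates: list[Candidate], preferred: Optional[str] = None) -> list[dict[str, Any]]:
    """Return routing facts for one decision; event shapes are hypothetical."""
    events: list[dict[str, Any]] = [
        {"type": "task.profile.resolved", "profile": profile},
        {"type": "routing.candidates.resolved", "models": [c.model for c in candidates]},
    ]
    usable = [c for c in candidates if c.healthy and c.within_budget]
    if not usable:
        # No candidate is a fact to record, not a silent default.
        events.append({"type": "routing.not_possible", "reason": "no healthy candidate within budget"})
        return events
    if len(usable) == 1:
        events.append({"type": "routing.single_candidate", "model": usable[0].model})
    chosen = next((c for c in usable if c.model == preferred), usable[0])
    if preferred is not None and chosen.model != preferred:
        events.append({"type": "routing.fallback.applied", "from": preferred, "to": chosen.model})
    events.append({"type": "routing.decided", "model": chosen.model})
    return events
```

For example, if the preferred model is unhealthy and one other candidate remains, the stream records the single-candidate path and the fallback before the decision, so cost and quota telemetry can explain why a different model ran.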

Control plane

A compatible runtime SHOULD expose these commands, regardless of transport:

| Command | Required input | Result |
|---|---|---|
| submit_turn | session_id, thread_id or create policy, input parts, options, metadata. | Accepted turn or queued turn. |
| interrupt_turn | session_id, optional thread_id / turn_id, reason. | Interrupt accepted or no-op. |
| resume_thread | session_id, thread_id, optional resume token. | Resume attempt result. |
| create_task / update_task / start_task / append_task_progress | Task objective, scope, profile, constraints, assignee, or progress refs. | Task lifecycle and progress events. |
| pause_task / resume_task / cancel_task / retry_task | task_id, reason, optional propagation policy. | Task pause, resume, cancellation, or new attempt facts. |
| complete_task / fail_task / list_tasks / get_task | Task scope or terminal facts. | Durable task read model or terminal reconciliation events. |
| link_tasks / unlink_tasks | Parent, child, dependency, source, artifact, evidence, or subagent edge. | Task graph update event. |
| respond_action | action_id, decision, optional structured payload. | Action resolved event. |
| remove_queued_turn / promote_queued_turn | queued_turn_id, target session/thread. | Queue changed event. |
| get_session | session_id, history window or cursor. | Durable session snapshot. |
| get_thread_read | session_id, thread_id. | Thread read model. |
| get_tool_inventory | Scope, caller, policy, runtime mode. | Tool inventory snapshot. |
| spawn_subagent / send_subagent_input / wait_subagents / resume_subagent / close_subagent | Parent ids and child control payload. | Subagent lifecycle facts. |
| export_evidence / export_replay | Session/thread/turn/task scope. | Stable evidence or replay refs. |
| evaluate_permission / resolve_permission | Tool/process/action scope and decision payload. | Permission evaluated/resolved event. |
| get_execution_environment | Session/thread/turn scope. | Environment snapshot. |
| write_process_stdin / terminate_process | process_id, input, or reason. | Process input / terminated event. |
| list_subagents / list_jobs / get_job / cancel_job | Session/thread/job scope. | Subagent graph or job snapshot. |
| reconnect_channel / ack_events | Channel id, cursor, resume token. | Channel resumed or snapshot repair. |
| export_review | Session/thread/turn/task scope. | Review refs. |

Commands that mutate state MUST write through the runtime or owning adjacent system. UI-only state cannot mutate runtime truth.
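A minimal sketch of commands writing through the runtime, using start_task and retry_task from the control plane table. The runtime owns the task and attempt records and emits the events; a UI would only send commands and read events. The `Runtime` class, the `fail_attempt` helper, the in-memory storage, and the `t1-a2` attempt id format are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Attempt:
    attempt_id: str
    status: str = "running"


@dataclass
class Task:
    task_id: str
    status: str = "created"
    attempts: list[Attempt] = field(default_factory=list)


class Runtime:
    """Owns task state; UIs send commands and read emitted events."""

    def __init__(self) -> None:
        self.tasks: dict[str, Task] = {}
        self.events: list[dict[str, Any]] = []

    def _emit(self, type_: str, **fields: Any) -> None:
        self.events.append({"type": type_, **fields})

    def create_task(self, task_id: str) -> None:
        self.tasks[task_id] = Task(task_id)
        self._emit("task.created", task_id=task_id)

    def start_task(self, task_id: str) -> str:
        task = self.tasks[task_id]
        attempt = Attempt(f"{task_id}-a{len(task.attempts) + 1}")
        task.attempts.append(attempt)
        task.status = "running"
        self._emit("task.attempt.started", task_id=task_id, attempt_id=attempt.attempt_id)
        return attempt.attempt_id

    def fail_attempt(self, task_id: str, reason: str) -> None:
        task = self.tasks[task_id]
        attempt = task.attempts[-1]
        attempt.status = "failed"
        task.status = "failed"
        self._emit("task.attempt.failed", task_id=task_id, attempt_id=attempt.attempt_id, reason=reason)

    def retry_task(self, task_id: str, reason: str) -> str:
        task = self.tasks[task_id]
        if task.status != "failed":
            raise ValueError(f"cannot retry task in status {task.status!r}")
        self._emit("task.retrying", task_id=task_id, reason=reason)
        # A retry appends a new attempt; earlier attempts stay in the history.
        return self.start_task(task_id)
```

Because the retry appends rather than overwrites, a read model rebuilt after restart still shows why the first attempt failed.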

Durable snapshots and read models

The event stream is necessary but not sufficient. A compatible runtime SHOULD maintain:

  • session_snapshot: shell, title, timestamps, threads, recent messages or steps, history cursor.
  • thread_read_model: current status, active turn, pending requests, last outcome, incidents, queued turns, diagnostics.
  • tool_inventory_snapshot: tools available for the current caller, policy, context, and mode.
  • queue_snapshot: queued turn ids, order, source, policy, and resume state.
  • task_snapshot: active, waiting, failed, lost, recent terminal tasks, task graph, current attempts, and delivery state.
  • context_boundary_snapshot: selected refs, compaction summaries, context warnings, missing facts.
  • artifact_checkpoint_summary: artifact refs, versions, previews, validation issue counts, diff refs.
  • evidence_summary: trace ids, verification outcomes, replay refs, review refs, audit notes.
  • permission_sandbox_summary: permission state, pending approvals, sandbox profile, and violation refs.
  • execution_environment_snapshot: cwd, workspace roots, env refs, process limits, and active processes.
  • routing_limit_summary: task profile, candidate count, routing decision, cost state, quota/rate-limit state.
  • subagent_job_summary: child graph, job progress, assigned threads, and recoverability.
  • channel_summary: remote peers, resume cursors, last acknowledged sequence, and permission bridge state.
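The channel summary's last acknowledged sequence is what makes reconnect_channel safe. In this sketch, assuming the runtime keeps only a bounded tail of events per channel, a cursor inside the retained window replays the missing events, while a cursor older than the window returns a snapshot repair so the peer rehydrates from a snapshot instead of silently skipping events. `ChannelLog`, its retention policy, and the result shapes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ChannelLog:
    """Bounded tail of events for one remote channel (hypothetical)."""
    retention: int  # number of events kept; must be >= 1
    events: list[dict[str, Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.retention < 1:
            raise ValueError("retention must be at least 1")

    def append(self, event: dict[str, Any]) -> None:
        # Events carry a monotonically increasing "sequence".
        self.events.append(event)
        del self.events[:-self.retention]

    def resume(self, last_acked: int) -> dict[str, Any]:
        if not self.events or last_acked >= self.events[-1]["sequence"]:
            return {"type": "channel.resumed", "replay": []}
        oldest = self.events[0]["sequence"]
        if last_acked < oldest - 1:
            # Events between the cursor and the retained tail are gone:
            # replaying would leave a gap, so ask the peer to reload a snapshot.
            return {"type": "snapshot.repaired", "replay": [], "reason": "cursor is older than retained events"}
        return {"type": "channel.resumed", "replay": [e for e in self.events if e["sequence"] > last_acked]}
```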

Read models may be compact. They must be honest: unknown, unavailable, stale, and blocked are better than inferred success.
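A compact thread read model can be folded from events while staying honest about what it does not know. This sketch starts at `unknown`, distinguishes permission waits from input waits using an assumed `kind` field on action.required, and reports a running thread with no recent events as `stale`. The 30-second staleness window and the model shape are assumptions.

```python
from typing import Any, Iterable


def thread_read_model(events: Iterable[dict[str, Any]], now: float, stale_after: float = 30.0) -> dict[str, Any]:
    """Fold runtime events into a compact thread read model (hypothetical shape)."""
    model: dict[str, Any] = {
        "status": "unknown",          # nothing observed yet: say so, do not guess
        "active_turn": None,
        "pending_actions": set(),
        "last_event_at": None,
    }
    for event in events:
        model["last_event_at"] = event["timestamp"]
        kind = event["type"]
        if kind == "turn.started":
            model["status"] = "running"
            model["active_turn"] = event["turn_id"]
        elif kind in ("turn.completed", "turn.failed"):
            model["status"] = "completed" if kind == "turn.completed" else "failed"
            model["active_turn"] = None
        elif kind == "action.required":
            model["pending_actions"].add(event["action_id"])
            waiting_on = "permission" if event.get("kind") == "permission" else "input"
            model["status"] = "waiting_" + waiting_on
        elif kind == "action.resolved":
            model["pending_actions"].discard(event["action_id"])
            if not model["pending_actions"] and model["active_turn"] is not None:
                model["status"] = "running"
    # A running thread with no recent facts is reported as stale, not as running.
    if model["status"] == "running" and now - model["last_event_at"] > stale_after:
        model["status"] = "stale"
    return model
```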

Completion and failure semantics

A runtime SHOULD distinguish:

  • accepted: runtime received the request.
  • queued: work is waiting behind another turn or policy gate.
  • preparing: context, model, tools, or policy are being resolved.
  • running: the execution loop is active.
  • waiting_input: user or external structured input is required.
  • waiting_permission: human, policy, or host approval is required.
  • waiting_resource: credential, quota, file, network, worker, or external system is unavailable.
  • blocked: an action, credential, policy, context, dependency, tool, or quota is missing.
  • streaming: model or tool output is being emitted.
  • retrying: retry or fallback is active.
  • lost: runtime cannot prove whether the worker is still alive.
  • timed_out: time or inactivity budget stopped the work.
  • completed: owner declared work complete and durable facts are reconciled.
  • failed: work cannot continue without a new request or repair.
  • cancelled: user, policy, or runtime interrupted the work.
  • stale: known snapshot may not reflect current execution.
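The `lost` state depends on what the runtime can prove, not on what the worker last reported. A minimal sketch, assuming workers send heartbeats only while they are actively executing: an active state whose heartbeat is missing or older than a timeout is shown as `lost`, while queued, waiting, and terminal states pass through unchanged. The state grouping, the 60-second timeout, and `observed_status` are assumptions.

```python
from typing import Optional

# States in which a live worker is expected to keep reporting.
LIVE_WORKER_STATES = {"preparing", "running", "streaming", "retrying"}


def observed_status(reported: str, last_heartbeat: Optional[float], now: float,
                    heartbeat_timeout: float = 60.0) -> str:
    """Return the status a read model should show for a task or worker.

    Waiting, queued, and terminal states are passed through unchanged. An
    active state whose worker has not been heard from within the timeout is
    downgraded to "lost" because the runtime can no longer prove it is alive.
    """
    if reported not in LIVE_WORKER_STATES:
        return reported
    if last_heartbeat is None or now - last_heartbeat > heartbeat_timeout:
        return "lost"
    return reported
```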

A success result from a provider or tool is not the same as completed agent work. Completion must be tied to runtime state and, when required, to artifact or evidence facts.
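A completion gate can make this rule explicit: a provider or tool success is one input, and the work is only marked completed when no other blocker remains. The fields below (pending actions, required artifacts, required evidence) are a hypothetical selection of the facts a runtime might reconcile.

```python
from dataclasses import dataclass, field


@dataclass
class CompletionFacts:
    provider_finished: bool                 # provider/tool reported success
    pending_actions: int = 0                # unresolved action.required items
    required_artifacts: set[str] = field(default_factory=set)
    recorded_artifacts: set[str] = field(default_factory=set)
    evidence_required: bool = False
    evidence_refs: list[str] = field(default_factory=list)


def completion_blockers(facts: CompletionFacts) -> list[str]:
    """Return reasons the work may not be marked completed; empty means it may."""
    blockers: list[str] = []
    if not facts.provider_finished:
        blockers.append("provider or tool has not finished")
    if facts.pending_actions:
        blockers.append(f"{facts.pending_actions} pending action(s)")
    missing = facts.required_artifacts - facts.recorded_artifacts
    if missing:
        blockers.append("missing artifacts: " + ", ".join(sorted(missing)))
    if facts.evidence_required and not facts.evidence_refs:
        blockers.append("required evidence not recorded")
    return blockers
```

Returning the list of blockers, rather than a boolean, gives the read model something concrete to show when work is `blocked` instead of `completed`.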

Validation

A validator SHOULD check behavior, not only file presence:

  • Events contain stable ids and can be replayed into a read model.
  • Provider streams map to normalized model/text/reasoning events.
  • Tool calls preserve input refs, result refs, errors, and policy decisions.
  • Human actions pause execution and resume only through respond_action.
  • Queue mutations survive restart and emit queue.changed.
  • Task lifecycle survives restart, keeps prior attempts, and can recover parent/child and dependency edges.
  • Old sessions hydrate through snapshots and cursor windows.
  • Evidence/replay exports derive from the same runtime facts as UI and diagnostics.
  • Missing facts are marked unknown, unavailable, stale, or blocked instead of inferred from prose.
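A behavioral validator can check the first and sixth items directly: it verifies required envelope fields, unique event ids, and per-stream sequence order, then replays the events through a fold and compares the result to the live read model. Treating each thread as one sequence stream, and passing the fold function and live model in as arguments, are assumptions of this sketch; `validate_stream` is hypothetical.

```python
from typing import Any, Callable

Event = dict[str, Any]


def validate_stream(events: list[Event], fold: Callable[[list[Event]], Any], live_read_model: Any) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: list[str] = []
    seen_ids: set[str] = set()
    last_sequence: dict[str, int] = {}
    for event in events:
        for key in ("type", "event_id", "timestamp"):
            if key not in event:
                problems.append(f"event missing {key}: {event!r}")
        event_id = event.get("event_id")
        if event_id is not None:
            if event_id in seen_ids:
                problems.append(f"duplicate event_id {event_id}")
            seen_ids.add(event_id)
        # Assumption: each thread is one stream for sequence ordering.
        stream = event.get("thread_id", "<runtime>")
        sequence = event.get("sequence")
        if sequence is not None:
            previous = last_sequence.get(stream)
            if previous is not None and sequence <= previous:
                problems.append(f"non-monotonic sequence {sequence} after {previous} on {stream}")
            last_sequence[stream] = sequence if previous is None else max(previous, sequence)
    # The read model rebuilt from events must match what UI and diagnostics show.
    if fold(events) != live_read_model:
        problems.append("replayed read model differs from live read model")
    return problems
```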

Draft standard for portable agent execution runtimes.