
Performance metrics contract

Agent UI performance is part of the user experience contract. Clients and runtimes should record enough metrics to explain perceived slowness without exposing sensitive payloads.
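One way to keep payloads out of telemetry is to restrict what a metric sample may carry. The following is a minimal sketch, not part of the contract: `MetricSample`, `make_sample`, and `ALLOWED_LABELS` are hypothetical names, and the lowercase label spellings are an assumption based on the render modes listed later in this document.

```python
from dataclasses import dataclass
from numbers import Real
from typing import Union

# Assumed label set (hypothetical spelling of the stream.render_mode values).
ALLOWED_LABELS = {"smooth", "catch-up", "paused", "fallback"}


@dataclass(frozen=True)
class MetricSample:
    name: str
    value: Union[int, float, str]


def make_sample(name: str, value) -> MetricSample:
    """Accept only numbers or known labels, never free-form text."""
    if isinstance(value, Real) and not isinstance(value, bool):
        return MetricSample(name, value)
    if isinstance(value, str) and value in ALLOWED_LABELS:
        return MetricSample(name, value)
    # The rejected value is deliberately left out of the error message,
    # so a payload cannot leak through logs either.
    raise ValueError(f"{name}: only numbers or known labels may be recorded")


paint = make_sample("first_text_paint_ms", 935)
mode = make_sample("stream.render_mode", "catch-up")
```

With this shape, an accidental attempt to record prompt text or tool output as a metric value fails at the call site instead of reaching storage.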

Submission and first response

| Metric | Meaning |
| --- | --- |
| composer.submit_ms | User action timestamp. |
| listener.bound_ms | Stream listener or event binding is ready. |
| submit.accepted_ms | Runtime accepted the turn. |
| queue.wait_ms | Time spent waiting in queue. |
| runtime.start_ms | Runtime began execution. |
| provider.request_start_ms | Provider or model request began. |
| first_event_ms | First runtime event reached client. |
| first_runtime_status_ms | First user-visible status. |
| first_text_delta_ms | First answer text delta. |
| first_text_paint_ms | First text visible to user. |

These metrics separate client delay, runtime queueing, provider delay, bridge delay, and render delay.
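As an illustration of that separation, the sketch below splits time-to-first-text into consecutive phases. It assumes the listed values are timestamps in milliseconds on one shared clock. The phase names and boundaries are this example's own choice, not part of the contract. Provider and bridge delay are reported together here because these timestamps alone do not separate them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirstResponseTimestamps:
    """Timestamps in milliseconds on one shared clock (an assumption)."""
    composer_submit_ms: float
    submit_accepted_ms: float
    runtime_start_ms: float
    provider_request_start_ms: float
    first_text_delta_ms: float
    first_text_paint_ms: float


def attribute_first_text_delay(t: FirstResponseTimestamps) -> dict[str, float]:
    """Split time-to-first-text into consecutive, non-overlapping phases."""
    return {
        "client_submit": t.submit_accepted_ms - t.composer_submit_ms,
        "runtime_queue": t.runtime_start_ms - t.submit_accepted_ms,
        "runtime_setup": t.provider_request_start_ms - t.runtime_start_ms,
        "provider_and_bridge": t.first_text_delta_ms - t.provider_request_start_ms,
        "render": t.first_text_paint_ms - t.first_text_delta_ms,
    }


# Made-up sample values, for illustration only.
sample = FirstResponseTimestamps(
    composer_submit_ms=0,
    submit_accepted_ms=40,
    runtime_start_ms=190,
    provider_request_start_ms=210,
    first_text_delta_ms=910,
    first_text_paint_ms=935,
)
phases = attribute_first_text_delay(sample)
slowest = max(phases, key=phases.get)
print(slowest, phases[slowest])  # prints: provider_and_bridge 700
```

Because the phases are consecutive differences, they sum to the total from submit to first text paint, which makes it easy to check that no time is unaccounted for.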

Stream rendering

| Metric | Meaning |
| --- | --- |
| text_delta.queue_depth | Number of unrendered text chunks. |
| text_delta.oldest_unrendered_age_ms | Age of oldest unrendered chunk. |
| stream.render_mode | Smooth, catch-up, paused, or fallback. |
| stream.mode_transition_count | Number of mode switches. |
| stream.rapid_reentry_count | Frequent catch-up re-entry indicator. |
| stream.flush_interval_ms | Render flush cadence. |
| stream.buffer_chars | Buffered text size. |

A client can use these to decide when to switch from smooth streaming to catch-up rendering.
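A minimal sketch of such a decision follows. It assumes the client samples `text_delta.queue_depth` and `text_delta.oldest_unrendered_age_ms` on each flush. The threshold values and the function name `next_render_mode` are hypothetical. The gap between the enter and exit thresholds is a design choice of this example, intended to avoid the rapid re-entry that `stream.rapid_reentry_count` would flag.

```python
SMOOTH = "smooth"
CATCH_UP = "catch-up"

# Hypothetical thresholds; a real product would tune these.
ENTER_CATCH_UP_DEPTH = 20
ENTER_CATCH_UP_AGE_MS = 250
EXIT_CATCH_UP_DEPTH = 5
EXIT_CATCH_UP_AGE_MS = 80


def next_render_mode(current: str, queue_depth: int, oldest_age_ms: float) -> str:
    """Pick smooth or catch-up rendering from backlog size and staleness."""
    if current == SMOOTH:
        # Enter catch-up when the backlog is either deep or stale.
        if queue_depth >= ENTER_CATCH_UP_DEPTH or oldest_age_ms >= ENTER_CATCH_UP_AGE_MS:
            return CATCH_UP
        return SMOOTH
    if current == CATCH_UP:
        # Leave catch-up only once the backlog is both shallow and fresh.
        if queue_depth <= EXIT_CATCH_UP_DEPTH and oldest_age_ms <= EXIT_CATCH_UP_AGE_MS:
            return SMOOTH
        return CATCH_UP
    return current  # paused and fallback are left to other logic in this sketch


# (queue_depth, oldest_unrendered_age_ms) samples, made up for illustration.
samples = [(3, 20), (25, 120), (10, 100), (2, 30)]
mode = SMOOTH
transitions = 0  # what stream.mode_transition_count would report
history = []
for depth, age_ms in samples:
    new_mode = next_render_mode(mode, depth, age_ms)
    if new_mode != mode:
        transitions += 1
    mode = new_mode
    history.append(mode)
```

In this run the third sample stays in catch-up even though its depth is below the entry threshold, because the exit condition is stricter than the entry condition.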

History and restore

| Metric | Meaning |
| --- | --- |
| session.click_to_shell_ms | User opens session to shell paint. |
| session.snapshot_apply_ms | Cached snapshot apply time. |
| session.detail_request_ms | Window detail request duration. |
| session.messages_hydrate_ms | Recent messages hydration duration. |
| message_list.first_stable_paint_ms | First readable conversation paint. |
| timeline.idle_hydrate_ms | Deferred timeline completion. |
| history.page_load_ms | Older history page duration. |

Resource pressure

| Metric | Meaning |
| --- | --- |
| tabs.active_count | Full active sessions. |
| tabs.hydrated_detail_count | Sessions holding detailed state. |
| message_lists.mounted_count | Mounted message lists. |
| timeline.items_mounted_count | Rendered timeline items. |
| artifact.preview_loaded_bytes | Loaded artifact preview bytes. |
| background.restore_count | Concurrent restore operations. |
| deferred.timeline_pending_count | Deferred timeline jobs. |

Acceptance thresholds

This standard does not mandate universal numbers. An implementation SHOULD define product-specific targets for:

  • first visible status
  • first text paint
  • old session shell paint
  • old session recent message paint
  • maximum mounted inactive timelines
  • large tool output preview threshold
  • artifact preview budget

Targets should be tested with representative histories and tool outputs, not only empty demo sessions.
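A product-specific target set and a check against recorded samples could be sketched as follows. The target values, the mapping from the list items above to metric names, and the choice of the 95th percentile are all assumptions made for illustration. `check_targets` and `percentile` are hypothetical helpers.

```python
import math

# Hypothetical product-specific targets, in milliseconds.
TARGETS_MS = {
    "first_runtime_status_ms": 500,    # first visible status
    "first_text_paint_ms": 1500,       # first text paint
    "session.click_to_shell_ms": 200,  # old session shell paint
}


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def check_targets(samples_by_metric, targets=TARGETS_MS, pct=95):
    """Return the metrics whose pct-th percentile exceeds its target."""
    failures = {}
    for name, target in targets.items():
        samples = samples_by_metric.get(name)
        if not samples:
            failures[name] = None  # missing data is treated as a failure
            continue
        observed = percentile(samples, pct)
        if observed > target:
            failures[name] = observed
    return failures


# Made-up samples; real runs should use representative histories and tool outputs.
recorded = {
    "first_runtime_status_ms": [120, 180, 240, 310],
    "first_text_paint_ms": [700, 900, 1100, 2400],
    "session.click_to_shell_ms": [90, 110, 150, 160],
}
failures = check_targets(recorded)
print(failures)  # prints: {'first_text_paint_ms': 2400}
```

Checking a high percentile rather than the mean keeps a single slow session with a long history from being hidden by many fast empty ones.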

Draft runtime-first standard for agent interaction surfaces.