Performance metrics contract

Agent UI performance is part of the user experience contract. Clients and runtimes should record enough metrics to explain perceived slowness without exposing sensitive payloads.

Submission and first response

Metric	Meaning
`composer.submit_ms`	User action timestamp.
`listener.bound_ms`	Stream listener or event binding is ready.
`submit.accepted_ms`	Runtime accepted the turn.
`queue.wait_ms`	Time spent waiting in queue.
`runtime.start_ms`	Runtime began execution.
`provider.request_start_ms`	Provider or model request began.
`first_event_ms`	First runtime event reached client.
`first_runtime_status_ms`	First user-visible status.
`first_text_delta_ms`	First answer text delta.
`first_text_paint_ms`	First text visible to user.

These metrics separate client delay, runtime queueing, provider delay, bridge delay, and render delay.

Stream rendering

Metric	Meaning
`text_delta.queue_depth`	Number of unrendered text chunks.
`text_delta.oldest_unrendered_age_ms`	Age of oldest unrendered chunk.
`stream.render_mode`	Smooth, catch-up, paused, or fallback.
`stream.mode_transition_count`	Number of mode switches.
`stream.rapid_reentry_count`	Frequent catch-up re-entry indicator.
`stream.flush_interval_ms`	Render flush cadence.
`stream.buffer_chars`	Buffered text size.

A client can use these to decide when to switch from smooth streaming to catch-up rendering.

History and restore

Metric	Meaning
`session.click_to_shell_ms`	User opens session to shell paint.
`session.snapshot_apply_ms`	Cached snapshot apply time.
`session.detail_request_ms`	Window detail request duration.
`session.messages_hydrate_ms`	Recent messages hydration duration.
`message_list.first_stable_paint_ms`	First readable conversation paint.
`timeline.idle_hydrate_ms`	Deferred timeline completion.
`history.page_load_ms`	Older history page duration.

Resource pressure

Metric	Meaning
`tabs.active_count`	Full active sessions.
`tabs.hydrated_detail_count`	Sessions holding detailed state.
`message_lists.mounted_count`	Mounted message lists.
`timeline.items_mounted_count`	Rendered timeline items.
`artifact.preview_loaded_bytes`	Loaded artifact preview bytes.
`background.restore_count`	Concurrent restore operations.
`deferred.timeline_pending_count`	Deferred timeline jobs.

Acceptance thresholds

This standard does not mandate universal numbers. An implementation SHOULD define product-specific targets for:

first visible status
first text paint
old session shell paint
old session recent message paint
maximum mounted inactive timelines
large tool output preview threshold
artifact preview budget

Targets should be tested with representative histories and tool outputs, not only empty demo sessions.

Performance metrics contract ​

Submission and first response ​

Stream rendering ​

History and restore ​

Resource pressure ​