
Gate matrix

The gate matrix maps Agent project profiles, surfaces, and change risks to validation gates. It defines the minimum evidence required before a report can claim a pass.

Gate names are families, not framework commands. A project maps each family to local scripts, CI jobs, qcloop items, or review workflows.
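
For illustration, a project might record that mapping as a plain dictionary and check it against the families its profile requires. This is a minimal Python sketch: `GATE_COMMANDS` and `unmapped_families` are hypothetical names, and the commands are placeholders, not a required layout.

```python
# Hypothetical project mapping: gate family -> local scripts, CI jobs, or workflows.
# The entries are illustrative placeholders only.
GATE_COMMANDS: dict[str, list[str]] = {
    "static": ["ruff check ."],
    "unit": ["pytest -m 'not integration'"],
    "contract-protocol": ["pytest tests/contract"],
    "runtime-e2e": ["ci-job: runtime-e2e"],
}


def unmapped_families(required: set[str], mapping: dict[str, list[str]]) -> set[str]:
    """Return required gate families that have no local command, CI job, or workflow."""
    # A family with an empty list counts as unmapped: nothing would produce evidence.
    return {family for family in required if not mapping.get(family)}
```

For example, `unmapped_families({"static", "unit", "fake-integration"}, GATE_COMMANDS)` returns `{"fake-integration"}`, flagging a family the profile needs but the project cannot yet run.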

Profile defaults

Profile | Minimum gate families | Optional escalation gates
agent-runtime-cli | static, unit, contract-protocol, runtime-e2e | property-fuzz, stress-concurrency, live-provider, distribution-release
agent-sdk-api | static, unit, contract-protocol, fake-integration | distribution-release, live-provider, semantic-eval
agent-tool-mcp-gateway | contract-protocol, fake-integration, runtime-e2e | stress-concurrency, live-provider, review, property-fuzz
multi-channel-agent-gateway | static, unit, contract-protocol, fake-integration | live-provider, distribution-release, semantic-eval, stress-concurrency
agent-ui-tui-desktop | static, unit, ui-interaction | runtime-e2e, contract-protocol, live-provider, review, stress-concurrency
agent-skills-plugins | static, contract-protocol, fake-integration | distribution-release, review, semantic-eval, live-provider
background-agent-scheduler | unit, fake-integration, stress-concurrency | runtime-e2e, live-provider, review, distribution-release
agent-distribution-release | static, distribution-release | runtime-e2e, live-provider, review, stress-concurrency
agent-evals-quality | semantic-eval, review | live-provider, stress-concurrency, distribution-release
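
The profile table can be kept as data so a plan is computed rather than copied by hand. The sketch below transcribes the table; `PROFILE_DEFAULTS` and `plan_gates` are hypothetical names, and treating escalations as a simple union with the minimum is an assumption about how a project might combine them.

```python
# Transcribed from the profile table: profile -> (minimum families, optional escalations).
PROFILE_DEFAULTS: dict[str, tuple[str, str]] = {
    "agent-runtime-cli": (
        "static, unit, contract-protocol, runtime-e2e",
        "property-fuzz, stress-concurrency, live-provider, distribution-release",
    ),
    "agent-sdk-api": (
        "static, unit, contract-protocol, fake-integration",
        "distribution-release, live-provider, semantic-eval",
    ),
    "agent-tool-mcp-gateway": (
        "contract-protocol, fake-integration, runtime-e2e",
        "stress-concurrency, live-provider, review, property-fuzz",
    ),
    "multi-channel-agent-gateway": (
        "static, unit, contract-protocol, fake-integration",
        "live-provider, distribution-release, semantic-eval, stress-concurrency",
    ),
    "agent-ui-tui-desktop": (
        "static, unit, ui-interaction",
        "runtime-e2e, contract-protocol, live-provider, review, stress-concurrency",
    ),
    "agent-skills-plugins": (
        "static, contract-protocol, fake-integration",
        "distribution-release, review, semantic-eval, live-provider",
    ),
    "background-agent-scheduler": (
        "unit, fake-integration, stress-concurrency",
        "runtime-e2e, live-provider, review, distribution-release",
    ),
    "agent-distribution-release": (
        "static, distribution-release",
        "runtime-e2e, live-provider, review, stress-concurrency",
    ),
    "agent-evals-quality": (
        "semantic-eval, review",
        "live-provider, stress-concurrency, distribution-release",
    ),
}


def _families(cell: str) -> frozenset[str]:
    """Split a comma-separated table cell into gate family names."""
    return frozenset(name.strip() for name in cell.split(",") if name.strip())


def plan_gates(
    profile: str, escalations: frozenset[str] = frozenset()
) -> tuple[frozenset[str], frozenset[str]]:
    """Return (planned gates, escalations outside both of the profile's columns)."""
    minimum_cell, optional_cell = PROFILE_DEFAULTS[profile]
    minimum, optional = _families(minimum_cell), _families(optional_cell)
    # Risk-driven escalation may add gates the profile does not list as optional;
    # report them separately so a reviewer can see why the plan grew.
    unlisted = escalations - optional - minimum
    return minimum | escalations, unlisted
```

Escalating `agent-runtime-cli` with `live-provider` and `review` yields six gates and reports `review` as unlisted, since that profile does not name it as an optional escalation.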

Surface add-ons

If a case names a surface, add surface evidence on top of the profile default.

Surface | Minimum add-on | Stronger proof
cli-stream | command log, exit status, stdout/stderr transcript | structured event assertion, malformed stream fixture, cleanup proof
tui | terminal snapshot, viewport, key sequence | multi-viewport, ANSI/Unicode, interrupt, approval, runtime transcript
webui | screenshot or browser trace, console log | Playwright trace, a11y/DOM snapshot, reload/resume, network log
desktop-gui | shell start, bridge health, screenshot | workspace readiness, native command contract, OS matrix, trace
browser-automation | screenshot and DOM/a11y snapshot | console/network, SSRF/navigation safety, orphan cleanup, trace/video
channel-ui | webhook/channel transcript, auth proof | media fixture, replay, device/emulator log, live opt-in lane
eval-ui | rubric, judge output, report export | baseline delta, reviewer annotation, failing examples, dashboard screenshot
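
Because surface add-ons stack on top of profile defaults, a checker only needs to report which add-on items are still missing. In this sketch each requirement is a tuple of interchangeable evidence kinds, so "screenshot or browser trace" becomes one requirement with two alternatives. `SURFACE_MINIMUM` and `missing_surface_evidence` are hypothetical names; only the minimum column is encoded.

```python
# Minimum surface add-ons; each inner tuple lists interchangeable evidence kinds.
SURFACE_MINIMUM: dict[str, list[tuple[str, ...]]] = {
    "cli-stream": [("command log",), ("exit status",), ("stdout/stderr transcript",)],
    "tui": [("terminal snapshot",), ("viewport",), ("key sequence",)],
    "webui": [("screenshot", "browser trace"), ("console log",)],
    "desktop-gui": [("shell start",), ("bridge health",), ("screenshot",)],
    "browser-automation": [("screenshot",), ("DOM/a11y snapshot",)],
    "channel-ui": [("webhook/channel transcript",), ("auth proof",)],
    "eval-ui": [("rubric",), ("judge output",), ("report export",)],
}


def missing_surface_evidence(surfaces: list[str], provided: set[str]) -> list[str]:
    """List unmet add-on requirements for every surface a case names."""
    missing: list[str] = []
    for surface in surfaces:
        for alternatives in SURFACE_MINIMUM[surface]:
            # A requirement is met when any one of its alternatives was provided.
            if not provided.intersection(alternatives):
                missing.append(f"{surface}: " + " or ".join(alternatives))
    return missing
```

A `webui` case that only supplies a console log would be reported as missing "webui: screenshot or browser trace".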

Change-risk escalation

Escalate gates when the change touches:

Risk touched | Add gates
permission, sandbox, credential, or secret handling | contract-protocol, runtime-e2e, review; add property-fuzz for path/parser boundaries
protocol, schema, generated client, command, or manifest shape | contract-protocol, fake-integration, generated artifact drift check
persistent state, migration, queue, or scheduler | unit, runtime-e2e, stress-concurrency, recovery evidence
user-visible GUI/TUI/WebUI/desktop behavior | ui-interaction, surface evidence, stable regression
browser automation or remote browser provider | browser-automation surface proof, cleanup, console/network, safety fixtures
webhook, chat channel, mobile, QR, or media flow | channel-ui, auth/media replay, redaction, optional live-provider
package/install/release metadata | distribution-release, clean install, manifest, version/lock consistency
live provider, external network API, or model backend | explicit live-provider, credential scope, budget, redaction
model prompt, rubric, eval, or judge behavior | semantic-eval, review, baseline delta, examples
multi-agent, subagent, background, or remote teammate work | runtime-e2e, stress-concurrency, surface/task evidence
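
Escalation can be applied mechanically once each risk row is split into gate families and the extra evidence it asks for. This is a hedged sketch: the short row keys (`"permission-secret"`, `"release-metadata"`, and so on), `RISK_ESCALATIONS`, and `escalate` are hypothetical, and optional live-provider for channel flows is left to the caller.

```python
# Risk row -> (gate families to add, extra evidence the row asks for).
# Row keys are hypothetical short labels for the rows of the risk table.
RISK_ESCALATIONS: dict[str, tuple[set[str], list[str]]] = {
    "permission-secret": ({"contract-protocol", "runtime-e2e", "review"}, []),
    "schema-manifest": ({"contract-protocol", "fake-integration"}, ["generated artifact drift check"]),
    "state-scheduler": ({"unit", "runtime-e2e", "stress-concurrency"}, ["recovery evidence"]),
    "visible-ui": ({"ui-interaction"}, ["surface evidence", "stable regression"]),
    "browser-automation": (set(), ["browser-automation surface proof", "cleanup",
                                   "console/network", "safety fixtures"]),
    "channel-media": (set(), ["channel-ui", "auth/media replay", "redaction"]),
    "release-metadata": ({"distribution-release"}, ["clean install", "manifest",
                                                    "version/lock consistency"]),
    "live-backend": ({"live-provider"}, ["credential scope", "budget", "redaction"]),
    "model-judging": ({"semantic-eval", "review"}, ["baseline delta", "examples"]),
    "multi-agent": ({"runtime-e2e", "stress-concurrency"}, ["surface/task evidence"]),
}


def escalate(
    base: set[str], risks: list[str], parser_boundary: bool = False
) -> tuple[set[str], list[str]]:
    """Add gates and evidence for every touched risk; never removes base gates."""
    gates = set(base)
    evidence: list[str] = []
    for risk in risks:
        added, extra = RISK_ESCALATIONS[risk]
        gates |= added
        evidence.extend(item for item in extra if item not in evidence)
    # Security-sensitive changes at path/parser boundaries also need property-fuzz.
    if parser_boundary and "permission-secret" in risks:
        gates.add("property-fuzz")
    return gates, evidence
```

The function only grows the plan: a risk can add gates to a profile default but never waive one.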

Minimal and strong gates

Claim | Minimal gate | Stronger gate
"Runtime command works" | command log and exit status | fake provider transcript, structured events, cleanup proof
"Tool/MCP bridge works" | schema/contract check | fake server recovery, permission denial, stdio/http disconnect
"TUI approval works" | terminal snapshot | key sequence, runtime action request/response transcript, cancel/reconnect variants
"WebUI flow works" | component assertion | browser trace, console/network, a11y, reload/resume
"Desktop app works" | shell start | bridge health, workspace readiness, native command contract, screenshot
"Browser automation works" | screenshot | DOM/a11y, console/network, cleanup, safety fixtures
"Channel adapter works" | contract fixture | webhook replay, media, redaction, live opt-in
"Scheduler works" | deterministic unit | restart/reclaim, duplicate-work proof, race/stress
"Package is releasable" | build output | clean install, package manifest, Docker/OS matrix, supply-chain
"Model quality improved" | one rubric pass | baseline delta, judge output, human review, failing examples

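A report can label each claim by the strongest tier its evidence reaches. The sketch encodes only three rows of the claim table, and it assumes, as a deliberately strict choice, that "strong" means every stronger-gate item is present on top of the minimal gate. `CLAIM_GATES` and `claim_strength` are hypothetical names.

```python
# A subset of the claim table: claim -> (minimal evidence, stronger evidence).
CLAIM_GATES: dict[str, tuple[set[str], set[str]]] = {
    "Runtime command works": (
        {"command log", "exit status"},
        {"fake provider transcript", "structured events", "cleanup proof"},
    ),
    "Scheduler works": (
        {"deterministic unit"},
        {"restart/reclaim", "duplicate-work proof", "race/stress"},
    ),
    "Model quality improved": (
        {"one rubric pass"},
        {"baseline delta", "judge output", "human review", "failing examples"},
    ),
}


def claim_strength(claim: str, provided: set[str]) -> str:
    """Return "insufficient", "minimal", or "strong" for a claim's evidence."""
    minimal, stronger = CLAIM_GATES[claim]
    if not minimal <= provided:
        return "insufficient"  # the claim cannot be made at all
    if stronger <= provided:
        return "strong"  # assumption: every stronger item is required
    return "minimal"
```

Reporting the tier alongside the claim keeps a minimal pass from being read as strong proof.
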
Evidence minimums

  • static gates need command logs, CI URLs, or SARIF-style reports.
  • contract-protocol gates need schema/contract reports, transcript refs, or failing ids.
  • runtime-e2e gates need CLI/runtime transcripts, state snapshots, or process-cleanup proof.
  • ui-interaction gates need stable assertions plus screenshots, traces, videos, terminal snapshots, or accessibility output.
  • live-provider gates need redacted request/response refs, credential scope, and budget/cost notes.
  • distribution-release gates need package manifests, install output, Docker smoke, or OS matrix proof.
  • semantic-eval gates need rubric, model/judge outputs, baseline delta, and waiver threshold.
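
The list mixes two kinds of rule: most gates accept any one of several evidence kinds, while live-provider and semantic-eval require every listed item, and ui-interaction requires stable assertions plus one visual artifact. One way to encode all three, sketched here with hypothetical names `EVIDENCE_RULES`, `uncovered_groups`, and `gate_passes`, is a list of groups per gate: every group must be covered, and any one kind inside a group covers it.

```python
# Gate -> evidence groups. "And" across groups, "or" within a group.
EVIDENCE_RULES: dict[str, list[set[str]]] = {
    "static": [{"command log", "CI URL", "SARIF-style report"}],
    "contract-protocol": [{"schema/contract report", "transcript ref", "failing ids"}],
    "runtime-e2e": [{"CLI/runtime transcript", "state snapshot", "process-cleanup proof"}],
    "ui-interaction": [
        {"stable assertions"},
        {"screenshot", "trace", "video", "terminal snapshot", "accessibility output"},
    ],
    "live-provider": [
        {"redacted request/response refs"},
        {"credential scope"},
        {"budget/cost notes"},
    ],
    "distribution-release": [
        {"package manifest", "install output", "Docker smoke", "OS matrix proof"}
    ],
    "semantic-eval": [
        {"rubric"}, {"model/judge outputs"}, {"baseline delta"}, {"waiver threshold"}
    ],
}


def uncovered_groups(gate: str, provided: set[str]) -> list[set[str]]:
    """Return the evidence groups for `gate` that `provided` does not cover."""
    return [group for group in EVIDENCE_RULES[gate] if not group & provided]


def gate_passes(gate: str, provided: set[str]) -> bool:
    """A gate may claim a pass only when every evidence group is covered."""
    return not uncovered_groups(gate, provided)
```

A live-provider result with redacted refs and a credential scope but no budget note would fail, and `uncovered_groups` names the missing `budget/cost notes` group.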

Framework mapping examples

Ecosystem | Gate mapping
Rust/Codex-like | cargo nextest, targeted crate tests, Bazel test/build, schema fixture writers, fake model server, ratatui snapshots
JS/OpenClaw-like | Vitest projects, changed-test router, contract configs, live configs, Docker smoke, QA Lab report lanes
Python/Hermes-like | pytest markers, xdist, integration exclusion by default, credential blanking, e2e directory, ruff/ty
Desktop GUI / native bridge | local verify, command contracts, bridge health, GUI smoke, Playwright continuation, native backend tests
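
Credential blanking, listed in the Python/Hermes-like row, can be approximated by removing credential-looking variables from the environment before default lanes run. This is a minimal sketch under assumptions: the `SECRET_NAME` pattern and the `blank_credentials` helper are hypothetical, and a real project would tune the pattern to its own provider variables.

```python
import re
from collections.abc import Mapping

# Hypothetical pattern for credential-like variable names; tune per project.
SECRET_NAME = re.compile(
    r"(API_KEY|SECRET_KEY|TOKEN|SECRET|PASSWORD|CREDENTIALS?)$", re.IGNORECASE
)


def blank_credentials(
    env: Mapping[str, str], allow: frozenset[str] = frozenset()
) -> dict[str, str]:
    """Copy env without credential-like variables so default lanes cannot reach live providers."""
    # Returns a new dict; the caller's mapping (e.g. os.environ) is not modified.
    return {
        name: value
        for name, value in env.items()
        if name in allow or not SECRET_NAME.search(name)
    }
```

A runner could pass `env=blank_credentials(os.environ)` to `subprocess.run` for the default pytest lane, so live-provider tests find no credentials unless a separate opt-in lane supplies them.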

Anti-patterns

Anti-pattern | Why it fails
One npm test checkbox for all profiles | hides surface/live/release risk
Screenshot-only UI pass | no runtime backing
Contract-only tool pass | no runtime recovery proof
Live provider in default unit lane | flaky and unsafe by default
Release build without install smoke | package may be unusable
Waiver with no owner/expiry | unbounded risk
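
Several of these anti-patterns can be caught by linting the gate plan before any gate runs. The sketch covers four of them over a hypothetical plan shape: `GatePlan`, `Waiver`, `anti_patterns`, the lane name `"default"`, and the evidence label `"install smoke"` are all assumptions, not a defined format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Waiver:
    gate: str
    owner: Optional[str] = None
    expires: Optional[date] = None


@dataclass
class GatePlan:
    # Hypothetical plan shape: lane name -> gate families run in that lane.
    lanes: dict[str, set[str]] = field(default_factory=dict)
    release_evidence: set[str] = field(default_factory=set)
    ui_evidence: set[str] = field(default_factory=set)
    waivers: list[Waiver] = field(default_factory=list)


def anti_patterns(plan: GatePlan) -> list[str]:
    """Report a subset of the anti-patterns from the table."""
    findings: list[str] = []
    if "live-provider" in plan.lanes.get("default", set()):
        findings.append("live provider in default lane: flaky and unsafe by default")
    if plan.ui_evidence and plan.ui_evidence <= {"screenshot"}:
        findings.append("screenshot-only UI pass: no runtime backing")
    has_release = any("distribution-release" in gates for gates in plan.lanes.values())
    if has_release and "install smoke" not in plan.release_evidence:
        findings.append("release build without install smoke: package may be unusable")
    for waiver in plan.waivers:
        if not waiver.owner or waiver.expires is None:
            findings.append(f"waiver for {waiver.gate} has no owner/expiry: unbounded risk")
    return findings
```

Running the lint in CI keeps these failures visible in the plan itself rather than discovered after a report claims a pass.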

Draft standard for evidence-driven quality control of Agent projects.