
Readiness and evals

Readiness and evals answer different questions.

  • Readiness asks whether the app can run safely now.
  • Evals ask whether the output is good enough to trust, publish, export, or hand off.

A valid app is not necessarily ready. A ready app can still produce output that fails evals.

Readiness inputs

Readiness should inspect the manifest, package, host profile, workspace setup, tenant policy, and optional user choices.

| Area | Example check |
| --- | --- |
| Host runtime | Does the host satisfy `appRuntime` and SDK version ranges? |
| Capabilities | Are `lime.ui`, `lime.storage`, `lime.agent`, and other required capabilities available? |
| Runtime package | Do UI, worker, storage schema, and workflow paths exist? |
| Permissions | Are required permission scopes declared and resolvable? |
| Knowledge | Are required Knowledge templates bound? |
| Skills | Are required Skills installed or bundled? |
| Tools | Are required tools available and authorized? |
| Artifacts | Can the host create or view declared artifact types? |
| Evals | Are required evals installed or implementable by the host? |
| Secrets | Are required secret slots bound? |
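
As a rough illustration of two rows from this table, the sketch below compares declared capabilities and secret slots against a host profile and returns findings. The field names `capabilities`, `secrets`, and `boundSecrets`, and the severity values, are assumptions made for this example and are not defined by the standard.

```python
def check_readiness(manifest: dict, host: dict) -> list[dict]:
    """Compare declared requirements with the host profile. Runs no agent tasks."""
    findings = []

    # Capabilities row: every required capability must be offered by the host.
    available = set(host.get("capabilities", []))  # hypothetical host-profile field
    for cap in manifest.get("capabilities", []):   # hypothetical manifest field
        if cap not in available:
            findings.append({
                "severity": "error",
                "kind": "capability",
                "key": cap,
                "message": f"Host does not provide {cap}.",
                "remediation": "Run the app on a host that offers this capability.",
            })

    # Secrets row: every declared secret slot must be bound.
    bound = set(host.get("boundSecrets", []))      # hypothetical host-profile field
    for slot in manifest.get("secrets", []):       # hypothetical manifest field
        if slot not in bound:
            findings.append({
                "severity": "warning",
                "kind": "secret",
                "key": slot,
                "message": f"Secret slot {slot} is not bound.",
                "remediation": "Bind the secret in the app settings.",
            })
    return findings
```

The function only reads declarations and host state, so it can run before any agent task is executed.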

Readiness statuses

Use stable statuses so hosts can build UI around them.

| Status | Meaning |
| --- | --- |
| `ready` | The app can run the selected entry. |
| `needs-setup` | The user or admin must bind Knowledge, Tools, permissions, or secrets. |
| `degraded` | The app can run with optional features disabled. |
| `blocked` | Policy, compatibility, or missing required capability prevents execution. |
| `failed` | The package or manifest is invalid. |
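
One way a host might reduce several per-check results to a single status is to pick the most severe one. The precedence order below is an assumption for illustration; the standard lists the statuses but does not rank them.

```python
# Assumed precedence, most severe first. The source does not define an ordering.
PRECEDENCE = ["failed", "blocked", "needs-setup", "degraded", "ready"]


def overall_status(check_statuses: list[str]) -> str:
    """Return the most severe status among per-check results ('ready' if none)."""
    unknown = set(check_statuses) - set(PRECEDENCE)
    if unknown:
        raise ValueError(f"unknown status: {sorted(unknown)}")
    return min(check_statuses, key=PRECEDENCE.index, default="ready")
```

Rejecting unknown strings keeps the status set stable, which is what lets hosts build UI around it.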

Actionable findings

A readiness finding should include severity, kind, key, message, and remediation.

```json
{
  "severity": "warning",
  "kind": "knowledge",
  "key": "project_knowledge",
  "required": true,
  "message": "Bind project_knowledge before running content_factory.",
  "remediation": "Choose or create a brand-product Knowledge Pack."
}
```

Opaque errors lead users to uninstall apps. Actionable findings lead users to finish setup.
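
A host or test suite could reject findings that are not actionable. The following minimal sketch checks the five fields named above; the helper name `missing_fields` is hypothetical.

```python
import json

REQUIRED_FIELDS = ("severity", "kind", "key", "message", "remediation")


def missing_fields(finding: dict) -> list[str]:
    """List required fields that are absent or blank in a finding."""
    return [
        name for name in REQUIRED_FIELDS
        if not str(finding.get(name) or "").strip()
    ]


# The finding shown above passes; an opaque error does not.
actionable = json.loads("""
{
  "severity": "warning",
  "kind": "knowledge",
  "key": "project_knowledge",
  "required": true,
  "message": "Bind project_knowledge before running content_factory.",
  "remediation": "Choose or create a brand-product Knowledge Pack."
}
""")
opaque = {"severity": "error", "message": "Something went wrong."}
```

Here `missing_fields(actionable)` is empty, while `missing_fields(opaque)` names `kind`, `key`, and `remediation`.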

Eval types

Evals are quality gates. They can be automatic, human-reviewed, or hybrid.

| Eval | Use |
| --- | --- |
| Fact grounding | Verify claims link to Knowledge or sources. |
| Policy compliance | Check support, legal, security, or brand rules. |
| Tone fit | Compare output against approved voice or style. |
| Completeness | Ensure required sections or fields exist. |
| Artifact validity | Validate table schema, JSON, deck, report, or code. |
| Human review | Require approval before export or publish. |
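
As an example of an automatic eval, the completeness row could be sketched roughly as follows. The result shape (`key`, `passed`, `evidence`) is an assumption for illustration, not a format defined by the standard.

```python
def completeness_eval(output: dict, required_fields: list[str]) -> dict:
    """Pass only if every required field is present and non-empty."""
    missing = [
        name for name in required_fields
        if output.get(name) in (None, "", [])
    ]
    return {
        "key": "completeness",
        "passed": not missing,
        # Evidence lets a user inspect why the output passed or failed.
        "evidence": {"checked": list(required_fields), "missing": missing},
    }
```

For instance, `completeness_eval({"title": "Launch post", "body": ""}, ["title", "body"])` fails and reports `body` as missing.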

Declaring evals

```yaml
evals:
  - key: fact_grounding
    kind: quality
    evidenceRequired: true
    required: true
  - key: publish_readiness
    kind: human-review
    required: false
```

If an eval affects trust, it should link to Evidence. The user should be able to inspect why an output passed or failed.
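
To make the roles of `required` and `evidenceRequired` concrete, here is a hedged sketch of a gate over eval results. The declarations mirror the YAML above as Python dicts; the `results` shape and the rule that a pass without evidence does not count are assumptions of this example.

```python
DECLARED = [
    {"key": "fact_grounding", "kind": "quality",
     "evidenceRequired": True, "required": True},
    {"key": "publish_readiness", "kind": "human-review", "required": False},
]


def gate(declared: list[dict], results: dict) -> dict:
    """results maps eval key -> {"passed": bool, "evidence": ...} (hypothetical)."""
    blocking, advisory = [], []
    for decl in declared:
        res = results.get(decl["key"]) or {}
        ok = bool(res.get("passed"))
        # Assumption: a pass without inspectable evidence is treated as not passed.
        if ok and decl.get("evidenceRequired") and not res.get("evidence"):
            ok = False
        if not ok:
            (blocking if decl.get("required") else advisory).append(decl["key"])
    return {"accepted": not blocking, "blocking": blocking, "advisory": advisory}
```

Only required evals block acceptance in this sketch; optional ones are reported as advisory so the host can still surface them.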

Connecting evals to artifacts

Evals should not be generic global prompts. Attach them to entries or artifact types when possible.

```yaml
artifactTypes:
  - key: content_table
    standard: agentartifact
    required: true
evals:
  - key: fact_grounding
    appliesTo: [content_table]
    evidenceRequired: true
```

This lets hosts show quality state on the artifact itself.
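
A host could resolve `appliesTo` along these lines to attach quality state to an artifact. The artifact fields (`type`, `accepted`, `failedEvals`) and the `results` shape are hypothetical names chosen for this sketch.

```python
def evals_for(artifact_type: str, evals: list[dict]) -> list[str]:
    """Return keys of the evals whose appliesTo includes the artifact type."""
    return [e["key"] for e in evals if artifact_type in e.get("appliesTo", [])]


def with_quality_state(artifact: dict, evals: list[dict], results: dict) -> dict:
    """Return a copy of the artifact annotated with its eval outcome."""
    keys = evals_for(artifact["type"], evals)
    failed = [k for k in keys if not results.get(k, {}).get("passed", False)]
    # The artifact content is kept as-is; a failure only marks it as not accepted.
    return {**artifact, "accepted": not failed, "failedEvals": failed}


EVALS = [{"key": "fact_grounding", "appliesTo": ["content_table"],
          "evidenceRequired": True}]
table = {"type": "content_table", "rows": [["claim", "source"]]}
```

Calling `with_quality_state(table, EVALS, {"fact_grounding": {"passed": False}})` keeps `rows` intact and sets `accepted` to `False`.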

Author checklist

  • Required setup appears in readiness, not only prose.
  • Optional requirements define degraded behavior.
  • Evals are connected to entries or artifacts.
  • Trust-sensitive evals record Evidence.
  • Human review gates are explicit.
  • Readiness can be run without executing agent tasks.
  • Eval failures do not erase artifacts; they mark them as not accepted.
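
Part of this checklist can be checked mechanically. The sketch below lints the eval declarations of a manifest for the "connected to entries or artifacts" item; the `entries` list with `key` fields is an assumed manifest shape, and `lint_evals` is a hypothetical helper.

```python
def lint_evals(manifest: dict) -> list[str]:
    """Report evals that are unattached or that point at undeclared targets."""
    targets = {a["key"] for a in manifest.get("artifactTypes", [])}
    targets |= {e["key"] for e in manifest.get("entries", [])}  # assumed field
    problems = []
    for ev in manifest.get("evals", []):
        applies = ev.get("appliesTo", [])
        if not applies:
            problems.append(f"{ev['key']}: not connected to an entry or artifact type")
        for target in applies:
            if target not in targets:
                problems.append(f"{ev['key']}: appliesTo references unknown target {target}")
    return problems
```

An empty result means every declared eval is attached to something the manifest actually declares.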

Draft host-platform standard for installable agent applications.