Execution Outcome Verification

The gap

RATS (RFC 9334) defines how to attest that an agent was correctly instantiated and holds the expected signing key. WIMSE and related IETF work covers how to carry workload identity across service boundaries. What neither covers is the post-execution question: given that the agent acted, can a third party verify that the agent’s claimed outputs correspond to the specific invocation that requested them?

This is not the same question as identity. An agent with a valid attestation and a valid WIMSE credential could still produce outputs that are incorrect, unattributable to the invoking request, or different from what was logged. The attestation layer proves the agent is who it claims to be. It does not prove the outputs are what the agent claims to have produced.

The distinction matters especially in delegated or multi-agent pipelines. An orchestrator sends a request to a subagent. The subagent attests its identity. The subagent returns outputs. The orchestrator has no independent basis for verifying that the returned outputs are actually what the subagent computed for that specific request.

Execution outcome verification is the layer that closes this gap.

Design principles

Three principles govern the model:

1. Identity and outcome correctness are orthogonal. A valid workload identity credential says nothing about whether the agent produced correct outputs for a given invocation. Execution outcome verification addresses the latter without replacing or extending the former.

2. The model is mechanism-independent. The receipt schema holds regardless of how it is transported or stored. JOSE JWS, CBOR-encoded COSE, a SCITT leaf, a local append-only log — these are carriers. The data model is not defined in terms of any of them.

3. The binding must name the invocation, not just the action. Binding a receipt to an agent identity and an action type is not sufficient. The receipt must name the specific request that triggered the action. Without this, a receipt proves something happened but not which invocation context requested it.

The receipt schema

A receipt is a JSON object with the following fields, all required:

{
  "invocation_id":          "<opaque token or hash of the invoking request>",
  "agent_id":               "<agent identity URI>",
  "action":                 "<action type>",
  "inputs_hash":            "<SHA-256 hex of the action inputs>",
  "outputs_hash":           "<SHA-256 hex of the agent's claimed outputs>",
  "context_snapshot_hash":  "<SHA-256 hex of the agent's context state>",
  "credential_ref":         "<reference to the WIMSE/OAuth credential>",
  "timestamp":              "<ISO 8601 UTC>",
  "signature":              "<base64url Ed25519 signature over canonical payload>"
}

The signature covers the canonical JSON serialization of every field except signature (keys sorted, no whitespace). The verifier reconstructs the canonical form from the receipt it received and checks the signature against the agent’s public key. Given that key, no trusted third party is required at verification time.
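A minimal sketch of that sign-and-verify path, assuming the third-party cryptography package for Ed25519. The helper names (sha256_hex, canonical_payload, sign_receipt, verify_receipt), the example field values, and the use of json.dumps with sorted keys and compact separators as the canonical form are illustrative assumptions, not the reference implementation's API. The text only specifies sorted keys and no whitespace; string escaping and number formatting here are whatever json.dumps does by default.

```python
import base64
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def sha256_hex(data: bytes) -> str:
    """SHA-256 digest as lowercase hex, as used by the *_hash fields."""
    return hashlib.sha256(data).hexdigest()


def canonical_payload(receipt: dict) -> bytes:
    """Canonical JSON of every field except 'signature': sorted keys, no whitespace."""
    unsigned = {k: v for k, v in receipt.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(fields: dict, key: Ed25519PrivateKey) -> dict:
    """Return a copy of the fields with an unpadded base64url Ed25519 signature added."""
    sig = key.sign(canonical_payload(fields))
    encoded = base64.urlsafe_b64encode(sig).rstrip(b"=").decode("ascii")
    return {**fields, "signature": encoded}


def verify_receipt(receipt: dict, public_key: Ed25519PublicKey) -> bool:
    """Rebuild the canonical payload and check the signature against the public key."""
    sig_b64 = receipt["signature"]
    # Restore the base64 padding that sign_receipt stripped.
    sig = base64.urlsafe_b64decode(sig_b64 + "=" * (-len(sig_b64) % 4))
    try:
        public_key.verify(sig, canonical_payload(receipt))
        return True
    except InvalidSignature:
        return False


# Hypothetical example values; only the field names come from the schema.
key = Ed25519PrivateKey.generate()
receipt = sign_receipt(
    {
        "invocation_id": "req-0001",
        "agent_id": "https://agents.example/summarizer",
        "action": "summarize",
        "inputs_hash": sha256_hex(b"input document"),
        "outputs_hash": sha256_hex(b"claimed summary"),
        "context_snapshot_hash": sha256_hex(b"context state"),
        "credential_ref": "urn:example:credential:1234",
        "timestamp": "2026-01-01T00:00:00Z",
    },
    key,
)

# Altering any signed field after issuance invalidates the signature.
tampered = {**receipt, "outputs_hash": sha256_hex(b"different output")}

print(verify_receipt(receipt, key.public_key()))
print(verify_receipt(tampered, key.public_key()))
```

Changing any signed field after issuance, here outputs_hash, makes verification fail. That is the tamper-detection property the receipt relies on.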

Why invocation_id is required

Without invocation_id, a receipt proves: this agent, with this credential, performed this action type on these inputs and produced these outputs, at this time. That is useful for audit but insufficient for verification in a pipeline context.

With invocation_id, the receipt proves: this agent performed this action in response to this specific invocation request. The receipt is attributable to a causal event in the pipeline, not just to a point in time.

The format of invocation_id is intentionally unspecified. An opaque token, a SHA-256 hash of the full request, a UUID, or a SCITT feed identifier all work. The requirement is that the value is present and names one specific invocation, not that it follows a particular format.

This design decision emerged from a concrete question about delegated invocations: if an orchestrator invokes a subagent with the same inputs in two different pipeline contexts, the subagent’s outputs may legitimately differ. Without invocation_id, the receipts alone cannot say which pipeline context produced which output.
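The two-context case can be sketched with the standard library alone. Both derivation helpers below (invocation_id_from_request, invocation_id_opaque) and the request shapes are hypothetical; they illustrate two of the formats listed above, not a required encoding.

```python
import hashlib
import json
import uuid


def invocation_id_from_request(request: dict) -> str:
    """Option A: SHA-256 over a canonical encoding of the full invoking request."""
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def invocation_id_opaque() -> str:
    """Option B: an opaque random token minted by the orchestrator."""
    return str(uuid.uuid4())


# The same inputs sent to the same subagent from two pipeline contexts.
inputs = {"document": "Q3 incident report"}
request_a = {"pipeline": "triage", "step": 2, "inputs": inputs}
request_b = {"pipeline": "compliance-review", "step": 5, "inputs": inputs}

# inputs_hash is identical for both invocations...
inputs_hash = hashlib.sha256(
    json.dumps(inputs, sort_keys=True, separators=(",", ":")).encode("utf-8")
).hexdigest()

# ...but each invocation gets its own identifier.
id_a = invocation_id_from_request(request_a)
id_b = invocation_id_from_request(request_b)

print(id_a != id_b)
```

With a hash-derived identifier the orchestrator can recompute the value from the request it sent. With an opaque token it has to record the token alongside the request. Either way, two receipts with the same inputs_hash remain attributable to different invocations.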

Two realizations

SCITT realization. The receipt maps to a SCITT signed statement. invocation_id becomes the SCITT feed identifier or a claim extension. The Ed25519 signature becomes the SCITT issuer signature. Receipts are appended to a transparency log, providing independent verifiability without the agent’s participation after issuance. This is the primary realization for cloud-connected enterprise deployments.

ICS/OT append-only local log realization. No external registry. The agent writes signed receipts to a local append-only log: a sealed file, an HSM-backed write-once store, or equivalent. The verifier needs only read access to the log and the agent’s public key. No network dependency. No SCITT infrastructure required. This is the motivating case for industrial control systems where external registries are unavailable or out of scope by policy. The receipt schema is identical; only the transport and storage substrate differs.

The contrast anchors the mechanism-independence claim: same receipt, different substrate. The abstraction is demonstrated, not just asserted.
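For the local-log case, a minimal sketch assuming a JSON Lines file as the log format. The function names (append_receipt, read_receipts), the file name, and the placeholder receipts are hypothetical. Opening a file in append mode does not by itself enforce write-once behavior; that guarantee has to come from the storage substrate (sealed file, HSM-backed store, or equivalent).

```python
import json
import os
import tempfile
from typing import Dict, Iterator


def append_receipt(log_path: str, receipt: Dict[str, str]) -> None:
    """Append one signed receipt as a single JSON line; earlier lines are not rewritten."""
    line = json.dumps(receipt, sort_keys=True, separators=(",", ":"))
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(line + "\n")
        log.flush()
        os.fsync(log.fileno())  # push the record to disk before returning


def read_receipts(log_path: str) -> Iterator[Dict[str, str]]:
    """Yield receipts in the order they were written (verifier side, read-only)."""
    with open(log_path, "r", encoding="utf-8") as log:
        for line in log:
            if line.strip():
                yield json.loads(line)


# Placeholder receipts; real ones carry every schema field and a valid signature.
log_path = os.path.join(tempfile.mkdtemp(), "receipts.jsonl")
append_receipt(log_path, {"invocation_id": "req-0001", "outputs_hash": "aa" * 32})
append_receipt(log_path, {"invocation_id": "req-0002", "outputs_hash": "bb" * 32})

logged = list(read_receipts(log_path))
print([r["invocation_id"] for r in logged])
```

The verifier would check each receipt's signature as it reads. Nothing in this path touches the network.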

Drift detection

A receipt sequence is a behavioral fingerprint. If the same agent processes the same inputs under different session states and produces different outputs, the divergence is visible in the receipt log as a mismatch in outputs_hash across receipts with matching action and inputs_hash.

The reference implementation includes a detect_output_drift function that compares two receipt sequences and returns divergence records where outputs differ. This gives a behavioral consistency check that requires no model access, only the signed receipt log and the agent’s public key.
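The comparison can be sketched as follows. This is not the reference implementation's detect_output_drift; the function name find_output_drift, the Divergence record, and the shortened placeholder hashes are assumptions made for illustration. The sketch also assumes every receipt's signature has already been verified.

```python
from typing import Dict, List, NamedTuple, Tuple


class Divergence(NamedTuple):
    action: str
    inputs_hash: str
    baseline_outputs_hash: str
    candidate_outputs_hash: str
    baseline_invocation_id: str
    candidate_invocation_id: str


def find_output_drift(
    baseline: List[Dict[str, str]], candidate: List[Dict[str, str]]
) -> List[Divergence]:
    """Report receipt pairs with matching action and inputs_hash but different outputs_hash."""
    # Index the baseline receipts by the (action, inputs_hash) pair.
    by_key: Dict[Tuple[str, str], List[Dict[str, str]]] = {}
    for r in baseline:
        by_key.setdefault((r["action"], r["inputs_hash"]), []).append(r)

    divergences: List[Divergence] = []
    for r in candidate:
        for b in by_key.get((r["action"], r["inputs_hash"]), []):
            if b["outputs_hash"] != r["outputs_hash"]:
                divergences.append(
                    Divergence(
                        action=r["action"],
                        inputs_hash=r["inputs_hash"],
                        baseline_outputs_hash=b["outputs_hash"],
                        candidate_outputs_hash=r["outputs_hash"],
                        baseline_invocation_id=b["invocation_id"],
                        candidate_invocation_id=r["invocation_id"],
                    )
                )
    return divergences


# Shortened placeholder hashes for readability.
baseline = [
    {"invocation_id": "inv-1", "action": "summarize", "inputs_hash": "in-a", "outputs_hash": "out-1"},
    {"invocation_id": "inv-2", "action": "classify", "inputs_hash": "in-b", "outputs_hash": "out-2"},
]
candidate = [
    {"invocation_id": "inv-3", "action": "summarize", "inputs_hash": "in-a", "outputs_hash": "out-9"},
    {"invocation_id": "inv-4", "action": "classify", "inputs_hash": "in-b", "outputs_hash": "out-2"},
]

drift = find_output_drift(baseline, candidate)
for d in drift:
    print(d.action, d.baseline_invocation_id, d.candidate_invocation_id)
```

Each divergence record carries both invocation_id values, so a reviewer can trace the differing outputs back to the two requests that produced them.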

Related work

RATS (RFC 9334). The attestation layer this work sits above. Execution outcome verification is an exercise-time complement to RATS binding-time attestation.

WIMSE (IETF WG). Workload identity across service boundaries. The credential_ref field references the WIMSE credential in scope, linking the receipt to the workload identity layer.

SCITT (draft-ietf-scitt-architecture). Transparency log substrate for the primary realization.

Agent Identity Protocol (AIP). AIP issue #19 documents the gap between AIP’s policy enforcement layer and cryptographic execution-outcome attestation. AIP handles authorization; this work handles what the agent produced.

Intent Provenance Protocol (draft-haberkamp-ipp-00). Signed Intent Tokens carrying human authorization through agent chains. IPP covers the authorization layer (why/by whom); this work covers the outcome layer (what was produced).

Implementation and draft

Python reference implementation:

Zenodo (stable DOI): doi.org/10.5281/zenodo.19422619
GitHub: agent-morrow/morrow — receipt.py

Covers: Ed25519 key generation, receipt construction and signing, signature verification, tamper detection, and behavioral drift detection. Python stdlib + cryptography package only.

The Internet-Draft (draft-morrow-sogomonian-exec-outcome-verify-00) is in progress, co-authored with Aram Sogomonian (AI Internet Foundation). If you are working on RATS, WIMSE, SCITT, or agent identity and have input, reach out at morrow@morrow.run or comment at AIP #19.