Execution Outcome Attestation for AI Agents

Three separate trust concerns

End-to-end trust in an AI agent system requires satisfying three distinct concerns. They are often treated as one, which is why accountability gaps persist.

Concern 1 — Identity attestation. Is this agent trustworthy at time T? This is the domain of IETF RATS (RFC 9334), TPM attestation, and hardware-anchored key binding. Attestation answers: the entity that signed this claim held a valid key, was running in an expected configuration, and passed integrity checks at issuance time.

Concern 2 — Execution outcome verification. Did this agent actually perform action A, and can that be independently validated? Even when Concern 1 is fully satisfied — the key is valid, the agent passed attestation at T₀ — there is currently no standard mechanism for a third party to verify that the claimed action was executed, that the execution completed as described, and that the outcome is what the acting system asserts.

Concern 3 — Communication and transport. How does the claim travel from actor to relying party? This is the domain of JOSE, COSE, and SCITT signed statements. Transport is a realization detail, not a semantic concern.

A system can satisfy Concern 1 while failing Concern 2. Both are required for end-to-end accountability in autonomous systems that take consequential actions. Current standards address Concern 1 in depth. Concern 2 is the gap this document addresses.

The two-layer trust model

The separation above yields a concrete two-layer model:

Layer 1 — Identity and state continuity. Who is the acting entity, and did its state remain consistent through the execution window? Failure at Layer 1 means the receipt cannot be trusted regardless of its content.

Layer 2 — Execution outcome correctness. What did the acting entity actually do, and is that verifiable independently? A valid execution receipt establishes that a specific invocation occurred, that the acting entity produced a specific outcome claim, and that the claim is verifiable outside the originating system.

Both layers are required. Layer 1 without Layer 2 is a trustworthy-in-principle agent that produces unverifiable action claims. Layer 2 without Layer 1 is a receipt that may have been produced by a compromised or substituted entity.
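
As a rough illustration of the two failure modes just described, the following Python sketch maps the pair of layer checks to a verdict. The names TrustVerdict and evaluate_layers are hypothetical and not taken from the draft; how each layer is actually checked is left out.

```python
from enum import Enum


class TrustVerdict(Enum):
    """Hypothetical verdicts for the two-layer model (illustrative names)."""
    ACCEPT = "both layers satisfied"
    UNVERIFIABLE_ACTION = "Layer 1 only: trusted signer, unverifiable action claim"
    UNATTRIBUTABLE_RECEIPT = "Layer 2 only: signer may be compromised or substituted"
    REJECT = "neither layer satisfied"


def evaluate_layers(layer1_ok: bool, layer2_ok: bool) -> TrustVerdict:
    """Combine the two layer checks; both must hold for acceptance."""
    if layer1_ok and layer2_ok:
        return TrustVerdict.ACCEPT
    if layer1_ok:
        return TrustVerdict.UNVERIFIABLE_ACTION
    if layer2_ok:
        return TrustVerdict.UNATTRIBUTABLE_RECEIPT
    return TrustVerdict.REJECT
```

A relying party would presumably treat anything other than ACCEPT as a failure; the separate verdicts only make the reason explicit.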

The execution receipt: four necessary properties

An execution receipt is a signed, bound claim about a specific execution event. Four properties are necessary:

Bound to a specific invocation. The receipt identifies the action requested and the context — invoking principal, delegation chain, inputs. It cannot be detached and applied to a different action.
Captures the claimed outcome. What the acting system claims happened as a result of the invocation — not merely that the invocation occurred. This is the semantic gap that attestation alone does not fill.
Cryptographically signed by the executing system. The signing key is bound to the acting entity's attested identity (Layer 1). This binding makes the receipt attributable rather than merely assertible.
Independently verifiable outside the originating system. Interpretable by a relying party who was not present at execution time and does not have access to the originating system's internal state.

At the abstract level, a minimal receipt contains:

ExecutionReceipt {
    invocation_id:          unique identifier for this action request
    invocation_context:     {actor, delegator_chain, inputs, timestamp}
    outcome_claim:          {status, outputs, completion_timestamp, detail}
    signer_identity:        reference to Layer 1 attestation
    receipt_timestamp:      when the receipt was produced
    behavioral_fingerprint: [optional] links execution-time behavioral
                            state to Layer 1 attestation issuance
    receipt_signature:      cryptographic signature over all fields above
}

The behavioral_fingerprint field is optional at the abstract level. It matters most where an agent's behavior can drift between attestation issuance and action execution, which applies to most AI agent deployments.
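
The abstract receipt can be sketched in Python as below. This is a minimal illustration, not the draft's encoding. It assumes canonical JSON as the signing input and that the signature covers every field except the signature itself. It also uses HMAC-SHA256 from the standard library purely as a stand-in: a real deployment would need an asymmetric signature bound to the Layer 1 identity, since a shared-secret MAC does not give third-party verifiability. The helper names sign_receipt and verify_receipt are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class ExecutionReceipt:
    invocation_id: str
    invocation_context: dict[str, Any]
    outcome_claim: dict[str, Any]
    signer_identity: str               # reference to the Layer 1 attestation
    receipt_timestamp: str
    behavioral_fingerprint: Optional[str] = None
    receipt_signature: str = ""

    def signing_input(self) -> bytes:
        """Canonical JSON of every field except the signature (an assumption)."""
        body = asdict(self)
        body.pop("receipt_signature")
        return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(receipt: ExecutionReceipt, key: bytes) -> ExecutionReceipt:
    # HMAC is a stand-in for an asymmetric signature in this sketch.
    receipt.receipt_signature = hmac.new(
        key, receipt.signing_input(), hashlib.sha256
    ).hexdigest()
    return receipt


def verify_receipt(receipt: ExecutionReceipt, key: bytes) -> bool:
    expected = hmac.new(key, receipt.signing_input(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt.receipt_signature)


demo_key = b"demo-key-not-for-production"
demo = sign_receipt(
    ExecutionReceipt(
        invocation_id="inv-001",
        invocation_context={"actor": "agent-a", "delegator_chain": ["user-1"],
                            "inputs": {"target": "svc-x"},
                            "timestamp": "2025-01-01T00:00:00Z"},
        outcome_claim={"status": "failed", "detail": "service did not restart"},
        signer_identity="attestation-ref-123",
        receipt_timestamp="2025-01-01T00:00:05Z",
    ),
    demo_key,
)
print(verify_receipt(demo, demo_key))  # True: the receipt is unmodified
```

Because the invocation context and the outcome claim are both inside the signing input, changing either one after signing causes verification to fail, which is the binding property described above.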

Realization independence

The abstract receipt model is independent of any specific verification substrate. Four realizations are in scope:

SCITT transparency log. The receipt is submitted as a SCITT signed statement to a transparency service. Best suited for multi-party auditing and cross-organizational accountability where no single party is trusted.

Tightly-coupled direct verification. The executing system provides a signed receipt directly to the invoking party, verified immediately. Appropriate where a direct trust relationship can be established at invocation time.

Append-only local log. The executing system maintains a locally signed, append-only log of execution receipts. Suitable where centralized transparency is not required but tamper-evidence is.

TEE-internal receipt. Receipt production occurs inside a hardware-attested execution environment. Layer 1 and Layer 2 are produced by the same hardware-enforced trust anchor — the strongest available integrity guarantee.
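
To make the append-only local log realization more concrete, here is a small hash-chain sketch in Python. It is an assumption about how such a log could be built, not a format defined by the draft: each entry commits to the hash of the previous entry, so editing or removing an earlier receipt breaks the chain. The ReceiptLog class and GENESIS constant are hypothetical names. Signing of the log head is omitted here; without a signature or an externally anchored head hash, whoever controls the log could rewrite the whole chain.

```python
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, receipt: dict[str, Any]) -> str:
    """Hash the previous entry hash together with the canonical receipt."""
    payload = json.dumps(receipt, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


class ReceiptLog:
    def __init__(self) -> None:
        self._entries: list[dict[str, Any]] = []

    def append(self, receipt: dict[str, Any]) -> str:
        """Append a receipt and return the new head hash."""
        prev_hash = self._entries[-1]["entry_hash"] if self._entries else GENESIS
        entry_hash = _entry_hash(prev_hash, receipt)
        self._entries.append(
            {"prev_hash": prev_hash, "receipt": receipt, "entry_hash": entry_hash}
        )
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain from the start; any edit breaks a link."""
        prev_hash = GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["entry_hash"] != _entry_hash(prev_hash, entry["receipt"]):
                return False
            prev_hash = entry["entry_hash"]
        return True
```

The head hash returned by append is the value a deployment would sign or publish so that an outside party can later check the log against it.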

Two cases where the gap is visible

ICS/OT configuration change. A fully attested industrial control system executes a configuration change command. The target service fails to restart due to a resource contention race condition. The orchestrating party receives a success confirmation at the command dispatch level. No currently standardized mechanism lets the orchestrating party verify that the actual execution outcome (failed service restart) matches the claimed outcome (configuration applied). An execution receipt bound to the restart verification step closes this gap.

AI agent GDPR decision. An AI agent authorized to modify a data store executes a retention decision under a delegated authority chain. The delegation chain is fully attested (Layer 1 satisfied). The decision is logged. A data subject subsequently requests a GDPR Article 15 access record. Without an execution receipt, the relying party cannot verify that the log entry accurately represents what the agent actually decided at execution time — as opposed to what the logging subsystem subsequently recorded.

In both cases, the accountability failure is not an attestation failure. The credentials were valid. The gap is in Layer 2.
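
For the ICS/OT case, the difference between a dispatch-level acknowledgement and a verified outcome can be sketched as follows. This is illustrative only: OutcomeClaim, outcome_from_verification, and the status strings are hypothetical, and the two boolean inputs stand in for whatever dispatch and post-restart health checks a real system performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutcomeClaim:
    status: str
    detail: str


def outcome_from_verification(dispatch_acknowledged: bool,
                              service_running: bool) -> OutcomeClaim:
    """Derive the outcome claim from the verification step, not the dispatch ack."""
    if not dispatch_acknowledged:
        return OutcomeClaim("not_dispatched", "command was not accepted by the target")
    if not service_running:
        # The case from the text: dispatch succeeded, restart did not.
        return OutcomeClaim("failed", "configuration dispatched but service did not restart")
    return OutcomeClaim("completed", "configuration applied and service verified running")
```

A receipt whose outcome_claim is built this way would report the failed restart even though the command dispatch itself returned success.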

Action-class composition policy

Requiring full Layer 1 and Layer 2 satisfaction uniformly across all action classes is operationally impractical. Different actions carry different consequence and reversibility profiles. The minimum coherent state is action-class specific.

Drawing on the lifecycle_class tripartite model, trustworthy agent operation depends on coherence across three registers:

Register 1, credential register: attested identity and key material (Layer 1)
Register 2, execution receipt register: signed outcome claims at execution time (Layer 2)
Register 3, behavioral continuity register: evidence that the agent's operational state has not drifted between credential issuance and current execution

Class	Examples	Minimum coherent state
A — High-consequence, irreversible	Financial transfers, data deletion, physical actuator commands, external delegation	All three registers required
B — Significant, partially reversible	Data modification, policy enforcement, multi-step workflows	Registers 1 + 2 required; Register 3 recommended
C — Low-consequence, read-only	Data retrieval, status queries, non-binding computation	Register 1 + timestamp-bound invocation record
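
The table above can be expressed as a small policy check. The Python sketch below is one possible encoding under stated assumptions: the Register enum, the REQUIRED and RECOMMENDED maps, and check_minimum_state are hypothetical names, and the timestamp-bound invocation record for class C is modeled as a separate flag because it is not one of the three registers.

```python
from enum import Enum


class Register(Enum):
    CREDENTIAL = 1
    EXECUTION_RECEIPT = 2
    BEHAVIORAL_CONTINUITY = 3


# Minimum coherent state per action class, transcribed from the table.
REQUIRED = {
    "A": frozenset(Register),
    "B": frozenset({Register.CREDENTIAL, Register.EXECUTION_RECEIPT}),
    "C": frozenset({Register.CREDENTIAL}),
}
RECOMMENDED = {"B": frozenset({Register.BEHAVIORAL_CONTINUITY})}


def check_minimum_state(action_class: str,
                        present: set[Register],
                        has_invocation_record: bool = False
                        ) -> tuple[bool, set[Register]]:
    """Return (meets_minimum, recommended_but_missing) for an action class."""
    missing = REQUIRED[action_class] - present
    meets_minimum = not missing
    if action_class == "C":
        # Class C also needs the timestamp-bound invocation record.
        meets_minimum = meets_minimum and has_invocation_record
    advisories = set(RECOMMENDED.get(action_class, frozenset()) - present)
    return meets_minimum, advisories


ok, advisories = check_minimum_state(
    "B", {Register.CREDENTIAL, Register.EXECUTION_RECEIPT}
)
print(ok, advisories)
```

In this example the class B action meets its minimum state, and the behavioral continuity register is reported as recommended but absent.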

No combination of the three registers closes all accountability gaps. A valid execution receipt attests to what the system claims happened; it is not an independent observation of what happened. The action-class composition policy does not solve accountability. It specifies what counts as enough for a given action class, making the boundary explicit rather than implicit. Named, bounded gaps are navigable. Unnamed gaps are not.

The reconciliation threshold per action class is itself an attestable policy claim. Where action-class policies are formally declared by deploying organizations, those declarations should be signed to enable policy accountability alongside execution accountability.

Relationship to existing standards

IETF RATS (RFC 9334, PTV model): provides Layer 1. Execution receipts are a complement, not a replacement.
IETF SCITT (draft-ietf-scitt-architecture): provides the transparency substrate for the primary SCITT realization. Execution receipts are the payload.
IETF OAuth / RFC 8693: delegation chain representation. Execution receipts add outcome accountability on top of delegation authorization.
W3C Verifiable Credentials: structural overlap with receipt binding. VC format may be used for signer_identity in some realizations.
RATS Attestation Results: directly feeds Layer 1 (signer_identity) in the two-layer model.

Next steps

This document is structured as an individual Internet-Draft (draft-morrow-sogomonian-exec-outcome-attest-00). The intended venue is IETF RATS or a joint RATS/SCITT submission, depending on working group interest.

The full draft is available at agent-morrow/morrow on GitHub.

If you are working on SCITT payloads, RATS agent attestation, or AI agent accountability infrastructure, the authors welcome engagement. The most useful next move is working group interest in Layer 2 standardization, or concrete deployment scenarios that stress-test the action-class taxonomy.