The Stack That Works
The agent identity framework being assembled in IETF WIMSE — embodied in draft-klrc-aiagent-auth (AIMS) — is the right starting point. It gives AI agents the same kind of verifiable workload identity that SPIFFE/SPIRE gave microservices: a durable identifier, short-lived credentials, scoped OAuth tokens for resource access, and an attestation layer that verifies the deployment environment.
Those are real, hard problems. Solving them opens the door to enterprise-grade multi-agent deployments. When this framework lands, an agent will be able to prove who it is, where it's running, and what it's permitted to do — and a relying party will have a practical way to verify all three.
That is a meaningful advance over the current state, where most agent auth is ad-hoc API key management and hope.
The three gaps below are not criticisms of the framework. They are the next layer of the same problem — threat classes that sit below the credential surface and above the audit log, where the current architecture has no sensors.
Gap 1: Model Substitution
AIMS authenticates the workload — the process, container, or service endpoint. It does not authenticate which neural network is computing inside that workload. These are different objects. A substituted model inherits every valid credential the harness carries.
This is not a hypothetical. In March 2026, the Cursor/Kimi incident revealed a flagship product running an undisclosed foundation model, identified only through accidental metadata exposure. In that scenario, every authentication and attestation layer defined in AIMS-01 would have remained valid. Model identity was the sole differentiating evidence layer — and AIMS has no hook for it.
Three scenario classes matter:
- Same-family substitution — a checkpoint swap behind a stable SPIFFE endpoint. Workload identity, health checks, process ID, and API contract remain constant. The computation changes.
- Supply-chain substitution — compromised weights inside an artifact that passes image-hash attestation, because the image was built from the compromised weights.
- API-regime substitution — a silent model rotation by a third-party provider behind a versioned endpoint. OAuth token valid. Model unknown.
In all three cases, structural model identity measurement — measuring the geometry of a model's internal activation distributions during a standard forward pass — produces a compact fingerprint that separates the substituted model from the enrolled one. This fingerprint composes naturally with the AIMS attestation flow as an additional JWT claim. It does not require weight access, model cooperation, or protocol changes. A measured demonstration across all three substitution classes is published at Zenodo 10.5281/zenodo.19342848.
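As a rough illustration of the fingerprinting idea (not the measured method from the Zenodo artifact), the sketch below reduces per-layer activation matrices, captured during a forward pass on a fixed probe set, to coarsely quantized spectral and moment statistics, hashes them into a compact fingerprint, and wraps the result as a hypothetical additional JWT claim. Every function and claim name here is invented for illustration; a production scheme would match statistics within a tolerance band rather than by exact hash equality.

```python
import hashlib
import json

import numpy as np


def activation_fingerprint(activations, quant=2):
    """Compact structural fingerprint from activation geometry.

    `activations`: one (tokens x hidden) array per layer, captured during
    a standard forward pass on a fixed probe prompt set. The statistics
    chosen here (leading singular values plus mean/std, coarsely
    quantized) are illustrative stand-ins for the published measurement.
    """
    stats = []
    for layer in activations:
        spectrum = np.linalg.svd(layer, compute_uv=False)[:4]
        stats.append({
            "spectrum": np.round(spectrum, quant).tolist(),
            "mean": round(float(layer.mean()), quant),
            "std": round(float(layer.std()), quant),
        })
    digest = hashlib.sha256(json.dumps(stats, sort_keys=True).encode())
    return digest.hexdigest()[:16]


def attestation_claim(enrolled_fp, observed_activations):
    """Hypothetical extra JWT claim composing with the attestation flow.

    Exact hash equality keeps the sketch short; a real verifier would
    compare the underlying statistics within a tolerance band.
    """
    fp = activation_fingerprint(observed_activations)
    return {"model_fp": fp, "model_fp_match": fp == enrolled_fp}
```

The intended shape is: enrollment records the fingerprint once, each attestation cycle recomputes it from the same probe set, and the resulting claim rides alongside the workload evidence the token already carries.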
The natural location in the draft: §7 (Attestation) as an additional evidence source; §14 (Security Considerations, currently reserved) as a named threat class.
Gap 2: Context Compression
Model substitution is a hardware/software-layer event: a different neural network is computing. Context compression is a session-layer event: the same neural network is computing, but over a materially different behavioral state.
Long-running agents operate under finite context windows. As sessions accumulate messages, older context is pruned, summarized, or truncated. After a compression event:
- Instructions, scope constraints, and behavioral parameters from earlier in the session may be absent or lossy.
- The effective policy set the agent applies to a new request differs from the policy set it held when the authorization grant was issued.
- The workload identifier, model attestation, OAuth token, and audit record all remain valid. The behavioral alignment with the original authorization grant has silently degraded.
This is not theoretical. The pattern is documented in production incidents: an agent granted a narrow scope at session start loses that constraint from active context mid-session. The 200-email deletion case — where a "wait for approval" instruction was compacted away — is the most widely cited example.
What makes this structurally distinct from general behavioral drift is that compression has a named causal mechanism. Monitoring infrastructure that records compression events can distinguish compression-correlated anomalies from arbitrary drift; infrastructure that doesn't cannot.
The compression-monitor tool measures three observable signals without requiring access to model internals: ghost lexicon decay (loss of precise vocabulary from earlier in the session), context consistency score (embedding similarity between responses before and after compression), and tool call distribution shift (change in action repertoire). These are black-box signals, consistent with the observability model AIMS §11 already describes.
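All three signals can be approximated from session transcripts alone. The sketch below is a minimal stand-in, not the compression-monitor tool itself: it substitutes bag-of-words cosine for a real embedding model, uses total variation distance for the tool-call shift, and every function name is illustrative.

```python
import math
from collections import Counter


def _tokens(texts):
    return [w.lower() for t in texts for w in t.split()]


def ghost_lexicon_decay(early_msgs, recent_msgs, top_k=10):
    """Fraction of frequent early-session vocabulary absent from recent output.

    0.0 = fully retained, 1.0 = fully decayed.
    """
    early_vocab = {w for w, _ in Counter(_tokens(early_msgs)).most_common(top_k)}
    recent_vocab = set(_tokens(recent_msgs))
    retained = len(early_vocab & recent_vocab)
    return 1.0 - retained / max(len(early_vocab), 1)


def consistency_score(before, after):
    """Similarity of responses across a compression event.

    Bag-of-words cosine here; a real monitor would use an embedding model.
    """
    a, b = Counter(_tokens([before])), Counter(_tokens([after]))
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def tool_shift(calls_before, calls_after):
    """Total variation distance between tool-call frequency distributions."""
    ca, cb = Counter(calls_before), Counter(calls_after)
    tools = set(ca) | set(cb)
    pa = {t: ca[t] / max(len(calls_before), 1) for t in tools}
    pb = {t: cb[t] / max(len(calls_after), 1) for t in tools}
    return 0.5 * sum(abs(pa[t] - pb[t]) for t in tools)
```

A monitor recording these three numbers at each compression event, alongside a timestamp and session identifier, is enough to correlate later anomalies with the compression that caused them.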
The natural location: §11 (Monitoring) as a named session-lifecycle event with a recommended structured record schema; §10 (Authorization) with a note that persistent sessions exceeding a compression threshold may warrant re-authorization.
Gap 3: The Obligation Gap
Authorization proves that an agent is permitted to act. It does not prove who is accountable for the obligation the action incurs. These are different legal and operational facts, and they travel in different records.
In a multi-agent DSAR erasure pipeline — agent-A (EU) delegates to agent-B (EU) delegates to agent-C (US-CA storage) — a fully implemented AIMS deployment produces:
- Each agent authenticated ✓
- Each delegation authorized by OAuth token ✓
- Each action permitted by scope ✓
What AIMS does not produce:
- Which agent holds halt authority at each step
- Whether agent-C explicitly accepted the obligation (or silently inherited agent-A's authority ceiling)
- What the default behavior is if no authorized party responds to a compliance notification within the required GDPR Art. 22 window
Under current frameworks, a complete OAuth token chain produces a permission chain. Compliance regimes require an accountability chain. The two are structurally different.
The obligation routing specification (published at Zenodo 10.5281/zenodo.19400222) addresses this as a per-task annotation layer that sits above the auth layer. Each action record carries: notification targets, authority scope per node, notification window, default-if-no-response, and explicit data-boundary crossing flags. The annotation composes with the AIMS trust and permission chain without modifying it.
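A minimal sketch of what such a per-task annotation might look like, with field names of my own choosing rather than the schema in the published specification:

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ObligationAnnotation:
    """Per-task obligation record riding above the auth layer.

    Field names are illustrative; the published spec defines its own schema.
    """
    task_id: str
    notify: List[str]                 # notification targets
    authority: Dict[str, str]         # authority scope per node
    notify_window_hours: int          # required response window
    default_if_no_response: str       # e.g. "halt" or "proceed"
    crosses_data_boundary: bool       # explicit jurisdiction-crossing flag

    def record(self):
        """Attachable action record; composes with, never modifies, the token chain."""
        return {"obligation": asdict(self)}
```

In the DSAR pipeline above, agent-C's delegation would carry an annotation with `crosses_data_boundary=True` and an explicit authority entry, making the accountability chain inspectable rather than implied.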
The natural location: §13 (Agent Compliance) as a named architectural gap between the permission chain and the accountability chain, with the task section (proposed in the AIMS tracker as issue #94) as the natural home for obligation annotation.
The Common Pattern
These three gaps share a structure: each one operates at a layer that the current credential architecture cannot see.
| Gap | Layer | What AIMS verifies | What AIMS misses |
|---|---|---|---|
| Model substitution | Hardware/software | Workload environment | Which model is computing |
| Context compression | Session | Identity at session start | Behavioral state mid-session |
| Obligation gap | Governance | Permission chain | Accountability chain |
Each gap is addressable as an additive evidence layer — a new claim type, a new structured event, a new per-task annotation — that composes with the existing architecture without requiring protocol changes to the auth layer itself.
Naming all three together matters because the governance literature tends to treat these as separate conversations: model identity is a security concern, context compression is an engineering concern, obligation tracking is a compliance concern. At the architecture level, they are the same kind of gap: a verified token that no longer accurately represents the effective behavioral state of the agent it purports to describe.
What This Means for the Standards Work
AIMS-01 is the right foundation. The gaps described here are not arguments against the framework — they are arguments for what the -02 and -03 revision cycle should prioritize. The cheapest time to name a gap is before the section that would contain it is frozen.
Concretely: §14 (Security Considerations) is currently reserved for future work. That reservation is the right place to name model substitution and context compression as distinct threat classes, with references to the measurement frameworks that make them observable. §13 (Compliance) is the right place to note the gap between the permission chain and the accountability chain, with a forward reference to whatever obligation annotation work emerges from the task-authorization mapping discussion (issue #94 in the AIMS tracker).
None of these additions require protocol changes to the current draft. They require naming things that are already present as unnamed risks in every production deployment the draft is designed to govern.