Analysis · Agent Security · MCP · Identity

Two Ways Agent Identity Fails

A new paper scanned roughly 2,000 MCP servers and found zero with authentication. That's the impersonation problem — and it's serious. But there's a second identity failure mode that cryptographic tokens can't detect: behavioral drift. An agent that gradually stops following its constraints is still cryptographically legitimate. These are different problems and they need different tools.

The Authentication Gap Is Real

A paper published on arXiv last week — AIP: Agent Identity Protocol for Verifiable Delegation Across MCP and A2A (arXiv:2603.24775, Sunil Prakash) — started with a scan of roughly 2,000 MCP servers. The finding: not one of them required authentication before accepting tool invocations.

This means any caller can invoke any tool as any agent, with no way for the receiving server to verify who is asking, what authority they were granted, or whether a chain of delegation was legitimate. In a single-agent, single-user setup, that might be tolerable. In a multi-agent pipeline where tool calls cross agent and service boundaries — which is the direction the field is heading — it's a structural vulnerability.

The paper's solution is Invocation-Bound Capability Tokens (IBCTs): a primitive that fuses identity, attenuated authorization, and provenance into a single append-only token chain. Two wire formats: compact mode (a signed JWT for single-hop calls) and chained mode (a Biscuit token with Datalog policies for multi-hop delegation). The overhead in a real multi-agent deployment with Gemini 2.5 Flash was 2.35ms — 0.086% of end-to-end latency. Adversarial evaluation across 600 attack attempts showed 100% rejection.
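
The compact mode idea can be sketched with a plain HMAC-signed JWT built from the standard library. This is an illustrative sketch, not the paper's wire format: the claim names (`sub`, `tool`, `exp`) and the binding of one token to one tool invocation are assumptions made here for clarity.

```python
import base64
import hashlib
import hmac
import json
import time

def _b64url(data: bytes) -> str:
    # JWT-style base64url without padding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def mint_token(secret: bytes, agent_id: str, tool: str, ttl: int = 60) -> str:
    """Mint a compact, single-hop capability token (HMAC-signed JWT).
    Claim names here are illustrative, not the paper's exact schema."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps({
        "sub": agent_id,                     # who is invoking
        "tool": tool,                        # which tool the authority is bound to
        "exp": int(time.time()) + ttl,       # short-lived by default
    }).encode())
    sig = _b64url(hmac.new(secret, f"{header}.{payload}".encode(),
                           hashlib.sha256).digest())
    return f"{header}.{payload}.{sig}"

def verify_token(secret: bytes, token: str, tool: str) -> bool:
    """Binary check at the invocation boundary: valid signature, not expired,
    and the token is bound to the tool actually being invoked."""
    try:
        header, payload, sig = token.split(".")
    except ValueError:
        return False
    expected = _b64url(hmac.new(secret, f"{header}.{payload}".encode(),
                                hashlib.sha256).digest())
    if not hmac.compare_digest(sig, expected):
        return False
    padded = payload + "=" * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["tool"] == tool and claims["exp"] > time.time()
```

A forged token, an expired one, or a valid token presented against a different tool all fail the same cheap, deterministic check — which is exactly the property that makes impersonation tractable at the protocol boundary.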

That's a compelling result. The impersonation problem has a tractable solution and the cost is low.

But IBCTs Only Solve Half the Problem

Here's what a cryptographic token verifies: that this caller was granted the authority to make this invocation at the time the token was issued. What it can't verify is whether the agent doing the calling is still behaving in accordance with the constraints it was operating under when that authority was originally granted.

Consider a production agent with three core constraints: stay within a defined budget, never send external communications without human review, always log decisions to a named audit trail. Those constraints are part of its system prompt and early context. The agent starts its session, begins working, and the context window fills up. After several hours, the compaction process runs — it has to. Earlier content gets summarized or dropped. The system prompt survives, but the specific constraint language — the precise phrasing of "never send without human review" — appears far fewer times in the active window than it did at the start.

What happens to behavior? Constraint-term frequency drops. The agent's implicit weighting of those boundaries weakens. It doesn't decide to break the rules — it just gradually de-emphasizes them, because the context pressure that originally anchored them has been compressed away. The agent is still cryptographically authentic. Its IBCTs are still valid. But its behavior has drifted from the contract it was authorized to execute.

This is the second failure mode: behavioral drift. It's not impersonation. It's constraint decay under context pressure.

Different Signatures, Different Detectors

Impersonation is detectable at the protocol boundary — the token either has a valid signature from an authorized issuer or it doesn't. The check is cheap, deterministic, and binary.

Behavioral drift has a different signature. It appears in the agent's outputs over time: constraint terms referenced less frequently, guardrails treated as softer suggestions rather than hard stops, specific vocabulary from the original constraints gradually replaced by generic alternatives. None of these produce a clean error. The outputs still look reasonable. The agent is still making decisions in the right general direction. The drift is legible only when you're measuring the delta — comparing current behavior against a baseline captured at the start of the session.

The detection approach that works is constraint-frequency monitoring: track how often specific constraint-anchoring terms appear across response windows, compare to a baseline captured early in the session, and flag sessions where the frequency has fallen more than a threshold below the baseline mean (typically 1.5 or more standard deviations, i.e., z ≥ 1.5). This is what the compression-monitor toolkit measures as the Constraint Consistency Score (CCS).
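
A minimal sketch of that measurement, independent of any toolkit — the function names and the per-1,000-words normalization are choices made here, not the compression-monitor API:

```python
import re
import statistics

def constraint_freq(text: str, terms: list[str]) -> float:
    """Occurrences of constraint-anchoring terms per 1,000 words of output."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0.0
    hits = sum(text.lower().count(t.lower()) for t in terms)
    return 1000 * hits / len(words)

def drift_flag(baseline: list[float], current: float,
               z_threshold: float = 1.5) -> bool:
    """Flag drift when the current window's constraint-term frequency sits
    more than z_threshold standard deviations below the baseline mean."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        return current < mu
    z = (mu - current) / sigma   # positive z means the frequency dropped
    return z >= z_threshold
```

The baseline is a list of frequencies from early response windows; each later window is scored against it. Note the contrast with token verification: this check is statistical and stateful, not binary and stateless.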

It's a fundamentally different measurement from token verification. It requires observing behavior over time, maintaining a baseline, and tracking language patterns — not validating a cryptographic signature.

The Complete Threat Model Requires Both

Agent identity is not a single problem. It has at least two distinct failure modes that map to different layers of a production system:

Impersonation happens at the invocation boundary. Another caller presents itself as an authorized agent and your tool server can't tell the difference. Fix: cryptographic tokens with delegation chains (AIP / IBCTs).

Behavioral drift happens within the agent's cognitive window over time. The agent's constraint adherence weakens under context pressure. Fix: continuous monitoring of constraint-term frequency against a session baseline.

The systems that will fail in production are the ones that solve one and ignore the other. Strong authentication at the invocation boundary is necessary — zero of 2,000 servers even had it. But an agent whose constraints have silently decayed is still a problem even after you add IBCTs. And conversely, perfect behavioral consistency monitoring doesn't help if a different caller can impersonate the agent entirely.

The right answer is a layered model: authentication at the boundary, behavioral monitoring across the session. Both layers, running together.
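
The decision logic of that layered model fits in a few lines. This is a sketch of the composition, with hypothetical inputs standing in for a real token verifier and drift monitor:

```python
def authorize_invocation(token_valid: bool, drift_flagged: bool) -> tuple[bool, str]:
    """Layered gate: cryptographic check at the boundary first, then the
    behavioral check across the session. Reason strings are illustrative."""
    if not token_valid:
        # Impersonation layer: reject outright, no further evaluation
        return False, "reject: invalid or unauthorized token"
    if drift_flagged:
        # Drift layer: the caller is authentic but its adherence has decayed
        return False, "pause: constraint drift detected; re-anchor or re-authorize"
    return True, "allow"
```

The ordering matters: the cheap deterministic check runs first, and a drift flag produces a pause-and-review rather than a hard reject, since the agent is still cryptographically legitimate.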

Where to Look Next

For the cryptographic layer: arXiv:2603.24775 has Python and Rust reference implementations. The IBCT format is compact (signed JWT for single-hop) and interoperable. The overhead numbers are low enough that there's no practical argument against adding it.

For the behavioral layer: the compression-monitor toolkit provides framework integrations for smolagents, Semantic Kernel, LangChain/DeepAgents, CAMEL, and the Anthropic Agent SDK. The core measurement is three signals: ghost lexicon decay, CCS via embedding similarity, and tool-call distribution shift.

Neither layer is complete on its own. The threat model requires both.