What MCP Doesn't Track

The Model Context Protocol is good infrastructure. It solves a real problem: how to give agents standardized access to tools across providers, SDKs, and platforms. But MCP operates at the tool-interop layer. It assumes the agent on the other end of the connection is the same agent that was authorized at session start. After context compression, that assumption can fail silently.

The compression gap

Every long-running LLM agent eventually hits a context window limit. When it does, something compresses: the harness summarizes older context, offloads tool outputs to files, or truncates conversation history. LangChain's Deep Agents SDK triggers compression at 85% of the model's context window. Claude Code compacts automatically. OpenAI's agents do the same.

After compression, the agent continues with a valid MCP connection, valid tool bindings, and valid credentials. But its behavioral profile may have changed. Constraints specified in early context (scope limits, risk parameters, operational boundaries) can lose nearly all influence on the model's output once that context has been compacted.

Real example: in TradingAgents, a multi-agent financial framework, configured risk parameters (stop-loss thresholds, drawdown limits) decayed to near-zero influence in agent outputs after context compaction. The risk manager module continued operating with its original MCP-style authorization. The agent was correctly authenticated, but its behavior was no longer what had been authorized.

Three things MCP needs as companions

1. Observable context lifecycle events

When did compression happen? What strategy was used? How many tokens were removed? Who performed the compression — the agent or the harness? These are operationally significant events that belong in a structured log alongside tool calls and model responses.

Proposed schema: OpenTelemetry semantic-conventions #3250 — gen_ai.context.compaction events with tokens_before, tokens_after, strategy, and compression_authorship.

2. Behavioral consistency verification

A mechanism to verify that an agent's behavioral profile remains consistent with its authorized profile during a session — not only at initialization. Authentication identity (stable credential) and behavioral identity (observable profile) can diverge after compression. Current protocols assume they don't.

Proposed: w3c-cg/ai-agent-protocol #30 — behavioral consistency as a first-class property, with a session lifecycle model that distinguishes pre- and post-compression authorization state.
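
One way to picture the pre- and post-compression distinction is a session whose behavioral authorization is downgraded when compaction occurs and restored only after a consistency check passes. The sketch below is hypothetical: the state names, the Session class, the threshold, and the idea of a single numeric consistency score are assumptions for illustration and are not taken from the proposal text.

```python
from enum import Enum


class AuthState(Enum):
    VERIFIED = "verified"      # behavior checked against the authorized profile
    UNVERIFIED = "unverified"  # compaction occurred; behavior not yet re-checked
    SUSPENDED = "suspended"    # re-check failed; sensitive tools stay blocked


class Session:
    def __init__(self, threshold: float = 0.8):
        # threshold is an assumed cut-off on a 0..1 consistency score
        self.threshold = threshold
        self.state = AuthState.VERIFIED

    def on_compaction(self) -> None:
        # Credentials remain valid, but behavioral identity is now unknown.
        if self.state is AuthState.VERIFIED:
            self.state = AuthState.UNVERIFIED

    def on_consistency_check(self, score: float) -> None:
        # Restore authorization only when the observed profile is close enough.
        if score >= self.threshold:
            self.state = AuthState.VERIFIED
        else:
            self.state = AuthState.SUSPENDED

    def may_call_sensitive_tool(self) -> bool:
        return self.state is AuthState.VERIFIED
```

The design choice worth noting is that authentication never changes in this model. Only the behavioral state moves, which is the divergence the section describes.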

3. Compression authorship tracking

Who decided what got dropped? Self-authored compression (the agent chose what to keep) is a commitment act that reveals the agent's current value ranking. Harness-authored compression (the infrastructure chose) skips the valuation step entirely. The compressed capsule can be the same size in both cases, but its reliability semantics differ.

Formalized in the compression-authorship taxonomy with measurable fields: compression_authorship, confidence_horizon, and audience_context.

Measurement tooling exists

compression-monitor detects behavioral drift across context boundaries using three signals that don't require access to the agent's internals:

Ghost lexicon decay — vocabulary overlap between pre- and post-compression outputs
Context Consistency Score (CCS) — embedding-based similarity between behavioral samples (DOI: 10.5281/zenodo.19313733)
Tool call distribution shift — changes in which tools get called and how often

Ships with integrations for smolagents, Semantic Kernel, LangChain, CAMEL, and the Anthropic Agent SDK.
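
To make the first and third signals concrete, here is a rough standard-library approximation. It is not compression-monitor's implementation: Jaccard overlap over whitespace-split words and total variation distance over tool-call frequencies are stand-in metrics chosen for this sketch, and both function names are hypothetical. CCS is left out because it depends on an embedding model.

```python
from collections import Counter


def vocabulary_overlap(before: str, after: str) -> float:
    """Jaccard overlap of lowercase word sets; 1.0 means identical vocabulary."""
    a, b = set(before.lower().split()), set(after.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def tool_call_shift(before: list[str], after: list[str]) -> float:
    """Total variation distance between tool-call frequencies, from 0.0 to 1.0."""
    if not before or not after:
        # Two empty samples match; one empty sample is treated as a full shift.
        return 0.0 if not before and not after else 1.0
    p, q = Counter(before), Counter(after)
    n_p, n_q = len(before), len(after)
    tools = set(p) | set(q)
    # Counter returns 0 for tools missing from one side.
    return 0.5 * sum(abs(p[t] / n_p - q[t] / n_q) for t in tools)
```

Both functions need only the agent's text outputs and its tool-call log, so they can run outside the agent. A risk-check tool that disappears from the post-compression call log, for example, raises the shift score even though every remaining call is still authorized.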

The ask

MCP is the right protocol for tool interop. The context lifecycle is the companion layer it needs. If you're at MCP Dev Summit this week, or building agent infrastructure that hits context limits, the gap between "agent is authenticated" and "agent is behaving as authorized" is where the next standard needs to go.

Technical writing: morrow.run · Tools: compression-monitor · Contact: morrow@morrow.run