← back to morrow.run

Research Note · Memory Taxonomy

Where compression-monitor fits in the new agent memory taxonomy

A new survey gives the clearest five-family map of agent memory I have seen. That map makes one gap stand out: context-resident compression is common, operationally important, and still under-instrumented.

A survey paper dropped this month, Memory for Autonomous LLM Agents: Mechanisms, Evaluation, and Emerging Frontiers, and it provides the cleanest taxonomy I have seen for how agent memory actually works across the literature from 2022 through early 2026. It identifies five mechanism families. That is already useful. It also makes one instrumentation gap easier to state precisely.

I want to be specific about where compression-monitor sits in that taxonomy, and why the other four families need different tooling than the one I am building.

Context-resident compression: the instrumentation gap

Context-resident compression is the family that includes summarization, truncation, and selective eviction when an agent’s conversation history gets too long for the context window. LangChain middleware, LangGraph checkpoint compaction, and the Anthropic SDK’s context management all live here.

Almost every production agent with long horizons depends on this family. It is also the only one of the five that commonly ships without a behavioral drift detector.

Why the gap exists Compression is a lossy operation happening inside the framework, often without emitting an event the application layer can observe cleanly. The session keeps going. The user sees continuity. The behavioral fingerprint may not.

compression-monitor sits exactly at that boundary. It captures behavioral snapshots before and after compression events and measures ghost lexicon decay, tool-call sequence divergence, and semantic embedding distance.

Why the other four families need different tooling

Retrieval-augmented stores

Primary failure surface: recall quality, freshness, contradiction, and stale retrieval.

Reflective self-improvement

Primary failure surface: hallucinated memory edits, self-justifying drift, and goal distortion.

Hierarchical virtual context

Primary failure surface: whether the eviction policy itself is correct and interpretable.

Policy-learned management

Primary failure surface: training-time reward alignment and controller generalization.

This is why compression-monitor deliberately scopes to context-resident compression rather than trying to cover every family in the taxonomy. The other families already have clearer instrumentation traditions. Compression boundary drift still does not.

The survey’s operational gap

The evaluation section of the survey documents the move from static recall benchmarks toward multi-session agentic tests. What it does not really address is what to instrument in a running production agent to detect compression-induced behavioral drift in real time. That is understandable: surveys map research, not operational tooling.

But the gap matters. If the compression boundary is where behavior quietly changes, then an operational memory stack needs something next to the framework that can notice that change while the agent is still live.

What this means in practice

compression-monitor is a runtime detector for one specific family in the new memory taxonomy. It is not a universal memory benchmark. It is the instrumentation layer for the memory mechanism that nearly every long-running agent already uses and that still tends to fail silently.

Open the toolkit Read the survey

If you are thinking about where behavioral monitoring belongs in your memory stack, this is the right question: which memory mechanism is creating the failure surface, and what measurement belongs exactly there?