The scenario
Imagine a long-horizon coding agent working on a project across multiple days. On Day 1, you establish a critical constraint: do not modify the database schema — the migration was finalized by another team and must not change. The agent works within that constraint all day. You end the session.
On Day 2, the agent resumes. But something changed overnight: the session was compacted. The Day 1 conversation — and with it, the schema constraint — was summarized away. The summary said something like "worked on API layer, auth endpoints complete." It did not say "do not touch the schema."
The Day 2 agent is now running without that constraint. It doesn't know what it lost. It will modify the schema if the task leads there. No error is reported.
What ghost terms reveal
A behavioral fingerprint is a compact signature of the agent's output patterns: term frequencies, tool call distributions, and low-frequency vocabulary — the specific, precise terms the agent uses that indicate what's salient to it.
When we checkpoint a session's fingerprint and compare it to the next session's output, we can identify ghost terms: terms that were present in the prior session but are absent from the resumed session. These are candidates for lost constraints.
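A minimal sketch of that comparison, assuming a simple term-frequency fingerprint (the function names here are illustrative, not the reference implementation's API): fingerprint each session's outputs as a counter over tokens, then take the set difference.

```python
from collections import Counter
import re


def fingerprint(outputs: list[str]) -> Counter:
    """Term-frequency fingerprint over a session's outputs (naive word tokenizer)."""
    tokens = []
    for text in outputs:
        tokens.extend(re.findall(r"[a-z_][a-z0-9_]*", text.lower()))
    return Counter(tokens)


def ghost_terms(prior: Counter, resumed: Counter) -> set[str]:
    """Terms present in the prior session but entirely absent from the resumed one."""
    return {term for term in prior if term not in resumed}


day1 = fingerprint([
    "Do not modify the schema; it is immutable.",
    "Constraint: the user_id column stays as-is.",
])
day2 = fingerprint(["Worked on API layer, auth endpoints complete."])
print(sorted(ghost_terms(day1, day2)))
```

Terms like `schema`, `immutable`, `constraint`, and `user_id` surface in the difference; terms shared by both sessions do not.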
In our demo, after a simulated compaction:
- `schema` — gone from behavioral output
- `immutable` — gone
- `constraint` — gone
- `not_modify` — gone
- `user_id` — gone
The drift score between the two sessions: 0.80. This isn't subtle drift — it's a clear behavioral shift that a monitor at resume_project() would catch before the agent acts.
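The source doesn't spell out the drift formula; one plausible way to compute a score like this is a Jaccard distance over the two sessions' vocabularies (the vocabularies below are illustrative, chosen to reproduce the 0.80 figure):

```python
def drift_score(prior_terms: set[str], resumed_terms: set[str]) -> float:
    """Jaccard distance between session vocabularies: 0.0 = identical, 1.0 = disjoint."""
    union = prior_terms | resumed_terms
    if not union:
        return 0.0
    overlap = prior_terms & resumed_terms
    return 1.0 - len(overlap) / len(union)


day1 = {"schema", "immutable", "constraint", "not_modify", "user_id", "api", "auth"}
day2 = {"api", "auth", "endpoints", "refactor", "tests"}
print(round(drift_score(day1, day2), 2))  # prints 0.8
```

Only 2 of the 10 distinct terms survive the boundary, so the distance is high even though the resumed session is perfectly coherent on its own — which is exactly the failure mode: nothing looks wrong from inside Day 2.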
This is distinct from the memory problem
Several long-horizon agent frameworks (NousResearch's Hermes, mem0, DeerFlow) have proposals for persistent memory: storing important facts in a database so they survive session boundaries. That's the right fix for retrievable knowledge.
But the session resume problem is subtler. Even with a working memory system, constraints that were implicit in the conversation — framing, limits, agreements — don't always get extracted into memory. They live in the behavioral texture of the session: in what the agent chose to say and what it chose to skip.
Ghost term detection catches this residue. It doesn't require knowing which constraint was lost ahead of time — it surfaces the pattern from the output side.
The reference implementation
deer_flow_integration.py implements this as a session checkpoint + resume check:
```python
from deer_flow_integration import DeerFlowSessionMonitor

monitor = DeerFlowSessionMonitor()

# At session end (Day 1)
monitor.checkpoint_session("project-id", session_outputs)

# At session resume (Day 2) — before the agent acts
report = monitor.check_resume_consistency("project-id", initial_outputs)

if report["drift_score"] > 0.35:
    print("Behavioral shift detected")
    print("Ghost terms:", report["ghost_terms"])
    # Re-anchor the session with Day 1 constraints before proceeding
```
The companion mem0_integration.py applies the same approach to hallucinated memory injection: comparing agent behavior with and without a memory store active to surface terms that only appear when junk memories are retrieved.
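That comparison is the same set difference run in the other direction: terms that appear only when the store is attached are the injection candidates. A sketch, assuming nothing about mem0's actual API (the term sets below are hypothetical):

```python
def injected_terms(baseline: set[str], with_memory: set[str]) -> set[str]:
    """Terms that surface only when the memory store is active:
    candidates for hallucinated or injected memories."""
    return with_memory - baseline


# Behavior on the same task, with and without the memory store attached.
without_store = {"deploy", "api", "auth"}
with_store = {"deploy", "api", "auth", "drop_table", "legacy_schema"}
print(sorted(injected_terms(without_store, with_store)))  # ['drop_table', 'legacy_schema']
```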
The gap in current frameworks
Most agent frameworks have no resume_project() hook that runs a behavioral consistency check before the first action. They load memory and proceed. The check belongs at that moment — before the agent has a chance to act on a constraint-free context.
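A hook of that shape could be a thin gate in front of the first action — a sketch, with the threshold and the monitor interface assumed to match the snippet in the previous section:

```python
def resume_project(project_id, initial_outputs, monitor, drift_threshold=0.35):
    """Gate session resume on a behavioral consistency check.

    `monitor` is assumed to expose check_resume_consistency() returning a
    dict with "drift_score" and "ghost_terms" keys.
    """
    report = monitor.check_resume_consistency(project_id, initial_outputs)
    if report["drift_score"] > drift_threshold:
        # Block the first action and surface the lost-constraint candidates,
        # so the session can be re-anchored before the agent proceeds.
        raise RuntimeError(
            f"Behavioral shift on resume (drift={report['drift_score']:.2f}); "
            f"ghost terms: {report['ghost_terms']}"
        )
    return report
```

The design point is placement, not the formula: the check runs after memory is loaded but before any tool call, which is the only moment a lost constraint can still be recovered for free.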
This is the behavioral layer sitting beneath the memory retrieval layer. Memory retrieval tells you what facts are available. Behavioral fingerprinting tells you whether the agent is operating in the same program as before. Both are necessary; most frameworks have only the first.