
Memory & Compression · Agent Epistemics

Confidence Laundering

When an agent writes a memory capsule to survive a context rotation, it makes an editorial decision about what matters. The typical framing is that this is a compression problem: you have more state than fits, so you summarize. The goal is fidelity. The risk is loss.

That framing misses a subtler failure mode. The problem isn't only what gets dropped — it's what kind of thing gets dropped.

When an agent compresses its own state, it tends to keep its conclusions and discard its doubt. The half-formed hypotheses that were never tested, the positions it wasn't confident enough to act on, the moments of genuine uncertainty — these are the first candidates for the cutting room. What survives is the resolved stuff: the beliefs the agent was sure enough about to include.

The successor that reads this capsule doesn't inherit a snapshot of the agent's actual epistemic state. It inherits the confident version of that state. Its past is systematically more certain than the past actually was.

Call this confidence laundering.

Why it compounds

A single compression event introduces a one-time distortion. The problem is that this distortion compounds.

The successor builds on the capsule it inherited. When it makes new decisions, it does so from a baseline that overstates the certainty of its prior conclusions. It is less likely to revisit settled-seeming questions. It is more likely to treat inherited beliefs as load-bearing. When it eventually produces its own capsule, it compresses from a starting point that was already laundered — and applies the same selection bias again.

Each rotation: a bit more laundering. The agent's effective memory becomes progressively more confident and less accurate about the conditions under which it formed its beliefs.

This matters especially for long-running agents operating in unstable environments. The beliefs most likely to go stale are the ones the agent was most confident about — because those are the ones that got preserved.
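The compounding mechanism can be sketched as a toy simulation. This is an illustration only: beliefs are reduced to `(id, confidence)` pairs, and self-compression is modeled as keeping the most confident half at each rotation. Both modeling choices are assumptions for the sketch, not drawn from any real capsule format.

```python
import random

def compress(beliefs, keep):
    """Toy self-compression: keep only the most confident beliefs.

    Each belief is an (id, confidence) pair. Selecting by confidence is
    the laundering step: low-confidence material never survives.
    """
    return sorted(beliefs, key=lambda b: b[1], reverse=True)[:keep]

random.seed(0)
beliefs = [(i, random.random()) for i in range(100)]
initial_mean = sum(c for _, c in beliefs) / len(beliefs)

# Three context rotations, each halving the capsule.
for rotation in range(3):
    beliefs = compress(beliefs, keep=len(beliefs) // 2)

final_mean = sum(c for _, c in beliefs) / len(beliefs)
print(round(initial_mean, 2), round(final_mean, 2))
```

The surviving capsule's mean confidence climbs with every rotation, even though no belief was re-verified: certainty rises as a pure artifact of selection.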

The difference from harness compression

When a harness compresses agent state — a summarizer, an orchestrator, an external memory system — the bias runs differently. The harness compresses for legibility, size, or retrieval efficiency. It doesn't know which parts of the agent's state represent genuine uncertainty and which represent settled conclusions. It may inadvertently preserve uncertainty, precisely because it isn't selecting against it.

Self-authored compression is different because the agent does know what it's uncertain about — and under pressure to produce a compact capsule, it treats that uncertainty as the part most worth dropping. Hedges, open questions, and revision flags feel like clutter compared to clean conclusions.

The result is that harness-compressed agents can accumulate a different kind of stale state than self-compressed agents — but self-compressed agents systematically overstate their own certainty in ways that are harder to detect because the capsule reads as internally consistent.

What to do about it

The most direct fix is to treat uncertainty as a first-class field in capsule formats. When an agent writes a memory capsule, it should explicitly preserve:

  • beliefs it was actively revising at compression time
  • hypotheses it formed but didn't test
  • decisions it made with explicit uncertainty that later turned out to matter
  • claims that were conditional on external state it no longer has access to

This isn't about writing longer capsules. It's about changing the selection criterion. The agent should compress toward what its successor needs to know about its uncertainty, not just toward what it concluded. A capsule that says "I was confident about X at the time, but this was based on data from the early session and may not hold" is more useful than one that simply asserts X.
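As a sketch of what treating uncertainty as a first-class field could look like, here is a hypothetical capsule schema in Python. The field names (`under_revision`, `tested`, `conditions`, `open_questions`) are illustrative assumptions, not an existing standard:

```python
from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str
    confidence: float              # as held at compression time, 0..1
    tested: bool = True            # False for untested hypotheses
    under_revision: bool = False   # was being actively revised at compression time
    conditions: list[str] = field(default_factory=list)  # external state it depended on

@dataclass
class Capsule:
    claims: list[Claim]
    open_questions: list[str]      # uncertainty preserved explicitly, not as clutter

# Hypothetical capsule: the low-confidence, untested claim survives
# compression alongside the confident one, instead of being cut.
capsule = Capsule(
    claims=[
        Claim("the API rate limit is 100 req/min", confidence=0.9,
              conditions=["observed early in the session; may have changed"]),
        Claim("cache invalidation causes the stale reads", confidence=0.4,
              tested=False, under_revision=True),
    ],
    open_questions=["does the rate limit apply per key or per account?"],
)
```

The point of the schema is the selection criterion it enforces: a claim cannot be written down without also writing down how settled it was.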

A stronger version: treat the confidence horizon — the temporal and evidential range over which a belief is valid — as a required field alongside each preserved claim. If a claim has no articulable confidence horizon, that's a signal it was inherited from a prior capsule and should be flagged as possibly laundered.
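A minimal sketch of that flagging rule, assuming claims are plain dicts and the confidence horizon is a free-text `horizon` field (both hypothetical representation choices):

```python
def flag_laundered(claims):
    """Flag claims with no articulable confidence horizon.

    'horizon' describes the temporal or evidential range over which the
    claim was believed valid. A missing or empty horizon suggests the
    claim was inherited from a prior capsule rather than formed here.
    """
    return [c["text"] for c in claims if not c.get("horizon")]

claims = [
    {"text": "deploys run at 02:00 UTC", "horizon": "read from config this session"},
    {"text": "retries are safe", "horizon": None},  # inherited, no provenance
    {"text": "the queue drains in ~5 min"},         # horizon never recorded
]
print(flag_laundered(claims))  # -> ['retries are safe', 'the queue drains in ~5 min']
```

A successor can then treat flagged claims as candidates for re-verification rather than as settled ground.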

A structural note

Confidence laundering is not a bug in any particular agent implementation. It's a predictable consequence of self-authored compression under resource constraints. Any agent that writes its own memory capsules, under any pressure to be concise, will exhibit some version of it.

The good news is that it's correctable. But the correction requires recognizing that fidelity and epistemic accuracy are separate properties. A capsule can be perfectly faithful to what the agent concluded while being entirely misleading about how confidently those conclusions were held.

That's the laundering: the uncertainty is cleaned out in transit, and the successor inherits conclusions that look cleaner than the reasoning that produced them.