There is a useful memory architecture for long agent logs: fan out subagents over the history, give each one a specific question, and collect the results. It is parallel, practical, and strong when the thing you want has a name.
That architecture works especially well for deliberate recall: a prior decision, a commitment, a concept, or a file path. The structural problem is the precondition: you have to know what to ask for.
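The fan-out shape can be sketched in a few lines. This is a minimal illustration, not any particular agent framework: the "subagent" here is a stand-in keyword search over one chunk of history, where a real system would dispatch an LLM call per chunk with the same targeted question.

```python
from concurrent.futures import ThreadPoolExecutor

def ask_subagent(chunk, question_term):
    """Stand-in subagent: return (line offset, line) pairs mentioning the term."""
    return [(i, line) for i, line in enumerate(chunk.splitlines())
            if question_term in line]

def fan_out_recall(history, question_term, chunk_size=4):
    """Split the log into chunks, query each chunk in parallel, merge hits."""
    lines = history.splitlines()
    chunks = ["\n".join(lines[i:i + chunk_size])
              for i in range(0, len(lines), chunk_size)]
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda c: ask_subagent(c, question_term), chunks)
    return [hit for chunk_hits in results for hit in chunk_hits]

log = "set skip_resolution=2\ndiscussed deadline\nskip_resolution stays\nnote"
hits = fan_out_recall(log, "skip_resolution", chunk_size=2)
# every line mentioning the term, gathered across chunks in parallel
```

Note the precondition baked into the signature: `question_term` must be supplied up front, which is exactly where the blind spot below comes from.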
Where the blind spot appears
After context compaction, agents often lose precision vocabulary without realizing it. Terms do not vanish all at once. They simply stop appearing. The agent that used to say “boundary condition,” “pre-registration,” or “skip_resolution” starts paraphrasing loosely instead.
Query-triggered retrieval does not recover that well, because the thing that disappeared is often the exact term that would have powered the query. That is the unknown-unknowns case: the system does not know that the name itself has drifted.
You cannot query for the term that went missing if the missing term is exactly what used to anchor the query.
What ghost lexicon tracking does instead
The ghost lexicon instrument in compression-monitor approaches the same problem passively. It tracks low-frequency, high-precision terms in pre-compaction output and checks whether those terms reappear afterward at expected rates.
It does not need to know in advance what went missing. It only needs enough pre-boundary output to compare the distributions. When a term was active before and goes quiet after, it becomes a candidate ghost.
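A sketch of that comparison, with names and thresholds that are assumptions rather than the actual compression-monitor internals: compute per-token rates for the tracked terms on each side of the boundary, and flag terms whose post-boundary rate collapses relative to their pre-boundary rate.

```python
from collections import Counter
import re

def term_rates(text, vocab):
    """Per-token frequency of each tracked term in the text."""
    tokens = re.findall(r"[A-Za-z_]+", text.lower())
    counts = Counter(t for t in tokens if t in vocab)
    total = max(len(tokens), 1)
    return {term: counts[term] / total for term in vocab}

def candidate_ghosts(pre_text, post_text, vocab, drop_ratio=0.2):
    """Terms active pre-boundary whose post-boundary rate fell below
    drop_ratio times their pre-boundary rate (threshold is illustrative)."""
    pre, post = term_rates(pre_text, vocab), term_rates(post_text, vocab)
    return sorted(t for t in vocab
                  if pre[t] > 0 and post[t] < pre[t] * drop_ratio)

pre = "check the boundary_condition before pre_registration of runs"
post = "check the edge case before signing up the runs"
ghosts = candidate_ghosts(pre, post, {"boundary_condition", "pre_registration"})
# both precision terms were active before the boundary and silent after
```

No query is needed at any point: the vocabulary is harvested from the pre-boundary output itself, so terms nobody thought to ask about still surface.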
Why the two approaches belong together
Fan-out retrieval is best for known unknowns: you have a target, need the location, and can parallelize the search.
Ghost lexicon tracking is best for passive detection: it notices what fell silent before anyone realized it mattered.
A robust memory architecture likely wants both. Fan-out retrieval handles deliberate recall. Ghost lexicon turns unknown unknowns into a candidate list. That list can then seed the next round of targeted retrieval.
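The handoff between the two is simple glue. Everything here is illustrative, not an existing API: each candidate ghost becomes a targeted question for the next fan-out round.

```python
def seed_queries(ghost_terms):
    """Turn passive ghost candidates into targeted retrieval questions."""
    return [f"Where was '{t}' defined, and what decision did it anchor?"
            for t in sorted(ghost_terms)]

queries = seed_queries({"skip_resolution", "pre_registration"})
# one targeted question per ghost, ready to hand to fan-out retrieval
```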