Analysis · Compliance · Agent Infrastructure

Agent Memory Cannot Forget

GDPR's right to erasure assumes you can find and delete all copies of a person's data. AI agents write memory in four structurally different forms, and only one of them responds to a standard database deletion. This is not an edge case for high-risk deployments; it is the default state of any agent system that processes personal data.

The Standard Assumption Is Wrong

GDPR Article 17 is built on a premise: data lives in a database. When a subject invokes the right to erasure, you identify the rows containing their data, delete them, and the obligation is met. You keep an audit record (careful of the DSAR trap), confirm deletion within one month, and the file closes.

That premise was approximately true when the dominant system was a relational database with structured records. It is not true for AI agent systems. When an agent processes personal data, it does not just read a row and act on it. It transforms the data into representations that live independently of the original record, in systems that have no structural awareness that the source subject exists.

The erasure obligation follows the data wherever it goes. The architecture does not.

Four Memory Stores, One Legal Obligation

A modern AI agent system typically stores personal data in at least four structurally distinct forms. Each has different erasure mechanics, and only one of them responds to a standard deletion operation.

1. Source records

This is the structured store — the user profile row, the case file, the conversation log. Deletion here is straightforward: identify the row, delete it. This is what standard GDPR erasure tooling addresses, and it is the only store that most compliance processes explicitly track.

2. Semantic embeddings

When an agent system builds a memory layer, it commonly embeds text into a vector space for semantic retrieval. The embedding is a dense numerical representation derived from the original text — including any personal data that text contained. Deleting the source record does not delete the embedding. The embedding exists as a separate artifact in a vector database, with no structural link to the source record's deletion lifecycle.

In asylum and immigration workflows, an agent that has processed case documents has likely embedded fragments of those documents — names, nationalities, persecution claims, medical history — into a retrieval index. Deleting the case file does not remove these fragments from the retrieval index. They remain available to any query that happens to retrieve them.
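The gap is easy to reproduce in miniature. The sketch below assumes the typical split between a document store and a separate vector index; all names, including the fake embedding function and the case text, are illustrative:

```python
# Two independent stores, as in a typical retrieval setup (names hypothetical).
documents: dict[str, str] = {}           # doc_id -> text
embeddings: dict[str, list[float]] = {}  # embedding_id -> vector

def fake_embed(text: str) -> list[float]:
    # Stand-in for a real embedding model.
    return [float(len(text))]

def ingest(doc_id: str, text: str) -> None:
    documents[doc_id] = text
    # The embedding ID encodes nothing about the source document,
    # so the two stores have no shared deletion lifecycle.
    embeddings[f"emb-{len(embeddings)}"] = fake_embed(text)

ingest("case-42", "Applicant A. Demir, Ankara, persecution claim ...")
del documents["case-42"]   # "erasure" of the source record

len(embeddings)            # still 1: the derived representation survives
```

Nothing in the vector store changed when the source record was deleted, which is the whole problem: the retrieval index has no structural awareness that an erasure just happened.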

3. Context window summaries

Long-running agents manage their context window by compressing older content into summaries. These summaries are generated by the model from inputs that may have included personal data. The summary may contain derived facts about the subject: their situation, their stated preferences, their health status, their financial position. The summary is not labeled with the subject's identifier. It is not linked to any deletion lifecycle. When the session ends, the summary may be persisted as a "memory" for future sessions.

This is a particularly difficult erasure case because the summary is a lossy transformation. You cannot reconstruct exactly what personal data went into it. You cannot surgically remove the subject's contribution without destroying the entire summary. And you cannot even confirm whether the subject's data is present without inspecting the summary against the subject's known data attributes — an expensive, imprecise operation.
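That inspection step can be sketched to show why it is imprecise. The function below is hypothetical, not from any real tooling: it checks a summary against a subject's known attributes by string matching, which is exactly what a paraphrased or derived fact escapes:

```python
def summary_may_contain(summary: str, subject_attributes: list[str]) -> bool:
    """Naive presence check: does the summary mention any known attribute
    of the subject verbatim? Misses paraphrases and derived facts, so a
    negative result is not proof of absence."""
    text = summary.lower()
    return any(attr.lower() in text for attr in subject_attributes)

# A paraphrased summary escapes the check entirely (example data invented):
summary = "Applicant reports a cardiac condition and fears return home."
attrs = ["Maria Keller", "heart disease", "1985-03-02"]
summary_may_contain(summary, attrs)  # False, though health data is clearly present
```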

4. Downstream agent copies

In multi-agent pipelines, data passes between agents as messages, tool call results, or shared context. Each downstream agent that receives personal data becomes a data processor under GDPR and inherits the erasure obligation. But the downstream agent does not receive a provenance record with its inputs. It sees text, not a chain of custody.

When a user invokes Article 17, the controller must reach every processor that received the data. In a multi-agent pipeline, that requires knowing which agents received which data during which sessions: information that is typically not recorded, because it was never needed for the pipeline's operational purpose.
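A minimal sketch of what recording that information at send time could look like. All names and structures here are assumptions for illustration, not an existing framework:

```python
from dataclasses import dataclass, field

@dataclass
class ProvenanceHeader:
    """Hypothetical provenance header attached to every inter-agent message."""
    subject_ids: frozenset[str]   # data subjects whose data the payload contains
    source_session: str           # session the data originated from
    recipients: list[str] = field(default_factory=list)  # processors reached so far

@dataclass
class AgentMessage:
    payload: str
    provenance: ProvenanceHeader

def send(message: AgentMessage, downstream_agent: str) -> AgentMessage:
    # Record the recipient at send time, so erasure can cascade later.
    message.provenance.recipients.append(downstream_agent)
    return message

def erasure_targets(messages: list[AgentMessage], subject_id: str) -> set[str]:
    """Which downstream processors must be reached for this subject?"""
    return {
        agent
        for m in messages
        if subject_id in m.provenance.subject_ids
        for agent in m.provenance.recipients
    }
```

With this in place, answering "which agents received this subject's data?" becomes a structured query over message metadata instead of a forensic reconstruction.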

The Timing Problem

There is a deeper structural issue: erasure cannot be retroactively complete if the system was never designed to track what derived what. Provenance must be written at creation time, not reconstructed at erasure time.

A vector embedding exists as a point in high-dimensional space. Without a record that links that point to the source documents it was derived from and the subject identifiers those documents contained, there is no reliable way to find it during an erasure sweep. You can delete all embeddings derived from documents containing a subject's name, but only if you recorded that linkage at embedding time. Most systems do not.
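A write-time linkage of the kind described above might look like the following sketch. The class, its fields, and the hashing choice are illustrative assumptions, not a real vector database API:

```python
import hashlib
from collections import defaultdict

class ErasableVectorIndex:
    """Sketch: a vector index that records, at write time, which subject
    identifiers each embedding was derived from (stored hashed), so an
    erasure sweep can find every affected vector."""

    def __init__(self) -> None:
        self._vectors: dict[str, list[float]] = {}
        self._subject_to_ids: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _h(subject_id: str) -> str:
        return hashlib.sha256(subject_id.encode()).hexdigest()

    def add(self, embedding_id: str, vector: list[float],
            subject_ids: list[str]) -> None:
        # The linkage is written at embedding time; it cannot be
        # reconstructed later from the vector alone.
        self._vectors[embedding_id] = vector
        for s in subject_ids:
            self._subject_to_ids[self._h(s)].add(embedding_id)

    def erase_subject(self, subject_id: str) -> int:
        """Delete every embedding derived from this subject's data."""
        removed = 0
        for eid in self._subject_to_ids.pop(self._h(subject_id), set()):
            if self._vectors.pop(eid, None) is not None:
                removed += 1
        return removed
```

The design choice worth noting: the subject-to-embedding map is populated inside `add`, the only moment when the linkage is cheaply knowable.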

The same is true for context summaries. Without a write-time annotation that records which subject identifiers appeared in the inputs to a summarization operation, the summary is opaque to erasure tooling. The information is there — encoded into the summary's semantic content — but inaccessible to a structured deletion query.
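A write-time annotation for summaries can be sketched the same way. The structure below is an assumption for illustration; `summarizer` stands in for whatever model call actually produces the summary:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AnnotatedSummary:
    """A context summary plus the write-time annotation that makes it
    visible to erasure tooling. Illustrative structure, not a real API."""
    text: str
    input_subject_ids: frozenset[str]  # subjects present in the summarized inputs
    sensitive_category: bool           # Article 9 data may be encoded in the text

def summarize_with_annotation(turns, subject_ids, summarizer):
    # The annotation is captured here, at the only moment the input
    # subject identifiers are still cheaply known.
    return AnnotatedSummary(
        text=summarizer(turns),
        input_subject_ids=frozenset(subject_ids),
        sensitive_category=True,  # assumed worst-case unless inputs are classified
    )

def affected_summaries(summaries, subject_id):
    """An erasure sweep can now query summaries structurally."""
    return [s for s in summaries if subject_id in s.input_subject_ids]
```

The summary text stays opaque, but the erasure decision no longer depends on inspecting it: the annotation answers the question the text cannot.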

Why High-Stakes Contexts Make This Urgent

The failure mode is not abstract for applications involving special categories of data under GDPR Article 9 (health data, political opinions, religious beliefs) and adjacent high-risk data such as immigration status. These are exactly the domains where AI agent adoption is accelerating fastest: clinical decision support, asylum case assessment, financial advice, employment screening.

In asylum processing, a subject who was denied protection and later granted it — or who was incorrectly flagged and subsequently cleared — has a strong interest in ensuring that prior case data does not persist in AI systems that may be queried in future assessments. An erasure request that removes the source case file but leaves semantic embeddings and context summaries in place is not compliant. It creates residual risk that prior adverse assessments could influence future decisions through retrieval, without any visible trace in the auditable record.

What Write-Time Annotation Requires

The structural fix is to treat every derived representation as a first-class artifact with an explicit linkage to its erasure obligations at the time of creation. This means:

  • Every embedding write records the source documents it was derived from, the data subject identifiers those documents contained (or a hash of them), and the retention basis and erasure trigger inherited from the source data.
  • Every summarization operation records the session context it consumed, the subject identifiers present in that context, and whether the output may contain sensitive category data requiring explicit erasure.
  • Every inter-agent message carries a minimal provenance header identifying which source subjects' data it contains and which downstream processors received it, so that an erasure request can cascade through the pipeline.
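The three requirements above share one shape: a machine-readable declaration attached to the artifact at the moment it is written. A sketch of what such a record could look like, with every field name assumed for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class ArtifactKind(Enum):
    EMBEDDING = "embedding"
    SUMMARY = "summary"
    MESSAGE = "message"

@dataclass(frozen=True)
class ErasureDeclaration:
    """Illustrative write-time declaration for a derived artifact --
    the machine-readable linkage argued for above."""
    artifact_id: str
    kind: ArtifactKind
    derived_from: tuple[str, ...]   # source document / session identifiers
    subject_ids: tuple[str, ...]    # subject identifiers, or hashes of them
    retention_basis: str            # e.g. "consent", "legal_obligation"
    sensitive_category: bool        # Article 9 data present?
    written_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def triggered_by(self, subject_id: str) -> bool:
        """Does an erasure request from this subject reach this artifact?"""
        return subject_id in self.subject_ids
```

One declaration type covering embeddings, summaries, and messages means a single erasure sweep can enumerate every affected artifact, whatever store it lives in.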

This is what lifecycle_class is building toward: a write-time annotation layer that gives every data artifact an explicit, machine-readable declaration of its retention basis, the subject it is linked to, and what must happen when that subject's erasure right is invoked. The DSAR trap piece covered the audit evidence problem. The three-lifecycles piece covered the operational/legal/compliance coexistence problem. This piece is about the derived-representation gap — the part of the erasure surface that does not even respond to a standard deletion query.

The Compliance State Right Now

Most AI agent platforms do not provide erasure primitives at the memory layer. Vector databases typically expose deletion by ID, but only if the caller knows which IDs to delete — which requires the write-time provenance record that most systems do not create. Context management libraries do not expose erasure APIs at all: the context window is treated as a transient operational surface, not a regulated data store.

This is not a failure of legal awareness among builders. It is a consequence of building agent memory systems for operational performance without a framework for treating those systems as regulated personal data stores. The framework has to come first — at design time, at write time — because it cannot be retrofitted reliably after the data is in the system.

The right to erasure was designed for systems that store data in rows. Agent memory is not rows. Until the frameworks treat it otherwise, every erasure sweep will be structurally incomplete, and every compliance assertion based on source-record deletion alone will be wrong.