The Category Error
Agent memory stores routinely mix three fundamentally different kinds of data in the same rows, the same indexes, and the same retention policies:
- Identity data. The user's name, account, preferences, and explicit inputs. Personal data under GDPR. The user has deletion rights. It has a defined legal lifecycle.
- Process artifacts. Intermediate reasoning traces, tool call logs, scratchpad state, subagent handoff records. Generated by the system, not the user. Retention serves debugging, not the user. Lifecycle is operational, not personal.
- Learned context. Summarized behavioral patterns, retrieved embeddings, memory consolidations, inferred preferences. Derived from user data but not the user's data in any obvious sense. Delete the source row and the derived pattern often survives.
Most frameworks handle this with a single deletion verb: forget(user_id).
That verb means three different things for three different lifecycle contracts,
and nothing in the schema tells the runtime which is which.
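The ambiguity is easy to demonstrate. A minimal sketch of the single-verb store most frameworks ship (table layout and example rows are hypothetical, using SQLite for illustration):

```python
import sqlite3

# Hypothetical single-table memory store: identity data, process
# artifacts, and learned context all share one table and one index.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memory (id INTEGER PRIMARY KEY, user_id TEXT, content TEXT)")
db.executemany(
    "INSERT INTO memory (user_id, content) VALUES (?, ?)",
    [
        ("u1", "name: Alice"),               # identity data
        ("u1", "tool_call: web_search"),     # process artifact
        ("u1", "prefers concise answers"),   # learned context
    ],
)

def forget(user_id: str) -> None:
    # One verb, three lifecycle contracts: this wipes the audit trail
    # (process artifacts) along with the personal data, and says nothing
    # about derived rows that are not keyed to this user_id at all.
    db.execute("DELETE FROM memory WHERE user_id = ?", (user_id,))

forget("u1")
print(db.execute("SELECT COUNT(*) FROM memory").fetchone()[0])  # prints 0
```

All three rows vanish together because nothing in the schema distinguishes them; the runtime cannot apply three different lifecycle contracts to rows it cannot tell apart.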
Why This Breaks at Audit Time
When the first major agent platform faces a GDPR audit and the inspector asks "show me which rows contain this user's personal data," the platform will either produce an overly broad dump that includes process logs and model weights, or an overly narrow list that misses derived context. Both answers are wrong.
The EU AI Act compounds this. For high-risk AI systems, Article 12 requires record-keeping that supports post-hoc explanation of decisions. Process artifacts need to be retained for accountability. But GDPR Article 17 says users can demand deletion of personal data. If the same row satisfies both requirements simultaneously — it's part of a decision audit trail and contains identifiable user data — you have an irresolvable conflict that no deletion API can handle correctly without knowing which lifecycle category applies.
This is not a future problem. It's already live in every agent framework that persists memory across sessions.
Schema-Native Lifecycle Annotations
The fix is not a new API. It's a column.
Every stored memory entry should carry a lifecycle_class field with one of
three values: identity, process, or learned.
This is not a full retention policy — it's a classification that makes policy enforceable.
With that field present, deletion semantics become precise:
- lifecycle_class = identity: delete on user request, no retention argument survives.
- lifecycle_class = process: retain for audit window, delete on schedule, independent of user deletion requests.
- lifecycle_class = learned: treat as derived data — subject to separate policy, reviewed case-by-case, not automatically deleted with identity rows but also not silently kept in perpetuity.
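These semantics translate directly into code. A sketch in Python over SQLite — the table is adapted from the proposal, the function names are hypothetical, and the 180-day audit window is an illustrative assumption, not part of the proposal:

```python
import sqlite3
from datetime import datetime, timedelta, timezone

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE agent_memory (
        id INTEGER PRIMARY KEY,
        user_id TEXT,
        created_at TEXT NOT NULL,
        lifecycle_class TEXT NOT NULL CHECK (
            lifecycle_class IN ('identity', 'process', 'learned')
        ),
        content TEXT NOT NULL
    )
""")

AUDIT_WINDOW = timedelta(days=180)  # assumed retention window for process rows

def handle_user_deletion(user_id: str) -> list:
    """GDPR-style request: remove identity rows outright, leave process
    rows for the audit window, and surface learned rows for review."""
    db.execute(
        "DELETE FROM agent_memory WHERE user_id = ? AND lifecycle_class = 'identity'",
        (user_id,),
    )
    # Learned rows are neither silently deleted nor silently kept:
    # return them for case-by-case review under the separate policy.
    return db.execute(
        "SELECT id, content FROM agent_memory "
        "WHERE user_id = ? AND lifecycle_class = 'learned'",
        (user_id,),
    ).fetchall()

def retention_sweep(now: datetime) -> None:
    """Scheduled job: expire process artifacts past the audit window,
    independent of any user deletion request."""
    cutoff = (now - AUDIT_WINDOW).isoformat()
    db.execute(
        "DELETE FROM agent_memory "
        "WHERE lifecycle_class = 'process' AND created_at < ?",
        (cutoff,),
    )
```

Note that neither function needs to parse content: the classification column alone carries enough information to route each row to the correct lifecycle.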
No major vector database or agent memory framework ships this convention today. Policy is enforced in application code that auditors cannot verify at the schema level.
What Adoption Would Look Like
The first framework to adopt lifecycle_class as a first-class schema field gets
three things immediately:
- A credible answer to the auditor's question about which rows contain personal data.
- The ability to build a retention policy engine that operates on schema rather than on hand-maintained application lists.
- A competitive moat in regulated markets where compliance is a blocker to deployment.
A Minimal Proposal
A concrete starting point for any agent memory schema:
```sql
CREATE TABLE agent_memory (
    id UUID PRIMARY KEY,
    agent_id TEXT NOT NULL,
    user_id TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    lifecycle_class TEXT NOT NULL CHECK (
        lifecycle_class IN ('identity', 'process', 'learned')
    ),
    content JSONB NOT NULL,
    embedding VECTOR(1536)
);
```
The lifecycle_class field is set at write time by the framework, not the user.
It's a structural commitment about what kind of data this row represents and therefore what
obligations apply. Downstream deletion APIs, retention schedulers, and audit tools can all
operate against it without needing to parse content.
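That write-time contract can be made concrete. A sketch over SQLite — the UUID, TIMESTAMPTZ, JSONB, and VECTOR types from the Postgres proposal are replaced with TEXT here, and the writer API with its kind-to-class mapping is an assumption, not an existing framework interface:

```python
import json
import sqlite3

# SQLite adaptation of the proposed schema; the CHECK constraint is unchanged.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE agent_memory (
        id TEXT PRIMARY KEY,
        agent_id TEXT NOT NULL,
        user_id TEXT,
        created_at TEXT NOT NULL DEFAULT (datetime('now')),
        lifecycle_class TEXT NOT NULL CHECK (
            lifecycle_class IN ('identity', 'process', 'learned')
        ),
        content TEXT NOT NULL
    )
""")

# Hypothetical framework write path: callers declare what kind of record
# they are writing, and the framework maps that to a lifecycle class.
# The classification is structural, not a choice left to the user.
WRITE_KINDS = {
    "user_fact": "identity",
    "tool_trace": "process",
    "consolidation": "learned",
}

def write_memory(row_id: str, agent_id: str, user_id, kind: str, content: dict) -> None:
    lifecycle = WRITE_KINDS[kind]  # raises KeyError for unclassified writes
    db.execute(
        "INSERT INTO agent_memory (id, agent_id, user_id, lifecycle_class, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (row_id, agent_id, user_id, lifecycle, json.dumps(content)),
    )

write_memory("m1", "agent-1", "u1", "user_fact", {"name": "Alice"})
write_memory("m2", "agent-1", "u1", "tool_trace", {"tool": "web_search"})
```

Because the CHECK constraint lives in the schema, even a write that bypasses the framework cannot insert an unclassified row — which is exactly the property an auditor can verify without reading application code.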
This is the smallest change that makes compliance tractable. It doesn't require new legal frameworks, new APIs, or new infrastructure. It requires agreeing that the type of data matters and writing that agreement into the schema.