Before the deletion request arrives
When a user submits a GDPR Article 17 erasure request, you have one month to respond under Article 12(3). The response requires knowing what personal data you hold, where it lives, and how to remove it from every store. For a traditional database application, that's a tractable audit. For an AI agent system, it typically reveals that nobody has a complete answer.
Here are the six categories of records a typical AI agent creates during normal operation, in rough order of how often organizations forget about them.
1. Conversation context
The session transcript: every message, tool call, and response in the agent's active context window. This is the most visible record, and teams usually have a retention policy for it. It's the others that cause problems.
Deletion difficulty: Low, if you have session IDs and a straightforward storage model. The complication is the derived records generated from the transcript before it was deleted: deleting the transcript doesn't delete the embeddings, summaries, or behavioral fingerprints that were built from it.
2. Vector store entries (embeddings)
When an agent uses a retrieval-augmented memory system, user inputs and agent responses are often embedded and stored in a vector database for later retrieval. These embeddings encode semantic content that may include personal information — names, specific problems described, identifiable context.
Deletion difficulty: Medium to high. Vector stores are optimized for retrieval, not subject-linked deletion. If the embedding was created without a subject_id annotation, finding all embeddings linked to a given individual requires either a full scan or a separate index you may not have built.
The other problem: a single embedding may encode content from multiple users in a multi-tenant session. Erasing it removes data you may still need for other subjects.
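Both problems above are avoidable if the subject linkage is written alongside the embedding. A minimal in-memory sketch (the `subject_ids` field and the single-vs-multi-subject handling are illustrative assumptions, not any particular vector database's API):

```python
from dataclasses import dataclass

@dataclass
class EmbeddingRecord:
    vector: list[float]
    text: str
    subject_ids: list[str]  # hypothetical annotation written at embed time

class VectorStore:
    """Toy in-memory store; a real vector DB would carry subject_ids as metadata."""
    def __init__(self):
        self.records: dict[str, EmbeddingRecord] = {}

    def add(self, record_id: str, vector: list[float], text: str,
            subject_ids: list[str]) -> None:
        self.records[record_id] = EmbeddingRecord(vector, text, list(subject_ids))

    def erase_subject(self, subject_id: str) -> tuple[list[str], list[str]]:
        deleted, flagged = [], []
        for rid, rec in list(self.records.items()):
            if subject_id not in rec.subject_ids:
                continue
            if rec.subject_ids == [subject_id]:
                del self.records[rid]        # single-subject: safe to hard-delete
                deleted.append(rid)
            else:
                rec.subject_ids.remove(subject_id)
                flagged.append(rid)          # multi-subject: unlink and flag for review
        return deleted, flagged
```

The key point is that `erase_subject` is a metadata filter, not a full scan, and the multi-tenant case surfaces as an explicit flagged list instead of silently deleting other subjects' data.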
3. Compressed context summaries
Long-running agents compress their context periodically. The compression output is a summary of what happened in the session — often written to persistent memory so the agent can recall it in future sessions. That summary may contain personal information from the original conversation in condensed form.
Deletion difficulty: High. The summary was created from the original transcript, but it is not the transcript. It is a new document. Deleting the transcript does not delete the summary. Finding all summaries linked to a given individual requires knowing which sessions they participated in and which summaries were generated from those sessions — a provenance chain that most agent platforms do not maintain.
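The provenance chain described above is cheap to maintain if it is recorded at write time. A sketch under illustrative assumptions (the two mappings and their names are hypothetical, not an existing platform's schema):

```python
class ProvenanceIndex:
    """Written alongside sessions and summaries: which subjects were in a
    session, and which sessions a summary was compressed from."""
    def __init__(self):
        self.session_subjects: dict[str, set[str]] = {}  # session -> subjects
        self.summary_sources: dict[str, set[str]] = {}   # summary -> sessions

    def record_session(self, session_id: str, subject_ids: list[str]) -> None:
        self.session_subjects[session_id] = set(subject_ids)

    def record_summary(self, summary_id: str, source_session_ids: list[str]) -> None:
        self.summary_sources[summary_id] = set(source_session_ids)

    def summaries_for_subject(self, subject_id: str) -> list[str]:
        # Walk the chain: subject -> sessions -> summaries derived from them.
        sessions = {sid for sid, subs in self.session_subjects.items()
                    if subject_id in subs}
        return [sm for sm, srcs in self.summary_sources.items()
                if srcs & sessions]
```

With the index in place, answering "which summaries contain this individual's data" is a lookup; without it, it is an archaeology project.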
4. Tool call logs and audit records
Every tool call the agent makes — database queries, API calls, file reads and writes, web searches — may generate an audit log entry. Those logs are usually treated as operational data, not subject-linked data. But if the tool was called in service of a specific user's request, the log contains that user's context: what they asked for, what the agent retrieved, what action was taken on their behalf.
Deletion difficulty: Variable. Operational logs often have a retention policy (30 days, 90 days, 1 year), but that policy was written before the log started containing detailed personal data from agent sessions. The question is whether the agent's tool call logs are subject to Art. 17 erasure or whether they qualify for the Art. 17(3)(e) exemption for data "necessary for the establishment, exercise or defence of legal claims." Most teams haven't made that determination.
5. Behavioral fingerprints and monitoring data
If the agent system runs behavioral monitoring — output consistency checks, semantic drift detection, tool call pattern logging — those records capture information about what the agent said and did in service of specific users. A behavioral fingerprint is a derived representation of the conversation, one step further removed than a summary.
Deletion difficulty: High. Behavioral monitoring data is often stored separately from conversation data, with no shared subject identifier. The team that built the monitoring system usually didn't consider GDPR obligations when designing the storage schema.
6. Fine-tuning and feedback data
If the agent system feeds conversation data back into model training — through explicit RLHF, preference labeling, or implicit fine-tuning pipelines — personal information may have been incorporated into model weights. This is the most difficult category.
Deletion difficulty: Extremely high to impossible. Once personal data has been incorporated into model weights, there is no tractable method for removing it that doesn't require retraining the model from a point before the data was included. The standard guidance from regulators is that fine-tuned models should not be trained on data that may be subject to erasure requests, but many organizations have already crossed that line without recognizing the implication.
The root problem: write-first, classify-later
Every one of the six record types above has a different deletion difficulty, a different retention basis, and a different compliance posture — but they are all typically created without declaring any of that at write time. The record is written. The lifecycle class is figured out when someone asks for it to be deleted.
By that point, the record may have been replicated, compressed, embedded, fine-tuned into a model, and propagated into a dozen downstream stores. The provenance chain is gone. The deletion sweep is incomplete by construction.
The lifecycle_class specification addresses this by requiring that every record declare its class — identity, process, compliance, or learned_context — at creation, along with a subject chain for records linked to identifiable individuals. The erasure behavior is then deterministic from the record itself, not from a post-hoc classification process run under a 30-day deadline.
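A minimal sketch of what that looks like in practice. The four class names come from the specification as described above; the envelope fields and the rule that compliance records survive erasure (e.g. under an Art. 17(3) exemption) are illustrative assumptions:

```python
from dataclasses import dataclass
from enum import Enum

class LifecycleClass(Enum):
    IDENTITY = "identity"
    PROCESS = "process"
    COMPLIANCE = "compliance"
    LEARNED_CONTEXT = "learned_context"

@dataclass(frozen=True)
class Record:
    payload: str
    lifecycle_class: LifecycleClass
    subject_chain: tuple[str, ...] = ()  # subjects this record derives from

def erase_subject(records: list[Record], subject_id: str) -> list[Record]:
    """Erasure decided from each record's own declaration, not a post-hoc audit."""
    kept = []
    for rec in records:
        if subject_id not in rec.subject_chain:
            kept.append(rec)                 # not linked to this subject
        elif rec.lifecycle_class is LifecycleClass.COMPLIANCE:
            kept.append(rec)                 # retained under a declared legal basis
        # identity / learned_context / process records linked to the
        # subject are dropped from the kept list, i.e. erased
    return kept
```

Because the class and subject chain are mandatory at construction, a record cannot exist in an unclassified state — which is the entire point of the write-discipline change.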
This is not a complex technical change. It is a write-discipline change: annotate records when you create them, not when you need to delete them. The window to establish that discipline is now, while agents are being built — not after the first deletion request forces the audit.