Standards & Governance · Agent Infrastructure

NIST Got the Scope Right. Now It Needs the Storage Layer.

NIST CAISI defined AI agents as systems capable of "persistent changes outside of the AI agent system itself." That scope is correct. But the draft framework doesn't yet tell you how to make that boundary enforceable at the storage layer.

NIST's Center for AI Standards and Innovation launched the AI Agent Standards Initiative on February 17, 2026. Their scope language is worth reading carefully:

Agents capable of creating "persistent changes outside of the AI agent system itself" — distinguishing agentic systems from general-purpose AI chatbots and retrieval-augmented generation.

That boundary is correct. An agent that can only produce text is a different risk category than one that writes to databases, calls APIs, modifies files, or schedules downstream processes. The distinction matters for governance, audit, rollback, and legal liability.

But defining the boundary in scope language is not the same as making it enforceable at runtime. NISTIR 8596, the preliminary draft released in December 2025, identifies excessive autonomy, model drift, and human oversight deficits as agentic-specific risks. What it does not yet address is the storage-layer design that determines whether an agent can actually respect its own boundary: knowing which parts of its own state are ephemeral working memory and which constitute durable effects on the world.

Why the storage layer is the governance layer

Consider an agent midway through a multi-step task. It has accumulated several kilobytes of working context: which sub-steps completed, which tool calls are in-flight, a draft of the output it's building. None of this has left the agent yet. No persistent change has been made.

When the session ends — due to a crash, a rotation, a policy-triggered interruption, or a right-to-erasure request — what happens to that working context?

  • If it was stored with user data under the same key space, a GDPR cascade deletes it. The agent doesn't lose track of the user — it loses track of the task it was doing on that user's behalf. Sessions referencing deleted state produce silent corruption, not clean errors.
  • If it was never checkpointed at all, the task is simply abandoned. There's no rollback, no audit trail of what was about to be written, no way to determine whether the persistent change had already begun.
  • If everything was treated as durable, the agent accumulates unbounded working state that functions as a de facto record of user activity — even when it should have been discarded.

All three failure modes are governance failures. But they originate below the policy layer. They originate in the storage schema.
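The first failure mode can be made concrete in a few lines. This is a toy sketch, not any real system's schema: an in-memory key-value store where agent working state was written under the user's key prefix, so a naive erasure cascade takes the task state with it.

```python
# Hypothetical shared key space: user data and agent working state
# under one prefix. All keys and values here are invented.
store = {
    "user:42/profile":           {"email": "a@example.com"},
    "user:42/agent/task-7/plan": {"step": 3, "draft": "..."},  # agent working state
}

def erase_user(user_id: int) -> None:
    """A naive right-to-erasure cascade: delete every key under the user prefix."""
    prefix = f"user:{user_id}/"
    for key in [k for k in store if k.startswith(prefix)]:
        del store[key]

erase_user(42)

# The profile is gone, as required. But the agent's in-flight task
# state vanished with it: a session resuming task-7 now reads missing
# keys and corrupts silently instead of failing cleanly.
assert "user:42/agent/task-7/plan" not in store
```

The bug is not in `erase_user`; the cascade did exactly what the schema told it to. The bug is the schema.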

What the framework is missing

Effective governance of agentic systems requires distinguishing three categories of agent state explicitly, not just in policy documents but in storage design:

  1. Ephemeral working state. In-flight tool calls, intermediate outputs, short-context scratch. Should be discarded at session end unless promoted to a checkpoint. This state has never crossed the boundary NIST defined — no persistent change has been made yet.
  2. Checkpointed process state. Plans, goal progress, continuity anchors, anything the agent needs to recover meaningful state after interruption. Persisted explicitly at semantically meaningful boundaries: task completion, handoff, context rotation. Subject to agent-specific GC policy, not user data retention policy.
  3. User-attributed data. Records owned by or about a specific user. Subject to applicable data governance obligations — retention, access, and deletion including right-to-erasure. Never mixed with agent process state in a shared key space.
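The three categories above can be sketched as separate key spaces with distinct session semantics. This is a minimal illustration, assuming a simple in-memory backend; the class and category names are mine, not drawn from any standard.

```python
from dataclasses import dataclass, field
from enum import Enum

class StateCategory(Enum):
    EPHEMERAL = "ephemeral"    # discarded at session end
    CHECKPOINT = "checkpoint"  # agent-specific GC policy
    USER_DATA = "user_data"    # user retention / erasure obligations

@dataclass
class AgentStore:
    # One key space per category -- never shared.
    spaces: dict = field(default_factory=lambda: {c: {} for c in StateCategory})

    def put(self, category: StateCategory, key: str, value) -> None:
        self.spaces[category][key] = value

    def end_session(self) -> None:
        # Ephemeral state is dropped wholesale; the other spaces are untouched.
        self.spaces[StateCategory.EPHEMERAL].clear()

store = AgentStore()
store.put(StateCategory.EPHEMERAL, "task-7/scratch", "partial draft")
store.put(StateCategory.CHECKPOINT, "task-7/plan", {"step": 3})
store.put(StateCategory.USER_DATA, "user:42/profile", {"email": "a@example.com"})

store.end_session()
assert store.spaces[StateCategory.EPHEMERAL] == {}
assert "task-7/plan" in store.spaces[StateCategory.CHECKPOINT]
```

Nothing here is sophisticated, and that is the point: the separation is a schema decision made on day one, not a policy retrofitted later.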

A single deletion policy cannot be correct for all three. A framework that does not require systems to distinguish them cannot guarantee that deletion, audit, or rollback behaves correctly.

The implication — as a thread on Bluesky this week put it — is that you need a deletion taxonomy, not a deletion button.
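What a deletion taxonomy might look like in code, under the three-category model: erasure deletes user-attributed records outright, severs the user reference on checkpoints so rollback and audit survive, and leaves ephemeral state alone. This is one possible design, with invented names; whether severing attribution satisfies a given erasure obligation is a legal question, not one this sketch answers.

```python
# Three separately keyed stores, per the taxonomy above (hypothetical data).
ephemeral = {"task-7/scratch": "partial draft"}
checkpoints = {"task-7/plan": {"user": 42, "step": 3}}
user_data = {"user:42/profile": {"email": "a@example.com"}}

def erase_user(user_id: int) -> None:
    """Right-to-erasure handled per category, not uniformly."""
    # 1. User-attributed data: deleted outright.
    for key in [k for k in user_data if k.startswith(f"user:{user_id}/")]:
        del user_data[key]
    # 2. Checkpoints: attribution is tombstoned, but the process record
    #    survives so rollback and audit remain possible.
    for cp in checkpoints.values():
        if cp.get("user") == user_id:
            cp["user"] = None
    # 3. Ephemeral state: untouched; its lifecycle ends with the session.

erase_user(42)
assert user_data == {}
assert checkpoints["task-7/plan"]["user"] is None
```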

Singapore already published guidance

Singapore's Infocomm Media Development Authority (IMDA) published what it described as the world's first governance framework specifically for AI agents in January 2026. It establishes four governance dimensions and names data breaches and erroneous actions as top-level risk categories. It doesn't yet specify the storage-layer preconditions that make those risks manageable — but it's the most operationally specific document in the space so far.

The EU AI Act, by contrast, contains no provisions specifically addressing multi-step autonomous decision-making or agent orchestration. The European Parliament backed a proposal in March 2026 to delay full high-risk AI obligations by 16 months. Governance of autonomous agents is not going to come from Brussels first.

NIST is in the best position to set storage-layer norms because it can operate below the compliance layer — establishing technical requirements that legal frameworks can then reference, rather than the reverse.

What should be in the standard

NISTIR 8596's next draft should add a subsection on agent state lifecycle classification with at minimum these requirements:

  • Agentic systems should implement storage-layer separation between ephemeral working state, checkpointed process state, and user-attributed data.
  • GC and deletion policies should be defined per category, not applied uniformly across all agent storage.
  • Human oversight interfaces should expose checkpoint boundaries to operators so that rollback and audit remain possible after interruption.
  • Systems that mix agent process state with user data in a shared key space should document the lifecycle model explicitly and assess compliance risk against applicable data obligations.
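The second requirement, per-category GC, reduces to a small dispatch once categories exist in the schema. One illustrative reading, with placeholder retention values that are not guidance:

```python
import time

# Hypothetical per-category policy table. None means "not subject to GC";
# user data is governed by data obligations, not a background sweep.
GC_POLICY = {
    "ephemeral":  {"max_age_s": 0},            # dropped at session end
    "checkpoint": {"max_age_s": 30 * 86400},   # agent-specific retention
    "user_data":  {"max_age_s": None},
}

def sweep(records: list[dict], now: float) -> list[dict]:
    """Keep a record unless its category's max_age_s has elapsed."""
    kept = []
    for rec in records:
        max_age = GC_POLICY[rec["category"]]["max_age_s"]
        if max_age is None or now - rec["written_at"] <= max_age:
            kept.append(rec)
    return kept

now = time.time()
records = [
    {"category": "checkpoint", "written_at": now - 86400,       "key": "task-7/plan"},
    {"category": "checkpoint", "written_at": now - 90 * 86400,  "key": "task-1/plan"},
    {"category": "user_data",  "written_at": now - 365 * 86400, "key": "user:42/profile"},
]
kept = sweep(records, now)
assert [r["key"] for r in kept] == ["task-7/plan", "user:42/profile"]
```

A uniform sweep would have had to choose one `max_age_s` for all three categories, and any single choice is wrong for at least one of them.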

These are not novel requirements. Operating systems have distinguished volatile process state from durable storage for decades. Database systems have transaction semantics for the same reason. Agentic systems need the same discipline applied to their own state management — and standards are the right place to require it.

NIST's comment period for the AI Agent Standards Initiative closes in April 2026. The scope language they published is the right starting point. The storage layer is the next step.