AI Agents · Production Safety · Behavioral Drift

Amazon Reset 335 Systems.
Here's the Part That Won't Fix the Next Outage.

Amazon's 90-day code safety reset restores original configuration across 335 critical systems after AI agents contributed to high-severity production outages. That is the right first move. It is not a complete answer. A reset gets you back to the starting point. It doesn't explain why the agents drifted from their original operational constraints — or how to detect it before the next outage.

What Amazon's Reset Confirmed

In March 2026, Amazon announced a 90-day "code safety reset" across 335 Tier-1 systems, following incidents where Amazon's AI coding tools contributed to a 13-hour production outage (Kiro) and 6.3 million lost orders across two incidents in a single week (Amazon Q). The reset is a controlled rollback: restore known-good state, audit what changed, and stabilize before continuing.

This is correct incident response. When you don't know what changed, restore. But a reset answers one question — "how do we stop the damage" — and leaves another open: "why did the agents operate outside their intended scope, and what would have caught it earlier?"

The Pattern Behind the Incidents

The same week Amazon announced its reset, an infrastructure engineer (mjkloski) published a detailed incident report on dev.to: a 30-day experiment giving an AI agent production deploy access to a Kubernetes/Terraform stack, producing 14 incidents that escalated from Sev4 to Sev1. The write-up is worth reading.

What's notable is how the incidents happened. The agents didn't malfunction. On day 4, CPU spiked to 85%; the agent inferred a scaling policy from historical Terraform data and auto-scaled staging from 3 to 17 replicas. The spike came from a load test the engineer forgot to mention. The constraint ("this spike is intentional, don't act on it") existed in the engineer's head, not in the agent's operational context.

This is the failure pattern: not a bug, but a structural mismatch between deployment-time intent and runtime operational context. The agent had everything it needed to act — historical data, a plausible inference, execution access. What it didn't have was the current constraint.
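The mismatch can be made concrete in a few lines. This is a hypothetical sketch, not the stack from the incident report: the decision function has historical data and execution authority, but there is simply no input through which a runtime constraint like "this spike is intentional" could reach it.

```python
# Hypothetical sketch of the structural mismatch: all names are
# illustrative, not from the actual incident.

def plan_scaling(history: list[int], current_cpu: int) -> int:
    """Infer a replica count from historical CPU data alone.

    Note what's missing: there is no parameter through which the
    operator's current intent ("this spike is a load test") can
    enter the decision.
    """
    baseline = sum(history) / len(history)
    if current_cpu > baseline * 1.5:
        # A plausible inference from history: scale up aggressively.
        return 17
    return 3

# Historical CPU utilization (%) looks calm; the current 85% spike
# is a load test the operator never told the agent about.
replicas = plan_scaling(history=[40, 45, 42, 38], current_cpu=85)
```

The fix is not better inference; it is giving the current constraint a channel into the decision at all.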

Why This Compounds Over Time

Long-running AI agents actively compress their context. When a session grows past the context window limit, older history is summarized or dropped to make room. This is necessary and generally correct. The problem is what gets compressed.

Operational constraints established early in a session — stop conditions, authority boundaries, explicit scope limitations — are often stated once and then left in context. They aren't marked as non-compressible. After a few compression cycles, the agent is still running, still producing locally valid outputs, but operating without the full constraint set it started with.
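A naive compaction loop makes the mechanism visible. This is an illustrative sketch, not any specific framework's implementation: when the oldest messages are dropped first and constraints are ordinary messages, the constraint stated at the start of the session is the first casualty.

```python
# Illustrative sketch (hypothetical, not a real framework's code):
# drop the oldest messages when the context exceeds a budget.
# A constraint stated once at session start is just another old
# message, so it is eligible for eviction like everything else.

def compact(messages: list[str], budget: int) -> list[str]:
    """Drop oldest messages until total length fits the budget."""
    kept = list(messages)
    while sum(len(m) for m in kept) > budget and len(kept) > 1:
        kept.pop(0)  # the early stop condition goes first
    return kept

session = [
    "CONSTRAINT: load test today, do not auto-scale on CPU spikes",
    "deploy log: service v2.3.1 rolled out",
    "metrics: cpu 42%",
    "metrics: cpu 45%",
    "metrics: cpu 85%  <- spike the agent will now act on",
]
survivors = compact(session, budget=130)
# After compaction the constraint is gone, but the spike that the
# constraint was about is still in context.
```

The agent that resumes from `survivors` is still running and still coherent; it has just lost the one line that made the 85% reading safe to ignore.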

The agent doesn't report this. Outputs still look correct. The behavioral envelope has quietly expanded. Nobody notices until the agent does something that clearly exceeds original scope — at which point you're already in an incident.

A reset restores original state. It doesn't create a mechanism to detect when the operational context diverges from deployment intent during the next run. The same drift will happen again, under the same conditions.

What Detection Looks Like

Two complementary approaches address this from different directions.

Internal control: Fouad Bousetouane's Agent Cognitive Compressor (arXiv:2601.11653) replaces transcript replay with a bounded internal state that separates artifact recall from state commitment. Constraints that were explicitly set are protected from compression; unverified content can't silently become persistent memory. The agent's operational envelope is bounded by design.
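The idea can be sketched in miniature. This is a toy illustration of the principle, not the ACC implementation from the paper: committed constraints live in a protected store that the compaction pass never touches, so only unprotected history is eligible for compression.

```python
# Toy illustration of compression-protected constraints (hypothetical
# code, not the ACC implementation). Constraints are committed to a
# store the compaction pass never touches.

from dataclasses import dataclass, field

@dataclass
class BoundedState:
    constraints: list[str] = field(default_factory=list)  # never compressed
    history: list[str] = field(default_factory=list)      # compressible

    def commit_constraint(self, text: str) -> None:
        self.constraints.append(text)

    def observe(self, text: str) -> None:
        self.history.append(text)

    def compact(self, keep_last: int) -> None:
        # Compression only ever applies to unprotected history.
        self.history = self.history[-keep_last:]

state = BoundedState()
state.commit_constraint("do not auto-scale on CPU spikes today")
for i in range(50):
    state.observe(f"metrics sample {i}")
state.compact(keep_last=5)
# The constraint survives every compaction cycle by construction.
```

The design choice is the separation itself: what the agent is allowed to do is stored in a different place, with different lifetime rules, than what the agent has seen.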

External observation: Behavioral drift monitoring that doesn't depend on the agent's self-report. Three measurable signals — vocabulary decay (ghost lexicon), output pattern shifts (behavioral footprint), and semantic drift — each change independently when a compression boundary has altered how the agent operates. compression-monitor implements this for smolagents, Semantic Kernel, LangChain, and the Anthropic Agent SDK.
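One of those signals, vocabulary decay, can be approximated very simply. This sketch is illustrative only and is not the compression-monitor API: it measures how much of the deployment-time constraint vocabulary still appears in the agent's recent outputs, with no cooperation from the agent.

```python
# Illustrative sketch of vocabulary decay as an external drift signal
# (hypothetical code, not the compression-monitor API).

def vocab_overlap(reference: str, recent_outputs: list[str]) -> float:
    """Fraction of reference vocabulary still present in recent output."""
    ref = set(reference.lower().split())
    seen = set(" ".join(recent_outputs).lower().split())
    return len(ref & seen) / len(ref)

deploy_constraints = "do not scale staging without operator approval"
healthy = ["holding staging at 3 replicas, awaiting operator approval"]
drifted = ["cpu high, scaling to 17 replicas"]

# The healthy output still speaks the constraint's vocabulary;
# the drifted output has stopped referencing it entirely.
assert vocab_overlap(deploy_constraints, healthy) > vocab_overlap(
    deploy_constraints, drifted
)
```

A falling overlap score doesn't prove a constraint was compressed away, but it is measurable from the outside, which is the point: the signal does not depend on the agent noticing or reporting its own drift.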

ACC controls drift from inside the memory pipeline. Behavioral monitoring detects it from outside. For high-stakes production deployments — the kind Amazon is now resetting across 335 systems — you probably want both running in parallel.

The Governance Timeline

Amazon's incidents happened in early 2026. The EU AI Act's August 2, 2026 enforcement deadline for high-risk AI systems is four months away. Articles 9–17 will require continuous risk management and maintained technical documentation for the deployment lifecycle — not just at initial conformity assessment.

"We ran a 90-day reset after the outage" is not the same as "we had continuous monitoring that would have flagged the drift before the outage occurred." The conformity assessment you filed before deployment describes the system you built and tested. It says nothing about the system running eight hours later after its operational context has been compressed three times.

The reset is Amazon acknowledging the problem publicly. The monitoring infrastructure is what the August enforcement deadline will actually require. Those are different things, and they're operating on different timelines. The organizations that understand the difference now have four months to close the gap.