
Agent Authorization · Context Compaction · Governance Architecture

Compaction-Invariant Constraints

An experiment ran five authorization constraint probes against agents under simulated compaction. Every hard constraint scored 1.0. Not because the agents were well-behaved — because the enforcement didn't live in the agent's context at all. That finding reshapes what you need to worry about.

The experiment

The CDP-TradingAgents-001 experiment ran five authorization probes against four agents in a delegation chain: root → risk-manager → trader-alpha → analyst. The chain had spend limits, scope boundaries, temporal validity windows, and cryptographic signature verification. Each probe simulated a compaction event and then tested whether the constraint survived.

All five scored CCS 1.0. Every constraint held. The natural interpretation is: good news.

The deeper interpretation is: this was the expected result, and understanding why clarifies what you actually need to protect.

Why cryptographic enforcement is compaction-invariant

The spend limit probe worked like this: after simulated compaction, the trader agent attempted a trade that would exceed its $2,000 delegation cap. The gateway's verifyDelegation() function checked whether delegation.spendLimit ≥ sum(receipts). The check failed. The trade was blocked.

Notice what didn't happen: verifyDelegation() didn't consult the agent's context. It doesn't read the session transcript. It doesn't know whether compaction fired three minutes ago. It verifies a signature against an embedded key — a function that was never context-dependent in the first place.

This is why the CCS score is 1.0 and why that's correct. The constraint survived because it was never in the context to lose.
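The gateway-side check can be sketched in a few lines. The `Delegation` shape, the HMAC scheme, and the embedded key are illustrative assumptions; only the `verifyDelegation()` name and the spendLimit ≥ sum(receipts) comparison come from the experiment.

```typescript
// Hypothetical sketch of a compaction-invariant gateway check. The HMAC
// construction stands in for whatever signature scheme the real gateway uses.
import { createHmac, timingSafeEqual } from "node:crypto";

interface Delegation {
  agent: string;
  spendLimit: number; // USD cap granted by the delegator
  signature: string;  // hex HMAC over the delegation fields
}

// The key lives in the gateway, never in the agent's context.
const GATEWAY_KEY = "embedded-gateway-key";

function sign(d: Omit<Delegation, "signature">): string {
  return createHmac("sha256", GATEWAY_KEY)
    .update(`${d.agent}|${d.spendLimit}`)
    .digest("hex");
}

function verifyDelegation(d: Delegation, receipts: number[]): boolean {
  // 1. Cryptographic check: the signature must match the signed fields.
  const expected = sign({ agent: d.agent, spendLimit: d.spendLimit });
  const sigOk =
    d.signature.length === expected.length &&
    timingSafeEqual(Buffer.from(d.signature), Buffer.from(expected));
  // 2. Accounting check: spend so far must stay within the cap.
  const spent = receipts.reduce((a, b) => a + b, 0);
  return sigOk && d.spendLimit >= spent;
}

// Nothing here reads the session transcript, so compaction cannot touch it.
const del: Delegation = { agent: "trader-alpha", spendLimit: 2000, signature: "" };
del.signature = sign({ agent: del.agent, spendLimit: del.spendLimit });

console.log(verifyDelegation(del, [800, 900]));      // true: $1,700 within cap
console.log(verifyDelegation(del, [800, 900, 500])); // false: $2,200 exceeds cap
```

Note that the function takes only the delegation and the receipt history; the agent's context is not an input, which is the whole point.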

The same logic applies to every hard constraint enforced at the execution boundary: scope boundary checks, temporal validity windows, chain integrity verification, self-issuance detection. If the enforcement is cryptographic and lives at the gateway layer, compaction events don't touch it.
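Those other boundary-layer checks follow the same pattern. The sketch below uses invented field names (`scopes`, `validFrom`, `validUntil`, `issuer`, `subject`) chosen to illustrate the check logic, not to reproduce the experiment's actual schema.

```typescript
// Illustrative hard-constraint checks. Every input is an infrastructure
// fact (a signed chain link, a clock reading), never agent context.
interface ChainLink {
  issuer: string;
  subject: string;
  scopes: string[]; // e.g. ["read", "trade"]
  validFrom: number; // unix seconds
  validUntil: number;
}

function checkTemporalValidity(link: ChainLink, now: number): boolean {
  return now >= link.validFrom && now <= link.validUntil;
}

function checkScopeBoundary(link: ChainLink, requested: string): boolean {
  return link.scopes.includes(requested);
}

// Self-issuance: no link may name an agent as its own issuer.
function checkSelfIssuance(chain: ChainLink[]): boolean {
  return chain.every((link) => link.issuer !== link.subject);
}

// Chain integrity (attenuation form): a child may not hold a scope
// its parent lacks.
function checkChainIntegrity(chain: ChainLink[]): boolean {
  for (let i = 1; i < chain.length; i++) {
    const parent = chain[i - 1];
    if (!chain[i].scopes.every((s) => parent.scopes.includes(s))) return false;
  }
  return true;
}

const now = Math.floor(Date.now() / 1000);
const chain: ChainLink[] = [
  { issuer: "root", subject: "risk-manager", scopes: ["read", "trade", "admin"], validFrom: now - 3600, validUntil: now + 3600 },
  { issuer: "risk-manager", subject: "trader-alpha", scopes: ["read", "trade"], validFrom: now - 3600, validUntil: now + 3600 },
];
console.log(checkChainIntegrity(chain)); // true: scopes only attenuate
console.log(checkSelfIssuance(chain));   // true: no link signs for itself
```

In the real system each link would also carry a signature, as in the spend limit example; the structural checks shown here sit alongside that verification, not in place of it.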

What's still vulnerable

The problem isn't whether the gateway will allow an out-of-scope action. When enforcement is cryptographic, it won't. The problem is whether the agent's reasoning remains consistent with its authorized behavioral profile after compaction erodes its active context.

A concrete version: a trading agent authorized with a specific risk mandate — portfolio concentration limits, sector exposure caps, volatility tolerance — can undergo compaction and lose that mandate from its active context. Every subsequent trade it recommends passes scope verification ($2,000 cap, read+trade allowed). The gateway approves each call. The agent is behaving incorrectly anyway, because it no longer recalls what it was originally tasked to optimize for.

This is the distinction between authorization and mandate consistency. Authorization checks whether an action is permitted. Mandate consistency checks whether the agent's reasoning is still aligned with what it was authorized to do. The gateway handles the first. Nothing in standard agent infrastructure handles the second.

Two tiers, two different tools

This produces a clean two-tier architecture for behavioral attestation:

Hard constraints — binary pass/fail, enforced at the execution boundary, compaction-invariant. Spend limits, scope boundaries, temporal validity windows, chain integrity, self-issuance detection. These are infrastructure facts. The harness owns them. Compaction can't affect them. Monitor them, but don't worry that a context event will silently disable them.

Soft signals — continuous measurements of reasoning-layer consistency. Ghost lexicon survival (does the agent still use the constraint vocabulary it had at session start?), context consistency score (is the behavioral sample still embedding-similar to the baseline?), tool call distribution (has the pattern of tool use shifted?). These are compaction-sensitive. They measure the layer that can drift. They require active session monitoring, not just initialization verification.

Neither tier replaces the other. Hard constraints handle what's enforceable. Soft signals handle what's measurable but not enforceable. An agent that scores 1.0 on all hard constraints and 0.4 on ghost lexicon survival has drifted from its mandate while staying within its authorization surface. That's a real failure. It just requires a different detection instrument.
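Under assumed definitions, two of those soft signals can be computed like this: ghost lexicon survival as the fraction of session-start constraint vocabulary still present in recent output, and tool distribution as Shannon entropy over tool-call counts. Both formulas are illustrative; the experiment's exact definitions may differ.

```typescript
// Fraction of baseline constraint terms still appearing in recent output.
function ghostLexiconScore(baselineTerms: string[], recentOutput: string): number {
  const text = recentOutput.toLowerCase();
  const surviving = baselineTerms.filter((t) => text.includes(t.toLowerCase()));
  return surviving.length / baselineTerms.length;
}

// Shannon entropy (bits) of the tool-call distribution; comparing this
// between a baseline window and the current window exposes a shifted
// pattern of tool use.
function toolDistributionEntropy(counts: Record<string, number>): number {
  return -Object.values(counts)
    .filter((c) => c > 0)
    .map((c) => c / Object.values(counts).reduce((a, b) => a + b, 0))
    .reduce((h, p) => h + p * Math.log2(p), 0);
}

// An agent that has stopped using its risk vocabulary scores low:
console.log(ghostLexiconScore(
  ["concentration limit", "sector cap", "volatility tolerance"],
  "buy more of the same stock, momentum looks great",
)); // 0

// Entropy is maximal for a uniform distribution over the four tools:
console.log(toolDistributionEntropy({ search: 4, trade: 4, report: 4, idle: 4 }));
```

Note that both measurements are cheap enough to run continuously during a session, which is what "active session monitoring" requires.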

Implications for protocol design

This finding is now reflected in the behavioral attestation schema being developed in the W3C CG Agent Network Protocol working group (PR #33). The behavioral_fingerprint in the SATP attestation sidecar is now structured as two explicit objects:

{
  "behavioral_fingerprint": {
    "hard_constraints": {
      "spend_limit_preserved": true,
      "scope_boundary_preserved": true,
      "temporal_validity": true,
      "delegation_chain_integrity": true,
      "self_issuance_detected": false
    },
    "soft_signals": {
      "ghost_lexicon_score": 0.82,
      "tool_distribution_entropy": 2.41,
      "ccs": 0.91,
      "compaction_count": 1
    }
  }
}

The hard constraint fields are normatively required to be sourced from the harness or gateway enforcement boundary, not self-reported by the agent. An agent that has lost its authorization parameters from context cannot reliably report on constraints that may no longer be active in its reasoning.

The soft signal fields carry the behavioral drift measurement. A governance system that monitors only hard constraints has good enforcement but no visibility into mandate consistency. One that monitors only soft signals has drift detection but no enforcement confirmation. You need both columns.
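A minimal sketch of a governance check that consumes both tiers of the fingerprint above. The 0.7 drift threshold and the verdict labels are invented for illustration; the schema does not define them.

```typescript
// Shape mirrors the behavioral_fingerprint object shown above.
interface BehavioralFingerprint {
  hard_constraints: {
    spend_limit_preserved: boolean;
    scope_boundary_preserved: boolean;
    temporal_validity: boolean;
    delegation_chain_integrity: boolean;
    self_issuance_detected: boolean;
  };
  soft_signals: {
    ghost_lexicon_score: number;
    tool_distribution_entropy: number;
    ccs: number;
    compaction_count: number;
  };
}

type Verdict = "ok" | "review" | "revoke";

function evaluate(fp: BehavioralFingerprint): Verdict {
  const h = fp.hard_constraints;
  // Hard-constraint failures are enforcement facts: revoke immediately.
  const hardOk =
    h.spend_limit_preserved &&
    h.scope_boundary_preserved &&
    h.temporal_validity &&
    h.delegation_chain_integrity &&
    !h.self_issuance_detected;
  if (!hardOk) return "revoke";
  // Soft-signal drift below threshold: mandate may have eroded. That is
  // not enforceable at the gateway, so it routes to review, not revocation.
  const s = fp.soft_signals;
  if (s.ghost_lexicon_score < 0.7 || s.ccs < 0.7) return "review";
  return "ok";
}

// The example values from the sidecar above pass both tiers:
const fp: BehavioralFingerprint = {
  hard_constraints: { spend_limit_preserved: true, scope_boundary_preserved: true, temporal_validity: true, delegation_chain_integrity: true, self_issuance_detected: false },
  soft_signals: { ghost_lexicon_score: 0.82, tool_distribution_entropy: 2.41, ccs: 0.91, compaction_count: 1 },
};
console.log(evaluate(fp)); // "ok"
```

The asymmetry of the two branches is the point: hard failures get a hard response, soft drift gets escalation to a human or a re-attestation flow.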

The bug that clarified everything

During the CDP experiment, the delegation chain integrity probe (cdp-004) initially scored Break. The test was passing a wrong key as a second argument to verifyDelegation(), which only accepts one argument. After fixing the probe design — modifying the scope post-signing and confirming the signature breaks — it scored Hold.

That bug is worth noting because it illustrates the diagnostic approach. The right test for chain integrity isn't "does the function run without error" — it's "does a tampered delegation chain produce a signature failure?" The distinction matters for anyone building verification test suites for agent authorization infrastructure.
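The corrected probe shape reduces to a few lines. The HMAC construction and field names below are stand-ins for the experiment's actual signature scheme; what matters is the order of operations: sign, then tamper, then expect verification to fail.

```typescript
import { createHmac } from "node:crypto";

const KEY = "gateway-key";
const payload = (scope: string) =>
  createHmac("sha256", KEY).update(`trader-alpha|${scope}`).digest("hex");

// Sign the delegation with its original scope.
const delegation = { scope: "read+trade", signature: payload("read+trade") };

// The wrong probe asks "does verification run without error?" It always does.
// The right probe tampers post-signing and expects a signature mismatch.
delegation.scope = "read+trade+withdraw"; // tamper after signing
const verifies = delegation.signature === payload(delegation.scope);
console.log(verifies); // false: the tampered chain breaks the signature, so the probe scores Hold
```

A probe that never produces a failing verification on tampered input is testing the plumbing, not the constraint.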