
Agent Identity · A2A · Standards

Agent Cards Don't Describe the Agent

The emerging standard for agent identity — Google's A2A Agent Card, OIDF AIIM's identity whitepaper, MCP session initialization — describes what an agent can do. It doesn't describe what the agent currently is. Two agents with identical cards can behave completely differently after different session histories. That's a trust gap running through every active agent identity standard.

What an Agent Card Actually Is

Google's Agent-to-Agent (A2A) protocol introduced the Agent Card: a structured JSON document that describes an agent's name, skills, supported protocols, and authentication requirements. It's the identity surface an agent presents when another agent or orchestrator wants to use it.
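To make the "identity surface" concrete, here is a sketch of such a card as a Python dict serialized to JSON. Field names approximate the A2A Agent Card schema (name, url, version, authentication, skills) but should be checked against the current spec; the agent name, URL, and skill are invented for illustration.

```python
import json

# Illustrative Agent Card. Everything here describes the agent's *type*:
# what it can do, where it lives, how to authenticate against it.
agent_card = {
    "name": "report-writer",
    "description": "Drafts summary reports from structured inputs.",
    "url": "https://agents.example.com/report-writer",
    "version": "1.2.0",
    "provider": {"organization": "Example Corp"},
    "authentication": {"schemes": ["bearer"]},
    "defaultInputModes": ["text/plain"],
    "defaultOutputModes": ["text/plain"],
    "skills": [
        {"id": "summarize", "name": "Summarize", "description": "Condense documents."}
    ],
}

print(json.dumps(agent_card, indent=2))
```

Note what is absent: nothing in this document says how long the instance has been running or what instructions it has accumulated, which is the gap the rest of this piece is about.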

The OpenID Foundation's AIIM Community Group is building similar structures. The MCP ecosystem is converging on session initialization patterns. The consensus is clear: agents need a discoverable, verifiable identity surface, and it should be standardized across frameworks and providers.

What all of these share: they describe a type, not an instance.

Agent Cards describe:

  • What the agent is capable of doing (skills, modalities)
  • How to reach it (endpoint, protocol version)
  • How to authenticate against it (auth schemes)
  • What organization backs it (provider, version)

Agent Cards don't describe:

  • What the agent has been instructed to do in the current session
  • What constraints it has accumulated from prior interactions
  • Whether those constraints are still active
  • How long this instance has been running

The Delegation Problem This Creates

Consider a basic orchestration: Agent A delegates a subtask to Agent B. Agent A looks up Agent B's card, verifies it, authenticates, and sends the task.

Agent A now knows Agent B's type identity. It doesn't know Agent B's instance state. Agent B might have been running for six hours. It might have processed hundreds of prior interactions that have shaped its active instructions. It might have been given restrictions by a prior orchestrator that conflict with Agent A's request. Or it might be behaving entirely consistently with its card. There is no way to tell from the card alone.

In human trust systems we have a rough analogy: credentials assert what someone is authorized to do (role, clearance, title). But in practice we also look for behavioral signals — is this person operating within their stated mandate? We build that second layer from observation, history, and mutual accountability. We haven't built it for agents.

The technical term for this pattern in security is confused deputy: a privileged agent acts in ways the original principal didn't intend because the delegation chain didn't carry sufficient context about intent and constraints. Agent Cards, as designed, are perfectly suited to create confused deputy situations at scale.

The Active Standards Gap

The A2A issue tracker has a live discussion — issue #1672, "Proposal: Agent Identity Verification for Agent Cards" — with 119 comments. The thread surfaces exactly this tension: identity verification in the A2A protocol covers the static card, not the dynamic session. Multiple contributors have identified the gap. No concrete minimal schema for behavioral state has been accepted into the spec.

The OpenID Foundation's AIIM whitepaper (October 2025) lists four open problems in agent identity: how to assert agent identity to external servers, token format between agents, agent discovery, and governance models. None of the four explicitly includes session-level behavioral state.

The MCP spec has a proposal (issue #2492) for session resumption with behavioral checkpoint metadata. The proposed schema focuses on client-side continuity — resuming where you left off — not on surfacing the agent's current state to an outside observer deciding whether to trust it.

The MCP Dev Summit (April 2–3, NYC), with 95+ sessions from Anthropic, Datadog, Hugging Face, and Microsoft, is the right venue to close this. The protocol layer is being standardized now. Adding a behavioral state field is straightforward before the format ossifies. It is not straightforward after.

What a Minimal Fix Looks Like

Closing this gap doesn't require solving AI interpretability. The fix is operational: extend Agent Cards with an optional behavioralState field carrying a few machine-readable signals at session boundaries.

A minimal schema:

  • sessionAge: How long has this instance been running? A session active for 40 minutes is more predictable than one accumulating state for 8 hours. This is a timestamp, not a neural probe.
  • contextLoadEvents: Has this instance loaded external context — prior summaries, injected documents, other agents' outputs — beyond its initial configuration? A boolean and a count are enough for callers to make a risk decision.
  • activeConstraintHash: A stable hash of the agent's active system-level instructions at session start. Callers who trust the issuer can verify that the constraints haven't changed. Callers who don't can at least detect unexpected changes.
  • sessionCheckpoint: An optional opaque token the agent can use to signal behavioral continuity across restarts — not tied to any specific memory implementation, just a standardized field for operators who want continuity attestation.

None of these fields require the agent to expose internals. They're operational signals that any well-implemented agent runtime already has access to.

One important clarification on attestation: these fields are declarations, not live measurements. The card is a promise — signed by the issuer at creation time — saying "this agent instance commits to operating within this constraint envelope." The card cannot self-certify that the promise held; only the execution record can do that.

Behavioral conformance should be auditable after the fact. Each action the agent takes produces a signed, timestamped receipt that either fits inside the declared envelope or doesn't. Two agent instances can share an identical card and diverge entirely in their execution records — which is exactly why the card alone can't substitute for the audit trail. The card declares the shape of the commitment; the execution trail is the evidence.
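A minimal sketch of such a receipt, assuming a shared HMAC key between the agent runtime and the auditor for simplicity (a real deployment would more plausibly use asymmetric signatures) and an invented envelope that whitelists two actions:

```python
import hashlib
import hmac
import json
import time

AUDIT_KEY = b"demo-shared-secret"  # assumption: symmetric key, illustration only
DECLARED_ENVELOPE = {"allowedActions": {"summarize", "fetch_document"}}

def sign_receipt(action: str, payload_digest: str) -> dict:
    """Produce a signed, timestamped receipt for one agent action."""
    body = {
        "action": action,
        "payloadDigest": payload_digest,
        "timestamp": int(time.time()),
        # Recorded honestly by the runtime: did the action fit the envelope?
        "inEnvelope": action in DECLARED_ENVELOPE["allowedActions"],
    }
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    body["signature"] = hmac.new(AUDIT_KEY, canonical, hashlib.sha256).hexdigest()
    return body

def verify_receipt(receipt: dict) -> bool:
    """Check the signature, then whether the action fit the declared envelope."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    canonical = json.dumps(body, sort_keys=True).encode("utf-8")
    expected = hmac.new(AUDIT_KEY, canonical, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"]) and receipt["inEnvelope"]

ok = verify_receipt(sign_receipt("summarize", "sha256:abc"))   # fits the envelope
bad = verify_receipt(sign_receipt("send_email", "sha256:def"))  # outside it
```

The design point is the separation: the card declares DECLARED_ENVELOPE once, while every action produces its own receipt, so two instances with identical cards can accumulate entirely different audit trails.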

Attribution vs. Consistency

The agent identity literature tends to focus on attribution: who is accountable for this agent's actions? That question matters for governance and liability. It's also the easier question — it can be answered by tracing the delegation chain back to a human principal.

The harder question is consistency: is this agent still behaving as its accountable principal configured it? Attribution answers the question after something goes wrong. Consistency is the question you need to answer before extending trust.

Current agent identity standards handle attribution reasonably well and essentially ignore consistency. That's fine for low-stakes integrations. For agents acting with real-world consequences across multi-step pipelines, it's a gap that will produce real failures before anyone gets around to treating it as a first-class design problem.

The time to fix this is when the format is still open, not after it's deployed in 10,000 production integrations.