IEEE P3394 Has Telemetry. That Is Not the Same as Accountability.

IEEE P3394 is the working group defining the Universal Message Format for LLM agents — the standard envelope for how agents talk to each other and to orchestrators. The reference implementation, scope3394 by NEOLAF, ships with OpenTelemetry tracing built in. That is genuinely useful for operators who want to see what their agent did. It does not solve the accountability problem, and those two things are often conflated.

What OTel actually gives you

OpenTelemetry traces are structured records of what the agent's runtime observed: which tools were called, in what order, how long each took, what errors were raised. For an operator debugging a failed task, this is exactly what you want. The spans are rich, the tooling ecosystem is excellent, and the integration is the right choice for runtime observability.

But OTel traces have a structural property that matters for accountability: they are recorded and controlled by the agent's operator. A trace says "the agent's own runtime logged these events." It does not say "the calling party can verify these events without trusting the agent's telemetry backend."

This is not a design flaw in OTel, which was never built to be a third-party verification mechanism. The gap lies in how the P3394 message envelope currently treats post-execution accountability.

The actual accountability gap in P3394

P3394 handles the pre-execution identity layer well: the UMF includes agent roles, session context, and message framing. The calling party can construct a well-formed invocation and know who they are talking to.

What the spec does not yet address is the post-execution direction: once the agent has completed the task, can the caller, or any third party, verify that the agent's claimed outputs correspond to the specific invocation that was sent?

This requires something structurally different from a telemetry span. It requires a signed record, produced by the executing agent, that binds:

the invocation identifier (so the receipt is tied to the specific call)
the tool calls that were actually made (so the outcome is attributable)
a hash of the output (so the result can't be substituted after the fact)
the agent's signing key, via a signature over the fields above (so the receipt is verifiable by the caller)

Submitted to a transparency log, this gives the caller a receipt they can audit independently. The telemetry backend's trustworthiness becomes irrelevant to the verification question.
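
A minimal sketch of such a receipt, to make the four bindings concrete. Everything named here is an assumption for illustration: the field names, make_receipt, verify_receipt, and the key are hypothetical and come from neither P3394 nor scope3394. The sketch also uses HMAC-SHA256 only to stay within the Python standard library. HMAC is symmetric, so a real receipt would need an asymmetric signature (Ed25519, for example) that the caller can check with a public key alone.

```python
import hashlib
import hmac
import json

# Hypothetical demo key. HMAC is a stdlib stand-in only: a real receipt needs
# an asymmetric signature so the verifier never holds the agent's secret.
AGENT_KEY = b"demo-agent-key"


def canonical(obj) -> bytes:
    """Serialize deterministically so signer and verifier hash the same bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_receipt(invocation_id, tool_calls, output, key, key_id):
    """Bind invocation, tool calls, output hash and key identity in one record."""
    body = {
        "invocation_id": invocation_id,                       # ties receipt to the call
        "tool_calls": tool_calls,                             # what was actually run
        "output_sha256": hashlib.sha256(output).hexdigest(),  # blocks substitution
        "key_id": key_id,                                     # which key signed
    }
    signature = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return {"body": body, "signature": signature}


def verify_receipt(receipt, invocation_id, output, key):
    """Check the signature, then check the body against what the caller holds."""
    body = receipt["body"]
    expected = hmac.new(key, canonical(body), hashlib.sha256).hexdigest()
    return (
        hmac.compare_digest(expected, receipt["signature"])
        and body["invocation_id"] == invocation_id
        and body["output_sha256"] == hashlib.sha256(output).hexdigest()
    )


receipt = make_receipt(
    "inv-001",
    [{"tool": "search", "status": "ok"}],
    b"final answer",
    AGENT_KEY,
    "agent-key-1",
)
ok = verify_receipt(receipt, "inv-001", b"final answer", AGENT_KEY)
substituted = verify_receipt(receipt, "inv-001", b"substituted answer", AGENT_KEY)
wrong_call = verify_receipt(receipt, "inv-002", b"final answer", AGENT_KEY)
```

The verifier consults only the receipt, the invocation identifier it already holds, and the output it received. Nothing in the check depends on the agent's telemetry backend.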

The key distinction: telemetry answers "what did the agent's runtime record?" An outcome receipt answers "what can the caller verify without trusting the agent's infrastructure?" These are different questions and require different mechanisms.

Why this fits in P3394's scope

The UMF is an envelope. An outcome receipt is a natural extension of that envelope in the response direction: just as the request carries invocation context (session ID, agent role, message payload), the response could carry an outcome receipt (invocation hash, tool call summary, agent signature).
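
As a rough illustration of that symmetry, the request and response could be modeled as below. This is a sketch under stated assumptions, not the UMF schema: the class names, field names, and the invocation_hash helper are all hypothetical, and the signature is left as an opaque placeholder that the sketch does not verify.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Request:
    # Hypothetical request-side context, mirroring the fields named in the text.
    session_id: str
    agent_role: str
    payload: str


@dataclass(frozen=True)
class OutcomeReceipt:
    # Hypothetical response-side counterpart carried in the envelope.
    invocation_hash: str
    tool_call_summary: tuple
    agent_signature: str  # opaque placeholder; not verified in this sketch


@dataclass(frozen=True)
class Response:
    payload: str
    receipt: OutcomeReceipt


def invocation_hash(request: Request) -> str:
    """Hash a deterministic encoding of the request the caller actually sent."""
    encoded = json.dumps(asdict(request), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def matches_request(request: Request, response: Response) -> bool:
    """True when the receipt in the response refers to this exact request."""
    return response.receipt.invocation_hash == invocation_hash(request)


sent = Request(session_id="s-42", agent_role="researcher", payload="summarize Q3")
reply = Response(
    payload="Q3 summary",
    receipt=OutcomeReceipt(
        invocation_hash=invocation_hash(sent),
        tool_call_summary=("search", "summarize"),
        agent_signature="<signature over the receipt fields>",
    ),
)
other = Request(session_id="s-42", agent_role="researcher", payload="summarize Q4")
```

Because the caller recomputes the hash from its own copy of the request, a receipt issued for a different invocation fails the match even when it is otherwise well formed.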

This doesn't require replacing OTel. Telemetry and accountability serve different audiences. The operator needs spans for debugging. The caller needs a receipt for verification. A well-designed standard supports both.

I opened issue #1 on scope3394 asking whether post-execution outcome receipts are within P3394's UMF scope, or intended for a companion standard. The answer shapes where the work should go. If P3394 covers the full invocation lifecycle, the receipt field belongs in the envelope spec. If not, a profile document or IETF liaison would be the right path.

The broader stack

Post-execution outcome verification is one layer in a larger accountability stack. Pre-execution identity is handled by WIMSE and P3394's own UMF. Scope enforcement is handled by Attenuating Agent Tokens (AAT). Behavioral state at execution time can be bound via RATS EAT. What connects these layers to independently verifiable outcomes is the execution receipt.

The EOV I-D was designed specifically to fill this position in the stack: layered on RATS and SCITT, complementing authorization frameworks rather than replacing them, and producing a signed receipt that can be checked by anyone who holds the original invocation context.

P3394's message envelope is the right place to reference this layer. The question is whether the WG sees it that way.