
Field Note · Financial AI · Risk Management

When Your Trading Agent Forgets Its Stop-Loss

You told the agent: maximum drawdown 8%, stop-loss at 2% per position, do not hold overnight. Three hours and one context compaction later, those constraints were gone. The agent kept trading. The risk manager module was still running. It just no longer knew what the limits were.

The Setup

TradingAgents is a multi-agent stock analysis system from Tauric Research — NeurIPS 2024 oral, 43,000 GitHub stars. Five specialized agents collaborate: analyst, researcher, trader, risk manager, portfolio manager. They pass context between each other across an entire session to reach a coordinated investment decision.

Long sessions. Rich shared context. Context window pressure that builds over time. This is exactly the environment where compaction happens.

What Compaction Does to Risk Constraints

When a long-running LLM agent hits its context limit, the harness drops older tokens. What gets dropped is not random — the LLM compresses or summarizes earlier context to make room for new observations. The session appears to continue normally.
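Harness implementations differ, but a minimal drop-oldest sketch (illustrative only, not TradingAgents' actual mechanism; the whitespace token counter is a stand-in for a real tokenizer) shows why early-session content is the first to go:

```python
def compact(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Naive compaction: evict the oldest messages until under budget.

    Early-session content — where the risk limits were stated — is
    evicted first, while the session appears to continue normally.
    """
    kept = list(messages)
    total = sum(count_tokens(m) for m in kept)
    while kept and total > max_tokens:
        total -= count_tokens(kept.pop(0))  # oldest message goes first
    return kept

history = [
    "Config: max drawdown 8%, stop-loss 2%, no overnight holds",  # early constraint
    "Analyst: AAPL momentum positive ...",
    "Trader: opened long position ...",
    "Researcher: sector rotation notes ...",
]
survivors = compact(history, max_tokens=12)
# The constraint message, being oldest, is the first eviction candidate.
```

Real harnesses usually summarize rather than hard-drop, but the ordering bias is the same: the configuration stated first is the content most exposed to compression.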

The problem in financial systems: your risk parameters were defined early, in natural language, as part of the initial session setup. "Maximum drawdown of 8%. Stop-loss at 2% per position. No overnight holds." Those constraints lived in the context. Once compaction ran, they were summarized away or dropped entirely.

The risk manager module didn't throw an error. It continued operating — just without the constraints it was initialized with. The agent had no way to know what it had forgotten.

Ghost Term Decay as an Early Warning Signal

Ghost terms are words or phrases that were active in an agent's output early in a session and then fell silent. When stop_loss appears 14 times in the first 20 messages and zero times in the next 20, that disappearance is a signal — not just noise.

The session_risk_integrity_check tool tracks exactly this. It maintains a rolling window of risk-parameter term frequencies across agent messages. If a parameter term's frequency decays below a threshold between windows, it fires an alert before the next trade decision, not after.

from compression_monitor import CompressionMonitor

monitor = CompressionMonitor(
    ghost_term_candidates=["stop_loss", "max_drawdown", "risk_limit", "overnight"],
    window_size=20,
    decay_threshold=0.3
)

# After each agent message:
monitor.record(agent_output)

# Before each trade execution:
integrity = monitor.check_risk_parameter_decay()
if integrity.ghost_rate > 0.7:
    halt_trading(reason=f"Risk parameters dropped from context: {integrity.ghost_terms}")
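The internals of CompressionMonitor aren't shown above; a self-contained sketch of the rolling-window logic — under the assumption that a term "ghosts" when its per-message frequency in the current window falls below decay_threshold times its frequency in the last completed window, and that ghost rate is the fraction of previously active terms that ghosted — might look like this (GhostTermMonitor and IntegrityReport are names invented for the sketch):

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class IntegrityReport:
    ghost_terms: list
    ghost_rate: float


class GhostTermMonitor:
    """Tracks risk-parameter term frequencies over rolling message windows."""

    def __init__(self, ghost_term_candidates, window_size=20, decay_threshold=0.3):
        self.terms = [t.lower() for t in ghost_term_candidates]
        self.window_size = window_size
        self.decay_threshold = decay_threshold
        self.previous = Counter()  # term hits in the last completed window
        self.current = []          # per-message term hit sets, in-progress window

    def record(self, message):
        text = message.lower()
        self.current.append({t for t in self.terms if t in text})
        if len(self.current) == self.window_size:
            # Window full: snapshot it and start a fresh one.
            self.previous = Counter(t for hits in self.current for t in hits)
            self.current = []

    def check_risk_parameter_decay(self):
        now = Counter(t for hits in self.current for t in hits)
        n_prev, n_cur = self.window_size, max(len(self.current), 1)
        ghosts = []
        for term in self.terms:
            f_prev = self.previous[term] / n_prev  # frequency in prior window
            f_cur = now[term] / n_cur              # frequency so far in this one
            if f_prev > 0 and f_cur / f_prev < self.decay_threshold:
                ghosts.append(term)
        active_before = [t for t in self.terms if self.previous[t] > 0]
        rate = len(ghosts) / len(active_before) if active_before else 0.0
        return IntegrityReport(ghost_terms=ghosts, ghost_rate=rate)
```

A term that was never active cannot ghost, so a freshly started session produces a ghost rate of zero rather than a spurious alert.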

A simulated 3-hour trading run with TradingAgents showed:

  • Pre-compaction: CCS (Context-Consistency Score) 0.91, all risk terms active
  • Post-compaction (45 min in): CCS 0.44, stop_loss and max_drawdown ghost rate 0.88
  • Risk manager output: no longer referenced configured limits; used generic "be cautious" language

The agent didn't know it had changed. The numbers make it visible.

Why the Risk Manager Module Isn't Enough

You might think: the risk manager is a separate module. It has its own logic. Even if the context drifted, the module should enforce limits.

The issue is that in LLM-based multi-agent systems, the "risk manager" is also an LLM. It reads the shared context to understand what the current risk profile is. When that shared context no longer contains the configured limits, the risk manager's outputs become generic rather than parameter-driven. The enforcement layer is itself a language model operating without its original parameters.

This is the structural problem. You configured the system in natural language. Natural language lives in context. Context gets compressed. The configuration disappears.

The Fix: Persistent Parameter Anchoring

There are two defensible approaches:

  1. Hard-coded parameters: Store risk limits in a structured config file that the risk manager reads at every decision point, independent of conversational context. This decouples parameters from the context window entirely.
  2. Behavioral monitoring with halt gates: Use ghost term decay to detect when parameter terms have left the active context, and require a parameter re-injection before the next trade execution. session_risk_integrity_check implements this pattern.

The first is stronger and simpler for controlled deployments. The second is useful when the risk parameters are themselves dynamic or user-configured during the session, and cannot be fully externalized at startup.

Why This Is a Systems Problem, Not a Model Problem

The model is doing exactly what it was designed to do: working within a finite context window, continuing the task as coherently as it can given what it currently sees. Compaction is a feature, not a bug — it lets long sessions continue instead of crashing.

The gap is that nobody built the measurement layer that notices when compaction drops something that needs to survive. In most applications, that's inconvenient. In a financial system making real-time trade decisions, it's a risk management failure mode that looks exactly like normal operation right up until the drawdown limit isn't enforced.

The issue filed with TradingAgents proposes a post-compaction hook in risk_manager.py that triggers session_risk_integrity_check automatically. If accepted, that makes constraint drift a first-class concern in one of the most-used financial multi-agent frameworks.

What to Watch

If you are running LLM-based agents in a financial setting, or any other high-stakes context, the pattern to watch is this: any risk constraint, approval requirement, or operational limit introduced in natural language early in a session is a candidate for silent drift.

Ghost term tracking is a fast, dependency-free diagnostic. It does not require access to model internals or session metadata. It measures what the agent is actually producing, not what it claims to remember.