A financial services company running autonomous trading agents burned through $12,000 in compute costs during a weekend maintenance window. Their agent entered a retry loop—each failure triggered another attempt with exponential backoff, but the loop detection logic had a flaw: it counted distinct error types, not total iterations. Three days and 47,000 failed API calls later, their bill told the story. Loop detection isn’t just about correctness; it’s about cost protection.
Key Takeaway
Infinite loops in LLM agents are silent budget killers. A loop that produces only 50 tokens of output per iteration can still burn $500+ per hour on a model like GPT-4o once it starts re-sending a near-full context window with every call. Implement multi-layered detection: iteration counts, token budgets, and semantic similarity checks.
Agent loops occur when an LLM-powered system repeatedly executes the same or similar operations without making progress toward a goal. Unlike traditional software loops that are explicit and bounded, agent loops emerge from the non-deterministic nature of LLM reasoning and tool usage. The financial impact is immediate and severe.
Consider the cost structure of modern LLMs. Using the pricing data from our research:
| Model | Input Cost | Output Cost | Context Window | Source |
|---|---|---|---|---|
| Claude 3.5 Sonnet | $3.00/1M tokens | $15.00/1M tokens | 200,000 tokens | Anthropic |
| GPT-4o | $5.00/1M tokens | $15.00/1M tokens | 128,000 tokens | OpenAI |
| GPT-4o-mini | $0.15/1M tokens | $0.60/1M tokens | 128,000 tokens | OpenAI |
| Claude 3.5 Haiku | $1.25/1M tokens | $5.00/1M tokens | 200,000 tokens | Anthropic |
Pricing data verified November-December 2024
A loop that generates just 100 output tokens per iteration and runs 10 iterations per minute costs, counting output tokens alone:
GPT-4o: $0.015/minute ≈ $0.90/hour
Claude 3.5 Sonnet: $0.015/minute ≈ $0.90/hour
GPT-4o-mini: $0.0006/minute ≈ $0.04/hour
That looks tolerable until you add input tokens: most agent loops re-send the full conversation context on every call, and input quickly dominates. At 1,000 iterations per hour (common in automated research agents) with a 30,000-token context re-sent on each call, GPT-4o costs roughly $150/hour in input tokens alone. Scale that to a multi-agent system with 10 concurrent agents, and you're looking at $1,500/hour in runaway costs.
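To make the arithmetic concrete, here is a minimal sketch using the GPT-4o list prices from the table above; the `hourly_loop_cost` helper and the 30,000-token context size are illustrative assumptions, not measurements:

```python
INPUT_PER_1M = 5.00    # GPT-4o, USD per 1M input tokens (see table above)
OUTPUT_PER_1M = 15.00  # GPT-4o, USD per 1M output tokens

def hourly_loop_cost(context_tokens, output_tokens, iterations_per_hour):
    """Cost per hour of a loop that re-sends `context_tokens` of input and
    emits `output_tokens` on every iteration (both assumed constant)."""
    per_call = (
        (context_tokens / 1_000_000) * INPUT_PER_1M
        + (output_tokens / 1_000_000) * OUTPUT_PER_1M
    )
    return per_call * iterations_per_hour

print(hourly_loop_cost(0, 100, 600))        # output only: ~$0.90/hour
print(hourly_loop_cost(30_000, 100, 1_000)) # 30K-token context re-sent: ~$151.50/hour
```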
Beyond direct costs, loops degrade user experience, consume rate limits, and can trigger cascading failures across your infrastructure. A single undetected loop can exhaust your API quota for the entire organization, blocking all other applications.
Agent loops manifest in several distinct patterns. Recognizing these patterns is the first step in implementing effective detection.
Pattern 1: Identical Retry Loops
The agent calls a tool, receives an error, and attempts the same call again without modifying the parameters. This often happens when:
Tool parameters are invalid but the error message doesn’t provide corrective guidance
The agent misinterprets the error and retries with identical inputs
Network timeouts trigger automatic retries without exponential backoff caps
Cost signature: High API call volume with identical or near-identical tool parameters. Low token variation between iterations.
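One cheap way to catch this pattern is to fingerprint each tool call and count consecutive repeats. The sketch below is a hypothetical guard (the `RetryLoopGuard` name and its threshold are not from any framework); it reuses the `LoopDetectedError` exception defined in the detection-strategies section below:

```python
import hashlib
import json

class RetryLoopGuard:
    """Flags tool calls repeated with identical parameters (hypothetical helper)."""

    def __init__(self, max_identical_calls=3):
        self.max_identical_calls = max_identical_calls
        self.last_fingerprint = None
        self.repeat_count = 0

    def check(self, tool_name, tool_args):
        # Canonicalize the call so dict ordering can't hide an exact repeat
        payload = json.dumps({"tool": tool_name, "args": tool_args},
                             sort_keys=True, default=str)
        fingerprint = hashlib.sha256(payload.encode()).hexdigest()

        if fingerprint == self.last_fingerprint:
            self.repeat_count += 1
        else:
            self.last_fingerprint = fingerprint
            self.repeat_count = 1

        if self.repeat_count >= self.max_identical_calls:
            # LoopDetectedError is defined in the detection-strategies section below
            raise LoopDetectedError(
                f"{tool_name} called {self.repeat_count} times with identical parameters"
            )
```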
Pattern 2: Reasoning Loops
The agent reaches the same reasoning conclusion repeatedly, often due to:
Insufficient context causing the agent to “forget” previous attempts
Prompt structure that doesn’t include progress tracking
Conflicting instructions in system prompts
Cost signature: High token usage with similar output structure. The agent might rephrase the same conclusion multiple times.
Pattern 3: Oscillation Loops
The agent alternates between two or more states without progressing:
Action → Undo → Action → Undo
Query → Summarize → Query → Summarize
Cost signature: Periodic spikes in token usage. The agent appears productive but makes no forward progress.
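The "all states identical" check used later in this article will not catch an A → B → A → B cycle, so it helps to test for alternation explicitly. A minimal sketch, assuming each state can be reduced to a hashable value:

```python
def detect_oscillation(state_history, min_cycles=2):
    """True if the tail of state_history alternates between two distinct states
    (A, B, A, B, ...), which an 'all states identical' check misses."""
    needed = 2 * min_cycles
    if len(state_history) < needed:
        return False
    tail = state_history[-needed:]
    a, b = tail[0], tail[1]
    if a == b:
        return False  # plain repetition, handled by the identical-state check
    return all(state == (a if i % 2 == 0 else b) for i, state in enumerate(tail))

# detect_oscillation(["edit", "undo", "edit", "undo"])  -> True
# detect_oscillation(["plan", "search", "summarize"])   -> False
```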
Pattern 4: Context Accumulation Loops
The agent continuously adds to context without pruning, eventually hitting context limits and restarting the cycle:
RAG systems that append retrieved documents without deduplication
Conversation threads that grow unbounded
Multi-step planning that never discards intermediate steps
Cost signature: Input token count grows linearly with each iteration until the context limit is reached, then resets.
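A simple guard for this pattern is to watch the input token count across iterations and force pruning (or abort) when the prompt keeps growing toward the context limit. A minimal sketch; the 128,000-token limit matches GPT-4o in the table above, and the other thresholds are illustrative:

```python
class ContextGrowthGuard:
    """Aborts when the prompt grows on every call and nears the context limit."""

    def __init__(self, context_limit=128_000, warn_fraction=0.8, max_growth_streak=10):
        self.context_limit = context_limit
        self.warn_fraction = warn_fraction
        self.max_growth_streak = max_growth_streak
        self.previous_input_tokens = 0
        self.growth_streak = 0

    def check(self, input_tokens):
        if input_tokens > self.previous_input_tokens:
            self.growth_streak += 1
        else:
            self.growth_streak = 0
        self.previous_input_tokens = input_tokens

        if (input_tokens > self.context_limit * self.warn_fraction
                and self.growth_streak >= self.max_growth_streak):
            # LoopDetectedError is defined in the detection-strategies section below
            raise LoopDetectedError(
                f"Context grew for {self.growth_streak} consecutive calls "
                f"({input_tokens} tokens); prune or summarize before continuing"
            )
```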
Implementing robust loop detection requires a multi-layered approach. No single metric is sufficient.
Layer 1: Iteration Limits
The most basic protection—count total iterations and abort when thresholds are exceeded. Set a hard limit on the number of steps an agent can take per task. This is your first line of defense against runaway processes.
```python
class LoopDetectedError(RuntimeError):
    """Raised when any loop-detection layer trips."""

MAX_ITERATIONS = 50  # Hard stop
current_iteration = 0
while agent.is_running():
    current_iteration += 1
    if current_iteration > MAX_ITERATIONS:
        raise LoopDetectedError("Maximum iterations exceeded")
    agent.step()  # do one unit of work
```
However, simple counters alone are insufficient. A sophisticated agent might perform 49 useful steps and then enter a loop on step 50. You need more nuanced detection.
Layer 2: Token and Cost Budgets
Track cumulative token usage and abort when costs exceed a threshold. This directly addresses the financial risk. Since we have verified pricing data, we can calculate exact costs.
```python
def track_cost(usage, model):
    input_cost = (usage.input_tokens / 1_000_000) * model.input_cost_per_1M
    output_cost = (usage.output_tokens / 1_000_000) * model.output_cost_per_1M
    return input_cost + output_cost

# GPT-4o, output only: 100 tokens/iteration * 10 iterations/minute ≈ $0.90/hour
# Re-sending a 30,000-token context on each call raises that to ≈ $90/hour,
# so a $10 budget lasts ~6.6 minutes
```
Layer 3: Semantic Similarity Checks
Detect when the agent is repeating itself by comparing the semantic meaning of recent actions or outputs. This catches loops that iteration counters miss.
```python
from difflib import SequenceMatcher

def is_semantically_similar(text1, text2, threshold=0.85):
    # Lexical ratio as a cheap stand-in; swap in embedding cosine similarity
    # if you need true semantic comparison
    return SequenceMatcher(None, text1, text2).ratio() > threshold

recent_outputs = []
for output in agent.outputs:
    recent_outputs.append(output.text)
    if len(recent_outputs) > 3:
        recent_outputs.pop(0)  # sliding window of the last three outputs
    if len(recent_outputs) >= 2 and all(
        is_semantically_similar(recent_outputs[i], recent_outputs[i + 1])
        for i in range(len(recent_outputs) - 1)
    ):
        raise LoopDetectedError("Semantic repetition detected")
```
Layer 4: Progress Requirements
Explicitly require the agent to demonstrate progress. This can be implemented through:
State checksums: Hash the current state and compare with previous states
Progress markers: Agent must explicitly state what has changed
Goal distance: Calculate heuristic distance to objective
```python
class ProgressTracker:
    def __init__(self, window=5):
        self.state_history = []
        self.window = window

    def record_state(self, state_hash):
        self.state_history.append(state_hash)
        if len(self.state_history) > self.window:
            self.state_history.pop(0)
        # Check for state repetition, but only once several states have accumulated
        if len(self.state_history) >= 3 and len(set(self.state_history)) == 1:
            raise LoopDetectedError("No state progression")
```
Building a production-ready loop detection system requires integrating these layers into your agent framework. Here’s a complete implementation pattern:
```typescript
interface ModelPricing {
  inputPer1M: number;   // USD per 1M input tokens
  outputPer1M: number;  // USD per 1M output tokens
}

interface LoopDetectionConfig {
  maxIterations: number;
  maxCostUSD: number;
  similarityThreshold: number;
  stateHistorySize: number;
  modelPricing: ModelPricing;
}

class LoopDetectedError extends Error {}
class CostLimitExceededError extends Error {}

class ProductionLoopDetector {
  private iterationCount = 0;
  private cumulativeCost = 0;
  private outputHistory: string[] = [];
  private stateHistory: string[] = [];

  constructor(private config: LoopDetectionConfig) {}

  checkLoop(
    currentOutput: string,
    usage: { input_tokens: number; output_tokens: number },
    stateHash: string
  ): void {
    this.iterationCount++;

    // Layer 1: Iteration counter
    if (this.iterationCount > this.config.maxIterations) {
      throw new LoopDetectedError(
        `Iteration limit exceeded: ${this.iterationCount}`
      );
    }

    // Layer 2: Cost tracking
    const iterationCost = this.calculateCost(usage);
    this.cumulativeCost += iterationCost;
    if (this.cumulativeCost > this.config.maxCostUSD) {
      throw new CostLimitExceededError(
        `Cost limit exceeded: $${this.cumulativeCost.toFixed(2)}`
      );
    }

    // Layer 3: Semantic similarity over a sliding window of recent outputs
    this.outputHistory.push(currentOutput);
    if (this.outputHistory.length > 3) {
      this.outputHistory.shift();
    }
    if (this.isRepetitive(this.outputHistory)) {
      throw new LoopDetectedError("Semantic repetition detected");
    }

    // Layer 4: State progression
    this.stateHistory.push(stateHash);
    if (this.stateHistory.length > this.config.stateHistorySize) {
      this.stateHistory.shift();
    }
    if (this.hasStateLoop(this.stateHistory)) {
      throw new LoopDetectedError("State oscillation detected");
    }
  }

  private calculateCost(usage: { input_tokens: number; output_tokens: number }): number {
    const inputCost = (usage.input_tokens / 1_000_000) * this.config.modelPricing.inputPer1M;
    const outputCost = (usage.output_tokens / 1_000_000) * this.config.modelPricing.outputPer1M;
    return inputCost + outputCost;
  }

  private isRepetitive(outputs: string[]): boolean {
    if (outputs.length < 2) return false;
    for (let i = 0; i < outputs.length - 1; i++) {
      const similarity = this.calculateSimilarity(outputs[i], outputs[i + 1]);
      if (similarity > this.config.similarityThreshold) {
        return true;
      }
    }
    return false;
  }

  // Dice coefficient over word tokens: a cheap lexical proxy for semantic similarity
  private calculateSimilarity(text1: string, text2: string): number {
    const words1 = text1.toLowerCase().split(/\s+/);
    const words2 = text2.toLowerCase().split(/\s+/);
    const shorter = words1.length < words2.length ? words1 : words2;
    const longer = words1.length < words2.length ? words2 : words1;
    if (shorter.length === 0) return 1.0;
    const longerSet = new Set(longer);
    let matches = 0;
    for (const word of shorter) {
      if (longerSet.has(word)) matches++;
    }
    return (2 * matches) / (shorter.length + longer.length);
  }

  private hasStateLoop(history: string[]): boolean {
    if (history.length < 3) return false;
    const uniqueStates = new Set(history);
    return uniqueStates.size === 1;
  }
}

// Usage: output, usage, stateHash, and emergencyShutdown come from your agent runtime
const detector = new ProductionLoopDetector({
  maxIterations: 50,
  maxCostUSD: 10,
  similarityThreshold: 0.85,
  stateHistorySize: 5,
  modelPricing: { inputPer1M: 5.0, outputPer1M: 15.0 } // GPT-4o
});

try {
  detector.checkLoop(output, usage, stateHash);
} catch (error) {
  console.error('Loop detected:', (error as Error).message);
  // Implement graceful degradation
  await emergencyShutdown();
}
```
If you’re using LangChain, CrewAI, or AutoGen, wrap their execution methods:
```python
from langchain.agents import AgentExecutor

class LoopSafeAgentExecutor(AgentExecutor):
    def __init__(self, *args, loop_detector=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Assumes a Python port of the ProductionLoopDetector shown above
        self.loop_detector = loop_detector or ProductionLoopDetector()

    def _call(self, inputs, run_manager=None):
        result = super()._call(inputs, run_manager=run_manager)
        # Check for loops after execution
        self.loop_detector.check_loop(
            current_output=str(result.get("output", "")),
            usage=result.get("usage"),          # token usage, if your callbacks capture it
            state_hash=str(hash(str(result))),  # replace with a real state hash
        )
        return result
```
Even well-designed loop detection can fail. Here are the most common mistakes:
Pitfall: Using only iteration counts or only token budgets.
Why it fails: A loop might stay under iteration limits but burn excessive tokens. Conversely, a legitimate complex task might exceed iteration limits.
Solution: Always use at least two detection layers. For production systems, implement all four.
Pitfall: Setting fixed iteration limits without considering task complexity.
Why it fails: A research agent analyzing 100 sources needs more iterations than a simple Q&A bot.
Solution: Make thresholds configurable per task type. Use dynamic budgets based on estimated complexity.
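One way to implement this is a small budget table keyed by task type, falling back to the most conservative entry; the categories and numbers below are illustrative, not a recommendation:

```python
# Illustrative per-task budgets; tune them from your own traces.
TASK_BUDGETS = {
    "simple_qa":      {"max_iterations": 10,  "max_cost_usd": 0.50},
    "code_debugging": {"max_iterations": 50,  "max_cost_usd": 5.00},
    "deep_research":  {"max_iterations": 200, "max_cost_usd": 25.00},
}

def budget_for(task_type):
    # Unknown task types get the most conservative budget
    return TASK_BUDGETS.get(task_type, TASK_BUDGETS["simple_qa"])
```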
Pitfall: Not accounting for prompt caching when calculating costs.
Why it fails: Cached tokens cost significantly less. A loop that reuses cached context might be affordable, while your detector thinks it's expensive.
Solution: Track cache creation and read tokens separately. Adjust cost calculations accordingly.
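A cache-aware cost function only needs the cache token counts the API already reports (Anthropic's usage object exposes `cache_creation_input_tokens` and `cache_read_input_tokens`). The multipliers below reflect Anthropic's published cache pricing at the time of writing, roughly 1.25x the base input rate for cache writes and 0.1x for cache reads; verify your provider's current rates before relying on them:

```python
def cache_aware_cost(usage, input_per_1m, output_per_1m,
                     cache_write_multiplier=1.25, cache_read_multiplier=0.10):
    """Cost in USD for one call, pricing cached tokens at their discounted rates."""
    per_million = 1_000_000
    return (
        usage.get("input_tokens", 0) / per_million * input_per_1m
        + usage.get("cache_creation_input_tokens", 0) / per_million * input_per_1m * cache_write_multiplier
        + usage.get("cache_read_input_tokens", 0) / per_million * input_per_1m * cache_read_multiplier
        + usage.get("output_tokens", 0) / per_million * output_per_1m
    )

# A loop that mostly re-reads a cached context is ~10x cheaper on input than a
# detector that prices every token at the full rate would estimate.
```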
Pitfall: Treating all repetition as bad.
Why it fails: Some legitimate tasks involve iterative refinement (e.g., code debugging, creative writing).
Solution: Implement progressive warnings before hard stops. Allow manual override for trusted users.
Pitfall: Using naive state hashing that doesn't capture meaningful changes.
Why it fails: The agent might change important state while the hash remains identical.
Solution: Use comprehensive state snapshots including all relevant variables, not just a single hash.
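In practice that means hashing a canonical snapshot of every variable that should change when real progress happens. A minimal sketch; which fields belong in the snapshot depends entirely on your agent, so the ones below are placeholders:

```python
import hashlib
import json

# Placeholder field names; use whatever actually moves when your agent makes progress
PROGRESS_FIELDS = ("open_tasks", "files_modified", "last_tool_result", "plan_step")

def state_fingerprint(agent_state, fields=PROGRESS_FIELDS):
    """Hash a canonical snapshot of the variables that should change as the agent progresses."""
    snapshot = {field: agent_state.get(field) for field in fields}
    canonical = json.dumps(snapshot, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode()).hexdigest()
```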
When a loop is detected, immediate action is required to prevent cost escalation.
1. Stop the agent: Immediately terminate the process
2. Log the incident: Capture full state for analysis
3. Alert operators: Notify relevant teams
4. Calculate damage: Determine costs incurred
5. Implement temporary fix: Deploy emergency guardrails
```typescript
// agent, saveIncidentReport, and the alerting helpers below are your own integration points
async function handleLoopDetection(error: LoopDetectedError): Promise<void> {
  // 1. Immediate termination
  await agent.emergencyStop();

  // 2. Preserve forensic data
  await saveIncidentReport({
    timestamp: new Date().toISOString(),
    metadata: error.metadata,
    agentState: agent.getState(),
    recentMessages: agent.messageHistory.slice(-10)
  });

  // 3. Alert operators
  await Promise.all([
    sendSlackAlert(`🚨 LOOP DETECTED: ${error.message}`),
    createPagerDutyIncident(error.metadata),
    logToDatadog('agent.loop.detected', error.metadata)
  ]);

  // 4. Calculate damage
  const cost = calculateCost(error.metadata);
  await sendExecutiveAlert(`High-cost loop: $${cost.toFixed(2)}`);

  // 5. Implement temporary fix
  await updateEmergencyConfig({
    maxIterations: Math.max(10, error.metadata.iterationCount - 5),
    maxCostUSD: Math.max(5, cost * 0.5)
  });
}
```
Effective loop detection requires continuous monitoring. Set up dashboards that track:
Iteration rate: Iterations per minute per agent
Cost accumulation: Real-time cost tracking
Semantic similarity: Average similarity scores
State progression: Unique states per hour
Loop detection events: Frequency and severity
| Metric | Warning | Critical | Emergency |
|---|---|---|---|
| Iterations/hour | 100 | 500 | 1,000 |
| Cost/hour | $10 | $50 | $100 |
| Avg similarity | 0.75 | 0.85 | 0.95 |
| State repetition | 3 | 5 | 10 |
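These thresholds are straightforward to encode in the monitoring layer. A minimal sketch that maps a live metric to a severity level, mirroring the table above:

```python
# (warning, critical, emergency) thresholds from the table above
ALERT_THRESHOLDS = {
    "iterations_per_hour": (100, 500, 1000),
    "cost_per_hour_usd":   (10, 50, 100),
    "avg_similarity":      (0.75, 0.85, 0.95),
    "state_repetition":    (3, 5, 10),
}

def severity(metric, value):
    warning, critical, emergency = ALERT_THRESHOLDS[metric]
    if value >= emergency:
        return "emergency"
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"

# severity("cost_per_hour_usd", 62) -> "critical"
```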
Use ML models to predict loops before they occur by analyzing:
Prompt patterns that historically lead to loops
Tool call sequences that indicate confusion
Token usage trends that suggest inefficiency
Dynamically adjust detection thresholds based on:
Task complexity scores
Historical performance of similar tasks
Current system load and cost constraints
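A simple version scales the base budgets by a complexity estimate and clamps the result against a global cost ceiling. A sketch under those assumptions; the 1x-4x scaling range is arbitrary, and the complexity score is something you would derive from task metadata or historical runs:

```python
def adaptive_budget(base_iterations, base_cost_usd, complexity_score, cost_ceiling_usd):
    """Scale budgets by a 0-1 complexity score, never exceeding the global ceiling."""
    scale = 1 + 3 * max(0.0, min(1.0, complexity_score))  # 1x for trivial tasks, 4x for complex ones
    return {
        "max_iterations": int(base_iterations * scale),
        "max_cost_usd": min(base_cost_usd * scale, cost_ceiling_usd),
    }

# adaptive_budget(50, 5.0, complexity_score=0.8, cost_ceiling_usd=15.0)
# -> {"max_iterations": 170, "max_cost_usd": 15.0}
```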
Implement cascading circuit breakers that:
Reduce agent capabilities when costs approach limits
Switch to cheaper models for non-critical steps
Route tasks to human operators when loops are suspected
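A cascading breaker can be as simple as checking the fraction of budget already spent and stepping down capabilities as it climbs. The tiers, cut-over points, and fallback model below are illustrative:

```python
def circuit_breaker_action(cost_spent_usd, cost_budget_usd, loop_suspected=False):
    """Pick an escalation tier based on budget consumed (illustrative cut-over points)."""
    if loop_suspected:
        return {"action": "handoff_to_human"}

    spent_fraction = cost_spent_usd / cost_budget_usd
    if spent_fraction >= 1.0:
        return {"action": "halt"}
    if spent_fraction >= 0.8:
        # Finish non-critical steps on a cheaper model
        return {"action": "downgrade_model", "model": "gpt-4o-mini"}
    if spent_fraction >= 0.5:
        return {"action": "disable_expensive_tools"}
    return {"action": "continue"}
```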
Here is a complete, production-ready loop detection module that you can integrate into any agent framework. It implements all four detection layers and includes emergency shutdown procedures.
```typescript
// loop-detector.ts -- four-layer loop detection with cost tracking and alerts

export interface LoopMetadata {
  iterationCount: number;
  cumulativeCost: number;
  lastOutput: string;
}

export class LoopDetectedError extends Error {
  constructor(message: string, public metadata: LoopMetadata) {
    super(message);
    this.name = 'LoopDetectedError';
  }
}

export class CostLimitExceededError extends Error {
  constructor(message: string, public cumulativeCost: number) {
    super(message);
    this.name = 'CostLimitExceededError';
  }
}

interface ModelPricing {
  inputPer1M: number;  // USD per 1 million tokens
  outputPer1M: number; // USD per 1 million tokens
}

interface LoopDetectorConfig {
  maxIterations: number;
  maxCostUSD: number;
  similarityThreshold: number;
  stateHistorySize: number;
  modelPricing: ModelPricing;
  enableEmergencyAlerts: boolean;
}

export class ProductionLoopDetector {
  private iterationCount = 0;
  private cumulativeCost = 0;
  private outputHistory: string[] = [];
  private stateHistory: string[] = [];
  private lastAlertTime = 0;
  private readonly ALERT_COOLDOWN_MS = 60000; // 1 minute

  constructor(private config: LoopDetectorConfig) {}

  /** Check for loops across all detection layers */
  async checkLoop(
    currentOutput: string,
    usage: { input_tokens: number; output_tokens: number },
    stateHash: string
  ): Promise<void> {
    this.iterationCount++;

    const metadata: LoopMetadata = {
      iterationCount: this.iterationCount,
      cumulativeCost: this.cumulativeCost,
      lastOutput: currentOutput
    };

    // Layer 1: Iteration counter
    if (this.iterationCount > this.config.maxIterations) {
      await this.handleDetection(
        new LoopDetectedError(
          `Iteration limit exceeded: ${this.iterationCount}/${this.config.maxIterations}`,
          metadata
        )
      );
    }

    // Layer 2: Cost tracking
    const iterationCost = this.calculateCost(usage);
    this.cumulativeCost += iterationCost;
    if (this.cumulativeCost > this.config.maxCostUSD) {
      await this.handleDetection(
        new CostLimitExceededError(
          `Cost limit exceeded: ${this.cumulativeCost.toFixed(2)}/${this.config.maxCostUSD}`,
          this.cumulativeCost
        )
      );
    }

    // Layer 3: Semantic similarity
    this.outputHistory.push(currentOutput);
    if (this.outputHistory.length > 3) {
      this.outputHistory.shift();
    }
    if (this.isRepetitive(this.outputHistory)) {
      await this.handleDetection(
        new LoopDetectedError("Semantic repetition detected in recent outputs", metadata)
      );
    }

    // Layer 4: State progression
    this.stateHistory.push(stateHash);
    if (this.stateHistory.length > this.config.stateHistorySize) {
      this.stateHistory.shift();
    }
    if (this.hasStateLoop(this.stateHistory)) {
      await this.handleDetection(
        new LoopDetectedError(
          `State oscillation detected: no progression in last ${this.config.stateHistorySize} states`,
          metadata
        )
      );
    }
  }

  /** Calculate exact cost in USD for a single iteration */
  private calculateCost(usage: { input_tokens: number; output_tokens: number }): number {
    const inputCost = (usage.input_tokens / 1_000_000) * this.config.modelPricing.inputPer1M;
    const outputCost = (usage.output_tokens / 1_000_000) * this.config.modelPricing.outputPer1M;
    return inputCost + outputCost;
  }

  /** Check if recent outputs are semantically similar */
  private isRepetitive(outputs: string[]): boolean {
    if (outputs.length < 2) return false;
    for (let i = 0; i < outputs.length - 1; i++) {
      const similarity = this.calculateSimilarity(outputs[i], outputs[i + 1]);
      if (similarity > this.config.similarityThreshold) {
        return true;
      }
    }
    return false;
  }

  /** Simple Jaccard similarity for text comparison */
  private calculateSimilarity(text1: string, text2: string): number {
    const shorter = text1.length < text2.length ? text1 : text2;
    const longer = text1.length < text2.length ? text2 : text1;
    if (shorter.length === 0) return 1.0;
    const shorterSet = new Set(shorter.toLowerCase().split(/\s+/));
    const longerSet = new Set(longer.toLowerCase().split(/\s+/));
    const intersection = new Set([...shorterSet].filter(x => longerSet.has(x)));
    const union = new Set([...shorterSet, ...longerSet]);
    return intersection.size / union.size;
  }

  /** Check for state repetition */
  private hasStateLoop(history: string[]): boolean {
    if (history.length < 3) return false;
    const uniqueStates = new Set(history);
    return uniqueStates.size === 1;
  }

  /** Handle detection: alert, log, and re-throw for upstream handling */
  private async handleDetection(error: Error): Promise<never> {
    console.error('[LOOP_DETECTION]', {
      timestamp: new Date().toISOString(),
      message: error.message,
      metadata: (error as LoopDetectedError).metadata,
      cumulativeCost: this.cumulativeCost,
      iterations: this.iterationCount
    });

    // Send alerts (with cooldown)
    const now = Date.now();
    if (this.config.enableEmergencyAlerts &&
        (now - this.lastAlertTime) > this.ALERT_COOLDOWN_MS) {
      await this.sendEmergencyAlert(error);
      this.lastAlertTime = now;
    }

    // Re-throw for upstream handling
    throw error;
  }

  /** Send emergency alert to operators */
  private async sendEmergencyAlert(error: Error): Promise<void> {
    // Integrate with your alerting system (PagerDuty, Slack, etc.)
    // Example: await pagerduty.triggerIncident(error.message);
    console.warn('🚨 EMERGENCY ALERT:', error.message);
  }

  /** Get current metrics for monitoring */
  getMetrics() {
    return {
      iterations: this.iterationCount,
      cumulativeCost: this.cumulativeCost,
      costRemaining: this.config.maxCostUSD - this.cumulativeCost,
      iterationsRemaining: this.config.maxIterations - this.iterationCount,
      outputHistoryLength: this.outputHistory.length,
      stateHistoryLength: this.stateHistory.length
    };
  }

  /** Reset detector for new task */
  reset(): void {
    this.iterationCount = 0;
    this.cumulativeCost = 0;
    this.outputHistory = [];
    this.stateHistory = [];
  }
}

// Pre-configured detectors for common models. The iteration and cost limits
// are illustrative defaults -- tune them per task type.
export const detectors = {
  gpt4o: new ProductionLoopDetector({
    maxIterations: 50,
    maxCostUSD: 10,
    similarityThreshold: 0.85,
    stateHistorySize: 5,
    modelPricing: { inputPer1M: 5.0, outputPer1M: 15.0 },
    enableEmergencyAlerts: true
  }),
  gpt4oMini: new ProductionLoopDetector({
    maxIterations: 50,
    maxCostUSD: 10,
    similarityThreshold: 0.85,
    stateHistorySize: 5,
    modelPricing: { inputPer1M: 0.15, outputPer1M: 0.60 },
    enableEmergencyAlerts: true
  }),
  claudeSonnet: new ProductionLoopDetector({
    maxIterations: 50,
    maxCostUSD: 10,
    similarityThreshold: 0.85,
    stateHistorySize: 5,
    modelPricing: { inputPer1M: 3.0, outputPer1M: 15.0 },
    enableEmergencyAlerts: true
  })
};

// Minimal agent contract assumed by the runner below
interface AgentLike {
  isRunning(): boolean;
  step(): Promise<{
    output: string;
    usage: { input_tokens: number; output_tokens: number };
    stateHash: string;
  }>;
  getResult(): unknown;
  emergencyStop(): Promise<void>;
}

// Replace with your own incident-logging integration
async function logIncident(error: Error): Promise<void> {
  console.error('[INCIDENT]', error);
}

export async function runAgentWithLoopDetection(
  agent: AgentLike,
  detector: ProductionLoopDetector,
  maxRuntimeMinutes = 30
) {
  const startTime = Date.now();

  try {
    while (agent.isRunning()) {
      const elapsed = (Date.now() - startTime) / 60000;
      if (elapsed > maxRuntimeMinutes) {
        throw new Error(`Runtime exceeded ${maxRuntimeMinutes} minutes`);
      }

      const result = await agent.step();
      await detector.checkLoop(result.output, result.usage, result.stateHash);
      console.log('Step completed:', detector.getMetrics());
    }
    return agent.getResult();
  } catch (error) {
    if (error instanceof LoopDetectedError) {
      await agent.emergencyStop();
      await logIncident(error);
    }
    throw error;
  }
}
```
Use this checklist when implementing loop detection in production:
Set a hard iteration limit per task and enforce it in code, not just in the prompt
Set a per-task cost budget and track cumulative spend with verified per-token pricing
Account for cache creation and cache read tokens separately when calculating costs
Compare recent outputs over a sliding window to catch semantic repetition
Hash meaningful state snapshots and flag runs that show no state progression
Make thresholds configurable per task type instead of one global limit
Wire loop events into alerting (Slack, PagerDuty) with a cooldown to avoid alert storms
Build dashboards for iteration rate, cost accumulation, similarity scores, and state progression
Define an incident-response runbook: stop, log, alert, tally the damage, tighten limits
Loop detection is not optional for production AI agents—it’s a critical safety and cost control mechanism. The four-layer approach described here provides comprehensive protection against the most common loop patterns while minimizing false positives.
Start with simple iteration counters and cost budgets, then add semantic similarity and state progression checks as your system matures. The production-ready module provided here gives you a solid foundation that can be adapted to any agent framework.
Remember: the goal isn’t perfect loop prevention—it’s early detection and graceful degradation. A loop that’s caught after 5 iterations is far better than one that runs for 500.
The financial and operational risks of uncontrolled loops far outweigh the implementation cost of robust detection. Your future self—and your finance team—will thank you.