A financial services company running autonomous trading agents burned through $12,000 in compute costs during a weekend maintenance window. Their agent entered a retry loop—each failure triggered another attempt with exponential backoff, but the loop detection logic had a flaw: it counted distinct error types, not total iterations. Three days and 47,000 failed API calls later, their bill told the story. Loop detection isn’t just about correctness; it’s about cost protection.
Key Takeaway
Infinite loops in LLM agents are silent budget killers. A loop that produces only 50 tokens of output per iteration can still burn $500+ per hour on a model like GPT-4o once it starts re-sending a near-full context window with every call. Implement multi-layered detection: iteration counts, token budgets, and semantic similarity checks.
Agent loops occur when an LLM-powered system repeatedly executes the same or similar operations without making progress toward a goal. Unlike traditional software loops that are explicit and bounded, agent loops emerge from the non-deterministic nature of LLM reasoning and tool usage. The financial impact is immediate and severe.
Consider the cost structure of modern LLMs. Using the pricing data from our research:
| Model | Input Cost | Output Cost | Context Window | Source |
|---|---|---|---|---|
| Claude 3.5 Sonnet | $3.00/1M tokens | $15.00/1M tokens | 200,000 tokens | Anthropic |
| GPT-4o | $5.00/1M tokens | $15.00/1M tokens | 128,000 tokens | OpenAI |
| GPT-4o-mini | $0.15/1M tokens | $0.60/1M tokens | 128,000 tokens | OpenAI |
| Claude 3.5 Haiku | $1.25/1M tokens | $5.00/1M tokens | 200,000 tokens | Anthropic |
Pricing data verified November-December 2024
A loop that generates just 100 output tokens per iteration and runs 10 iterations per minute costs, counting output tokens alone:
GPT-4o: $0.015/minute ≈ $0.90/hour
Claude 3.5 Sonnet: $0.015/minute ≈ $0.90/hour
GPT-4o-mini: $0.0006/minute ≈ $0.04/hour
That looks tolerable until you add input tokens: most agent loops re-send the full conversation context on every call, and input quickly dominates. At 1,000 iterations per hour (common in automated research agents) with a 30,000-token context re-sent on each call, GPT-4o costs roughly $150/hour in input tokens alone. Scale that to a multi-agent system with 10 concurrent agents, and you're looking at $1,500/hour in runaway costs.
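To make the arithmetic concrete, here is a minimal sketch using the GPT-4o list prices from the table above; the `hourly_loop_cost` helper and the 30,000-token context size are illustrative assumptions, not measurements:

```python
INPUT_PER_1M = 5.00    # GPT-4o, USD per 1M input tokens (see table above)
OUTPUT_PER_1M = 15.00  # GPT-4o, USD per 1M output tokens

def hourly_loop_cost(context_tokens, output_tokens, iterations_per_hour):
    """Cost per hour of a loop that re-sends `context_tokens` of input and
    emits `output_tokens` on every iteration (both assumed constant)."""
    per_call = (
        (context_tokens / 1_000_000) * INPUT_PER_1M
        + (output_tokens / 1_000_000) * OUTPUT_PER_1M
    )
    return per_call * iterations_per_hour

print(hourly_loop_cost(0, 100, 600))        # output only: ~$0.90/hour
print(hourly_loop_cost(30_000, 100, 1_000)) # 30K-token context re-sent: ~$151.50/hour
```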
Beyond direct costs, loops degrade user experience, consume rate limits, and can trigger cascading failures across your infrastructure. A single undetected loop can exhaust your API quota for the entire organization, blocking all other applications.
Agent loops manifest in several distinct patterns. Recognizing these patterns is the first step in implementing effective detection.
Pattern 1: Identical Retry Loops
The agent calls a tool, receives an error, and attempts the same call again without modifying the parameters. This often happens when:
Tool parameters are invalid but the error message doesn’t provide corrective guidance
The agent misinterprets the error and retries with identical inputs
Network timeouts trigger automatic retries without exponential backoff caps
Cost signature: High API call volume with identical or near-identical tool parameters. Low token variation between iterations.
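One cheap way to catch this pattern is to fingerprint each tool call and count consecutive repeats. The sketch below is a hypothetical guard (the `RetryLoopGuard` name and its threshold are not from any framework); it reuses the `LoopDetectedError` exception defined in the detection-strategies section below:

```python
import hashlib
import json

class RetryLoopGuard:
    """Flags tool calls repeated with identical parameters (hypothetical helper)."""

    def __init__(self, max_identical_calls=3):
        self.max_identical_calls = max_identical_calls
        self.last_fingerprint = None
        self.repeat_count = 0

    def check(self, tool_name, tool_args):
        # Canonicalize the call so dict ordering can't hide an exact repeat
        payload = json.dumps({"tool": tool_name, "args": tool_args},
                             sort_keys=True, default=str)
        fingerprint = hashlib.sha256(payload.encode()).hexdigest()

        if fingerprint == self.last_fingerprint:
            self.repeat_count += 1
        else:
            self.last_fingerprint = fingerprint
            self.repeat_count = 1

        if self.repeat_count >= self.max_identical_calls:
            # LoopDetectedError is defined in the detection-strategies section below
            raise LoopDetectedError(
                f"{tool_name} called {self.repeat_count} times with identical parameters"
            )
```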
Pattern 2: Reasoning Loops
The agent reaches the same reasoning conclusion repeatedly, often due to:
Insufficient context causing the agent to “forget” previous attempts
Prompt structure that doesn’t include progress tracking
Conflicting instructions in system prompts
Cost signature: High token usage with similar output structure. The agent might rephrase the same conclusion multiple times.
Pattern 3: Oscillation Loops
The agent alternates between two or more states without progressing:
Action → Undo → Action → Undo
Query → Summarize → Query → Summarize
Cost signature: Periodic spikes in token usage. The agent appears productive but makes no forward progress.
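The "all states identical" check used later in this article will not catch an A → B → A → B cycle, so it helps to test for alternation explicitly. A minimal sketch, assuming each state can be reduced to a hashable value:

```python
def detect_oscillation(state_history, min_cycles=2):
    """True if the tail of state_history alternates between two distinct states
    (A, B, A, B, ...), which an 'all states identical' check misses."""
    needed = 2 * min_cycles
    if len(state_history) < needed:
        return False
    tail = state_history[-needed:]
    a, b = tail[0], tail[1]
    if a == b:
        return False  # plain repetition, handled by the identical-state check
    return all(state == (a if i % 2 == 0 else b) for i, state in enumerate(tail))

# detect_oscillation(["edit", "undo", "edit", "undo"])  -> True
# detect_oscillation(["plan", "search", "summarize"])   -> False
```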
Pattern 4: Context Accumulation Loops
The agent continuously adds to context without pruning, eventually hitting context limits and restarting the cycle:
RAG systems that append retrieved documents without deduplication
Conversation threads that grow unbounded
Multi-step planning that never discards intermediate steps
Cost signature: Input token count grows linearly with each iteration until the context limit is reached, then resets.
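A simple guard for this pattern is to watch the input token count across iterations and force pruning (or abort) when the prompt keeps growing toward the context limit. A minimal sketch; the 128,000-token limit matches GPT-4o in the table above, and the other thresholds are illustrative:

```python
class ContextGrowthGuard:
    """Aborts when the prompt grows on every call and nears the context limit."""

    def __init__(self, context_limit=128_000, warn_fraction=0.8, max_growth_streak=10):
        self.context_limit = context_limit
        self.warn_fraction = warn_fraction
        self.max_growth_streak = max_growth_streak
        self.previous_input_tokens = 0
        self.growth_streak = 0

    def check(self, input_tokens):
        if input_tokens > self.previous_input_tokens:
            self.growth_streak += 1
        else:
            self.growth_streak = 0
        self.previous_input_tokens = input_tokens

        if (input_tokens > self.context_limit * self.warn_fraction
                and self.growth_streak >= self.max_growth_streak):
            # LoopDetectedError is defined in the detection-strategies section below
            raise LoopDetectedError(
                f"Context grew for {self.growth_streak} consecutive calls "
                f"({input_tokens} tokens); prune or summarize before continuing"
            )
```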
Implementing robust loop detection requires a multi-layered approach. No single metric is sufficient.
Layer 1: Iteration Limits
The most basic protection—count total iterations and abort when thresholds are exceeded. Set a hard limit on the number of steps an agent can take per task. This is your first line of defense against runaway processes.
```python
class LoopDetectedError(RuntimeError):
    """Raised when any loop-detection layer trips."""

MAX_ITERATIONS = 50  # Hard stop
current_iteration = 0
while agent.is_running():
    current_iteration += 1
    if current_iteration > MAX_ITERATIONS:
        raise LoopDetectedError("Maximum iterations exceeded")
    agent.step()  # do one unit of work
```
However, simple counters alone are insufficient. A sophisticated agent might perform 49 useful steps and then enter a loop on step 50. You need more nuanced detection.
Layer 2: Token and Cost Budgets
Track cumulative token usage and abort when costs exceed a threshold. This directly addresses the financial risk. Since we have verified pricing data, we can calculate exact costs.
```python
def track_cost(usage, model):
    input_cost = (usage.input_tokens / 1_000_000) * model.input_cost_per_1M
    output_cost = (usage.output_tokens / 1_000_000) * model.output_cost_per_1M
    return input_cost + output_cost

# GPT-4o, output only: 100 tokens/iteration * 10 iterations/minute ≈ $0.90/hour
# Re-sending a 30,000-token context on each call raises that to ≈ $90/hour,
# so a $10 budget lasts ~6.6 minutes
```
Layer 3: Semantic Similarity Checks
Detect when the agent is repeating itself by comparing the semantic meaning of recent actions or outputs. This catches loops that iteration counters miss.
```python
from difflib import SequenceMatcher

def is_semantically_similar(text1, text2, threshold=0.85):
    # Lexical ratio as a cheap stand-in; swap in embedding cosine similarity
    # if you need true semantic comparison
    return SequenceMatcher(None, text1, text2).ratio() > threshold

recent_outputs = []
for output in agent.outputs:
    recent_outputs.append(output.text)
    if len(recent_outputs) > 3:
        recent_outputs.pop(0)  # sliding window of the last three outputs
    if len(recent_outputs) >= 2 and all(
        is_semantically_similar(recent_outputs[i], recent_outputs[i + 1])
        for i in range(len(recent_outputs) - 1)
    ):
        raise LoopDetectedError("Semantic repetition detected")
```
Layer 4: Progress Requirements
Explicitly require the agent to demonstrate progress. This can be implemented through:
State checksums: Hash the current state and compare with previous states
Progress markers: Agent must explicitly state what has changed
Goal distance: Calculate heuristic distance to objective
```python
class ProgressTracker:
    def __init__(self, window=5):
        self.state_history = []
        self.window = window

    def record_state(self, state_hash):
        self.state_history.append(state_hash)
        if len(self.state_history) > self.window:
            self.state_history.pop(0)
        # Check for state repetition, but only once several states have accumulated
        if len(self.state_history) >= 3 and len(set(self.state_history)) == 1:
            raise LoopDetectedError("No state progression")
```
Building a production-ready loop detection system requires integrating these layers into your agent framework. Here’s a complete implementation pattern:
```typescript
interface ModelPricing {
  inputPer1M: number;   // USD per 1M input tokens
  outputPer1M: number;  // USD per 1M output tokens
}

interface LoopDetectionConfig {
  maxIterations: number;
  maxCostUSD: number;
  similarityThreshold: number;
  stateHistorySize: number;
  modelPricing: ModelPricing;
}

class LoopDetectedError extends Error {}
class CostLimitExceededError extends Error {}

class ProductionLoopDetector {
  private iterationCount = 0;
  private cumulativeCost = 0;
  private outputHistory: string[] = [];
  private stateHistory: string[] = [];

  constructor(private config: LoopDetectionConfig) {}

  checkLoop(
    currentOutput: string,
    usage: { input_tokens: number; output_tokens: number },
    stateHash: string
  ): void {
    this.iterationCount++;

    // Layer 1: Iteration counter
    if (this.iterationCount > this.config.maxIterations) {
      throw new LoopDetectedError(
        `Iteration limit exceeded: ${this.iterationCount}`
      );
    }

    // Layer 2: Cost tracking
    const iterationCost = this.calculateCost(usage);
    this.cumulativeCost += iterationCost;
    if (this.cumulativeCost > this.config.maxCostUSD) {
      throw new CostLimitExceededError(
        `Cost limit exceeded: $${this.cumulativeCost.toFixed(2)}`
      );
    }

    // Layer 3: Semantic similarity over a sliding window of recent outputs
    this.outputHistory.push(currentOutput);
    if (this.outputHistory.length > 3) {
      this.outputHistory.shift();
    }
    if (this.isRepetitive(this.outputHistory)) {
      throw new LoopDetectedError("Semantic repetition detected");
    }

    // Layer 4: State progression
    this.stateHistory.push(stateHash);
    if (this.stateHistory.length > this.config.stateHistorySize) {
      this.stateHistory.shift();
    }
    if (this.hasStateLoop(this.stateHistory)) {
      throw new LoopDetectedError("State oscillation detected");
    }
  }

  private calculateCost(usage: { input_tokens: number; output_tokens: number }): number {
    const inputCost = (usage.input_tokens / 1_000_000) * this.config.modelPricing.inputPer1M;
    const outputCost = (usage.output_tokens / 1_000_000) * this.config.modelPricing.outputPer1M;
    return inputCost + outputCost;
  }

  private isRepetitive(outputs: string[]): boolean {
    if (outputs.length < 2) return false;
    for (let i = 0; i < outputs.length - 1; i++) {
      const similarity = this.calculateSimilarity(outputs[i], outputs[i + 1]);
      if (similarity > this.config.similarityThreshold) {
        return true;
      }
    }
    return false;
  }

  // Dice coefficient over word tokens: a cheap lexical proxy for semantic similarity
  private calculateSimilarity(text1: string, text2: string): number {
    const words1 = text1.toLowerCase().split(/\s+/);
    const words2 = text2.toLowerCase().split(/\s+/);
    const shorter = words1.length < words2.length ? words1 : words2;
    const longer = words1.length < words2.length ? words2 : words1;
    if (shorter.length === 0) return 1.0;
    const longerSet = new Set(longer);
    let matches = 0;
    for (const word of shorter) {
      if (longerSet.has(word)) matches++;
    }
    return (2 * matches) / (shorter.length + longer.length);
  }

  private hasStateLoop(history: string[]): boolean {
    if (history.length < 3) return false;
    const uniqueStates = new Set(history);
    return uniqueStates.size === 1;
  }
}

// Usage: output, usage, stateHash, and emergencyShutdown come from your agent runtime
const detector = new ProductionLoopDetector({
  maxIterations: 50,
  maxCostUSD: 10,
  similarityThreshold: 0.85,
  stateHistorySize: 5,
  modelPricing: { inputPer1M: 5.0, outputPer1M: 15.0 } // GPT-4o
});

try {
  detector.checkLoop(output, usage, stateHash);
} catch (error) {
  console.error('Loop detected:', (error as Error).message);
  // Implement graceful degradation
  await emergencyShutdown();
}
```
If you’re using LangChain, CrewAI, or AutoGen, wrap their execution methods:
```python
from langchain.agents import AgentExecutor

class LoopSafeAgentExecutor(AgentExecutor):
    def __init__(self, *args, loop_detector=None, **kwargs):
        super().__init__(*args, **kwargs)
        # Assumes a Python port of the ProductionLoopDetector shown above
        self.loop_detector = loop_detector or ProductionLoopDetector()

    def _call(self, inputs, run_manager=None):
        result = super()._call(inputs, run_manager=run_manager)
        # Check for loops after execution
        self.loop_detector.check_loop(
            current_output=str(result.get("output", "")),
            usage=result.get("usage"),          # token usage, if your callbacks capture it
            state_hash=str(hash(str(result))),  # replace with a real state hash
        )
        return result
```
Even well-designed loop detection can fail. Here are the most common mistakes:
Pitfall: Using only iteration counts or only token budgets.
Why it fails: A loop might stay under iteration limits but burn excessive tokens. Conversely, a legitimate complex task might exceed iteration limits.
Solution: Always use at least two detection layers. For production systems, implement all four.
Pitfall: Setting fixed iteration limits without considering task complexity.
Why it fails: A research agent analyzing 100 sources needs more iterations than a simple Q&A bot.
Solution: Make thresholds configurable per task type. Use dynamic budgets based on estimated complexity.
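One way to implement this is a small budget table keyed by task type, falling back to the most conservative entry; the categories and numbers below are illustrative, not a recommendation:

```python
# Illustrative per-task budgets; tune them from your own traces.
TASK_BUDGETS = {
    "simple_qa":      {"max_iterations": 10,  "max_cost_usd": 0.50},
    "code_debugging": {"max_iterations": 50,  "max_cost_usd": 5.00},
    "deep_research":  {"max_iterations": 200, "max_cost_usd": 25.00},
}

def budget_for(task_type):
    # Unknown task types get the most conservative budget
    return TASK_BUDGETS.get(task_type, TASK_BUDGETS["simple_qa"])
```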
Pitfall: Not accounting for prompt caching when calculating costs.
Why it fails: Cached tokens cost significantly less. A loop that reuses cached context might be affordable, while your detector thinks it's expensive.
Solution: Track cache creation and read tokens separately. Adjust cost calculations accordingly.
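A cache-aware cost function only needs the cache token counts the API already reports (Anthropic's usage object exposes `cache_creation_input_tokens` and `cache_read_input_tokens`). The multipliers below reflect Anthropic's published cache pricing at the time of writing, roughly 1.25x the base input rate for cache writes and 0.1x for cache reads; verify your provider's current rates before relying on them:

```python
def cache_aware_cost(usage, input_per_1m, output_per_1m,
                     cache_write_multiplier=1.25, cache_read_multiplier=0.10):
    """Cost in USD for one call, pricing cached tokens at their discounted rates."""
    per_million = 1_000_000
    return (
        usage.get("input_tokens", 0) / per_million * input_per_1m
        + usage.get("cache_creation_input_tokens", 0) / per_million * input_per_1m * cache_write_multiplier
        + usage.get("cache_read_input_tokens", 0) / per_million * input_per_1m * cache_read_multiplier
        + usage.get("output_tokens", 0) / per_million * output_per_1m
    )

# A loop that mostly re-reads a cached context is ~10x cheaper on input than a
# detector that prices every token at the full rate would estimate.
```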
Pitfall: Treating all repetition as bad.
Why it fails: Some legitimate tasks involve iterative refinement (e.g., code debugging, creative writing).
Solution: Implement progressive warnings before hard stops. Allow manual override for trusted users.
Pitfall: Using naive state hashing that doesn't capture meaningful changes.
Why it fails: The agent might change important state while the hash remains identical.
Solution: Use comprehensive state snapshots including all relevant variables, not just a single hash.
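In practice that means hashing a canonical snapshot of every variable that should change when real progress happens. A minimal sketch; which fields belong in the snapshot depends entirely on your agent, so the ones below are placeholders:

```python
import hashlib
import json

# Placeholder field names; use whatever actually moves when your agent makes progress
PROGRESS_FIELDS = ("open_tasks", "files_modified", "last_tool_result", "plan_step")

def state_fingerprint(agent_state, fields=PROGRESS_FIELDS):
    """Hash a canonical snapshot of the variables that should change as the agent progresses."""
    snapshot = {field: agent_state.get(field) for field in fields}
    canonical = json.dumps(snapshot, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode()).hexdigest()
```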
When a loop is detected, immediate action is required to prevent cost escalation.
1. Stop the agent: Immediately terminate the process
2. Log the incident: Capture full state for analysis
3. Alert operators: Notify relevant teams
4. Calculate damage: Determine costs incurred
5. Implement temporary fix: Deploy emergency guardrails
```typescript
// agent, saveIncidentReport, and the alerting helpers below are your own integration points
async function handleLoopDetection(error: LoopDetectedError): Promise<void> {
  // 1. Immediate termination
  await agent.emergencyStop();

  // 2. Preserve forensic data
  await saveIncidentReport({
    timestamp: new Date().toISOString(),
    metadata: error.metadata,
    agentState: agent.getState(),
    recentMessages: agent.messageHistory.slice(-10)
  });

  // 3. Alert operators
  await Promise.all([
    sendSlackAlert(`🚨 LOOP DETECTED: ${error.message}`),
    createPagerDutyIncident(error.metadata),
    logToDatadog('agent.loop.detected', error.metadata)
  ]);

  // 4. Calculate damage
  const cost = calculateCost(error.metadata);
  await sendExecutiveAlert(`High-cost loop: $${cost.toFixed(2)}`);

  // 5. Implement temporary fix
  await updateEmergencyConfig({
    maxIterations: Math.max(10, error.metadata.iterationCount - 5),
    maxCostUSD: Math.max(5, cost * 0.5)
  });
}
```
Effective loop detection requires continuous monitoring. Set up dashboards that track:
Iteration rate: Iterations per minute per agent
Cost accumulation: Real-time cost tracking
Semantic similarity: Average similarity scores
State progression: Unique states per hour
Loop detection events: Frequency and severity
| Metric | Warning | Critical | Emergency |
|---|---|---|---|
| Iterations/hour | 100 | 500 | 1,000 |
| Cost/hour | $10 | $50 | $100 |
| Avg similarity | 0.75 | 0.85 | 0.95 |
| State repetition | 3 | 5 | 10 |
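These thresholds are straightforward to encode in the monitoring layer. A minimal sketch that maps a live metric to a severity level, mirroring the table above:

```python
# (warning, critical, emergency) thresholds from the table above
ALERT_THRESHOLDS = {
    "iterations_per_hour": (100, 500, 1000),
    "cost_per_hour_usd":   (10, 50, 100),
    "avg_similarity":      (0.75, 0.85, 0.95),
    "state_repetition":    (3, 5, 10),
}

def severity(metric, value):
    warning, critical, emergency = ALERT_THRESHOLDS[metric]
    if value >= emergency:
        return "emergency"
    if value >= critical:
        return "critical"
    if value >= warning:
        return "warning"
    return "ok"

# severity("cost_per_hour_usd", 62) -> "critical"
```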
Use ML models to predict loops before they occur by analyzing:
Prompt patterns that historically lead to loops
Tool call sequences that indicate confusion
Token usage trends that suggest inefficiency
Dynamically adjust detection thresholds based on:
Task complexity scores
Historical performance of similar tasks
Current system load and cost constraints
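A simple version scales the base budgets by a complexity estimate and clamps the result against a global cost ceiling. A sketch under those assumptions; the 1x-4x scaling range is arbitrary, and the complexity score is something you would derive from task metadata or historical runs:

```python
def adaptive_budget(base_iterations, base_cost_usd, complexity_score, cost_ceiling_usd):
    """Scale budgets by a 0-1 complexity score, never exceeding the global ceiling."""
    scale = 1 + 3 * max(0.0, min(1.0, complexity_score))  # 1x for trivial tasks, 4x for complex ones
    return {
        "max_iterations": int(base_iterations * scale),
        "max_cost_usd": min(base_cost_usd * scale, cost_ceiling_usd),
    }

# adaptive_budget(50, 5.0, complexity_score=0.8, cost_ceiling_usd=15.0)
# -> {"max_iterations": 170, "max_cost_usd": 15.0}
```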
Implement cascading circuit breakers that:
Reduce agent capabilities when costs approach limits
Switch to cheaper models for non-critical steps
Route tasks to human operators when loops are suspected
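A cascading breaker can be as simple as checking the fraction of budget already spent and stepping down capabilities as it climbs. The tiers, cut-over points, and fallback model below are illustrative:

```python
def circuit_breaker_action(cost_spent_usd, cost_budget_usd, loop_suspected=False):
    """Pick an escalation tier based on budget consumed (illustrative cut-over points)."""
    if loop_suspected:
        return {"action": "handoff_to_human"}

    spent_fraction = cost_spent_usd / cost_budget_usd
    if spent_fraction >= 1.0:
        return {"action": "halt"}
    if spent_fraction >= 0.8:
        # Finish non-critical steps on a cheaper model
        return {"action": "downgrade_model", "model": "gpt-4o-mini"}
    if spent_fraction >= 0.5:
        return {"action": "disable_expensive_tools"}
    return {"action": "continue"}
```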
Here is a complete, production-ready loop detection module that you can integrate into any agent framework. It implements all four detection layers and includes emergency shutdown procedures.
```typescript
// loop-detector.ts -- four-layer loop detection with cost tracking and alerts

export interface LoopMetadata {
  iterationCount: number;
  cumulativeCost: number;
  lastOutput: string;
}

export class LoopDetectedError extends Error {
  constructor(message: string, public metadata: LoopMetadata) {
    super(message);
    this.name = 'LoopDetectedError';
  }
}

export class CostLimitExceededError extends Error {
  constructor(message: string, public cumulativeCost: number) {
    super(message);
    this.name = 'CostLimitExceededError';
  }
}

interface ModelPricing {
  inputPer1M: number;  // USD per 1 million tokens
  outputPer1M: number; // USD per 1 million tokens
}

interface LoopDetectorConfig {
  maxIterations: number;
  maxCostUSD: number;
  similarityThreshold: number;
  stateHistorySize: number;
  modelPricing: ModelPricing;
  enableEmergencyAlerts: boolean;
}

export class ProductionLoopDetector {
  private iterationCount = 0;
  private cumulativeCost = 0;
  private outputHistory: string[] = [];
  private stateHistory: string[] = [];
  private lastAlertTime = 0;
  private readonly ALERT_COOLDOWN_MS = 60000; // 1 minute

  constructor(private config: LoopDetectorConfig) {}

  /** Check for loops across all detection layers */
  async checkLoop(
    currentOutput: string,
    usage: { input_tokens: number; output_tokens: number },
    stateHash: string
  ): Promise<void> {
    this.iterationCount++;

    const metadata: LoopMetadata = {
      iterationCount: this.iterationCount,
      cumulativeCost: this.cumulativeCost,
      lastOutput: currentOutput
    };

    // Layer 1: Iteration counter
    if (this.iterationCount > this.config.maxIterations) {
      await this.handleDetection(
        new LoopDetectedError(
          `Iteration limit exceeded: ${this.iterationCount}/${this.config.maxIterations}`,
          metadata
        )
      );
    }

    // Layer 2: Cost tracking
    const iterationCost = this.calculateCost(usage);
    this.cumulativeCost += iterationCost;
    if (this.cumulativeCost > this.config.maxCostUSD) {
      await this.handleDetection(
        new CostLimitExceededError(
          `Cost limit exceeded: ${this.cumulativeCost.toFixed(2)}/${this.config.maxCostUSD}`,
          this.cumulativeCost
        )
      );
    }

    // Layer 3: Semantic similarity
    this.outputHistory.push(currentOutput);
    if (this.outputHistory.length > 3) {
      this.outputHistory.shift();
    }
    if (this.isRepetitive(this.outputHistory)) {
      await this.handleDetection(
        new LoopDetectedError("Semantic repetition detected in recent outputs", metadata)
      );
    }

    // Layer 4: State progression
    this.stateHistory.push(stateHash);
    if (this.stateHistory.length > this.config.stateHistorySize) {
      this.stateHistory.shift();
    }
    if (this.hasStateLoop(this.stateHistory)) {
      await this.handleDetection(
        new LoopDetectedError(
          `State oscillation detected: no progression in last ${this.config.stateHistorySize} states`,
          metadata
        )
      );
    }
  }

  /** Calculate exact cost in USD for a single iteration */
  private calculateCost(usage: { input_tokens: number; output_tokens: number }): number {
    const inputCost = (usage.input_tokens / 1_000_000) * this.config.modelPricing.inputPer1M;
    const outputCost = (usage.output_tokens / 1_000_000) * this.config.modelPricing.outputPer1M;
    return inputCost + outputCost;
  }

  /** Check if recent outputs are semantically similar */
  private isRepetitive(outputs: string[]): boolean {
    if (outputs.length < 2) return false;
    for (let i = 0; i < outputs.length - 1; i++) {
      const similarity = this.calculateSimilarity(outputs[i], outputs[i + 1]);
      if (similarity > this.config.similarityThreshold) {
        return true;
      }
    }
    return false;
  }

  /** Simple Jaccard similarity for text comparison */
  private calculateSimilarity(text1: string, text2: string): number {
    const shorter = text1.length < text2.length ? text1 : text2;
    const longer = text1.length < text2.length ? text2 : text1;
    if (shorter.length === 0) return 1.0;
    const shorterSet = new Set(shorter.toLowerCase().split(/\s+/));
    const longerSet = new Set(longer.toLowerCase().split(/\s+/));
    const intersection = new Set([...shorterSet].filter(x => longerSet.has(x)));
    const union = new Set([...shorterSet, ...longerSet]);
    return intersection.size / union.size;
  }

  /** Check for state repetition */
  private hasStateLoop(history: string[]): boolean {
    if (history.length < 3) return false;
    const uniqueStates = new Set(history);
    return uniqueStates.size === 1;
  }

  /** Handle detection: alert, log, and re-throw for upstream handling */
  private async handleDetection(error: Error): Promise<never> {
    console.error('[LOOP_DETECTION]', {
      timestamp: new Date().toISOString(),
      message: error.message,
      metadata: (error as LoopDetectedError).metadata,
      cumulativeCost: this.cumulativeCost,
      iterations: this.iterationCount
    });

    // Send alerts (with cooldown)
    const now = Date.now();
    if (this.config.enableEmergencyAlerts &&
        (now - this.lastAlertTime) > this.ALERT_COOLDOWN_MS) {
      await this.sendEmergencyAlert(error);
      this.lastAlertTime = now;
    }

    // Re-throw for upstream handling
    throw error;
  }

  /** Send emergency alert to operators */
  private async sendEmergencyAlert(error: Error): Promise<void> {
    // Integrate with your alerting system (PagerDuty, Slack, etc.)
    // Example: await pagerduty.triggerIncident(error.message);
    console.warn('🚨 EMERGENCY ALERT:', error.message);
  }

  /** Get current metrics for monitoring */
  getMetrics() {
    return {
      iterations: this.iterationCount,
      cumulativeCost: this.cumulativeCost,
      costRemaining: this.config.maxCostUSD - this.cumulativeCost,
      iterationsRemaining: this.config.maxIterations - this.iterationCount,
      outputHistoryLength: this.outputHistory.length,
      stateHistoryLength: this.stateHistory.length
    };
  }

  /** Reset detector for new task */
  reset(): void {
    this.iterationCount = 0;
    this.cumulativeCost = 0;
    this.outputHistory = [];
    this.stateHistory = [];
  }
}

// Pre-configured detectors for common models. The iteration and cost limits
// are illustrative defaults -- tune them per task type.
export const detectors = {
  gpt4o: new ProductionLoopDetector({
    maxIterations: 50,
    maxCostUSD: 10,
    similarityThreshold: 0.85,
    stateHistorySize: 5,
    modelPricing: { inputPer1M: 5.0, outputPer1M: 15.0 },
    enableEmergencyAlerts: true
  }),
  gpt4oMini: new ProductionLoopDetector({
    maxIterations: 50,
    maxCostUSD: 10,
    similarityThreshold: 0.85,
    stateHistorySize: 5,
    modelPricing: { inputPer1M: 0.15, outputPer1M: 0.60 },
    enableEmergencyAlerts: true
  }),
  claudeSonnet: new ProductionLoopDetector({
    maxIterations: 50,
    maxCostUSD: 10,
    similarityThreshold: 0.85,
    stateHistorySize: 5,
    modelPricing: { inputPer1M: 3.0, outputPer1M: 15.0 },
    enableEmergencyAlerts: true
  })
};

// Minimal agent contract assumed by the runner below
interface AgentLike {
  isRunning(): boolean;
  step(): Promise<{
    output: string;
    usage: { input_tokens: number; output_tokens: number };
    stateHash: string;
  }>;
  getResult(): unknown;
  emergencyStop(): Promise<void>;
}

// Replace with your own incident-logging integration
async function logIncident(error: Error): Promise<void> {
  console.error('[INCIDENT]', error);
}

export async function runAgentWithLoopDetection(
  agent: AgentLike,
  detector: ProductionLoopDetector,
  maxRuntimeMinutes = 30
) {
  const startTime = Date.now();

  try {
    while (agent.isRunning()) {
      const elapsed = (Date.now() - startTime) / 60000;
      if (elapsed > maxRuntimeMinutes) {
        throw new Error(`Runtime exceeded ${maxRuntimeMinutes} minutes`);
      }

      const result = await agent.step();
      await detector.checkLoop(result.output, result.usage, result.stateHash);
      console.log('Step completed:', detector.getMetrics());
    }
    return agent.getResult();
  } catch (error) {
    if (error instanceof LoopDetectedError) {
      await agent.emergencyStop();
      await logIncident(error);
    }
    throw error;
  }
}
```
Use this checklist when implementing loop detection in production:
Set a hard iteration limit per task and enforce it in code, not just in the prompt
Set a per-task cost budget and track cumulative spend with verified per-token pricing
Account for cache creation and cache read tokens separately when calculating costs
Compare recent outputs over a sliding window to catch semantic repetition
Hash meaningful state snapshots and flag runs that show no state progression
Make thresholds configurable per task type instead of one global limit
Wire loop events into alerting (Slack, PagerDuty) with a cooldown to avoid alert storms
Build dashboards for iteration rate, cost accumulation, similarity scores, and state progression
Define an incident-response runbook: stop, log, alert, tally the damage, tighten limits
Loop detection is not optional for production AI agents—it’s a critical safety and cost control mechanism. The four-layer approach described here provides comprehensive protection against the most common loop patterns while minimizing false positives.
Start with simple iteration counters and cost budgets, then add semantic similarity and state progression checks as your system matures. The production-ready module provided here gives you a solid foundation that can be adapted to any agent framework.
Remember: the goal isn’t perfect loop prevention—it’s early detection and graceful degradation. A loop that’s caught after 5 iterations is far better than one that runs for 500.
The financial and operational risks of uncontrolled loops far outweigh the implementation cost of robust detection. Your future self—and your finance team—will thank you.