How Traxo Detects 6 Common LLM Anti-Patterns Automatically
Production LLM systems develop bad habits fast. Unlike traditional software bugs that crash loudly, AI anti-patterns degrade performance and inflate costs silently. Your application keeps returning 200 status codes while the underlying behavior drifts further from optimal.
Traxo's AI analyzer runs hourly against your ingested event data and flags six categories of anti-patterns automatically. Here is what each one means and why it matters.
1. Context stuffing
Context stuffing occurs when applications send increasingly large prompts -- usually by appending conversation history, RAG context, or system instructions without pruning. The symptom is a steady upward trend in average input token count. The cost impact is linear: twice the input tokens means twice the input cost.
Traxo detects this by analyzing the input token distribution over rolling windows. If the 90th percentile input size has grown significantly over the past 24 hours compared to the prior week, you get an alert with the exact magnitude of the drift.
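Traxo's exact heuristics aren't public, but the P90 drift check described above can be sketched with standard-library tools. This is a minimal illustration, assuming you have raw input token counts for the recent window and the prior-week baseline; the function name and 30% threshold are illustrative, not Traxo's actual values.

```python
from statistics import quantiles

def p90(values):
    # quantiles(n=10) returns 9 decile cut points; the last one is P90.
    return quantiles(values, n=10)[-1]

def detect_context_stuffing(recent_tokens, baseline_tokens, threshold=1.3):
    """Flag drift if the recent P90 input size exceeds the baseline
    P90 by the given ratio (1.3 = 30% growth)."""
    ratio = p90(recent_tokens) / p90(baseline_tokens)
    return {"flagged": ratio >= threshold, "drift": round(ratio, 2)}
```

Comparing percentiles rather than means keeps the check robust to a handful of outlier requests while still catching a broad upward shift.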
2. Model overspend
Model overspend means using an expensive model for tasks that a cheaper one handles equally well. The classic example is routing every request through GPT-4 or Claude Opus when 80% of your traffic is simple classification or extraction that GPT-3.5 or Haiku handles at a fraction of the cost.
Traxo flags this when it detects that a high-cost model is handling a large volume of requests with consistently short outputs -- a strong signal that the task complexity does not justify the model tier.
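The "high volume of short outputs on an expensive model" signal is simple to express. Here is a hypothetical sketch, assuming events arrive as (model, output_tokens) pairs; the model names, the 100-token cutoff, and the 500-request minimum are all placeholder assumptions.

```python
from collections import defaultdict
from statistics import median

EXPENSIVE = {"gpt-4", "claude-opus"}  # illustrative high-cost tiers

def flag_model_overspend(events, short_output=100, min_requests=500):
    """Flag expensive models serving large volumes of consistently
    short outputs -- a hint that a cheaper tier would suffice."""
    outputs = defaultdict(list)
    for model, out_tokens in events:
        outputs[model].append(out_tokens)
    return [
        m for m, outs in outputs.items()
        if m in EXPENSIVE and len(outs) >= min_requests
        and median(outs) <= short_output
    ]
```

Using the median output length (rather than the mean) avoids one long completion masking a pattern of one-word classification answers.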
3. Excessive retries
Retry storms happen when your application retries failed LLM calls without proper backoff. Each retry burns tokens (if the request reaches the model) and can cascade into rate limit errors that affect your entire application. Traxo tracks error rates and request patterns to identify retry loops, distinguishing between healthy retry behavior and runaway loops.
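For contrast, the "healthy retry behavior" mentioned above usually means capped exponential backoff with jitter. A minimal sketch, not tied to any particular LLM SDK:

```python
import random
import time

def call_with_backoff(fn, max_attempts=4, base_delay=0.5, max_delay=8.0):
    """Retry a flaky call with capped exponential backoff plus jitter.
    Without the growing delay, failed calls re-fire immediately and
    can snowball into the rate-limit cascades described above."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # budget exhausted; surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(delay * random.uniform(0.5, 1.0))  # jitter
```

The jitter matters in practice: it desynchronizes many clients retrying after the same provider hiccup, so they don't all hammer the API at the same instant.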
4. Prompt regression
A prompt regression is a change in prompt content that degrades output quality or increases cost without a corresponding improvement. Traxo detects this by monitoring the relationship between prompt changes (inferred from input token distribution shifts) and output quality signals like response length variance, error rate changes, and latency shifts.
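One way to picture this correlation check: infer a prompt change from a shift in the input-token distribution, then test whether a quality signal (here, error rate) worsened alongside it. This is a simplified stand-in for whatever Traxo actually does; the 20% shift ratio and 2-point error delta are invented thresholds.

```python
from statistics import mean

def detect_prompt_regression(before, after, shift_ratio=1.2, error_delta=0.02):
    """`before`/`after`: lists of (input_tokens, had_error) tuples,
    where had_error is 0 or 1. Returns True when an inferred prompt
    change coincides with a worsened error rate."""
    prompt_changed = (
        mean(t for t, _ in after) / mean(t for t, _ in before) >= shift_ratio
    )
    err_delta = mean(e for _, e in after) - mean(e for _, e in before)
    return prompt_changed and err_delta >= error_delta
```

A real system would combine several quality signals (response length variance, latency shifts) instead of error rate alone, but the shape of the check is the same.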
5. Latency degradation
LLM latency is highly variable, but sustained increases in P95 response time often indicate a problem -- either on the provider side or because your prompts have grown. Traxo tracks latency percentiles per model and endpoint, alerting when degradation exceeds your configured thresholds.
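A per-model P95 comparison against a baseline window is straightforward to sketch. The nearest-rank percentile and the 1.5x default threshold below are illustrative assumptions, not Traxo's configuration.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile."""
    ordered = sorted(latencies_ms)
    return ordered[max(0, int(len(ordered) * 0.95) - 1)]

def latency_alert(recent_ms, baseline_ms, threshold_ratio=1.5):
    """Alert when the recent P95 exceeds the baseline P95 by the
    configured ratio (1.5 = 50% degradation)."""
    return p95(recent_ms) / p95(baseline_ms) >= threshold_ratio
```

Tracking P95 rather than the mean is deliberate: LLM latency distributions are heavy-tailed, and the tail is what users feel.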
6. Token waste
Token waste covers patterns where tokens are consumed without productive output: duplicate requests within short windows, requests where the vast majority of output is discarded by the application, and calls that consistently return truncated responses because of max_tokens limits set too low.
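Two of these waste signals, duplicate requests within a short window and max_tokens truncation, can be spotted from event metadata alone. A minimal sketch, assuming each event carries a timestamp, a prompt hash, and a provider finish reason ("length" indicating a truncated response, as in common LLM APIs); the field names and thresholds are assumptions.

```python
def token_waste_signals(events, dup_window_s=60, truncation_rate=0.2):
    """`events`: list of {"ts", "prompt_hash", "finish_reason"} dicts.
    Counts duplicate prompts fired within dup_window_s seconds and
    flags a high share of responses cut off by max_tokens."""
    events = sorted(events, key=lambda e: e["ts"])
    duplicates, last_seen = 0, {}
    for e in events:
        prev = last_seen.get(e["prompt_hash"])
        if prev is not None and e["ts"] - prev <= dup_window_s:
            duplicates += 1
        last_seen[e["prompt_hash"]] = e["ts"]
    truncated = sum(e["finish_reason"] == "length" for e in events)
    return {
        "duplicate_requests": duplicates,
        "truncation_flagged": truncated / len(events) >= truncation_rate,
    }
```

The third pattern, output the application discards, needs application-side signals and can't be inferred from request metadata alone.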
From detection to action
Each detected anti-pattern appears as an insight on your Traxo dashboard with a severity level, affected monitor, time window, and a plain-language description of what was found. The goal is not just to flag problems but to give you enough context to fix them quickly -- before they compound into serious cost or reliability issues.