Fixes
LLM & AI error fixes
Fast, practical fixes for the errors that block teams shipping AI — with the cause, the steps, and how to stop them recurring.
429 rate limit error
→You exceeded the requests-per-minute (RPM) or tokens-per-minute (TPM) quota for your account tier.
OpenAI, Anthropic, and most LLM APIs
context length exceeded / maximum context length
→The prompt plus requested output exceeds the model’s context window (measured in tokens).
All LLM providers
model returned invalid / unparseable JSON
→The model produced free-form text or slightly malformed JSON instead of strictly valid, schema-conforming output.
Extraction and tool-use pipelines
request timeout on large or streaming responses
→A non-streaming request with a high `max_tokens` exceeds the SDK/HTTP timeout before the full response returns.
Long generations and high max_tokens
401 authentication error / invalid API key
→The API key is missing, malformed, revoked, or sent the wrong way (e.g. an OAuth token in the API-key header).
All LLM providers
model won’t call my tool / function calling not triggering
→The tool description is vague, the model is being conservative, or the prompt doesn’t make clear when the tool should be used.
Tool use and agents
529 overloaded / service temporarily unavailable
→The provider is temporarily overloaded; this is transient and retryable.
Anthropic and other providers under load
LLM API bill is unexpectedly high
→Spend is invisible by route, so an expensive prompt, oversized context, or a background job is quietly driving most of the cost.
Production AI features
prompt caching not working / cache_read_input_tokens is zero
→Something in the cached prefix changes every request — a timestamp, UUID, unsorted JSON, or a varying tool set — invalidating the cache.
Anthropic and other providers with prompt caching
AI generated code references a nonexistent API or package
→The model generated a plausible but nonexistent function, library, or dependency (a hallucination).
AI coding agents
agent stuck in a loop / repeating the same tool call
→The agent isn’t making progress — often a tool keeps failing the same way, or it lacks the context/feedback to change approach.
AI agents
LLM responses are too slow / high latency
→Latency is driven by model choice, large input/output, high reasoning effort, and lack of streaming.
Production AI features
MCP server not connecting / tools not showing up
→The server config, transport, or auth is misconfigured, so the client can’t reach or authenticate to the MCP server.
Claude Code, Cursor, and other MCP clients
model gives inconsistent / non-deterministic outputs
→LLMs are inherently probabilistic; outputs vary across runs, and small prompt or model changes shift behavior.
All LLM providers
sensitive data leaking into prompts
→Secrets, PII, or proprietary data are being sent to a model provider inside prompts, often unintentionally.
Teams using hosted LLM APIs
prompt injection / agent following malicious instructions
→Content the agent ingests (a web page, a document, a ticket) contains instructions that hijack its behavior.
Agents that read external content and call tools