
LLM & AI error fixes

Fast, practical fixes for the errors that block teams shipping AI — with the cause, the steps, and how to stop them recurring.

429 rate limit error
→
You exceeded the requests-per-minute (RPM) or tokens-per-minute (TPM) quota for your account tier.
OpenAI, Anthropic, and most LLM APIs
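A common mitigation for a 429 is to retry with exponential backoff and jitter. The sketch below is a minimal, provider-agnostic illustration: `RateLimitError` is a hypothetical stand-in for whatever exception your SDK raises on HTTP 429, and the delay values are assumptions to tune, not recommendations from any provider.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever your SDK raises on HTTP 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay cap doubles each attempt; jitter spreads retries out.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
```

If the provider returns a retry-after hint with the 429, honoring it is usually better than a computed delay.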
context length exceeded / maximum context length
→
The prompt plus requested output exceeds the model’s context window (measured in tokens).
All LLM providers
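The budget check itself is simple arithmetic: prompt tokens plus requested output tokens must fit within the window. The sketch below assumes a rough ratio of four characters per token, which is only a heuristic for English text; both function names are hypothetical, and real counts should come from the provider's tokenizer.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough estimate only; use the provider's tokenizer for real counts."""
    return int(len(text) / chars_per_token) + 1


def fits_context(prompt: str, max_output_tokens: int, context_window: int) -> bool:
    """True if the estimated prompt tokens plus requested output fit the window."""
    return estimate_tokens(prompt) + max_output_tokens <= context_window
```

Running a check like this before sending lets you truncate or summarize the input instead of failing at the API.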
model returned invalid / unparseable JSON
→
The model produced free-form text or slightly malformed JSON instead of strictly valid, schema-conforming output.
Extraction and tool-use pipelines
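One defensive pattern is to parse leniently and validate before trusting the result. The sketch below assumes the expected output is a single JSON object: it tries the raw text first, then the outermost `{...}` span. `parse_model_json` and `required_keys` are hypothetical names, and the key check is a stand-in for full schema validation.

```python
import json
import re


def parse_model_json(text: str, required_keys=()):
    """Parse a JSON object from model output; return None if it cannot be recovered."""
    candidates = [text]
    # Fallback: the outermost {...} span, which drops code fences and chatter.
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        candidates.append(match.group(0))
    for candidate in candidates:
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict) and all(k in data for k in required_keys):
            return data
    return None
```

A `None` result is the signal to retry the request or fall back, rather than passing bad data downstream.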
request timeout on large or streaming responses
→
A non-streaming request with a high `max_tokens` exceeds the SDK/HTTP timeout before the full response returns.
Long generations and high max_tokens
401 authentication error / invalid API key
→
The API key is missing, malformed, revoked, or sent in the wrong header or format (e.g. an OAuth token in the API-key header).
All LLM providers
model won’t call my tool / function calling not triggering
→
The tool description is vague, the model is being conservative, or the prompt doesn’t make clear when the tool should be used.
Tool use and agents
529 overloaded / service temporarily unavailable
→
The provider is temporarily overloaded; this is transient and retryable.
Anthropic and other providers under load
LLM API bill is unexpectedly high
→
Spend isn't tracked per route, so an expensive prompt, oversized context, or a background job is quietly driving most of the cost.
Production AI features
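Attributing spend to routes can start from the token counts you already receive in API responses. The sketch below assumes you log one `(route, input_tokens, output_tokens)` record per request; the record shape, the function name, and the per-1K-token prices passed in are all hypothetical, so substitute your provider's actual pricing.

```python
from collections import defaultdict


def cost_by_route(usage_records, price_per_1k_input, price_per_1k_output):
    """Sum estimated spend per route from (route, input_tokens, output_tokens) records."""
    totals = defaultdict(float)
    for route, input_tokens, output_tokens in usage_records:
        totals[route] += (
            input_tokens / 1000 * price_per_1k_input
            + output_tokens / 1000 * price_per_1k_output
        )
    # Most expensive route first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Sorting by cost puts the route that dominates the bill at the top, which is usually where to look first.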
prompt caching not working / cache_read_input_tokens is zero
→
Something in the cached prefix changes every request — a timestamp, UUID, unsorted JSON, or a varying tool set — invalidating the cache.
Anthropic and other providers with prompt caching
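A quick way to find out whether the prefix is changing is to hash it on every request and log the result. The sketch below is a local diagnostic only, not any provider's cache-key algorithm: `stable_prefix_hash` is a hypothetical helper, and it assumes each tool is a dict with a `name` field. If the logged hash differs between requests that should share a cache entry, something in the prefix is varying.

```python
import hashlib
import json


def stable_prefix_hash(system_prompt: str, tools: list) -> str:
    """Hash the prefix you intend to cache, serialized deterministically."""
    # Sort tools by name and keys alphabetically so ordering never changes the bytes.
    ordered_tools = sorted(tools, key=lambda t: t["name"])
    payload = json.dumps(
        {"system": system_prompt, "tools": ordered_tools},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The same deterministic serialization (sorted keys, fixed tool order) is worth applying to the request itself, since reordering alone can change the bytes of the prefix.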
AI-generated code references a nonexistent API or package
→
The model generated a plausible but nonexistent function, library, or dependency (a hallucination).
AI coding agents
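A cheap first check is whether the modules the generated code imports can be found at all. The sketch below uses the standard library's `importlib.util.find_spec` on top-level module names. `missing_modules` is a hypothetical helper, and it only inspects the local environment: it cannot tell you whether a package exists on a registry or whether a particular function exists inside a real package.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]
```

Treat any name this reports as unverified until you have confirmed the package is real, and be cautious about installing it on the model's word alone.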
agent stuck in a loop / repeating the same tool call
→
The agent isn’t making progress — often a tool keeps failing the same way, or it lacks the context/feedback to change approach.
AI agents
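One simple guard is to count identical tool calls and stop or escalate once a threshold is hit. The sketch below is an assumption-laden illustration: `LoopDetector` is a hypothetical class, it treats calls as identical only when the tool name and JSON-serializable arguments match exactly, and the default of three repeats is arbitrary.

```python
import json
from collections import Counter


class LoopDetector:
    """Flags when the same tool call (name + arguments) repeats too often."""

    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def record(self, tool_name: str, arguments: dict) -> bool:
        """Record a call; return True once it has been made max_repeats times."""
        key = (tool_name, json.dumps(arguments, sort_keys=True))
        self.seen[key] += 1
        return self.seen[key] >= self.max_repeats
```

When `record` returns `True`, the agent loop can abort, or feed the repeated failure back to the model as explicit context so it has a reason to change approach.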
LLM responses are too slow / high latency
→
Latency is driven by model choice, large input/output, high reasoning effort, and lack of streaming.
Production AI features
MCP server not connecting / tools not showing up
→
The server config, transport, or auth is misconfigured, so the client can’t reach or authenticate to the MCP server.
Claude Code, Cursor, and other MCP clients
model gives inconsistent / non-deterministic outputs
→
LLMs are inherently probabilistic; outputs vary across runs, and small prompt or model changes shift behavior.
All LLM providers
sensitive data leaking into prompts
→
Secrets, PII, or proprietary data are being sent to a model provider inside prompts, often unintentionally.
Teams using hosted LLM APIs
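One layer of defense is to redact obvious secrets and PII before a prompt leaves your system. The sketch below is illustrative only: the two regular expressions (a basic email pattern and an `sk-` prefixed key pattern) are assumptions chosen for the example, and a real deployment would need a broader, tested pattern set or a dedicated detection tool.

```python
import re

# Illustrative patterns only; real deployments need a broader, tested set.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b"),
}


def redact(text: str) -> str:
    """Replace each match with a labeled placeholder before the text leaves your system."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Labeled placeholders such as `[EMAIL]` keep the prompt readable for the model while leaving the original value out of the request.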
prompt injection / agent following malicious instructions
→
Content the agent ingests (a web page, a document, a ticket) contains instructions that hijack its behavior.
Agents that read external content and call tools
