Agent Month

How to fix: context length exceeded / maximum context length

Cause

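The input tokens (system prompt, conversation history, and any attached documents) plus the output tokens you request exceed the model’s context window. The window is measured in tokens, not characters or words.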
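The arithmetic behind the error can be sketched in a few lines. This is only an illustration: the function name `tokens_over_limit` and the numbers in the usage note are made up for the example, and the real window size depends on the model you call.

```python
def tokens_over_limit(input_tokens: int, max_output_tokens: int, context_window: int) -> int:
    """Return how many tokens a request exceeds the window by (0 if it fits).

    Hypothetical helper: all three values must be supplied by the caller.
    """
    # Input and reserved output share the same window, so they are summed.
    return max(0, input_tokens + max_output_tokens - context_window)
```

For instance, with an assumed window of 8,192 tokens, a 7,900-token input and `max_tokens` set to 512 would be 220 tokens over, so either the input or the output reservation has to shrink by at least that much.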

The fix

  1. Count tokens with the target model’s tokenizer (not a word count or another model’s tokenizer) to see how far over you are.
  2. Trim the input: remove redundant context, summarize long history, or retrieve only the most relevant chunks (RAG) instead of stuffing everything.
  3. Lower `max_tokens` if you reserved more output room than you need, because output counts against the window.
  4. For long conversations, enable compaction or context editing so older turns are summarized or cleared automatically.
  5. If you genuinely need more room, switch to a model with a larger context window.
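As a rough sketch of steps 1 to 4 combined, the function below trims a conversation so that input plus reserved output fits the window. Several things here are assumptions rather than any provider's API: the name `trim_history`, the `{"role", "content"}` message shape, and the `count_tokens` callable, which you would back with the target model's real tokenizer. It also drops the oldest turns outright instead of summarizing them, which is a simpler stand-in for true compaction, and it ignores any per-message overhead tokens a chat format may add.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # assumed shape: {"role": ..., "content": ...}


def trim_history(
    messages: List[Message],
    count_tokens: Callable[[str], int],  # caller supplies the model's tokenizer
    context_window: int,
    max_output_tokens: int,
) -> List[Message]:
    """Keep system messages plus the newest turns that fit the input budget."""
    budget = context_window - max_output_tokens
    if budget <= 0:
        raise ValueError("max_output_tokens leaves no room for input")

    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]

    # System messages are always kept, so they are charged first.
    used = sum(count_tokens(m["content"]) for m in system)

    kept: List[Message] = []
    # Walk from newest to oldest and stop at the first turn that does not fit.
    for message in reversed(turns):
        cost = count_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost

    kept.reverse()  # restore chronological order
    return system + kept
```

Passing the counter in as a parameter keeps the trimming logic independent of any one tokenizer, so the same function can be reused when you switch models. If the system messages alone exceed the budget, the result contains only those messages and the request will still fail, which is a signal to shorten the system prompt or move to a larger window.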

Prevent it

Design retrieval and compaction in from the start so requests stay well under the limit regardless of conversation length.
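One way to build that in is to give retrieved context a fixed token budget and pack it greedily by relevance, so the request size no longer grows with the size of the corpus. The sketch below assumes you already have `(score, text)` pairs from your own retriever; `pack_chunks`, `token_budget`, and `count_tokens` are hypothetical names for illustration, and `count_tokens` should wrap the target model's tokenizer.

```python
from typing import Callable, Iterable, List, Tuple


def pack_chunks(
    scored_chunks: Iterable[Tuple[float, str]],  # (relevance score, chunk text)
    count_tokens: Callable[[str], int],          # caller supplies the tokenizer
    token_budget: int,
) -> List[str]:
    """Greedily select the highest-scoring chunks that fit in token_budget."""
    selected: List[str] = []
    used = 0
    # Highest relevance first; a chunk that does not fit is skipped, not truncated.
    for _score, text in sorted(scored_chunks, key=lambda pair: pair[0], reverse=True):
        cost = count_tokens(text)
        if used + cost <= token_budget:
            selected.append(text)
            used += cost
    return selected
```

Choosing `token_budget` as a fixed share of the window, with the rest set aside for history and `max_tokens`, keeps each part of the request bounded no matter how long the conversation runs.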
