How to fix: context length exceeded / maximum context length

Cause

The prompt plus requested output exceeds the model’s context window (measured in tokens).
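
As a rough illustration of that condition, the check below adds the prompt's token count to the reserved output and compares the sum with the window. The function name `tokens_over_limit` and all numbers are hypothetical, chosen only for the example; real values come from the target model's documentation and tokenizer.

```python
def tokens_over_limit(prompt_tokens: int, max_tokens: int, context_window: int) -> int:
    """Return how many tokens the request exceeds the window by (0 if it fits)."""
    # The request needs room for the prompt AND the reserved output.
    needed = prompt_tokens + max_tokens
    return max(0, needed - context_window)


# Illustrative numbers only: a 7,500-token prompt with 1,000 tokens reserved
# for output does not fit in an 8,000-token window.
print(tokens_over_limit(7_500, 1_000, 8_000))  # 500
print(tokens_over_limit(6_000, 1_000, 8_000))  # 0
```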

The fix

1. Count tokens with the target model’s tokenizer (not a word count or another model’s tokenizer) to see how far over you are.
2. Trim the input: remove redundant context, summarize long history, or retrieve only the most relevant chunks (RAG) instead of stuffing everything.
3. Lower `max_tokens` if you reserved more output room than you need, because output counts against the window.
4. For long conversations, enable compaction or context editing so older turns are summarized or cleared automatically.
5. If you genuinely need more room, switch to a model with a larger context window.
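
The first three steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a definitive implementation: `trim_history` and `placeholder_count_tokens` are hypothetical names, the whitespace counter is only a stand-in so the example runs without dependencies (swap in the target model's tokenizer for real use), and dropping the oldest turns is just one simple trimming strategy among those listed above.

```python
from typing import Callable, List


def placeholder_count_tokens(text: str) -> int:
    # ASSUMPTION: stand-in only. A whitespace split is NOT a real token count;
    # replace this with the target model's own tokenizer.
    return len(text.split())


def trim_history(
    turns: List[str],
    max_tokens: int,
    context_window: int,
    count_tokens: Callable[[str], int] = placeholder_count_tokens,
) -> List[str]:
    """Drop the oldest turns until prompt + reserved output fits the window."""
    budget = context_window - max_tokens  # tokens left for the prompt
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest so the most recent turns survive.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# Tiny illustrative values: a 10-token window with 4 tokens reserved for output
# leaves 6 tokens for the prompt, so the oldest turn is dropped.
history = ["one two three", "four five", "six seven eight nine"]
print(trim_history(history, max_tokens=4, context_window=10))
# ['four five', 'six seven eight nine']
```

Lowering `max_tokens` in the call widens the prompt budget, which is why step 3 can sometimes resolve the error without trimming anything.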

Prevent it

Design retrieval and compaction in from the start so requests stay well under the limit regardless of conversation length.
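
One way to build compaction in from the start is to run a check like the following after every turn. It is a sketch under assumptions, not any particular API's compaction feature: `compact`, `demo_count_tokens`, and `demo_summarize` are hypothetical names, the token counter is a whitespace stand-in, and the summarizer is a stub where a real system would call a model.

```python
from typing import Callable, List


def demo_count_tokens(text: str) -> int:
    # ASSUMPTION: whitespace stand-in; use the target model's tokenizer instead.
    return len(text.split())


def demo_summarize(older: List[str]) -> str:
    # ASSUMPTION: stub. A real summarizer would condense the turns' content.
    return f"[summary of {len(older)} earlier turns]"


def compact(
    turns: List[str],
    budget: int,
    count_tokens: Callable[[str], int] = demo_count_tokens,
    summarize: Callable[[List[str]], str] = demo_summarize,
    keep_recent: int = 2,
) -> List[str]:
    """Fold older turns into one summary once the history exceeds the budget."""
    total = sum(count_tokens(t) for t in turns)
    if total <= budget or len(turns) <= keep_recent:
        return turns  # already under budget, or nothing old enough to fold
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    return [summarize(older)] + recent


turns = ["a b c d", "e f g", "h i", "j k l"]  # 12 placeholder tokens in total
print(compact(turns, budget=8))
# ['[summary of 2 earlier turns]', 'h i', 'j k l']
```

The sketch does not guarantee that the summary itself fits the budget, so a production version would still count tokens on the compacted history before sending the request.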
