How to fix: 429 rate limit error
Cause
You exceeded the requests-per-minute (RPM) or tokens-per-minute (TPM) quota for your account tier.
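Providers typically enforce both quotas at the same time, so whichever one fills first triggers the 429. The sketch below uses a hypothetical tier of 50 RPM and 40,000 TPM to illustrate this. Real limits vary by provider, tier, and model, and providers differ in which tokens count toward TPM. The function `limits_exceeded` is illustrative only. Thirty requests of 2,000 tokens each stay under the RPM cap but add up to 60,000 tokens:

```python
from typing import List


def limits_exceeded(requests_per_min: int, tokens_per_request: int,
                    rpm_limit: int, tpm_limit: int) -> List[str]:
    """Return which per-minute quotas a steady workload would exceed (illustrative)."""
    exceeded = []
    if requests_per_min > rpm_limit:
        exceeded.append("RPM")
    # TPM meters tokens, not calls: a modest request rate with large prompts can still trip it.
    if requests_per_min * tokens_per_request > tpm_limit:
        exceeded.append("TPM")
    return exceeded


# Hypothetical tier: 50 RPM and 40,000 TPM.
print(limits_exceeded(30, 2_000, rpm_limit=50, tpm_limit=40_000))  # ['TPM']
```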
The fix
1. Read the response headers: a `429` response often includes a `retry-after` header that tells you how long to wait, usually in seconds.
2. Retry with exponential backoff and jitter. Many official SDKs already retry 429s automatically, often about 2 retries by default. If that is not enough, raise the SDK's retry limit.
3. Reduce token pressure: trim prompts, cache stable prompt prefixes, and route low-stakes calls to a cheaper or separate model that has its own quota.
4. Batch non-urgent work through the provider’s batch API, which typically has separate, higher limits.
5. If you’re consistently capped, request a tier or quota increase from the provider.
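Steps 1 and 2 can be combined into a single retry helper. The sketch below honors `retry-after` when the server sends it and otherwise falls back to capped exponential backoff with "full jitter", which is one of several common jitter strategies. `RateLimitError` is a hypothetical stand-in for whatever exception your SDK raises on a 429. Real SDKs expose their own exception types and header access. The default values for `max_retries`, `base_delay`, and `max_delay` are assumptions, not provider recommendations.

```python
import email.utils
import random
import time
from datetime import datetime, timezone
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the 429 exception your SDK raises."""

    def __init__(self, retry_after: Optional[str] = None) -> None:
        super().__init__("429 Too Many Requests")
        self.retry_after = retry_after  # raw retry-after header value, if the server sent one


def parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Convert a retry-after value (delta-seconds or HTTP-date) to seconds; None if unusable."""
    if not value:
        return None
    try:
        return max(0.0, float(value))  # the common case: a number of seconds
    except ValueError:
        pass
    try:
        when = email.utils.parsedate_to_datetime(value)  # e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are always GMT
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on RateLimitError with capped exponential backoff and full jitter."""
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError as err:
            if attempt >= max_retries:
                raise  # retry budget spent: surface the 429 to the caller
            hint = parse_retry_after(err.retry_after)
            if hint is not None:
                delay = hint  # the server told us how long to wait
            else:
                # Full jitter: pick uniformly in [0, min(max_delay, base_delay * 2**attempt)]
                # so clients that hit the limit together don't retry in lockstep.
                delay = random.uniform(0.0, min(max_delay, base_delay * 2 ** attempt))
            sleep(delay)
            attempt += 1
```

The `sleep` parameter is injectable so that you can test the helper without waiting. For example, passing `sleep=waits.append` records each delay in a list instead of sleeping.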
Prevent it
Add a gateway that meters and queues requests, with backoff and per-route rate budgets, so spikes degrade gracefully instead of failing.
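One way to implement per-route rate budgets is a token bucket for each route. The sketch below swaps in that technique: the article does not prescribe an algorithm. `TokenBucket`, `Gateway`, and `admit` are hypothetical names. Requests are metered by an estimated token count, which is an assumption, since the exact count is known only after tokenization. `admit` "queues" a caller by blocking until the route's budget can cover the request. The sketch is single-threaded and not thread-safe.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Holds up to `capacity` units and refills at `rate` units per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity  # start full so an idle route can absorb a small burst
        self.updated = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now

    def wait_time(self, cost: float) -> float:
        """Seconds until `cost` units are available; 0.0 if they are available now."""
        if cost > self.capacity:
            raise ValueError("cost exceeds capacity, so this request could never be admitted")
        self._refill()
        deficit = cost - self.tokens
        return 0.0 if deficit <= 0 else deficit / self.rate

    def try_acquire(self, cost: float) -> bool:
        """Spend `cost` units if they are available now; never blocks."""
        if self.wait_time(cost) == 0.0:
            self.tokens -= cost
            return True
        return False


class Gateway:
    """Hypothetical gateway: one token bucket per route, metered in estimated tokens."""

    def __init__(self, budgets: Dict[str, TokenBucket],
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.budgets = budgets
        self.sleep = sleep

    def admit(self, route: str, est_tokens: float) -> None:
        """Block (i.e. queue the caller) until the route's budget covers the request."""
        bucket = self.budgets[route]
        while not bucket.try_acquire(est_tokens):
            self.sleep(bucket.wait_time(est_tokens))
```

For example, `Gateway({"chat": TokenBucket(capacity=2_000, rate=40_000 / 60)})` would hold a `chat` route to roughly 40,000 estimated tokens per minute, with bursts of up to 2,000. Because estimates are approximate and other clients may share the same provider quota, upstream 429s can still occur, so the gateway should still retry them with backoff. A second bucket per route could meter request count (RPM) alongside tokens.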