GPT-4o mini
Fast / cheap
Low-cost small model with a 128K context window. Confirm current pricing with OpenAI.
- Input / 1M tokens: $0.15
- Output / 1M tokens: $0.60
- Context: 128K tokens
- Provider: OpenAI
Pricing verified June 2026. Prices change frequently. Always confirm against the provider’s official pricing page before relying on these figures for budgeting.
What GPT-4o mini is best for
GPT-4o mini suits cheap, fast, high-volume tasks on the OpenAI stack, and it is the common downgrade target for cost work.
Use it for high-volume, latency-sensitive, low-stakes work: classification, extraction, routing, first-pass drafts. Avoid it for tasks where a wrong answer is expensive — keep those on a stronger model.
GPT-4o mini cost by volume
Estimated monthly cost at three realistic volumes, at $0.15 input / $0.60 output per million tokens.
| Scenario | Input / mo | Output / mo | Est. cost / mo |
|---|---|---|---|
| Prototype | 2M | 0.5M | $0.60 |
| Growing product | 50M | 10M | $13.50 |
| At scale | 500M | 100M | $135.00 |
Plug in your own numbers with the cost calculator.
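The arithmetic behind the table is simple enough to sketch in a few lines. The snippet below is a minimal illustration, not an official calculator: it uses the $0.15 / $0.60 per-million prices listed on this page (which may have changed), and the function name `monthly_cost` is made up for the example.

```python
# Prices quoted on this page, in USD per million tokens.
# Assumption: these may be out of date; check the provider's pricing page.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60


def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly cost in USD for the given token volumes."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return round(input_cost + output_cost, 2)


# The three scenarios from the table: (input tokens, output tokens) per month.
scenarios = {
    "Prototype": (2_000_000, 500_000),
    "Growing product": (50_000_000, 10_000_000),
    "At scale": (500_000_000, 100_000_000),
}

for name, (input_tokens, output_tokens) in scenarios.items():
    print(f"{name}: ${monthly_cost(input_tokens, output_tokens):,.2f}")
```

Output tokens cost four times as much as input tokens at these prices, so a workload that generates long responses can cost more than its input volume alone suggests.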
How to cut your GPT-4o mini bill
The headline price isn’t the lever — your usage pattern is. The biggest reductions come from how you route and structure requests, not from switching models alone:
- Route low-stakes calls to a cheaper tier and keep GPT-4o mini on the routes where its strengths matter.
- Cache large stable prompt prefixes so repeated context bills at a fraction of the price.
- Batch non-urgent work, and trim oversized context by retrieving instead of stuffing.
- Add fallback chains so a timeout doesn’t trigger an expensive retry.
Make each of these changes behind evals so quality holds. That combination is how teams cut 30–60% without regressions.
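To get a feel for the caching lever in particular, here is a rough sketch of how a stable prompt prefix changes input spend. It rests on loudly stated assumptions: `cached_price_ratio=0.5` is a placeholder discount, not a quoted price, and `hit_rate`, the request count, and the token sizes are hypothetical. Substitute the provider's actual cached-input pricing and your own traffic numbers.

```python
INPUT_PRICE_PER_M = 0.15  # USD per million input tokens, as listed on this page


def input_cost(
    requests: int,
    prefix_tokens: int,
    dynamic_tokens: int,
    hit_rate: float = 0.0,
    cached_price_ratio: float = 0.5,  # ASSUMPTION: placeholder discount
) -> float:
    """Estimated input cost in USD when a stable prefix is sometimes cached.

    prefix_tokens  -- tokens in the stable prefix (system prompt, shared docs)
    dynamic_tokens -- tokens that differ per request and are never cached
    hit_rate       -- fraction of requests whose prefix is served from cache
    """
    hits = requests * hit_rate
    misses = requests - hits
    # Prefix tokens on cache misses and all dynamic tokens bill at full price.
    full_price_tokens = misses * prefix_tokens + requests * dynamic_tokens
    # Prefix tokens on cache hits bill at the discounted rate.
    cached_tokens = hits * prefix_tokens
    total = (
        full_price_tokens * INPUT_PRICE_PER_M
        + cached_tokens * INPUT_PRICE_PER_M * cached_price_ratio
    )
    return total / 1_000_000


# Hypothetical workload: 100k requests, 4,000-token prefix, 500 dynamic tokens.
baseline = input_cost(100_000, 4_000, 500)
with_cache = input_cost(100_000, 4_000, 500, hit_rate=0.9)
print(f"no caching: ${baseline:.2f}, 90% cache hits: ${with_cache:.2f}")
```

The saving scales with the share of each request that is stable prefix, which is why trimming and reordering prompts so the shared part comes first tends to matter as much as turning caching on.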
Context window: 128K tokens
GPT-4o mini’s context window bounds how much it can consider at once — system prompt, history, retrieved docs, and the response all draw from those 128K tokens. A larger window enables whole-codebase reasoning and long documents, but using more of it costs more per request — so retrieval and caching still matter even when the window is large.
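One way to make that budget concrete is to reserve space for the fixed parts first and only then decide how many retrieved documents fit. The sketch below assumes 128K means 128,000 tokens and that token counts have already been measured with a tokenizer (not shown). The helper names `remaining_for_docs` and `select_docs` are illustrative and not part of any SDK.

```python
CONTEXT_WINDOW = 128_000  # ASSUMPTION: "128K" taken as 128,000 tokens


def remaining_for_docs(
    system_tokens: int,
    history_tokens: int,
    reserved_output_tokens: int,
    window: int = CONTEXT_WINDOW,
) -> int:
    """Tokens left for retrieved documents after the fixed parts are reserved."""
    used = system_tokens + history_tokens + reserved_output_tokens
    return max(window - used, 0)


def select_docs(doc_token_counts: list[int], budget: int) -> list[int]:
    """Pick documents in ranked order, skipping any that no longer fit.

    Returns the indices of the selected documents.
    """
    selected = []
    used = 0
    for index, tokens in enumerate(doc_token_counts):
        if used + tokens <= budget:
            selected.append(index)
            used += tokens
    return selected


# Hypothetical request: 1k system prompt, 20k history, 4k reserved for the reply.
budget = remaining_for_docs(1_000, 20_000, 4_000)
chosen = select_docs([50_000, 40_000, 30_000, 10_000], budget)
print(budget, chosen)
```

Because every selected token is billed as input, a tighter budget than the window allows is often the cheaper choice even when everything would technically fit.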
Frequently asked questions
How much does GPT-4o mini cost?
GPT-4o mini costs $0.15 per million input tokens and $0.60 per million output tokens. A workload of 50M input and 10M output tokens per month would cost about $13.50 (50 × $0.15 + 10 × $0.60). Confirm current pricing with the provider.
What is GPT-4o mini's context window?
GPT-4o mini has a 128K-token context window. The system prompt, history, retrieved docs, and the response all draw from those 128K tokens.
Is GPT-4o mini the right model for my workload?
GPT-4o mini fits cheap, fast, high-volume tasks on the OpenAI stack, and it is the common downgrade target for cost work. The cheapest correct model is workload-specific: route low-stakes calls to a cheaper tier and reserve GPT-4o mini for work where its strengths matter, validated by evals.