The best cheap LLM API in 2026
Ranked by price-per-quality: MMLU-Pro points per dollar, coding scores per dollar, and real-world value. No affiliate bias — just the math.
OpenRouter routes GPT-5, Claude, Gemini, DeepSeek, Llama, Qwen and 100+ other LLMs behind a single key — pay-as-you-go, no monthly minimum, transparent per-token pricing. Try OpenRouter → (affiliate · supports this site)
TL;DR — cheapest pick by workload
| Workload | Cheapest pick | $ in / out (per 1M) | Quality score |
|---|---|---|---|
| General chat (high volume) | Gemini 2.0 Flash | $0.10 / $0.40 | 76.4 MMLU-Pro |
| Coding agent (overnight batch) | DeepSeek V3 | $0.27 / $1.10 | 42.0 SWE-Bench |
| Coding (real-time IDE) | Qwen2.5-Coder 32B | $0.18 flat | 92.7 HumanEval |
| Frontier quality, low price | GPT-5 mini | $0.25 / $2.00 | 80.1 MMLU-Pro |
| Reasoning (math / science) | o3-mini | $1.10 / $4.40 | 92.0 MATH |
| Ultra-budget (prototyping) | Phi-4 | $0.07 / $0.14 | 70.4 MMLU-Pro |
OpenRouter exposes Gemini Flash, DeepSeek V3, Qwen Coder, GPT-5 mini, and 100+ others behind a single key — same per-token price as direct. Try OpenRouter → (affiliate)
The value formula
We rank cheap LLM APIs by two metrics:
- General intelligence per dollar: MMLU-Pro score ÷ (input price + output price)
- Coding value per dollar: HumanEval score ÷ (input price + output price)
These aren't perfect — they ignore latency, context window, and multimodality — but they give a first-order approximation of "bang for buck".
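The ranking can be reproduced in a few lines. A sketch in Python, using the scores and prices published in the tables below (the figures are this guide's; the code itself is illustrative):

```python
# Value-per-dollar ranking: quality score / (input + output price per 1M tokens).
# Scores and prices are the ones quoted in this guide.
MODELS = {
    # name: (MMLU-Pro, $ input / 1M, $ output / 1M)
    "Gemini 2.0 Flash": (76.4, 0.10, 0.40),
    "Phi-4":            (70.4, 0.07, 0.14),
    "GPT-5 mini":       (80.1, 0.25, 2.00),
    "DeepSeek V3":      (75.9, 0.27, 1.10),
}

def value_per_dollar(score: float, price_in: float, price_out: float) -> float:
    """Quality points per dollar of blended (input + output) price."""
    return score / (price_in + price_out)

ranked = sorted(
    ((name, value_per_dollar(*row)) for name, row in MODELS.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, value in ranked:
    print(f"{name:18s} {value:6.1f} MMLU-Pro points per $")
```

Swapping MMLU-Pro for HumanEval scores gives the coding ranking the same way.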
Best $/quality for general intelligence
| Model | MMLU-Pro | $ in+out / 1M | MMLU/$ |
|---|---|---|---|
| Phi-4 | 70.4 | $0.21 | 335.2 |
| Qwen2.5-Coder 32B | 68.4 | $0.36 | 190.0 |
| Gemini 2.0 Flash | 76.4 | $0.50 | 152.8 |
| DeepSeek V3 | 75.9 | $1.37 | 55.4 |
| GPT-5 mini | 80.1 | $2.25 | 35.6 |
| Gemini 2.5 Flash | 79.0 | $2.80 | 28.2 |
| GPT-5 | 86.8 | $11.25 | 7.7 |
MMLU-Pro/$ = MMLU-Pro score ÷ (input + output price). Higher is better.
Best $/quality for coding
| Model | HumanEval | $ in+out / 1M | HE/$ |
|---|---|---|---|
| Phi-4 | 82.6 | $0.21 | 393.3 |
| Qwen2.5-Coder 32B | 92.7 | $0.36 | 257.5 |
| Gemini 2.0 Flash | 89.0 | $0.50 | 178.0 |
| DeepSeek V3 | 91.0 | $1.37 | 66.4 |
| GPT-5 mini | 90.5 | $2.25 | 40.2 |
| Claude Sonnet 4 | 93.7 | $18.00 | 5.2 |
HE/$ = HumanEval score ÷ (input + output price). Higher is better.
Hidden fees to watch out for
- Caching semantics vary. OpenAI discounts cached input by 50%, Anthropic by 90%, Google only for prompts over 32k tokens, and DeepSeek has no cache discount yet. If you reuse long prompts heavily, Anthropic's 90% read discount cuts Sonnet's effective input cost by up to 10×, dramatically narrowing the gap with budget models.
- Batch vs real-time. Some providers offer 2–5× cheaper batch inference with 24h latency. If you're doing overnight processing, ask your provider about batch pricing.
- Tokenisation differences. Chinese text produces roughly twice as many tokens under GPT-family tokenisers as under Qwen's own tokeniser. A Chinese-language app on GPT-5 mini can therefore cost more than one on Qwen2.5 72B even though its headline $/1M rate is lower.
- Output minimums. Very short requests ("say hi") can incur per-request minimums on some platforms that make the effective $/1M 10× higher.
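To make the cache arithmetic concrete, here is a minimal sketch of blended input price under a cache-read discount. The $3.00/1M Sonnet input price is an assumption consistent with the $18 blended in+out figure used in the tables; real cache billing adds write surcharges and TTL rules not modelled here:

```python
# Blended $/1M input tokens when a fraction of the prompt hits the cache.
# Prices and discount are assumptions taken from this guide's figures;
# real cache pricing has write surcharges and TTLs not modelled here.

def effective_input_price(base_price: float, cache_discount: float,
                          cached_fraction: float) -> float:
    """Average $/1M input tokens given a cache-read discount."""
    cached = base_price * (1 - cache_discount)          # price for cache hits
    return cached_fraction * cached + (1 - cached_fraction) * base_price

# Assumed: Claude Sonnet 4 at $3.00/1M input, 90% cache-read discount,
# 95% of each request's input served from cache (a big shared system prompt):
sonnet = effective_input_price(3.00, 0.90, cached_fraction=0.95)
print(f"Sonnet effective input: ${sonnet:.3f}/1M")  # $0.435/1M vs $3.00 list
```

Even at a 100% hit rate the floor is base_price × 0.10, so the discount compresses the gap with budget models rather than erasing it.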
Cost calculator: 10M tokens/day
A moderately busy chatbot sending ~5M input and ~5M output tokens per day:
| Model | Daily cost | Monthly cost | Yearly cost |
|---|---|---|---|
| Gemini 2.0 Flash | $2.50 | $75 | $913 |
| Phi-4 | $1.05 | $32 | $383 |
| DeepSeek V3 | $6.85 | $206 | $2,500 |
| GPT-5 mini | $11.25 | $338 | $4,106 |
| Claude Sonnet 4 | $90.00 | $2,700 | $32,850 |
| GPT-5 | $56.25 | $1,688 | $20,531 |
| Claude Opus 4.1 | $450.00 | $13,500 | $164,250 |
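The projections above are straight multiplication: tokens × price, then 30-day months and 365-day years. A sketch of the same arithmetic, with the Gemini 2.0 Flash prices from this guide as the example:

```python
def daily_cost(in_tokens_m: float, out_tokens_m: float,
               price_in: float, price_out: float) -> float:
    """Daily spend: token volumes in millions, prices in $/1M tokens."""
    return in_tokens_m * price_in + out_tokens_m * price_out

def project(price_in: float, price_out: float,
            in_tokens_m: float = 5.0, out_tokens_m: float = 5.0) -> dict:
    """Daily / monthly (30d) / yearly (365d) projection, as in the table."""
    day = daily_cost(in_tokens_m, out_tokens_m, price_in, price_out)
    return {"daily": day, "monthly": day * 30, "yearly": day * 365}

# Gemini 2.0 Flash at $0.10 in / $0.40 out:
p = project(0.10, 0.40)
print(f"${p['daily']:.2f}/day  ${p['monthly']:.0f}/mo  ${p['yearly']:.0f}/yr")
```

Plug in your own token volumes before committing to a provider; real traffic is rarely a clean 50/50 input/output split.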
The verdict
For general chat at massive scale, Gemini 2.0 Flash is unbeatable at $0.10/$0.40. For coding, Qwen2.5-Coder 32B delivers 92.7% HumanEval at $0.18 flat: the best value among models that clear a 90% HumanEval bar (Phi-4 scores more points per dollar, but its 82.6% HumanEval rules it out for serious coding work). For frontier-quality reasoning on a budget, GPT-5 mini is the sweet spot: 80.1% MMLU-Pro at $0.25/$2.00.
The most expensive mistake is over-provisioning. Start with the cheapest model that clears your quality bar, measure real-world latency and error rates, and only upgrade if the numbers justify it.
Frequently asked questions
What is the cheapest LLM API that is still good?
Gemini 2.0 Flash at $0.10 input / $0.40 output per 1M tokens is the cheapest production-grade API with 76.4% MMLU-Pro. For coding, Qwen2.5-Coder 32B at $0.18 flat delivers 92.7% HumanEval.
Is GPT-5 mini worth it over Gemini Flash?
GPT-5 mini costs 4.5× more but scores 80.1% MMLU-Pro vs Flash's 76.4%. If you need the extra 4 points for customer-facing quality, yes. For internal tools and prototypes, Flash is the better value.
Can I get discounts for high volume?
Yes — all major providers offer enterprise pricing above ~$10k/month. DeepSeek and Google are typically the most aggressive on volume discounts. Anthropic rarely discounts list price but offers committed-use credits.
Methodology and sources: see About. Spotted a price that's out of date? Open an issue.