The best cheap LLM API in 2026
Ranked by price-per-quality: MMLU-Pro points per dollar, coding scores per dollar, and real-world value. No affiliate bias — just the math.
OpenRouter routes GPT-5, Claude, Gemini, DeepSeek, Llama, Qwen and 100+ other LLMs behind a single key — pay-as-you-go, no monthly minimum, transparent per-token pricing. Try OpenRouter → (affiliate · supports this site)
TL;DR — cheapest pick by workload
| Workload | Cheapest pick | $ in / out (per 1M) | Quality score |
|---|---|---|---|
| General chat (high volume) | Gemini 2.0 Flash | $0.10 / $0.40 | 76.4 MMLU-Pro |
| Coding agent (overnight batch) | DeepSeek V3 | $0.27 / $1.10 | 42.0 SWE-Bench |
| Coding (real-time IDE) | Qwen2.5-Coder 32B | $0.18 flat | 92.7 HumanEval |
| Frontier quality, low price | GPT-5 mini | $0.25 / $2.00 | 80.1 MMLU-Pro |
| Reasoning (math / science) | o3-mini | $1.10 / $4.40 | 92.0 MATH |
| Ultra-budget (prototyping) | Phi-4 | $0.07 / $0.14 | 70.4 MMLU-Pro |
OpenRouter exposes Gemini Flash, DeepSeek V3, Qwen Coder, GPT-5 mini, and 100+ others behind a single key — same per-token price as direct. Try OpenRouter → (affiliate)
The value formula
We rank cheap LLM APIs by two metrics:
- General intelligence per dollar: MMLU-Pro score ÷ (input price + output price)
- Coding value per dollar: HumanEval score ÷ (input price + output price)
These aren't perfect — they ignore latency, context window, and multimodality — but they give a first-order approximation of "bang for buck".
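The ranking can be reproduced in a few lines. A sketch in Python, using the scores and prices published in the tables below (the figures are this guide's; the code itself is illustrative):

```python
# Value-per-dollar ranking: quality score / (input + output price per 1M tokens).
# Scores and prices are the ones quoted in this guide.
MODELS = {
    # name: (MMLU-Pro, $ input / 1M, $ output / 1M)
    "Gemini 2.0 Flash": (76.4, 0.10, 0.40),
    "Phi-4":            (70.4, 0.07, 0.14),
    "GPT-5 mini":       (80.1, 0.25, 2.00),
    "DeepSeek V3":      (75.9, 0.27, 1.10),
}

def value_per_dollar(score: float, price_in: float, price_out: float) -> float:
    """Quality points per dollar of blended (input + output) price."""
    return score / (price_in + price_out)

ranked = sorted(
    ((name, value_per_dollar(*row)) for name, row in MODELS.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, value in ranked:
    print(f"{name:18s} {value:6.1f} MMLU-Pro points per $")
```

Swapping MMLU-Pro for HumanEval scores gives the coding ranking the same way.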
Best $/quality for general intelligence
| Model | MMLU-Pro | $ in+out / 1M | MMLU/$ |
|---|---|---|---|
| Phi-4 | 70.4 | $0.21 | 335.2 |
| Qwen2.5-Coder 32B | 68.4 | $0.36 | 190.0 |
| Gemini 2.0 Flash | 76.4 | $0.50 | 152.8 |
| DeepSeek V3 | 75.9 | $1.37 | 55.4 |
| GPT-5 mini | 80.1 | $2.25 | 35.6 |
| Gemini 2.5 Flash | 79.0 | $2.80 | 28.2 |
| GPT-5 | 86.8 | $11.25 | 7.7 |
MMLU-Pro/$ = MMLU-Pro score ÷ (input + output price). Higher is better.
Best $/quality for coding
| Model | HumanEval | $ in+out / 1M | HE/$ |
|---|---|---|---|
| Phi-4 | 82.6 | $0.21 | 393.3 |
| Qwen2.5-Coder 32B | 92.7 | $0.36 | 257.5 |
| Gemini 2.0 Flash | 89.0 | $0.50 | 178.0 |
| DeepSeek V3 | 91.0 | $1.37 | 66.4 |
| GPT-5 mini | 90.5 | $2.25 | 40.2 |
| Claude Sonnet 4 | 93.7 | $18.00 | 5.2 |
HE/$ = HumanEval score ÷ (input + output price). Higher is better.
Hidden fees to watch out for
- Caching semantics vary. OpenAI discounts cached input by 50%, Anthropic by 90%, Google only for prompts over 32k tokens, and DeepSeek has no cache discount yet. If you reuse long prompts heavily, Anthropic's 90% read discount cuts Sonnet's effective input cost by up to 10×, dramatically narrowing the gap with budget models.
- Batch vs real-time. Some providers offer 2–5× cheaper batch inference with 24h latency. If you're doing overnight processing, ask your provider about batch pricing.
- Tokenisation differences. Chinese text produces roughly twice as many tokens under GPT-family tokenisers as under Qwen's own tokeniser. A Chinese-language app on GPT-5 mini can therefore cost more than one on Qwen2.5 72B even though its headline $/1M rate is lower.
- Output minimums. Very short requests ("say hi") can incur per-request minimums on some platforms that make the effective $/1M 10× higher.
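To make the cache arithmetic concrete, here is a minimal sketch of blended input price under a cache-read discount. The $3.00/1M Sonnet input price is an assumption consistent with the $18 blended in+out figure used in the tables; real cache billing adds write surcharges and TTL rules not modelled here:

```python
# Blended $/1M input tokens when a fraction of the prompt hits the cache.
# Prices and discount are assumptions taken from this guide's figures;
# real cache pricing has write surcharges and TTLs not modelled here.

def effective_input_price(base_price: float, cache_discount: float,
                          cached_fraction: float) -> float:
    """Average $/1M input tokens given a cache-read discount."""
    cached = base_price * (1 - cache_discount)          # price for cache hits
    return cached_fraction * cached + (1 - cached_fraction) * base_price

# Assumed: Claude Sonnet 4 at $3.00/1M input, 90% cache-read discount,
# 95% of each request's input served from cache (a big shared system prompt):
sonnet = effective_input_price(3.00, 0.90, cached_fraction=0.95)
print(f"Sonnet effective input: ${sonnet:.3f}/1M")  # $0.435/1M vs $3.00 list
```

Even at a 100% hit rate the floor is base_price × 0.10, so the discount compresses the gap with budget models rather than erasing it.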
Cost calculator: 10M tokens/day
A moderately busy chatbot sending ~5M input and ~5M output tokens per day:
| Model | Daily cost | Monthly cost | Yearly cost |
|---|---|---|---|
| Gemini 2.0 Flash | $2.50 | $75 | $913 |
| Phi-4 | $1.05 | $32 | $383 |
| DeepSeek V3 | $6.85 | $206 | $2,500 |
| GPT-5 mini | $11.25 | $338 | $4,106 |
| Claude Sonnet 4 | $90.00 | $2,700 | $32,850 |
| GPT-5 | $56.25 | $1,688 | $20,531 |
| Claude Opus 4.1 | $450.00 | $13,500 | $164,250 |
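The projections above are straight multiplication: tokens × price, then 30-day months and 365-day years. A sketch of the same arithmetic, with the Gemini 2.0 Flash prices from this guide as the example:

```python
def daily_cost(in_tokens_m: float, out_tokens_m: float,
               price_in: float, price_out: float) -> float:
    """Daily spend: token volumes in millions, prices in $/1M tokens."""
    return in_tokens_m * price_in + out_tokens_m * price_out

def project(price_in: float, price_out: float,
            in_tokens_m: float = 5.0, out_tokens_m: float = 5.0) -> dict:
    """Daily / monthly (30d) / yearly (365d) projection, as in the table."""
    day = daily_cost(in_tokens_m, out_tokens_m, price_in, price_out)
    return {"daily": day, "monthly": day * 30, "yearly": day * 365}

# Gemini 2.0 Flash at $0.10 in / $0.40 out:
p = project(0.10, 0.40)
print(f"${p['daily']:.2f}/day  ${p['monthly']:.0f}/mo  ${p['yearly']:.0f}/yr")
```

Plug in your own token volumes before committing to a provider; real traffic is rarely a clean 50/50 input/output split.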
The verdict
For general chat at massive scale, Gemini 2.0 Flash is unbeatable at $0.10/$0.40. For coding, Qwen2.5-Coder 32B delivers 92.7% HumanEval at $0.18 flat: the best value among models that clear a 90% HumanEval bar (Phi-4 scores more points per dollar, but its 82.6% HumanEval rules it out for serious coding work). For frontier-quality reasoning on a budget, GPT-5 mini is the sweet spot: 80.1% MMLU-Pro at $0.25/$2.00.
The most expensive mistake is over-provisioning. Start with the cheapest model that clears your quality bar, measure real-world latency and error rates, and only upgrade if the numbers justify it.
Frequently asked questions
What is the cheapest LLM API that is still good?
Gemini 2.0 Flash at $0.10 input / $0.40 output per 1M tokens is the cheapest production-grade API with 76.4% MMLU-Pro. For coding, Qwen2.5-Coder 32B at $0.18 flat delivers 92.7% HumanEval.
Is GPT-5 mini worth it over Gemini Flash?
GPT-5 mini costs 4.5× more but scores 80.1% MMLU-Pro vs Flash's 76.4%. If you need the extra 4 points for customer-facing quality, yes. For internal tools and prototypes, Flash is the better value.
Can I get discounts for high volume?
Yes — all major providers offer enterprise pricing above ~$10k/month. DeepSeek and Google are typically the most aggressive on volume discounts. Anthropic rarely discounts list price but offers committed-use credits.
Methodology and sources: see About. Spotted a price that's out of date? Open an issue.