LLM Rank.top

Leaderboard · Guide · Updated

The best cheap LLM API in 2026

Ranked by price-per-quality: MMLU-Pro points per dollar, coding scores per dollar, and real-world value. No affiliate bias — just the math.

Try every model in this guide from one API key.

OpenRouter routes GPT-5, Claude, Gemini, DeepSeek, Llama, Qwen and 100+ other LLMs behind a single key — pay-as-you-go, no monthly minimum, transparent per-token pricing. Try OpenRouter → (affiliate · supports this site)

TL;DR — cheapest pick by workload

WorkloadCheapest pick$ in / out (per 1M)Quality score
General chat (high volume)Gemini 2.0 Flash$0.10 / $0.4076.4 MMLU-Pro
Coding agent (overnight batch)DeepSeek V3$0.27 / $1.1042.0 SWE-Bench
Coding (real-time IDE)Qwen2.5-Coder 32B$0.18 flat92.7 HumanEval
Frontier quality, low priceGPT-5 mini$0.25 / $2.0080.1 MMLU-Pro
Reasoning (math / science)o3-mini$1.10 / $4.4092.0 MATH
Ultra-budget (prototyping)Phi-4$0.07 / $0.1470.4 MMLU-Pro
One API key for every model in this article.

OpenRouter exposes Gemini Flash, DeepSeek V3, Qwen Coder, GPT-5 mini, and 100+ others behind a single key — same per-token price as direct. Try OpenRouter → (affiliate)

The value formula

We rank cheap LLM APIs by two metrics:

These aren't perfect — they ignore latency, context window, and multimodality — but they give a first-order approximation of "bang for buck".

Best $/quality for general intelligence

ModelMMLU-Pro$ in+out / 1MMMLU/$
Gemini 2.0 Flash76.4$0.50152.8
Phi-470.4$0.21335.2
GPT-5 mini80.1$2.2535.6
Gemini 2.5 Flash79.0$2.8028.2
DeepSeek V375.9$1.3755.4
Qwen2.5-Coder 32B68.4$0.36190.0
GPT-586.8$11.257.7

MMLU-Pro/$ = MMLU-Pro score ÷ (input + output price). Higher is better.

Best $/quality for coding

ModelHumanEval$ in+out / 1MHE/$
Qwen2.5-Coder 32B92.7$0.36257.5
Gemini 2.0 Flash89.0$0.50178.0
DeepSeek V391.0$1.3766.4
Phi-482.6$0.21393.3
GPT-5 mini90.5$2.2540.2
Claude Sonnet 493.7$18.005.2

HE/$ = HumanEval score ÷ (input + output price). Higher is better.

Hidden fees to watch out for

Cost calculator: 10M tokens/day

A moderately busy chatbot sending ~5M input and ~5M output tokens per day:

ModelDaily costMonthly costYearly cost
Gemini 2.0 Flash$2.50$75$913
Phi-4$1.05$32$383
DeepSeek V3$6.85$206$2,500
GPT-5 mini$11.25$338$4,106
Claude Sonnet 4$90.00$2,700$32,850
GPT-5$56.25$1,688$20,531
Claude Opus 4.1$450.00$13,500$164,250

The verdict

For general chat at massive scale, Gemini 2.0 Flash is unbeatable at $0.10/$0.40. For coding, Qwen2.5-Coder 32B delivers 92.7% HumanEval at $0.18 flat — the best pure coding value on the market. For frontier-quality reasoning on a budget, GPT-5 mini is the sweet spot: 80.1% MMLU-Pro at $0.25/$2.00.

The most expensive mistake is over-provisioning. Start with the cheapest model that clears your quality bar, measure real-world latency and error rates, and only upgrade if the numbers justify it.

Frequently asked questions

What is the cheapest LLM API that is still good?

Gemini 2.0 Flash at $0.10 input / $0.40 output per 1M tokens is the cheapest production-grade API with 76.4% MMLU-Pro. For coding, Qwen2.5-Coder 32B at $0.18 flat delivers 92.7% HumanEval.

Is GPT-5 mini worth it over Gemini Flash?

GPT-5 mini costs 4.5× more but scores 80.1% MMLU-Pro vs Flash's 76.4%. If you need the extra 4 points for customer-facing quality, yes. For internal tools and prototypes, Flash is the better value.

Can I get discounts for high volume?

Yes — all major providers offer enterprise pricing above ~$10k/month. DeepSeek and Google are typically the most aggressive on volume discounts. Anthropic rarely discounts list price but offers committed-use credits.


Methodology and sources: see About. Spotted a price that's out of date? Open an issue.

Get the weekly LLM digest

Price drops, new free tiers, and the best value models we found this week. No spam.

Or follow updates on GitHub.