LLM API pricing — every major model, cheapest first
Per-million-token prices for 30 commercial LLMs — input, output, output-to-input ratio, context window, and a one-click route to try each behind a single key. Grouped by tier so you can match price to your workload at a glance.
OpenRouter routes one API key across every model on this page — pay-as-you-go, no monthly minimum, real-time per-token billing. Switch models without code changes. Try OpenRouter → (affiliate · supports this site)
Cheapest 3 right now
- Phi-4 · Microsoft · $0.07 in · $0.14 out / 1M tokens · blended $0.63
- Qwen2.5-Coder 32B · Alibaba · $0.18 in · $0.18 out / 1M tokens · blended $0.90
- Gemini 2.0 Flash · Google · $0.10 in · $0.40 out / 1M tokens · blended $1.70
Most-expensive frontier models
- Claude Opus 4.1 · Anthropic · $15.00 in · $75.00 out / 1M tokens
- o1 · OpenAI · $15.00 in · $60.00 out / 1M tokens
- Claude 3.5 Sonnet · Anthropic · $3.00 in · $15.00 out / 1M tokens
Blended cost = $/1M input + 4× $/1M output. Output usually dominates real bills, so the blended number is the best single proxy for "what this model will cost you per request" before you've measured your actual mix. For an exact answer, use the API cost calculator.
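The formula is simple enough to apply yourself before reaching for the calculator. A minimal sketch, using rates taken from the tables below:

```python
def blended_cost(in_price: float, out_price: float) -> float:
    """Blended $/1M-token score: input rate plus 4x the output rate,
    weighting output heavily because it usually dominates real bills."""
    return in_price + 4 * out_price

# Phi-4 and Gemini 2.0 Flash rates from the cheap tier table
print(round(blended_cost(0.07, 0.14), 2))  # 0.63
print(round(blended_cost(0.10, 0.40), 2))  # 1.7
```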
Cheap / fast tier — under $5 blended
Production-volume workhorses. The cheapest mainstream APIs on the market — pay-as-you-go and well under $1 per million input tokens.
| Model | Score | $ in / 1M | $ out / 1M | Out/in | Context | |
|---|---|---|---|---|---|---|
| Phi-4 · Microsoft · open-weights | 71.2 | $0.07 | $0.14 | 2.0× | 16k | Try → |
| Qwen2.5-Coder 32B · Alibaba · open-weights | 68.8 | $0.18 | $0.18 | 1.0× | 128k | Try → |
| Gemini 2.0 Flash · Google · fast / cheap | 65.6 | $0.10 | $0.40 | 4.0× | 1M | Try → |
| Llama 3.3 70B Instruct · Meta · open-weights | 64.7 | $0.23 | $0.40 | 1.7× | 128k | Try → |
| Llama 3.1 70B Instruct · Meta · open-weights | 60.2 | $0.23 | $0.40 | 1.7× | 128k | Try → |
| Qwen2.5 72B Instruct · Alibaba · open-weights | 65.6 | $0.35 | $0.40 | 1.1× | 128k | Try → |
| GPT-4o mini · OpenAI · fast / cheap | 61.3 | $0.15 | $0.60 | 4.0× | 128k | Try → |
| Codestral 25.01 · Mistral AI · general-purpose | — | $0.30 | $0.90 | 3.0× | 256k | Try → |
| DeepSeek V3 · DeepSeek · open-weights | 68.0 | $0.27 | $1.10 | 4.1× | 128k | Try → |
Mid tier — $5 to $30 blended
General-purpose models that balance quality and cost — the default tier for most production deployments.
| Model | Score | $ in / 1M | $ out / 1M | Out/in | Context | |
|---|---|---|---|---|---|---|
| GPT-5 mini · OpenAI · fast / cheap | 77.0 | $0.25 | $2.00 | 8.0× | 400k | Try → |
| DeepSeek R1 · DeepSeek · open-weights | 75.4 | $0.55 | $2.19 | 4.0× | 128k | Try → |
| Gemini 2.5 Flash · Google · fast / cheap | 73.3 | $0.30 | $2.50 | 8.3× | 1M | Try → |
| Llama 3.1 405B Instruct · Meta · open-weights | 65.7 | $2.70 | $2.70 | 1.0× | 128k | Try → |
| Claude 3.5 Haiku · Anthropic · fast / cheap | 56.2 | $0.80 | $4.00 | 5.0× | 200k | Try → |
| o3-mini · OpenAI · fast / cheap | 72.7 | $1.10 | $4.40 | 4.0× | 200k | Try → |
| Gemini 1.5 Pro · Google · general-purpose | 67.9 | $1.25 | $5.00 | 4.0× | 2M | Try → |
| Mistral Large 2 · Mistral AI · general-purpose | 63.7 | $2.00 | $6.00 | 3.0× | 128k | Try → |
Frontier tier — $30 to $100 blended
Top-of-leaderboard closed models. Pricing reflects R&D and premium inference; reserve them for tasks the cheaper tiers can't handle.
| Model | Score | $ in / 1M | $ out / 1M | Out/in | Context | |
|---|---|---|---|---|---|---|
| o3 · OpenAI · frontier | 83.7 | $2.00 | $8.00 | 4.0× | 200k | Try → |
| GPT-4.1 · OpenAI · general-purpose | 74.5 | $2.00 | $8.00 | 4.0× | 1M | Try → |
| GPT-5 · OpenAI · frontier | 86.0 | $1.25 | $10.00 | 8.0× | 400k | Try → |
| Gemini 2.5 Pro · Google · frontier | 80.9 | $1.25 | $10.00 | 8.0× | 2M | Try → |
| GPT-4o · OpenAI · general-purpose | 66.8 | $2.50 | $10.00 | 4.0× | 128k | Try → |
| Command R+ · Cohere · general-purpose | 47.0 | $2.50 | $10.00 | 4.0× | 128k | Try → |
| Grok 4 · xAI · frontier | 83.6 | $3.00 | $15.00 | 5.0× | 256k | Try → |
| Grok 3 · xAI · general-purpose | 81.7 | $3.00 | $15.00 | 5.0× | 1M | Try → |
| Claude Sonnet 4 · Anthropic · general-purpose | 80.7 | $3.00 | $15.00 | 5.0× | 200k | Try → |
| Claude 3.7 Sonnet · Anthropic · general-purpose | 76.0 | $3.00 | $15.00 | 5.0× | 200k | Try → |
| Claude 3.5 Sonnet · Anthropic · general-purpose | 69.1 | $3.00 | $15.00 | 5.0× | 200k | Try → |
Premium tier — over $100 blended
The most expensive models on the market. Use only when frontier reasoning is unavoidable.
| Model | Score | $ in / 1M | $ out / 1M | Out/in | Context | |
|---|---|---|---|---|---|---|
| o1 · OpenAI · frontier | 75.7 | $15.00 | $60.00 | 4.0× | 200k | Try → |
| Claude Opus 4.1 · Anthropic · frontier | 83.6 | $15.00 | $75.00 | 5.0× | 200k | Try → |
How we collect prices
Every price on this page comes from the provider's published per-1M-token rate, last verified 2026-05-10. We do not include:
- Per-request fees (most providers don't charge them; some do for batch / fine-tuning).
- Cached-input discounts (cached tokens are typically billed at 10–50% of the regular input rate after the first call). These are workload-dependent — see each model page.
- Batch-API discounts (typically 50% off both input and output, with up to 24h latency).
- Volume / committed-use discounts (negotiated per-account).
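These discounts compound against the headline rate. A rough sketch of the effective input price under assumed cache-hit and batch settings; the 50% figures are the typical discounts named above, not any one provider's exact terms:

```python
def effective_input_rate(list_rate: float, cache_hit: float = 0.0,
                         cached_discount: float = 0.5,
                         batch_discount: float = 0.0) -> float:
    """Effective $/1M input tokens: a cache_hit fraction of tokens is
    billed at (1 - cached_discount) of the list rate, then the batch
    discount applies to the whole bill."""
    per_token = list_rate * ((1 - cache_hit) + cache_hit * (1 - cached_discount))
    return per_token * (1 - batch_discount)

# $2.50 list rate, 80% cache hits at half price, plus a 50% batch discount
print(effective_input_rate(2.50, cache_hit=0.8, batch_discount=0.5))
```

With those assumptions the $2.50 headline rate drops to $0.75 per million input tokens, which is why the headline number alone can overstate your real spend.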
If you spot a stale number, open an issue — we update weekly.
Frequently asked questions
What is the cheapest LLM API right now?
The cheapest production-grade APIs in early 2026 are Gemini 2.0 Flash, GPT-4o mini, Claude 3.5 Haiku, DeepSeek V3, and Phi-4 — every one of them is well under $1 per million input tokens. The exact ranking depends on your input/output mix; the table on this page is sorted by blended (input + 4× output) cost so the order matches what you'll actually see on a typical bill.
Why does output cost more than input?
Generating tokens is much more compute-intensive than reading them. For most frontier models, output is 3–5× the input price. That means prompt-heavy workloads (RAG, classification, extraction) are far cheaper per call than generation-heavy workloads (long-form writing, code generation).
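To see how the mix dominates, here is a quick per-call comparison at GPT-4o mini's rates from the table; the token counts are made-up illustrations:

```python
IN_RATE, OUT_RATE = 0.15, 0.60  # GPT-4o mini, $/1M tokens

def call_cost(in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of a single API call at the rates above."""
    return (in_tokens * IN_RATE + out_tokens * OUT_RATE) / 1_000_000

# RAG-style call: big prompt, short answer
rag = call_cost(in_tokens=8_000, out_tokens=200)
# Generation-heavy call: short prompt, long answer
gen = call_cost(in_tokens=500, out_tokens=8_000)
print(f"RAG: ${rag:.6f}  generation: ${gen:.6f}")
```

The generation-heavy call costs several times more than the RAG call even though both move roughly 8k tokens, because those tokens land on the expensive side of the ledger.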
How do these prices compare to ChatGPT Plus or Claude Pro?
These are pay-as-you-go API prices, not consumer chat subscriptions. ChatGPT Plus / Pro and Claude Pro are flat-rate plans aimed at human chat use; the API prices above are what you pay per token when calling the model from your own application. For programmatic use, the API is almost always cheaper unless your usage is very low.
How can I cut LLM costs without changing models?
Three highest-impact moves: (1) cache static prompt prefixes — most providers now bill cached input tokens at 10–50% of the regular rate, (2) trim system prompts aggressively, (3) route easy queries to a cheaper model and only escalate to a frontier model when needed. OpenRouter exposes per-request model selection on a single API key, which makes A/B testing this trivial.
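Move (3), routing by difficulty, can start as a crude heuristic in front of your API client. A hypothetical sketch; the model IDs and the keyword list are placeholders, not a real SDK or a tuned classifier:

```python
CHEAP_MODEL = "provider/cheap-model"        # placeholder model IDs
FRONTIER_MODEL = "provider/frontier-model"

def pick_model(prompt: str) -> str:
    """Escalate long or reasoning-flavored prompts to the frontier
    model; send everything else to the cheap one."""
    hard_markers = ("prove", "refactor", "step by step", "debug")
    if len(prompt) > 2_000 or any(m in prompt.lower() for m in hard_markers):
        return FRONTIER_MODEL
    return CHEAP_MODEL

print(pick_model("Classify this ticket: printer won't connect"))
print(pick_model("Prove that this scheduling algorithm terminates"))
```

In production you would replace the keyword list with a small classifier or confidence check, but even a heuristic this crude captures most of the savings when the bulk of traffic is easy.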
Is this list complete?
The table covers every major commercial LLM with a public API as of 2026-05-10 — 30 models from OpenAI, Anthropic, Google, Meta, DeepSeek, xAI, Mistral, Microsoft, Cohere, and Alibaba. Smaller providers and self-hosted setups (where you bear hardware cost instead of per-token cost) are out of scope.
See also: API cost calculator · Head-to-head comparisons · Best cheap LLM API guide · Best free LLM API guide
Get the weekly LLM digest
Big releases, leaderboard movements, price drops, and the one chart that actually mattered this week. No spam.
Or follow updates on GitHub.