LLM API pricing — every major model, cheapest first
Per-million-token prices for 30 commercial LLMs — input, output, output-to-input ratio, context window, and a one-click route to try each behind a single key. Grouped by tier so you can match price to your workload at a glance.
OpenRouter routes one API key across every model on this page — pay-as-you-go, no monthly minimum, real-time per-token billing. Switch models without code changes. Try OpenRouter → (affiliate · supports this site)
Cheapest 3 right now
- Phi-4 · Microsoft · $0.07 in · $0.14 out / 1M tokens · blended $0.63
- Qwen2.5-Coder 32B · Alibaba · $0.18 in · $0.18 out / 1M tokens · blended $0.90
- Gemini 2.0 Flash · Google · $0.10 in · $0.40 out / 1M tokens · blended $1.70
Most-expensive frontier models
- Claude Opus 4.1 · Anthropic · $15.00 in · $75.00 out / 1M tokens
- o1 · OpenAI · $15.00 in · $60.00 out / 1M tokens
- Claude 3.5 Sonnet · Anthropic · $3.00 in · $15.00 out / 1M tokens
Blended cost = $/1M input + 4× $/1M output. Output usually dominates real bills, so the blended number is the best single proxy for "what this model will cost you per request" before you've measured your actual mix. For an exact answer, use the API cost calculator.
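The formula is simple enough to apply yourself before reaching for the calculator. A minimal sketch, using rates taken from the tables below:

```python
def blended_cost(in_price: float, out_price: float) -> float:
    """Blended $/1M-token score: input rate plus 4x the output rate,
    weighting output heavily because it usually dominates real bills."""
    return in_price + 4 * out_price

# Phi-4 and Gemini 2.0 Flash rates from the cheap tier table
print(round(blended_cost(0.07, 0.14), 2))  # 0.63
print(round(blended_cost(0.10, 0.40), 2))  # 1.7
```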
Cheap / fast tier — under $5 blended
Production-volume workhorses. The cheapest mainstream APIs on the market — pay-as-you-go and well under $1 per million input tokens.
| Model | Score | $ in / 1M | $ out / 1M | Out/in | Context | |
|---|---|---|---|---|---|---|
| Phi-4 · Microsoft · open-weights | 71.2 | $0.07 | $0.14 | 2.0× | 16k | Try → |
| Qwen2.5-Coder 32B · Alibaba · open-weights | 68.8 | $0.18 | $0.18 | 1.0× | 128k | Try → |
| Gemini 2.0 Flash · Google · fast / cheap | 65.6 | $0.10 | $0.40 | 4.0× | 1M | Try → |
| Llama 3.3 70B Instruct · Meta · open-weights | 64.7 | $0.23 | $0.40 | 1.7× | 128k | Try → |
| Llama 3.1 70B Instruct · Meta · open-weights | 60.2 | $0.23 | $0.40 | 1.7× | 128k | Try → |
| Qwen2.5 72B Instruct · Alibaba · open-weights | 65.6 | $0.35 | $0.40 | 1.1× | 128k | Try → |
| GPT-4o mini · OpenAI · fast / cheap | 61.3 | $0.15 | $0.60 | 4.0× | 128k | Try → |
| Codestral 25.01 · Mistral AI · general-purpose | — | $0.30 | $0.90 | 3.0× | 256k | Try → |
| DeepSeek V3 · DeepSeek · open-weights | 68.0 | $0.27 | $1.10 | 4.1× | 128k | Try → |
Mid tier — $5 to $30 blended
General-purpose models that balance quality and cost — the default tier for most production deployments.
| Model | Score | $ in / 1M | $ out / 1M | Out/in | Context | |
|---|---|---|---|---|---|---|
| GPT-5 mini · OpenAI · fast / cheap | 77.0 | $0.25 | $2.00 | 8.0× | 400k | Try → |
| DeepSeek R1 · DeepSeek · open-weights | 75.4 | $0.55 | $2.19 | 4.0× | 128k | Try → |
| Gemini 2.5 Flash · Google · fast / cheap | 73.3 | $0.30 | $2.50 | 8.3× | 1M | Try → |
| Llama 3.1 405B Instruct · Meta · open-weights | 65.7 | $2.70 | $2.70 | 1.0× | 128k | Try → |
| Claude 3.5 Haiku · Anthropic · fast / cheap | 56.2 | $0.80 | $4.00 | 5.0× | 200k | Try → |
| o3-mini · OpenAI · fast / cheap | 72.7 | $1.10 | $4.40 | 4.0× | 200k | Try → |
| Gemini 1.5 Pro · Google · general-purpose | 67.9 | $1.25 | $5.00 | 4.0× | 2M | Try → |
| Mistral Large 2 · Mistral AI · general-purpose | 63.7 | $2.00 | $6.00 | 3.0× | 128k | Try → |
Frontier tier — $30 to $100 blended
Top-of-leaderboard closed models. Pricing reflects R&D and premium inference; reserve them for tasks the cheaper tiers can't handle.
| Model | Score | $ in / 1M | $ out / 1M | Out/in | Context | |
|---|---|---|---|---|---|---|
| o3 · OpenAI · frontier | 83.7 | $2.00 | $8.00 | 4.0× | 200k | Try → |
| GPT-4.1 · OpenAI · general-purpose | 74.5 | $2.00 | $8.00 | 4.0× | 1M | Try → |
| GPT-5 · OpenAI · frontier | 86.0 | $1.25 | $10.00 | 8.0× | 400k | Try → |
| Gemini 2.5 Pro · Google · frontier | 80.9 | $1.25 | $10.00 | 8.0× | 2M | Try → |
| GPT-4o · OpenAI · general-purpose | 66.8 | $2.50 | $10.00 | 4.0× | 128k | Try → |
| Command R+ · Cohere · general-purpose | 47.0 | $2.50 | $10.00 | 4.0× | 128k | Try → |
| Grok 4 · xAI · frontier | 83.6 | $3.00 | $15.00 | 5.0× | 256k | Try → |
| Grok 3 · xAI · general-purpose | 81.7 | $3.00 | $15.00 | 5.0× | 1M | Try → |
| Claude Sonnet 4 · Anthropic · general-purpose | 80.7 | $3.00 | $15.00 | 5.0× | 200k | Try → |
| Claude 3.7 Sonnet · Anthropic · general-purpose | 76.0 | $3.00 | $15.00 | 5.0× | 200k | Try → |
| Claude 3.5 Sonnet · Anthropic · general-purpose | 69.1 | $3.00 | $15.00 | 5.0× | 200k | Try → |
Premium tier — over $100 blended
The most expensive models on the market. Use only when frontier reasoning is unavoidable.
| Model | Score | $ in / 1M | $ out / 1M | Out/in | Context | |
|---|---|---|---|---|---|---|
| o1 · OpenAI · frontier | 75.7 | $15.00 | $60.00 | 4.0× | 200k | Try → |
| Claude Opus 4.1 · Anthropic · frontier | 83.6 | $15.00 | $75.00 | 5.0× | 200k | Try → |
How we collect prices
Every price on this page comes from the provider's published per-1M-token rate, last verified 2026-05-10. We do not include:
- Per-request fees (most providers don't charge them; some do for batch / fine-tuning).
- Cached-input discounts (cached tokens are typically billed at 10–50% of the regular input rate after the first call). These are workload-dependent — see each model page.
- Batch-API discounts (typically 50% off both input and output, with up to 24h latency).
- Volume / committed-use discounts (negotiated per-account).
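These discounts compound against the headline rate. A rough sketch of the effective input price under assumed cache-hit and batch settings; the 50% figures are the typical discounts named above, not any one provider's exact terms:

```python
def effective_input_rate(list_rate: float, cache_hit: float = 0.0,
                         cached_discount: float = 0.5,
                         batch_discount: float = 0.0) -> float:
    """Effective $/1M input tokens: a cache_hit fraction of tokens is
    billed at (1 - cached_discount) of the list rate, then the batch
    discount applies to the whole bill."""
    per_token = list_rate * ((1 - cache_hit) + cache_hit * (1 - cached_discount))
    return per_token * (1 - batch_discount)

# $2.50 list rate, 80% cache hits at half price, plus a 50% batch discount
print(effective_input_rate(2.50, cache_hit=0.8, batch_discount=0.5))
```

With those assumptions the $2.50 headline rate drops to $0.75 per million input tokens, which is why the headline number alone can overstate your real spend.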
If you spot a stale number, open an issue — we update weekly.
Frequently asked questions
What is the cheapest LLM API right now?
The cheapest production-grade APIs in early 2026 are Gemini 2.0 Flash, GPT-4o mini, Claude 3.5 Haiku, DeepSeek V3, and Phi-4 — every one of them is well under $1 per million input tokens. The exact ranking depends on your input/output mix; the table on this page is sorted by blended (input + 4× output) cost so the order matches what you'll actually see on a typical bill.
Why does output cost more than input?
Generating tokens is much more compute-intensive than reading them. For most frontier models, output is 3–5× the input price. That means prompt-heavy workloads (RAG, classification, extraction) are far cheaper per call than generation-heavy workloads (long-form writing, code generation).
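To see how the mix dominates, here is a quick per-call comparison at GPT-4o mini's rates from the table; the token counts are made-up illustrations:

```python
IN_RATE, OUT_RATE = 0.15, 0.60  # GPT-4o mini, $/1M tokens

def call_cost(in_tokens: int, out_tokens: int) -> float:
    """Dollar cost of a single API call at the rates above."""
    return (in_tokens * IN_RATE + out_tokens * OUT_RATE) / 1_000_000

# RAG-style call: big prompt, short answer
rag = call_cost(in_tokens=8_000, out_tokens=200)
# Generation-heavy call: short prompt, long answer
gen = call_cost(in_tokens=500, out_tokens=8_000)
print(f"RAG: ${rag:.6f}  generation: ${gen:.6f}")
```

The generation-heavy call costs several times more than the RAG call even though both move roughly 8k tokens, because those tokens land on the expensive side of the ledger.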
How do these prices compare to ChatGPT Plus or Claude Pro?
These are pay-as-you-go API prices, not consumer chat subscriptions. ChatGPT Plus / Pro and Claude Pro are flat-rate plans aimed at human chat use; the API prices above are what you pay per token when calling the model from your own application. For programmatic use, the API is almost always cheaper unless your usage is very low.
How can I cut LLM costs without changing models?
Three highest-impact moves: (1) cache static prompt prefixes — most providers now bill cached input tokens at 10–50% of the regular rate, (2) trim system prompts aggressively, (3) route easy queries to a cheaper model and only escalate to a frontier model when needed. OpenRouter exposes per-request model selection on a single API key, which makes A/B testing this trivial.
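Move (3), routing by difficulty, can start as a crude heuristic in front of your API client. A hypothetical sketch; the model IDs and the keyword list are placeholders, not a real SDK or a tuned classifier:

```python
CHEAP_MODEL = "provider/cheap-model"        # placeholder model IDs
FRONTIER_MODEL = "provider/frontier-model"

def pick_model(prompt: str) -> str:
    """Escalate long or reasoning-flavored prompts to the frontier
    model; send everything else to the cheap one."""
    hard_markers = ("prove", "refactor", "step by step", "debug")
    if len(prompt) > 2_000 or any(m in prompt.lower() for m in hard_markers):
        return FRONTIER_MODEL
    return CHEAP_MODEL

print(pick_model("Classify this ticket: printer won't connect"))
print(pick_model("Prove that this scheduling algorithm terminates"))
```

In production you would replace the keyword list with a small classifier or confidence check, but even a heuristic this crude captures most of the savings when the bulk of traffic is easy.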
Is this list complete?
The table covers every major commercial LLM with a public API as of 2026-05-10 — 30 models from OpenAI, Anthropic, Google, Meta, DeepSeek, xAI, Mistral, Microsoft, Cohere, and Alibaba. Smaller providers and self-hosted setups (where you bear hardware cost instead of per-token cost) are out of scope.
See also: API cost calculator · Head-to-head comparisons · Best cheap LLM API guide · Best free LLM API guide
Get the weekly LLM digest
Big releases, leaderboard movements, price drops, and the one chart that actually mattered this week. No spam.
Or follow updates on GitHub.