ChatGPT vs Claude vs Gemini in 2026
GPT-5 vs Claude Opus 4.1 vs Gemini 2.5 Pro — benchmarks, pricing, context, and a clear verdict by use case. The frontier three are closer than ever; the right choice depends on what you're shipping.
OpenRouter routes GPT-5, Claude, Gemini, DeepSeek, Llama, Qwen and 100+ other LLMs behind a single key — pay-as-you-go, no monthly minimum, transparent per-token pricing. Try OpenRouter → (affiliate · supports this site)
The headline numbers
| Metric | GPT-5 | Claude Opus 4.1 | Gemini 2.5 Pro |
|---|---|---|---|
| Arena Elo | 1410 | 1390 | 1380 |
| MMLU-Pro | 86.8 | 87.0 | 86.0 |
| GPQA Diamond | 87.3 | 79.6 | 84.0 |
| MATH | 96.7 | 95.0 | 92.0 |
| HumanEval | 95.1 | 95.4 | 92.0 |
| SWE-Bench Verified | 74.9 | 74.5 | 63.8 |
| Context window | 400k | 200k | 2M |
| Input price ($/1M) | $1.25 | $15.00 | $1.25 |
| Output price ($/1M) | $10.00 | $75.00 | $10.00 |
OpenRouter exposes GPT-5, Claude Opus 4.1, and Gemini 2.5 Pro behind a single API — same price as direct, no per-provider invoices. Try OpenRouter → (affiliate)
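Concretely, "single API" means the standard openai Python client pointed at OpenRouter's base URL. A minimal sketch; the model slugs follow OpenRouter's vendor/model naming convention and should be checked against their live model list:

```python
from openai import OpenAI

# One key, one endpoint, all three frontier models.
client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",  # your OpenRouter key
)

# Slugs follow OpenRouter's vendor/model convention -- verify the
# exact strings against openrouter.ai/models before relying on them.
for model in ("openai/gpt-5", "anthropic/claude-opus-4.1", "google/gemini-2.5-pro"):
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "One-sentence summary of SWE-Bench?"}],
    )
    print(f"{model}: {reply.choices[0].message.content}")
```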
Verdict by use case
Coding agents — GPT-5 (narrow win)
SWE-Bench Verified measures real-world GitHub issue resolution. GPT-5 leads at 74.9%, with Claude Opus 4.1 a hair behind at 74.5% — effectively tied. Gemini 2.5 Pro at 63.8% is a clear step down for autonomous coding work, though it remains excellent for code completion and review.
Pick Claude if you care about clean refactors and conservative changes. Pick GPT-5 if you want the agent to ship the PR.
Writing & long-form prose — Claude
This category resists clean benchmarks, but Claude Opus 4.1 has the strongest reputation among professional writers, technical-doc authors, and long-form journalists. Voice consistency over 50k+ tokens is its differentiator. GPT-5 is sharper at structured writing (JSON, outlines, schemas). Gemini's writing is competent but lacks personality.
Reasoning & math — GPT-5
GPT-5 leads GPQA Diamond (87.3 vs 79.6 vs 84.0) and MATH (96.7 vs 95.0 vs 92.0). Among the three, GPT-5 (and its sibling o3) is the strongest pure reasoner. Gemini holds the middle on natural-science reasoning; Claude Opus 4.1's GPQA score is the weakest of the three.
Long-context (50k+ tokens) — Gemini 2.5 Pro
Only Gemini handles 2 million tokens in a single request. For codebase analysis, multi-document RAG, or video understanding, this is decisive. Even at 200k tokens, Gemini's "needle in a haystack" recall is the strongest of the three.
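The needle-in-a-haystack test is easy to rerun on your own data: plant a known fact at a random depth in long filler text and ask for it back. A rough harness sketch, with placeholder filler and needle, via OpenRouter's OpenAI-compatible API (slug illustrative):

```python
import random
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

NEEDLE = "The vault code is 7-3-9."
# On the order of 200k tokens of placeholder filler; use your own corpus in practice.
sentences = ("Lorem ipsum dolor sit amet. " * 30000).split(". ")
sentences.insert(random.randrange(len(sentences)), NEEDLE)
haystack = ". ".join(sentences)

resp = client.chat.completions.create(
    model="google/gemini-2.5-pro",  # slug illustrative
    messages=[{"role": "user", "content": haystack + "\n\nWhat is the vault code?"}],
)
print(resp.choices[0].message.content)  # should recover "7-3-9"
```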
Cost-sensitive production — GPT-5 or Gemini 2.5 Pro (tied)
GPT-5 and Gemini 2.5 Pro are priced identically at $1.25 input / $10 output per 1M tokens. Claude Opus 4.1 at $15/$75 costs 12× more on input and 7.5× more on output. For high-volume work, the choice between GPT-5 and Gemini comes down to model fit, not cost. Claude Opus is the wrong choice for any cost-sensitive production workload; drop down to Claude Sonnet 4 ($3/$15) instead.
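To make the gap concrete, here's the arithmetic at list prices for an illustrative workload of 50M input / 5M output tokens per month (the volumes are made up; the prices are from the table above):

```python
# Monthly bill at list prices ($ per 1M tokens) for a made-up
# workload of 50M input / 5M output tokens per month.
PRICES = {                      # (input, output)
    "GPT-5":           (1.25, 10.00),
    "Gemini 2.5 Pro":  (1.25, 10.00),
    "Claude Opus 4.1": (15.00, 75.00),
    "Claude Sonnet 4": (3.00, 15.00),
}
IN_M, OUT_M = 50, 5             # millions of tokens

for model, (p_in, p_out) in PRICES.items():
    print(f"{model:16s} ${IN_M * p_in + OUT_M * p_out:,.2f}/mo")
# GPT-5            $112.50/mo
# Gemini 2.5 Pro   $112.50/mo
# Claude Opus 4.1  $1,125.00/mo
# Claude Sonnet 4  $225.00/mo
```

At this input-heavy mix, Opus lands at exactly 10× the GPT-5/Gemini bill; a more output-heavy mix trends toward the 7.5× output multiple.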
Tools & structured output — GPT-5
GPT-5's tool calling, JSON mode, and function-call latency are the most polished. Claude's tool use is excellent but slightly slower; Gemini's is improving but still lags on complex multi-tool sequences.
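For reference, this is the shape of call the paragraph is comparing: OpenAI-style function calling, which Claude and Gemini mirror with their own, differently shaped APIs. The tool itself is hypothetical:

```python
import json
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

# A hypothetical tool in the standard OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_issue",
        "description": "Fetch a GitHub issue by number.",
        "parameters": {
            "type": "object",
            "properties": {"number": {"type": "integer"}},
            "required": ["number"],
        },
    },
}]

resp = client.chat.completions.create(
    model="openai/gpt-5",  # swap the slug to compare vendors on the same call
    messages=[{"role": "user", "content": "What does issue 42 say?"}],
    tools=tools,
)
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```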
Cheaper sub-models within each family
If you're cost-sensitive but want the same family, drop a tier:
| Family | Sub-model | MMLU-Pro | $ in / out |
|---|---|---|---|
| OpenAI | GPT-5 mini | 80.1 | $0.25 / $2.00 |
| Anthropic | Claude Sonnet 4 | 84.0 | $3.00 / $15.00 |
| Google | Gemini 2.5 Flash | 79.0 | $0.30 / $2.50 |
For most production work, Claude Sonnet 4 / GPT-5 mini / Gemini 2.5 Flash deliver close to flagship quality at roughly 4–5× lower price. Default to mid-tier unless you've measured a quality gap that hurts you.
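"Measured a quality gap" can be as cheap as replaying a sample of real production prompts through both tiers and comparing answers. A bare-bones sketch; the prompt list, slugs, and comparison step are placeholders you'd replace:

```python
from openai import OpenAI

client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="sk-or-...")

PROMPTS = ["..."]  # placeholder: sample real prompts from your logs

def ask(model: str, prompt: str) -> str:
    r = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return r.choices[0].message.content

for prompt in PROMPTS:
    flagship = ask("anthropic/claude-opus-4.1", prompt)  # slugs illustrative
    mid_tier = ask("anthropic/claude-sonnet-4", prompt)
    # Crude comparison: print both and eyeball, or swap in a rubric/LLM judge.
    print(f"--- {prompt[:40]}\nOpus:   {flagship}\nSonnet: {mid_tier}")
```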
What about the open-source frontier?
If you've settled on a frontier closed model but want to know what you're giving up by skipping open weights: DeepSeek R1 (composite 78.0) is the closest open-weights model to this trio, sitting roughly between GPT-5 mini and Gemini 2.5 Flash. See best open-source LLM for the full picture.
The verdict, simplified
- "I'm building one app, pick the default" → GPT-5. Best generalist with the strongest tool ecosystem and tied-cheapest at this tier.
- "My users care about prose quality" → Claude Opus 4.1. Premium price, premium writing.
- "I need 2M tokens of context, video, or audio in" → Gemini 2.5 Pro. Only viable choice for very long inputs and native multimodal.
- "I haven't decided" → ship on OpenRouter so you can switch with a string change.
Frequently asked questions
Which is better in 2026: ChatGPT, Claude, or Gemini?
All three sit at the top of the llmrank.top composite within ~4 points of each other. The gap is small enough that fit for use case matters more than score. Coding → GPT-5. Writing → Claude Opus 4.1. Long context / multimodal → Gemini 2.5 Pro.
Which is cheapest?
GPT-5 and Gemini 2.5 Pro are tied at $1.25 input / $10 output per 1M tokens. Claude Opus 4.1 at $15/$75 costs 12× more on input and 7.5× more on output. For mid-tier, GPT-5 mini ($0.25/$2.00) and Gemini 2.5 Flash ($0.30/$2.50) are the value picks; Claude Sonnet 4 ($3/$15) is roughly 10× pricier but stronger on coding.
Which has the longest context window?
Gemini 2.5 Pro at 2 million tokens — 5× GPT-5 and 10× Claude Opus 4.1. For very long inputs (codebases, multi-PDF, hour-long video), Gemini is the only frontier choice.
Which is best for ChatGPT-style apps?
If you want the literal ChatGPT experience, GPT-5 is the model behind it. If you want similar quality at lower cost, GPT-5 mini ($0.25/$2.00) is 5× cheaper and only ~6 composite points lower.
Should I use ChatGPT, Claude, or Gemini for code?
For autonomous coding agents (SWE-Bench), GPT-5 leads narrowly over Claude Opus 4.1, and both clearly beat Gemini. For inline code completion, all three are excellent — quality differences disappear at the token-by-token level.
Methodology and sources: see About. Spotted an error? Open an issue.
Get the weekly LLM digest
Frontier-model price drops, leaderboard movements, and the one chart that mattered this week. No spam.
Or follow updates on GitHub.