LLM Rank.top


ChatGPT vs Claude vs Gemini in 2026

GPT-5 vs Claude Opus 4.1 vs Gemini 2.5 Pro — benchmarks, pricing, context, and a clear verdict by use case. The frontier three are closer than ever; the right choice depends on what you're shipping.

Try every model in this guide from one API key.

OpenRouter routes GPT-5, Claude, Gemini, DeepSeek, Llama, Qwen and 100+ other LLMs behind a single key — pay-as-you-go, no monthly minimum, transparent per-token pricing. Try OpenRouter → (affiliate · supports this site)

The headline numbers

| Metric | GPT-5 | Claude Opus 4.1 | Gemini 2.5 Pro |
|---|---|---|---|
| Arena Elo | 1410 | 1390 | 1380 |
| MMLU-Pro | 86.8 | 87.0 | 86.0 |
| GPQA Diamond | 87.3 | 79.6 | 84.0 |
| MATH | 96.7 | 95.0 | 92.0 |
| HumanEval | 95.1 | 95.4 | 92.0 |
| SWE-Bench Verified | 74.9 | 74.5 | 63.8 |
| Context window | 400k | 200k | 2M |
| Input price ($/1M) | $1.25 | $15.00 | $1.25 |
| Output price ($/1M) | $10.00 | $75.00 | $10.00 |
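The pricing rows translate directly into per-request dollar costs. A minimal sketch, using only the per-1M-token rates from the table above and an illustrative request of 10,000 input tokens and 2,000 output tokens:

```python
# Per-1M-token prices (input, output) from the table above.
PRICES = {
    "GPT-5":           (1.25, 10.00),
    "Claude Opus 4.1": (15.00, 75.00),
    "Gemini 2.5 Pro":  (1.25, 10.00),
}

def request_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one request at the table's per-1M-token rates."""
    p_in, p_out = PRICES[model]
    return (tokens_in * p_in + tokens_out * p_out) / 1_000_000

for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
# GPT-5 and Gemini 2.5 Pro come out to $0.0325 per request; Claude Opus 4.1 to $0.30.
```

At these rates the Claude Opus premium compounds fast at volume, which is why the cost-sensitive verdict below points elsewhere.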
Want to A/B test all three with one API key?

OpenRouter exposes GPT-5, Claude Opus 4.1, and Gemini 2.5 Pro behind a single API — same price as direct, no per-provider invoices. Try OpenRouter → (affiliate)
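Because OpenRouter speaks the OpenAI-compatible chat-completions format, an A/B/C test is the same payload three times with only the model field changed. A sketch of that; the model slugs below are assumptions, so check OpenRouter's model list for the exact identifiers before sending anything:

```python
import json

# Assumed model slugs -- verify against OpenRouter's published model list.
MODELS = [
    "openai/gpt-5",
    "anthropic/claude-opus-4.1",
    "google/gemini-2.5-pro",
]

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str) -> dict:
    """One OpenAI-style chat-completion payload; only `model` varies per test arm."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# Same prompt, three payloads, one endpoint, one API key.
payloads = [build_request(m, "Summarize RFC 2119 in one sentence.") for m in MODELS]
print(json.dumps(payloads[0], indent=2))
# To send: POST each payload to OPENROUTER_URL with the header
#   Authorization: Bearer $OPENROUTER_API_KEY
```

Since the response shape is identical across arms, scoring the three outputs side by side needs no per-provider glue code.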

Verdict by use case

Coding agents — GPT-5 (narrow win)

SWE-Bench Verified measures real-world GitHub issue resolution. GPT-5 leads at 74.9%, with Claude Opus 4.1 a hair behind at 74.5% — effectively tied. Gemini 2.5 Pro at 63.8% is a clear step down for autonomous coding work, though it remains excellent for code completion and review.

Pick Claude if you care about clean refactors and conservative changes. Pick GPT-5 if you want the agent to ship the PR.

Writing & long-form prose — Claude

This category resists clean benchmarks, but Claude Opus 4.1 has the strongest reputation among professional writers, technical-doc authors, and long-form journalists. Voice consistency over 50k+ tokens is its differentiator. GPT-5 is sharper at structured writing (JSON, outlines, schemas). Gemini's writing is competent but lacks personality.

Reasoning & math — GPT-5

GPT-5 leads both GPQA Diamond (87.3 vs 79.6 vs 84.0) and MATH (96.7 vs 95.0 vs 92.0). Among the three, GPT-5 and its sibling o3 are the strongest pure reasoners. Gemini holds the middle on natural-science reasoning, while Claude Opus 4.1 posts the weakest GPQA score of the trio.

Long-context (50k+ tokens) — Gemini 2.5 Pro

Only Gemini handles 2 million tokens in a single request. For codebase analysis, multi-document RAG, or video understanding, this is decisive. Even at 200k tokens, Gemini's "needle in a haystack" recall is the strongest of the three.
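The context-window gap is easy to make operational: given a document's token count, check which windows can hold it plus a reply. A sketch using the window sizes from this guide (the 8,000-token reply budget is an illustrative choice):

```python
# Context windows from this guide, in tokens.
CONTEXT = {
    "GPT-5": 400_000,
    "Claude Opus 4.1": 200_000,
    "Gemini 2.5 Pro": 2_000_000,
}

def models_that_fit(doc_tokens: int, reply_budget: int = 8_000) -> list[str]:
    """Models whose window holds the document plus a reply budget."""
    need = doc_tokens + reply_budget
    return [m for m, window in CONTEXT.items() if window >= need]

print(models_that_fit(150_000))   # all three fit a 150k-token codebase dump
print(models_that_fit(500_000))   # only Gemini 2.5 Pro survives at 500k
```

Past roughly 400k tokens of input, the choice stops being a quality comparison and becomes a feasibility check.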

Cost-sensitive production — GPT-5 or Gemini 2.5 Pro (tied)

GPT-5 and Gemini 2.5 Pro are priced identically at $1.25 input / $10 output per 1M tokens. Claude Opus 4.1 at $15/$75 is 12× more expensive on input and 7.5× on output. For high-volume work, the choice between GPT-5 and Gemini comes down to model fit, not cost. Claude Opus is the wrong choice for any cost-sensitive production workload; drop down to Claude Sonnet 4 ($3/$15) instead.

Tools & structured output — GPT-5

GPT-5's tool calling, JSON mode, and function-call latency are the most polished. Claude's tool use is excellent but slightly slower; Gemini's is improving but still lags on complex multi-tool sequences.
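For readers new to tool calling, this is what a tool definition looks like in the OpenAI-style format GPT-5 uses (Claude and Gemini accept close equivalents). The `get_weather` tool itself is hypothetical, purely for illustration:

```python
import json

# Hypothetical tool, declared in the OpenAI-style function-calling format.
GET_WEATHER = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# When the model invokes the tool, it emits the arguments as a JSON string:
raw_args = '{"city": "Oslo", "unit": "celsius"}'
args = json.loads(raw_args)
assert "city" in args  # validate required fields before dispatching
print(args["city"])
```

The "polish" differences the paragraph describes show up here: how reliably each model emits arguments that parse and satisfy the schema, and how quickly it does so across chained multi-tool calls.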

Cheaper sub-models within each family

If you're cost-sensitive but want the same family, drop a tier:

| Family | Sub-model | MMLU-Pro | $ in / out |
|---|---|---|---|
| OpenAI | GPT-5 mini | 80.1 | $0.25 / $2.00 |
| Anthropic | Claude Sonnet 4 | 84.0 | $3.00 / $15.00 |
| Google | Gemini 2.5 Flash | 79.0 | $0.30 / $2.50 |

For most production work, Claude Sonnet 4 / GPT-5 mini / Gemini 2.5 Flash deliver close to flagship quality at roughly 4–5× lower price within each family. Default to mid-tier unless you've measured a quality gap that hurts you.
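"Default to mid-tier unless you've measured a quality gap" can be stated as a one-line policy. A sketch; the scores are illustrative eval results on your own task, and the 2-point tolerance is an arbitrary threshold you'd tune:

```python
def pick_tier(mid_score: float, flagship_score: float,
              max_gap: float = 0.02) -> str:
    """Use the mid-tier model unless the flagship beats it by more than max_gap."""
    return "mid-tier" if flagship_score - mid_score <= max_gap else "flagship"

print(pick_tier(0.89, 0.90))  # gap within tolerance -> "mid-tier"
print(pick_tier(0.80, 0.90))  # gap too large -> "flagship"
```

The point of writing it down: the tier decision should hinge on a measured gap on your workload, not on leaderboard position.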

What about the open-source frontier?

If you've settled on a frontier closed model but want to know what you're giving up by skipping open weights: DeepSeek R1 (composite 78.0) is the closest open-weights model to this trio, sitting roughly between GPT-5 mini and Gemini 2.5 Flash. See best open-source LLM for the full picture.

The verdict, simplified

Coding agents → GPT-5. Writing and long-form prose → Claude Opus 4.1. Long context and multimodal → Gemini 2.5 Pro. Cost-sensitive production → GPT-5 or Gemini 2.5 Pro at the flagship tier, or drop to the mid-tier models above.

Frequently asked questions

Which is better in 2026: ChatGPT, Claude, or Gemini?

All three sit at the top of the llmrank.top composite within ~4 points of each other. The gap is small enough that fit for use case matters more than score. Coding → GPT-5. Writing → Claude Opus 4.1. Long context / multimodal → Gemini 2.5 Pro.

Which is cheapest?

GPT-5 and Gemini 2.5 Pro are tied at $1.25 input / $10 output per 1M tokens. Claude Opus 4.1 at $15/$75 costs 12× more on input and 7.5× more on output. For mid-tier, GPT-5 mini ($0.25/$2.00) and Gemini 2.5 Flash ($0.30/$2.50) are the value picks; Claude Sonnet 4 ($3/$15) is roughly 10× pricier but stronger on coding.

Which has the longest context window?

Gemini 2.5 Pro at 2 million tokens — 5× GPT-5 and 10× Claude Opus 4.1. For very long inputs (codebases, multi-PDF, hour-long video), Gemini is the only frontier choice.

Which is best for ChatGPT-style apps?

If you want the literal ChatGPT experience, GPT-5 is the model behind it. If you want similar quality at lower cost, GPT-5 mini ($0.25/$2.00) is 5× cheaper and only ~6 composite points lower.

Should I use ChatGPT, Claude, or Gemini for code?

For autonomous coding agents (SWE-Bench), GPT-5 leads narrowly over Claude Opus 4.1, and both clearly beat Gemini. For inline code completion, all three are excellent — quality differences disappear at the token-by-token level.


Methodology and sources: see About. Spotted an error? Open an issue.
