Leaderboard · Compare · Qwen2.5-Coder 32B vs Codestral 25.01 · Updated 2026-05-10

Qwen2.5-Coder 32B vs Codestral 25.01

Qwen2.5-Coder 32B has the better-documented benchmark profile (composite 68.8). Codestral 25.01 is harder to compare quantitatively.

Qwen2.5-Coder 32B · composite 68.8 Codestral 25.01 · composite — open-weights vs general-purpose

Try Qwen2.5-Coder 32B → Try Codestral 25.01 → A/B test both via OpenRouter →

At a glance

Spec	Qwen2.5-Coder 32B	Codestral 25.01
Provider	Alibaba	Mistral AI
Released	2024-11	2025-01
Tier	open-weights	general-purpose
License	Open · Apache-2.0	Closed
Context window	131.072k	256k
$ in / out (per 1M)	$0.18 / $0.18	$0.30 / $0.90

Benchmark scoreboard

Higher is better on every benchmark. Δ shows Qwen2.5-Coder 32B − Codestral 25.01.

Benchmark	Qwen2.5-Coder 32B	Codestral 25.01	Δ
Chatbot Arena Elo	N/A	N/A	—
MMLU-Pro	68.4	N/A	—
GPQA Diamond	40.0	N/A	—
MATH	83.1	N/A	—
HumanEval	92.7	86.6	+6.1
SWE-Bench Verified	N/A	N/A	—

Numbers compiled from provider technical reports and Chatbot Arena snapshots — see methodology.

Don't pick blind — A/B test both models on the same API key.

OpenRouter routes Qwen2.5-Coder 32B, Codestral 25.01, and 100+ other LLMs behind a single API key — pay-as-you-go, no monthly minimum, fallback if a provider is down. Try OpenRouter → (affiliate · supports this site)

Qwen2.5-Coder 32B vs Codestral 25.01: where each one wins

Qwen2.5-Coder 32B is stronger on

HumanEval

Codestral 25.01 is stronger on

No benchmarks where Codestral 25.01 beats Qwen2.5-Coder 32B with comparable data.

Cost comparison

At 10M tokens/day (50/50 split), Qwen2.5-Coder 32B costs ~$1.80/day vs $6.00/day for Codestral 25.01 — Qwen2.5-Coder 32B is the cheaper pick at this volume.

Verdict

Qwen2.5-Coder 32B has the better-documented benchmark profile (composite 68.8). Codestral 25.01 is harder to compare quantitatively.

If you can only pick one and your workload is unclear, route via OpenRouter and switch by request — same key, no lock-in.

Frequently asked questions

Which is better, Qwen2.5-Coder 32B or Codestral 25.01?

Qwen2.5-Coder 32B has the better-documented benchmark profile (composite 68.8). Codestral 25.01 is harder to compare quantitatively. Qwen2.5-Coder 32B wins on HumanEval; Codestral 25.01 wins on no benchmarks.

What does Qwen2.5-Coder 32B cost compared to Codestral 25.01?

At 10M tokens/day (50/50 split), Qwen2.5-Coder 32B costs ~$1.80/day vs $6.00/day for Codestral 25.01 — Qwen2.5-Coder 32B is the cheaper pick at this volume.

What is the context window of Qwen2.5-Coder 32B vs Codestral 25.01?

Qwen2.5-Coder 32B: 131.072k tokens. Codestral 25.01: 256k tokens. Codestral 25.01 has the larger window — useful for long-document RAG and full-codebase prompting.

Is Qwen2.5-Coder 32B or Codestral 25.01 open source?

Qwen2.5-Coder 32B: open weights (Apache-2.0). Codestral 25.01: closed / proprietary.

Can I try Qwen2.5-Coder 32B and Codestral 25.01 on the same API key?

Yes — OpenRouter routes both models behind a single key, so you can A/B test Qwen2.5-Coder 32B against Codestral 25.01 without juggling provider accounts.

Model deep-dives: Qwen2.5-Coder 32B · Codestral 25.01 · Full leaderboard

Spotted out-of-date numbers? Open an issue — corrections usually ship within 24h.

Try Qwen2.5-Coder 32B and Codestral 25.01 now

One API key, both models — switch between them per request and let real traffic pick the winner.

Try Qwen2.5-Coder 32B → Try Codestral 25.01 → A/B test both via OpenRouter →