
LLM Model Comparison 2026

A filterable comparison page that gathers context window, pricing, Arena, coding, and reasoning results for the mainstream models.

Providers covered

Open-source models

Largest context: 10M

26 models

Model	Provider	Released	Context	Price ($/1M tokens, input/output)	Arena ELO	Coding	Reasoning	Speed	License	Notes
GPT-5.4	OpenAI	2026-03	1M	$3/$15	1560	unknown	unknown	~50 t/s	Closed	Unifies Codex + GPT; 1M context; built-in computer use
Claude Opus 4.6	Anthropic	2026-02	1M	$5/$25	1549	80.8% SWE-bench	65.4% Terminal-Bench	~40 t/s	Closed	#1 Arena Hard Prompts & Coding; 128K max output
DeepSeek R1	DeepSeek	2025-01	128K	$0.55/$2.19	1500	unknown	#1 Math & Coding Arena	~45 t/s	Open	671B MoE (37B active); MIT license; distilled variants available
Gemini 3.1 Flash Lite	Google	2026-03	1M	$0.10/$0.40	1492	unknown	unknown	~200 t/s	Closed	#3 Arena overall; #1 creative writing; ultra-fast
Gemini 2.5 Pro	Google	2025-03	1M	$1.25/$10	1470	75.6% LiveCodeBench	84.6% GPQA Diamond	~60 t/s	Closed	Thinking model; top WebDev Arena 1415; native multimodal
Claude Sonnet 4.6	Anthropic	2026-02	1M	$3/$15	1440	79.6% SWE-bench	72.5% OSWorld	~80 t/s	Closed	Best value frontier; beats Opus 4.5 in 59% head-to-head
Qwen 3 235B	Alibaba	2025-04	128K	$0.86/$2	1422	70.7% LiveCodeBench	2056 CodeForces ELO	~65 t/s	Open	235B MoE (22B active); Apache 2.0; strongest OSS competitive programming
Mistral Large 3	Mistral	2025-12	256K	$0.5/$1.5	1418	unknown	43.9% GPQA Diamond	~70 t/s	Open	675B MoE (41B active); Apache 2.0; best cost-efficiency frontier
o3	OpenAI	2025-04	200K	$10/$40	1390	unknown	unknown	~30 t/s	Closed	Strongest OpenAI reasoning model
Claude Opus 4.5	Anthropic	2025-11	200K	$5/$25	1380	80.9% SWE-bench	unknown	~35 t/s	Closed	Major price cut from Opus 4; strong agentic coding
Kimi K2	Moonshot	2025-07	128K	$0.55/$2.2	1380	65.8% SWE-bench	60.2% BrowseComp	~50 t/s	Open	1T params; Agent Swarm (100 agents); Modified MIT
DeepSeek V3.2	DeepSeek	2026-02	128K	$0.28/$0.42	1380	unknown	unknown	~80 t/s	Open	~90% GPT-5.4 quality at 1/50th cost; best value model
Grok 3	xAI	2025-02	131K	$3/$15	1370	unknown	93.3% AIME 2025	~55 t/s	Closed	Strong math/science; now legacy (Grok 4 series launched)
GPT-4o	OpenAI	2024-05	128K	$2.5/$10	1340	30.8% SWE-bench	unknown	~100 t/s	Closed	Legacy but still available; superseded by GPT-5 family
Grok 4	xAI	2026-01	256K	$5/$25	1340	unknown	unknown	~45 t/s	Closed	Top-5 Arena; strong reasoning & real-time X data
Gemini 2.5 Flash	Google	2025-03	1M	$0.30/$2.50	1330	unknown	unknown	~150 t/s	Closed	Cheapest frontier model at scale
Claude Haiku 4.5	Anthropic	2025-10	200K	$0.8/$4	1290	unknown	unknown	~120 t/s	Closed	Fastest Claude, cheapest tier
Claude Opus 4.7	Anthropic		128K		N/A				Closed	Opus 4.7 features a new tokenizer that inflates token counts by 35-45%.
GPT-5.5	OpenAI		128K		N/A				Closed	OpenAI's latest model, GPT-5.5, offers advanced capabilities for coding and complex tasks.
Llama 4 Scout	Meta	2025-04	10M	$0.08/$0.3	N/A	unknown	unknown	~90 t/s	Open	10M context industry record; 109B MoE (17B active)
Muse Spark	Meta		N/A		N/A				Closed	Muse Spark achieves its reasoning capabilities using over an order of magnitude less compute than Llama 4 Maverick.
MiMo V2	Xiaomi	2026-02	256K	Free	N/A	unknown	unknown	~70 t/s	Open	Free coding model; 256K context; open weights
Gemini 3.2 Flash	Google		128K		N/A				Closed	Faster than Gemini 3.1 Pro with improved performance.
Devstral 2	Mistral	2026-01	256K	$0.05/$0.22	N/A	unknown	unknown	~100 t/s	Open	Cheapest agentic coding model; 256K context
DeepSeek V4-Pro	DeepSeek		128K		N/A				Closed	DeepSeek V4-Pro offers a significant price reduction, making it one of the most cost-effective options in the market.
Llama 4 Maverick	Meta	2025-04	1M	$0.15/$0.6	N/A	unknown	unknown	~60 t/s	Open	400B MoE (17B active); strong multimodal; open weights

Legend: Arena ELO 1380+ · Arena ELO 1350–1379 · Arena ELO <1350 · "Best" = best value in that column
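The three Arena ELO bands in the legend can be reproduced with a small helper. This is only a sketch: the function name `arena_tier` and the returned labels are illustrative assumptions, not something the page defines, and rows listed as N/A are represented here as `None`.

```python
from typing import Optional


def arena_tier(elo: Optional[int]) -> Optional[str]:
    """Map an Arena ELO score to one of the three legend bands.

    Hypothetical helper: the band boundaries come from the legend,
    the label strings are chosen here for illustration.
    """
    if elo is None:  # rows shown as N/A in the table have no band
        return None
    if elo >= 1380:
        return "1380+"
    if elo >= 1350:
        return "1350-1379"
    return "<1350"


# A few scores taken from the table above.
for name, elo in [("GPT-5.4", 1560), ("Grok 3", 1370), ("GPT-4o", 1340)]:
    print(name, arena_tier(elo))
```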

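Benchmark figures are approximate; sources include public leaderboards (LMSYS Chatbot Arena) and official documentation. Prices are shown in US dollars per million tokens, as input/output. Speed is an estimate in tokens/sec and varies by provider. Data is updated automatically from the database.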
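To make the input/output price format concrete, here is a minimal sketch of turning a price cell such as `$3/$15` into an estimated cost for one request. The helper names `parse_price` and `request_cost` are hypothetical, and the token counts in the usage line are made-up illustration values. Treating a `Free` cell as zero cost is also an assumption.

```python
def parse_price(price: str) -> tuple[float, float]:
    """Split a '$input/$output' cell into two rates in USD per 1M tokens.

    Assumption: a cell reading 'Free' is treated as zero for both rates.
    """
    if price.strip().lower() == "free":
        return 0.0, 0.0
    input_part, output_part = price.split("/")
    return float(input_part.strip().lstrip("$")), float(output_part.strip().lstrip("$"))


def request_cost(price: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from a per-1M-token price cell."""
    input_rate, output_rate = parse_price(price)
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Illustrative only: 200K input tokens and 50K output tokens at $3/$15.
cost = request_cost("$3/$15", 200_000, 50_000)
print(f"${cost:.2f}")
```

Because output tokens are priced several times higher than input tokens for most rows, the split between the two matters more than the total token count when comparing models.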