Compare

LLM Model Comparison 2026

A filterable comparison page covering context window, pricing, Arena rating, coding, and reasoning performance for the mainstream models.

Providers covered: 10

Open-source models: 9

Largest context: 10M

26 models

| Model | Provider | Released | Context | Price ($/1M, input/output) | Arena ELO | Coding | Reasoning | Speed | License | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | OpenAI | 2026-03 | 1M | $3/$15 | 1560 | unknown | unknown | ~50 t/s | Closed | Unifies Codex + GPT; 1M context; built-in computer use |
| Claude Opus 4.6 | Anthropic | 2026-02 | 1M | $5/$25 | 1549 | 80.8% SWE-bench | 65.4% Terminal-Bench | ~40 t/s | Closed | #1 Arena Hard Prompts & Coding; 128K max output |
| DeepSeek R1 | DeepSeek | 2025-01 | 128K | $0.55/$2.19 | 1500 | unknown | #1 Math & Coding Arena | ~45 t/s | Open | 671B MoE (37B active); MIT license; distilled variants available |
| Gemini 3.1 Flash Lite | Google | 2026-03 | 1M | $0.10/$0.40 | 1492 | unknown | unknown | ~200 t/s | Closed | #3 Arena overall; #1 creative writing; ultra-fast |
| Gemini 2.5 Pro | Google | 2025-03 | 1M | $1.25/$10 | 1470 | 75.6% LiveCodeBench | 84.6% GPQA Diamond | ~60 t/s | Closed | Thinking model; top WebDev Arena 1415; native multimodal |
| Claude Sonnet 4.6 | Anthropic | 2026-02 | 1M | $3/$15 | 1440 | 79.6% SWE-bench | 72.5% OSWorld | ~80 t/s | Closed | Best value frontier; beats Opus 4.5 in 59% head-to-head |
| Qwen 3 235B | Alibaba | 2025-04 | 128K | $0.86/$2 | 1422 | 70.7% LiveCodeBench | 2056 CodeForces ELO | ~65 t/s | Open | 235B MoE (22B active); Apache 2.0; strongest OSS competitive programming |
| Mistral Large 3 | Mistral | 2025-12 | 256K | $0.5/$1.5 | 1418 | unknown | 43.9% GPQA Diamond | ~70 t/s | Open | 675B MoE (41B active); Apache 2.0; best cost-efficiency frontier |
| o3 | OpenAI | 2025-04 | 200K | $10/$40 | 1390 | unknown | unknown | ~30 t/s | Closed | Strongest OpenAI reasoning model |
| Claude Opus 4.5 | Anthropic | 2025-11 | 200K | $5/$25 | 1380 | 80.9% SWE-bench | unknown | ~35 t/s | Closed | Major price cut from Opus 4; strong agentic coding |
| Kimi K2 | Moonshot | 2025-07 | 128K | $0.55/$2.2 | 1380 | 65.8% SWE-bench | 60.2% BrowseComp | ~50 t/s | Open | 1T params; Agent Swarm (100 agents); Modified MIT |
| DeepSeek V3.2 | DeepSeek | 2026-02 | 128K | $0.28/$0.42 | 1380 | unknown | unknown | ~80 t/s | Open | ~90% GPT-5.4 quality at 1/50th cost; best value model |
| Grok 3 | xAI | 2025-02 | 131K | $3/$15 | 1370 | unknown | 93.3% AIME 2025 | ~55 t/s | Closed | Strong math/science; now legacy (Grok 4 series launched) |
| GPT-4o | OpenAI | 2024-05 | 128K | $2.5/$10 | 1340 | 30.8% SWE-bench | unknown | ~100 t/s | Closed | Legacy but still available; superseded by GPT-5 family |
| Grok 4 | xAI | 2026-01 | 256K | $5/$25 | 1340 | unknown | unknown | ~45 t/s | Closed | Top-5 Arena; strong reasoning & real-time X data |
| Gemini 2.5 Flash | Google | 2025-03 | 1M | $0.30/$2.50 | 1330 | unknown | unknown | ~150 t/s | Closed | Cheapest frontier model at scale |
| Claude Haiku 4.5 | Anthropic | 2025-10 | 200K | $0.8/$4 | 1290 | unknown | unknown | ~120 t/s | Closed | Fastest Claude, cheapest tier |
| Claude Opus 4.7 | Anthropic | - | 128K | - | N/A | - | - | - | Closed | Opus 4.7 features a new tokenizer that inflates token counts by 35-45%. |
| GPT-5.5 | OpenAI | - | 128K | - | N/A | - | - | - | Closed | OpenAI's latest model, GPT-5.5, offers advanced capabilities for coding and complex tasks. |
| Llama 4 Scout | Meta | 2025-04 | 10M | $0.08/$0.3 | N/A | unknown | unknown | ~90 t/s | Open | 10M context industry record; 109B MoE (17B active) |
| Muse Spark | Meta | - | N/A | - | N/A | - | - | - | Closed | Muse Spark achieves its reasoning capabilities using over an order of magnitude less compute than Llama 4 Maverick. |
| MiMo V2 | Xiaomi | 2026-02 | 256K | Free | N/A | unknown | unknown | ~70 t/s | Open | Free coding model; 256K context; open weights |
| Gemini 3.2 Flash | Google | - | 128K | - | N/A | - | - | - | Closed | Faster than Gemini 3.1 Pro with improved performance. |
| Devstral 2 | Mistral | 2026-01 | 256K | $0.05/$0.22 | N/A | unknown | unknown | ~100 t/s | Open | Cheapest agentic coding model; 256K context |
| DeepSeek V4-Pro | DeepSeek | - | 128K | - | N/A | - | - | - | Closed | DeepSeek V4-Pro offers a significant price reduction, making it one of the most cost-effective options in the market. |
| Llama 4 Maverick | Meta | 2025-04 | 1M | $0.15/$0.6 | N/A | unknown | unknown | ~60 t/s | Open | 400B MoE (17B active); strong multimodal; open weights |

- = no value listed.
Legend: Arena ELO 1380+ · Arena ELO 1350–1379 · Arena ELO <1350 · Best = best value in that column
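
The legend's three Arena ELO bands and the page's filtering can be sketched in a few lines. This is only an illustration, not the page's actual implementation: the row dictionaries, the `arena_tier` and `filter_rows` names, the "Open"/"Closed" license labels, and the "unrated" label for models without a score are all assumptions introduced here. The sample rows are copied from the table above.

```python
# A few rows copied from the table; arena_elo is None where the table shows N/A.
ROWS = [
    {"model": "GPT-5.4", "license": "Closed", "arena_elo": 1560},
    {"model": "DeepSeek R1", "license": "Open", "arena_elo": 1500},
    {"model": "Grok 3", "license": "Closed", "arena_elo": 1370},
    {"model": "GPT-4o", "license": "Closed", "arena_elo": 1340},
    {"model": "Llama 4 Scout", "license": "Open", "arena_elo": None},
]


def arena_tier(elo):
    """Map an Arena ELO score to one of the legend's three bands."""
    if elo is None:
        return "unrated"  # assumed label; the legend does not name this case
    if elo >= 1380:
        return "1380+"
    if elo >= 1350:
        return "1350-1379"
    return "<1350"


def filter_rows(rows, license=None, min_elo=None):
    """Keep rows that match the license and meet an optional ELO floor."""
    kept = []
    for row in rows:
        if license is not None and row["license"] != license:
            continue
        elo = row["arena_elo"]
        if min_elo is not None and (elo is None or elo < min_elo):
            continue
        kept.append(row)
    return kept


open_models = [r["model"] for r in filter_rows(ROWS, license="Open")]
tiers = {r["model"]: arena_tier(r["arena_elo"]) for r in ROWS}
```

Rows without an Arena score are dropped whenever an ELO floor is set, which is one possible policy; a real page might instead show them last.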

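Benchmark figures are approximate; sources include public leaderboards (LMSYS Chatbot Arena) and official documentation. Prices are shown in US dollars per million tokens, as input/output. Speed is an estimate in tokens/sec and varies by provider. Data is updated automatically from the database.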
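
Because prices are quoted per million tokens as an input/output pair, the cost of a single request follows from simple arithmetic. The sketch below assumes the price string format used in the table (for example `$3/$15`) and treats `Free` as zero on both sides; `parse_price` and `request_cost` are hypothetical helper names, and the token counts are made-up inputs.

```python
def parse_price(price):
    """Split a '$in/$out' string into (input, output) USD per 1M tokens."""
    if price == "Free":
        return 0.0, 0.0  # assumption: a free model costs nothing either way
    input_part, output_part = price.split("/")
    return float(input_part.lstrip("$")), float(output_part.lstrip("$"))


def request_cost(price, input_tokens, output_tokens):
    """Estimate the USD cost of one request from a per-million-token price pair."""
    input_rate, output_rate = parse_price(price)
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# 200,000 input tokens and 10,000 output tokens at $3/$15:
# (200000 * 3 + 10000 * 15) / 1,000,000 = 0.75 USD
cost = request_cost("$3/$15", 200_000, 10_000)
```

Output tokens are priced several times higher than input tokens for most rows in the table, so the input/output mix of a workload matters as much as the headline rate.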