[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-gemini-3-1-pro-googles-top-model-in-numbers-zh":3,"tags-gemini-3-1-pro-googles-top-model-in-numbers-zh":34,"related-lang-gemini-3-1-pro-googles-top-model-in-numbers-zh":50,"related-posts-gemini-3-1-pro-googles-top-model-in-numbers-zh":54,"series-model-release-5a3c6417-77a9-4526-bee5-c355979576f2":91},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":22,"translated_content":10,"views":23,"is_premium":24,"created_at":25,"updated_at":25,"cover_image":11,"published_at":26,"rewrite_status":27,"rewrite_error":10,"rewritten_from_id":28,"slug":29,"category":30,"related_article_id":31,"status":32,"google_indexed_at":33,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":10,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":24},"5a3c6417-77a9-4526-bee5-c355979576f2","Gemini 3.1 Pro 數字看真實力","\u003Cp>Google DeepMind 的 \u003Ca href=\"https:\u002F\u002Fgemini3.us\u002Fgemini-3.1-pro\" target=\"_blank\" rel=\"noopener\">Gemini 3.1 Pro\u003C\u002Fa> 這次很直接。它在 \u003Cstrong>ARC-AGI-2\u003C\u002Fstrong> 拿到 \u003Cstrong>77.1%\u003C\u002Fstrong>，在 \u003Cstrong>GPQA Diamond\u003C\u002Fstrong> 拿到 \u003Cstrong>94.3%\u003C\u002Fstrong>，\u003Cstrong>SWE-Bench Verified\u003C\u002Fstrong> 也有 \u003Cstrong>80.6%\u003C\u002Fstrong>。更扯的是，它還塞進了 \u003Cstrong>1,048,576 token\u003C\u002Fstrong> 的上下文窗口。\u003C\u002Fp>\u003Cp>講白了，這不是只會聊天的模型。它更像一台可以吞整包資料的工作機。上線時間是 \u003Cstrong>2026 年 2 月 19 日\u003C\u002Fstrong>。價格也沒亂漲，還是 Gemini 3 的規格：\u003Cstrong>每 100 萬 input token 2 美元\u003C\u002Fstrong>，\u003Cstrong>每 100 萬 output token 12 美元\u003C\u002Fstrong>。\u003C\u002Fp>\u003Cp>對台灣開發者來說，這種組合很有感。因為很多團隊卡住，不是卡在模型不會答，而是卡在上下文太短、切資料太麻煩、成本又太高。Gemini 3.1 Pro 
的賣點，就是把這三件事一起往前推。\u003C\u002Fp>\u003Ch2>先看它到底強在哪\u003C\u002Fh2>\u003Cp>先不要被行銷字眼帶走。看數字比較實在。\u003Ca href=\"https:\u002F\u002Fdeepmind.google\u002Ftechnologies\u002Fgemini\u002F\" target=\"_blank\" rel=\"noopener\">Gemini\u003C\u002Fa> 3.1 Pro 的重點，不是單一分數漂亮而已，而是它在推理、科學問答、程式修 bug 這三塊都站得住。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775153580311-vv9w.png\" alt=\"Gemini 3.1 Pro 數字看真實力\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>1M token 上下文，對長文件工作流很有用。你可以把整個 codebase、設計文件、測試紀錄、API 規格一起丟進去。以前要拆成 10 段，現在可能 1 次就夠。這會直接影響 agent 設計。\u003C\u002Fp>\u003Cp>它還支援最多 \u003Cstrong>65,536 output token\u003C\u002Fstrong>。這點常被忽略，但很重要。因為很多模型不是不會想，而是寫到一半就斷掉。對重構程式、產出長報告、整理研究材料來說，這很煩。\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>ARC-AGI-2：\u003C\u002Fstrong>77.1%\u003C\u002Fli>\u003Cli>\u003Cstrong>GPQA Diamond：\u003C\u002Fstrong>94.3%\u003C\u002Fli>\u003Cli>\u003Cstrong>SWE-Bench Verified：\u003C\u002Fstrong>80.6%\u003C\u002Fli>\u003Cli>\u003Cstrong>上下文窗口：\u003C\u002Fstrong>1,048,576 token\u003C\u002Fli>\u003Cli>\u003Cstrong>輸出長度：\u003C\u002Fstrong>最多 65,536 token\u003C\u002Fli>\u003Cli>\u003Cstrong>定價：\u003C\u002Fstrong>input 2 美元、output 12 美元\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>為什麼這組 benchmark 很有意思\u003C\u002Fh2>\u003Cp>有些模型只會刷題。有些模型只會寫得像人話。Gemini 3.1 Pro 的數字比較像是三路都想打。\u003Ca href=\"https:\u002F\u002Farcprize.org\u002Farc-agi\" target=\"_blank\" rel=\"noopener\">ARC-AGI-2\u003C\u002Fa> 看抽象推理，\u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fopenai\u002FGPQA\" target=\"_blank\" rel=\"noopener\">GPQA Diamond\u003C\u002Fa> 看研究型知識，\u003Ca href=\"https:\u002F\u002Fwww.swebench.com\u002F\" target=\"_blank\" rel=\"noopener\">SWE-Bench Verified\u003C\u002Fa> 看真實 repo 修 bug。這三個放一起，才看得出模型有沒有料。\u003C\u002Fp>\u003Cp>Google 也丟了幾個更偏實戰的數字。像是 \u003Cstrong>LiveCodeBench Pro 2887 Elo\u003C\u002Fstrong>、\u003Cstrong>MCP Atlas 
69.2%\u003C\u002Fstrong>、\u003Cstrong>BrowseComp 85.9%\u003C\u002Fstrong>。這些分數不算好記，但意思很清楚。它在 coding、工具協作、網頁研究這三件事上，都有不錯表現。\u003C\u002Fp>\u003Cp>我覺得這比單看聊天品質更重要。因為現在很多 AI 專案，最後都會走向 agent。你不是只問它一句話。你是要它找資料、跑工具、改程式、驗證結果。這時候 benchmark 就不是裝飾品，而是成本預測工具。\u003C\u002Fp>\u003Cblockquote>\"The ultimate goal is to build a universal assistant.\" — Demis Hassabis\u003C\u002Fblockquote>\u003Cp>這句話來自 \u003Ca href=\"https:\u002F\u002Fwww.theverge.com\u002F2024\u002F2\u002F8\u002F24065507\u002Fgoogle-deepmind-demis-hassabis-interview-gemini-ai\" target=\"_blank\" rel=\"noopener\">The Verge\u003C\u002Fa> 對 Demis Hassabis 的訪談。講得很白。Google 想做的不是單純聊天機，而是能處理工作流的通用助理。\u003C\u002Fp>\u003Ch2>價格、競品、還有誰比較划算\u003C\u002Fh2>\u003Cp>很多人只看分數，這很容易中招。真正在意成本的團隊，會先算 token 單價。Gemini 3.1 Pro 的 input 是 \u003Cstrong>$2 \u002F 1M tokens\u003C\u002Fstrong>，output 是 \u003Cstrong>$12 \u002F 1M tokens\u003C\u002Fstrong>。這個價格在長上下文模型裡，算很能打。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775153592714-r1rd.png\" alt=\"Gemini 3.1 Pro 數字看真實力\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>拿 \u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fclaude\" target=\"_blank\" rel=\"noopener\">Claude\u003C\u002Fa> 來比，頁面上的數字顯示 \u003Ca href=\"https:\u002F\u002Fwww.anthropic.com\u002Fclaude\" target=\"_blank\" rel=\"noopener\">Claude Opus 4.6\u003C\u002Fa> 是 \u003Cstrong>$15 input\u003C\u002Fstrong>、\u003Cstrong>$75 output\u003C\u002Fstrong>。差距很大。對每天跑大量摘要、比對文件、生成程式碼的團隊來說，這不是小錢。\u003C\u002Fp>\u003Cp>再看 \u003Ca href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fgpt-5-4\u002F\" target=\"_blank\" rel=\"noopener\">GPT-5.4\u003C\u002Fa>。Google 在頁面上列出幾個對照。Gemini 3.1 Pro 在 \u003Cstrong>ARC-AGI-2\u003C\u002Fstrong> 和 \u003Cstrong>GPQA Diamond\u003C\u002Fstrong> 領先。GPT-5.4 則在 \u003Cstrong>OSWorld\u003C\u002Fstrong>、\u003Cstrong>GDPval\u003C\u002Fstrong> 
這類電腦操作和辦公任務上更強。也就是說，沒有誰是全包。\u003C\u002Fp>\u003Cul>\u003Cli>\u003Cstrong>ARC-AGI-2：\u003C\u002Fstrong>Gemini 3.1 Pro 77.1%，GPT-5.4 73.3%，Claude Opus 4.6 68.8%\u003C\u002Fli>\u003Cli>\u003Cstrong>GPQA Diamond：\u003C\u002Fstrong>Gemini 3.1 Pro 94.3%，GPT-5.4 92.8%，Claude Opus 4.6 91.3%\u003C\u002Fli>\u003Cli>\u003Cstrong>SWE-Bench Verified：\u003C\u002Fstrong>Gemini 3.1 Pro 80.6%，GPT-5.2 80.0%，Claude Opus 4.6 80.8%\u003C\u002Fli>\u003Cli>\u003Cstrong>OSWorld：\u003C\u002Fstrong>GPT-5.4 75.0%，Claude Opus 4.6 72.7%，Gemini 3.1 Pro 未列為領先者\u003C\u002Fli>\u003Cli>\u003Cstrong>價格：\u003C\u002Fstrong>Gemini 3.1 Pro 明顯低於 Claude Opus 4.6\u003C\u002Fli>\u003Cli>\u003Cstrong>適合情境：\u003C\u002Fstrong>長文件、研究、程式碼、agent 工作流\u003C\u002Fli>\u003C\u002Ful>\u003Ch2>這對開發者代表什麼\u003C\u002Fh2>\u003Cp>如果你是工程師，這顆模型最有感的地方，不是聊天，而是 workflow。1M token 上下文代表你可以少切很多段。少切段，就少掉很多 prompt 管理成本。這對 code review、repo 分析、規格比對、測試失敗追查都很有幫助。\u003C\u002Fp>\u003Cp>另一個重點是視覺輸出。Google 提到它支援 native SVG 和 3D code rendering。這聽起來有點花，但實際上很實用。你可以直接叫它生圖表、簡單 UI、流程圖，甚至用在內部工具的原型設計。少一輪人工轉譯，就少一輪出錯。\u003C\u002Fp>\u003Cp>它還有三種思考等級：\u003Cstrong>Low\u003C\u002Fstrong>、\u003Cstrong>Medium\u003C\u002Fstrong>、\u003Cstrong>High\u003C\u002Fstrong>。這設計很務實。不是每個問題都要重算一大堆 Token。分類、抽取、簡單摘要，用 Low 就好。複雜 debug 或多步推理，再開 High。這種控制感，對成本控管很重要。\u003C\u002Fp>\u003Cp>如果你要接 API，\u003Ca href=\"https:\u002F\u002Fai.google.dev\u002F\" target=\"_blank\" rel=\"noopener\">Google AI\u003C\u002Fa> 和 \u003Ca href=\"https:\u002F\u002Fcloud.google.com\u002Fvertex-ai\" target=\"_blank\" rel=\"noopener\">Vertex AI\u003C\u002Fa> 會是主要入口。這也代表它比較像企業工具，不是純消費級聊天產品。對團隊來說，這反而是好事，因為你比較容易把它塞進既有系統。\u003C\u002Fp>\u003Ch2>產業脈絡：大上下文已經不是噱頭\u003C\u002Fh2>\u003Cp>大上下文模型這兩年一直在往前推。原因很簡單。企業資料就是碎的。文件在 Confluence，程式碼在 GitHub，聊天紀錄在 Slack，規格在 Notion。你如果每次都要拆來拆去，AI 就很難真的進工作流。\u003C\u002Fp>\u003Cp>所以現在大家比的不只是模型會不會答，而是它能不能一次看懂整包資料。這也是為什麼 1M token 會被拿來當賣點。因為它直接改變了「一次能處理多少上下文」這個基本單位。\u003C\u002Fp>\u003Cp>另一個趨勢是 
agent 化。模型不只回文字，還要會呼叫工具、查資料、改程式、做驗證。這也是為什麼 MCP Atlas、BrowseComp 這類分數會變重要。它們其實在測，模型能不能跟外部工具和平共處。\u003C\u002Fp>\u003Cp>如果你回頭看這波競爭，會發現每家都在搶同一件事：誰能讓 AI 少一點玩具感，多一點工作機感。Gemini 3.1 Pro 這次的數字，至少讓 Google 在這場牌局裡坐到前排。\u003C\u002Fp>\u003Ch2>結論：先別問它會不會取代誰\u003C\u002Fh2>\u003Cp>比較實際的問題是：你的團隊會不會開始把它當預設模型。對長文件、研究、程式碼、agent 工作流來說，我覺得答案很可能是會。因為它的價格、上下文、分數，三個條件湊在一起，真的很難忽視。\u003C\u002Fp>\u003Cp>但如果你的場景是電腦操作、桌面自動化、某些辦公任務，那 GPT-5.4 或 Claude 仍然有機會更合適。講白了，這不是選邊站。這是選工作型態。你要的是推理、成本，還是操作能力，答案會不一樣。\u003C\u002Fp>\u003Cp>我的預測很簡單。接下來 6 到 12 個月，會有更多團隊把「一次丟整包資料」當標準做法。不是因為大家懶，是因為成本算得過去。你如果正在做 AI 產品，現在就該測一輪長上下文流程。別只看 demo。看真實資料，才知道這顆模型到底有沒有料。\u003C\u002Fp>","Gemini 3.1 Pro 以 77.1% ARC-AGI-2、94.3% GPQA Diamond、1M token 上下文登場，價格仍維持 Gemini 3。這次重點不是噱頭，而是長文檔、程式碼與 agent 工作流的實戰成本。","gemini3.us","https:\u002F\u002Fgemini3.us\u002Fgemini-3.1-pro",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775153580311-vv9w.png",[13,14,15,16,17,18,19,20,21],"Gemini 3.1 Pro","Google DeepMind","LLM","長上下文","ARC-AGI-2","GPQA Diamond","SWE-Bench Verified","Vertex AI","Google AI","zh",1,false,"2026-04-02T18:12:41.777858+00:00","2026-04-02T18:12:41.664+00:00","done","7744bf3f-bcf7-458b-9851-4cb442fec49f","gemini-3-1-pro-googles-top-model-in-numbers-zh","model-release","04e78fe1-7f49-40db-bfb2-7bb4b3579276","published","2026-04-08T09:00:49.888+00:00",[35,37,39,41,42,44,46,48],{"name":18,"slug":36},"gpqa-diamond",{"name":17,"slug":38},"arc-agi-2",{"name":15,"slug":40},"llm",{"name":16,"slug":16},{"name":14,"slug":43},"google-deepmind",{"name":19,"slug":45},"swe-bench-verified",{"name":21,"slug":47},"google-ai",{"name":20,"slug":49},"vertex-ai",{"id":31,"slug":51,"title":52,"language":53},"gemini-3-1-pro-googles-top-model-in-numbers-en","Gemini 3.1 Pro: Google’s new top model in 
numbers","en",[55,61,67,73,79,85],{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":30},"bd8cfc0e-66db-4546-9b9e-fa328f7538d6","weishenme-google-yincang-de-gemini-live-moxing-bi-yanshi-gen-zh","為什麼 Google 隱藏的 Gemini Live 模型，比演示更重要","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778869245574-c25w.png","2026-05-15T18:20:23.111559+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":30},"5b5fa24f-5259-4e9e-8270-b08b6805f281","minimax-m1-open-hybrid-attention-reasoning-model-zh","MiniMax-M1：開源 1M Token 推理模型","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778797859209-ea1g.png","2026-05-14T22:30:38.636592+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":30},"b1da56ac-8019-4c6b-a8dc-22e6e22b1cb5","gemini-omni-video-review-text-rendering-zh","Gemini Omni 影片模型怎麼了","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778779280109-lrrk.png","2026-05-14T17:20:42.608312+00:00",{"id":74,"slug":75,"title":76,"cover_image":77,"image_url":77,"created_at":78,"category":30},"d63e9d93-e613-4bbf-8135-9599fde11d08","why-xiaomi-mimo-v25-pro-changes-coding-agents-zh","為什麼 Xiaomi 的 MiMo-V2.5-Pro 改變的是 Coding …","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778689858139-v38e.png","2026-05-13T16:30:27.893951+00:00",{"id":80,"slug":81,"title":82,"cover_image":83,"image_url":83,"created_at":84,"category":30},"8f0c9185-52f9-46f2-82c6-5baec126ba2e","openai-realtime-audio-models-live-voice-zh","OpenAI 
即時音訊模型瞄準語音互動","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778451657895-2iu7.png","2026-05-10T22:20:32.443798+00:00",{"id":86,"slug":87,"title":88,"cover_image":89,"image_url":89,"created_at":90,"category":30},"52106dc2-4eba-4ca0-8318-fa646064de97","anthropic-10-finance-ai-agents-zh","Anthropic推10款金融AI Agent","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778389843399-vclb.png","2026-05-10T05:10:22.778762+00:00",[92,97,102,107,112,117,122,127,132,137],{"id":93,"slug":94,"title":95,"created_at":96},"58b64033-7eb6-49b9-9aab-01cf8ae1b2f2","nvidia-rubin-six-chips-one-ai-supercomputer-zh","NVIDIA Rubin 把六顆晶片塞進 AI 機櫃","2026-03-26T07:18:45.861277+00:00",{"id":98,"slug":99,"title":100,"created_at":101},"0dcc2c61-c2a6-480d-adb8-dd225fc68914","march-2026-ai-model-news-what-mattered-zh","2026 年 3 月 AI 模型新聞重點","2026-03-26T07:32:08.386348+00:00",{"id":103,"slug":104,"title":105,"created_at":106},"214ab08b-5ce5-4b5c-8b72-47619d8675dd","why-small-models-are-winning-on-device-ai-zh","小模型為何吃下裝置端 AI","2026-03-26T07:36:30.488966+00:00",{"id":108,"slug":109,"title":110,"created_at":111},"785624b2-0355-4b82-adc3-de5e45eecd88","midjourney-v8-faster-images-higher-costs-zh","Midjourney V8 變快了，也變貴了","2026-03-26T07:52:03.562971+00:00",{"id":113,"slug":114,"title":115,"created_at":116},"cda76b92-d209-4134-86c1-a60f5bc7b128","xiaomi-mimo-trio-agents-robots-voice-zh","小米 MiMo 三模型瞄準代理、機器人與語音","2026-03-28T03:05:08.779489+00:00",{"id":118,"slug":119,"title":120,"created_at":121},"9e1044b4-946d-47fe-9e2a-c2ee032e1164","xiaomi-mimo-v2-pro-1t-moe-agents-zh","小米 MiMo-V2-Pro 登場：1T MoE 模型","2026-03-28T03:06:19.002353+00:00",{"id":123,"slug":124,"title":125,"created_at":126},"d68e59a2-55eb-4a8f-95d6-edc8fcbff581","cursor-composer-2-started-from-kimi-zh","Cursor Composer 2 其實從 Kimi 
起步","2026-03-28T03:11:58.893796+00:00",{"id":128,"slug":129,"title":130,"created_at":131},"c4b6186f-bd84-4598-997e-c6e31d543c0d","cursor-composer-2-agentic-coding-model-zh","Cursor Composer 2 走向代理式寫碼","2026-03-28T03:13:06.422716+00:00",{"id":133,"slug":134,"title":135,"created_at":136},"45812c46-99fc-4b1f-aae1-56f64f5c9024","openai-shuts-down-sora-video-app-api-zh","OpenAI 關閉 Sora App 與 API","2026-03-29T04:47:48.974108+00:00",{"id":138,"slug":139,"title":140,"created_at":141},"e112e76f-ec3b-408f-810e-e93ae21a888a","apple-siri-gemini-distilled-models-zh","Apple Siri 牽手 Gemini 的真相","2026-03-29T04:52:57.886544+00:00"]