[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-nvidia-sets-new-mlperf-inference-records-zh":3,"tags-nvidia-sets-new-mlperf-inference-records-zh":35,"related-lang-nvidia-sets-new-mlperf-inference-records-zh":52,"related-posts-nvidia-sets-new-mlperf-inference-records-zh":56,"series-industry-d9fda242-d695-4ea4-a0e0-c6c64ad72965":93},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":23,"translated_content":10,"views":24,"is_premium":25,"created_at":26,"updated_at":26,"cover_image":11,"published_at":27,"rewrite_status":28,"rewrite_error":10,"rewritten_from_id":29,"slug":30,"category":31,"related_article_id":32,"status":33,"google_indexed_at":34,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":10,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":25},"d9fda242-d695-4ea4-a0e0-c6c64ad72965","NVIDIA 再刷 MLPerf 推論紀錄","\u003Cp>\u003Ca href=\"https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fnvidia-extreme-co-design-delivers-new-mlperf-inference-records\u002F\" target=\"_blank\" rel=\"noopener\">NVIDIA\u003C\u002Fa> 這次又來刷榜了。\u003Ca href=\"https:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fdata-center\u002Fgb300-nvl72\u002F\" target=\"_blank\" rel=\"noopener\">GB300 NVL72\u003C\u002Fa> 在 MLPerf Inference v6.0 的 DeepSeek-R1 伺服器測試，官方說比前一版快了 2.7 倍。Llama 3.1 405B 也提升 1.5 倍。講白了，這種數字不是拿來拍簡報，是拿來算每個 Token 成本的。\u003C\u002Fp>\u003Cp>這次更有意思的點，不是單一成績。\u003Ca href=\"https:\u002F\u002Fmlcommons.org\u002Fen\u002Finference-overview\u002F\" target=\"_blank\" rel=\"noopener\">MLPerf Inference\u003C\u002Fa> v6.0 把題目加難了。它加入多模態、影片生成、互動推理，還有新的推薦系統測試。NVIDIA 這回幾乎全包。像 \u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002F\" target=\"_blank\" rel=\"noopener\">DeepSeek-R1\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fqwenlm.github.io\u002F\" target=\"_blank\" rel=\"noopener\">Qwen3-VL-235B-A22B\u003C\u002Fa>、\u003Ca 
href=\"https:\u002F\u002Fopenai.com\u002Findex\u002Fintroducing-gpt-oss-120b\u002F\" target=\"_blank\" rel=\"noopener\">GPT-OSS-120B\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fgithub.com\u002FWan-Video\u002FWan2.2\" target=\"_blank\" rel=\"noopener\">WAN-2.2-T2V-A14B\u003C\u002Fa> 都有參與。這代表它不是只會跑單一 LLM，而是整套推論堆疊都在拚。\u003C\u002Fp>\u003Cp>你可能會想問，這跟一般開發者有什麼關係。答案很直接。訓練模型很燒錢，但推論才是上線後的日常。吞吐量高一點，伺服器就能多接幾個人。延遲低一點，產品體感就差很多。每秒多吐幾千個 Token，帳單差距也會很真實。\u003C\u002Fp>\u003Ch2>MLPerf v6.0 到底改了什麼\u003C\u002Fh2>\u003Cp>MLPerf Inference 一直在改題目。這次不是小修小補，而是把很多真實場景拉進來。以前你可能只看文字分類、影像辨識。現在直接上多模態、影片、互動式 LLM，還有推薦系統。這些工作負載更接近生產環境，也更難作弊。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775122496881-vxz0.png\" alt=\"NVIDIA 再刷 MLPerf 推論紀錄\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>對硬體廠來說，這種變化很煩。因為你不能只靠某個模型的特化優化混過去。你得同時處理 prefill、decode、batching、記憶體搬移，還要顧到網路。說白了，這是整個系統在比，不是單顆 GPU 在比。\u003C\u002Fp>\u003Cp>NVIDIA 這次說自己在新增項目上都拿到頂尖吞吐。這句話聽起來很像公關稿，但背後有工程味。因為新增工作負載越多，代表你的軟體堆疊越不能偏科。只會跑文字模型的時代，現在真的沒那麼好混了。\u003C\u002Fp>\u003Cul>\u003Cli>DeepSeek-R1 server：2,494,310 tokens\u002Fsec\u003C\u002Fli>\u003Cli>GPT-OSS-120B server：1,096,770 tokens\u002Fsec\u003C\u002Fli>\u003Cli>Qwen3-VL offline：79 samples\u002Fsec\u003C\u002Fli>\u003Cli>DLRMv3 offline：104,637 samples\u002Fsec\u003C\u002Fli>\u003Cli>GB300 NVL72 對 DeepSeek-R1 提升：2.77x\u003C\u002Fli>\u003C\u002Ful>\u003Cp>這些數字看起來很硬，但其實很好懂。伺服器吞吐越高，雲端業者越能把同一批機器切給更多客戶。對企業內部 AI 服務來說，則是同樣的機房空間，能跑更多查詢。這就是推論優化最現實的價值。\u003C\u002Fp>\u003Ch2>為什麼軟體會決定成績\u003C\u002Fh2>\u003Cp>很多人看到這種新聞，第一反應是「又是新 GPU 很強」。但老實說，這只對一半。NVIDIA 自己也很清楚，真正拉開差距的，常常是軟體。像 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002FNVIDIA\u002Fdynamo\" target=\"_blank\" rel=\"noopener\">NVIDIA 
Dynamo\u003C\u002Fa>、TensorRT-LLM、以及各種模型專用最佳化，才是把硬體榨乾的關鍵。\u003C\u002Fp>\u003Cp>這次的優化手法很工程宅。像是 kernel fusion，可以減少啟動次數。attention 的資料排程調整，可以讓不同請求更平均地吃到算力。disaggregated serving 則把 prefill 和 decode 分開，讓兩段工作各自調參。這些名詞很硬，但效果很實際。\u003C\u002Fp>\u003Cp>對 MoE 模型來說，Wide Expert Parallel、Multi-Token Prediction、KV-aware routing 也很重要。因為這類模型不是單純堆參數就好。它們的瓶頸常常在路由、記憶體、以及小 batch 互動延遲。只要其中一段卡住，整體體感就會爛掉。\u003C\u002Fp>\u003Cblockquote>“If you can make one thing 10 percent better, that’s great. If you can make 10 things 1 percent better, that’s much more powerful.” — Jensen Huang, NVIDIA GTC 2024 keynote\u003C\u002Fblockquote>\u003Cp>這句話拿來看這次結果，很貼切。NVIDIA 不是靠單一招式吃天下，而是把很多小優化疊起來。每個地方多賺一點，最後就變成很可怕的總和。這種作法很像在做系統工程，不像在賣夢。\u003C\u002Fp>\u003Cp>我覺得這也提醒一件事。做 AI 產品的人，別只盯模型名字。真正影響成本的，還有 serving 架構、batch 策略、網路、KV cache 管理。模型本身很重要，但系統設計常常更誠實。\u003C\u002Fp>\u003Ch2>這次數字為什麼有參考價值\u003C\u002Fh2>\u003Cp>最有用的比較，是看同一套硬體前後差多少。NVIDIA 提到，GB300 NVL72 在 DeepSeek-R1 server 測試，從每 GPU 2,907 tokens\u002Fsec 拉到 8,064 tokens\u002Fsec。這不是小修小補，是非常明顯的提升。\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775122506903-fv7d.png\" alt=\"NVIDIA 再刷 MLPerf 推論紀錄\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>Llama 3.1 405B 也有進步。server 模式從 170 tokens\u002Fsec\u002Fgpu 變成 259。offline 模式從 224 變成 271。這表示就算是比較老的 dense model，系統還是能挖出額外空間。這點對企業很重要，因為很多公司不會只跑最新模型。\u003C\u002Fp>\u003Cp>再看系統層級，NVIDIA 說四套 \u003Ca href=\"https:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fdata-center\u002Fgb300-nvl72\u002F\" target=\"_blank\" rel=\"noopener\">GB300 NVL72\u003C\u002Fa> 搭配 \u003Ca href=\"https:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fnetworking\u002Fquantum-x800\u002F\" target=\"_blank\" rel=\"noopener\">Quantum-X800 InfiniBand\u003C\u002Fa>、共 288 顆 Blackwell Ultra GPU，拿下系統級吞吐紀錄。這種配置很像大型 AI 
工廠的標配，不是一般實驗室玩具。\u003C\u002Fp>\u003Cul>\u003Cli>DeepSeek-R1 server：2,907 → 8,064 tokens\u002Fsec\u002Fgpu\u003C\u002Fli>\u003Cli>DeepSeek-R1 offline：5,842 → 9,821 tokens\u002Fsec\u002Fgpu\u003C\u002Fli>\u003Cli>Llama 3.1 405B server：170 → 259 tokens\u002Fsec\u002Fgpu\u003C\u002Fli>\u003Cli>Llama 3.1 405B offline：224 → 271 tokens\u002Fsec\u002Fgpu\u003C\u002Fli>\u003Cli>DeepSeek-R1 server 提升：2.77x\u003C\u002Fli>\u003Cli>Llama 3.1 405B server 提升：1.52x\u003C\u002Fli>\u003C\u002Ful>\u003Cp>如果把這些數字翻成商業語言，就是同樣一組機器，能服務更多請求，或把相同流量壓到更少機器上。對雲端業者來說，這直接影響毛利。對自建機房的團隊來說，則是少買幾台伺服器的差別。\u003C\u002Fp>\u003Cp>這也是為什麼 inference benchmark 不能只看峰值。你要看的是穩定輸出、互動延遲、以及系統整合後的結果。單點分數很漂亮，但如果上線後 cache 爆掉，照樣沒用。\u003C\u002Fp>\u003Ch2>競品和市場脈絡怎麼看\u003C\u002Fh2>\u003Cp>這波不是 NVIDIA 一家在玩。\u003Ca href=\"https:\u002F\u002Fwww.asus.com\u002F\" target=\"_blank\" rel=\"noopener\">ASUS\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fwww.cisco.com\u002F\" target=\"_blank\" rel=\"noopener\">Cisco\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fwww.coreweave.com\u002F\" target=\"_blank\" rel=\"noopener\">CoreWeave\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fwww.dell.com\u002F\" target=\"_blank\" rel=\"noopener\">Dell Technologies\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fwww.supermicro.com\u002F\" target=\"_blank\" rel=\"noopener\">Supermicro\u003C\u002Fa>、\u003Ca href=\"https:\u002F\u002Fwww.lenovo.com\u002F\" target=\"_blank\" rel=\"noopener\">Lenovo\u003C\u002Fa> 都有在 NVIDIA 平台上提交結果。這代表整個生態系都在圍著推論效能轉。\u003C\u002Fp>\u003Cp>這也解釋了為什麼 NVIDIA 會一直推開源工具。像 TensorRT-LLM、Dynamo，還有 \u003Ca href=\"https:\u002F\u002Fgithub.com\u002Fvllm-project\u002Fvllm\" target=\"_blank\" rel=\"noopener\">vLLM\u003C\u002Fa>，都不是單純的附加品。它們讓平台更像預設選項。對很多團隊來說，能少踩坑就是價值。\u003C\u002Fp>\u003Cp>如果拿競品來看，AMD、Intel、甚至雲端自研晶片，現在都在拚推論效率。但現實是，生態完整度很難追。硬體是一層，編譯器是一層，serving 框架又是一層。少一層，整體就會很卡。\u003C\u002Fp>\u003Cul>\u003Cli>NVIDIA：強在 GPU、網路、軟體整套\u003C\u002Fli>\u003Cli>AMD：硬體進步快，但軟體生態還在追\u003C\u002Fli>\u003Cli>Intel：偏向 CPU 
與部分加速方案\u003C\u002Fli>\u003Cli>雲端自研晶片：成本漂亮，但可移植性較弱\u003C\u002Fli>\u003Cli>vLLM：對開放生態很重要，已成常見 serving 選項\u003C\u002Fli>\u003C\u002Ful>\u003Cp>我自己的看法很直接。推論市場現在不是比誰會喊口號，而是比誰能把模型真的跑便宜、跑穩、跑快。MLPerf 的價值就在這裡。它至少逼大家面對同一套題目。\u003C\u002Fp>\u003Ch2>台灣團隊該看什麼\u003C\u002Fh2>\u003Cp>如果你是做 AI 產品、SaaS，或內部知識助理，這些數字不是遙遠新聞。它會直接影響你的雲端帳單。尤其是每天有大量互動請求的服務，Token 成本常常比你想像中更快爆。\u003C\u002Fp>\u003Cp>台灣很多團隊現在卡在兩個問題。第一是模型選得太大。第二是 serving 沒有認真調。其實不少場景不需要最強模型，只需要夠穩、夠快、夠便宜。這時候推論系統的優化，比換更大模型還實際。\u003C\u002Fp>\u003Cp>所以這篇新聞的重點，不只是 NVIDIA 又拿了幾個紀錄，而是它把推論當成長期戰場在打。對開發者來說，該學的不是怎麼背 benchmark，而是怎麼看懂 throughput、latency、batch、KV cache、以及網路瓶頸。\u003C\u002Fp>\u003Ch2>結尾：真正該追的不是榜單，是成本\u003C\u002Fh2>\u003Cp>我覺得接下來 12 個月，推論競爭會更像系統戰。模型會繼續長大，但能不能便宜跑、穩定跑，會更重要。你如果在選平台，別只看峰值數字。請直接問供應商：每百萬 Token 成本多少，互動延遲多少，滿載時掉多少。\u003C\u002Fp>\u003Cp>如果你是工程團隊，現在就可以做一件事。把你們最常見的 3 種請求拿出來測。看 prefill、decode、batch size、以及 cache 命中率。很多時候，優化 20% 不是換硬體，而是把 serving 調對。這種事很土，但很有效。\u003C\u002Fp>","NVIDIA 在 MLPerf Inference v6.0 再交出新成績，GB300 NVL72 對 DeepSeek-R1 伺服器推論提升 2.7x，Llama 3.1 405B 也提升 1.5x。","developer.nvidia.com","https:\u002F\u002Fdeveloper.nvidia.com\u002Fblog\u002Fnvidia-extreme-co-design-delivers-new-mlperf-inference-records\u002F",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775122496881-vxz0.png",[13,14,15,16,17,18,19,20,21,22],"NVIDIA","MLPerf","推論","GB300 NVL72","Blackwell Ultra","DeepSeek-R1","Llama 3.1 
405B","AI伺服器","TensorRT-LLM","vLLM","zh",2,false,"2026-04-02T08:48:38.43437+00:00","2026-04-02T08:48:38.317+00:00","done","c27c189a-f1c6-4790-88c1-0678673d9ecd","nvidia-sets-new-mlperf-inference-records-zh","industry","3e10b782-08fe-4a58-aabc-0f4ca77eaa50","published","2026-04-08T09:00:53.422+00:00",[36,38,40,43,45,46,48,50],{"name":14,"slug":37},"mlperf",{"name":18,"slug":39},"deepseek-r1",{"name":41,"slug":42},"Nvidia","nvidia",{"name":17,"slug":44},"blackwell-ultra",{"name":15,"slug":15},{"name":22,"slug":47},"vllm",{"name":20,"slug":49},"ai伺服器",{"name":16,"slug":51},"gb300-nvl72",{"id":32,"slug":53,"title":54,"language":55},"nvidia-sets-new-mlperf-inference-records-en","NVIDIA Sets New MLPerf Inference Records","en",[57,63,69,75,81,87],{"id":58,"slug":59,"title":60,"cover_image":61,"image_url":61,"created_at":62,"category":31},"cd078ce9-0a92-485a-b428-2f5523250a19","circles-agent-stack-targets-machine-speed-payments-zh","Circle 推出 Agent Stack，瞄準機器速度支付","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778871663628-uyk5.png","2026-05-15T19:00:44.16849+00:00",{"id":64,"slug":65,"title":66,"cover_image":67,"image_url":67,"created_at":68,"category":31},"96d96399-f674-4269-997a-cddfc34291a0","iren-signs-nvidia-ai-infrastructure-pact-zh","IREN 綁上 Nvidia AI 基建","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778871057561-bukp.png","2026-05-15T18:50:37.57206+00:00",{"id":70,"slug":71,"title":72,"cover_image":73,"image_url":73,"created_at":74,"category":31},"de12a36e-52f9-4bca-8deb-a41cf974ffd9","circle-agent-stack-ai-payments-zh","Circle 推出 Agent Stack 做 AI 
付款","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778870462187-t9xv.png","2026-05-15T18:40:30.945394+00:00",{"id":76,"slug":77,"title":78,"cover_image":79,"image_url":79,"created_at":80,"category":31},"e6379f8a-3305-4862-bd15-1192d3247841","why-nebius-ai-pivot-is-more-real-than-hype-zh","為什麼 Nebius 的 AI 轉型比炒作更真實","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778823044520-9mfz.png","2026-05-15T05:30:24.978992+00:00",{"id":82,"slug":83,"title":84,"cover_image":85,"image_url":85,"created_at":86,"category":31},"66c4e357-d84d-43ef-a2e7-120c4609e98e","nvidia-backs-corning-factories-with-billions-zh","Nvidia 出資 Corning 工廠擴產","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778822450270-trdb.png","2026-05-15T05:20:27.701475+00:00",{"id":88,"slug":89,"title":90,"cover_image":91,"image_url":91,"created_at":92,"category":31},"31d8109c-8b0b-46e2-86bc-d274a03269d1","why-anthropic-gates-foundation-ai-public-goods-zh","為什麼 Anthropic 和 Gates Foundation 應該投資 A…","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778796636474-u508.png","2026-05-14T22:10:21.138177+00:00",[94,99,104,109,114,119,124,129,134,139],{"id":95,"slug":96,"title":97,"created_at":98},"ee073da7-28b3-4752-a319-5a501459fb87","ai-in-2026-what-actually-matters-now-zh","2026 AI 真正重要的事","2026-03-26T07:09:12.008134+00:00",{"id":100,"slug":101,"title":102,"created_at":103},"83bd1795-8548-44c9-9a7e-de50a0923f71","trump-ai-framework-power-speech-state-preemption-zh","川普 AI 框架瞄準電力、言論與州權","2026-03-26T07:12:18.695466+00:00",{"id":105,"slug":106,"title":107,"created_at":108},"ea6be18b-c903-4e54-97b7-5f7447a612e0","nvidia-gtc-2026-big-ai-announcements-zh","NVIDIA GTC 2026 
重點拆解","2026-03-26T07:14:26.62638+00:00",{"id":110,"slug":111,"title":112,"created_at":113},"4bcec76f-4c36-4daa-909f-54cd702f7c93","claude-users-spreading-out-and-getting-better-zh","Claude 用戶更分散，也更會用","2026-03-26T07:22:52.325888+00:00",{"id":115,"slug":116,"title":117,"created_at":118},"bd903b15-2473-4178-9789-b7557816e535","openclaw-raises-hard-question-for-ai-models-zh","OpenClaw 逼問 AI 模型價值","2026-03-26T07:24:54.707486+00:00",{"id":120,"slug":121,"title":122,"created_at":123},"eeac6b9e-ad9d-4831-8eec-8bba3f9bca6a","gap-google-gemini-checkout-fashion-search-zh","Gap 把結帳搬進 Gemini","2026-03-26T07:28:23.937768+00:00",{"id":125,"slug":126,"title":127,"created_at":128},"0740e53f-605d-4d57-8601-c10beb126f3c","google-pushes-gemini-transition-to-march-2026-zh","Google 把 Gemini 轉換延到 2026 年 3…","2026-03-26T07:30:12.825269+00:00",{"id":130,"slug":131,"title":132,"created_at":133},"e660d801-2421-4529-8fa9-86b82b066990","metas-llama-4-benchmark-scandal-gets-worse-zh","Meta Llama 4 分數風波又擴大","2026-03-26T07:34:21.156421+00:00",{"id":135,"slug":136,"title":137,"created_at":138},"183f9e7c-e143-40bb-a6d5-67ba84a3a8bc","accenture-mistral-ai-sovereign-enterprise-deal-zh","Accenture 攜手 Mistral AI 賣主權 AI","2026-03-26T07:38:14.818906+00:00",{"id":140,"slug":141,"title":142,"created_at":143},"191d9b1b-768a-478c-978c-dd7431a38149","mistral-ai-faces-its-hardest-year-yet-zh","Mistral AI 迎來最硬的一年","2026-03-26T07:40:23.716374+00:00"]