[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"article-nvidia-b300-vs-h200-deepseek-perf-zh":3,"tags-nvidia-b300-vs-h200-deepseek-perf-zh":33,"related-lang-nvidia-b300-vs-h200-deepseek-perf-zh":50,"related-posts-nvidia-b300-vs-h200-deepseek-perf-zh":54,"series-industry-c701c93e-a74b-49a7-ac72-40ed577a6e92":91},{"id":4,"title":5,"content":6,"summary":7,"source":8,"source_url":9,"author":10,"image_url":11,"keywords":12,"language":21,"translated_content":10,"views":22,"is_premium":23,"created_at":24,"updated_at":24,"cover_image":11,"published_at":25,"rewrite_status":26,"rewrite_error":10,"rewritten_from_id":27,"slug":28,"category":29,"related_article_id":30,"status":31,"google_indexed_at":32,"x_posted_at":10,"tweet_text":10,"title_rewritten_at":10,"title_original":10,"key_takeaways":10,"topic_cluster_id":10,"embedding":10,"is_canonical_seed":23},"c701c93e-a74b-49a7-ac72-40ed577a6e92","NVIDIA B300 vs H200: DeepSeek …","\u003Cp>NVIDIA's \u003Ca href=\"https:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fdata-center\u002Fblackwell-ultra\u002F\" target=\"_blank\" rel=\"noopener\">B300\u003C\u002Fa> is a beast. It packs 288GB of HBM3e and up to 8TB\u002Fs of bandwidth. For LLM inference, those two numbers matter a lot.\u003C\u002Fp>\u003Cp>Put bluntly: can the model fit on a single card, and can the KV cache hold up? Both directly affect latency, and the gap is especially visible on reasoning-heavy workloads like \u003Ca href=\"https:\u002F\u002Fwww.deepseek.com\u002Fen\" target=\"_blank\" rel=\"noopener\">DeepSeek\u003C\u002Fa>.\u003C\u002Fp>\u003Cp>So the \u003Ca href=\"https:\u002F\u002Fwww.nvidia.com\u002Fen-us\u002Fdata-center\u002Fh200\u002F\" target=\"_blank\" rel=\"noopener\">H200\u003C\u002Fa> versus B300 comparison is not just a spec-sheet size contest. The real question is: do you want cheap and good enough, or do you want to raise the \u003Ca href=\"\u002Fnews\u002Fturboquant-cuts-memory-use-without-accuracy-loss-zh\">memory\u003C\u002Fa> ceiling in one move?\u003C\u002Fp>\u003Ch2>What the B300 actually changes\u003C\u002Fh2>\u003Cp>The B300 belongs to Blackwell Ultra, and NVIDIA's headline points are blunt: 288GB of HBM3e, 8TB\u002Fs of bandwidth, and a design aimed squarely at inference.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775161680437-1ibz.png\" alt=\"NVIDIA B300 vs H200: DeepSeek …\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>What makes a card like this matter is not raw FLOPS. It is that the model and its cache can more easily stay on a single GPU. Less data shuffling means latency is less likely to spike.\u003C\u002Fp>\u003Cp>If you look only at compute, you miss the point. Many LLM services today are not blocked on whether the math can be done. They are blocked on memory. That is exactly where the B300 outguns the H200.\u003C\u002Fp>\u003Cul>\u003Cli>B300: 288GB HBM3e\u003C\u002Fli>\u003Cli>H200: 141GB HBM3e\u003C\u002Fli>\u003Cli>H100: 80GB HBM3\u003C\u002Fli>\u003Cli>B300: 8TB\u002Fs bandwidth\u003C\u002Fli>\u003Cli>H200: 4.8TB\u002Fs bandwidth\u003C\u002Fli>\u003C\u002Ful>\u003Cp>Think of it as a bigger workbench. With enough desk space, papers stop piling up on the floor. For inference serving, that matters a great deal.\u003C\u002Fp>\u003Ch2>Can the H200 still compete?\u003C\u002Fh2>\u003Cp>Yes. The H200 is no has-been. It is still strong, especially for large-model inference, and 141GB of HBM3e is plenty for many 70B-class models.\u003C\u002Fp>\u003Cp>But the B300 plays a harder game. It nearly doubles the memory capacity outright. For long contexts, many concurrent API users, or KV-cache-heavy scenarios, that difference is huge.\u003C\u002Fp>\u003Cp>What I would look at here is total deployment cost, not just per-card price. If one B300 lets you split across fewer cards, the rack, the network, and operations all get simpler.\u003C\u002Fp>\u003Cblockquote>“The pace of innovation in AI is accelerating, and the demand for compute is insatiable.” — Jensen Huang\u003C\u002Fblockquote>\u003Cp>That quote fits the B300 well. Not every upgrade should chase a higher peak. Often, what matters is whether the model runs steadily to completion.\u003C\u002Fp>\u003Cp>The H200's edge is maturity, a somewhat lower price, and lower deployment pressure. The B300's edge is more memory and more bandwidth. They are different plays.\u003C\u002Fp>\u003Ch2>DeepSeek inference: why memory wins first\u003C\u002Fh2>\u003Cp>DeepSeek's reasoning models are KV-cache hungry. As context grows, the cache balloons, and memory fills up before compute ever becomes the limit.\u003C\u002Fp>\n\u003Cfigure class=\"my-6\">\u003Cimg src=\"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775161683934-xhzm.png\" alt=\"NVIDIA B300 vs H200: DeepSeek …\" class=\"rounded-xl w-full\" loading=\"lazy\" \u002F>\u003C\u002Ffigure>\n\u003Cp>That is why 288GB is so noticeable. You can run bigger batches. You can keep longer contexts. And you rarely need to evict the cache.\u003C\u002Fp>\u003Cp>vLLM's tests on Blackwell Ultra show solid results for both DeepSeek-V3.2 and DeepSeek-R1. The point is not that any single number is miraculous. The point is that it proves cards like the B300 can genuinely carry large-model inference.\u003C\u002Fp>\u003Cul>\u003Cli>DeepSeek-V3.2 prefill-only: 7,360 TGS\u003C\u002Fli>\u003Cli>DeepSeek-V3.2 mixed context: 2,816 TGS\u003C\u002Fli>\u003Cli>DeepSeek-R1 prefill-only: 22,476 TGS\u003C\u002Fli>\u003Cli>DeepSeek-R1 mixed context: 3,072 TGS\u003C\u002Fli>\u003Cli>NVFP4 + TP2 reached up to an 8x mixed-context speedup in some tests\u003C\u002Fli>\u003C\u002Ful>\u003Cp>These numbers are very practical for chatbots, coding assistants, and enterprise knowledge bases, because what users care about is usually not peak throughput. It is whether things stall, how long they wait, and whether the service suddenly slows down.\u003C\u002Fp>\u003Cp>Frankly, many inference systems do not lose on the model. They lose on a memory budget that is too small, forcing them to cut batch sizes or shorten contexts.\u003C\u002Fp>\u003Ch2>How the B300 stacks up against cloud GPUs\u003C\u002Fh2>\u003Cp>If you run your own datacenter, the B300 is not plug-and-play. It draws roughly 1,400W, so cooling, power delivery, and rack design all have to keep up.\u003C\u002Fp>\u003Cp>That is why many teams simply rent. \u003Ca href=\"https:\u002F\u002Fwww.digitalocean.com\u002Fproducts\u002Fdroplets\u002Fgpu-droplets\" target=\"_blank\" rel=\"noopener\">DigitalOcean GPU Droplets\u003C\u002Fa> is already planning B300s, and \u003Ca href=\"https:\u002F\u002Faws.amazon.com\u002Fec2\u002Finstance-types\u002Fp6\u002F\" target=\"_blank\" rel=\"noopener\">AWS P6\u003C\u002Fa> has a B300 line as well.\u003C\u002Fp>\u003Cp>Here, do not look only at the hourly rate. Look at the cost per token. A faster card finishes the same work in less time, and that directly drives total cost.\u003C\u002Fp>\u003Cul>\u003Cli>H100 SXM, Llama 70B: about 21,800 tok\u002Fs\u003C\u002Fli>\u003Cli>H200 SXM, Llama 70B: about 31,700 tok\u002Fs\u003C\u002Fli>\u003Cli>B300 FP8, Llama 70B: 100,000+ tok\u002Fs\u003C\u002Fli>\u003Cli>B300 FP4, Llama 70B: 150,000+ tok\u002Fs\u003C\u002Fli>\u003Cli>AWS P6 cited price: about $11.70 \u002F GPU-hour\u003C\u002Fli>\u003C\u002Ful>\u003Cp>If your SLA is strict, the B300 may actually be the better deal: fewer GPUs can carry the same traffic, so the cheapest hourly rate is not necessarily the cheapest total.\u003C\u002Fp>\u003Cp>Networking cannot be ignored either. Cloud deployments often cite 25Gbps machine-to-machine networking and 10Gbps external bandwidth. For distributed inference, that matters more than most people expect.\u003C\u002Fp>\u003Ch2>Who should buy the B300, and who should wait\u003C\u002Fh2>\u003Cp>If your workload is already memory-bound, the B300 makes sense. Long-context document systems, heavy reasoning models, and services with many concurrent API users are all good fits.\u003C\u002Fp>\u003Cp>If you are still experimenting with models, the H200 is probably more practical. It is a bit cheaper, puts less strain on cooling, and has a lower deployment bar. Many teams have not yet reached the B300's sweet spot.\u003C\u002Fp>\u003Cp>My own take is blunt. If you are already running DeepSeek-R1 or 70B-class models and keep getting blocked by the KV cache, take the B300 seriously. Conversely, if your current bottleneck is a product that is not right yet, a new card will not save you.\u003C\u002Fp>\u003Cul>\u003Cli>B300: long contexts and high concurrency\u003C\u002Fli>\u003Cli>H200: lower-cost large-model serving\u003C\u002Fli>\u003Cli>H100: smaller deployments or older stacks\u003C\u002Fli>\u003Cli>Cloud rental: for teams that do not want to touch liquid cooling and power design\u003C\u002Fli>\u003C\u002Ful>\u003Cp>Check the software stack too. CUDA 12.x, cuDNN 9.x, and TensorRT-LLM 0.15+ all matter. Strong hardware with a lagging stack delivers discounted results.\u003C\u002Fp>\u003Ch2>Background: why this GPU generation competes on memory\u003C\u002Fh2>\u003Cp>GPUs used to compete on compute. Increasingly, they compete on who has more memory and higher bandwidth. The reason is simple: LLM serving is no longer single-shot inference.\u003C\u002Fp>\u003Cp>The mainstream scenarios now are multi-turn chat, long-document summarization, code generation, and internal enterprise Q&A. All of them inflate the cache. When the cache becomes the main character, GPU memory specs become critical.\u003C\u002Fp>\u003Cp>That is why cards like the B300 draw so much scrutiny. They are not built just to impress benchmarks. They are built for teams that actually run services around the clock.\u003C\u002Fp>\u003Cp>The H200 can still fight. It is just more of a mature, proven option. The B300 is the option you truly need only after you have scaled up.\u003C\u002Fp>\u003Ch2>Conclusion: find your bottleneck first, then pick the card\u003C\u002Fh2>\u003Cp>If your bottleneck is memory, the B300 deserves a serious look. 288GB of HBM3e is no small number. It directly changes how you shard the model, size the batches, and budget the cache.\u003C\u002Fp>\u003Cp>If your bottleneck is still model quality, data curation, or product design, do not rush the upgrade. GPUs are expensive, and buying the wrong card just makes the bill look impressive while the results stay flat.\u003C\u002Fp>\u003Cp>I would start with a round of profiling. Find out whether your DeepSeek workload is compute-bound or runs out of memory first. That answer directly decides between the H200 and the B300.\u003C\u002Fp>","The B300 pairs 288GB of HBM3e with 8TB\u002Fs of bandwidth. This piece puts it head-to-head with the H200, breaking down DeepSeek inference, KV cache, cloud cost, and deployment trade-offs.","zhuanlan.zhihu.com","https:\u002F\u002Fzhuanlan.zhihu.com\u002Fp\u002F2015473154676507339",null,"https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775161680437-1ibz.png",[13,14,15,16,17,18,19,20],"NVIDIA B300","H200","DeepSeek","LLM inference","HBM3e","GPU cloud cost","KV cache","Blackwell 
Ultra","zh",1,false,"2026-04-02T20:27:38.70665+00:00","2026-04-02T20:27:38.495+00:00","done","0643e29c-3d1e-4753-b5ed-8571a3914e8a","nvidia-b300-vs-h200-deepseek-perf-zh","industry","b6caa87d-6766-486b-b510-0b27c6222f8e","published","2026-04-08T09:00:48.843+00:00",[34,36,38,40,42,44,46,48],{"name":14,"slug":35},"h200",{"name":19,"slug":37},"kv-cache",{"name":13,"slug":39},"nvidia-b300",{"name":20,"slug":41},"blackwell-ultra",{"name":18,"slug":43},"gpu-雲端成本",{"name":15,"slug":45},"deepseek",{"name":17,"slug":47},"hbm3e",{"name":16,"slug":49},"llm-推論",{"id":30,"slug":51,"title":52,"language":53},"nvidia-b300-vs-h200-deepseek-perf-en","NVIDIA B300 vs H200: Specs and DeepSeek Perf","en",[55,61,67,73,79,85],{"id":56,"slug":57,"title":58,"cover_image":59,"image_url":59,"created_at":60,"category":29},"cd078ce9-0a92-485a-b428-2f5523250a19","circles-agent-stack-targets-machine-speed-payments-zh","Circle Launches Agent Stack, Targeting Machine-Speed Payments","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778871663628-uyk5.png","2026-05-15T19:00:44.16849+00:00",{"id":62,"slug":63,"title":64,"cover_image":65,"image_url":65,"created_at":66,"category":29},"96d96399-f674-4269-997a-cddfc34291a0","iren-signs-nvidia-ai-infrastructure-pact-zh","IREN Signs On to Nvidia AI Infrastructure","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778871057561-bukp.png","2026-05-15T18:50:37.57206+00:00",{"id":68,"slug":69,"title":70,"cover_image":71,"image_url":71,"created_at":72,"category":29},"de12a36e-52f9-4bca-8deb-a41cf974ffd9","circle-agent-stack-ai-payments-zh","Circle Launches Agent Stack for AI Payments","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778870462187-t9xv.png","2026-05-15T18:40:30.945394+00:00",{"id":74,"slug":75,"title":76,"cover_image":77,"image_url":77,"created_at":78,"category":29},"e6379f8a-3305-4862-bd15-1192d3247841","why-nebius-ai-pivot-is-more-real-than-hype-zh","Why Nebius's AI Pivot Is More Real Than the Hype","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778823044520-9mfz.png","2026-05-15T05:30:24.978992+00:00",{"id":80,"slug":81,"title":82,"cover_image":83,"image_url":83,"created_at":84,"category":29},"66c4e357-d84d-43ef-a2e7-120c4609e98e","nvidia-backs-corning-factories-with-billions-zh","Nvidia Backs Corning Factory Expansion with Funding","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778822450270-trdb.png","2026-05-15T05:20:27.701475+00:00",{"id":86,"slug":87,"title":88,"cover_image":89,"image_url":89,"created_at":90,"category":29},"31d8109c-8b0b-46e2-86bc-d274a03269d1","why-anthropic-gates-foundation-ai-public-goods-zh","Why Anthropic and the Gates Foundation Should Invest in A…","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1778796636474-u508.png","2026-05-14T22:10:21.138177+00:00",[92,97,102,107,112,117,122,127,132,137],{"id":93,"slug":94,"title":95,"created_at":96},"ee073da7-28b3-4752-a319-5a501459fb87","ai-in-2026-what-actually-matters-now-zh","What Actually Matters in AI in 2026","2026-03-26T07:09:12.008134+00:00",{"id":98,"slug":99,"title":100,"created_at":101},"83bd1795-8548-44c9-9a7e-de50a0923f71","trump-ai-framework-power-speech-state-preemption-zh","Trump AI Framework Targets Power, Speech, and State Preemption","2026-03-26T07:12:18.695466+00:00",{"id":103,"slug":104,"title":105,"created_at":106},"ea6be18b-c903-4e54-97b7-5f7447a612e0","nvidia-gtc-2026-big-ai-announcements-zh","NVIDIA GTC 2026: Key Takeaways","2026-03-26T07:14:26.62638+00:00",{"id":108,"slug":109,"title":110,"created_at":111},"4bcec76f-4c36-4daa-909f-54cd702f7c93","claude-users-spreading-out-and-getting-better-zh","Claude Users Are Spreading Out and Getting Better at It","2026-03-26T07:22:52.325888+00:00",{"id":113,"slug":114,"title":115,"created_at":116},"bd903b15-2473-4178-9789-b7557816e535","openclaw-raises-hard-question-for-ai-models-zh","OpenClaw Raises a Hard Question about AI Model Value","2026-03-26T07:24:54.707486+00:00",{"id":118,"slug":119,"title":120,"created_at":121},"eeac6b9e-ad9d-4831-8eec-8bba3f9bca6a","gap-google-gemini-checkout-fashion-search-zh","Gap Moves Checkout into Gemini","2026-03-26T07:28:23.937768+00:00",{"id":123,"slug":124,"title":125,"created_at":126},"0740e53f-605d-4d57-8601-c10beb126f3c","google-pushes-gemini-transition-to-march-2026-zh","Google Pushes Gemini Transition to March 2026…","2026-03-26T07:30:12.825269+00:00",{"id":128,"slug":129,"title":130,"created_at":131},"e660d801-2421-4529-8fa9-86b82b066990","metas-llama-4-benchmark-scandal-gets-worse-zh","Meta's Llama 4 Benchmark Scandal Gets Worse","2026-03-26T07:34:21.156421+00:00",{"id":133,"slug":134,"title":135,"created_at":136},"183f9e7c-e143-40bb-a6d5-67ba84a3a8bc","accenture-mistral-ai-sovereign-enterprise-deal-zh","Accenture Teams Up with Mistral AI to Sell Sovereign AI","2026-03-26T07:38:14.818906+00:00",{"id":138,"slug":139,"title":140,"created_at":141},"191d9b1b-768a-478c-978c-dd7431a38149","mistral-ai-faces-its-hardest-year-yet-zh","Mistral AI Faces Its Hardest Year Yet","2026-03-26T07:40:23.716374+00:00"]