[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"tag-tensorrt-llm":3},{"tag":4,"articles":11},{"id":5,"name":6,"slug":7,"article_count":8,"description_zh":9,"description_en":10},"9634621c-ff83-44fd-a73f-8941397f5465","TensorRT-LLM","tensorrt-llm",4,"TensorRT-LLM 是 NVIDIA 針對大型語言模型推論的最佳化框架，重點在降低延遲、提升吞吐量與硬體利用率。它常與 MLPerf、Blackwell\u002FGB300、Dynamo 等軟體堆疊一起出現，反映 LLM 伺服器效能不只看晶片，也看編譯與排程。","TensorRT-LLM is NVIDIA’s optimization stack for LLM inference, focused on lower latency, higher throughput, and better GPU utilization. It often shows up alongside MLPerf, Blackwell\u002FGB300, and Dynamo, highlighting how server performance depends on compilation, scheduling, and runtime software as much as hardware.",[12,21],{"id":13,"slug":14,"title":15,"summary":16,"category":17,"image_url":18,"cover_image":18,"language":19,"created_at":20},"a15782d7-4678-4415-9a0b-4c642e46b022","nvidia-mlperf-software-inference-benchmarks-en","Nvidia’s MLPerf Gains Show Software Still Matters","Nvidia posted up to 2.77x MLPerf gains on GB300 NVL72, with software tricks like Dynamo and TensorRT-LLM doing heavy lifting.","research","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775185791842-obyu.png","en","2026-04-03T03:09:35.154603+00:00",{"id":22,"slug":23,"title":24,"summary":25,"category":26,"image_url":27,"cover_image":27,"language":19,"created_at":28},"3e10b782-08fe-4a58-aabc-0f4ca77eaa50","nvidia-sets-new-mlperf-inference-records-en","NVIDIA Sets New MLPerf Inference Records","Blackwell Ultra hit new MLPerf Inference v6.0 highs, with GB300 NVL72 gaining 2.7x on DeepSeek-R1 server tests and 1.5x on Llama 3.1 405B.","industry","https:\u002F\u002Fxxdpdyhzhpamafnrdkyq.supabase.co\u002Fstorage\u002Fv1\u002Fobject\u002Fpublic\u002Fcovers\u002Finline-1775122498583-yuhr.png","2026-04-02T08:48:38.893048+00:00"]