Tag
TensorRT-LLM
TensorRT-LLM is NVIDIA’s optimization stack for LLM inference, focused on lower latency, higher throughput, and better GPU utilization. It often shows up alongside MLPerf, Blackwell/GB300, and Dynamo, highlighting how server performance depends on compilation, scheduling, and runtime software as much as hardware.
2 articles

Research/Apr 3
Nvidia’s MLPerf Gains Show Software Still Matters
Nvidia posted up to 2.77x MLPerf gains on GB300 NVL72, with software tricks like Dynamo and TensorRT-LLM doing the heavy lifting.

Industry News/Apr 2
NVIDIA Sets New MLPerf Inference Records
Blackwell Ultra hit new MLPerf Inference v6.0 highs, with GB300 NVL72 gaining 2.7x on DeepSeek-R1 server tests and 1.5x on Llama 3.1 405B.