Tag

TensorRT-LLM

TensorRT-LLM is NVIDIA’s open-source optimization stack for LLM inference, focused on lower latency, higher throughput, and better GPU utilization. It often appears alongside MLPerf, Blackwell/GB300, and Dynamo, underscoring that inference-server performance depends on compilation, scheduling, and runtime software as much as on the hardware itself.

2 articles