OraCore Editors

NVIDIA Rubin Pushes AI Infrastructure to a New Scale

NVIDIA’s Rubin platform bundles six chips, claims 10x lower inference cost, and brings Vera Rubin NVL72 systems to cloud and enterprise AI.

NVIDIA says its new Rubin platform can cut inference token cost by up to 10x versus Blackwell. It also says the platform can train mixture-of-experts models with 4x fewer GPUs. Those are the kinds of numbers that make cloud providers, AI labs, and enterprise buyers pay attention fast.

The announcement, made at CES in Las Vegas, is NVIDIA’s latest attempt to keep the AI hardware cycle moving on an annual rhythm. Rubin is not a single chip. It is a six-part system built around the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch.

What NVIDIA actually announced

The headline is simple: NVIDIA wants Rubin to be the next default platform for giant AI systems. The company says the design uses extreme co-design across hardware and software to improve training speed, lower inference cost, and support agentic AI workloads that need long context windows and lots of back-and-forth reasoning.

That matters because the AI industry has moved from “can we train it?” to “can we afford to run it?” The economics of inference are becoming as important as model quality. If NVIDIA’s claims hold up in production, Rubin could reduce the compute bill for companies running large models at scale.

  • Up to 10x lower inference token cost than Blackwell, according to NVIDIA
  • 4x fewer GPUs needed to train MoE models, according to NVIDIA
  • 3.6 TB/s of GPU-to-GPU bandwidth per GPU
  • 260 TB/s of aggregate bandwidth in the Vera Rubin NVL72 rack (a quick check of that figure follows this list)
  • 50 petaflops of NVFP4 inference compute per Rubin GPU
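
A quick sanity check ties two of those figures together: the NVL72 name denotes 72 GPUs per rack, and multiplying the per-GPU bandwidth across the rack lands almost exactly on the quoted aggregate. A minimal sketch, assuming the rack number is simply the per-GPU figure summed over 72 GPUs, which NVIDIA does not spell out in the announcement:

```python
# Back-of-envelope check: per-GPU NVLink bandwidth summed across the rack.
# Assumes the quoted 260 TB/s is a 72-GPU aggregate (an assumption, not a
# stated accounting method).
GPUS_PER_RACK = 72       # the "72" in Vera Rubin NVL72
GPU_TO_GPU_TBPS = 3.6    # NVIDIA's stated per-GPU figure

rack_tbps = GPUS_PER_RACK * GPU_TO_GPU_TBPS
print(f"{rack_tbps:.1f} TB/s")  # 259.2 TB/s, in line with the quoted 260 TB/s
```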

The platform also brings new versions of familiar infrastructure ideas: faster interconnects, better memory movement, and more attention to security and reliability. NVIDIA says the new rack-scale system includes confidential computing across CPU, GPU, and NVLink domains, plus a second-generation RAS engine for fault detection and recovery.

Why the six-chip design matters

Rubin is interesting because it treats AI infrastructure like a system problem, not a chip problem. That is a smart move. Modern AI workloads are bottlenecked by memory, networking, storage, and power as much as by raw GPU math. NVIDIA is trying to control all of those layers in one stack.

The company’s pitch is that agentic AI and reasoning models need a lot more than fast matrix math. They need long token sequences, frequent communication between GPUs, and enough memory movement to keep the model from stalling. Rubin’s mix of CPU, GPU, NIC, DPU, and Ethernet silicon is meant to keep the whole machine busy.

“Rubin arrives at exactly the right moment, as AI computing demand for both training and inference is going through the roof,” said Jensen Huang, founder and CEO of NVIDIA.

Huang also said the company’s annual cadence for new AI supercomputers is meant to keep pushing toward “the next frontier of AI.” That cadence is the real story here. NVIDIA is no longer shipping isolated accelerators. It is shipping a yearly infrastructure refresh cycle that cloud vendors and large enterprises are expected to plan around.

There is also a storage angle. NVIDIA introduced an Inference Context Memory Storage Platform with BlueField-4 storage processing to speed agentic AI reasoning. In plain English, the goal is to make memory and storage behave less like bottlenecks and more like extensions of the model runtime.

The numbers behind the competition

NVIDIA framed Rubin against Blackwell, which is the right comparison because Blackwell is the current benchmark for high-end AI infrastructure. The company’s own numbers suggest a major efficiency jump, but the value depends on workload. Training a giant MoE model is one thing. Serving a consumer chatbot or a coding assistant is another.

Still, the comparison is strong enough to matter. If a platform can cut token cost by 10x and GPU count by 4x for certain training jobs, buyers will at least run the math. Power, cooling, and rack density are now board-level concerns for AI buyers, not just data center engineers.
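
To make that math concrete, here is a minimal sketch of the kind of back-of-envelope comparison a buyer might run, taking NVIDIA's 10x token-cost claim at face value. The workload size and per-token price below are hypothetical placeholders, not published figures:

```python
# Illustrative serving-cost comparison under NVIDIA's claimed 10x cut in
# inference token cost. All dollar figures are made-up placeholders.
DAILY_TOKENS = 50e9                  # assumed workload: 50B tokens/day
BLACKWELL_PER_M_TOKENS = 0.40        # assumed $ per 1M tokens on Blackwell
RUBIN_PER_M_TOKENS = BLACKWELL_PER_M_TOKENS / 10  # the claimed 10x reduction

def monthly_cost(price_per_m: float, daily_tokens: float) -> float:
    """Monthly serving cost in dollars at a given per-million-token price."""
    return price_per_m * (daily_tokens / 1e6) * 30

blackwell = monthly_cost(BLACKWELL_PER_M_TOKENS, DAILY_TOKENS)
rubin = monthly_cost(RUBIN_PER_M_TOKENS, DAILY_TOKENS)
print(f"Blackwell: ${blackwell:,.0f}/mo, Rubin: ${rubin:,.0f}/mo, "
      f"savings: ${blackwell - rubin:,.0f}/mo")
# Blackwell: $600,000/mo, Rubin: $60,000/mo, savings: $540,000/mo
```

At that scale, even a fraction of the claimed improvement would justify an early refresh, which is presumably part of why the cloud commitments came so quickly: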

  • Microsoft said its Fairwater AI superfactories will scale to hundreds of thousands of Vera Rubin Superchips
  • CoreWeave said it will add Rubin through Mission Control for production operations
  • AWS said Rubin will expand its AI infrastructure offerings
  • Google Cloud said it will bring Rubin to customers
  • Oracle Cloud Infrastructure said it will build gigascale AI factories with Vera Rubin architecture

The broader ecosystem list is long: Anthropic, OpenAI, Meta, xAI, Mistral AI, Perplexity, Cisco, Dell Technologies, HPE, Lenovo, and Supermicro are all named as expected adopters or partners. That kind of roster is less about hype and more about procurement reality. If the biggest buyers are lining up early, the supply chain and software stack will follow.

What the cloud and enterprise angle says about 2026

This announcement also tells us where the money is going. The biggest AI spenders are no longer just training frontier models. They are building factories for inference, reasoning, and agentic workflows that can handle long-running tasks. That is why NVIDIA keeps talking about token cost, uptime, and rack-scale systems instead of raw peak FLOPS.

It also explains the expanded work with Red Hat AI. NVIDIA says the Rubin platform will pair with Red Hat Enterprise Linux, OpenShift, and Red Hat AI to give enterprises a full stack optimized for Rubin. That is a practical move, since many enterprise buyers care more about deployment, governance, and support than about benchmark charts.

For readers tracking the AI infrastructure race, the interesting question is not whether Rubin is fast. NVIDIA almost certainly made it fast. The real question is whether this new stack changes deployment economics enough to push more companies from pilot projects to full production systems.

One more detail matters: NVIDIA says Vera Rubin NVL72 is the first rack-scale platform to deliver confidential computing across CPU, GPU, and NVLink domains. For regulated industries and model owners worried about proprietary data, that is the kind of feature that can move a purchase decision.

What to watch next

Rubin looks like NVIDIA’s attempt to keep AI infrastructure on a yearly upgrade cycle while making each generation easier to justify on cost. If the company’s token-cost claims hold in the real world, cloud providers and enterprise buyers will have a strong reason to refresh faster than they would with a normal server upgrade.

My read: the next fight is not just about who has the fastest GPU. It is about who can deliver the cheapest secure token at scale. If Rubin lands the way NVIDIA says it will, the companies that run large models will start asking a sharper question: how much inference can we buy per watt, per rack, and per dollar?
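
That question can be turned into a simple figure of merit. The sketch below scores hypothetical rack options on tokens per joule and tokens per dollar; every number in it is a made-up placeholder, not a Rubin or Blackwell spec:

```python
# A toy figure of merit for "cheapest token at scale": sustained inference
# throughput normalized by rack power and amortized cost. All values are
# hypothetical placeholders for illustration.
from dataclasses import dataclass

@dataclass
class RackOption:
    name: str
    tokens_per_sec: float   # sustained inference throughput
    power_kw: float         # rack power draw
    monthly_cost: float     # amortized hardware + energy, $/month

    def tokens_per_joule(self) -> float:
        return self.tokens_per_sec / (self.power_kw * 1_000)

    def tokens_per_dollar(self) -> float:
        seconds_per_month = 30 * 24 * 3600
        return self.tokens_per_sec * seconds_per_month / self.monthly_cost

for rack in [
    RackOption("incumbent", tokens_per_sec=2.0e6, power_kw=120, monthly_cost=250_000),
    RackOption("next-gen", tokens_per_sec=8.0e6, power_kw=140, monthly_cost=400_000),
]:
    print(f"{rack.name}: {rack.tokens_per_joule():,.0f} tok/J, "
          f"{rack.tokens_per_dollar():,.0f} tok/$")
```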

That is the metric to watch through 2026, because it will decide whether Rubin becomes a niche flagship or the default platform for the next wave of AI factories.