Tag
AI inference
AI inference is the runtime phase where trained models generate outputs in production, so latency, memory footprint, and compute cost matter most. Topics here include home-based inference nodes, KV-cache compression, and how long contexts keep DRAM demand high.
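The claim that long contexts keep DRAM demand high is easy to make concrete: KV-cache memory grows linearly with context length. The back-of-envelope sketch below uses illustrative dimensions (80 layers, 8 grouped-query KV heads of dimension 128, fp16), not any specific model, plus a hypothetical 6x compression factor in the spirit of the TurboQuant pieces listed here.

```python
# Back-of-envelope KV-cache sizing: memory grows linearly with context
# length, which is why long contexts keep DRAM demand high even as
# compression improves. All dimensions below are illustrative assumptions,
# not tied to any specific model.

def kv_cache_bytes(context_len: int,
                   n_layers: int = 80,      # transformer depth (assumed)
                   n_kv_heads: int = 8,     # grouped-query KV heads (assumed)
                   head_dim: int = 128,     # per-head dimension (assumed)
                   bytes_per_val: int = 2,  # fp16/bf16
                   compression: float = 1.0) -> int:
    """Bytes of K+V cache for one sequence (the 2x covers keys and values)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_val
    return int(context_len * per_token / compression)

for ctx in (8_192, 32_768, 131_072):
    raw = kv_cache_bytes(ctx)
    squeezed = kv_cache_bytes(ctx, compression=6.0)  # hypothetical 6x quantizer
    print(f"{ctx:>7} tokens: {raw / 2**30:5.1f} GiB raw, "
          f"{squeezed / 2**30:5.2f} GiB at 6x compression")
```

At these assumed dimensions, going from 8K to 128K context multiplies the cache 16x, more than erasing a 6x compression gain; that is the tension the two TurboQuant articles below debate.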
4 articles

Industry News/May 14
Why Zyphra Cloud on AMD Matters More Than Another Model Launch
Zyphra Cloud matters because the real AI platform battle has shifted from training to inference.

Industry News/Apr 16
Span, Nvidia, Pulte: Mini AI Data Centers in Homes
Span is testing home-based AI inference nodes in a 1.25 MW pilot spread across 100 homes, cutting build time from years to months.

Research/Apr 3
TurboQuant Cuts Memory Use 6x Without Accuracy Loss
Google Research’s TurboQuant claims a 6x memory reduction and 8x faster inference with no accuracy loss, upending the economics of AI inference.

Research/Apr 2
TurboQuant Won’t Fix the Memory Crunch
Google’s TurboQuant can cut KV-cache memory use 6x, but longer contexts may keep DRAM and NAND demand climbing.