Tag
AI inference
AI inference is the runtime phase where trained models generate outputs in production, so latency, memory footprint, and compute cost matter most. Topics here include home-based inference nodes, KV-cache compression, and how long contexts keep DRAM demand high.
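The claim that long contexts keep DRAM demand high is easy to make concrete: KV-cache memory grows linearly with context length. The back-of-envelope sketch below uses illustrative dimensions (80 layers, 8 grouped-query KV heads of dimension 128, fp16), not any specific model, plus a hypothetical 6x compression factor in the spirit of the TurboQuant pieces listed here.

```python
# Back-of-envelope KV-cache sizing: memory grows linearly with context
# length, which is why long contexts keep DRAM demand high even as
# compression improves. All dimensions below are illustrative assumptions,
# not tied to any specific model.

def kv_cache_bytes(context_len: int,
                   n_layers: int = 80,      # transformer depth (assumed)
                   n_kv_heads: int = 8,     # grouped-query KV heads (assumed)
                   head_dim: int = 128,     # per-head dimension (assumed)
                   bytes_per_val: int = 2,  # fp16/bf16
                   compression: float = 1.0) -> int:
    """Bytes of K+V cache for one sequence (the 2x covers keys and values)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_val
    return int(context_len * per_token / compression)

for ctx in (8_192, 32_768, 131_072):
    raw = kv_cache_bytes(ctx)
    squeezed = kv_cache_bytes(ctx, compression=6.0)  # hypothetical 6x quantizer
    print(f"{ctx:>7} tokens: {raw / 2**30:5.1f} GiB raw, "
          f"{squeezed / 2**30:5.2f} GiB at 6x compression")
```

At these assumed dimensions, going from 8K to 128K context multiplies the cache 16x, more than erasing a 6x compression gain; that is the tension the two TurboQuant articles below debate.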
4 articles

Industry News/May 14
Why Zyphra Cloud on AMD Matters More Than Another Model Launch
Zyphra Cloud matters because the real AI platform battle has shifted from training to inference.

Industry News/Apr 16
Span, Nvidia, Pulte: Mini AI Data Centers in Homes
Span is testing home-based AI inference nodes in a 1.25 MW pilot spread across 100 homes, cutting build time from years to months.

Research/Apr 3
TurboQuant Cuts Memory Use 6x Without Accuracy Loss
Google Research’s TurboQuant claims a 6x memory reduction and 8x faster inference with no accuracy loss, upending the economics of AI inference.

Research/Apr 2
TurboQuant Won’t Fix the Memory Crunch
Google’s TurboQuant can cut KV-cache memory use 6x, but longer contexts may keep DRAM and NAND demand climbing.