Tag

KV cache

The KV cache is the working memory that lets LLMs reuse past tokens during inference: it stores the attention keys and values of every previous token so they are not recomputed at each decoding step. Because it grows linearly with sequence length, it often becomes the main limit on context length, latency, and serving cost. This tag covers quantization, compression, HBM capacity and bandwidth trade-offs, and papers like TurboQuant.
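To make the memory pressure concrete, here is a minimal back-of-the-envelope sketch of KV cache size. The formula is the standard one (keys plus values, per layer, per head, per token); the example model shape is an assumption chosen to resemble a 7B-class transformer, not a figure from any specific paper on this tag.

```python
def kv_cache_bytes(num_layers: int, num_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2, batch: int = 1) -> int:
    """Estimate KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    dtype_bytes=2 assumes fp16/bf16; quantizing to int8 or int4 shrinks
    this proportionally, which is the point of work like TurboQuant.
    """
    return 2 * num_layers * num_heads * head_dim * seq_len * dtype_bytes * batch


# Hypothetical 7B-class shape: 32 layers, 32 heads of dim 128, 4K context, fp16.
size = kv_cache_bytes(num_layers=32, num_heads=32, head_dim=128, seq_len=4096)
print(f"{size / 2**30:.1f} GiB")  # 2.0 GiB for a single sequence
```

At 4-bit quantization the same cache would occupy a quarter of that, which is why compression directly buys longer contexts or larger batch sizes on the same HBM budget.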

5 articles