Back to home

Tag

quantization

Quantization compresses model weights, activations, or KV cache into lower-bit formats to reduce memory and inference cost. Recent work spans 4-bit hybrid schemes and lower-bit LLM inference methods that target bottlenecks without sacrificing too much accuracy.

7 articles