Quantization

Technique

Definition

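Reducing the numerical precision of a model's weights (e.g., from 32-bit floating point to 4-bit integers) to shrink the model and speed up inference, typically with only a small loss of accuracy. Going from 32 bits to 4 bits per weight cuts weight storage by roughly 8x. This makes it possible to run large models on consumer hardware, so quantization is key for local deployments.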
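Example

As a rough illustration of the idea, the sketch below applies one simple scheme, symmetric per-tensor quantization, to a handful of made-up weights. The function names (quantize_symmetric, dequantize) and the sample values are hypothetical and chosen only for this example; real toolkits usually quantize in groups or per channel and use more elaborate rounding and calibration.

```python
def quantize_symmetric(weights, bits=4):
    """Map floats to signed integers in [-qmax, qmax] using a single scale.

    Illustrative only: one scale for the whole list (per-tensor), no zero point.
    """
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit signed values
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float weights from the stored integers."""
    return [v * scale for v in q]


# Hypothetical weights, not taken from any real model.
weights = [0.82, -0.41, 0.05, -1.40, 0.33]

q, scale = quantize_symmetric(weights, bits=4)
restored = dequantize(q, scale)

# Rounding error per weight is at most half of one quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("quantized:", q)
print("scale:", scale)
print("max error:", max_error)
```

Only the small integers and the single scale need to be stored, which is where the size reduction comes from. The gap between weights and restored is the accuracy cost that the definition refers to.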