Distillation
Training a small "student" model to mimic the outputs of a larger "teacher" model. Produces compact models that retain much of the teacher's capability at a fraction of the compute cost. Used to create the DeepSeek-R1 distilled models (DeepSeek-R1-Zero itself was trained with pure RL; distillation produced its smaller derivatives) and many production models.
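A minimal NumPy sketch of the idea: the student is trained to match the teacher's temperature-softened output distribution via a KL-divergence loss. The temperature value and toy logits here are illustrative assumptions, not from the source.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # as in the classic soft-target formulation.
    p = softmax(teacher_logits, T)   # teacher "soft targets"
    q = softmax(student_logits, T)   # student predictions
    return (T ** 2) * np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()

teacher = np.array([[4.0, 1.0, 0.5]])   # toy logits (illustrative)
student = np.array([[3.0, 1.5, 0.2]])
loss = distill_loss(student, teacher)
# loss is 0 only when the student reproduces the teacher's distribution
```

In practice this soft-target loss is usually mixed with an ordinary cross-entropy loss on the ground-truth labels.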
Related Terms
Quantization
Reducing the numerical precision of model weights (e.g., from 32-bit float to 4-bit integer) to shrink model size and speed up inference with minimal accuracy loss. Enables running large models on consumer hardware. Key for local deployments.
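A sketch of the simplest scheme, symmetric per-tensor int8 quantization: one float scale maps every weight to an 8-bit integer, cutting storage 4x versus float32. (Real deployments typically use per-channel or per-group scales and lower bit widths; this is a toy illustration.)

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: a single scale maps
    # the float range [-max|w|, max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights for computation.
    return q.astype(np.float32) * scale

w = np.random.randn(256).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error is bounded by scale/2 per weight.
```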
Fine-tuning
Continuing to train a pre-trained model on a domain-specific or task-specific dataset to specialize its behavior. Ranges from full fine-tuning (updating all weights) to parameter-efficient methods like LoRA and QLoRA.
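A toy sketch of full fine-tuning on a linear model: start from existing ("pre-trained") weights and take gradient steps on new task data, with every parameter trainable. The model, data, and learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))           # stand-in for pre-trained weights
X = rng.normal(size=(32, 4))          # small task-specific dataset (toy)
Y = X @ rng.normal(size=(4, 2))       # task targets (hypothetical)

lr = 0.05
for _ in range(200):
    pred = X @ Y.dtype.type(1) * (X @ W) if False else X @ W
    grad = X.T @ (X @ W - Y) / len(X)  # gradient of mean squared error
    W -= lr * grad                     # full fine-tuning: all weights move

mse = np.mean((X @ W - Y) ** 2)
```

Parameter-efficient methods differ only in which parameters are left trainable: here all of `W` moves, whereas LoRA would freeze `W` and train a small low-rank adapter instead.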
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that adds small trainable rank-decomposition matrices alongside frozen model weights. Achieves performance close to full fine-tuning while training under 1% of the parameters. An industry standard for adapting LLMs.
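A minimal NumPy sketch of the mechanism: the frozen weight W is augmented with a low-rank product B @ A, and only A and B are trained. Zero-initializing B (the standard choice) makes the adapted layer exactly match the base layer at the start of training. Dimensions here are small toys; in real LLM layers (thousands of rows and columns) the trainable fraction drops below 1%.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 64, 64, 4                   # adapter rank r << d, k

W = rng.normal(size=(d, k))           # frozen pre-trained weight
A = rng.normal(size=(r, k)) * 0.01    # trainable, small random init
B = np.zeros((d, r))                  # trainable, zero init: adapter
                                      # contributes nothing at step 0

def forward(x):
    # Effective weight is W + B @ A; W itself is never updated.
    return x @ (W + B @ A).T

trainable = A.size + B.size           # only the adapter's parameters
total = W.size + trainable
# Here the adapter is ~12% of parameters; at LLM scale it is <1%.
```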