Tag: distillation
Distillation transfers a larger model's behavior, such as ranking preferences, generation patterns, or reasoning signals, into a smaller student model. Teams use it to cut inference cost and latency while keeping small language models (SLMs) useful for reranking, generation, and cross-architecture alignment.
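As a generic illustration of the recipe, and not the specific method in either article below, a common distillation loss blends a temperature-softened KL term against the teacher's per-token distribution with ordinary cross-entropy on the ground-truth tokens. The sketch assumes PyTorch logits of shape (batch, seq_len, vocab); the names distillation_loss, temperature, and alpha are hypothetical.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    # Soft targets: KL between temperature-softened student and teacher
    # distributions; the T^2 factor keeps gradient scale comparable across T.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: standard next-token cross-entropy on ground-truth ids,
    # with -100 marking positions to ignore.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * soft + (1 - alpha) * hard

In practice alpha and temperature are tuned per task, and the hard-label term is sometimes dropped when only teacher outputs are available.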
2 articles

Research/Apr 30
Select-to-Think: Let SLMs Re-rank Themselves
A new method lets small language models re-rank their own candidates instead of calling an LLM at inference time.

Research/Apr 30
TIDE distills diffusion LLMs across architectures
TIDE adds noise-aware weighting and tokenizer-aware objectives to distill diffusion LLMs across architectures, improving a 0.6B student.