Tag: model compression
2 articles

Research/Jun 5
Reinforcement-aware distillation for LLM reasoning
This paper proposes reinforcement-aware knowledge distillation to improve LLM reasoning, but the abstract provides no benchmark numbers.

Research/Mar 31
IF4: Smarter 4-Bit Quantization That Adapts to Your Data
MIT researchers propose a hybrid data format that switches between floating-point and integer representations, improving accuracy in 4-bit neural network quantization.