Tag

model compression

2 articles

Reinforcement-aware distillation for LLM reasoning

The paper proposes reinforcement-aware knowledge distillation to improve LLM reasoning; its abstract reports no benchmark numbers.

Research/Mar 31

IF4: Smarter 4-Bit Quantization That Adapts to Your Data

MIT researchers propose a hybrid data format that switches between floating-point and integer representations, improving accuracy in 4-bit neural network quantization.