Tag

LLM reasoning

LLM reasoning covers how models plan, verify, and correct multi-step solutions in math, physics, and other structured tasks. Recent work spans reinforcement learning in pre-train space, synthetic simulator data, and zero-shot gains on benchmark problems beyond web QA.

3 articles

Research/Jun 5

Reinforcement-aware distillation for LLM reasoning

This paper proposes reinforcement-aware knowledge distillation to improve LLM reasoning, but the abstract provides no benchmark numbers.

Research/Apr 16

PreRL: Training LLMs in pre-train space

PreRL shifts reinforcement learning from P(y|x) to P(y), using reward-driven updates in pre-train space to improve reasoning and exploration.

Research/Apr 14

Physics Simulators as RL Data for LLM Reasoning

Researchers train LLMs on synthetic physics from simulators and report zero-shot gains on IPhO problems, showing a new path beyond web QA data.