LLM reasoning
LLM reasoning covers how models plan, verify, and correct multi-step solutions in math, physics, and other structured tasks. Recent work includes reinforcement learning in pre-train space and training on synthetic data from physics simulators, with reported zero-shot gains on International Physics Olympiad (IPhO) problems that point beyond web QA data.
2 articles

Research/Apr 16
PreRL: Training LLMs in pre-train space
PreRL shifts reinforcement learning from P(y|x) to P(y), using reward-driven updates in pre-train space to improve reasoning and exploration.

Research/Apr 14
Physics Simulators as RL Data for LLM Reasoning
Researchers train LLMs on synthetic physics data generated by simulators and report zero-shot gains on International Physics Olympiad (IPhO) problems, suggesting a source of training data beyond web QA.