Tag

reasoning models

Reasoning models are built for multi-step inference, verification, and agentic work in domains such as math, coding, and interactive problem solving. This tag covers training methods, cold-start behavior, reinforcement learning with verifiable rewards (RLVR), loss design, and the cost-performance tradeoffs that shape deployment.

4 articles

Research/Jul 7

Direct-OPD reuses weak-model RL gains for stronger models

Direct-OPD lifts Qwen3-1.7B from 48.3% to 62.4% on AIME 2024 by distilling RL gains from a weaker model.

Research/Jun 10

A New Way to Think About SFT Targets

This paper reframes supervised fine-tuning as designing target distributions, not just minimizing token loss.

Research/Apr 29

Tsallis loss for faster reasoning-model training

A Tsallis-loss continuum may help reasoning models escape cold-start stalls faster than RLVR, with tradeoffs between speed, noise, and stability.

Research/Apr 2

ARC Prize leaderboard shows cost still matters

ARC Prize’s leaderboard tracks how AI systems trade cost for score, and ARC-AGI-3 pushes agents into interactive tasks.