Tag
RLVR
RLVR, or reinforcement learning with verifiable rewards, trains models on tasks where success can be checked objectively: math proofs, coding problems validated by unit tests, or outputs that must satisfy explicit rules. Reward design matters in this setting because it shapes cold-start behavior, exploration, and training stability.
1 article
