Tag

RLVR

RLVR, or reinforcement learning with verifiable rewards, trains models on tasks where success can be checked objectively, such as math proofs, coding problems with unit tests, and rule-based outputs. Reward design matters here because it shapes cold-start behavior, exploration, and training stability.
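
To make "checked objectively" concrete, here is a minimal sketch of a verifiable reward, assuming a math task whose reference answer is a rational number. The function name `verifiable_reward` and the binary 0/1 scoring are illustrative assumptions, not a description of any particular training setup.

```python
from fractions import Fraction


def verifiable_reward(model_answer: str, reference: str) -> float:
    """Hypothetical binary reward: 1.0 if the answer equals the reference, else 0.0."""
    try:
        # Compare as exact rationals so "0.5" and "1/2" count as the same answer.
        matches = Fraction(model_answer.strip()) == Fraction(reference.strip())
    except (ValueError, ZeroDivisionError):
        # Unparseable output earns no reward rather than raising an error.
        return 0.0
    return 1.0 if matches else 0.0
```

A reward this sparse gives no signal until the model produces at least one correct answer, which is one way reward design can affect cold-start behavior and exploration.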

1 article