Tag
1 article
A Tsallis-loss continuum may help reasoning models escape cold-start stalls faster than RLVR, with tradeoffs between speed, noise, and stability.