Tag
1 article
A new paper links chaotic, high-learning-rate training to generalization via a “sharpness dimension” built from the Hessian spectrum.