Tag
1 article
Qwen3.6 Plus tops the AIME 2026 math benchmark with a score of 0.953, while the 8 models compared show a wide spread in olympiad-style reasoning.