Tag
SWE-Bench Pro
4 articles

Model Releases/Jun 9
MiniMax M3 Proves Open-Weight Can Still Win on Coding
MiniMax M3 makes a strong case that open-weight models can still lead on coding, context, and price.

Model Releases/May 17
Why Kimi K2.6 Changes the Coding Model Race
Kimi K2.6 is the open-weight coding model that matches GPT-5.5 on SWE-Bench Pro at far lower cost.

Research/May 13
Why coding benchmarks are finally telling the truth
BenchLM’s coding leaderboard says LiveCodeBench and SWE-Bench Pro are the only signals that still matter.

AI Agent/Apr 3
Marginlab Tracks Claude Code Opus 4.6 Drift
Marginlab’s daily tracker watches Claude Code Opus 4.6 on 50 SWE-Bench Pro tasks and flags statistically significant drops.