Tag

AI coding benchmark

1 article

DeepSWE reshuffles the AI coding leaderboard

Research/May 29

DeepSWE’s 113-task test across 91 repos puts GPT-5.5 at 70% and exposes a loophole in Claude Opus.