Tag
BenchLM
3 articles

Research/May 13
Why coding benchmarks are finally telling the truth
BenchLM’s coding leaderboard says LiveCodeBench and SWE-bench Pro are the only signals that still matter.

Model Releases/May 4
Kimi K2.6 Scores: BenchLM’s 2026 Breakdown
Kimi K2.6 ranks #12 overall on BenchLM, with strong coding and agentic scores, plus a 256K context window and open weights.

Model Releases/Apr 13
GPT-5.4 Scores 97.6 in Knowledge Benchmarks
GPT-5.4 tops knowledge benchmarks with a score of 97.6, ranks #2 overall on BenchLM, and offers a 1.05M-token context window.