Terminal Bench 2.0

Terminal Bench 2.0 measures how well AI systems handle real terminal work: running commands, fixing errors, navigating files, and chaining multi-step shell tasks. It is a useful signal when evaluating models for agentic coding, automation, and other CLI-driven workflows where they must operate reliably.

2 articles

Model Releases/Apr 2

GLM-5: Z.AI's new flagship for coding and agents

GLM-5 posts 77.8 on SWE-bench Verified and 56.2 on Terminal Bench 2.0, putting Z.AI in direct competition with top coding models.

Model Releases/Mar 28

Cursor Composer 2 Bets on Agentic Coding

Cursor’s Composer 2 posts 61.3 on CursorBench and 61.7 on Terminal-Bench 2.0, with pricing aimed at high-volume coding teams.