GLM-5: Z.AI's new flagship for coding and agents
GLM-5 posts 77.8 on SWE-bench Verified and 56.2 on Terminal Bench 2.0, putting Z.AI in direct competition with top coding models.

GLM-5 is Z.AI's new flagship model, and the numbers are hard to ignore: 744B total parameters, 40B active parameters, 28.5T pre-training tokens, and a 200K context window. In Z.AI's own docs, it targets agentic engineering, long-horizon tasks, and coding work that usually breaks smaller models.
The headline benchmarks matter even more. Z.AI says GLM-5 hits 77.8 on SWE-bench Verified and 56.2 on Terminal Bench 2.0, which puts it in the same conversation as the best coding systems developers actually care about.
What makes this release interesting is the mix of scale and practicality. GLM-5 is built for text only, but it supports thinking mode, function calling, structured output, streaming, and context caching, so it is clearly meant for production workflows, not just chat demos.
What Z.AI says GLM-5 is built for
On paper, GLM-5 is aimed at agentic engineering, which is Z.AI's term for models that can plan, call tools, and keep working across long tasks without losing the thread. That includes frontend work, backend systems engineering, data processing, translation, extraction, and multi-step office tasks.

The model page also makes a strong claim about usability: GLM-5's real programming performance approaches Claude Opus 4.5. That is a bold comparison, because coding models are usually judged less by benchmark theater and more by whether they can ship usable code with minimal hand-holding.
Here are the core specs Z.AI publishes for GLM-5:
- 744B total parameters, with 40B active at inference time
- 28.5T pre-training tokens, up from 23T in the previous generation
- 200K context length and 128K maximum output tokens
- Text input and text output only
- Support for thinking mode, streaming, function calling, structured output, and context caching
That combination tells you a lot about the product strategy. Z.AI is not trying to make GLM-5 a general multimodal assistant first. It is trying to make it a long-context workhorse for coding and agents.
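The listed capabilities map onto a familiar chat-completions request shape. Here is a minimal sketch of what such a request body might look like; the model id (`glm-5`) and the `thinking` field name are assumptions for illustration, not confirmed parameter names from Z.AI's docs.

```python
import json

# Hypothetical request payload for an OpenAI-style chat-completions
# endpoint. The model id ("glm-5") and the "thinking" field are
# assumptions, not confirmed by Z.AI's API reference.
payload = {
    "model": "glm-5",
    "messages": [
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Refactor this function to be iterative."},
    ],
    "thinking": {"type": "enabled"},  # assumed flag for thinking mode
    "stream": True,                   # streaming is listed as supported
    "max_tokens": 8192,
}

body = json.dumps(payload)
print(body[:40])
```

The point is less the exact field names and more that every listed capability (thinking, streaming, output limits) shows up as a request-level switch rather than a separate product.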
Why the scale jump matters
Z.AI says GLM-5 moves from 355B parameters in GLM-4.7 to 744B total parameters, while active parameters rise from 32B to 40B. That is a big jump, but the more interesting part is the training data increase from 23T to 28.5T tokens. More tokens do not guarantee better output, yet they usually help a model absorb more code patterns, instruction styles, and long-form reasoning traces.
The company also says it introduced a new asynchronous reinforcement learning framework called Slime. In plain English, that means post-training can keep going across longer agent interactions instead of treating each prompt like an isolated event. For coding and tool use, that matters because the model has to remember goals, recover from mistakes, and keep intermediate state in mind.
Another technical point worth calling out is the sparse attention design. Z.AI says it integrated DeepSeek Sparse Attention for the first time, which helps long-context performance while lowering deployment cost. That is the kind of engineering choice that matters to teams paying for inference, not just to benchmark watchers.
- GLM-5: 744B total, 40B active, 28.5T pre-training tokens, 200K context
- GLM-4.7: 355B total, 32B active, 23T pre-training tokens
- GLM-5 output limit: 128K tokens
- Z.AI says sparse attention reduces deployment cost while preserving long-text quality
For developers, the takeaway is simple: GLM-5 is built to hold more state, handle more steps, and spend less of that context budget on overhead.
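That context-budget framing is easy to make concrete. The sketch below uses a crude 4-characters-per-token heuristic (not GLM-5's actual tokenizer) to estimate how much of a 200K window remains after prompt and history overhead:

```python
# Rough context-budget check: how much of a 200K-token window remains
# after the system prompt, repo context, and conversation history.
# The 4-chars-per-token ratio is a crude heuristic, not GLM-5's tokenizer.

CONTEXT_WINDOW = 200_000
MAX_OUTPUT = 128_000

def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token for English/code)."""
    return max(1, len(text) // 4)

def remaining_budget(chunks: list[str], reserve_output: int = 8_192) -> int:
    """Tokens left for new input after reserving room for the reply."""
    used = sum(estimate_tokens(c) for c in chunks)
    return CONTEXT_WINDOW - used - min(reserve_output, MAX_OUTPUT)

print(remaining_budget(["x" * 40_000]))  # 40,000 chars ≈ 10,000 tokens → 181808
```

With a 200K window, even a 40,000-character dump of repo context leaves most of the budget free, which is exactly the "more state, more steps" pitch.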
Benchmark claims and what they actually suggest
Z.AI's strongest public claim is that GLM-5 reaches parity with Claude Opus 4.5 on software engineering tasks. The company says it leads open-weight models on widely used benchmarks, scoring 77.8 on SWE-bench Verified and 56.2 on Terminal Bench 2.0.

Those numbers matter because they target two very different failure modes. SWE-bench Verified checks whether a model can fix real GitHub issues, while Terminal Bench 2.0 measures command-line problem solving. A model that does well on both is usually better at actual engineering work than one that only writes pretty code snippets.
"The most important thing for any AI system is whether it can do useful work for people." — Sam Altman, OpenAI DevDay 2023
Altman's line from OpenAI DevDay 2023 is still a good lens here. The benchmark story matters, but the real question is whether a model can stay useful after the first step, the second step, and the debugging loop that follows.
Z.AI also says GLM-5 outperforms GLM-4.7 across frontend development, backend systems engineering, and long-horizon execution in internal evaluations aligned with the Claude Code task distribution. That comparison matters because it points to a developer workflow, not a lab-only score.
Compared with other public coding systems, the published numbers suggest three things:
- GLM-5 is aiming directly at premium coding assistants, not low-cost chat models
- The agent benchmarks matter as much as raw code generation
- Long-context reliability is part of the product pitch, not a side feature
How developers can try it
Access is currently tied to the GLM Coding Plan, with Pro and Max tiers available for monthly use. Z.AI also says GLM-5 works with top coding tools like Claude Code and Open Code, which lowers the friction for teams that already have agent-based workflows.
The API shape is familiar enough for most teams. Z.AI exposes chat completions at api.z.ai, and the docs show support for cURL, a Python SDK, a Java SDK, and an OpenAI-style Python SDK. That makes migration easier if your stack already talks to chat-completions style endpoints.
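For teams that want to see what that migration looks like, here is a network-free sketch that builds a request against an assumed api.z.ai endpoint. The exact URL path, auth header, and model id are assumptions based on the OpenAI-compatible shape described above; check Z.AI's API reference before use.

```python
import json
import urllib.request

# Minimal sketch of a chat-completions style request to api.z.ai.
# The path, auth scheme, and model id below are assumptions.
API_URL = "https://api.z.ai/api/paas/v4/chat/completions"  # assumed path
API_KEY = "YOUR_API_KEY"

def build_request(prompt: str) -> urllib.request.Request:
    payload = {
        "model": "glm-5",  # assumed model id
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_request("Write a binary search in Python.")
# urllib.request.urlopen(req) would send it; omitted to keep this
# sketch network-free.
print(req.full_url, req.get_method())
```

If your stack already talks to chat-completions endpoints, switching models should mostly mean swapping the base URL, key, and model id.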
For teams comparing options, here is the practical angle:
- If you need long-context code generation, GLM-5 is interesting because of its 200K context and 128K output ceiling
- If you care about tool use, the model's function calling and structured output matter more than raw chat quality
- If you pay for inference, sparse attention and 40B active parameters may matter as much as benchmark scores
There is also a broader ecosystem angle. Z.AI is clearly trying to make GLM-5 fit into existing agent workflows rather than forcing developers into a brand-new interface. That is smart, because adoption usually depends on how little plumbing you have to rewrite.
What GLM-5 means for the coding-model race
GLM-5 does one thing very well as a product announcement: it narrows the gap between open-weight models and the premium coding assistants people already use every day. The benchmark claims are strong, but the more important signal is the combination of 200K context, long-horizon agent support, and a pricing/access model that is already tied to a developer plan.
My read is that GLM-5 will matter most for teams building coding copilots, internal automation, and multi-step agent systems. If Z.AI's claims hold up in independent testing, the model could become a serious option for shops that want strong coding performance without locking themselves into a single vendor's toolchain.
The next question is simple: will real-world repos, flaky CI, and messy production tickets confirm the benchmark story, or expose the usual gap between lab scores and shipping code? If you are evaluating coding models this quarter, GLM-5 is worth putting on the shortlist now, not later.
// Related Articles
- [MODEL]
MiniMax-M1 brings 1M-token open reasoning model
- [MODEL]
Gemini Omni Video Review: Text Rendering Beats Rivals
- [MODEL]
Why Xiaomi’s MiMo-V2.5-Pro Changes Coding Agents More Than Chatbots
- [MODEL]
OpenAI’s Realtime Audio Models Target Live Voice
- [MODEL]
Anthropic Releases 10 Financial AI Agents
- [MODEL]
Why Claude’s “Infinite” Context Window Still Won’t Make AI Autonomous