Ai Agents 3 min read

MIT-Licensed GLM-5.2 MoE Reaches 74.4% on FrontierSWE

Zhipu AI has released GLM-5.2, a 744-billion parameter MoE model engineered for long-horizon agentic tasks with a stable one-million-token context window.

Beijing-based AI startup Z.ai released GLM-5.2 on June 17, 2026. The new 744-billion parameter model is engineered specifically for autonomous software engineering workflows that require reasoning across massive codebases. With a 1,048,576 token context window and an explicit output limit of 131,072 tokens, the release allows agents to ingest and modify entire mid-sized repositories without aggressive summarization or chunking.

Architectural Optimizations for Long Context

GLM-5.2 relies on a mixture-of-experts architecture that activates 40 billion parameters per token. Processing one million tokens normally creates a severe compute bottleneck, but Z.ai addressed this using a novel optimization called IndexShare. By reusing the same indexer across every four sparse attention layers, the architecture reduces per-token FLOPs by 2.9× at the maximum context length.

The model also features an enhanced Multi-Token Prediction (MTP) layer for speculative decoding. This modification increases the acceptance length by up to 20%, directly improving inference speed for deterministic coding tasks. Developers can toggle between two reasoning modes, High and Max, with Z.ai recommending Max effort for complex, multi-step implementation routines.

Software Engineering Benchmark Results

GLM-5.2 shows measurable gains over the GLM-5.1 version released in April 2026, reaching parity with proprietary frontier models on key agentic tasks.

BenchmarkGLM-5.2 ScoreGLM-5.1 ScoreNotes
FrontierSWE74.4%-Trails Claude Opus 4.8 by 1%
SWE-bench Pro62.1%58.4%-
AIME 202699.2%-Measures mathematical reasoning
Terminal-Bench 2.181.063.5-
GPQA-Diamond91.2%86.2%-

Hardware Requirements and API Availability

Z.ai released the model weights under an MIT license without regional restrictions. This permissive approach arrives as recent U.S. export control directives restrict the global availability of competing frontier models.

Serving the model locally requires significant infrastructure. The full BF16 weights consume approximately 753GB of storage. Running the model at FP8 precision for inference requires a minimum cluster of eight H100 GPUs.

For cloud deployments, Z.ai provides an API priced at $1.40 per one million input tokens and $4.40 per one million output tokens. The model is also available through Fireworks AI and Cloudflare Workers AI, though the Cloudflare implementation currently enforces a 262k context limit. Because the API exposes Anthropic-compatible endpoints, developers can immediately route GLM-5.2 into existing autonomous environments like Cline, OpenClaw, and Claude Code.

If you build autonomous coding systems, GLM-5.2 offers a highly capable open-weight alternative to proprietary APIs. You should evaluate the model using the Max reasoning effort on your own internal codebase regressions, taking advantage of the Anthropic API compatibility to swap it into your existing scaffolding without rewriting your network logic.

Get Insanely Good at AI

Get Insanely Good at AI

The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.

Keep Reading