MIT-Licensed GLM-5.2 MoE Reaches 74.4% on FrontierSWE

Beijing-based AI startup Z.ai released GLM-5.2 on June 17, 2026. The new 744-billion parameter model is engineered specifically for autonomous software engineering workflows that require reasoning across massive codebases. With a 1,048,576 token context window and an explicit output limit of 131,072 tokens, the release allows agents to ingest and modify entire mid-sized repositories without aggressive summarization or chunking.

Architectural Optimizations for Long Context

GLM-5.2 relies on a mixture-of-experts architecture that activates 40 billion parameters per token. Processing one million tokens normally creates a severe compute bottleneck, but Z.ai addressed this using a novel optimization called IndexShare. By reusing the same indexer across every four sparse attention layers, the architecture reduces per-token FLOPs by 2.9× at the maximum context length.

The model also features an enhanced Multi-Token Prediction (MTP) layer for speculative decoding. This modification increases the acceptance length by up to 20%, directly improving inference speed for deterministic coding tasks. Developers can toggle between two reasoning modes, High and Max, with Z.ai recommending Max effort for complex, multi-step implementation routines.

Software Engineering Benchmark Results

GLM-5.2 shows measurable gains over the GLM-5.1 version released in April 2026, reaching parity with proprietary frontier models on key agentic tasks.

Benchmark	GLM-5.2 Score	GLM-5.1 Score	Notes
FrontierSWE	74.4%	-	Trails Claude Opus 4.8 by 1%
SWE-bench Pro	62.1%	58.4%	-
AIME 2026	99.2%	-	Measures mathematical reasoning
Terminal-Bench 2.1	81.0	63.5	-
GPQA-Diamond	91.2%	86.2%	-

Hardware Requirements and API Availability

Z.ai released the model weights under an MIT license without regional restrictions. This permissive approach arrives as recent U.S. export control directives restrict the global availability of competing frontier models.

Serving the model locally requires significant infrastructure. The full BF16 weights consume approximately 753GB of storage. Running the model at FP8 precision for inference requires a minimum cluster of eight H100 GPUs.

For cloud deployments, Z.ai provides an API priced at $1.40 per one million input tokens and $4.40 per one million output tokens. The model is also available through Fireworks AI and Cloudflare Workers AI, though the Cloudflare implementation currently enforces a 262k context limit. Because the API exposes Anthropic-compatible endpoints, developers can immediately route GLM-5.2 into existing autonomous environments like Cline, OpenClaw, and Claude Code.

If you build autonomous coding systems, GLM-5.2 offers a highly capable open-weight alternative to proprietary APIs. You should evaluate the model using the Max reasoning effort on your own internal codebase regressions, taking advantage of the Anthropic API compatibility to swap it into your existing scaffolding without rewriting your network logic.

MIT-Licensed GLM-5.2 MoE Reaches 74.4% on FrontierSWE

Architectural Optimizations for Long Context

Software Engineering Benchmark Results

Hardware Requirements and API Availability

Keep Reading

How to Govern Cursor Agent Autonomy With Auto-Review

Osaurus Pivots to Unified macOS Agent Platform With Linux VMs

Cohere Ships 30B MoE North-Mini-Code for Local Coding Agents

GLM-5.1 MoE Beats GPT-5.4 in Open-Source Engineering Milestone

How to Chain Hugging Face Spaces Using the /agents.md Endpoint