MIT-Licensed GLM-5.2 MoE Reaches 74.4% on FrontierSWE
Zhipu AI has released GLM-5.2, a 744-billion parameter MoE model engineered for long-horizon agentic tasks with a stable one-million-token context window.
Beijing-based AI startup Z.ai released GLM-5.2 on June 17, 2026. The new 744-billion parameter model is engineered specifically for autonomous software engineering workflows that require reasoning across massive codebases. With a 1,048,576 token context window and an explicit output limit of 131,072 tokens, the release allows agents to ingest and modify entire mid-sized repositories without aggressive summarization or chunking.
Architectural Optimizations for Long Context
GLM-5.2 relies on a mixture-of-experts architecture that activates 40 billion parameters per token. Processing one million tokens normally creates a severe compute bottleneck, but Z.ai addressed this using a novel optimization called IndexShare. By reusing the same indexer across every four sparse attention layers, the architecture reduces per-token FLOPs by 2.9× at the maximum context length.
The model also features an enhanced Multi-Token Prediction (MTP) layer for speculative decoding. This modification increases the acceptance length by up to 20%, directly improving inference speed for deterministic coding tasks. Developers can toggle between two reasoning modes, High and Max, with Z.ai recommending Max effort for complex, multi-step implementation routines.
Software Engineering Benchmark Results
GLM-5.2 shows measurable gains over the GLM-5.1 version released in April 2026, reaching parity with proprietary frontier models on key agentic tasks.
| Benchmark | GLM-5.2 Score | GLM-5.1 Score | Notes |
|---|---|---|---|
| FrontierSWE | 74.4% | - | Trails Claude Opus 4.8 by 1% |
| SWE-bench Pro | 62.1% | 58.4% | - |
| AIME 2026 | 99.2% | - | Measures mathematical reasoning |
| Terminal-Bench 2.1 | 81.0 | 63.5 | - |
| GPQA-Diamond | 91.2% | 86.2% | - |
Hardware Requirements and API Availability
Z.ai released the model weights under an MIT license without regional restrictions. This permissive approach arrives as recent U.S. export control directives restrict the global availability of competing frontier models.
Serving the model locally requires significant infrastructure. The full BF16 weights consume approximately 753GB of storage. Running the model at FP8 precision for inference requires a minimum cluster of eight H100 GPUs.
For cloud deployments, Z.ai provides an API priced at $1.40 per one million input tokens and $4.40 per one million output tokens. The model is also available through Fireworks AI and Cloudflare Workers AI, though the Cloudflare implementation currently enforces a 262k context limit. Because the API exposes Anthropic-compatible endpoints, developers can immediately route GLM-5.2 into existing autonomous environments like Cline, OpenClaw, and Claude Code.
If you build autonomous coding systems, GLM-5.2 offers a highly capable open-weight alternative to proprietary APIs. You should evaluate the model using the Max reasoning effort on your own internal codebase regressions, taking advantage of the Anthropic API compatibility to swap it into your existing scaffolding without rewriting your network logic.
Get Insanely Good at AI
The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.
Keep Reading
How to Govern Cursor Agent Autonomy With Auto-Review
Configure Cursor's Auto-review classifier to manage agent permissions, evaluate tool context, and prevent unauthorized actions without approval fatigue.
Osaurus Pivots to Unified macOS Agent Platform With Linux VMs
The open-source Osaurus app now routes local MLX models and cloud APIs through a hardware-isolated agent harness natively built for Apple Silicon.
Cohere Ships 30B MoE North-Mini-Code for Local Coding Agents
Cohere Labs has released North-Mini-Code-1.0, an Apache 2.0 licensed 30 billion parameter mixture-of-experts model optimized for local coding workflows.
GLM-5.1 MoE Beats GPT-5.4 in Open-Source Engineering Milestone
Zhipu AI releases GLM-5.1 under MIT license, a 744B parameter MoE model that outperforms GPT-5.4 on the SWE-Bench Pro software engineering benchmark.
How to Chain Hugging Face Spaces Using the /agents.md Endpoint
You will learn how to orchestrate text-to-image and 3D modeling tools by chaining Hugging Face Spaces together using the universal markdown tool interface.