Gemma 4 Arrives With Full Apache 2.0 License
Google releases Gemma 4, a new generation of open models optimized for advanced reasoning, agentic workflows, and high-performance edge deployment.
Google DeepMind’s Gemma 4 release shifts the model family to a fully open Apache 2.0 license. The release introduces four variants designed for advanced reasoning and autonomous execution. By dropping the restrictive custom terms of previous generations, Google allows developers to use, modify, and redistribute these models commercially under Apache 2.0's standard conditions, which mainly require preserving the license text and attribution notices.
Model Variants and Tiers
The Gemma 4 lineup scales from edge devices to enterprise hardware across two tiers: a workstation tier (31B Dense and 26B A4B) and an edge tier (E4B and E2B). The models are derived directly from the proprietary Gemini 3 architecture.
| Model | Architecture | Context Window | Target Hardware |
|---|---|---|---|
| 31B Dense | Standard Dense | 256K | Workstation / Cloud |
| 26B A4B | Mixture-of-Experts (~4B active) | 256K | Workstation / Low Latency |
| Effective 4B (E4B) | Compact Dense | 128K | Laptops / High-end Mobile |
| Effective 2B (E2B) | Compact Dense | 128K | Smartphones / IoT |
The flagship 31B Dense model prioritizes raw reasoning capability and ranked third globally on the Arena AI text leaderboard at launch. The 26B A4B variant uses a sparse architecture, activating approximately 4 billion parameters per token to reduce inference latency while maintaining high output quality. The edge tier models, E4B and E2B, target consumer hardware and embedded systems like the Raspberry Pi and Jetson Nano.
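To make the sparse activation idea concrete, here is a minimal sketch of top-k expert routing in plain Python. The router scores, the expert count, and the parameter split are invented toy values chosen only to mirror the "26B total, roughly 4B active" shape described above; they are not Gemma 4's published configuration, and the function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_token(router_logits, top_k):
    """Pick the top_k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs whose weights are renormalized
    to sum to 1 over the selected experts only.
    """
    probs = softmax(router_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def active_params(shared_params, params_per_expert, top_k):
    """Parameters touched per token: always-on layers plus the selected experts."""
    return shared_params + top_k * params_per_expert

# Toy numbers (assumptions): 2B shared + 12 experts of 2B each = 26B total.
# With top_k=1, each token touches 2B + 2B = 4B parameters.
picks = route_token([0.1, 2.0, -1.0, 1.5], top_k=2)
per_token = active_params(shared_params=2e9, params_per_expert=2e9, top_k=1)
print(picks, per_token)
```

The point of the design is that compute per token scales with the selected experts rather than the full parameter count, although all experts still have to be held in memory.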
Reasoning and Multimodal Architecture
The 31B Dense model achieves 89.2% on the AIME 2026 math benchmark. This performance relies on a new thinking mode that uses a dedicated <|channel>thought\n tag to output reasoning traces before generating a final response. For developers building systems that require autonomous execution, the models include native support for the system role and robust function calling capabilities.
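An application that consumes thinking-mode output will usually want to separate the reasoning trace from the final answer. The sketch below shows one way to do that, assuming the raw decoded text is available as a string. Only the opening marker comes from the description above; the closing delimiter is not named there, so `close_tag` is a required argument, and the `<END>` value used in the example is a hypothetical placeholder rather than a real Gemma 4 token.

```python
THOUGHT_OPEN = "<|channel>thought\n"  # opening marker as quoted in the article

def split_thought(raw, close_tag):
    """Split raw model output into (reasoning, answer).

    close_tag is whatever delimiter ends the reasoning trace. The article
    does not name it, so the caller must supply it.
    """
    start = raw.find(THOUGHT_OPEN)
    if start == -1:
        # No reasoning trace present: the whole output is the answer.
        return "", raw.strip()
    body_start = start + len(THOUGHT_OPEN)
    end = raw.find(close_tag, body_start)
    if end == -1:
        # Trace never closed (for example, truncated output).
        return raw[body_start:].strip(), ""
    reasoning = raw[body_start:end].strip()
    answer = (raw[:start] + raw[end + len(close_tag):]).strip()
    return reasoning, answer

# "<END>" is a placeholder delimiter for illustration only.
example = "<|channel>thought\nAdd the two numbers.<END>The answer is 4."
reasoning, answer = split_thought(example, "<END>")
print(reasoning)
print(answer)
```

In practice, the chat template shipped with the model is the authoritative source for the exact delimiters, so the values passed here should be read from it rather than hard-coded.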
Vision processing across all models relies on 2D spatial RoPE, which encodes image patch positions as specific x and y coordinates. Text generation uses a hybrid architecture that alternates between a sliding window and full attention at a 5:1 ratio. This structural design allows the Workstation tier to maintain its 256K context window while optimizing memory consumption.
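The memory benefit of the 5:1 pattern can be illustrated with some simple arithmetic. The sketch below reads the ratio as five sliding-window layers followed by one full-attention layer, which is an interpretation of the sentence above. The layer count (12) and window size (1,024 tokens) are hypothetical values picked for illustration, not Gemma 4's actual configuration.

```python
def layer_pattern(num_layers, local_per_global=5):
    """Label each layer 'local' (sliding window) or 'global' (full attention).

    Every run of local_per_global local layers is followed by one global layer.
    """
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(num_layers)]

def cached_tokens(pattern, context_len, window):
    """Token positions held in the KV cache, summed over all layers."""
    total = 0
    for kind in pattern:
        # Local layers keep at most `window` positions; global layers keep all.
        total += context_len if kind == "global" else min(window, context_len)
    return total

# Assumed values: 12 layers, 1,024-token window, 256K (262,144-token) context.
pattern = layer_pattern(12)
hybrid = cached_tokens(pattern, context_len=262_144, window=1_024)
full = len(pattern) * 262_144
print(pattern)
print(f"hybrid cache is {hybrid / full:.1%} of an all-global cache")
```

Under these assumptions, only the global layers pay the full cost of the long context, which is why the cache grows far more slowly than it would with full attention in every layer.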
The E4B and E2B models also process native audio through a conformer-based architecture. This allows the smaller models to perform offline automatic speech recognition (ASR) and translation directly on the device without routing through external speech-to-text APIs.
Framework Support and Ecosystem
Day-one support is available across major inference frameworks, including transformers, llama.cpp, MLX, and Unsloth. The community has already published 4-bit quantized versions (Q4_K_M) on Hugging Face.
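To see why 4-bit quantization matters for local use, a rough weight-memory estimate helps. The sketch below multiplies parameter count by bits per weight. The 4.8 bits-per-weight figure for Q4_K_M is an approximation assumed here, since that format mixes block types and its average varies by model. The estimate covers weights only and ignores the KV cache, activations, and runtime overhead.

```python
def weight_gib(num_params, bits_per_weight):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30

# Parameter counts taken from the model names; the E4B and E2B "effective"
# sizes are treated as plain counts, which is a simplification.
models = {"31B Dense": 31e9, "26B A4B": 26e9, "E4B": 4e9, "E2B": 2e9}

Q4_K_M_BITS = 4.8  # assumed average bits per weight, not an exact figure

for name, params in models.items():
    fp16 = weight_gib(params, 16)
    q4 = weight_gib(params, Q4_K_M_BITS)
    print(f"{name}: ~{fp16:.1f} GiB at 16-bit, ~{q4:.1f} GiB at ~4.8-bit")
```

Note that for the mixture-of-experts variant the full 26B parameters still have to be stored, even though only about 4B are used per token.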
The models are optimized for hardware ranging from the NVIDIA RTX 5090 and DGX Spark to Apple's M3 Ultra Macs and mobile platforms from Qualcomm and MediaTek. Because the supported frameworks are standard open-source tooling, the models can be run locally right away.
Early benchmark comparisons indicate the 31B model competes closely with Alibaba’s Qwen 3.5 27B on specific logic tasks. The E4B variant is pitched at strong capability relative to its parameter count, targeting consumer laptops.
If you build embedded AI applications or local desktop agents, the shift to Apache 2.0 simplifies your compliance requirements. You can now package and distribute the E2B and E4B models directly inside commercial mobile applications without relying on specialized enterprise licensing agreements.