Gemma 4 Arrives With Full Apache 2.0 License

Google DeepMind’s Gemma 4 release shifts the model family to a fully open Apache 2.0 license. The release introduces four variants designed for advanced reasoning and autonomous execution. By dropping the restrictive custom terms of previous generations, Google allows developers to deploy and redistribute these models without commercial constraints.

Model Variants and Tiers

The Gemma 4 lineup scales from edge devices to enterprise hardware across two distinct tiers. The models are derived directly from the proprietary Gemini 3 architecture.

Model	Architecture	Context Window	Target Hardware
31B Dense	Standard Dense	256K	Workstation / Cloud
26B A4B	Mixture-of-Experts (~4B active)	256K	Workstation / Low Latency
Effective 4B (E4B)	Compact Dense	128K	Laptops / High-end Mobile
Effective 2B (E2B)	Compact Dense	128K	Smartphones / IoT

The flagship 31B Dense model prioritizes raw reasoning capability and ranked third globally on the Arena AI text leaderboard at launch. The 26B A4B variant uses a sparse architecture, activating approximately 4 billion parameters per token to reduce inference latency while maintaining high output quality. The edge tier models, E4B and E2B, target consumer hardware and embedded systems like the Raspberry Pi and Jetson Nano.

Reasoning and Multimodal Architecture

The 31B Dense model achieves 89.2% on the AIME 2026 math benchmark. This performance relies on a new thinking mode that uses a dedicated <|channel>thought\n tag to output reasoning traces before generating a final response. For developers building systems that require autonomous execution, the models include native support for the system role and robust function calling capabilities.

Vision processing across all models relies on 2D spatial RoPE, which encodes image patch positions as specific x and y coordinates. Text generation uses a hybrid architecture that alternates between a sliding window and full attention at a 5:1 ratio. This structural design allows the Workstation tier to maintain its 256K context window while optimizing memory consumption.

The E4B and E2B models also process native audio through a conformer-based architecture. This allows the smaller models to perform offline edge ASR and translation directly on the device without routing through external text-to-speech APIs.

Framework Support and Ecosystem

Day-one support is available across major inference frameworks, including transformers, llama.cpp, MLX, and Unsloth. The community has already published 4-bit quantized versions (Q4_K_M) on Hugging Face.

The models are optimized for hardware ranging from the NVIDIA RTX 5090 and DGX Spark to the Apple Mac M3 Ultra and mobile platforms from Qualcomm and MediaTek. You can begin running these models locally immediately using standard open-source infrastructure. Early benchmark comparisons indicate the 31B model competes closely with Alibaba’s Qwen 3.5 27B on specific logic tasks. The E4B variant delivers high intelligence-per-parameter metrics, bringing frontier-level performance to consumer laptops.

If you build embedded AI applications or local desktop agents, the shift to Apache 2.0 simplifies your compliance requirements. You can now package and distribute the E2B and E4B models directly inside commercial mobile applications without relying on specialized enterprise licensing agreements.

Gemma 4 Arrives With Full Apache 2.0 License

Model Variants and Tiers

Reasoning and Multimodal Architecture

Framework Support and Ecosystem

Keep Reading

How to Expose Ephemeral vLLM Endpoints on Hugging Face Jobs

Gemini Omni Flash Unifies Video Generation at 10 Cents a Second

Google Drops Vision Encoders in Gemma 4 12B Multimodal Release

Google Ships 9 Gemini Omni Demos Alongside 3.5 Flash

Apache 2.0 Gets 218B Command A+ as Cohere Acquires Reliant AI