Google Releases Gemma 4 Under Apache 2.0 License, Making Its Most Capable Open Models Fully Redistributable for the First Time

Overview

Google DeepMind released Gemma 4 on April 2, introducing four model variants that range from lightweight edge deployments to a 31-billion-parameter dense model built on the same technology that powers the proprietary Gemini 3. The release marks a significant licensing shift: for the first time in the Gemma family’s history, all models ship under the OSI-approved Apache 2.0 license, replacing the Gemma Terms of Use that previously limited redistribution and restricted certain usage categories.

The move grants developers irrevocable rights to use, reproduce, modify, and distribute the models for any purpose, including commercial deployment, with no royalty requirements and no user limits. Since the original Gemma launch, the community has downloaded Gemma models over 400 million times and created more than 100,000 variants, a collection known informally as the Gemmaverse.

Model Lineup

Gemma 4 comprises four variants targeting different hardware tiers. The 31B dense model serves as the flagship, designed for workstations and servers, and currently ranks third among open models on the Arena AI text leaderboard. The 26B mixture-of-experts model, which activates only 3.8 billion parameters per inference pass, holds the sixth position on the same leaderboard and is optimized for personal computers and consumer GPUs.

Two edge-optimized variants round out the family: the E4B and E2B models, built for smartphones, Raspberry Pi hardware, and NVIDIA Jetson Orin Nano devices. These smaller models support 128K-token context windows, native audio input for speech recognition, and offline operation with near-zero latency. The larger models extend context to 256K tokens.

Performance

Benchmark results show substantial improvements over prior Gemma generations. On AIME 2026, a mathematics reasoning benchmark, the 31B model scores 89.2 percent compared to Gemma 3’s 20.8 percent. On MMLU, a multilingual question-answering benchmark, the 31B model achieves 85.2 percent while the lightweight E2B variant reaches 60 percent. The 31B model also records 85.7 percent on GPQA Diamond, a graduate-level reasoning evaluation.

Google claims the models outperform competitors up to 20 times their size. The unquantized 31B model fits on a single 80 GB NVIDIA H100 GPU in bfloat16 format, while quantized versions can run on consumer graphics cards.

Capabilities

All Gemma 4 models support multimodal processing, including native image and video understanding with optical character recognition and chart analysis. The release introduces native function-calling and structured JSON output for agentic workflows, allowing the models to serve as autonomous agents that can interact with external tools and APIs.

The models are trained on over 140 languages and include advanced reasoning capabilities described by Google as enabling multi-step planning and deep logic chains. Code generation is supported across all variants, with the edge models specifically designed for local-first, privacy-preserving AI development that operates entirely without cloud connectivity.

The Licensing Shift in Context

The Apache 2.0 transition positions Gemma 4 differently from many competing model families. While Meta’s Llama 4 and Alibaba’s Qwen 3.6 series also provide model weights, their licensing terms vary in the restrictions they impose on commercial use and redistribution. Apache 2.0 is one of the most permissive licenses recognized by the Open Source Initiative, granting complete developer flexibility over data, infrastructure, and deployment without requiring attribution of derivative works beyond license notice preservation.

Google frames the decision around three principles: autonomy for developers to build on and modify the models, control through local execution that does not depend on cloud infrastructure, and clarity through an industry-standard license. The company has highlighted existing Gemma deployments in automated state licensing in Ukraine and multilingual support across India’s 22 official languages as examples of the ecosystem the open licensing aims to expand.

Availability

Gemma 4 weights are available immediately through Hugging Face, Kaggle, and Ollama. The 31B and 26B models are accessible via Google AI Studio, while the E4B and E2B edge variants are distributed through Google AI Edge Gallery. Supported inference frameworks include Hugging Face Transformers, vLLM, llama.cpp, and NVIDIA platforms, with hardware compatibility spanning NVIDIA Jetson through Blackwell GPUs, AMD ROCm, and Google’s Trillium and Ironwood TPUs.