Tunix Hackathon Yields 1B-Parameter Gemma Reasoning Models
Google released the results of its Tunix hackathon, showcasing how developers trained small Gemma models to use reasoning traces on a strict compute budget.
On May 28, 2026, Google Developers AI published the results of the Google Tunix Hack, a competition where over 11,000 developers trained small base models to generate structured reasoning traces. The event required participants to transform Gemma-2-2B and Gemma-3-1B into models capable of producing internal logic inside <reasoning> tags before outputting a final answer.
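The output format described above can be checked mechanically. The sketch below is a minimal illustration, not the competition's grader: it assumes a single <reasoning>...</reasoning> block followed by free-text answer, and the function name split_trace is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed layout: one <reasoning>...</reasoning> block, then the final answer.
_TRACE = re.compile(r"\A\s*<reasoning>(.*?)</reasoning>\s*(.+?)\s*\Z", re.DOTALL)


def split_trace(completion: str) -> Optional[Tuple[str, str]]:
    """Return (reasoning, answer), or None if the completion breaks the format."""
    match = _TRACE.match(completion)
    if match is None:
        return None
    reasoning, answer = match.group(1).strip(), match.group(2)
    # Reject empty traces and a second reasoning block after the first.
    if not reasoning or "<reasoning>" in answer:
        return None
    return reasoning, answer
```

A check like this could serve as a format gate before any correctness or quality reward is computed, since a completion that returns None has no trace to score.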
Compute Constraints and Tooling
The competition enforced strict resource limits to simulate real-world developer constraints. Participants had a maximum of 9 hours to train their models on a single Kaggle TPU v5e-8 instance with 16 GB of HBM per core. The cap pushed submissions to optimize training efficiency rather than rely on brute-force parameter scaling. Developers used Tunix, Google’s JAX-native library for LLM post-training, to run these pipelines.
Winning Reasoning Pipelines
The winning submissions demonstrated that complex reasoning capabilities can be distilled into 1B and 2B models using specific combinations of Group Relative Policy Optimization (GRPO) and Simple Preference Optimization (SimPO).
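For readers unfamiliar with SimPO, its objective for a single preference pair can be sketched as follows. This is the published SimPO formulation written in plain Python, not code from any submission; the default beta and gamma values are illustrative assumptions.

```python
import math


def simpo_loss(logp_chosen: float, len_chosen: int,
               logp_rejected: float, len_rejected: int,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """SimPO loss for one (chosen, rejected) pair.

    logp_* are summed token log-probabilities under the policy,
    len_* are the response lengths in tokens.
    """
    # Implicit reward: beta times the average per-token log-probability.
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    # The chosen response must beat the rejected one by a target margin gamma.
    margin = reward_chosen - reward_rejected - gamma
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

Because the reward is the policy's own length-normalized log-probability, no reference model has to be held in memory, which matters under a fixed HBM budget.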
The first-place architecture, named G-RaR (Rubrics as Rewards), combined LoRA Supervised Fine-Tuning (SFT) with GRPO. Instead of a binary correctness check, G-RaR used a larger Gemma-3-12B model as a judge to score intermediate reasoning steps against task-specific rubrics. Rubric-based scoring proved highly effective for structured logic tasks without requiring massive datasets.
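How rubric scores might feed GRPO can be sketched in a few lines. This is an assumption-laden illustration, not the G-RaR implementation: the rubric criteria and weights are hypothetical, the judge scores are hard-coded stand-ins for calls to the judge model, and only the group-relative normalization follows the standard GRPO formulation.

```python
from statistics import mean, pstdev

# Hypothetical rubric: criterion name -> weight.
RUBRIC_WEIGHTS = {"steps_valid": 0.5, "uses_given_facts": 0.3, "concise": 0.2}


def rubric_reward(scores: dict, weights: dict) -> float:
    """Weighted average of per-criterion judge scores, each in [0, 1]."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def group_advantages(rewards: list, eps: float = 1e-6) -> list:
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this
    return [(r - mu) / (sigma + eps) for r in rewards]


# Stand-in judge output for three sampled completions of the same prompt.
judge_scores = [
    {"steps_valid": 1.0, "uses_given_facts": 1.0, "concise": 0.5},
    {"steps_valid": 0.5, "uses_given_facts": 1.0, "concise": 1.0},
    {"steps_valid": 0.0, "uses_given_facts": 0.5, "concise": 1.0},
]
rewards = [rubric_reward(s, RUBRIC_WEIGHTS) for s in judge_scores]
advantages = group_advantages(rewards)
```

Compared with a binary check, the graded reward separates completions that a pass/fail signal would tie, so more groups produce a non-zero advantage.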
The second-place entry, Pinocchio-1B, chained an “Act” pipeline consisting of SFT, SimPO, and GRPO, specifically optimized to complete within the 9-hour TPU session. The third-place IDEA-E Distillation project introduced a TF-IDF based reasoning reward. This reward mechanism ensured the text generated inside the reasoning tags remained substantively different from the final answer, preventing the model from simply leaking the conclusion early. An honorable mention, Gemmatron, utilized synthetic data generated by Gemini 2.5 Pro and Flash to build a cognitive backbone via SFT before applying GRPO.
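One plausible shape for a TF-IDF reasoning reward is one minus the cosine similarity between the reasoning text and the final answer. The source does not describe how IDEA-E computed its reward, so the sketch below is an assumption throughout, including the tokenizer, the smoothed IDF, and the function names.

```python
import math
import re
from collections import Counter


def _tokens(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list) -> list:
    """Sparse TF-IDF vectors (term -> weight) over a small list of documents."""
    tokenized = [_tokens(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so terms shared by every document keep a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u: dict, v: dict) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def reasoning_reward(reasoning: str, answer: str) -> float:
    """High when the reasoning text differs from the answer, low when it copies it."""
    reasoning_vec, answer_vec = tfidf_vectors([reasoning, answer])
    return 1.0 - cosine(reasoning_vec, answer_vec)
```

A reward like this would need to be combined with a correctness or format signal, since on its own it also scores unrelated text highly.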
Tunix Post-Training Capabilities
The hackathon served as a large-scale stress test for Tunix v0.1.0. Built for high-performance TPU execution and integration with MaxText, the library supports SFT, PPO, GRPO, GSPO-token, and DPO algorithms. Early benchmarking during the event showed that Tunix implementations improved the GSM8K pass@1 accuracy of Gemma-2-2B-IT by approximately 12%. Tunix gives developers a JAX-native alternative to other post-training frameworks.
The event also resulted in the public release of domain-specific reasoning models tailored for medical, chemistry, legal, and robotics use cases. The winning recipes, training datasets, and codebases are now available in the Tunix GitHub repository and on Kaggle. If you deploy lightweight models for domain-specific applications, you can use these open-source pipelines to add structured reasoning traces to your 1B and 2B parameter deployments without increasing inference hardware requirements.