Tunix Hackathon Yields 1B-Parameter Gemma Reasoning Models

On May 28, 2026, Google Developers AI published the results of the Google Tunix Hack, a competition in which more than 11,000 developers trained small base models to generate structured reasoning traces. Participants had to turn Gemma-2-2B and Gemma-3-1B into models that write out their intermediate reasoning inside <reasoning> tags before giving a final answer.
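
The target output format can be checked mechanically, which is how format rewards are commonly built for GRPO-style training. Below is a minimal sketch, assuming the final answer is simply the text that follows the closing </reasoning> tag. The names split_reasoning and format_reward are hypothetical and do not come from Tunix or any competition entry.

```python
import re
from typing import Optional, Tuple

# Assumed layout: "<reasoning> ... </reasoning>" followed by the final answer text.
_TRACE = re.compile(r"^\s*<reasoning>(.*?)</reasoning>\s*(.+?)\s*$", re.DOTALL)


def split_reasoning(completion: str) -> Optional[Tuple[str, str]]:
    """Return (reasoning, answer) for a well-formed completion, else None."""
    match = _TRACE.match(completion)
    if match is None:
        return None
    reasoning = match.group(1).strip()
    answer = match.group(2).strip()
    if not reasoning or not answer:
        return None  # an empty trace or an empty answer does not count
    return reasoning, answer


def format_reward(completion: str) -> float:
    """Hypothetical binary format reward: 1.0 if well-formed, 0.0 otherwise."""
    return 1.0 if split_reasoning(completion) is not None else 0.0


print(format_reward("<reasoning>3 + 4 = 7</reasoning> 7"))
print(format_reward("7"))
```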

Compute Constraints and Tooling

The competition enforced strict resource limits to simulate real-world developer constraints. Participants had at most 9 hours to train their models on a single Kaggle TPU v5e-8 instance with 16 GB of HBM per core. These limits pushed submissions to optimize training efficiency instead of relying on brute-force scaling. Developers built their pipelines with Tunix, Google’s JAX-native library for LLM post-training.
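
To put the hardware limit in perspective, here is a back-of-the-envelope memory budget. It is a rough sketch under stated assumptions: 8 cores (reading the "-8" in v5e-8 that way), round parameter counts of 2.6B and 1B, and a common rule of thumb of about 16 bytes per parameter for full fine-tuning with Adam. Activations, rollout buffers, and reference models are ignored.

```python
CORES = 8                 # assumption: the "-8" in "v5e-8" counts TPU cores
HBM_PER_CORE_GIB = 16.0   # per-core HBM from the competition rules

# Approximate parameter counts (assumed round numbers, not official figures).
MODELS = {"Gemma-2-2B": 2.6e9, "Gemma-3-1B": 1.0e9}

BYTES_WEIGHTS_BF16 = 2    # bf16 weights only (e.g., a frozen base model)
BYTES_FULL_FT = 16        # bf16 weights + bf16 grads + fp32 master copy + two fp32 Adam moments


def to_gib(num_bytes: float) -> float:
    return num_bytes / 1024**3


def budget(params: float) -> dict:
    """Rough memory needs in GiB, ignoring activations and caches."""
    return {
        "weights_bf16": to_gib(params * BYTES_WEIGHTS_BF16),
        "full_finetune": to_gib(params * BYTES_FULL_FT),
    }


total_hbm_gib = CORES * HBM_PER_CORE_GIB
for name, params in MODELS.items():
    b = budget(params)
    print(f"{name}: weights ~{b['weights_bf16']:.1f} GiB, "
          f"full fine-tune state ~{b['full_finetune']:.1f} GiB "
          f"(one core: {HBM_PER_CORE_GIB:.0f} GiB, all cores: {total_hbm_gib:.0f} GiB)")
```

Under these assumptions the 2B model's weights fit on one core, but its full fine-tuning state does not and would have to be sharded across cores. LoRA keeps the base weights frozen and trains only small adapters, which shrinks gradient and optimizer state. That is consistent with LoRA appearing in the winning recipes, although the write-up does not give the entries' actual memory configurations.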

Winning Reasoning Pipelines

The winning submissions showed that structured reasoning can be trained into 1B and 2B models, using pipelines built around Group Relative Policy Optimization (GRPO) and, in some cases, Simple Preference Optimization (SimPO).
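
The core idea of GRPO is that advantages come from comparing several completions sampled for the same prompt, not from a learned critic. The following is a minimal sketch of that group normalization step in plain Python. It is not Tunix's implementation, and it omits the clipped policy-ratio objective and KL penalty that a full GRPO loss adds on top.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages for one prompt's group of sampled completions.

    Each completion's reward is normalized by the group's own mean and standard
    deviation, so no learned value function (critic) is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; implementations differ here
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for the same prompt; two earned the reward.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Completions that beat their group's average get positive advantages and the rest get negative ones. If every completion earns the same reward, all advantages are zero and that group contributes no learning signal.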

The first-place architecture, G-RaR (Rubrics as Rewards), combined LoRA Supervised Fine-Tuning (SFT) with GRPO. Instead of a binary correctness check, G-RaR used a larger Gemma-3-12B model as a judge that scored intermediate reasoning steps against task-specific rubrics. This judge-based reward proved effective for structured logic tasks without requiring large datasets.
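
A rubric-based reward turns a judge's per-criterion grades into a single scalar that GRPO can optimize. The sketch below assumes the judge (here, Gemma-3-12B) has already returned an integer score for each criterion. The Criterion class, the rubric_reward function, the weights, and the example criteria are all hypothetical illustrations, not G-RaR's actual rubric or code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    max_score: int  # the judge grades this criterion on a 0..max_score scale


def rubric_reward(rubric: List[Criterion], judge_scores: Dict[str, int]) -> float:
    """Weighted, normalized rubric score in [0, 1] used as the RL reward."""
    total_weight = sum(c.weight for c in rubric)
    reward = 0.0
    for c in rubric:
        score = judge_scores.get(c.name, 0)        # unscored criterion counts as 0
        score = min(max(score, 0), c.max_score)    # clamp out-of-range judge output
        reward += c.weight * score / c.max_score
    return reward / total_weight


# Hypothetical rubric and judge output for one math answer.
RUBRIC = [
    Criterion("steps_are_valid", weight=2.0, max_score=5),
    Criterion("uses_given_facts", weight=1.0, max_score=5),
    Criterion("final_answer_correct", weight=1.0, max_score=1),
]
scores = {"steps_are_valid": 4, "uses_given_facts": 5, "final_answer_correct": 1}
print(rubric_reward(RUBRIC, scores))
```

Normalizing to [0, 1] keeps rewards comparable across tasks whose rubrics have different numbers of criteria, and the partial credit for intermediate steps is what separates this approach from a pass/fail check.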

The second-place entry, Pinocchio-1B, chained SFT, SimPO, and GRPO into an “Act” pipeline tuned to finish within a single 9-hour TPU session.

The third-place IDEA-E Distillation project introduced a TF-IDF-based reasoning reward. This reward kept the text inside the reasoning tags substantively different from the final answer, which prevented the model from simply leaking its conclusion early.

An honorable mention, Gemmatron, used synthetic data generated by Gemini 2.5 Pro and Gemini 2.5 Flash to build a cognitive backbone via SFT before applying GRPO.
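
IDEA-E's TF-IDF reasoning reward can be approximated by scoring how lexically different the reasoning is from the answer. This minimal sketch assumes cosine similarity between TF-IDF vectors, IDF fitted on a reference corpus such as the current batch of completions, and a reward of 1 minus that similarity. The smoothed IDF formula and all function names are assumptions, and IDEA-E's actual formulation may differ.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(corpus: List[str]) -> Dict[str, float]:
    """Smoothed IDF, log((1 + n) / (1 + df)) + 1, over a reference corpus."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(text: str, idf: Dict[str, float], unseen_idf: float) -> Dict[str, float]:
    counts = Counter(tokenize(text))
    return {t: c * idf.get(t, unseen_idf) for t, c in counts.items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def reasoning_divergence_reward(reasoning: str, answer: str, corpus: List[str]) -> float:
    """Higher when the reasoning text is lexically unlike the final answer."""
    if not tokenize(reasoning):
        return 0.0  # an empty trace earns nothing
    idf = fit_idf(corpus)
    unseen_idf = math.log(1 + len(corpus)) + 1.0  # IDF of a term with df = 0
    sim = cosine(tfidf(reasoning, idf, unseen_idf), tfidf(answer, idf, unseen_idf))
    return max(0.0, 1.0 - sim)  # clamp tiny negative values from float rounding


batch = ["the answer is 7 because 3 plus 4", "the answer is 7"]
print(reasoning_divergence_reward(batch[0], batch[1], batch))
```

A trace that merely restates the answer scores near 0, while one that works through the problem in other words scores closer to 1. On its own this reward would also favor reasoning that is irrelevant to the answer, so in practice it would be combined with a correctness reward.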

Tunix Post-Training Capabilities

The hackathon served as a large-scale stress test for Tunix v0.1.0. Built for high-performance execution on TPUs and for integration with MaxText, the library supports the SFT, PPO, GRPO, GSPO-token, and DPO algorithms. Early benchmarks during the event showed that Tunix implementations improved the GSM8K pass@1 accuracy of Gemma-2-2B-IT by approximately 12%. This gives developers a dedicated JAX-native alternative to other frameworks that aim to standardize post-training.
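
GSM8K pass@1 is usually computed by extracting a numeric answer from each response and comparing it with the reference. The event's exact evaluation protocol is not described, so this is a minimal sketch under assumptions: the prediction is the last number in the model's final answer, and pass@1 is the fraction of correct samples per problem, averaged over problems (with one greedy sample per problem, this reduces to plain accuracy). All function names are hypothetical.

```python
import re
from typing import List, Optional

# A number with optional sign, thousands separators, and decimals.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def gsm8k_reference(solution: str) -> str:
    """GSM8K reference solutions end with '#### <final answer>'."""
    return solution.split("####")[-1].strip().replace(",", "")


def extract_prediction(answer_text: str) -> Optional[str]:
    """Assumed heuristic: the last number in the model's final answer."""
    matches = _NUMBER.findall(answer_text)
    return matches[-1].replace(",", "") if matches else None


def is_correct(prediction: Optional[str], reference: str) -> bool:
    if prediction is None:
        return False
    try:
        return abs(float(prediction) - float(reference)) < 1e-6
    except ValueError:
        return False


def pass_at_1(samples: List[List[str]], references: List[str]) -> float:
    """Mean over problems of (correct samples / samples drawn for that problem)."""
    if len(samples) != len(references):
        raise ValueError("need one reference per problem")
    per_problem = []
    for answers, ref in zip(samples, references):
        correct = sum(is_correct(extract_prediction(a), ref) for a in answers)
        per_problem.append(correct / len(answers))
    return sum(per_problem) / len(per_problem)


ref = gsm8k_reference("3 + 4 = 7 apples.\n#### 7")
print(pass_at_1([["The answer is 7.", "8", "7"], ["1,000"]], [ref, "1000"]))
```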

The event also resulted in the public release of domain-specific reasoning models tailored for medical, chemistry, legal, and robotics use cases. The winning recipes, training datasets, and codebases are now available in the Tunix GitHub repository and on Kaggle. If you deploy lightweight models for domain-specific applications, you can use these open-source pipelines to add structured reasoning traces to your 1B and 2B parameter deployments without increasing inference hardware requirements.
