Holo3 Open-Weight Model Tops GPT-5.4 on Computer Use Benchmarks
H Company launches Holo3, a Sparse MoE model family that sets new OSWorld records for autonomous digital navigation and agentic task execution.
H Company released Holo3, a family of Vision-Language Models engineered specifically for autonomous computer use. The models achieve state-of-the-art performance on GUI navigation and multi-step tasks at roughly 10% of the inference cost of proprietary frontier models. If you build digital agents, this shifts the baseline for what open-weight models can accomplish on the desktop.
Sparse MoE Architecture
Holo3 uses a sparse Mixture-of-Experts (MoE) architecture: only a small fraction of the parameters activates for each token, which preserves high reasoning capability while keeping inference overhead low. The release includes two distinct models targeting different deployment environments.
| Model | Total Parameters | Active Parameters | License | Access |
|---|---|---|---|---|
| Holo3-122B-A10B | 122B | 10B | Proprietary | API |
| Holo3-35B-A3B | 35B | 3B | Apache 2.0 | Weights / API |
The 35B variant is fine-tuned from Qwen/Qwen3.5-35B-A3B. It gives developers a highly capable foundation for an AI agent that runs locally without cloud dependencies.
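H Company has not published Holo3's routing internals, so the following is a generic sketch of how sparse top-k MoE routing works in principle: a router scores every expert, but only the k best run, so compute scales with active rather than total parameters. All names and shapes here are illustrative.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Route one token through the top-k of n experts.

    x: (d,) token activation; gate_w: (d, n) router weights;
    experts: list of n callables mapping (d,) -> (d,).
    Only k experts execute, so cost tracks active, not total, parameters.
    """
    logits = x @ gate_w                      # router score per expert
    top = np.argsort(logits)[-k:]            # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 4 experts, but only 2 are evaluated for this token.
rng = np.random.default_rng(0)
d, n = 8, 4
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n)]
y = moe_forward(rng.normal(size=d), rng.normal(size=(d, n)), experts, k=2)
```

In a 35B-A3B model the same idea applies at scale: the full expert set must sit in memory, but each forward pass touches only the roughly 3B active parameters.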
Computer Use Benchmarks
The models are optimized to perceive screen elements and execute precise actions across web, desktop, and mobile environments. On the OSWorld-Verified benchmark, Holo3 sets a new performance ceiling.
| Model | OSWorld-Verified Score |
|---|---|
| Holo3-122B-A10B | 78.85% |
| Holo3-35B-A3B | 77.80% |
| GPT-5.4 | 75.00% |
H Company also tested real-world readiness using a proprietary suite of 486 multi-step tasks spanning e-commerce, collaboration, business software, and multi-app workflows. The models also excel at grounding, as measured by ScreenSpot-Pro and OSWorld-G, benchmarks that test precise clicking on small, densely packed UI elements.
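Grounding benchmarks of this kind typically score whether a predicted click lands inside the target element's bounding box. A minimal sketch of that check (function names and coordinate convention are illustrative, not H Company's evaluation code):

```python
def denormalize(pred, width, height):
    """Map a model's normalized (x, y) in [0, 1] to pixel coordinates."""
    return round(pred[0] * width), round(pred[1] * height)

def click_hit(pred, bbox, width, height):
    """Return True if the predicted click lands inside the target
    element's pixel bounding box (x0, y0, x1, y1)."""
    x, y = denormalize(pred, width, height)
    x0, y0, x1, y1 = bbox
    return x0 <= x <= x1 and y0 <= y <= y1

# A prediction at (0.52, 0.31) on a 1920x1080 screen hits a ~40px-wide
# target box; densely packed UIs shrink that margin for error.
hit = click_hit((0.52, 0.31), (980, 320, 1020, 350), 1920, 1080)
```

The smaller and more tightly packed the elements, the less slack between a correct and an off-by-one-widget click, which is what makes these benchmarks discriminating.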
The Agentic Learning Flywheel
Model performance stems from a specialized training pipeline called the Agentic Learning Flywheel. The pipeline generates scenario-specific navigation examples using both human and AI instructions.
It programmatically augments out-of-domain scenarios to prepare the model for unexpected UI changes and legacy software. The final step applies curated reinforcement learning on human-annotated samples; this data filtering sharpens multi-step reasoning when the agent must coordinate information across multiple systems.
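H Company does not detail the curation step, but reward-based trajectory filtering of this sort commonly looks like the sketch below: keep only runs whose annotator-assigned score clears a bar, then feed the survivors to RL fine-tuning. The threshold, schema, and field names are all hypothetical.

```python
def curate(trajectories, min_reward=0.8):
    """Keep trajectories whose human-annotated reward clears a threshold.

    Each trajectory is a dict with a 'reward' in [0, 1] assigned by
    annotators. Threshold and schema are illustrative only, not
    H Company's actual pipeline.
    """
    return [t for t in trajectories if t["reward"] >= min_reward]

batch = [
    {"task": "export report", "reward": 0.95},
    {"task": "rename file", "reward": 0.40},   # discarded: low-quality run
    {"task": "fill web form", "reward": 0.85},
]
kept = curate(batch)  # 2 of 3 trajectories survive the filter
```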
Hardware Requirements and Deployment
Both models are available through the H Company Inference API, which offers a free tier for the 35B model. The weights for Holo3-35B-A3B are hosted on Hugging Face.
Running the open-weight model locally is practical for developers with high-end consumer hardware. Using quantization, the 35B model runs on an RTX 4070 Ti paired with 64GB of system RAM. It achieves inference speeds of 25 to 30 tokens per second under these conditions.
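A back-of-the-envelope footprint estimate shows why the 64GB of system RAM matters. Assuming 4-bit weight quantization (the article does not specify the precision), the full 35B weight set exceeds the card's VRAM and must partially offload to system RAM, while each forward pass only needs the ~3B active parameters:

```python
def weight_gb(params_billions, bits):
    """Approximate weight storage in GB for a model quantized to `bits`."""
    return params_billions * 1e9 * bits / 8 / 1e9

total = weight_gb(35, 4)   # full 35B weight set at 4-bit: ~17.5 GB
active = weight_gb(3, 4)   # ~3B active params per token: ~1.5 GB
# 17.5 GB exceeds an RTX 4070 Ti's VRAM, so part of the weights spill
# into the 64 GB of system RAM; because only the small active slice is
# exercised per token, throughput stays usable despite the offload.
```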
If your application relies on cloud-based frontier models to drive browser automation or desktop tasks, test the Holo3-35B-A3B model in your pipeline. The Apache 2.0 license and low hardware requirements make it possible to run highly capable GUI agents entirely on-device without incurring continuous per-token API costs.