
Holo3 Open-Weight Model Tops GPT-5.4 on Computer Use Benchmarks

H Company launches Holo3, a sparse Mixture-of-Experts model family that sets new OSWorld records for autonomous digital navigation and agentic task execution.

H Company released Holo3, a family of Vision-Language Models engineered specifically for autonomous computer use. The models achieve state-of-the-art performance on GUI navigation and multi-step tasks at roughly 10% of the inference cost of proprietary frontier models. If you build digital agents, this shifts the baseline for what open-weight models can accomplish on the desktop.

Sparse MoE Architecture

Holo3 relies on a sparse Mixture-of-Experts (MoE) architecture: only a small subset of expert sub-networks activates per token, which pairs strong reasoning with low inference overhead. The release includes two models targeting different deployment environments.

| Model | Total Parameters | Active Parameters | License | Access |
|---|---|---|---|---|
| Holo3-122B-A10B | 122B | 10B | Proprietary | API |
| Holo3-35B-A3B | 35B | 3B | Apache 2.0 | Weights / API |
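The parameter split in the table comes from sparse routing: a gating network picks a few experts per token, so most weights stay idle on any given forward pass. The sketch below shows generic top-k routing; the expert count, k, and routing details are illustrative assumptions, not H Company's actual implementation.

```python
import numpy as np

def topk_route(gate_logits: np.ndarray, k: int = 2):
    """Pick the top-k experts for one token and softmax-normalize their weights.

    Generic sparse-MoE router sketch; expert count and k are illustrative.
    """
    topk = np.argsort(gate_logits)[-k:]                     # indices of the k largest logits
    w = np.exp(gate_logits[topk] - gate_logits[topk].max())  # stable softmax over the winners
    return topk, w / w.sum()

# 64 experts, but only k=2 run for this token -> most parameters stay idle.
rng = np.random.default_rng(0)
experts, weights = topk_route(rng.normal(size=64), k=2)
print(experts, weights.sum())  # two expert ids; mixture weights sum to 1.0
```

Because only the selected experts' weights participate in each token's computation, active parameters (and therefore latency) track k, not the total parameter count.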

The 35B variant is fine-tuned from Qwen/Qwen3.5-35B-A3B. It gives developers a highly capable foundation for an AI agent that runs locally without cloud dependencies.

Computer Use Benchmarks

The models are optimized to perceive screen elements and execute precise actions across web, desktop, and mobile environments. On the OSWorld-Verified benchmark, Holo3 sets a new performance ceiling.

| Model | OSWorld-Verified Score |
|---|---|
| Holo3-122B-A10B | 78.85% |
| Holo3-35B-A3B | 77.80% |
| GPT-5.4 | 75.00% |

H Company also tested real-world readiness using a proprietary suite of 486 multi-step tasks spanning e-commerce, collaboration, business software, and multi-app workflows. The models also excel at the grounding tasks measured by ScreenSpot-Pro and OSWorld-G, benchmarks that test precise clicking on small, densely packed UI elements.
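Grounding benchmarks of this kind score whether the model's predicted click lands inside the target element's bounding box. A minimal sketch of that check, assuming the model emits normalized (0–1) coordinates; the scoring rule and function names here are assumptions, not the official harness:

```python
def to_pixels(norm_xy, screen_w, screen_h):
    """Map a model's normalized (0-1) click prediction to pixel coordinates."""
    x, y = norm_xy
    return round(x * screen_w), round(y * screen_h)

def click_hits(norm_xy, bbox, screen_w=1920, screen_h=1080):
    """True if the predicted click falls inside the target's pixel bounding box.

    bbox = (left, top, right, bottom). This hit-inside-box rule is a common
    way grounding suites are scored, stated here as an assumption.
    """
    px, py = to_pixels(norm_xy, screen_w, screen_h)
    left, top, right, bottom = bbox
    return left <= px <= right and top <= py <= bottom

# A 24x24 px icon at (100, 200): only a precise prediction scores a hit.
print(click_hits((0.058, 0.196), (100, 200, 124, 224)))  # → True
```

On a 1080p screen a 24-pixel icon spans about 1% of the normalized axis, which is why dense-UI grounding is the hard part these benchmarks isolate.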

The Agentic Learning Flywheel

Model performance stems from a specialized training pipeline called the Agentic Learning Flywheel. The pipeline generates scenario-specific navigation examples from both human-written and AI-generated instructions.

It programmatically augments out-of-domain scenarios to prepare the model for unexpected UI changes and legacy software. The final step applies curated reinforcement learning on human-annotated samples. This data filtering approach sharpens multi-step reasoning capabilities when coordinating information across multiple systems.
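In practice, the curation step described above amounts to filtering trajectories by human annotation quality before they reach RL training. A hypothetical sketch, with field names and the threshold invented for illustration (the article only says curated RL is applied to human-annotated samples):

```python
def curate(trajectories, min_score=0.8):
    """Keep only trajectories whose human annotation score clears a threshold.

    Field names ('score', 'steps') and min_score are illustrative assumptions.
    """
    return [t for t in trajectories if t["score"] >= min_score]

batch = [
    {"id": "invoice-export", "score": 0.95, "steps": 12},
    {"id": "calendar-sync",  "score": 0.40, "steps": 7},   # noisy rollout, dropped
    {"id": "crm-update",     "score": 0.88, "steps": 21},
]
kept = curate(batch)
print([t["id"] for t in kept])  # → ['invoice-export', 'crm-update']
```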

Hardware Requirements and Deployment

Both models are available through the H Company Inference API, which offers a free tier for the 35B model. The weights for Holo3-35B-A3B are hosted on Hugging Face.

Running the open-weight model locally is practical for developers with high-end consumer hardware. Using quantization, the 35B model runs on an RTX 4070 Ti paired with 64GB of system RAM. It achieves inference speeds of 25 to 30 tokens per second under these conditions.
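A back-of-envelope calculation shows why the quantized 35B model fits on this class of hardware: weight memory is roughly parameters × bits per weight / 8, and whatever exceeds VRAM is offloaded to system RAM. The VRAM budget below is an illustrative assumption, and the estimate ignores KV cache and activations:

```python
def weight_gb(params_b: float, bits: int) -> float:
    """Approximate weight memory in GB: parameters x bits per weight / 8."""
    return params_b * 1e9 * bits / 8 / 1e9

total = weight_gb(35, 4)           # 35B parameters quantized to 4-bit
vram_gb = 12                       # illustrative consumer-GPU VRAM budget
offload = max(0.0, total - vram_gb)  # spillover held in system RAM

# Ignores KV cache and activation memory, which add to both budgets.
print(f"{total:.1f} GB weights, {offload:.1f} GB offloaded to system RAM")
```

Since only ~3B parameters are active per token, the per-token compute stays small even when some weights live in system RAM, which is consistent with the reported 25 to 30 tokens per second.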

If your application relies on cloud-based frontier models to drive browser automation or desktop tasks, test the Holo3-35B-A3B model in your pipeline. The Apache 2.0 license and low hardware requirements make it possible to run highly capable GUI agents entirely on-device without incurring continuous per-token API costs.
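Trialing the model in an existing pipeline is easier when the agent is structured as a generic observe → predict → act loop, so a local Holo3-35B-A3B backend can replace a cloud model without touching the loop. The interfaces below are hypothetical, not H Company's API; the predictor is stubbed with a scripted action sequence:

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str                    # e.g. "click", "type", or "done"
    payload: tuple = field(default_factory=tuple)

def run_agent(predict, observe, execute, max_steps=20):
    """Generic observe -> predict -> act loop.

    `predict` is any callable taking (observation, history), so swapping a
    cloud model for a local one is a one-line change at the call site.
    """
    history = []
    for _ in range(max_steps):
        action = predict(observe(), history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history

# Stubs standing in for screenshots, the model, and an OS-level executor.
script = iter([Action("click", (111, 212)), Action("type", ("hello",)), Action("done")])
trace = run_agent(lambda obs, hist: next(script), lambda: "screenshot", lambda a: None)
print([a.kind for a in trace])  # → ['click', 'type']
```

Keeping the model behind a single `predict` callable is the design choice that makes A/B testing a local open-weight backend against a proprietary API cheap.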
