
Holo3 Open-Weight Model Tops GPT-5.4 on Computer Use Benchmarks

H Company launches Holo3, a sparse Mixture-of-Experts model family that sets new OSWorld records for autonomous digital navigation and agentic task execution.

H Company released Holo3, a family of Vision-Language Models engineered specifically for autonomous computer use. The models achieve state-of-the-art performance on GUI navigation and multi-step tasks at roughly 10% of the inference cost of proprietary frontier models. If you build digital agents, this shifts the baseline for what open-weight models can accomplish on the desktop.

Sparse MoE Architecture

Holo3 relies on a sparse Mixture-of-Experts (MoE) architecture: only a small subset of expert sub-networks activates per token, which pairs strong reasoning with low inference overhead. The release includes two models targeting different deployment environments.

| Model | Total Parameters | Active Parameters | License | Access |
|---|---|---|---|---|
| Holo3-122B-A10B | 122B | 10B | Proprietary | API |
| Holo3-35B-A3B | 35B | 3B | Apache 2.0 | Weights / API |
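The parameter split in the table comes from sparse routing: a gating network picks a few experts per token, so most weights stay idle on any given forward pass. The sketch below shows generic top-k routing; the expert count, k, and routing details are illustrative assumptions, not H Company's actual implementation.

```python
import numpy as np

def topk_route(gate_logits: np.ndarray, k: int = 2):
    """Pick the top-k experts for one token and softmax-normalize their weights.

    Generic sparse-MoE router sketch; expert count and k are illustrative.
    """
    topk = np.argsort(gate_logits)[-k:]                     # indices of the k largest logits
    w = np.exp(gate_logits[topk] - gate_logits[topk].max())  # stable softmax over the winners
    return topk, w / w.sum()

# 64 experts, but only k=2 run for this token -> most parameters stay idle.
rng = np.random.default_rng(0)
experts, weights = topk_route(rng.normal(size=64), k=2)
print(experts, weights.sum())  # two expert ids; mixture weights sum to 1.0
```

Because only the selected experts' weights participate in each token's computation, active parameters (and therefore latency) track k, not the total parameter count.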

The 35B variant is fine-tuned from Qwen/Qwen3.5-35B-A3B. It gives developers a highly capable foundation for an AI agent that runs locally without cloud dependencies.

Computer Use Benchmarks

The models are optimized to perceive screen elements and execute precise actions across web, desktop, and mobile environments. On the OSWorld-Verified benchmark, Holo3 sets a new performance ceiling.

| Model | OSWorld-Verified Score |
|---|---|
| Holo3-122B-A10B | 78.85% |
| Holo3-35B-A3B | 77.80% |
| GPT-5.4 | 75.00% |

H Company also tested real-world readiness using a proprietary suite of 486 multi-step tasks spanning e-commerce, collaboration, business software, and multi-app workflows. The models also excel at the grounding tasks measured by ScreenSpot-Pro and OSWorld-G, benchmarks that test precise clicking on small, densely packed UI elements.
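Grounding benchmarks of this kind score whether the model's predicted click lands inside the target element's bounding box. A minimal sketch of that check, assuming the model emits normalized (0–1) coordinates; the scoring rule and function names here are assumptions, not the official harness:

```python
def to_pixels(norm_xy, screen_w, screen_h):
    """Map a model's normalized (0-1) click prediction to pixel coordinates."""
    x, y = norm_xy
    return round(x * screen_w), round(y * screen_h)

def click_hits(norm_xy, bbox, screen_w=1920, screen_h=1080):
    """True if the predicted click falls inside the target's pixel bounding box.

    bbox = (left, top, right, bottom). This hit-inside-box rule is a common
    way grounding suites are scored, stated here as an assumption.
    """
    px, py = to_pixels(norm_xy, screen_w, screen_h)
    left, top, right, bottom = bbox
    return left <= px <= right and top <= py <= bottom

# A 24x24 px icon at (100, 200): only a precise prediction scores a hit.
print(click_hits((0.058, 0.196), (100, 200, 124, 224)))  # → True
```

On a 1080p screen a 24-pixel icon spans about 1% of the normalized axis, which is why dense-UI grounding is the hard part these benchmarks isolate.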

The Agentic Learning Flywheel

Model performance stems from a specialized training pipeline called the Agentic Learning Flywheel. The pipeline generates scenario-specific navigation examples from both human-written and AI-generated instructions.

It programmatically augments out-of-domain scenarios to prepare the model for unexpected UI changes and legacy software. The final step applies curated reinforcement learning on human-annotated samples. This data filtering approach sharpens multi-step reasoning capabilities when coordinating information across multiple systems.
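In practice, the curation step described above amounts to filtering trajectories by human annotation quality before they reach RL training. A hypothetical sketch, with field names and the threshold invented for illustration (the article only says curated RL is applied to human-annotated samples):

```python
def curate(trajectories, min_score=0.8):
    """Keep only trajectories whose human annotation score clears a threshold.

    Field names ('score', 'steps') and min_score are illustrative assumptions.
    """
    return [t for t in trajectories if t["score"] >= min_score]

batch = [
    {"id": "invoice-export", "score": 0.95, "steps": 12},
    {"id": "calendar-sync",  "score": 0.40, "steps": 7},   # noisy rollout, dropped
    {"id": "crm-update",     "score": 0.88, "steps": 21},
]
kept = curate(batch)
print([t["id"] for t in kept])  # → ['invoice-export', 'crm-update']
```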

Hardware Requirements and Deployment

Both models are available through the H Company Inference API, which offers a free tier for the 35B model. The weights for Holo3-35B-A3B are hosted on Hugging Face.

Running the open-weight model locally is practical for developers with high-end consumer hardware. Using quantization, the 35B model runs on an RTX 4070 Ti paired with 64GB of system RAM. It achieves inference speeds of 25 to 30 tokens per second under these conditions.
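A back-of-envelope calculation shows why the quantized 35B model fits on this class of hardware: weight memory is roughly parameters × bits per weight / 8, and whatever exceeds VRAM is offloaded to system RAM. The VRAM budget below is an illustrative assumption, and the estimate ignores KV cache and activations:

```python
def weight_gb(params_b: float, bits: int) -> float:
    """Approximate weight memory in GB: parameters x bits per weight / 8."""
    return params_b * 1e9 * bits / 8 / 1e9

total = weight_gb(35, 4)           # 35B parameters quantized to 4-bit
vram_gb = 12                       # illustrative consumer-GPU VRAM budget
offload = max(0.0, total - vram_gb)  # spillover held in system RAM

# Ignores KV cache and activation memory, which add to both budgets.
print(f"{total:.1f} GB weights, {offload:.1f} GB offloaded to system RAM")
```

Since only ~3B parameters are active per token, the per-token compute stays small even when some weights live in system RAM, which is consistent with the reported 25 to 30 tokens per second.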

If your application relies on cloud-based frontier models to drive browser automation or desktop tasks, test the Holo3-35B-A3B model in your pipeline. The Apache 2.0 license and low hardware requirements make it possible to run highly capable GUI agents entirely on-device without incurring continuous per-token API costs.
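Trialing the model in an existing pipeline is easier when the agent is structured as a generic observe → predict → act loop, so a local Holo3-35B-A3B backend can replace a cloud model without touching the loop. The interfaces below are hypothetical, not H Company's API; the predictor is stubbed with a scripted action sequence:

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    kind: str                    # e.g. "click", "type", or "done"
    payload: tuple = field(default_factory=tuple)

def run_agent(predict, observe, execute, max_steps=20):
    """Generic observe -> predict -> act loop.

    `predict` is any callable taking (observation, history), so swapping a
    cloud model for a local one is a one-line change at the call site.
    """
    history = []
    for _ in range(max_steps):
        action = predict(observe(), history)
        if action.kind == "done":
            break
        execute(action)
        history.append(action)
    return history

# Stubs standing in for screenshots, the model, and an OS-level executor.
script = iter([Action("click", (111, 212)), Action("type", ("hello",)), Action("done")])
trace = run_agent(lambda obs, hist: next(script), lambda: "screenshot", lambda a: None)
print([a.kind for a in trace])  # → ['click', 'type']
```

Keeping the model behind a single `predict` callable is the design choice that makes A/B testing a local open-weight backend against a proprietary API cheap.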
