450ms Latency Desktop Automation Hits Gemini 3.5 Flash

On June 24, 2026, Google DeepMind launched a specialized computer use capability for Gemini 3.5 Flash in public preview. This release shifts the Flash variant into a high-speed agentic model designed to perceive standard digital environments. The system processes high-frequency screenshots to map UI elements to precise X,Y coordinates. It allows the model to observe a screen, move a cursor, and type text to execute complex workflows across independent applications.

The architecture focuses heavily on real-time interaction. DeepMind reports a thought-to-action latency of under 450ms. This speed makes the model viable for live desktop automation and high-volume enterprise RPA tasks that require synchronous feedback.

Technical Specifications and Benchmarks

Developers interact with the model via a new ComputerAction API. The action space supports distinct commands including mouse_move, left_click, right_click, drag_and_drop, and type_text. The model is fine-tuned to interpret dense interface elements like nested menus and small system icons.

Alongside the API, Google published results from the Gemini Computer Use Benchmark (GCUB). The test measures multi-app navigation success rates.

Model	GCUB Success Rate	Architecture
Gemini 3.5 Flash	74.2%	Agent-optimized Flash
Gemini 1.5 Pro	58.4%	General purpose (internal test)

The 15.8 percentage point improvement over previous internal generation tests reflects the specific fine-tuning applied to the 3.5 generation for translating visual context into coordinate-based actions.

Pricing and Availability

The capability is available immediately through Google Cloud Vertex AI in the us-central1 and europe-west4 regions. Developers can also test the integration using a dedicated sandbox environment in Google AI Studio.

Computer use tokens are billed at the standard Gemini 3.5 Flash rates during the preview period.

Token Type	Price per 1 Million Tokens
Input	$0.10
Output	$0.30

Industry analysts note the aggressive pricing strategy. While Anthropic previously added desktop control to Claude apps, Gemini 3.5 Flash positions itself as a lower-cost alternative for processing repetitive data entry tasks across legacy software systems lacking API support.

Agentic Guardrails and Security

Autonomous interaction with local environments introduces significant operational risks. DeepMind implemented several system-level safety controls to mitigate multi-agent AI risks.

The API includes a real-time monitoring feature that functions as a pause state. This allows developers to enforce human-in-the-loop verification before the model executes destructive or final actions, such as submitting forms or initiating purchases.

By default, the model operates under strict application and URL filters. It is prohibited from interacting with password management tools, financial platforms, and voting infrastructure. Furthermore, every screenshot processed and action executed is logged with a cryptographic Action Signature to support forensic auditing and detect manipulation.

If you build RPA workflows, evaluate the ComputerAction API for systems where traditional headless DOM navigation fails. The 450ms latency allows for synchronous operation, but implementing the human-in-the-loop pause feature is critical before deploying the model to production environments with write access.

450ms Latency Desktop Automation Hits Gemini 3.5 Flash

Technical Specifications and Benchmarks

Pricing and Availability

Agentic Guardrails and Security

Keep Reading

How to Chain Hugging Face Spaces Using the /agents.md Endpoint

Google Ships 9 Gemini Omni Demos Alongside 3.5 Flash

How to Automate Desktop Workflows With Claude Cowork

Android XR Launches With Gemini 3.5 Wearable Agent Support

Local Vision Agent IrisGo Automates Desktop Workflows via NPU