
Scaling Compute for Depth with Google Deep Research Max

Google DeepMind's Deep Research Max leverages extended test-time compute and MCP support to automate high-fidelity, private data investigations.

Google DeepMind released Deep Research Max, an autonomous agent built on the Gemini 3.1 Pro architecture. The system uses extended test-time compute to run long-horizon iterative searches across the open web and private databases. For developers building autonomous research pipelines, the release provides a direct way to trade inference speed for comprehensive reasoning.

Test-Time Compute and Planning

Deep Research Max drops low-latency streaming to prioritize deep, asynchronous execution. The agent scales compute heavily during the reasoning phase. It iteratively searches sources, identifies knowledge gaps, and refines its understanding before generating a final report.
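The search-gap-refine loop described above can be sketched as a simple control flow. This is an illustrative reconstruction, not Google's implementation: the `search` callable, the result shape, and the round limit are all assumptions.

```python
# Sketch of the iterate-until-no-gaps loop: search a source, record the
# finding, queue any newly exposed knowledge gaps, repeat.
def run_research_loop(question, search, max_rounds=5):
    findings = []          # accumulated evidence for the final report
    gaps = [question]      # open questions still to resolve
    for _ in range(max_rounds):
        if not gaps:
            break          # no knowledge gaps left: ready to report
        query = gaps.pop(0)
        result = search(query)
        findings.append(result["answer"])
        gaps.extend(result.get("follow_ups", []))  # newly exposed gaps
    return findings

# Stub search function standing in for the agent's real retrieval step.
def fake_search(query):
    if query == "root question":
        return {"answer": "initial finding", "follow_ups": ["detail"]}
    return {"answer": "refined finding", "follow_ups": []}

findings = run_research_loop("root question", fake_search)
```

The loop terminates either when the gap queue empties or when the round budget runs out, which is where extended test-time compute buys depth: a larger budget allows more refinement rounds before the report is generated.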

You can utilize a new collaborative planning feature to review the proposed research strategy. This human-in-the-loop control allows you to modify the investigation’s scope before the agent executes the expensive search steps.
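A plan-review step might look like the following sketch. The plan structure, field names, and scope rule here are hypothetical; the point is that rejected steps never reach the expensive search phase.

```python
# Hypothetical human-in-the-loop review: the agent proposes steps, and
# you trim them to an approved scope before execution begins.
def review_plan(proposed_steps, allowed_sources):
    approved, rejected = [], []
    for step in proposed_steps:
        if step["source"] in allowed_sources:
            approved.append(step)
        else:
            rejected.append(step)  # this step never runs
    return approved, rejected

plan = [
    {"task": "market sizing", "source": "open_web"},
    {"task": "filings review", "source": "private_db"},
    {"task": "social scrape", "source": "social_media"},
]
approved, rejected = review_plan(plan, allowed_sources={"open_web", "private_db"})
```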

Private Data Integration and Output

The agent natively supports the Model Context Protocol (MCP) for enterprise data ingestion. This allows you to securely route specialized data streams from financial terminals like FactSet, S&P Global, and PitchBook directly into the agent’s environment. Developers can deploy MCP servers that let analysts run automated due diligence securely over proprietary documents.
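The MCP server pattern can be illustrated without any SDK: the server exposes named tools over proprietary data, and the agent calls them by name rather than receiving the raw store. The class, tool name, and document store below are illustrative only, not part of any published interface.

```python
# Library-free sketch of the MCP tool-serving pattern for due diligence.
class DueDiligenceServer:
    def __init__(self, documents):
        self._documents = documents  # proprietary docs stay server-side
        self._tools = {"search_filings": self.search_filings}

    def search_filings(self, keyword):
        # Return only matching titles, never the raw document store.
        return [t for t in self._documents if keyword.lower() in t.lower()]

    def call_tool(self, name, **kwargs):
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name](**kwargs)

server = DueDiligenceServer(["2025 Annual Report", "Q3 Credit Memo"])
hits = server.call_tool("search_filings", keyword="credit")
```

The security property comes from the tool boundary: the agent can query, but the document corpus itself never crosses into the model's context unless a tool explicitly returns it.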

Output generation now includes visual data synthesis. The system produces presentation-ready charts and infographics embedded directly in the text. These visuals render in standard HTML or the new Nano Banana format. If you render agent data visually, this native capability removes the need to build and maintain a secondary visualization pipeline.
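If the report embeds charts as inline HTML, post-processing them is a matter of splitting markup out of the text. The `<figure>` wrapper below is an assumption about the report format; adjust the pattern to whatever markup the output actually uses.

```python
import re

# Illustrative helper: pull embedded HTML chart blocks out of a report
# so they can be reused in a slide deck or dashboard.
def extract_charts(report_html):
    return re.findall(r"<figure>.*?</figure>", report_html, flags=re.DOTALL)

report = (
    "Revenue grew 12% YoY. <figure><svg>chart 1</svg></figure> "
    "Margins compressed. <figure><svg>chart 2</svg></figure>"
)
charts = extract_charts(report)
```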

Benchmark Results

The Gemini 3.1 Pro architecture and extended compute yield substantial performance gains over the December 2025 release. The underlying base model drives these improvements, jumping from 31.1% to 77.1% on ARC-AGI-2.

Benchmark | Deep Research Max (Apr 2026) | Previous Version (Dec 2025)
DeepSearchQA | 93.3% | 66.1%
Humanity’s Last Exam | 54.6% | 46.4%
ARC-AGI-2 (Base Model) | 77.1% | 31.1%

Deep Research Max leads competitor models on DeepSearchQA and BrowseComp. GPT-5.4 maintains a slight edge specifically on the Humanity’s Last Exam benchmark.

Implementation and Pricing

Google exposes this functionality through the Interactions API using two distinct endpoints. You can route low-latency tasks to deep-research-preview-04-2026 or use deep-research-max-preview-04-2026 for asynchronous, background workflows.
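The routing decision reduces to one question: can the caller wait? The endpoint IDs below come from the article; the routing rule itself is a design suggestion, not documented behavior.

```python
# Route between the two endpoints based on latency tolerance.
FAST_ENDPOINT = "deep-research-preview-04-2026"
MAX_ENDPOINT = "deep-research-max-preview-04-2026"

def pick_endpoint(can_run_in_background):
    # Max trades latency for reasoning depth, so it only fits
    # asynchronous or batch workflows.
    return MAX_ENDPOINT if can_run_in_background else FAST_ENDPOINT
```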

Both endpoints support a 1 million token context window. This large capacity is necessary for the agent to ingest massive volumes of raw data retrieved during its extended search loops. Pricing is set at $2 per million input tokens and $2 per million output tokens. Access requires a paid tier on the Gemini API or Google AI Studio. The Max capabilities are currently restricted to developer platforms and are not available in the standard Gemini consumer application.
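At the stated rates, a back-of-the-envelope cost check is straightforward. The token counts below are made-up examples; only the $2-per-million prices come from the article.

```python
# Cost estimate at $2 per million tokens for both input and output.
PRICE_PER_M_INPUT = 2.00
PRICE_PER_M_OUTPUT = 2.00

def estimate_cost(input_tokens, output_tokens):
    return (input_tokens / 1_000_000) * PRICE_PER_M_INPUT \
         + (output_tokens / 1_000_000) * PRICE_PER_M_OUTPUT

# A run that fills most of the 1M-token window and emits a 50k-token
# report: $1.80 input + $0.10 output.
cost = estimate_cost(input_tokens=900_000, output_tokens=50_000)
```

Because input and output are priced identically, total cost scales linearly with total tokens, so long background runs are predictable even when the agent's search loops ingest large volumes of raw data.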

When updating your research tools, evaluate the latency constraints of your end users. If your architecture permits overnight or background execution, routing complex queries to the Max endpoint will improve accuracy without increasing your baseline per-token costs.

Get Insanely Good at AI


The book for developers who want to understand how AI actually works. LLMs, prompt engineering, RAG, AI agents, and production systems.
