How to Use Subagents in Gemini CLI
Learn how to build and orchestrate specialized AI subagents in Gemini CLI to prevent context rot and improve development speed using isolated expert loops.
Google’s latest Gemini CLI update introduces subagents to solve context rot in long AI development sessions. Released on April 15, 2026, this architectural shift moves the CLI from a single-agent chat model to a hub-and-spoke system. You can now delegate specialized tasks to independent expert agents without bloating your primary conversation history. This guide covers how to configure custom agents, isolate tools, and manage parallel execution in your workspace.
The Hub-and-Spoke Architecture
The Gemini CLI now operates as a primary orchestrator managing specialized expert agents. Each subagent runs in an isolated context loop. When an agent reads dozens of files to map dependencies, those intermediate tool calls stay out of your main session.
This separation keeps the primary context window token-efficient and focused on your main objective. Deep file reads and complex tool iterations degrade an LLM’s ability to track the core prompt over time. By isolating these actions, subagents prevent the attention degradation common in long sessions. You can learn more about the mechanics of attention limits when evaluating context windows.
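The isolation described above can be sketched in a few lines. This is an illustrative pattern only, not Gemini CLI's internal code; all function names here are hypothetical. The point is that a subagent accumulates its own scratch history and hands back only a compact summary, so the hub's context stays small.

```python
# Hub-and-spoke sketch: the subagent runs an isolated loop, and only a
# one-line summary re-enters the orchestrator's history. All names are
# hypothetical; this is not Gemini CLI's actual implementation.

def run_subagent(task: str) -> str:
    """Run an isolated loop; intermediate tool calls never leave this scope."""
    scratch_history = [f"task: {task}"]
    # Dozens of file reads and tool calls would accumulate here...
    scratch_history.append("read: src/auth/middleware.py")
    scratch_history.append("read: src/auth/tokens.py")
    # ...but only the final synthesis is returned to the hub.
    return f"Summary for '{task}': auth flows through middleware and tokens."

def orchestrator(prompt: str) -> list[str]:
    main_history = [f"user: {prompt}"]
    summary = run_subagent(prompt)               # spoke: isolated context
    main_history.append(f"subagent: {summary}")  # hub keeps one line, not dozens
    return main_history

history = orchestrator("How does the auth system work?")
```

However many intermediate reads the subagent performs, the orchestrator's history grows by exactly one entry.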
Version 0.37.2 introduced supporting infrastructure for this architecture. The release includes Chapters, a feature that provides tool-based topic grouping for complex workflows. It also added dynamic sandbox expansion for Linux and Windows environments, allowing subagents to scale their execution environments based on task requirements.
Using Built-in Agents
The CLI ships with pre-configured subagents designed for common development tasks. You can invoke them immediately to handle specialized workloads.
| Agent Name | Primary Use Case |
|---|---|
| `codebase_investigator` | Reverse-engineering and mapping complex codebase dependencies. |
| `cli_help` | Assisting with CLI-specific commands and troubleshooting. |
| `generalist` | General-purpose fallback for varied tasks. |
You can call an agent directly by prefixing your prompt with the `@` symbol. Writing `@codebase_investigator How does the auth system work?` bypasses the primary agent and routes the request directly to the specialist.
The orchestrator can also automatically delegate tasks based on the descriptions of available subagents. If a prompt requires deep repository analysis, the orchestrator passes the context to the investigator agent without manual intervention.
Defining Custom Subagents
Custom subagents function as a contract between the orchestrator and the LLM. You define them using Markdown files with YAML frontmatter. These files specify the core parameters of the agent, including its name, description, assigned model, and allowed tools.
Agent definitions are stored in specific directories based on their intended scope. Store project-level agents in `.gemini/agents/*.md` to keep them version-controlled with your repository. Store user-level agents in `~/.gemini/agents/*.md` for global availability across different projects. The Gemini CLI documentation provides the complete schema requirements for the YAML frontmatter.
You can assign different models based on the required task complexity. Gemini 3.1 Pro is optimized for architectural planning and complex reasoning. Gemini 3 Flash is better suited for faster, high-volume execution tasks where speed is the priority.
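A minimal agent definition might look like the following. The agent name, prompt, and exact frontmatter keys shown here are assumptions based on the parameters described above (name, description, model, tools); verify the precise schema against the Gemini CLI documentation before relying on it.

```markdown
---
name: schema_auditor
description: Reviews database migration files for breaking changes.
model: gemini-3-flash
---

You are a database schema specialist. Inspect migration files for
destructive operations and flag any dropped columns or type changes.
```

Saving this as `.gemini/agents/schema_auditor.md` would make it a project-level agent that travels with the repository, while the `model` field pins it to the faster Flash tier for high-volume review work.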
Tool Isolation and Permissions
Subagents support strict tool isolation. You can grant an agent a specific subset of standard tools or attach dedicated Model Context Protocol servers to specific agents.
This isolation prevents state contamination across different tasks. A subagent analyzing database schemas does not need access to your deployment credentials. Fine-grained permission control ensures agents only execute actions relevant to their defined scope. It creates a secure boundary when you implement multi-agent coordination patterns for sensitive workloads.
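In frontmatter terms, scoping an agent down might look like the excerpt below. The key names (`tools`, `mcp_servers`) and the server name are illustrative assumptions, not confirmed schema; the idea is simply that the agent lists read-oriented tools and nothing else.

```yaml
# Hypothetical frontmatter excerpt: grant only read-oriented tools plus one
# dedicated MCP server. Key names are illustrative; check the CLI schema.
tools:
  - read_file
  - search_file_content
mcp_servers:
  - schema-inspector   # attached to this agent only
```

An agent defined this way can inspect the codebase but has no path to write files, run shell commands, or touch deployment credentials.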
Parallel Execution and Delegation
The CLI supports running multiple subagents in parallel. This accelerates high-volume tasks like large-scale code reviews, dependency updates, or extensive repository research.
The system uses the Agent-to-Agent (A2A) protocol to manage these parallel workflows. The primary CLI orchestrator can delegate tasks to remote subagents rather than processing everything locally. This distributed approach prevents local resource bottlenecks during heavy computation.
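The fan-out pattern behind this can be sketched with `asyncio`. This mimics the shape of parallel delegation, not the A2A wire protocol; the agent names and the `delegate` helper are hypothetical.

```python
import asyncio

# Parallel-delegation sketch: each subagent call is an independent task with
# its own context, and the orchestrator gathers only the summaries. This
# illustrates the pattern, not the A2A protocol itself.

async def delegate(agent: str, task: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a remote agent round-trip
    return f"{agent}: done with '{task}'"

async def fan_out(tasks: dict[str, str]) -> list[str]:
    # Launch all subagents concurrently; each consumes its own tokens.
    coros = [delegate(agent, task) for agent, task in tasks.items()]
    return await asyncio.gather(*coros)

results = asyncio.run(fan_out({
    "reviewer_a": "review module auth",
    "reviewer_b": "review module billing",
}))
```

Because every delegated task carries its own context window, doubling the number of concurrent agents roughly doubles token consumption, which is why the cost warning below matters.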
Executing multiple agents concurrently consumes API credits rapidly. Each subagent maintains its own context window and token usage. Monitor your consumption closely when triggering parallel research tasks across large codebases. You may need to implement usage limits to reduce LLM API costs in production environments.
Start by mapping out the repetitive workflows in your development cycle. Create a single custom subagent in your user-level directory for your most time-consuming task to test the delegation handoff.