
How to Use Subagents in Gemini CLI

Learn how to build and orchestrate specialized AI subagents in Gemini CLI to prevent context rot and improve development speed using isolated expert loops.

Google’s latest Gemini CLI update introduces subagents to solve context rot in long AI development sessions. Released on April 15, 2026, this architectural shift moves the CLI from a single-agent chat model to a hub-and-spoke system. You can now delegate specialized tasks to independent expert agents without bloating your primary conversation history. This guide covers how to configure custom agents, isolate tools, and manage parallel execution in your workspace.

The Hub-and-Spoke Architecture

The Gemini CLI now operates as a primary orchestrator managing specialized expert agents. Each subagent runs in an isolated context loop. When an agent reads dozens of files to map dependencies, those intermediate tool calls stay out of your main session.

This separation keeps the primary context window token-efficient and focused on your main objective. Deep file reads and complex tool iterations degrade an LLM’s ability to track the core prompt over time. By isolating these actions, subagents prevent the attention degradation common in long sessions. You can learn more about the mechanics of attention limits when evaluating context windows.

Version 0.37.2 introduced supporting infrastructure for this architecture. The release includes Chapters, a feature that provides tool-based topic grouping for complex workflows. It also added dynamic sandbox expansion for Linux and Windows environments, allowing subagents to scale their execution environments based on task requirements.

Using Built-in Agents

The CLI ships with pre-configured subagents designed for common development tasks. You can invoke them immediately to handle specialized workloads.

| Agent Name | Primary Use Case |
| --- | --- |
| codebase_investigator | Reverse-engineering and mapping complex codebase dependencies. |
| cli_help | Assisting with CLI-specific commands and troubleshooting. |
| generalist | Default specialist for varied tasks. |

You can call an agent directly by prefixing your prompt with the @ symbol. Writing "@codebase_investigator How does the auth system work?" bypasses the primary agent and routes the request directly to the specialist.

The orchestrator can also automatically delegate tasks based on the descriptions of available subagents. If a prompt requires deep repository analysis, the orchestrator passes the context to the investigator agent without manual intervention.

Defining Custom Subagents

Custom subagents function as a contract between the orchestrator and the LLM. You define them using Markdown files with YAML frontmatter. These files specify the core parameters of the agent, including its name, description, assigned model, and allowed tools.

Agent definitions are stored in specific directories based on their intended scope. Store project-level agents in .gemini/agents/*.md to keep them version-controlled with your repository. Store user-level agents in ~/.gemini/agents/*.md for global availability across different projects. The Gemini CLI documentation provides the complete schema requirements for the YAML frontmatter.

You can assign different models based on the required task complexity. Gemini 3.1 Pro is optimized for architectural planning and complex reasoning. Gemini 3 Flash is better suited for faster, high-volume execution tasks where speed is the priority.
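Putting these pieces together, a project-level definition stored at .gemini/agents/schema-reviewer.md might look like the sketch below. The frontmatter keys (name, description, model, tools) mirror the fields described above, but the exact key names, tool identifiers, and model ID strings are assumptions here; confirm them against the Gemini CLI documentation's schema before relying on this.

```markdown
---
# Hypothetical agent definition; field names and values are illustrative.
name: schema-reviewer
description: Reviews database schema changes and flags risky migrations.
model: gemini-3-flash        # assumed model ID for the fast execution tier
tools:                       # assumed tool identifiers; restricts the agent's reach
  - read_file
  - grep
---

You are a database schema specialist. Analyze migration files, flag
destructive changes, and suggest safer alternatives. You have no access
to deployment tooling and must never modify deployment configuration.
```

The body below the frontmatter becomes the agent's system prompt, which is where you encode the scope restrictions that the tool list enforces mechanically.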

Tool Isolation and Permissions

Subagents support strict tool isolation. You can grant an agent a specific subset of standard tools or attach dedicated Model Context Protocol servers to specific agents.

This isolation prevents state contamination across different tasks. A subagent analyzing database schemas does not need access to your deployment credentials. Fine-grained permission control ensures agents only execute actions relevant to their defined scope. It creates a secure boundary when you implement multi-agent coordination patterns for sensitive workloads.

Parallel Execution and Delegation

The CLI supports running multiple subagents in parallel. This accelerates high-volume tasks like large-scale code reviews, dependency updates, or extensive repository research.

The system uses the Agent-to-Agent (A2A) protocol to manage these parallel workflows. The primary CLI orchestrator can delegate tasks to remote subagents rather than processing everything locally. This distributed approach prevents local resource bottlenecks during heavy computation.

Executing multiple agents concurrently consumes API credits rapidly. Each subagent maintains its own context window and token usage. Monitor your consumption closely when triggering parallel research tasks across large codebases. You may need to implement usage limits to reduce LLM API costs in production environments.
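Because each subagent pays for its own context window, fan-out costs scale linearly with the number of agents. The hypothetical helper below (not part of the Gemini CLI; prices and token counts are illustrative placeholders) sketches the back-of-envelope arithmetic worth running before triggering a parallel research task.

```python
# Rough cost estimate for fanning one task out to N parallel subagents.
# All prices and token counts are illustrative assumptions, not real rates.

def parallel_cost(num_agents: int,
                  input_tokens_per_agent: int,
                  output_tokens_per_agent: int,
                  price_in_per_million: float,
                  price_out_per_million: float) -> float:
    """Each subagent holds its own context, so input tokens are paid per agent."""
    input_cost = num_agents * input_tokens_per_agent / 1_000_000 * price_in_per_million
    output_cost = num_agents * output_tokens_per_agent / 1_000_000 * price_out_per_million
    return input_cost + output_cost

# Ten agents, each reading ~50k tokens of repository context and emitting
# ~2k tokens of findings, at assumed prices per million tokens:
estimate = parallel_cost(10, 50_000, 2_000, 1.25, 10.00)
print(f"Estimated run cost: ${estimate:.2f}")
```

The same prompt run through ten agents costs ten times the input spend of a single agent, which is why a cheaper model like Gemini 3 Flash is usually the right assignment for high-volume parallel work.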

Start by mapping out the repetitive workflows in your development cycle. Create a single custom subagent in your user-level directory for your most time-consuming task to test the delegation handoff.
