Gemini CLI: A Deep Dive into Google's Open-Source AI Coding Agent
A comprehensive exploration of Google's Gemini CLI architecture—examining its TypeScript-based core, local agent executor, tool system, MCP integration, policy engine, and the modular design that enables extensible AI coding assistance.
Introduction
Gemini CLI is Google's entry into the open-source AI coding agent space. Built entirely in TypeScript, it provides a terminal-based interface for interacting with Gemini models to accomplish coding tasks. The architecture emphasizes modularity, extensibility through MCP (Model Context Protocol), and a robust policy system for controlling agent behavior.
This post explores Gemini CLI's internal architecture by examining its TypeScript implementation. We'll understand how the agent executor manages the conversation loop, how tools are registered and invoked, how the policy engine governs behavior, and how the modular design enables both CLI and IDE integration.
Project Structure
Gemini CLI is organized as a monorepo with multiple packages. The core package contains the agent logic, tool implementations, configuration management, and utilities. The cli package provides the terminal interface using Ink (React for CLIs). Additional packages include a2a-server for agent-to-agent communication, vscode-ide-companion for IDE integration, and test-utils for testing infrastructure.
The separation between core and cli allows the same agent logic to power both terminal and IDE experiences. The core package exports everything needed to build custom integrations, while the cli package focuses specifically on the terminal user experience.
The Local Agent Executor
The LocalAgentExecutor is the heart of Gemini CLI's agent behavior. It implements a turn-based execution loop where the agent repeatedly calls the model, executes tool calls, and continues until a termination condition is met.
Agent Definition
Agents are defined through LocalAgentDefinition objects that specify the agent's name, system instruction (prompt), available tools, model configuration, and output schema. The output schema uses Zod for type-safe validation of the agent's final output.
Each agent has a toolConfig specifying which tools it can access. Tools can be referenced by name (using tools from the parent registry) or provided as complete tool definitions. This allows agents to have scoped access to tools appropriate for their purpose.
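To make the shape concrete, here is a minimal sketch of what such a definition might look like. The field names are illustrative rather than the exact LocalAgentDefinition interface, and a plain predicate stands in for the Zod output schema:

```typescript
// Sketch of an agent definition; field names are illustrative, not the
// exact LocalAgentDefinition interface. A predicate stands in for the
// Zod output schema, and the model name is a placeholder.
interface AgentDefinition {
  name: string;
  systemInstruction: string;
  toolConfig: { tools: string[] }; // names resolved from the parent registry
  modelConfig: { model: string; temperature?: number };
  validateOutput: (output: unknown) => boolean; // stand-in for a Zod schema
}

const codeReviewer: AgentDefinition = {
  name: "code-reviewer",
  systemInstruction: "Review the given diff and report any issues found.",
  toolConfig: { tools: ["read-file", "grep"] }, // scoped tool access
  modelConfig: { model: "gemini-default" },
  validateOutput: (o) =>
    typeof o === "object" &&
    o !== null &&
    Array.isArray((o as { issues?: unknown }).issues),
};
```

The scoped toolConfig is the key design point: a review agent that only needs to read code simply never receives write or shell tools.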
Execution Loop
The run method drives the main agent loop. It initializes a GeminiChat instance, sets up timeout handling, and iterates through turns until completion. Each turn involves calling the model with the current conversation state, processing any tool calls in the response, and determining whether to continue or stop.
The executeTurn method handles a single iteration: it optionally compresses the chat history if approaching token limits, calls the model, and processes the response. If the model returns tool calls, they're executed and results fed back for the next turn. If no tool calls are returned and the agent hasn't explicitly completed, this is treated as an error.
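The control flow described above can be sketched as follows. This is not the real executeTurn implementation; canned model responses replace live API calls, and the type names are assumptions:

```typescript
// Minimal sketch of the turn-based loop. Canned model responses stand in
// for live model calls; names and termination values are illustrative.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { toolCalls: ToolCall[] };
type TerminateMode = "GOAL" | "ERROR" | "MAX_TURNS";

function runLoop(
  modelTurns: ModelTurn[], // stands in for successive model responses
  executeTool: (call: ToolCall) => string,
  maxTurns = 10,
): TerminateMode {
  for (let i = 0; i < modelTurns.length && i < maxTurns; i++) {
    const turn = modelTurns[i];
    if (turn.toolCalls.some((c) => c.name === "complete_task")) {
      return "GOAL"; // agent explicitly signalled completion
    }
    if (turn.toolCalls.length === 0) {
      return "ERROR"; // no tool calls and no completion: treated as an error
    }
    turn.toolCalls.map(executeTool); // results would feed the next turn
  }
  return "MAX_TURNS";
}

const mode = runLoop(
  [
    { toolCalls: [{ name: "read-file", args: { path: "src/app.ts" } }] },
    { toolCalls: [{ name: "complete_task", args: { result: "done" } }] },
  ],
  () => "ok",
);
```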
Completion Handling
Agents signal completion by calling a special complete_task tool. This tool isn't a traditional tool that executes code—it's a structured way for the agent to signal it's done and provide its output. The output must conform to the agent's output schema, ensuring type-safe results.
Termination can also occur through a timeout (configurable per agent), an external abort signal, or an error. The AgentTerminateMode enum captures these different termination reasons.
Activity Tracking
The optional ActivityCallback provides visibility into agent execution. Events include tool call starts and completions, errors, and completion. This enables UIs to display progress and logging systems to track agent behavior.
Tool System
Gemini CLI includes a comprehensive set of built-in tools and supports extensibility through MCP servers.
Built-in Tools
The tools directory contains implementations for common operations. File operations include read-file for reading single files with line range support, read-many-files for batch reading multiple files, write-file for creating or overwriting files, and edit for making targeted changes to existing files. The smart-edit tool provides an intelligent editing capability that can handle more complex modifications.
Search tools include grep for text pattern searching, ripGrep for ripgrep integration, glob for file pattern matching, and ls for directory listing with filtering.
Shell execution is provided by the shell tool, which runs commands in a pseudo-terminal (PTY) for proper handling of interactive programs. The tool captures output and handles timeouts.
Web tools include web-fetch for retrieving web content and web-search for performing searches. The memory tool provides persistent storage for information the agent should remember across sessions.
Tool Registry
The ToolRegistry manages tool discovery and access. Tools are registered with the registry and can be looked up by name. Each agent instance gets its own isolated registry populated with the tools it needs, preventing unintended access to tools outside its scope.
The registry handles tool validation, ensuring tools meet required interfaces and are safe for the execution context (interactive vs. non-interactive mode has different safety requirements).
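A toy version of this registry pattern might look like the following; the class and method names are illustrative, not the actual ToolRegistry API:

```typescript
// Sketch of a per-agent tool registry (illustrative, not the real API).
interface Tool {
  name: string;
  description: string;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`duplicate tool: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  get(name: string): Tool | undefined {
    return this.tools.get(name);
  }

  // Build an isolated registry containing only the named tools,
  // so an agent cannot reach tools outside its scope.
  scoped(names: string[]): ToolRegistry {
    const child = new ToolRegistry();
    for (const n of names) {
      const t = this.get(n);
      if (t) child.register(t);
    }
    return child;
  }
}

const parent = new ToolRegistry();
parent.register({ name: "read-file", description: "Read a file", execute: async () => "" });
parent.register({ name: "shell", description: "Run a command", execute: async () => "" });
const agentTools = parent.scoped(["read-file"]); // no shell access
```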
Tool Interface
Tools implement a consistent interface with a name, description (used by the model to understand the tool's purpose), parameter schema (JSON Schema for validation), and a build function that creates the actual tool implementation. The build function receives runtime context including configuration and services.
Tool execution returns structured results that include success/failure status, output content, and any error information. The consistent interface enables uniform handling across all tools regardless of their specific functionality.
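As a rough illustration, a declarative tool following this pattern could look like the sketch below. The field names, result shape, and build signature are assumptions about the general pattern, not the exact interface:

```typescript
// Illustrative tool declaration: name, description, JSON-Schema parameters,
// and a build() function returning the implementation. Names are assumed.
import { readFile } from "node:fs/promises";
import { join } from "node:path";

interface ToolResult {
  success: boolean;
  output: string;
  error?: string;
}

const readFileTool = {
  name: "read-file",
  description: "Read a file from the workspace.",
  parameters: {
    type: "object",
    properties: { path: { type: "string" } },
    required: ["path"],
  },
  // build() receives runtime context and returns the executable implementation
  build: (ctx: { cwd: string }) => ({
    execute: async (args: { path: string }): Promise<ToolResult> => {
      try {
        const text = await readFile(join(ctx.cwd, args.path), "utf8");
        return { success: true, output: text };
      } catch (e) {
        // Failures are reported as structured results, not thrown
        return { success: false, output: "", error: String(e) };
      }
    },
  }),
};

const impl = readFileTool.build({ cwd: "." });
const missing = await impl.execute({ path: "no-such-file-xyz.txt" });
```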
Modifiable Tools
The modifiable-tool abstraction provides a pattern for tools that make changes requiring confirmation. When confirmation is enabled, these tools emit confirmation requests through the message bus before making changes. This integrates with the policy system to enable user control over modifications.
MCP Integration
Model Context Protocol (MCP) enables Gemini CLI to connect with external servers that provide additional tools and resources.
MCP Client
The mcp-client module implements the client-side of MCP. It manages connections to MCP servers, handles capability negotiation, and provides methods for invoking tools and accessing resources.
Connection setup involves spawning the MCP server process (typically via stdio transport), performing the initialization handshake, and discovering available capabilities. The client maintains the connection and handles reconnection if needed.
MCP Tool Wrapping
Tools from MCP servers are wrapped to present the same interface as built-in tools. The mcp-tool module handles this wrapping, converting between Gemini CLI's tool interface and MCP's tool protocol.
When the agent calls an MCP tool, the wrapper serializes the arguments, sends them to the MCP server, awaits the response, and converts the result back to the expected format. Error handling ensures MCP server failures are properly reported.
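The wrapping step can be sketched like this, with a stand-in transport function in place of the real MCP client (the result shape follows MCP's text-content convention, but the wrapper names are illustrative):

```typescript
// Sketch of wrapping an MCP server tool behind the built-in tool interface.
// callMcpServer stands in for the real MCP transport; names are illustrative.
type McpResult = { content: { type: "text"; text: string }[]; isError?: boolean };

function wrapMcpTool(
  toolName: string,
  callMcpServer: (name: string, args: unknown) => Promise<McpResult>,
) {
  return {
    name: toolName,
    execute: async (args: Record<string, unknown>) => {
      try {
        const res = await callMcpServer(toolName, args); // serialize + send
        return {
          success: !res.isError,
          output: res.content.map((c) => c.text).join("\n"), // back to local format
        };
      } catch (e) {
        // MCP server failures surface as ordinary tool errors
        return { success: false, output: "", error: String(e) };
      }
    },
  };
}

const echoTool = wrapMcpTool("echo", async (_name, args) => ({
  content: [{ type: "text", text: String((args as { msg: string }).msg) }],
}));
const result = await echoTool.execute({ msg: "hello" });
```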
MCP Client Manager
The mcp-client-manager handles lifecycle for multiple MCP connections. Configuration specifies which MCP servers to connect to, and the manager initializes connections at startup. Tools from all connected servers are aggregated into the tool registry.
OAuth for MCP
Some MCP servers require authentication. The mcp/oauth-provider module implements OAuth2 flows for authenticating with MCP servers. Token storage persists credentials, and the refresh flow handles token expiration.
Policy Engine
The policy system governs what actions the agent can take, providing fine-grained control over behavior.
Policy Structure
Policies are defined in TOML files with rules specifying allowed and denied actions. Rules match against action types (file operations, shell commands, etc.) and can use patterns for paths and commands.
The toml-loader parses policy files and constructs the policy engine. Multiple policy files can be combined with clear precedence rules.
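A hypothetical policy file in this style might look like the fragment below. The keys and values here are illustrative of the rule shapes described above, not Gemini CLI's actual policy schema:

```toml
# Hypothetical policy rules; key names are illustrative, not the real schema.
[[rule]]
action = "write-file"
path = "src/**"
decision = "allow"

[[rule]]
action = "shell"
command = "rm *"
decision = "deny"

[[rule]]
action = "shell"
decision = "ask_user"   # any other shell command prompts for confirmation
```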
Policy Evaluation
When an action needs policy evaluation, the policy-engine checks it against all applicable rules. Rules can explicitly allow, deny, or require confirmation. The evaluation returns whether the action should proceed, be blocked, or prompt for user confirmation.
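The evaluation step can be sketched as a first-match scan over the rules; first-match precedence and the default of prompting when nothing matches are assumptions for the sketch, not documented behavior:

```typescript
// Sketch of policy evaluation: first matching rule wins; when no rule
// matches, we fall back to asking the user. Rule shape is assumed.
type Decision = "allow" | "deny" | "ask_user";
interface Rule {
  action: string;
  pattern?: RegExp; // matches paths or command strings
  decision: Decision;
}

function evaluate(rules: Rule[], action: string, target: string): Decision {
  for (const rule of rules) {
    if (rule.action !== action) continue;
    if (rule.pattern && !rule.pattern.test(target)) continue;
    return rule.decision;
  }
  return "ask_user"; // assumed default: prompt when no rule applies
}

const rules: Rule[] = [
  { action: "shell", pattern: /^rm /, decision: "deny" },
  { action: "write-file", pattern: /^src\//, decision: "allow" },
];
```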
Approval Modes
The ApprovalMode enum captures different approval behaviors: always allow, always deny, or ask for confirmation. Tools that modify files or execute commands check the policy before proceeding, and the UI displays confirmation prompts when needed.
Confirmation Bus
The message-bus provides a channel for confirmation requests. Tools emit requests when they need approval, and the UI listens for these requests and displays prompts. Responses flow back through the bus to allow/deny the action.
This decoupling allows different UIs (CLI, IDE) to handle confirmations appropriately for their context while the core logic remains unchanged.
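The request/response shape of such a bus can be sketched with Node's EventEmitter; the event names and correlation scheme here are illustrative, not Gemini CLI's actual message-bus protocol:

```typescript
// Sketch of a confirmation flow over a message bus. Event names and the
// correlation-ID scheme are illustrative, not the real protocol.
import { EventEmitter } from "node:events";

const bus = new EventEmitter();

// Tool side: emit a request, then wait for the UI's verdict.
function requestConfirmation(description: string): Promise<boolean> {
  return new Promise((resolve) => {
    const id = Math.random().toString(36).slice(2); // correlate the reply
    bus.once(`confirm:${id}`, (approved: boolean) => resolve(approved));
    bus.emit("confirmation-request", { id, description });
  });
}

// UI side: listen for requests and answer them (auto-approve in this sketch;
// a real CLI or IDE would show a prompt instead).
bus.on("confirmation-request", ({ id }: { id: string }) => {
  bus.emit(`confirm:${id}`, true);
});

const approved = await requestConfirmation("write src/app.ts");
```

Because the tool only ever talks to the bus, swapping the terminal prompt for an IDE dialog requires no changes to the tool itself.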
Configuration System
Gemini CLI has a flexible configuration system supporting multiple sources.
Config Structure
The Config class holds all runtime configuration: model settings, API credentials, working directory, available tools, policy settings, and more. Configuration is built from multiple layers that override each other.
Configuration Sources
Configuration can come from environment variables, configuration files (in the user's home directory and working directory), command-line arguments, and programmatic settings. The layering allows global defaults to be overridden per-project or per-session.
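The layering itself reduces to a last-writer-wins merge. In this sketch the keys and values are placeholders, and the comments map each layer to the sources described above:

```typescript
// Sketch of layered configuration: later sources override earlier ones.
// Keys and values are placeholders, not Gemini CLI's real settings schema.
type ConfigLayer = Record<string, unknown>;

function mergeConfig(...layers: ConfigLayer[]): ConfigLayer {
  return Object.assign({}, ...layers); // last layer wins per key
}

const merged = mergeConfig(
  { model: "gemini-default", approvalMode: "ask" }, // built-in defaults
  { approvalMode: "auto" },                         // user settings (home directory)
  { model: "gemini-project" },                      // project settings (working directory)
  {},                                               // CLI flags (none given here)
);
```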
Model Configuration
The defaultModelConfigs module defines configurations for different Gemini models. Each configuration specifies the model name, context window size, token limits, and capabilities. The system can select appropriate models based on task requirements.
Storage
The Storage class handles persistent configuration storage. It manages the configuration directory structure, reads/writes configuration files, and handles migration between configuration versions.
Core Services
Several services provide shared functionality across the application.
GeminiChat
The GeminiChat class wraps communication with the Gemini API. It manages conversation history, handles streaming responses, and provides methods for sending messages and receiving responses.
Streaming is handled through the StreamEventType enum, with events for content chunks, tool calls, and completion. The chat instance maintains conversation state including message history and token counts.
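Consuming such a stream amounts to dispatching on the event type; this sketch uses made-up event names in a discriminated union, not the actual StreamEventType values:

```typescript
// Sketch of consuming a typed event stream. Event names are illustrative,
// not the real StreamEventType enum.
type StreamEvent =
  | { type: "content"; text: string }
  | { type: "tool-call"; name: string; args: Record<string, unknown> }
  | { type: "finished" };

function collect(events: StreamEvent[]): { text: string; toolCalls: string[] } {
  let text = "";
  const toolCalls: string[] = [];
  for (const ev of events) {
    if (ev.type === "content") text += ev.text;        // accumulate content chunks
    else if (ev.type === "tool-call") toolCalls.push(ev.name);
    else break;                                        // "finished" ends the stream
  }
  return { text, toolCalls };
}

const turn = collect([
  { type: "content", text: "Let me check " },
  { type: "content", text: "that file." },
  { type: "tool-call", name: "read-file", args: { path: "a.ts" } },
  { type: "finished" },
]);
```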
Chat Compression Service
Long conversations eventually exceed token limits. The ChatCompressionService provides conversation compression, summarizing earlier parts of the conversation to reduce token count while preserving important context.
Compression is triggered automatically when approaching limits. The service extracts key information from older messages and replaces them with a condensed summary.
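The trigger-and-summarize pattern can be sketched as follows; the 80% threshold, the number of retained messages, and the summarizer are all assumptions for illustration:

```typescript
// Sketch of threshold-triggered history compression. The 80% trigger,
// retained-message count, and summarizer are assumptions, not the real service.
interface Message {
  role: "user" | "model";
  text: string;
}

function maybeCompress(
  history: Message[],
  countTokens: (m: Message) => number,
  limit: number,
  summarize: (older: Message[]) => string,
): Message[] {
  const total = history.reduce((n, m) => n + countTokens(m), 0);
  if (total < limit * 0.8) return history;  // under the trigger: keep as-is
  const keep = history.slice(-2);           // keep the most recent turns verbatim
  const summary = summarize(history.slice(0, -2));
  return [
    { role: "user", text: `Summary of earlier conversation: ${summary}` },
    ...keep,
  ];
}

const compressed = maybeCompress(
  [
    { role: "user", text: "aaaa" },
    { role: "model", text: "bbbb" },
    { role: "user", text: "cccc" },
    { role: "model", text: "dddd" },
  ],
  (m) => m.text.length, // toy token counter: characters
  10,
  (older) => `${older.length} earlier messages`,
);
```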
File Discovery Service
The fileDiscoveryService helps the agent understand the project structure. It scans directories, respects gitignore patterns, and provides the agent with contextual information about available files.
Shell Execution Service
The shellExecutionService provides a consistent interface for executing shell commands. It handles PTY allocation, output capture, timeout management, and cross-platform differences.
Telemetry
Gemini CLI includes comprehensive telemetry for understanding agent behavior and performance.
Event Logging
Events are logged throughout the system: agent starts and completions, tool calls, errors, and user interactions. The telemetry module provides typed event logging with AgentStartEvent, AgentFinishEvent, and other event types.
Prompt ID Context
The promptIdContext provides request tracing through AsyncLocalStorage. Each agent turn gets a unique prompt ID that propagates through all operations in that turn, enabling correlation of logs and metrics.
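AsyncLocalStorage is a standard Node.js API, so this pattern is easy to demonstrate directly; the helper names below are illustrative rather than Gemini CLI's own:

```typescript
// AsyncLocalStorage propagates a prompt ID through async calls without
// threading it as a parameter. Helper names are illustrative.
import { AsyncLocalStorage } from "node:async_hooks";

const promptIdContext = new AsyncLocalStorage<string>();

async function logWithPromptId(message: string): Promise<string> {
  // Any code running under run() can read the current prompt ID.
  const id = promptIdContext.getStore() ?? "unknown";
  return `[${id}] ${message}`;
}

// Everything inside run() — however deeply nested or asynchronous —
// sees "prompt-42" as its prompt ID.
const line = await promptIdContext.run("prompt-42", () =>
  logWithPromptId("tool call started"),
);
```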
Debug Logging
The debugLogger provides verbose logging controlled by environment variables. This enables detailed tracing during development and debugging without impacting normal operation.
Hooks System
Hooks enable customization and extension of agent behavior without modifying core code.
Hook Interface
Hooks are functions that run at specific points in the agent lifecycle. They can observe events, modify behavior, or add functionality. The hooks index exports the hook system for external use.
Hook Registration
Hooks are registered through configuration and loaded at startup. The extensionLoader module handles discovering and loading hook implementations.
IDE Integration
Beyond the CLI, Gemini CLI supports IDE integration through the vscode-ide-companion package and IDE-specific infrastructure.
IDE Client
The ide-client module provides communication between the core agent and IDE extensions. It handles the protocol for commands, responses, and events.
IDE Context
The ideContext module manages IDE-specific information: open files, cursor position, selected text, and project structure. This context informs the agent about the user's current focus.
IDE Detection
The detect-ide module identifies which IDE is running and provides IDE-specific information. This enables tailored behavior for VS Code, Cursor, and other supported IDEs.
Agent-to-Agent Communication
The a2a-server package enables communication between multiple Gemini CLI agents.
A2A Protocol
Agent-to-agent communication follows a defined protocol for delegating tasks, sharing context, and coordinating work. One agent can spawn sub-agents for specialized tasks.
A2A Client Manager
The a2a-client-manager handles connections to other agents. It provides methods for invoking remote agents and receiving their results.