Gemini CLI: A Deep Dive into Google's Open-Source AI Coding Agent
A comprehensive exploration of Google's Gemini CLI architecture—examining its TypeScript-based core, local agent executor, tool system, MCP integration, policy engine, and the modular design that enables extensible AI coding assistance.
Introduction
Gemini CLI is Google's entry into the open-source AI coding agent space. Built entirely in TypeScript, it provides a terminal-based interface for interacting with Gemini models to accomplish coding tasks. The architecture emphasizes modularity, extensibility through MCP (Model Context Protocol), and a robust policy system for controlling agent behavior.
This post explores Gemini CLI's internal architecture by examining its TypeScript implementation. We'll understand how the agent executor manages the conversation loop, how tools are registered and invoked, how the policy engine governs behavior, and how the modular design enables both CLI and IDE integration.
Project Structure
Gemini CLI is organized as a monorepo with multiple packages. The core package contains the agent logic, tool implementations, configuration management, and utilities. The cli package provides the terminal interface using Ink (React for CLIs). Additional packages include a2a-server for agent-to-agent communication, vscode-ide-companion for IDE integration, and test-utils for testing infrastructure.
The separation between core and cli allows the same agent logic to power both terminal and IDE experiences. The core package exports everything needed to build custom integrations, while the cli package focuses specifically on the terminal user experience.
The Local Agent Executor
The LocalAgentExecutor is the heart of Gemini CLI's agent behavior. It implements a turn-based execution loop where the agent repeatedly calls the model, executes tool calls, and continues until a termination condition is met.
Agent Definition
Agents are defined through LocalAgentDefinition objects that specify the agent's name, system instruction (prompt), available tools, model configuration, and output schema. The output schema uses Zod for type-safe validation of the agent's final output.
Each agent has a toolConfig specifying which tools it can access. Tools can be referenced by name (using tools from the parent registry) or provided as complete tool definitions. This allows agents to have scoped access to tools appropriate for their purpose.
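To make the shape concrete, here is a minimal sketch of what such a definition might look like. The field names are illustrative rather than the exact LocalAgentDefinition interface, and a plain predicate stands in for the Zod output schema:

```typescript
// Sketch of an agent definition; field names are illustrative, not the
// exact LocalAgentDefinition interface. A predicate stands in for the
// Zod output schema, and the model name is a placeholder.
interface AgentDefinition {
  name: string;
  systemInstruction: string;
  toolConfig: { tools: string[] }; // names resolved from the parent registry
  modelConfig: { model: string; temperature?: number };
  validateOutput: (output: unknown) => boolean; // stand-in for a Zod schema
}

const codeReviewer: AgentDefinition = {
  name: "code-reviewer",
  systemInstruction: "Review the given diff and report any issues found.",
  toolConfig: { tools: ["read-file", "grep"] }, // scoped tool access
  modelConfig: { model: "gemini-default" },
  validateOutput: (o) =>
    typeof o === "object" &&
    o !== null &&
    Array.isArray((o as { issues?: unknown }).issues),
};
```

The scoped toolConfig is the key design point: a review agent that only needs to read code simply never receives write or shell tools.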
Execution Loop
The run method drives the main agent loop. It initializes a GeminiChat instance, sets up timeout handling, and iterates through turns until completion. Each turn involves calling the model with the current conversation state, processing any tool calls in the response, and determining whether to continue or stop.
The executeTurn method handles a single iteration: it optionally compresses the chat history if approaching token limits, calls the model, and processes the response. If the model returns tool calls, they're executed and results fed back for the next turn. If no tool calls are returned and the agent hasn't explicitly completed, this is treated as an error.
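The control flow described above can be sketched as follows. This is not the real executeTurn implementation; canned model responses replace live API calls, and the type names are assumptions:

```typescript
// Minimal sketch of the turn-based loop. Canned model responses stand in
// for live model calls; names and termination values are illustrative.
type ToolCall = { name: string; args: Record<string, unknown> };
type ModelTurn = { toolCalls: ToolCall[] };
type TerminateMode = "GOAL" | "ERROR" | "MAX_TURNS";

function runLoop(
  modelTurns: ModelTurn[], // stands in for successive model responses
  executeTool: (call: ToolCall) => string,
  maxTurns = 10,
): TerminateMode {
  for (let i = 0; i < modelTurns.length && i < maxTurns; i++) {
    const turn = modelTurns[i];
    if (turn.toolCalls.some((c) => c.name === "complete_task")) {
      return "GOAL"; // agent explicitly signalled completion
    }
    if (turn.toolCalls.length === 0) {
      return "ERROR"; // no tool calls and no completion: treated as an error
    }
    turn.toolCalls.map(executeTool); // results would feed the next turn
  }
  return "MAX_TURNS";
}

const mode = runLoop(
  [
    { toolCalls: [{ name: "read-file", args: { path: "src/app.ts" } }] },
    { toolCalls: [{ name: "complete_task", args: { result: "done" } }] },
  ],
  () => "ok",
);
```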
Completion Handling
Agents signal completion by calling a special complete_task tool. This tool isn't a traditional tool that executes code—it's a structured way for the agent to signal it's done and provide its output. The output must conform to the agent's output schema, ensuring type-safe results.
Termination can also occur through a timeout (configurable per agent), an external abort signal, or an error. The AgentTerminateMode enum captures these different termination reasons.
Activity Tracking
The optional ActivityCallback provides visibility into agent execution. Events include tool call starts and completions, errors, and completion. This enables UIs to display progress and logging systems to track agent behavior.
Tool System
Gemini CLI includes a comprehensive set of built-in tools and supports extensibility through MCP servers.
Built-in Tools
The tools directory contains implementations for common operations. File operations include read-file for reading single files with line range support, read-many-files for batch reading multiple files, write-file for creating or overwriting files, and edit for making targeted changes to existing files. The smart-edit tool provides an intelligent editing capability that can handle more complex modifications.
Search tools include grep for text pattern searching, ripGrep for ripgrep integration, glob for file pattern matching, and ls for directory listing with filtering.
Shell execution is provided by the shell tool, which runs commands in a pseudo-terminal (PTY) for proper handling of interactive programs. The tool captures output and handles timeouts.
Web tools include web-fetch for retrieving web content and web-search for performing searches. The memory tool provides persistent storage for information the agent should remember across sessions.
Tool Registry
The ToolRegistry manages tool discovery and access. Tools are registered with the registry and can be looked up by name. Each agent instance gets its own isolated registry populated with the tools it needs, preventing unintended access to tools outside its scope.
The registry handles tool validation, ensuring tools meet required interfaces and are safe for the execution context (interactive vs. non-interactive mode has different safety requirements).
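A toy version of this registry pattern might look like the following; the class and method names are illustrative, not the actual ToolRegistry API:

```typescript
// Sketch of a per-agent tool registry (illustrative, not the real API).
interface Tool {
  name: string;
  description: string;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

class ToolRegistry {
  private tools = new Map<string, Tool>();

  register(tool: Tool): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`duplicate tool: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  get(name: string): Tool | undefined {
    return this.tools.get(name);
  }

  // Build an isolated registry containing only the named tools,
  // so an agent cannot reach tools outside its scope.
  scoped(names: string[]): ToolRegistry {
    const child = new ToolRegistry();
    for (const n of names) {
      const t = this.get(n);
      if (t) child.register(t);
    }
    return child;
  }
}

const parent = new ToolRegistry();
parent.register({ name: "read-file", description: "Read a file", execute: async () => "" });
parent.register({ name: "shell", description: "Run a command", execute: async () => "" });
const agentTools = parent.scoped(["read-file"]); // no shell access
```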
Tool Interface
Tools implement a consistent interface with a name, description (used by the model to understand the tool's purpose), parameter schema (JSON Schema for validation), and a build function that creates the actual tool implementation. The build function receives runtime context including configuration and services.
Tool execution returns structured results that include success/failure status, output content, and any error information. The consistent interface enables uniform handling across all tools regardless of their specific functionality.
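As a rough illustration, a declarative tool following this pattern could look like the sketch below. The field names, result shape, and build signature are assumptions about the general pattern, not the exact interface:

```typescript
// Illustrative tool declaration: name, description, JSON-Schema parameters,
// and a build() function returning the implementation. Names are assumed.
import { readFile } from "node:fs/promises";
import { join } from "node:path";

interface ToolResult {
  success: boolean;
  output: string;
  error?: string;
}

const readFileTool = {
  name: "read-file",
  description: "Read a file from the workspace.",
  parameters: {
    type: "object",
    properties: { path: { type: "string" } },
    required: ["path"],
  },
  // build() receives runtime context and returns the executable implementation
  build: (ctx: { cwd: string }) => ({
    execute: async (args: { path: string }): Promise<ToolResult> => {
      try {
        const text = await readFile(join(ctx.cwd, args.path), "utf8");
        return { success: true, output: text };
      } catch (e) {
        // Failures are reported as structured results, not thrown
        return { success: false, output: "", error: String(e) };
      }
    },
  }),
};

const impl = readFileTool.build({ cwd: "." });
const missing = await impl.execute({ path: "no-such-file-xyz.txt" });
```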
Modifiable Tools
The modifiable-tool abstraction provides a pattern for tools that make changes requiring confirmation. When confirmation is enabled, these tools emit confirmation requests through the message bus before making changes. This integrates with the policy system to enable user control over modifications.
MCP Integration
Model Context Protocol (MCP) enables Gemini CLI to connect with external servers that provide additional tools and resources.
MCP Client
The mcp-client module implements the client-side of MCP. It manages connections to MCP servers, handles capability negotiation, and provides methods for invoking tools and accessing resources.
Connection setup involves spawning the MCP server process (typically via stdio transport), performing the initialization handshake, and discovering available capabilities. The client maintains the connection and handles reconnection if needed.
MCP Tool Wrapping
Tools from MCP servers are wrapped to present the same interface as built-in tools. The mcp-tool module handles this wrapping, converting between Gemini CLI's tool interface and MCP's tool protocol.
When the agent calls an MCP tool, the wrapper serializes the arguments, sends them to the MCP server, awaits the response, and converts the result back to the expected format. Error handling ensures MCP server failures are properly reported.
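The wrapping step can be sketched like this, with a stand-in transport function in place of the real MCP client (the result shape follows MCP's text-content convention, but the wrapper names are illustrative):

```typescript
// Sketch of wrapping an MCP server tool behind the built-in tool interface.
// callMcpServer stands in for the real MCP transport; names are illustrative.
type McpResult = { content: { type: "text"; text: string }[]; isError?: boolean };

function wrapMcpTool(
  toolName: string,
  callMcpServer: (name: string, args: unknown) => Promise<McpResult>,
) {
  return {
    name: toolName,
    execute: async (args: Record<string, unknown>) => {
      try {
        const res = await callMcpServer(toolName, args); // serialize + send
        return {
          success: !res.isError,
          output: res.content.map((c) => c.text).join("\n"), // back to local format
        };
      } catch (e) {
        // MCP server failures surface as ordinary tool errors
        return { success: false, output: "", error: String(e) };
      }
    },
  };
}

const echoTool = wrapMcpTool("echo", async (_name, args) => ({
  content: [{ type: "text", text: String((args as { msg: string }).msg) }],
}));
const result = await echoTool.execute({ msg: "hello" });
```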
MCP Client Manager
The mcp-client-manager handles lifecycle for multiple MCP connections. Configuration specifies which MCP servers to connect to, and the manager initializes connections at startup. Tools from all connected servers are aggregated into the tool registry.
OAuth for MCP
Some MCP servers require authentication. The mcp/oauth-provider module implements OAuth2 flows for authenticating with MCP servers. Token storage persists credentials, and the refresh flow handles token expiration.
Policy Engine
The policy system governs what actions the agent can take, providing fine-grained control over behavior.
Policy Structure
Policies are defined in TOML files with rules specifying allowed and denied actions. Rules match against action types (file operations, shell commands, etc.) and can use patterns for paths and commands.
The toml-loader parses policy files and constructs the policy engine. Multiple policy files can be combined with clear precedence rules.
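A hypothetical policy file in this style might look like the fragment below. The keys and values here are illustrative of the rule shapes described above, not Gemini CLI's actual policy schema:

```toml
# Hypothetical policy rules; key names are illustrative, not the real schema.
[[rule]]
action = "write-file"
path = "src/**"
decision = "allow"

[[rule]]
action = "shell"
command = "rm *"
decision = "deny"

[[rule]]
action = "shell"
decision = "ask_user"   # any other shell command prompts for confirmation
```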
Policy Evaluation
When an action needs policy evaluation, the policy-engine checks it against all applicable rules. Rules can explicitly allow, deny, or require confirmation. The evaluation returns whether the action should proceed, be blocked, or prompt for user confirmation.
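The evaluation step can be sketched as a first-match scan over the rules; first-match precedence and the default of prompting when nothing matches are assumptions for the sketch, not documented behavior:

```typescript
// Sketch of policy evaluation: first matching rule wins; when no rule
// matches, we fall back to asking the user. Rule shape is assumed.
type Decision = "allow" | "deny" | "ask_user";
interface Rule {
  action: string;
  pattern?: RegExp; // matches paths or command strings
  decision: Decision;
}

function evaluate(rules: Rule[], action: string, target: string): Decision {
  for (const rule of rules) {
    if (rule.action !== action) continue;
    if (rule.pattern && !rule.pattern.test(target)) continue;
    return rule.decision;
  }
  return "ask_user"; // assumed default: prompt when no rule applies
}

const rules: Rule[] = [
  { action: "shell", pattern: /^rm /, decision: "deny" },
  { action: "write-file", pattern: /^src\//, decision: "allow" },
];
```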
Approval Modes
The ApprovalMode enum captures different approval behaviors: always allow, always deny, or ask for confirmation. Tools that modify files or execute commands check the policy before proceeding, and the UI displays confirmation prompts when needed.
Confirmation Bus
The message-bus provides a channel for confirmation requests. Tools emit requests when they need approval, and the UI listens for these requests and displays prompts. Responses flow back through the bus to allow/deny the action.
This decoupling allows different UIs (CLI, IDE) to handle confirmations appropriately for their context while the core logic remains unchanged.
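The request/response shape of such a bus can be sketched with Node's EventEmitter; the event names and correlation scheme here are illustrative, not Gemini CLI's actual message-bus protocol:

```typescript
// Sketch of a confirmation flow over a message bus. Event names and the
// correlation-ID scheme are illustrative, not the real protocol.
import { EventEmitter } from "node:events";

const bus = new EventEmitter();

// Tool side: emit a request, then wait for the UI's verdict.
function requestConfirmation(description: string): Promise<boolean> {
  return new Promise((resolve) => {
    const id = Math.random().toString(36).slice(2); // correlate the reply
    bus.once(`confirm:${id}`, (approved: boolean) => resolve(approved));
    bus.emit("confirmation-request", { id, description });
  });
}

// UI side: listen for requests and answer them (auto-approve in this sketch;
// a real CLI or IDE would show a prompt instead).
bus.on("confirmation-request", ({ id }: { id: string }) => {
  bus.emit(`confirm:${id}`, true);
});

const approved = await requestConfirmation("write src/app.ts");
```

Because the tool only ever talks to the bus, swapping the terminal prompt for an IDE dialog requires no changes to the tool itself.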
Configuration System
Gemini CLI has a flexible configuration system supporting multiple sources.
Config Structure
The Config class holds all runtime configuration: model settings, API credentials, working directory, available tools, policy settings, and more. Configuration is built from multiple layers that override each other.
Configuration Sources
Configuration can come from environment variables, configuration files (in the user's home directory and working directory), command-line arguments, and programmatic settings. The layering allows global defaults to be overridden per-project or per-session.
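The layering itself reduces to a last-writer-wins merge. In this sketch the keys and values are placeholders, and the comments map each layer to the sources described above:

```typescript
// Sketch of layered configuration: later sources override earlier ones.
// Keys and values are placeholders, not Gemini CLI's real settings schema.
type ConfigLayer = Record<string, unknown>;

function mergeConfig(...layers: ConfigLayer[]): ConfigLayer {
  return Object.assign({}, ...layers); // last layer wins per key
}

const merged = mergeConfig(
  { model: "gemini-default", approvalMode: "ask" }, // built-in defaults
  { approvalMode: "auto" },                         // user settings (home directory)
  { model: "gemini-project" },                      // project settings (working directory)
  {},                                               // CLI flags (none given here)
);
```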
Model Configuration
The defaultModelConfigs module defines configurations for different Gemini models. Each configuration specifies the model name, context window size, token limits, and capabilities. The system can select appropriate models based on task requirements.
Storage
The Storage class handles persistent configuration storage. It manages the configuration directory structure, reads/writes configuration files, and handles migration between configuration versions.
Core Services
Several services provide shared functionality across the application.
GeminiChat
The GeminiChat class wraps communication with the Gemini API. It manages conversation history, handles streaming responses, and provides methods for sending messages and receiving responses.
Streaming is handled through the StreamEventType enum, with events for content chunks, tool calls, and completion. The chat instance maintains conversation state including message history and token counts.
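Consuming such a stream amounts to dispatching on the event type; this sketch uses made-up event names in a discriminated union, not the actual StreamEventType values:

```typescript
// Sketch of consuming a typed event stream. Event names are illustrative,
// not the real StreamEventType enum.
type StreamEvent =
  | { type: "content"; text: string }
  | { type: "tool-call"; name: string; args: Record<string, unknown> }
  | { type: "finished" };

function collect(events: StreamEvent[]): { text: string; toolCalls: string[] } {
  let text = "";
  const toolCalls: string[] = [];
  for (const ev of events) {
    if (ev.type === "content") text += ev.text;        // accumulate content chunks
    else if (ev.type === "tool-call") toolCalls.push(ev.name);
    else break;                                        // "finished" ends the stream
  }
  return { text, toolCalls };
}

const turn = collect([
  { type: "content", text: "Let me check " },
  { type: "content", text: "that file." },
  { type: "tool-call", name: "read-file", args: { path: "a.ts" } },
  { type: "finished" },
]);
```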
Chat Compression Service
Long conversations eventually exceed token limits. The ChatCompressionService provides conversation compression, summarizing earlier parts of the conversation to reduce token count while preserving important context.
Compression is triggered automatically when approaching limits. The service extracts key information from older messages and replaces them with a condensed summary.
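The trigger-and-summarize pattern can be sketched as follows; the 80% threshold, the number of retained messages, and the summarizer are all assumptions for illustration:

```typescript
// Sketch of threshold-triggered history compression. The 80% trigger,
// retained-message count, and summarizer are assumptions, not the real service.
interface Message {
  role: "user" | "model";
  text: string;
}

function maybeCompress(
  history: Message[],
  countTokens: (m: Message) => number,
  limit: number,
  summarize: (older: Message[]) => string,
): Message[] {
  const total = history.reduce((n, m) => n + countTokens(m), 0);
  if (total < limit * 0.8) return history;  // under the trigger: keep as-is
  const keep = history.slice(-2);           // keep the most recent turns verbatim
  const summary = summarize(history.slice(0, -2));
  return [
    { role: "user", text: `Summary of earlier conversation: ${summary}` },
    ...keep,
  ];
}

const compressed = maybeCompress(
  [
    { role: "user", text: "aaaa" },
    { role: "model", text: "bbbb" },
    { role: "user", text: "cccc" },
    { role: "model", text: "dddd" },
  ],
  (m) => m.text.length, // toy token counter: characters
  10,
  (older) => `${older.length} earlier messages`,
);
```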
File Discovery Service
The fileDiscoveryService helps the agent understand the project structure. It scans directories, respects gitignore patterns, and provides the agent with contextual information about available files.
Shell Execution Service
The shellExecutionService provides a consistent interface for executing shell commands. It handles PTY allocation, output capture, timeout management, and cross-platform differences.
Telemetry
Gemini CLI includes comprehensive telemetry for understanding agent behavior and performance.
Event Logging
Events are logged throughout the system: agent starts and completions, tool calls, errors, and user interactions. The telemetry module provides typed event logging with AgentStartEvent, AgentFinishEvent, and other event types.
Prompt ID Context
The promptIdContext provides request tracing through AsyncLocalStorage. Each agent turn gets a unique prompt ID that propagates through all operations in that turn, enabling correlation of logs and metrics.
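AsyncLocalStorage is a standard Node.js API, so this pattern is easy to demonstrate directly; the helper names below are illustrative rather than Gemini CLI's own:

```typescript
// AsyncLocalStorage propagates a prompt ID through async calls without
// threading it as a parameter. Helper names are illustrative.
import { AsyncLocalStorage } from "node:async_hooks";

const promptIdContext = new AsyncLocalStorage<string>();

async function logWithPromptId(message: string): Promise<string> {
  // Any code running under run() can read the current prompt ID.
  const id = promptIdContext.getStore() ?? "unknown";
  return `[${id}] ${message}`;
}

// Everything inside run() — however deeply nested or asynchronous —
// sees "prompt-42" as its prompt ID.
const line = await promptIdContext.run("prompt-42", () =>
  logWithPromptId("tool call started"),
);
```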
Debug Logging
The debugLogger provides verbose logging controlled by environment variables. This enables detailed tracing during development and debugging without impacting normal operation.
Hooks System
Hooks enable customization and extension of agent behavior without modifying core code.
Hook Interface
Hooks are functions that run at specific points in the agent lifecycle. They can observe events, modify behavior, or add functionality. The hooks index exports the hook system for external use.
Hook Registration
Hooks are registered through configuration and loaded at startup. The extensionLoader module handles discovering and loading hook implementations.
IDE Integration
Beyond the CLI, Gemini CLI supports IDE integration through the vscode-ide-companion package and IDE-specific infrastructure.
IDE Client
The ide-client module provides communication between the core agent and IDE extensions. It handles the protocol for commands, responses, and events.
IDE Context
The ideContext module manages IDE-specific information: open files, cursor position, selected text, and project structure. This context informs the agent about the user's current focus.
IDE Detection
The detect-ide module identifies which IDE is running and provides IDE-specific information. This enables tailored behavior for VS Code, Cursor, and other supported IDEs.
Agent-to-Agent Communication
The a2a-server package enables communication between multiple Gemini CLI agents.
A2A Protocol
Agent-to-agent communication follows a defined protocol for delegating tasks, sharing context, and coordinating work. One agent can spawn sub-agents for specialized tasks.
A2A Client Manager
The a2a-client-manager handles connections to other agents. It provides methods for invoking remote agents and receiving their results.