AI Coding Assistants 2025: Cursor vs Copilot vs Windsurf vs Claude Code
A comprehensive comparison of AI coding assistants in 2025—Cursor, GitHub Copilot, Windsurf, Claude Code, and more. Features, pricing, use cases, and how to maximize productivity with each tool.
The AI Coding Revolution
AI coding assistants have become indispensable for modern software development. In 2025, these tools have evolved from simple autocomplete to autonomous agents that can understand entire codebases, implement features across multiple files, run tests, and create pull requests.
From industry surveys: "78% of developers now use AI coding tools regularly, with productivity gains of 30-50% reported across various tasks."
This guide provides a comprehensive comparison of the leading AI coding assistants, helping you choose the right tool for your workflow.
Two Categories: IDE vs Terminal
AI coding assistants fall into two distinct categories:
IDE-Based Tools
Full graphical IDE experience with visual diff previews, inline suggestions, and GUI-based interactions.
| Tool | Base | Key Advantage |
|---|---|---|
| Cursor | VS Code fork | Composer multi-file editing |
| GitHub Copilot | VS Code/JetBrains plugin | GitHub PR integration |
| Windsurf | VS Code fork | Cascade session memory |
Best for: Visual developers, those who prefer GUI interactions, teams needing shared IDE settings.
Terminal-Based Tools
CLI-native interfaces for developers who live in the terminal.
| Tool | Interface | Key Advantage |
|---|---|---|
| OpenAI Codex | CLI + IDE Extension | GPT-5.2-Codex model, parallel agents |
| Claude Code | CLI + REPL | 200K context, deep reasoning |
| Gemini CLI | CLI | Free tier, multimodal |
| Aider | CLI | Git-native workflow |
Best for: DevOps engineers, vim/emacs users, automation scripts, CI/CD integration.
Open-Source Agentic Extensions
VS Code extensions providing autonomous coding with any model provider.
| Tool | Based On | Key Advantage |
|---|---|---|
| Cline | Original | Plan/Act modes, MCP support, browser use |
| Roo Code | Cline fork | Multi-mode (Code/Architect/Debug), boomerang tasks |
| Kilo Code | Roo Code fork | Orchestrator mode, Memory Bank, $20 free credits |
Best for: Developers wanting open-source flexibility with any AI provider.
Specialized Code Models
Open-source models optimized specifically for code generation.
| Model | Parameters | SWE-bench | Best For |
|---|---|---|---|
| Qwen 2.5 Coder | 0.5B-32B | 69.6% (32B) | Local deployment, 92 languages |
| DeepSeek Coder V2 | 16B/236B | 68.4% | Cost-effective, MoE |
| CodeLlama | 7B-70B | ~45% | Meta ecosystem |
Quick Comparison
| Tool | Type | Best For | Model Access | Pricing | Key Feature |
|---|---|---|---|---|---|
| Cursor | IDE | Power users, multi-file edits | Claude, GPT-4o, custom | $20/month | Composer mode |
| GitHub Copilot | IDE | GitHub integration, teams | GPT-4o, Claude | $10/month | Agent mode + PR creation |
| Windsurf | IDE | Session continuity, budget | GPT-4o, Claude | $15/month | Cascade memory system |
| OpenAI Codex | Terminal/IDE | OpenAI ecosystem | GPT-5.2-Codex | ChatGPT Plus | Parallel cloud agents |
| Claude Code | Terminal | Large codebases, DevOps | Claude 4 | Usage-based | 200K context, CLI-native |
| Gemini CLI | Terminal | Google ecosystem, free use | Gemini 2.5/3 | Free tier | Multimodal support |
| Cline | VS Code Ext | Open-source flexibility | Any (configurable) | Free (OSS) | Plan/Act modes |
| Roo Code | VS Code Ext | Multi-mode workflows | Any (configurable) | Free (OSS) | Boomerang tasks |
| Kilo Code | VS Code Ext | Free credits, orchestration | 500+ models | Free + credits | Memory Bank |
| Aider | Terminal | Git workflows, open source | Any (configurable) | Free (OSS) | Git-native commits |
Cursor
Overview
Cursor is a VS Code fork that pioneered IDE-native AI with deep codebase understanding. Its Composer mode enables multi-file, autonomous code generation that goes far beyond autocomplete.
Key stats:
- 40,000+ companies using Cursor
- Autocomplete powered by Supermaven (fastest in class)
- Multi-model support: Claude 4, GPT-4o, and custom models
Features Deep Dive
Composer Mode
Composer is Cursor's flagship feature—an agentic coding assistant that can:
- Understand feature requests in natural language
- Modify multiple files simultaneously
- Run terminal commands
- Execute tests and iterate on failures
- Create entire features autonomously
User: "Add user authentication with JWT tokens, including
login/logout endpoints, middleware, and tests"
Composer:
1. Creates auth/jwt.ts with token generation/verification
2. Adds routes/auth.ts with login/logout endpoints
3. Creates middleware/auth.ts for protected routes
4. Updates app.ts to register new routes
5. Generates tests/auth.test.ts
6. Runs tests, fixes any failures
Tab Autocomplete
Cursor's autocomplete (powered by Supermaven) predicts multi-line completions:
# Type: "def calculate_"
# Cursor suggests:
def calculate_total_price(items: list[dict], tax_rate: float = 0.08) -> float:
    """Calculate total price including tax."""
    subtotal = sum(item['price'] * item['quantity'] for item in items)
    return subtotal * (1 + tax_rate)
Codebase Indexing
Cursor indexes your entire codebase for context-aware suggestions:
# Cursor understands your existing patterns
# If your codebase uses a specific ORM pattern:
# Type: "def get_user_by_"
# Cursor suggests based on YOUR existing code:
def get_user_by_email(email: str) -> User | None:
"""Fetch user by email using existing repository pattern."""
return UserRepository.find_one({"email": email})
Chat with @ References
Reference specific files, functions, or documentation:
@auth.ts @middleware.ts How does the current auth flow work?
@docs Can you explain how to use the payment API?
@git-diff Review these changes for security issues
Cursor Configuration
Configure Cursor's behavior through settings. These control indexing depth, model selection, and context management. Getting these right significantly impacts response quality—more context helps the model understand your codebase, but too much can slow things down.
// .cursor/settings.json
{
"cursor.cpp.enableIndexing": true,
"cursor.general.gitGraphEnabled": true,
"cursor.chat.showSuggestedFiles": true,
"cursor.composer.enabled": true,
"cursor.autocomplete.enabled": true,
"cursor.autocomplete.useSupermaven": true,
// Model preferences
"cursor.models.default": "claude-sonnet-4",
"cursor.models.composer": "claude-sonnet-4",
// Context settings
"cursor.context.maxFiles": 20,
"cursor.context.includeOpenTabs": true
}
Key settings explained:
- enableIndexing: Indexes your codebase for semantic search. Essential for "find similar code" and context-aware suggestions. Disable only for huge monorepos where indexing is slow.
- useSupermaven: Uses Supermaven's fast autocomplete model. Significantly faster than GPT-based completion but may be less accurate for complex patterns.
- models.default vs models.composer: Use a faster model (Sonnet) for chat, and the same or stronger for Composer's multi-file edits where accuracy matters more than speed.
- maxFiles: How many files Cursor includes as context. Higher values give better understanding but use more tokens and cost more.
Custom Instructions
The .cursorrules file is your secret weapon for consistent code generation. It tells Cursor about your project's conventions, tech stack, and coding standards. The model reads this before every interaction, so be specific—vague rules get vague results.
Create .cursorrules for project-specific behavior:
# .cursorrules
## Project Context
This is a TypeScript backend using:
- Express.js for routing
- Prisma for database ORM
- Jest for testing
- Zod for validation
## Code Style
- Use functional programming patterns where possible
- Prefer `const` over `let`
- Use early returns to reduce nesting
- All functions must have JSDoc comments
- Error handling with custom AppError class
## Testing
- Every new function needs unit tests
- Use test factories for mock data
- Integration tests for API endpoints
## Avoid
- Any use of `any` type
- Console.log in production code
- Hardcoded strings (use constants)
Pricing
| Plan | Price | Features |
|---|---|---|
| Free | $0 | 2000 completions, 50 slow requests |
| Pro | $20/month | Unlimited completions, 500 fast requests |
| Business | $40/user/month | Team features, admin controls, SSO |
GitHub Copilot
Overview
GitHub Copilot is the most widely adopted AI coding assistant, with deep GitHub integration and a new agent mode that can autonomously create pull requests.
Key stats:
- 1.8 million paid subscribers
- Used by 77,000+ organizations
- Powers Copilot Workspace and agent mode
Features Deep Dive
Agent Mode
Copilot's agent mode can take a GitHub issue and autonomously:
- Analyze the issue requirements
- Explore the codebase for context
- Write the implementation
- Create and run tests
- Open a pull request
# Trigger agent mode from CLI
gh copilot agent --issue 123
# Or from GitHub UI:
# Click "Start agent" on any issue
Inline Suggestions
Context-aware completions as you type:
// Type a comment describing what you want:
// Function to validate email addresses using regex
// Copilot suggests:
function validateEmail(email: string): boolean {
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
return emailRegex.test(email);
}
Copilot Chat
Integrated chat with codebase awareness:
/explain What does this function do?
/fix There's a bug in the sorting logic
/tests Generate unit tests for the selected code
/doc Add documentation to this class
Pull Request Summaries
Auto-generate PR descriptions:
## Summary
This PR adds user authentication with JWT tokens.
## Changes
- Added `auth/jwt.ts` for token management
- Created login/logout API endpoints
- Implemented auth middleware
- Added comprehensive test coverage
## Testing
- [x] Unit tests pass
- [x] Integration tests pass
- [x] Manual testing completed
Copilot Configuration
Fine-tune Copilot's behavior through VS Code settings. The most important setting is which file types get suggestions—enable for code, often disable for prose where suggestions can be distracting.
// VS Code settings.json
{
"github.copilot.enable": {
"*": true,
"markdown": true,
"plaintext": false,
"yaml": true
},
"github.copilot.advanced": {
"temperature": 0.3,
"top_p": 0.95,
"max_tokens": 500
},
"github.copilot.chat.localeOverride": "en",
"github.copilot.editor.enableAutoCompletions": true
}
Understanding the settings:
- enable by file type: The "*": true entry enables suggestions for all file types; more specific entries then override it. Disable for plaintext to avoid suggestions in notes/docs.
- temperature: Controls randomness. Lower (0.1-0.3) for more deterministic code; higher (0.7+) for creative suggestions. 0.3 is a good default for most coding.
- top_p: Nucleus sampling parameter. 0.95 means consider tokens in the top 95% probability mass. Lower values make output more focused.
- max_tokens: Maximum length of suggestions. 500 is enough for most functions; increase for longer code blocks.
Custom Instructions
Copilot reads .github/copilot-instructions.md to understand your project's conventions. Unlike Cursor's .cursorrules, this file lives in the .github folder (so it can be repo-wide) and follows Markdown format. Include your tech stack, coding standards, and patterns you want Copilot to follow.
Create .github/copilot-instructions.md:
# Copilot Instructions
## Language & Framework
- TypeScript with strict mode
- React 18 with hooks
- TanStack Query for data fetching
- Tailwind CSS for styling
## Patterns
- Use custom hooks for reusable logic
- Prefer composition over inheritance
- Use discriminated unions for state
- Error boundaries for error handling
## Code Generation
- Include TypeScript types for all functions
- Add JSDoc for public APIs
- Generate tests using Vitest
- Use `describe`/`it` test structure
Copilot CLI
# Install
gh extension install github/gh-copilot
# Explain a command
gh copilot explain "git rebase -i HEAD~3"
# Suggest a command
gh copilot suggest "find large files in git history"
# Ask questions
gh copilot ask "How do I squash commits?"
Pricing
| Plan | Price | Features |
|---|---|---|
| Free | $0 | 2000 completions/month, limited chat |
| Individual | $10/month | Unlimited completions, chat, CLI |
| Business | $19/user/month | Organization policies, audit logs |
| Enterprise | $39/user/month | SSO, IP indemnity, advanced security |
Windsurf
Overview
Windsurf (formerly Codeium) was acquired by Cognition (makers of Devin) in 2025 and offers the best session memory with its Cascade system. It's ideal for developers who value context continuity across sessions.
Key stats:
- Acquired by Cognition
- Best-in-class session memory
- Clear diff previews before applying changes
Features Deep Dive
Cascade System
Cascade maintains context across your entire coding session:
Session 1 (Morning):
"I'm building a REST API for a blog platform"
[Discusses architecture, creates initial routes]
Session 2 (Afternoon):
"Let's add comments to posts"
[Cascade remembers the blog context, existing routes, patterns]
Session 3 (Next day):
"Add authentication"
[Still has full context of blog platform, existing models]
Flows
Pre-built workflows for common tasks:
/flow create-api
→ Walks through: endpoint design, validation, testing, docs
/flow refactor
→ Analyzes code, suggests improvements, applies changes
/flow debug
→ Examines error, traces cause, suggests fix
Diff Preview
See exactly what will change before applying:
// Windsurf shows clear diffs:
- function getUser(id) {
- return users.find(u => u.id === id);
- }
+ function getUser(id: string): User | undefined {
+ if (!id) {
+ throw new ValidationError('User ID is required');
+ }
+ return users.find(u => u.id === id);
+ }
Windsurf Configuration
Windsurf's configuration controls the Cascade memory system and completion behavior. The key differentiator is memoryDuration: set it to "session" to retain context for the current editing session, or to "project" to persist memory across sessions (useful for long-running projects).
// windsurf.config.json
{
"cascade": {
"enabled": true,
"memoryDuration": "session",
"contextWindow": 128000
},
"completions": {
"enabled": true,
"delay": 200,
"multiline": true
},
"chat": {
"model": "gpt-4o",
"temperature": 0.2
},
"flows": {
"enableBuiltIn": true,
"customFlowsPath": ".windsurf/flows"
}
}
Custom Flows
Create custom workflows:
# .windsurf/flows/create-component.yaml
name: Create React Component
description: Generate a new React component with tests and stories
steps:
- prompt: "What should this component do?"
variable: purpose
- prompt: "What props does it need?"
variable: props
- action: generate
template: |
Create a React component that:
- Purpose: {{purpose}}
- Props: {{props}}
- Include TypeScript types
- Add unit tests
- Add Storybook story
- action: create_files
files:
- "src/components/{{name}}/index.tsx"
- "src/components/{{name}}/{{name}}.test.tsx"
- "src/components/{{name}}/{{name}}.stories.tsx"
Pricing
| Plan | Price | Features |
|---|---|---|
| Free | $0 | Unlimited basic completions |
| Pro | $15/month | Cascade, Flows, premium models |
| Team | $30/user/month | Shared memory, team flows, admin |
Claude Code
Overview
Claude Code is Anthropic's terminal-native coding assistant. With a 200K token context window and deep reasoning capabilities, it excels at understanding entire codebases and complex refactoring tasks.
Key stats:
- 80.9% on SWE-bench (state-of-the-art)
- 200K token context (entire large codebases)
- Terminal-first interface
Features Deep Dive
Codebase Understanding
Claude Code can ingest entire repositories:
# Initialize Claude Code in your project
claude-code init
# Ask about the codebase
claude-code ask "How does the authentication system work?"
# Get architecture overview
claude-code ask "Explain the overall architecture and key components"
Multi-File Editing
Edit multiple files with a single command:
claude-code edit "Rename the User model to Account across the entire codebase"
# Claude Code:
# 1. Finds all references to User
# 2. Updates model definition
# 3. Updates all imports
# 4. Updates all usages
# 5. Updates tests
# 6. Shows diff for approval
Agentic Tasks
Run complex multi-step tasks:
claude-code task "Add a caching layer for the API responses"
# Claude Code autonomously:
# 1. Analyzes current API structure
# 2. Determines best caching strategy
# 3. Implements Redis caching layer
# 4. Adds cache invalidation
# 5. Updates existing endpoints
# 6. Adds configuration options
# 7. Creates tests
# 8. Documents the changes
Claude Code Configuration
Configure Claude Code through a YAML file at your project root. The context section is critical—it determines which files Claude Code indexes and can access. Be strategic: include source and test files, exclude dependencies and build artifacts. The hooks section automates pre/post-edit checks, catching type errors before you see the diff.
# claude-code.yaml
project:
name: my-app
type: typescript
context:
include:
- "src/**/*.ts"
- "tests/**/*.ts"
- "*.json"
exclude:
- "node_modules"
- "dist"
- "*.log"
preferences:
model: claude-sonnet-4
max_tokens: 8192
temperature: 0.2
style:
typescript:
strict: true
prefer_interfaces: true
use_type_imports: true
testing:
framework: vitest
coverage_threshold: 80
documentation:
style: jsdoc
require_for_public: true
hooks:
pre_edit:
- "npm run typecheck"
post_edit:
- "npm run lint:fix"
- "npm run test"
Claude Code API Integration
Use Claude Code programmatically:
from claude_code import ClaudeCode
# Initialize
cc = ClaudeCode(
api_key="your-api-key",
project_path="/path/to/project"
)
# Index the codebase
cc.index()
# Ask questions
response = cc.ask(
"What are the main API endpoints and their purposes?"
)
print(response.answer)
print(response.relevant_files)
# Edit files
result = cc.edit(
instruction="Add input validation to all API endpoints",
files=["src/routes/*.ts"],
dry_run=True # Preview changes first
)
for change in result.changes:
    print(f"File: {change.file}")
    print(f"Diff:\n{change.diff}")
# Apply changes
if input("Apply changes? (y/n): ") == "y":
    cc.apply(result)
Shell Integration
# Add to .bashrc/.zshrc
alias cc="claude-code"
alias cca="claude-code ask"
alias cce="claude-code edit"
alias cct="claude-code task"
# Quick commands
cca "What does this function do?" -f src/utils/parser.ts
cce "Add error handling" -f src/api/routes.ts
cct "Write tests for the auth module"
Pricing
Claude Code uses usage-based pricing:
| Usage | Cost |
|---|---|
| Input tokens | $3/million tokens |
| Output tokens | $15/million tokens |
| Typical session | $0.10-0.50 |
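As a rough sanity check on those figures, here is the arithmetic for a hypothetical session at the listed rates (the token counts are illustrative assumptions):
// Hypothetical session: ~60K input tokens (context + prompts), ~4K output tokens
const inputTokens = 60_000;
const outputTokens = 4_000;
const cost = (inputTokens / 1e6) * 3 + (outputTokens / 1e6) * 15;
console.log(cost.toFixed(2)); // "0.24", within the typical range above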
OpenAI Codex
Overview
OpenAI Codex represents OpenAI's ambitious entry into the agentic coding space. Originally launched as a code completion model (the engine behind GitHub Copilot), Codex has evolved into a full-fledged autonomous coding agent that can understand entire codebases, execute multi-step tasks, and even run multiple parallel agents in the cloud.
The current iteration, powered by GPT-5.2-Codex, is specifically optimized for long-horizon coding tasks—meaning it excels at multi-step problems that require sustained context and planning over many tool calls and file modifications.
Key stats:
- Cloud-based parallel agent execution (up to 5 concurrent agents)
- Powered by GPT-5.2-Codex (optimized for long-horizon coding)
- Available as CLI, VS Code extension, and integrated in Cursor/Windsurf
- Included in ChatGPT Plus/Pro/Team/Enterprise subscriptions
- Sandbox execution environment for safe code running
- Native integration with GitHub for PR creation
The Evolution of Codex
Understanding Codex's history helps contextualize its current capabilities:
2021 - Codex Original: Launched as a fine-tuned GPT-3 model for code completion. Powered the original GitHub Copilot. Limited to autocomplete suggestions.
2023 - GPT-4 Code Interpreter: Code execution capabilities added via ChatGPT. Users could upload files and run Python code in a sandbox.
2024 - ChatGPT Canvas: Introduced a dedicated coding interface within ChatGPT with side-by-side editing and iterative refinement.
2025 - Codex Agent (Current): Full autonomous coding agent with:
- Cloud-based execution environment
- Parallel agent support
- Native IDE integration
- Git/GitHub workflow integration
- Persistent project understanding
Architecture Deep Dive
Codex operates fundamentally differently from local coding assistants:
Traditional (Cursor, Cline):
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Your IDE │───>│ LLM API │───>│ Response │
│ (local) │ │ (cloud) │ │ (local) │
└─────────────┘ └─────────────┘ └─────────────┘
^ │
└───────────────────────────────────────┘
(runs locally)
Codex Architecture:
┌─────────────┐ ┌─────────────────────────────┐
│ Your IDE │───>│ Codex Cloud │
│ (local) │ │ ┌─────────────────────┐ │
└─────────────┘ │ │ GPT-5.2-Codex │ │
^ │ │ ┌───────────────┐ │ │
│ │ │ │ Sandbox VM │ │ │
│ │ │ │ - Your code │ │ │
│ │ │ │ - Git clone │ │ │
│ │ │ │ - Test runner │ │ │
│ │ │ └───────────────┘ │ │
│ │ └─────────────────────┘ │
│ └─────────────────────────────┘
└──────── (PR/diff returned) ──────────────┘
This cloud-native architecture enables several unique capabilities:
- Parallel execution: Multiple agents can work simultaneously without competing for local resources
- Isolated environments: Each task runs in a fresh sandbox with your code cloned
- Full execution: Agents can actually run tests, builds, and other commands
- No local compute: Your machine isn't taxed during heavy AI processing
Features Deep Dive
Parallel Cloud Agents
Codex's most distinctive feature is running multiple agents simultaneously in the cloud. Each agent gets its own isolated environment:
# Start multiple agents working in parallel
codex task "Implement user authentication with JWT" --background &
codex task "Write comprehensive unit tests for the User model" --background &
codex task "Update API documentation with new endpoints" --background &
# Check status of all running agents
codex status
# Output:
# Agent 1: "Implement user authentication" - 67% complete
# └─ Created: auth/jwt.ts, auth/middleware.ts
# └─ Currently: Writing login route tests
#
# Agent 2: "Unit tests for User model" - 45% complete
# └─ Created: tests/user.model.test.ts
# └─ Currently: Testing edge cases
#
# Agent 3: "API documentation" - 89% complete
# └─ Updated: docs/api.md, docs/auth.md
# └─ Currently: Generating OpenAPI spec
# View specific agent details
codex agent view agent-1
# Cancel an agent
codex agent cancel agent-3
# Wait for all agents to complete
codex wait --all
Each agent works independently and produces either:
- A pull request with all changes
- A diff for local review
- Applied changes (if auto-apply is enabled)
The Sandbox Environment
Every Codex agent runs in a sandboxed Linux VM with:
Sandbox Specifications:
OS: Ubuntu 22.04 LTS
CPU: 4 vCPUs
RAM: 16GB
Storage: 50GB ephemeral
Network: Configurable (default: restricted)
Timeout: 30 minutes (default), up to 2 hours
Pre-installed:
- Node.js 20.x, 18.x
- Python 3.11, 3.10
- Go 1.21
- Rust 1.75
- Java 21
- .NET 8.0
- Docker (rootless)
- Common build tools (make, cmake, etc.)
Available on request:
- Database servers (PostgreSQL, MySQL, Redis)
- Custom Docker images
- Specific language versions
The sandbox can:
- Clone your repository (via GitHub integration or uploaded zip)
- Install dependencies (npm install, pip install, etc.)
- Run tests
- Execute build scripts
- Start development servers for testing
- Make HTTP requests (if network enabled)
# Enable network access for agents that need external APIs
codex task "Integrate Stripe payments" --sandbox-network=true
# Use a specific sandbox configuration
codex task "Build mobile app" --sandbox-config=mobile.yaml
# Custom sandbox with Docker
codex task "Test in production-like environment" \
--sandbox-docker="postgres:15,redis:7"
Agent Skills System
Skills are reusable instruction packages that standardize common operations:
# .codex/skills/security-review.yaml
name: security-review
description: Review code for security vulnerabilities
version: "1.0"
instructions: |
Perform a comprehensive security review:
1. OWASP Top 10 check:
- SQL Injection
- XSS
- CSRF
- Authentication issues
- Access control
- Security misconfiguration
- Cryptographic failures
2. Dependency vulnerabilities:
- Run npm audit / pip-audit / cargo audit
- Check for known CVEs
3. Secret detection:
- Scan for hardcoded credentials
- Check for API keys in code
- Verify .env files are gitignored
4. Code patterns:
- Unsafe deserialization
- Command injection risks
- Path traversal vulnerabilities
scripts:
pre:
- npm audit
- npx secret-scanner
post:
- npm run lint:security
output:
format: markdown
sections:
- severity_high
- severity_medium
- severity_low
- recommendations
triggers:
- on_pr: true
- on_push_to: [main, develop]
# .codex/skills/migration-helper.yaml
name: migration-helper
description: Help migrate between versions/frameworks
parameters:
from_version:
type: string
required: true
description: Current version or framework
to_version:
type: string
required: true
description: Target version or framework
instructions: |
Migrate codebase from {{from_version}} to {{to_version}}:
1. Analyze current usage patterns
2. Identify breaking changes between versions
3. Create migration plan
4. Implement changes systematically
5. Update dependencies
6. Fix type errors and deprecation warnings
7. Run test suite, fix failures
8. Update documentation
Approach:
- Make atomic commits for each logical change
- Preserve git history where possible
- Add migration notes as comments where behavior changed
Using skills:
# Run a skill
codex --skill security-review
# Run with parameters
codex --skill migration-helper \
--param from_version="React 17" \
--param to_version="React 19"
# List available skills
codex skills list
# Create skill from successful task
codex skills create-from-task task-abc123 --name my-new-skill
Interactive Mode
Codex's interactive REPL provides a conversational coding experience:
$ codex
Welcome to Codex! Type 'help' for commands.
codex> What does the payment processing module do?
The payment processing module (src/payments/) handles:
- Payment intent creation via Stripe
- Webhook handling for async payment events
- Refund processing
- Subscription management
Key files:
- stripe.ts: Stripe SDK wrapper
- webhooks.ts: Event handlers
- subscriptions.ts: Recurring billing logic
codex> There's a bug where refunds sometimes fail silently. Can you investigate?
I'll investigate the refund handling. Let me:
1. Read the refund code in src/payments/refunds.ts
2. Check the error handling patterns
3. Look at recent related commits
[Reading files...]
I found the issue. In `processRefund()` on line 45, errors from the Stripe
API are caught but not properly propagated:
// Current (problematic):
try {
await stripe.refunds.create({ payment_intent: intentId });
} catch (e) {
console.error(e); // Error logged but not thrown
}
// Should be:
try {
await stripe.refunds.create({ payment_intent: intentId });
} catch (e) {
console.error('Refund failed:', e);
throw new RefundError(e.message, { intentId, originalError: e });
}
Would you like me to fix this and add proper error handling throughout the payments module?
codex> Yes, fix it and add tests
Starting task in background... Agent ID: agent-7f3a2b
Use 'codex status' to monitor progress.
codex> /status
Agent agent-7f3a2b: "Fix refund error handling"
Status: In Progress (43%)
├─ ✅ Fixed src/payments/refunds.ts
├─ ✅ Added RefundError class
├─ 🔄 Writing tests for refund.test.ts
└─ ⏳ Pending: Integration test updates
codex> /quit
CLI Commands Reference
# Authentication
codex auth login # Sign in with ChatGPT account
codex auth logout # Sign out
codex auth status # Check authentication status
# Basic operations
codex "your prompt" # Quick one-shot task
codex ask "question" # Ask about codebase (no changes)
codex edit "instruction" # Edit specific files
codex task "description" # Full autonomous task
# Agent management
codex status # List all agents
codex agent view <id> # View agent details
codex agent logs <id> # Stream agent logs
codex agent cancel <id> # Cancel running agent
codex wait <id> # Wait for agent completion
codex wait --all # Wait for all agents
# Skills
codex --skill <name> # Run a skill
codex skills list # List available skills
codex skills show <name> # Show skill details
codex skills create # Create new skill
codex skills delete <name> # Delete a skill
# Configuration
codex config show # Show current config
codex config set <k> <v> # Set config value
codex init # Initialize project
# Git integration
codex pr create # Create PR from agent result
codex pr list # List Codex-created PRs
# Advanced
codex --model <name> # Use specific model
codex --sandbox-network # Enable network in sandbox
codex --timeout <mins> # Set task timeout
codex --verbose # Verbose output
codex --dry-run # Preview without executing
Configuration
Codex supports both global and project-level configuration:
# .codex/config.yaml (project-level)
# Model selection
model: gpt-5.2-codex # Default model
model_fallback: gpt-4o # Fallback if primary unavailable
# Approval modes
approval_mode: suggest # suggest | auto-edit | full-auto
# suggest: Shows diff, waits for approval
# auto-edit: Applies non-breaking changes automatically
# full-auto: Applies all changes (use with caution)
# Context configuration
context:
include:
- "src/**/*"
- "tests/**/*"
- "docs/**/*.md"
- "*.json"
- "*.yaml"
exclude:
- "node_modules/**"
- "dist/**"
- "*.log"
- ".env*"
max_files: 100 # Maximum files to include
max_file_size: 100KB # Skip files larger than this
# Skills configuration
skills:
directory: .codex/skills
enabled:
- security-review
- migration-helper
disabled:
- experimental-feature
# Sandbox configuration
sandbox:
enabled: true
network: false # Disable network by default
timeout: 1800 # 30 minutes
memory: 16GB
docker_images: # Pre-pull these images
- node:20
- postgres:15
# Git integration
git:
auto_commit: false # Auto-commit changes
commit_prefix: "[codex]" # Prefix for commits
branch_prefix: "codex/" # Prefix for branches
create_pr: prompt # prompt | auto | never
# Output preferences
output:
format: detailed # minimal | detailed | verbose
show_thinking: false # Show agent reasoning
syntax_highlight: true
# Safety settings
safety:
max_files_modified: 20 # Warn if more files changed
require_tests: true # Require tests for new code
no_force_push: true # Prevent force pushes
protected_files: # Never modify these
- ".env"
- "secrets.yaml"
# ~/.codex/config.yaml (global defaults)
# Global preferences
default_model: gpt-5.2-codex
theme: dark
editor: code # Editor for viewing diffs
# Authentication
api_base_url: https://api.openai.com/v1
# Usage limits (for cost control)
limits:
daily_tasks: 50
concurrent_agents: 3
warn_at_cost: 10.00
# Telemetry
telemetry: false
IDE Integration
VS Code Extension
The Codex VS Code extension provides a rich GUI experience:
// VS Code settings.json
{
"codex.enabled": true,
"codex.model": "gpt-5.2-codex",
// Inline suggestions (like Copilot)
"codex.inlineSuggestions.enabled": true,
"codex.inlineSuggestions.debounceMs": 200,
// Agent panel
"codex.agentPanel.position": "right",
"codex.agentPanel.showOnStart": true,
// Auto-apply settings
"codex.autoApply.readOnlyOperations": true,
"codex.autoApply.formatting": true,
"codex.autoApply.imports": true,
"codex.autoApply.codeChanges": false,
// Keybindings
"codex.keybindings.triggerAgent": "ctrl+shift+c",
"codex.keybindings.explainSelection": "ctrl+shift+e",
"codex.keybindings.fixError": "ctrl+shift+f"
}
VS Code features:
- Agent Panel: View and manage running agents
- Inline Diff Preview: See changes before applying
- Context Menu: Right-click to trigger Codex actions
- Problems Integration: Codex can auto-fix diagnostics
- Terminal Integration: Run Codex commands in integrated terminal
Cursor Integration
Codex works alongside Cursor's native features:
// Cursor settings
{
// Use Codex for agentic tasks, Cursor for autocomplete
"cursor.autocomplete.enabled": true,
"codex.inlineSuggestions.enabled": false,
// Trigger Codex from Cursor
"codex.cursorIntegration": true,
"codex.cursorHotkey": "ctrl+shift+o"
}
Advanced Usage Patterns
Batch Operations
Process multiple tasks efficiently:
# Create a batch file
cat > tasks.yaml << EOF
tasks:
- name: "Add TypeScript types"
files: ["src/legacy/*.js"]
instruction: "Convert to TypeScript with strict types"
- name: "Add error handling"
files: ["src/api/*.ts"]
instruction: "Add proper error handling with custom error classes"
- name: "Generate tests"
files: ["src/utils/*.ts"]
instruction: "Generate comprehensive unit tests"
depends_on: ["Add TypeScript types"]
EOF
# Run batch
codex batch tasks.yaml
# Monitor batch progress
codex batch status
CI/CD Integration
Use Codex in automated pipelines:
# .github/workflows/codex-review.yaml
name: Codex Code Review
on:
pull_request:
types: [opened, synchronize]
jobs:
review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run Codex Security Review
uses: openai/codex-action@v2
with:
skill: security-review
api-key: ${{ secrets.OPENAI_API_KEY }}
- name: Run Codex Code Quality
uses: openai/codex-action@v2
with:
prompt: |
Review this PR for:
- Code quality issues
- Performance problems
- Missing tests
- Documentation gaps
comment-on-pr: true
# GitLab CI integration
codex-review:
stage: review
image: openai/codex-cli:latest
script:
- codex auth login --token $OPENAI_API_KEY
- codex --skill code-review --output=report.md
- codex --skill security-review --output=security.md
artifacts:
reports:
codequality: report.md
Custom Model Configuration
Use Codex with specific model settings:
# Use a specific model version
codex --model gpt-5.2-codex-20250115 "task description"
# Adjust generation parameters
codex --temperature 0.2 --max-tokens 8192 "precise task"
# Use with fine-tuned model
codex --model ft:gpt-5.2-codex:my-org:my-finetune "specialized task"
Pricing
Codex is included in ChatGPT subscriptions with varying limits:
| Plan | Monthly Cost | Agent Minutes | Parallel Agents | Features |
|---|---|---|---|---|
| ChatGPT Plus | $20/month | 600 min | 2 | CLI + VS Code |
| ChatGPT Pro | $200/month | Unlimited | 5 | Priority, all features |
| ChatGPT Team | $25/user/month | 1000 min/user | 3/user | Admin controls |
| Enterprise | Custom | Unlimited | 10+/user | SSO, audit logs, SLA |
API pricing (for programmatic access):
| Model | Input | Output |
|---|---|---|
| gpt-5.2-codex | $3.00/1M tokens | $12.00/1M tokens |
| gpt-5.2-codex-mini | $0.50/1M tokens | $2.00/1M tokens |
Sandbox compute:
- Included in subscription limits
- Additional compute: $0.02/minute
Codex vs Other Tools
| Capability | Codex | Claude Code | Cursor | Cline |
|---|---|---|---|---|
| Execution model | Cloud sandbox | Local | Local | Local |
| Parallel agents | ✅ 2-10 | ❌ No | ❌ No | ❌ No |
| Test execution | ✅ In sandbox | ✅ Local | ✅ Local | ✅ Local |
| Context window | 128K | 200K | 128K | Model-dependent |
| Offline use | ❌ No | ✅ Yes | ⚠️ Limited | ✅ Yes |
| IDE integration | ✅ Good | ⚠️ CLI-focused | ✅ Native | ✅ VS Code |
| Cost model | Subscription | Pay-per-use | Subscription | Pay-per-use |
When to choose Codex:
- You need parallel agent execution
- You want isolated sandbox environments
- You're already paying for ChatGPT Plus/Pro
- You need GitHub integration for PR creation
- You prefer subscription pricing over pay-per-use
When to choose alternatives:
- You need offline capability
- You require 200K+ token context
- You prefer local execution for privacy
- You want to use Claude or other non-OpenAI models
Gemini CLI
Overview
Gemini CLI is Google's terminal-first coding assistant, offering access to Gemini 2.5/3 models with a generous free tier and deep integration with Google Cloud.
Key stats:
- Free tier with Gemini 2.5 Flash
- 1M token context window
- Native multimodal support (images, diagrams)
Features Deep Dive
Interactive Mode
# Start interactive session
gemini
> Explain the architecture of this project
[Analyzes codebase, provides overview]
> Generate a REST API for user management
[Creates complete API with routes, controllers, models]
> /image architecture.png What improvements would you suggest?
[Analyzes architecture diagram, provides recommendations]
Code Generation
# Generate code from description
gemini generate "A TypeScript function that validates credit card numbers using the Luhn algorithm"
# Output:
function validateCreditCard(cardNumber: string): boolean {
// Remove spaces and dashes
const cleaned = cardNumber.replace(/[\s-]/g, '');
// Check if only digits
if (!/^\d+$/.test(cleaned)) {
return false;
}
// Luhn algorithm
let sum = 0;
let isEven = false;
for (let i = cleaned.length - 1; i >= 0; i--) {
let digit = parseInt(cleaned[i], 10);
if (isEven) {
digit *= 2;
if (digit > 9) {
digit -= 9;
}
}
sum += digit;
isEven = !isEven;
}
return sum % 10 === 0;
}
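A quick usage check (the first value is a standard Luhn-valid test number; the second fails the checksum):
console.log(validateCreditCard("4539 1488 0343 6467")); // true
console.log(validateCreditCard("1234 5678 9012 3456")); // false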
File Operations
# Explain a file
gemini explain src/complex-algorithm.ts
# Review code
gemini review --security src/auth/
# Refactor
gemini refactor "Convert callbacks to async/await" src/legacy/
# Generate tests
gemini test src/utils/validation.ts -o tests/
Gemini CLI Configuration
# ~/.gemini/config.yaml
model: gemini-2.5-flash # or gemini-3-flash for paid
temperature: 0.2
max_tokens: 8192
context:
auto_include_open_files: true
max_context_files: 50
output:
format: markdown
syntax_highlighting: true
project_detection:
enabled: true
config_files:
- package.json
- pyproject.toml
- Cargo.toml
- go.mod
Pricing
| Tier | Model | Price |
|---|---|---|
| Free | Gemini 2.5 Flash | $0 (rate limited) |
| Pay-as-you-go | Gemini 2.5 Pro | $0.075/1M input tokens |
| Pay-as-you-go | Gemini 3 Flash | $0.50/1M input tokens |
Aider (Open Source)
Overview
Aider is the leading open-source terminal-based coding assistant. It's git-native, meaning it automatically commits changes with descriptive messages.
Key stats:
- 100% open source (Apache 2.0)
- Works with any LLM (Claude, GPT-4, Llama, etc.)
- Git-native: automatic commits with good messages
- Active community and frequent updates
Features Deep Dive
Git-Native Workflow
Aider automatically commits each change:
# Start aider
aider
> Add input validation to the login form
# Aider:
# 1. Analyzes the codebase
# 2. Makes changes to relevant files
# 3. Automatically creates a git commit:
# "feat: Add input validation to login form
# - Added email format validation
# - Added password strength requirements
# - Added error message display"
Multi-Model Support
# Use Claude
aider --model claude-3-5-sonnet
# Use GPT-4
aider --model gpt-4o
# Use local Ollama model
aider --model ollama/llama3.1
# Use DeepSeek
aider --model deepseek/deepseek-chat
Watch Mode
Aider watches for file changes and responds:
# Start in watch mode
aider --watch
# Now edit files manually or with another tool
# Aider sees changes and can help integrate them
Architect Mode
For planning before coding:
aider --architect
> Plan a microservices architecture for an e-commerce platform
# Aider creates a plan, then you approve before implementation
Aider Configuration
# .aider.conf.yml
model: claude-3-5-sonnet
auto-commits: true
commit-prompt: conventional # conventional commits format
# File handling
auto-lint: true
lint-cmd: npm run lint
test-cmd: npm test
# Git settings
attribute-author: true
attribute-committer: true
dirty-commits: false
# Context
map-tokens: 1024
map-refresh: auto
subtree-only: false
Usage Examples
# Add files to chat context
aider src/auth/*.ts tests/auth/*.ts
# Ask questions
aider --message "Explain how the auth flow works"
# Make changes
aider --message "Add rate limiting to the login endpoint"
# Review and fix
aider --message "Review this code for security issues and fix them"
# Generate tests
aider --message "Add comprehensive tests for the User model"
Aider Scripting
#!/bin/bash
# automated-review.sh
# Run aider in non-interactive mode for CI
aider --yes --message "
Review the changes in this PR for:
1. Security vulnerabilities
2. Performance issues
3. Code style violations
Fix any issues found.
" --model claude-3-5-sonnet
# Check if aider made changes
if [ -n "$(git status --porcelain)" ]; then
echo "Aider found and fixed issues"
git push
fi
Pricing
Aider itself is free. You pay for the LLM API:
| Model | Cost per 1M tokens |
|---|---|
| Claude 3.5 Sonnet | $3 input / $15 output |
| GPT-4o | $2.50 input / $10 output |
| Local (Ollama) | Free |
| DeepSeek | $0.14 input / $0.28 output |
Cline
Overview
Cline is the original open-source autonomous coding agent for VS Code. It pioneered the human-in-the-loop approach where every action requires approval, making it safe for production use.
Key stats:
- 100% open-source (Apache 2.0)
- Works with any API provider (OpenAI, Anthropic, local models)
- Plan/Act mode separation for strategic thinking
- MCP (Model Context Protocol) support for extensibility
Features Deep Dive
Plan & Act Mode
Cline's unique approach separates thinking from doing:
Plan Mode (Read-Only):
├── Explore codebase
├── Analyze architecture
├── Create implementation strategy
└── No file modifications allowed
Act Mode (Execute):
├── Implement planned changes
├── Run commands
├── Create/modify files
└── Human approval for each action
# Configure different models for each mode
# Plan: Use cheaper/faster model for exploration
# Act: Use more capable model for implementation
# In Cline settings:
Plan Model: deepseek-chat (cost-effective)
Act Model: claude-sonnet-4 (high quality)
Browser Integration
Cline can interact with web browsers for testing:
// Cline can:
// 1. Launch a headless browser
// 2. Navigate to your local dev server
// 3. Click, type, scroll
// 4. Capture screenshots
// 5. Read console logs
// 6. Fix runtime errors it discovers
// Example task:
"Test the login flow and fix any errors you find"
// Cline will:
// - Start dev server
// - Open browser to localhost:3000
// - Try to log in
// - Capture any errors
// - Fix the code
// - Verify the fix works
MCP Tool Creation
Extend Cline with custom tools:
{
"mcpServers": {
"database": {
"command": "node",
"args": ["./mcp-servers/database.js"],
"tools": [
{
"name": "query_database",
"description": "Execute read-only SQL queries",
"inputSchema": {
"type": "object",
"properties": {
"query": { "type": "string" }
}
}
}
]
}
}
}
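For reference, here is a minimal sketch of what the ./mcp-servers/database.js server referenced above might look like, assuming the TypeScript MCP SDK (@modelcontextprotocol/sdk) and a hypothetical runReadOnlyQuery helper; adapt it to your actual database client:
// mcp-servers/database.ts: illustrative sketch, not Cline's own code
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
// Hypothetical helper: enforce read-only access before touching the database
async function runReadOnlyQuery(sql: string): Promise<unknown[]> {
  if (!/^\s*select\b/i.test(sql)) throw new Error("Only SELECT queries allowed");
  return []; // wire up your real database client here
}
const server = new McpServer({ name: "database", version: "1.0.0" });
// Expose the query_database tool declared in the config above
server.tool("query_database", { query: z.string() }, async ({ query }) => ({
  content: [{ type: "text", text: JSON.stringify(await runReadOnlyQuery(query)) }],
}));
await server.connect(new StdioServerTransport());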
Configuration
// .vscode/settings.json
{
"cline.apiProvider": "anthropic",
"cline.apiKey": "${env:ANTHROPIC_API_KEY}",
"cline.model": "claude-sonnet-4",
"cline.planMode": {
"enabled": true,
"model": "deepseek-chat"
},
"cline.autoApprove": {
"readFiles": true,
"listFiles": true,
"writeFiles": false,
"executeCommands": false
},
"cline.contextWindow": 128000,
"cline.maxTokens": 8192
}
Pricing
Cline is free. You pay only for API usage:
| Provider | Model | Cost |
|---|---|---|
| Anthropic | Claude Sonnet 4 | $3 input / $15 output per 1M tokens |
| OpenAI | GPT-4o | $2.50 input / $10 output per 1M tokens |
| DeepSeek | DeepSeek Chat | $0.14 input / $0.28 output per 1M tokens |
| Local | Ollama/LM Studio | Free |
Roo Code
Overview
Roo Code (formerly Roo Cline) is a fork of Cline with enhanced multi-mode capabilities. It provides specialized "personas" for different tasks and innovative features like boomerang tasks.
Key stats:
- 900K+ VS Code Marketplace installs
- Multi-mode: Code, Architect, Ask, Debug, Custom
- Boomerang tasks for complex workflows
- Stable codebase indexing
Features Deep Dive
Multi-Mode System
Different modes for different tasks:
Modes:
├── Code Mode
│ └── Implementation focused, writes code
├── Architect Mode
│ └── Planning focused, designs solutions
├── Ask Mode
│ └── Q&A focused, explains code
├── Debug Mode
│ └── Bug-finding focused, traces issues
└── Custom Modes
└── User-defined personas
// Configure different models per mode
const modeConfig = {
architect: {
model: "o3", // Best for planning
systemPrompt: "You are a senior software architect..."
},
code: {
model: "claude-sonnet-4", // Best for implementation
systemPrompt: "You are an expert programmer..."
},
debug: {
model: "gpt-4o", // Good at finding issues
systemPrompt: "You are a debugging expert..."
}
};
Boomerang Tasks
Chain complex workflows across modes:
Boomerang Task Flow:
1. User: "Add user authentication"
↓
2. Architect Mode: Plans the implementation
- Designs auth flow
- Identifies files to create/modify
- Creates task breakdown
↓
3. Code Mode: Implements each task
- Creates auth middleware
- Adds login/logout routes
- Updates database schema
↓
4. Debug Mode: Verifies implementation
- Runs tests
- Checks for security issues
- Validates edge cases
↓
5. Returns to User: Complete with summary
Codebase Indexing
Persistent understanding of your project:
# Roo Code indexes your codebase
# - Function definitions
# - Class hierarchies
# - Import relationships
# - Test coverage mapping
# Query the index
"What functions call the validateUser method?"
"Show me all API endpoints that don't have tests"
"Find all places where we handle authentication errors"
Configuration
// .roo/config.json
{
"modes": {
"architect": {
"model": "anthropic/claude-sonnet-4",
"temperature": 0.3
},
"code": {
"model": "anthropic/claude-sonnet-4",
"temperature": 0.2
},
"debug": {
"model": "openai/gpt-4o",
"temperature": 0.1
}
},
"boomerang": {
"enabled": true,
"autoTransition": true
},
"indexing": {
"enabled": true,
"excludePaths": ["node_modules", "dist", ".git"]
}
}
Pricing
Free and open-source. Pay only for AI provider APIs.
Kilo Code
Overview
Kilo Code is a fork of Roo Code with additional enterprise features, including the Orchestrator mode, Memory Bank, and generous free credits for new users.
Key stats:
- #1 on OpenRouter (6.1 trillion tokens/month)
- 750K+ active users
- $20 free credits for new users
- Access to 500+ AI models
Features Deep Dive
Orchestrator Mode
Automatically chains specialized agents:
Orchestrator Flow:
User: "Build a REST API for a todo app"
Orchestrator:
├── 1. Architect Agent
│ └── Designs API structure, endpoints, database schema
├── 2. Code Agent
│ └── Implements routes, controllers, models
├── 3. Test Agent
│ └── Writes unit and integration tests
├── 4. Documentation Agent
│ └── Generates API docs and README
└── 5. Review Agent
└── Checks for issues, suggests improvements
Memory Bank
Persistent project context across sessions:
// Memory Bank stores:
// - Project architecture decisions
// - Coding conventions used
// - Previous task context
// - User preferences
// Example: Memory Bank remembers your patterns
Session 1: "Use Prisma for database"
Session 2: "Add a new model"
// Kilo automatically uses Prisma, not raw SQL
// Memory Bank file structure
.kilo/
├── memory/
│ ├── architecture.md # Design decisions
│ ├── conventions.md # Code style
│ ├── context.md # Ongoing work
│ └── preferences.md # User preferences
Cross-Platform Sync
Work across devices:
# Start on mobile (Kilo mobile app)
"Plan a caching system for the API"
# Continue on desktop (VS Code)
"Implement the caching plan"
# Finish on laptop (Cursor)
"Add tests for the cache"
# All context is preserved across devices
Configuration
// kilo.config.json
{
"orchestrator": {
"enabled": true,
"agents": ["architect", "code", "test", "docs", "review"]
},
"memoryBank": {
"enabled": true,
"syncToCloud": true
},
"models": {
"default": "claude-sonnet-4",
"planning": "gemini-3-pro",
"coding": "claude-sonnet-4",
"review": "gpt-4o"
},
"credits": {
"warnAt": 5.00,
"hardLimit": null
}
}
Pricing
| Tier | Cost | Features |
|---|---|---|
| Free Start | $20 credit | Full access while credits last |
| Pay-as-you-go | Provider rates | No markup, transparent pricing |
| Promotions | Up to $120 free | Check kilo.ai for current offers |
Qwen Coder
Overview
Qwen 2.5 Coder is Alibaba's state-of-the-art open-source code model, available in sizes from 0.5B to 32B parameters. It's not a tool itself, but powers many tools and can be run locally.
Key stats:
- 69.6% on SWE-bench (32B model)
- 92 programming languages
- 128K context window
- Apache 2.0 license
Using Qwen Coder
With Ollama (Local)
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull Qwen Coder
ollama pull qwen2.5-coder:32b
# Run interactively
ollama run qwen2.5-coder:32b
# Or use with other tools
With Aider
# Use Qwen Coder with Aider
aider --model ollama/qwen2.5-coder:32b
# Or via OpenRouter
aider --model openrouter/qwen/qwen-2.5-coder-32b-instruct
With Cline/Roo Code/Kilo Code
// Configure in VS Code settings
{
"cline.apiProvider": "ollama",
"cline.ollamaBaseUrl": "http://localhost:11434",
"cline.model": "qwen2.5-coder:32b"
}
Direct API Usage
from openai import OpenAI
# Via OpenRouter
client = OpenAI(
base_url="https://openrouter.ai/api/v1",
api_key="your-openrouter-key"
)
response = client.chat.completions.create(
model="qwen/qwen-2.5-coder-32b-instruct",
messages=[
{
"role": "system",
"content": "You are an expert programmer."
},
{
"role": "user",
"content": "Write a Python function to validate email addresses"
}
]
)
print(response.choices[0].message.content)
Model Variants
| Model | Parameters | VRAM Required | Best For |
|---|---|---|---|
| qwen2.5-coder:0.5b | 0.5B | 1GB | Edge devices |
| qwen2.5-coder:1.5b | 1.5B | 2GB | Light tasks |
| qwen2.5-coder:7b | 7B | 8GB | Good balance |
| qwen2.5-coder:14b | 14B | 16GB | Strong performance |
| qwen2.5-coder:32b | 32B | 32GB | Best quality |
Benchmark Performance
| Benchmark | Qwen 2.5 Coder 32B | GPT-4o | Claude Sonnet |
|---|---|---|---|
| HumanEval | 92.7% | 90.2% | 92.0% |
| MBPP | 90.2% | 88.1% | 89.5% |
| SWE-bench | 69.6% | 68.4% | 72.7% |
| MultiPL-E | 75.2% | 73.8% | 74.1% |
Pricing
| Method | Cost |
|---|---|
| Local (Ollama) | Free (your hardware) |
| OpenRouter | $0.18 per 1M tokens |
| Together AI | $0.24 per 1M tokens |
| Alibaba Cloud | Variable |
Head-to-Head Comparison
Autocomplete Speed
Based on benchmarks (lower is better):
| Tool | Time to First Suggestion | Multi-line Support |
|---|---|---|
| Cursor (Supermaven) | ~50ms | Excellent |
| GitHub Copilot | ~150ms | Good |
| Windsurf | ~120ms | Good |
| Cline/Roo/Kilo | ~200ms (depends on model) | Good |
| Claude Code | N/A (not autocomplete-focused) | N/A |
| Gemini CLI | ~200ms | Good |
Agentic Capabilities
| Capability | Cursor | Copilot | Windsurf | Cline/Roo/Kilo | Claude Code | Codex |
|---|---|---|---|---|---|---|
| Multi-file editing | ✅ Excellent | ✅ Good | ✅ Good | ✅ Excellent | ✅ Excellent | ✅ Excellent |
| Autonomous PR creation | ⚠️ Limited | ✅ Excellent | ⚠️ Limited | ⚠️ Limited | ✅ Good | ✅ Excellent |
| Test generation & running | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes |
| Codebase understanding | ✅ Good | ✅ Good | ✅ Good | ✅ Excellent | ✅ Excellent | ✅ Good |
| Terminal command execution | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Yes | ✅ Native | ✅ Yes |
| Session memory | ⚠️ Limited | ⚠️ Limited | ✅ Excellent | ✅ Good (Kilo best) | ⚠️ Session | ⚠️ Limited |
| Browser testing | ❌ No | ❌ No | ❌ No | ✅ Yes (Cline) | ❌ No | ❌ No |
| MCP support | ⚠️ Limited | ❌ No | ❌ No | ✅ Full | ❌ No | ❌ No |
| Plan/Act separation | ❌ No | ❌ No | ❌ No | ✅ Yes | ❌ No | ❌ No |
Benchmark Performance
On SWE-bench Verified (higher is better):
| Tool/Model | Score | Notes |
|---|---|---|
| Claude Code | 80.9% | State-of-the-art |
| Cursor (Claude) | 78.2% | With Composer |
| GitHub Copilot | 72.5% | Agent mode |
| Windsurf | 71.8% | With Cascade |
| Cline (Claude Sonnet 4) | 72.7% | Depends on model used |
| Qwen 2.5 Coder 32B | 69.6% | Open-source local |
| OpenAI Codex | 69.1% | GPT-5.2-Codex |
| Gemini CLI | 68.4% | Gemini 3 Flash |
Open-Source Tool Comparison
| Feature | Cline | Roo Code | Kilo Code | Aider |
|---|---|---|---|---|
| Base | Original | Cline fork | Roo fork | Original |
| Plan/Act modes | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |
| Multi-mode personas | ❌ No | ✅ Yes | ✅ Yes | ✅ Architect mode |
| Memory Bank | ❌ No | ⚠️ Basic | ✅ Full | ❌ No |
| Orchestrator | ❌ No | ⚠️ Boomerang | ✅ Full | ❌ No |
| Browser use | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No |
| MCP support | ✅ Full | ✅ Full | ✅ Full | ❌ No |
| Git integration | ⚠️ Basic | ⚠️ Basic | ⚠️ Basic | ✅ Native |
| Free credits | ❌ No | ❌ No | ✅ $20 | ❌ No |
| Marketplace installs | 2M+ | 900K+ | 750K+ | CLI only |
Cost Analysis
For a typical developer (100 requests/day, 20 working days/month):
| Tool | Monthly Cost | Cost per Request |
|---|---|---|
| Cursor Pro | $20 | ~$0.01 |
| Copilot Individual | $10 | ~$0.005 |
| Windsurf Pro | $15 | ~$0.0075 |
| Claude Code | ~$30-60 | ~$0.015-0.03 |
| Gemini CLI (Free) | $0 | $0 |
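The cost-per-request column for the flat-fee plans is simply the monthly price divided by the assumed ~2,000 requests per month; a quick sketch:
// 100 requests/day × 20 working days = 2,000 requests/month
const requestsPerMonth = 100 * 20;
console.log(20 / requestsPerMonth); // Cursor Pro: $0.01
console.log(10 / requestsPerMonth); // Copilot: $0.005
console.log(15 / requestsPerMonth); // Windsurf: $0.0075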
Choosing the Right Tool
Decision Framework
START
│
├─ Do you need GitHub integration (PRs, issues)?
│ └─ YES → GitHub Copilot
│
├─ Do you work in terminal primarily?
│ └─ YES → Claude Code or Gemini CLI
│
├─ Do you need session memory across days?
│ └─ YES → Windsurf
│
├─ Do you do complex multi-file refactoring?
│ └─ YES → Cursor or Claude Code
│
├─ Are you budget-constrained?
│ └─ YES → Gemini CLI (free) or Copilot ($10)
│
└─ DEFAULT → Cursor (best all-around)
Best Tool by Use Case
| Use Case | Recommended | Why |
|---|---|---|
| Daily coding | Cursor | Fast autocomplete + Composer |
| Open source contributions | Copilot | PR agent, GitHub integration |
| Large codebase refactoring | Claude Code | 200K context, deep understanding |
| Team collaboration | Copilot or Windsurf | Team features, shared context |
| Budget-conscious | Gemini CLI | Free tier is generous |
| DevOps/Infrastructure | Claude Code | Understands K8s, Terraform |
| Learning/Students | Copilot Free | Best free tier for IDE |
Multi-Tool Strategy
Many teams use multiple tools strategically:
# Recommended multi-tool setup
daily_development:
primary: Cursor
reason: "Fast autocomplete, Composer for features"
code_review:
primary: Claude Code
reason: "Deep codebase understanding for reviews"
pull_requests:
primary: GitHub Copilot
reason: "Native PR creation and GitHub integration"
exploration:
primary: Gemini CLI
reason: "Free, multimodal for architecture diagrams"
Advanced Tips
Cursor Power Tips
// 1. Use multi-cursor with AI
// Select multiple similar patterns, trigger AI to transform all
// 2. Chain Composer commands
/*
Composer:
1. Create a new API endpoint for /products
2. Add validation using Zod
3. Write integration tests
4. Update OpenAPI spec
*/
// 3. Use @codebase for project-wide context
// "@codebase What's the pattern for error handling here?"
Copilot Power Tips
# 1. Use workspace agents
gh copilot workspace "Add dark mode to the entire app"
# 2. Inline chat with /commands
# Select code, then:
# /explain - Understand code
# /fix - Fix bugs
# /simplify - Reduce complexity
# /optimize - Improve performance
# 3. Use Copilot in commits
git commit # Copilot suggests message based on diff
Claude Code Power Tips
# 1. Use --depth for analysis depth
claude-code ask --depth deep "Analyze security vulnerabilities"
# 2. Chain commands with pipes
claude-code explain src/auth.ts | claude-code task "Add the missing error handling"
# 3. Use templates for common tasks
claude-code task --template api-endpoint "Create /users/search"
Building Custom Integrations
VS Code Extension API
import * as vscode from 'vscode';
export function activate(context: vscode.ExtensionContext) {
// Register custom AI command
const disposable = vscode.commands.registerCommand(
'myExtension.aiRefactor',
async () => {
const editor = vscode.window.activeTextEditor;
if (!editor) return;
const selection = editor.document.getText(editor.selection);
// Call your preferred AI API
const response = await callAI({
prompt: `Refactor this code for better readability:\n\n${selection}`,
model: 'claude-sonnet-4'
});
// Replace selection with AI response
editor.edit(editBuilder => {
editBuilder.replace(editor.selection, response.code);
});
}
);
context.subscriptions.push(disposable);
}
async function callAI(params: { prompt: string; model: string }) {
// Implement API call to Claude, OpenAI, etc.
const response = await fetch('https://api.anthropic.com/v1/messages', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-api-key': process.env.ANTHROPIC_API_KEY!,
'anthropic-version': '2023-06-01'
},
body: JSON.stringify({
model: params.model,
max_tokens: 4096,
messages: [{ role: 'user', content: params.prompt }]
})
});
const data = await response.json();
// The Messages API returns the generated text in content[0].text
return { code: data.content[0].text };
}
CLI Wrapper
#!/usr/bin/env python3
"""
Custom AI coding assistant CLI that combines multiple providers.
"""
import click
import os
from anthropic import Anthropic
from openai import OpenAI
anthropic = Anthropic()
openai = OpenAI()
@click.group()
def cli():
    """Multi-provider AI coding assistant."""
    pass
@cli.command()
@click.argument('file')
@click.option('--provider', '-p', default='claude', help='AI provider')
def explain(file: str, provider: str):
    """Explain a code file."""
    with open(file, 'r') as f:
        code = f.read()
    prompt = f"Explain this code in detail:\n\n```\n{code}\n```"
    if provider == 'claude':
        response = anthropic.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}]
        )
        click.echo(response.content[0].text)
    elif provider == 'openai':
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}]
        )
        click.echo(response.choices[0].message.content)
@cli.command()
@click.argument('instruction')
@click.argument('files', nargs=-1)
@click.option('--dry-run', is_flag=True, help='Show changes without applying')
def edit(instruction: str, files: tuple, dry_run: bool):
    """Edit files based on instruction."""
    for file in files:
        with open(file, 'r') as f:
            original = f.read()
        prompt = f"""Edit this code according to the instruction.
Instruction: {instruction}
Code:
{original}
Return only the modified code, no explanations."""
        response = anthropic.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=8192,
            messages=[{"role": "user", "content": prompt}]
        )
        new_code = response.content[0].text
        if dry_run:
            click.echo(f"=== Changes for {file} ===")
            # Show diff
            import difflib
            diff = difflib.unified_diff(
                original.splitlines(keepends=True),
                new_code.splitlines(keepends=True),
                fromfile=f'{file} (original)',
                tofile=f'{file} (modified)'
            )
            click.echo(''.join(diff))
        else:
            with open(file, 'w') as f:
                f.write(new_code)
            click.echo(f"Updated {file}")
@cli.command()
@click.argument('description')
@click.option('--output', '-o', default='.', help='Output directory')
def generate(description: str, output: str):
    """Generate code from description."""
    prompt = f"""Generate production-ready code for:
{description}
Include:
- TypeScript types
- Error handling
- JSDoc comments
- Unit tests
Format response as:
=== filename.ts ===
<code>
=== filename.test.ts ===
<code>
"""
    response = anthropic.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=8192,
        messages=[{"role": "user", "content": prompt}]
    )
    content = response.content[0].text
    # Parse and create files
    import re
    files = re.split(r'=== (.+?) ===', content)
    for i in range(1, len(files), 2):
        filename = files[i].strip()
        code = files[i + 1].strip()
        filepath = os.path.join(output, filename)
        os.makedirs(os.path.dirname(filepath) or '.', exist_ok=True)
        with open(filepath, 'w') as f:
            f.write(code)
        click.echo(f"Created {filepath}")
if __name__ == '__main__':
    cli()
Future Outlook
2025 Trends
- Deeper autonomy: Agents that can handle entire feature requests end-to-end
- Better context: 1M+ token context becoming standard
- Specialized models: Fine-tuned models for specific frameworks/languages
- Voice integration: Code by talking to your assistant
- Multi-agent collaboration: Multiple AI agents working together
What to Watch
- OpenAI Codex evolution: successors to GPT-5.2-Codex
- Claude Code improvements: Longer context, faster inference
- Cursor evolution: More agentic capabilities
- Open source alternatives: StarCoder 2, CodeLlama improvements
Conclusion
AI coding assistants have transformed software development. The choice between tools depends on your specific needs:
- Cursor for power users wanting the best IDE experience
- GitHub Copilot for teams deeply integrated with GitHub
- Windsurf for those who value session continuity
- Claude Code for complex codebases and terminal workflows
- Gemini CLI for budget-conscious developers and Google ecosystem
Most developers will benefit from trying multiple tools and potentially using them in combination. The productivity gains from AI coding assistants—30-50% reported by most users—make them essential for modern development.