OpenClaw Architecture Deep Dive: How a Personal AI Assistant Actually Works
A technical deep dive into OpenClaw's architecture—hub-and-spoke gateway, agent runtime loop, pluggable channel adapters, markdown-based memory, skills system, Docker-sandboxed tool execution, Canvas rendering, and multi-model failover with auth rotation.
Why Architecture Matters for Personal AI
Building an AI chatbot is straightforward. Building an always-on personal AI assistant that connects to a dozen messaging platforms, maintains persistent memory, executes tools safely, supports multiple LLM providers with failover, and scales from a Raspberry Pi to a multi-user deployment—that requires serious architecture.
OpenClaw is the fastest-growing open-source AI project in history (165,000+ GitHub stars in two months), and its architecture is a significant reason why. The system is built as a modular, event-driven platform where each subsystem—messaging, agent runtime, memory, skills, tools, and model integration—operates independently but composes into a cohesive whole.
This post examines every major subsystem in OpenClaw's TypeScript codebase. We'll trace the full lifecycle of a message from arrival on WhatsApp to agent reasoning to response delivery, and we'll explore how each component is designed for reliability, extensibility, and privacy. For the history and overview, see OpenClaw: The Open-Source AI Assistant That Stormed the Internet.
What this covers:
- Gateway WebSocket control plane and message routing
- Channel adapter pattern and the ChannelPlugin interface
- Agent runtime loop (context assembly, model invocation, tool execution, persistence)
- Session management and JSONL transcript format
- Memory architecture (working, short-term, long-term, vector search)
- Skills system and ClawHub integration
- Tool execution and Docker sandboxing
- Canvas live rendering surface
- Multi-model integration with auth rotation and failover
- Security model and threat defense layers
The Big Picture: System Architecture
Before examining individual subsystems, here's how everything fits together:
┌─────────────────────────────────────────────────────────────────────────┐
│ MESSAGING CHANNELS │
│ WhatsApp Telegram Slack Discord Signal iMessage Teams Web │
│ (Baileys) (grammY) (Bolt) (djs) (cli) (BlueBub) (API) (HTTP) │
└────────┬──────┬────────┬──────┬───────┬───────┬────────┬───────┬───────┘
│ │ │ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────────────┐
│ GATEWAY WEBSOCKET CONTROL PLANE │
│ (127.0.0.1:18789) │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌────────────┐ │
│ │ Channel │ │ Session │ │ Auth & │ │ Heartbeat │ │
│ │ Managers │ │ Router │ │ Pairing │ │ Daemon │ │
│ └──────┬──────┘ └──────┬───────┘ └──────────────┘ └─────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ AGENT RUNTIME (Pi Agent Core) │ │
│ │ │ │
│ │ ┌───────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │ │
│ │ │ Context │ │ Model │ │ Tool │ │ State │ │ │
│ │ │ Assembly │ │ Invocatn │ │ Execution│ │ Persistence │ │ │
│ │ └───────────┘ └──────────┘ └──────────┘ └──────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │ │ │ │
└─────────┼────────────────┼────────────────┼─────────────────────────────┘
│ │ │
▼ ▼ ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Workspace │ │ LLM Providers│ │ Docker │ │ Canvas │
│ │ │ │ │ Sandbox │ │ Host │
│ • SOUL.md │ │ • Cloud APIs │ │ │ │ (port 18793)│
│ • MEMORY.md │ │ • Self-hosted│ │ • Shell exec │ │ │
│ • Sessions │ │ • Proxies │ │ • Browser │ │ • A2UI JSON │
│ • Skills │ │ • Enterprise │ │ • File I/O │ │ • WebSocket │
│ • Daily logs│ │ • Local LLMs │ │ │ │ • Live UI │
│ │ │ • 12+ total │ │ │ │ │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
The architecture follows a clear data flow: messages arrive from external platforms, get normalized by channel adapters, route through the gateway to the agent runtime, and produce responses that flow back through the same path. Every component communicates through the gateway's WebSocket control plane.
The Gateway: Hub-and-Spoke Message Routing
The gateway is OpenClaw's central nervous system. Implemented in src/gateway/server.impl.ts, it's a WebSocket server bound to 127.0.0.1:18789 by default that owns all session state, manages channel connections, and coordinates the agent runtime.
WebSocket Control Plane
All communication flows through a single WebSocket endpoint using a request-response protocol with event streaming:
```jsonc
// Client sends a request
{ type: "req", id: "abc123", method: "agent", params: { sessionKey: "...", message: "..." } }

// Gateway sends a response
{ type: "res", id: "abc123", ok: true, payload: { runId: "...", acceptedAt: "..." } }

// Gateway streams events
{ type: "event", event: "agent", payload: { stream: "assistant", delta: "..." }, seq: 1 }
{ type: "event", event: "agent", payload: { stream: "tool", name: "read", ... }, seq: 2 }
{ type: "event", event: "agent", payload: { phase: "end" }, seq: 3 }
```
Clients connect with a role: operator (CLI, macOS app, web UI) or node (iOS/Android devices providing camera, location, and notification capabilities). The gateway authenticates connections via optional token-based auth (OPENCLAW_GATEWAY_TOKEN environment variable) with challenge-nonce signing for remote connections.
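To make the request/response framing concrete, here is a minimal sketch of how a client might correlate responses and dispatch events over this protocol. The class and field names are illustrative, not taken from the OpenClaw source; a real client would write frames to an actual WebSocket rather than a callback:

```typescript
// Frame shapes follow the protocol excerpt above; everything else is a sketch.
type Frame =
  | { type: "req"; id: string; method: string; params: unknown }
  | { type: "res"; id: string; ok: boolean; payload: unknown }
  | { type: "event"; event: string; payload: unknown; seq: number };

class GatewayClient {
  private pending = new Map<string, (payload: unknown) => void>();
  private nextId = 0;

  // Issue a request and return a promise that resolves when the matching
  // `res` frame (same id) arrives. `send` stands in for the socket write.
  request(method: string, params: unknown, send: (f: Frame) => void): Promise<unknown> {
    const id = `req-${this.nextId++}`;
    send({ type: "req", id, method, params });
    return new Promise((resolve) => this.pending.set(id, resolve));
  }

  // Called for every inbound frame: responses resolve their pending request,
  // streamed events are handed to a subscriber.
  handle(frame: Frame, onEvent: (e: Frame) => void): void {
    if (frame.type === "res") {
      this.pending.get(frame.id)?.(frame.payload);
      this.pending.delete(frame.id);
    } else if (frame.type === "event") {
      onEvent(frame);
    }
  }
}
```

The id-based correlation is what lets a single WebSocket multiplex many in-flight requests alongside the event stream.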
Message Routing
The gateway's routing logic handles a surprising amount of complexity:
┌────────────────────────────────────────────────────────────────┐
│ MESSAGE ROUTING │
│ │
│ Inbound Message │
│ │ │
│ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────────────┐ │
│ │ Channel │───►│ Pairing │───►│ Session Key │ │
│ │ Adapter │ │ Check │ │ Resolution │ │
│ │ │ │ │ │ │ │
│ │ Normalize│ │ Allowed? │ │ agent:<id>:<scope> │ │
│ │ message │ │ Paired? │ │ │ │
│ └──────────┘ └──────────┘ └──────────┬─────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Session Lane │ │
│ │ Queue │ │
│ │ │ │
│ │ collect/process │ │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ Agent Runtime │ │
│ └──────────────────┘ │
└────────────────────────────────────────────────────────────────┘
When a message arrives, the channel adapter normalizes it into a common internal format. The pairing system checks whether the sender is authorized—new senders must be explicitly approved through a pairing code unless the channel is configured with an open DM policy. The session router determines which session the message belongs to based on the configured scope (main, per-peer, per-channel-peer, or per-account-channel-peer). Finally, the message queues into the session lane for agent processing.
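The scope-to-key mapping can be sketched as a small pure function. The key formats follow the session-scoping table later in this post; the function itself and its parameter names are illustrative:

```typescript
// Session key resolution for the four DM scopes. Key formats mirror the
// session-scoping table in this post; the helper itself is a sketch.
type Scope = "main" | "per-peer" | "per-channel-peer" | "per-account-channel-peer";

function resolveSessionKey(
  scope: Scope,
  agentId: string,
  msg: { channel: string; account?: string; peerId: string },
): string {
  switch (scope) {
    case "main":
      return `agent:${agentId}:main`;
    case "per-peer":
      return `agent:${agentId}:dm:${msg.peerId}`;
    case "per-channel-peer":
      return `agent:${agentId}:${msg.channel}:dm:${msg.peerId}`;
    case "per-account-channel-peer":
      return `agent:${agentId}:${msg.channel}:${msg.account ?? "default"}:dm:${msg.peerId}`;
    default:
      throw new Error(`unknown scope: ${scope}`);
  }
}
```

Because the key is deterministic, two messages from the same sender always land in the same lane, which is what makes per-session ordering guarantees possible.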
Gateway Methods
The gateway exposes a comprehensive API through named methods (src/gateway/server-methods.ts):
- `health` — Gateway status, model health, channel connectivity
- `status` — Active channels, sessions, agent state
- `send` — Deliver messages to specific channels
- `agent` / `agent.wait` — Trigger agent turns (fire-and-forget or wait for completion)
- `sessions.list` / `sessions.reset` — Session lifecycle management
- `cron.list` / `cron.pause` — Scheduled job control
- `config.set` / `config.reload` — Live configuration updates without restart
- `exec.approve` — Approve sandboxed or elevated command execution
The live config reload capability is particularly valuable—you can change model providers, adjust heartbeat intervals, or modify channel settings without restarting the gateway.
Channel Adapters: The ChannelPlugin Interface
OpenClaw's multi-channel capability rests on a clean adapter pattern. Each messaging platform implements a common interface that handles the platform-specific details of connecting, receiving messages, and sending responses.
The Interface
```typescript
interface ChannelPlugin {
  // Lifecycle
  start(): Promise<void>
  stop(): Promise<void>

  // Inbound: platform → gateway
  onMessage(handler: (msg: NormalizedMessage) => void): void

  // Outbound: gateway → platform
  send(target: ChannelTarget, message: OutboundMessage): Promise<SendResult>

  // Capabilities
  capabilities(): ChannelCapabilities
}
```
The NormalizedMessage type abstracts away platform differences—whether a message came from WhatsApp, Telegram, or Discord, the agent runtime sees the same structure: sender identity, message content, attachments, thread context, and channel metadata.
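As a rough illustration, a NormalizedMessage might look like the following. The field names here are a guess based on the properties the post lists (sender identity, content, attachments, thread context, channel metadata), not OpenClaw's actual type:

```typescript
// Hypothetical shape of the normalized inbound message. Field names are
// illustrative; only the categories of information come from the post.
interface NormalizedMessage {
  channel: string;                 // "whatsapp", "telegram", "discord", ...
  account?: string;                // set on multi-account channels
  peerId: string;                  // normalized sender identity
  text: string;                    // message content
  attachments: { kind: "image" | "audio" | "file"; path: string }[];
  threadId?: string;               // reply/thread context where supported
  raw?: unknown;                   // original platform payload, if needed
}

// Example: a WhatsApp inbound message reduced to the common shape.
const msg: NormalizedMessage = {
  channel: "whatsapp",
  peerId: "+1234567890",
  text: "What's on my calendar today?",
  attachments: [],
};
```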
Channel Adapter Capabilities
Each adapter exposes different capabilities based on what the underlying platform supports:
| Channel | Transport | Threading | Reactions | Attachments | Voice | Groups | Multi-Account |
|---|---|---|---|---|---|---|---|
| WhatsApp | Baileys (unofficial) | Reply quotes | Yes | Images, docs, audio | Yes | Yes | Single |
| Telegram | grammY framework | Forum topics | Yes | All types | Yes | Yes, forums | Multi-bot |
| Slack | Bolt framework | Native threads | Yes | Files, snippets | No | Channels, DMs | Multi-workspace |
| Discord | discord.js | Threads | Yes | Files, embeds | No | Guilds, DMs | Multi-guild |
| Signal | signal-cli subprocess | Reply quotes | Yes | Images, files | No | Yes | Single |
| iMessage | BlueBubbles HTTP | Reply quotes | Tapbacks | Images, files | No | Yes | Single |
| Teams | Direct API | Thread replies | Yes | Files | No | Channels, chats | Multi-tenant |
| WebChat | Built-in HTTP/WS | No | No | File upload | No | No | N/A |
Extension channels add Matrix (with full federation), Zalo, IRC (including Twitch), Feishu/Lark, Mattermost, and Nextcloud Talk. The plugin architecture means adding a new channel requires implementing the ChannelPlugin interface—the gateway handles everything else.
Channel Configuration
Each channel is configured independently with fine-grained control:
```json
{
  "channels": {
    "whatsapp": {
      "enabled": true,
      "allowFrom": ["+1234567890"],
      "dm": { "policy": "pairing" },
      "queueMode": "collect",
      "groups": { "allowFrom": ["*"] }
    },
    "slack": {
      "enabled": true,
      "accounts": {
        "workspace-id": {
          "token": "xoxb-...",
          "allowFrom": ["U01234567"]
        }
      }
    }
  }
}
```
The queueMode: "collect" setting is worth noting—it batches rapid-fire messages (common on WhatsApp) into a single agent turn rather than triggering separate runs for each message. This prevents the agent from responding to a half-typed thought.
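A collect lane can be approximated with a quiet-window buffer: each new message resets a timer, and the batch flushes as one agent turn only after the sender goes quiet. The class name and window length below are illustrative, not OpenClaw's actual implementation:

```typescript
// Sketch of a "collect" queue: buffer rapid-fire messages and flush them as
// a single batch once no new message arrives within the quiet window.
class CollectLane {
  private buffer: string[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private flush: (batch: string[]) => void,  // one agent turn per batch
    private quietMs = 2000,                    // illustrative window length
  ) {}

  push(message: string): void {
    this.buffer.push(message);
    if (this.timer) clearTimeout(this.timer);  // a new message resets the window
    this.timer = setTimeout(() => {
      const batch = this.buffer;
      this.buffer = [];
      this.timer = null;
      this.flush(batch);
    }, this.quietMs);
  }
}
```

The effect is that three quick WhatsApp messages ("wait", "actually", "do X instead") produce one coherent agent turn instead of three conflicting ones.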
The Agent Runtime: Pi Agent Core
The agent runtime is where messages become intelligent responses. OpenClaw uses an embedded instance of Pi Agent Core (@mariozechner/pi-coding-agent) as its AI execution engine, wrapped with OpenClaw-specific tooling, context management, and session handling.
The Agent Loop
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT RUNTIME LOOP │
│ │
│ ┌──────────────────┐ │
│ │ 1. SESSION │ Resolve session key → load JSONL transcript │
│ │ RESOLUTION │ Acquire write lock │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ 2. CONTEXT │ Load bootstrap files (SOUL.md, IDENTITY.md, │
│ │ ASSEMBLY │ USER.md, MEMORY.md, HEARTBEAT.md) │
│ │ │ Inject skills, tool definitions │
│ │ │ Compute token budget, apply compaction │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ 3. MODEL │ Send assembled context to LLM provider │
│ │ INVOCATION │ Stream response deltas │
│ │ │ Parse tool calls from response │
│ └────────┬─────────┘ │
│ │ │
│ ├───────────── No tool calls? ──► Done (deliver response) │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ 4. TOOL │ Execute requested tools (read, write, exec, │
│ │ EXECUTION │ browser, canvas, channel actions) │
│ │ │ Apply tool policy (allowlist/denylist) │
│ │ │ Sandbox if configured │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ 5. STATE │ Persist tool results to JSONL │
│ │ PERSISTENCE │ Update session metadata │
│ │ │ Check compaction threshold │
│ └────────┬─────────┘ │
│ │ │
│ └───────────── Loop back to step 3 │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Context Assembly
Context assembly is the most nuanced phase of the agent loop. The system must construct a prompt that fits within the model's context window while including all relevant information.
The assembly process loads in priority order:
- System prompt with tool definitions and behavioral rules
- Bootstrap files — SOUL.md, IDENTITY.md, USER.md, TOOLS.md (trimmed to 20k characters each)
- Memory — MEMORY.md (private sessions only) and today's/yesterday's daily logs
- Skills — Active skills loaded from workspace, shared, and bundled directories
- HEARTBEAT.md — If this is a heartbeat-triggered turn
- Session transcript — The JSONL conversation history
Token budgets are computed dynamically. The system reserves 20,000 tokens by default for the model's response and tool results. If the session transcript exceeds available space, auto-compaction triggers: a silent agent turn writes durable notes to memory files, then older conversation turns are pruned. This ensures the agent never loses critical information—it just moves from working memory (context) to long-term memory (files).
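The budget arithmetic can be sketched in a few lines. The 20,000-token reserve comes from the post; the four-characters-per-token estimate is a common heuristic standing in for the real tokenizer, and the function names are ours:

```typescript
// Back-of-envelope compaction check. RESERVED_TOKENS is from the post;
// estimateTokens is a stand-in heuristic, not OpenClaw's actual tokenizer.
const RESERVED_TOKENS = 20_000;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // ~4 chars per token, rough heuristic
}

function needsCompaction(
  contextWindow: number,
  bootstrapFiles: string[],
  transcript: string[],
): boolean {
  const used =
    bootstrapFiles.reduce((n, f) => n + estimateTokens(f), 0) +
    transcript.reduce((n, t) => n + estimateTokens(t), 0);
  // Compaction triggers once the assembled context would eat into the
  // reserve held back for the model's response and tool results.
  return used > contextWindow - RESERVED_TOKENS;
}
```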
Model Invocation
The runtime calls the configured LLM provider through Pi Agent Core's abstraction layer. Response streaming is handled through events:
The gateway receives streaming events:

```jsonc
{ stream: "assistant", delta: "Let me check..." }   // ← text chunks
{ stream: "tool", name: "read", input: {...} }      // ← tool invocation
{ stream: "tool", result: "file contents..." }      // ← tool result
{ phase: "end" }                                    // ← turn complete
```
The streaming architecture means users see responses appearing in real-time on their messaging platform, rather than waiting for the full response to generate.
Tool Execution
When the model requests tool use, the runtime applies the tool policy (checking allowlists, denylists, and group-specific restrictions), determines whether sandboxing applies, executes the tool, and feeds the result back into the conversation for the next model turn.
State Persistence
Every turn—user messages, assistant responses, tool calls, tool results—is immediately appended to the session's JSONL file. This append-only design provides durability (a crash mid-response only loses the current incomplete turn) and streamability (new clients can replay the transcript to catch up on conversation state).
Session and Memory Management
JSONL Session Transcripts
Sessions are stored as line-delimited JSON files where each line represents a single message or event:
{"role":"user","content":"What's on my calendar today?","ts":"2026-02-16T09:00:00Z","channel":"whatsapp","peer":"alice"}
{"role":"assistant","content":"Let me check your calendar.","ts":"2026-02-16T09:00:01Z"}
{"role":"assistant","tool_calls":[{"name":"exec","input":{"command":"gcalcli today"}}],"ts":"2026-02-16T09:00:01Z"}
{"role":"tool","name":"exec","content":"09:30 Team standup\n14:00 Design review\n16:00 1:1 with Jordan","ts":"2026-02-16T09:00:03Z"}
{"role":"assistant","content":"You have three events today:\n- 09:30 Team standup\n- 14:00 Design review\n- 16:00 1:1 with Jordan","ts":"2026-02-16T09:00:04Z"}
The JSONL format is chosen deliberately: it's append-friendly (no need to parse the entire file to add an entry), crash-safe (partial writes only affect the last line), and human-readable (you can inspect sessions with standard text tools). Session metadata—token usage, update timestamps, channel origin—is tracked separately in sessions.json.
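The append-and-replay property is easy to demonstrate. This sketch works on strings instead of files to stay self-contained; the entry shape follows the transcript excerpt above, and the crash-safety claim falls out of how a torn final line is handled:

```typescript
// Minimal JSONL append/replay sketch. Real sessions live on disk; strings
// keep this self-contained.
interface TranscriptEntry {
  role: "user" | "assistant" | "tool";
  content?: string;
  ts: string;
  [extra: string]: unknown; // tool_calls, channel, peer, etc.
}

function appendEntry(transcript: string, entry: TranscriptEntry): string {
  return transcript + JSON.stringify(entry) + "\n"; // append-only, one line per event
}

function replay(transcript: string): TranscriptEntry[] {
  const entries: TranscriptEntry[] = [];
  for (const line of transcript.split("\n")) {
    if (!line.trim()) continue;
    try {
      entries.push(JSON.parse(line) as TranscriptEntry);
    } catch {
      break; // a partial (crash-torn) final line is simply dropped
    }
  }
  return entries;
}
```

A crash mid-write leaves at most one unparseable trailing line, so replay recovers every completed turn, which is exactly the durability guarantee the design aims for.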
Session Scoping
Session keys determine conversation isolation:
| Scope | Key Format | Use Case |
|---|---|---|
| `main` | `agent:<id>:main` | Single shared session across all platforms |
| `per-peer` | `agent:<id>:dm:<peerId>` | Separate conversation per person |
| `per-channel-peer` | `agent:<id>:<channel>:dm:<peerId>` | Separate per person per platform |
| `per-account-channel-peer` | `agent:<id>:<channel>:<account>:dm:<peerId>` | Full isolation including multi-account |
For groups and channels, keys follow a similar pattern: agent:<id>:<channel>:group:<groupId>. Cron jobs and webhooks get their own session namespaces (cron:<jobId>, hook:<uuid>).
Identity links bridge sessions across platforms. If Alice messages from both Telegram and Discord, you can configure an identity link that maps both sender IDs to the same session—so context flows between platforms naturally.
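One plausible shape for an identity link is a map from a canonical peer to the channel-scoped handles it owns; resolution then collapses linked handles before session keys are built. The names and data layout here are ours, not OpenClaw's:

```typescript
// Hypothetical identity-link resolution: map (channel, senderId) pairs to a
// canonical peer so sessions merge across platforms. Layout is illustrative.
function canonicalPeer(
  links: Record<string, string[]>, // canonical id -> ["telegram:123", "discord:456"]
  channel: string,
  senderId: string,
): string {
  const handle = `${channel}:${senderId}`;
  for (const [canonical, handles] of Object.entries(links)) {
    if (handles.includes(handle)) return canonical;
  }
  return handle; // unlinked senders stay channel-scoped
}
```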
Memory Architecture
┌─────────────────────────────────────────────────────────────────────────┐
│ MEMORY ARCHITECTURE │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ WORKING MEMORY (Context Window) │ │
│ │ │ │
│ │ Current session transcript, bootstrap files, active skills │ │
│ │ Token-limited, auto-compacted when full │ │
│ │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │ │
│ │ Compaction triggers memory flush (silent agentic turn) │ │
│ └──────────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ SHORT-TERM MEMORY (Daily Logs) │ │
│ │ │ │
│ │ memory/2026-02-16.md ← today (loaded at session start) │ │
│ │ memory/2026-02-15.md ← yesterday (loaded at session start) │ │
│ │ memory/2026-02-14.md ← older (searchable via vector index) │ │
│ │ │ │
│ │ Append-only daily files, auto-created by agent │ │
│ └──────────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ LONG-TERM MEMORY │ │
│ │ │ │
│ │ MEMORY.md ← curated, persistent, loaded in private sessions │ │
│ │ │ │
│ │ Vector index (SQLite + sqlite-vec) │ │
│ │ ├─ Indexes MEMORY.md + all daily logs │ │
│ │ ├─ Embeddings via cloud APIs or local models │ │
│ │ ├─ Semantic search across full memory corpus │ │
│ │ └─ File watcher auto-reindexes on changes │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
The three-tier design balances immediacy with durability. Working memory (the context window) gives the agent instant access to the current conversation. Short-term memory (daily logs) provides recent context without consuming the full context window. Long-term memory (MEMORY.md and vector search) enables recall across the agent's entire history.
The automatic memory flush before compaction is critical: when the context window fills up, the agent first writes important information to durable files before older conversation turns are pruned. This prevents information loss during routine context management.
An experimental QMD backend provides local-first search combining BM25 full-text search, vector similarity, and reranking—all without requiring external embedding APIs.
Skills and Extensions
Skill File Format
Skills are Markdown documents with YAML frontmatter. Here's a representative example:
```yaml
---
name: image-gen
description: Generate or edit images using a multimodal model
metadata:
  openclaw:
    requires:
      env:
        - IMAGE_API_KEY
      bins:
        - uv
    primaryEnv: IMAGE_API_KEY
    user-invocable: true
---
```
## Image Generation
When the user asks you to generate, edit, or modify images, use the
image generation model via the `uv` tool runner.
### Generating a new image
Run the following command, replacing the prompt with the user's request:
```bash
uv run --with image-sdk image-gen.py --prompt "the user's description"
```
### Editing an existing image
If the user provides an image and asks for modifications, include the source
image path:
```bash
uv run --with image-sdk image-gen.py --edit --source /path/to/image.png --prompt "modifications"
```
### Important notes
- Always confirm the image was generated successfully before responding
- If generation fails, check that IMAGE_API_KEY is set correctly
- Large images may take 10-30 seconds to generate
The frontmatter's requires block defines gating rules: the skill only loads if IMAGE_API_KEY is set in the environment and the uv binary is on PATH. If either requirement fails, the skill silently doesn't load—no errors, no broken functionality.
Skill Loading and Precedence
Skills load from three directories in priority order:
- Workspace skills (`~/.openclaw/agents/<agentId>/workspace/skills/`) — Highest priority, per-agent overrides
- Managed skills (`~/.openclaw/skills/`) — Shared across agents, installed via ClawHub
- Bundled skills (shipped with OpenClaw package) — Lowest priority, 54 directories covering 1Password, Bear, Discord, GitHub, and more
When multiple skills share a name, the highest-priority version wins. This layering lets you customize bundled skills without modifying the OpenClaw installation.
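The layering is a last-writer-wins merge from lowest to highest priority. This sketch models each directory as a name-to-path map; the merge logic is illustrative, not lifted from the loader:

```typescript
// Three-layer skill precedence: workspace > managed > bundled.
type SkillDir = Map<string, string>; // skill name -> file path

function resolveSkills(workspace: SkillDir, managed: SkillDir, bundled: SkillDir): SkillDir {
  const resolved = new Map(bundled);                              // lowest priority first
  for (const [name, path] of managed) resolved.set(name, path);   // overrides bundled
  for (const [name, path] of workspace) resolved.set(name, path); // highest priority wins
  return resolved;
}
```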
The Extension System
Extensions live in the extensions/ directory and provide additional channel adapters and capabilities. With 37 extension directories, the community has built adapters for Matrix, Zalo, IRC, Feishu, Mattermost, Nextcloud Talk, and more.
Extensions are loaded as plugins at gateway startup. The plugin architecture (src/plugins/) provides hooks for extending the gateway, adding tools, registering channels, and modifying agent behavior.
ClawHub
ClawHub (clawhub.ai) is the central registry for skills and extensions. Installation is a single command:
```bash
clawhub install <skill-slug>
clawhub update --all
clawhub sync --all
```
The registry handles versioning, dependency resolution, and updates. It creates a network effect similar to npm or VS Code's extension marketplace—but for AI agent capabilities.
Tool Execution and Sandboxing
Built-in Tools
OpenClaw's tool system gives the agent the ability to interact with the world. The core tools live in src/agents/pi-tools.ts and agents/bash-tools.ts:
File operations:
- `read` — Read file contents (text or binary)
- `write` — Create or overwrite files
- `edit` — Apply semantic diffs via `apply_patch`
Shell execution:
- `exec` — Run commands with PTY support, optional approval gates
- `process` — Background process management
- `cd` — Change working directory (per-session state)
Browser automation:
- `browser.action` — Click, type, navigate, wait
- `browser.snapshot` — Capture screenshots (integrates with vision models)
Canvas:
- `canvas.push` — Send A2UI content to the live canvas
- `canvas.reset` — Clear the canvas
- `canvas.eval` — Execute JavaScript on the canvas
Channel actions:
- `send` — Route messages to specific channels
- Platform-specific tools for Discord guild actions, WhatsApp contacts, Slack reactions
Node tools (for connected mobile devices):
- `camera.snap` — Take photos
- `screen.record` — Record the screen
- `location.get` — Get device GPS location
- `notify` — Send push notifications
Docker Sandboxing
For security-sensitive deployments, OpenClaw supports Docker-based sandboxing for tool execution:
┌─────────────────────────────────────────────────────────────────────────┐
│ EXECUTION SANDBOXING │
│ │
│ ┌─────────────────────────────────┐ ┌─────────────────────────────┐ │
│ │ TRUSTED ZONE │ │ SANDBOXED ZONE │ │
│ │ (Host) │ │ (Docker Container) │ │
│ │ │ │ │ │
│ │ • Gateway process │ │ • Minimal Debian base │ │
│ │ • Agent runtime │ │ • Non-root user │ │
│ │ • Session state │ │ • Tool execution │ │
│ │ • Memory files │ │ • Browser (optional) │ │
│ │ • Configuration │ │ • Ephemeral filesystem │ │
│ │ │ │ │ │
│ │ Controls what enters │ │ Workspace access: │ │
│ │ the sandbox via: │ │ • none (sandbox-rooted) │ │
│ │ • Tool policy │ │ • ro (read-only) │ │
│ │ • Approval gates │ │ • rw (read-write) │ │
│ │ • Bind mounts │ │ │ │
│ └─────────────────────────────────┘ └─────────────────────────────┘ │
│ │
│ Sandbox configuration: │
│ • mode: off | non-main | all │
│ • scope: session | agent | shared │
│ • workspaceAccess: none | ro | rw │
│ • docker.binds: ["/src:/src:ro"] │
│ │
└─────────────────────────────────────────────────────────────────────────┘
The sandbox uses a dedicated Dockerfile.sandbox that builds a minimal Debian image with a non-root sandbox user. The mode setting controls when sandboxing applies: non-main (default) sandboxes non-main sessions while allowing the primary session to execute on the host, all sandboxes everything, and off disables sandboxing entirely.
The scope setting controls container lifecycle: session creates a fresh container per session (most isolated), agent shares one container across all sessions for an agent, and shared uses a single container for everything (least isolated but most efficient).
Tool policy (agents/pi-tools.policy.ts) provides an additional layer of control:
```json
{
  "agents": {
    "list": [{
      "id": "main",
      "tools": {
        "allowlist": ["read", "write", "edit", "exec", "browser.*"],
        "groupPolicy": {
          "allowlist": ["read", "send"]
        }
      }
    }]
  }
}
```
This configuration lets the agent read, write, edit, execute commands, and use the browser in DM conversations, but restricts it to read-only and message-sending in group chats—a sensible default that prevents the agent from running commands on behalf of group members.
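A policy check against such an allowlist has to handle both exact names and the `browser.*` wildcard form seen in the config. The matching rules below are inferred from that syntax rather than confirmed against the OpenClaw source:

```typescript
// Illustrative allowlist check, including the "browser.*" wildcard form.
// Matching semantics are inferred from the config syntax, not confirmed.
function toolAllowed(allowlist: string[], tool: string): boolean {
  return allowlist.some((pattern) =>
    pattern.endsWith(".*")
      ? tool.startsWith(pattern.slice(0, -1)) // "browser.*" matches "browser.action"
      : pattern === tool,                     // exact name match otherwise
  );
}
```

In a group chat, the same check would simply run against the narrower `groupPolicy` allowlist instead.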
Canvas: Live Agent-Controlled Rendering
Canvas provides the agent with a visual output surface beyond text. The Canvas host (src/canvas-host/server.ts) runs as an HTTP server on port 18793, serving a static HTML page that maintains a WebSocket connection for real-time updates.
The agent controls Canvas through four tools:
- `canvas.push` — Sends an A2UI (Agent-to-UI) JSON payload that renders as interactive UI. A2UI supports forms, buttons, text, images, charts, and custom layouts.
- `canvas.reset` — Clears the canvas to a blank state.
- `canvas.eval` — Executes arbitrary JavaScript on the canvas page, enabling dynamic behavior beyond what A2UI templates support.
- `canvas.snapshot` — Captures the current canvas state as an image, useful for the agent to verify its own rendering or for sharing with users.
Canvas clients include the macOS app (sidebar or fullscreen view), iOS and Android native apps, and any browser pointed at the canvas host URL. Updates stream in real-time over WebSocket, so changes appear instantly.
Use cases range from practical (live dashboards, data visualization, interactive forms for structured input) to creative (game boards, drawing surfaces, collaborative whiteboards). Canvas transforms OpenClaw from a text-only assistant into something that can present rich, interactive interfaces.
Model Integration and Failover
Multi-Provider Architecture
OpenClaw supports over a dozen LLM providers through a unified abstraction layer. Model selection is configurable per-agent, and the system supports failover chains for reliability.
| Provider Type | Auth | Streaming | Key Feature |
|---|---|---|---|
| Cloud LLM APIs (multiple major providers) | API key, OAuth | Yes | Primary usage, thinking/reasoning support |
| Enterprise cloud (managed model hosting) | IAM credentials | Yes | Enterprise integration, compliance |
| Model routers (multi-provider proxies) | API key | Yes | Access to 50+ models through one key |
| Open-source model hosts | API key | Yes | Cost-effective open models |
| Local model servers (Ollama, etc.) | HTTP endpoint | Yes | Fully offline operation, maximum privacy |
| Universal proxy layers | API key | Yes | Route to 50+ providers through one interface |
| Edge deployment providers | Platform token | Yes | Low-latency edge inference |
| Web-grounded providers | API key | Yes | Responses grounded in live web data |
Auth Profile Rotation
OpenClaw supports multiple API keys per provider with intelligent rotation (src/agents/auth-profiles.ts). Keys are load-balanced round-robin during normal operation. When a key hits a rate limit or returns an error, it enters a cooldown period with exponential backoff. The system automatically rotates to the next available key.
For providers that support OAuth, token refresh is handled automatically—tokens are refreshed before expiry, with graceful fallback to other keys during the refresh window.
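Round-robin selection with per-key cooldown can be sketched as follows. The backoff constants and type names are illustrative; only the behavior (rotate on success, exponential cooldown on failure) comes from the description above:

```typescript
// Sketch of auth profile rotation: round-robin across available keys,
// exponential-backoff cooldown on failure. Constants are illustrative.
interface AuthProfile {
  key: string;
  cooldownUntil: number; // epoch ms; 0 means available
  failures: number;
}

class KeyRotator {
  private index = 0;
  constructor(private profiles: AuthProfile[]) {}

  // Return the next key not in cooldown, advancing the round-robin pointer.
  next(now: number): AuthProfile | null {
    for (let i = 0; i < this.profiles.length; i++) {
      const p = this.profiles[(this.index + i) % this.profiles.length];
      if (p.cooldownUntil <= now) {
        this.index = (this.index + i + 1) % this.profiles.length;
        return p;
      }
    }
    return null; // every key is cooling down
  }

  // Rate limit or error: back off exponentially (capped) before reuse.
  reportFailure(p: AuthProfile, now: number): void {
    p.failures++;
    p.cooldownUntil = now + 1000 * 2 ** Math.min(p.failures, 6);
  }
}
```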
Failover Chains
Model failover ensures the agent stays responsive even when a primary provider has issues:
```json
{
  "agents": {
    "list": [{
      "id": "main",
      "model": {
        "primary": "provider-a/model-large",
        "fallbacks": ["provider-a/model-medium", "provider-b/model-large"]
      }
    }]
  }
}
```
If the primary model returns an error (rate limit, server error, timeout), the runtime automatically retries with the next model in the fallback chain. This happens transparently—the user sees a response, potentially from a different model, without knowing about the failover.
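Walking the chain reduces to "first success wins, errors advance to the next model". In this sketch, `callModel` stands in for the provider abstraction layer and the model ids echo the config above; real code would also distinguish retryable errors (rate limits, timeouts) from fatal ones:

```typescript
// Minimal failover walk: try each model in order, return the first success,
// rethrow the last error if the chain is exhausted. `callModel` is a stand-in.
async function invokeWithFailover(
  chain: string[],
  callModel: (model: string) => Promise<string>,
): Promise<{ model: string; response: string }> {
  let lastError: unknown;
  for (const model of chain) {
    try {
      return { model, response: await callModel(model) }; // first success wins
    } catch (err) {
      lastError = err; // rate limit / server error / timeout: try the next model
    }
  }
  throw lastError; // chain exhausted
}
```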
Provider-specific streaming parameters are handled through extra-params.ts, which maps the unified configuration to each provider's API format—token budgets, streaming completion tokens, and content deltas are all translated automatically.
Security Model
OpenClaw's security is designed in layers, each addressing a different threat vector.
Defense Layers
Layer 1: DM Pairing (Channel-level)
New senders must be explicitly approved before they can interact with the agent. In pairing mode (the default), unknown senders receive a pairing code that the owner must approve through the CLI or native app. This prevents random contacts from consuming API credits or accessing the agent's capabilities.
Layer 2: Channel Allowlists (Session-level)
Per-channel allowFrom lists restrict which users or groups can reach the agent. Combined with per-agent routing, this enables multi-tenant deployments where different agents serve different audiences.
Layer 3: Tool Policy (Agent-level)
Allowlists and denylists control which tools each agent can use, with separate policies for DM versus group contexts. A conservative default: full tool access in DMs, read-only in groups.
Layer 4: Sandboxing (Execution-level)
Docker containers isolate tool execution from the host system. Configurable workspace access (none, read-only, read-write) limits what sandboxed tools can see.
Layer 5: Approval Gates (User interaction)
Elevated tool executions (like running shell commands) can require explicit user approval before proceeding. The approval manager tracks pending requests and supports approval from any connected client.
Security Comparison
| Aspect | OpenClaw (Self-Hosted) | Cloud AI Assistants |
|---|---|---|
| Data location | Your hardware, your network | Provider's servers |
| Message transit | Direct to messaging APIs | Through provider's infrastructure |
| Memory storage | Local Markdown files | Cloud databases (opaque) |
| Execution control | Full (sandbox, policy, approval) | Provider-determined |
| Audit trail | Full JSONL transcripts on disk | Provider logs (limited access) |
| Model provider | Your choice, changeable anytime | Locked to provider |
| Access control | Pairing + allowlists + tool policy | Account-based |
| Code inspection | Full source available | Closed source |
The threat model documentation (docs/security/THREAT-MODEL-ATLAS.md) covers additional attack vectors: inbound DM injection (where a malicious sender tries to manipulate the agent through crafted messages), channel-based privilege escalation (exploiting differences between DM and group policies), and tool misuse (the agent being tricked into harmful actions). The layered defense approach means compromising any single layer doesn't grant full access.
Sources
Project and Documentation
- OpenClaw — Official website
- OpenClaw GitHub Repository — Full source code and documentation
- OpenClaw Documentation — Getting started, architecture, and configuration guides
Dependencies
- Baileys — WhatsApp Web API library
- grammY — Telegram Bot framework
- BlueBubbles — iMessage bridge
- sqlite-vec — SQLite vector search extension used for memory indexing
Ecosystem
- ClawHub — Skills and extensions marketplace
Related Articles
Building Agentic AI Systems: A Complete Implementation Guide
Hands-on guide to building AI agents—tool use, ReAct pattern, planning, memory, context management, MCP integration, and multi-agent orchestration. With full prompt examples and production patterns.
OpenClaw: The Open-Source AI Assistant That Stormed the Internet
The story behind OpenClaw—the self-hosted AI assistant that went from zero to 165,000 GitHub stars in two months. What it is, why it went viral, and what concepts like SOUL.md, Heartbeat, and multi-channel architecture mean for the future of personal AI.
Cline: Deep Dive into the Open-Source AI Coding Agent
In-depth technical analysis of Cline—the open-source AI coding agent for VS Code. Understanding its agentic loop architecture, Plan/Act modes, 40+ LLM providers, Model Context Protocol integration, and how it orchestrates autonomous coding tasks with human oversight.
Human-in-the-Loop UX: Designing Control Surfaces for AI Agents
Design patterns for human oversight of AI agents—pause mechanisms, approval workflows, progressive autonomy, and the UX of agency. How to build systems where humans stay in control.
LLM Memory Systems: From MemGPT to Long-Term Agent Memory
Understanding memory architectures for LLM agents—MemGPT's hierarchical memory, Letta's agent framework, and patterns for building agents that learn and remember across conversations.
Conversation State Management for LLM Applications
Field guide to managing conversation state in LLM applications. Covers memory architectures, context window management, summarization strategies, long-term memory systems, and 2025 approaches including Mem0 and hierarchical memory.
Building MCP Servers: Custom Tool Integrations for AI Agents
Field guide to building Model Context Protocol (MCP) servers—from basic tool exposure to production-grade integrations with authentication, streaming, and error handling.
LLM Application Security: Practical Defense Patterns for Production
End-to-end guide to securing LLM applications in production. Covers the OWASP Top 10 for LLMs 2025, prompt injection defense strategies, PII protection with Microsoft Presidio, guardrails with NeMo and Lakera, output validation, and defense-in-depth architecture.