
OpenClaw Architecture Deep Dive: How a Personal AI Assistant Actually Works

A technical deep dive into OpenClaw's architecture—hub-and-spoke gateway, agent runtime loop, pluggable channel adapters, markdown-based memory, skills system, Docker-sandboxed tool execution, Canvas rendering, and multi-model failover with auth rotation.


Why Architecture Matters for Personal AI

Building an AI chatbot is straightforward. Building an always-on personal AI assistant that connects to a dozen messaging platforms, maintains persistent memory, executes tools safely, supports multiple LLM providers with failover, and scales from a Raspberry Pi to a multi-user deployment—that requires serious architecture.

OpenClaw is the fastest-growing open-source AI project in history (165,000+ GitHub stars in two months), and its architecture is a significant reason why. The system is built as a modular, event-driven platform where each subsystem—messaging, agent runtime, memory, skills, tools, and model integration—operates independently but composes into a cohesive whole.

This post examines every major subsystem in OpenClaw's TypeScript codebase. We'll trace the full lifecycle of a message from arrival on WhatsApp to agent reasoning to response delivery, and we'll explore how each component is designed for reliability, extensibility, and privacy. For the history and overview, see OpenClaw: The Open-Source AI Assistant That Stormed the Internet.

What this covers:

  • Gateway WebSocket control plane and message routing
  • Channel adapter pattern and the ChannelPlugin interface
  • Agent runtime loop (context assembly, model invocation, tool execution, persistence)
  • Session management and JSONL transcript format
  • Memory architecture (working, short-term, long-term, vector search)
  • Skills system and ClawHub integration
  • Tool execution and Docker sandboxing
  • Canvas live rendering surface
  • Multi-model integration with auth rotation and failover
  • Security model and threat defense layers

The Big Picture: System Architecture

Before examining individual subsystems, here's how everything fits together:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         MESSAGING CHANNELS                              │
│  WhatsApp  Telegram  Slack  Discord  Signal  iMessage  Teams  Web      │
│  (Baileys) (grammY)  (Bolt) (djs)   (cli)  (BlueBub) (API)  (HTTP)   │
└────────┬──────┬────────┬──────┬───────┬───────┬────────┬───────┬───────┘
         │      │        │      │       │       │        │       │
         ▼      ▼        ▼      ▼       ▼       ▼        ▼       ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    GATEWAY WEBSOCKET CONTROL PLANE                       │
│                         (127.0.0.1:18789)                               │
│                                                                         │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐  │
│  │  Channel    │  │   Session    │  │    Auth &    │  │  Heartbeat │  │
│  │  Managers   │  │   Router     │  │   Pairing    │  │   Daemon   │  │
│  └──────┬──────┘  └──────┬───────┘  └──────────────┘  └─────┬──────┘  │
│         │                │                                    │         │
│         ▼                ▼                                    ▼         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                     AGENT RUNTIME (Pi Agent Core)               │   │
│  │                                                                 │   │
│  │  ┌───────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐  │   │
│  │  │  Context  │  │  Model   │  │   Tool   │  │    State     │  │   │
│  │  │  Assembly │  │ Invocatn │  │ Execution│  │ Persistence  │  │   │
│  │  └───────────┘  └──────────┘  └──────────┘  └──────────────┘  │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│         │                │                │                             │
└─────────┼────────────────┼────────────────┼─────────────────────────────┘
          │                │                │
          ▼                ▼                ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Workspace  │  │ LLM Providers│  │    Docker    │  │    Canvas    │
│              │  │              │  │   Sandbox    │  │    Host      │
│ • SOUL.md   │  │ • Cloud APIs │  │              │  │  (port 18793)│
│ • MEMORY.md │  │ • Self-hosted│  │ • Shell exec │  │              │
│ • Sessions  │  │ • Proxies    │  │ • Browser    │  │ • A2UI JSON  │
│ • Skills    │  │ • Enterprise │  │ • File I/O   │  │ • WebSocket  │
│ • Daily logs│  │ • Local LLMs │  │              │  │ • Live UI    │
│             │  │ • 12+ total  │  │              │  │              │
└──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘

The architecture follows a clear data flow: messages arrive from external platforms, get normalized by channel adapters, route through the gateway to the agent runtime, and produce responses that flow back through the same path. Every component communicates through the gateway's WebSocket control plane.

The Gateway: Hub-and-Spoke Message Routing

The gateway is OpenClaw's central nervous system. Implemented in src/gateway/server.impl.ts, it's a WebSocket server bound to 127.0.0.1:18789 by default that owns all session state, manages channel connections, and coordinates the agent runtime.

WebSocket Control Plane

All communication flows through a single WebSocket endpoint using a request-response protocol with event streaming:

TypeScript
// Client sends a request
{ type: "req", id: "abc123", method: "agent", params: { sessionKey: "...", message: "..." } }

// Gateway sends a response
{ type: "res", id: "abc123", ok: true, payload: { runId: "...", acceptedAt: "..." } }

// Gateway streams events
{ type: "event", event: "agent", payload: { stream: "assistant", delta: "..." }, seq: 1 }
{ type: "event", event: "agent", payload: { stream: "tool", name: "read", ... }, seq: 2 }
{ type: "event", event: "agent", payload: { phase: "end" }, seq: 3 }

Clients connect with a role: operator (CLI, macOS app, web UI) or node (iOS/Android devices providing camera, location, and notification capabilities). The gateway authenticates connections via optional token-based auth (OPENCLAW_GATEWAY_TOKEN environment variable) with challenge-nonce signing for remote connections.
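
A client correlates `res` frames to requests by `id`. A minimal sketch of that bookkeeping, with illustrative types and names rather than the actual OpenClaw client API:

```typescript
// Illustrative sketch of client-side request/response correlation over
// the gateway WebSocket protocol. Frame shapes follow the examples
// above; class and method names are assumptions.

type ReqFrame = { type: "req"; id: string; method: string; params?: unknown };
type ResFrame = { type: "res"; id: string; ok: boolean; payload?: unknown };
type EventFrame = { type: "event"; event: string; payload: unknown; seq: number };
type Frame = ResFrame | EventFrame;

class GatewayClient {
  private nextId = 0;
  private pending = new Map<string, (res: ResFrame) => void>();

  constructor(private sendRaw: (frame: ReqFrame) => void) {}

  // Send a request; the promise settles when the matching "res" arrives.
  request(method: string, params?: unknown): Promise<ResFrame> {
    const id = String(++this.nextId);
    return new Promise((resolve) => {
      this.pending.set(id, resolve);
      this.sendRaw({ type: "req", id, method, params });
    });
  }

  // Feed every inbound frame here; "res" frames settle pending requests,
  // "event" frames would be dispatched to stream handlers.
  handleFrame(frame: Frame): void {
    if (frame.type === "res") {
      const resolve = this.pending.get(frame.id);
      if (resolve) {
        this.pending.delete(frame.id);
        resolve(frame);
      }
    }
  }
}
```

Because events carry a `seq` number, a real client can also detect gaps in the stream and request a replay.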

Message Routing

The gateway's routing logic handles a surprising amount of complexity:

Code
┌────────────────────────────────────────────────────────────────┐
│                    MESSAGE ROUTING                              │
│                                                                │
│  Inbound Message                                               │
│       │                                                        │
│       ▼                                                        │
│  ┌──────────┐    ┌──────────┐    ┌──────────────────────┐     │
│  │ Channel  │───►│  Pairing │───►│  Session Key          │     │
│  │ Adapter  │    │  Check   │    │  Resolution            │     │
│  │          │    │          │    │                        │     │
│  │ Normalize│    │ Allowed? │    │ agent:<id>:<scope>    │     │
│  │ message  │    │ Paired?  │    │                        │     │
│  └──────────┘    └──────────┘    └──────────┬─────────────┘     │
│                                              │                  │
│                                              ▼                  │
│                                   ┌──────────────────┐         │
│                                   │   Session Lane   │         │
│                                   │   Queue          │         │
│                                   │                  │         │
│                                   │  collect/process │         │
│                                   └────────┬─────────┘         │
│                                            │                   │
│                                            ▼                   │
│                                   ┌──────────────────┐         │
│                                   │   Agent Runtime  │         │
│                                   └──────────────────┘         │
└────────────────────────────────────────────────────────────────┘

When a message arrives, the channel adapter normalizes it into a common internal format. The pairing system checks whether the sender is authorized—new senders must be explicitly approved through a pairing code unless the channel is configured with an open DM policy. The session router determines which session the message belongs to based on the configured scope (main, per-peer, per-channel-peer, or per-account-channel-peer). Finally, the message queues into the session lane for agent processing.
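
The session-key resolution step can be sketched as a pure function. Key formats follow the scoping scheme described in this post; the function and type names are illustrative:

```typescript
// Sketch of session-key resolution for the four scopes described above.

type Scope = "main" | "per-peer" | "per-channel-peer" | "per-account-channel-peer";

interface InboundMeta {
  agentId: string;
  channel: string;   // e.g. "whatsapp"
  account?: string;  // set for multi-account channels
  peerId: string;    // normalized sender id
}

function resolveSessionKey(scope: Scope, m: InboundMeta): string {
  switch (scope) {
    case "main":
      return `agent:${m.agentId}:main`;
    case "per-peer":
      return `agent:${m.agentId}:dm:${m.peerId}`;
    case "per-channel-peer":
      return `agent:${m.agentId}:${m.channel}:dm:${m.peerId}`;
    case "per-account-channel-peer":
      return `agent:${m.agentId}:${m.channel}:${m.account ?? "default"}:dm:${m.peerId}`;
  }
}
```

Two messages that resolve to the same key share a transcript; everything else about isolation falls out of this one mapping.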

Gateway Methods

The gateway exposes a comprehensive API through named methods (src/gateway/server-methods.ts):

  • health — Gateway status, model health, channel connectivity
  • status — Active channels, sessions, agent state
  • send — Deliver messages to specific channels
  • agent / agent.wait — Trigger agent turns (fire-and-forget or wait for completion)
  • sessions.list / sessions.reset — Session lifecycle management
  • cron.list / cron.pause — Scheduled job control
  • config.set / config.reload — Live configuration updates without restart
  • exec.approve — Approve sandboxed or elevated command execution

The live config reload capability is particularly valuable—you can change model providers, adjust heartbeat intervals, or modify channel settings without restarting the gateway.

Channel Adapters: The ChannelPlugin Interface

OpenClaw's multi-channel capability rests on a clean adapter pattern. Each messaging platform implements a common interface that handles the platform-specific details of connecting, receiving messages, and sending responses.

The Interface

TypeScript
interface ChannelPlugin {
  // Lifecycle
  start(): Promise<void>
  stop(): Promise<void>

  // Inbound: platform → gateway
  onMessage(handler: (msg: NormalizedMessage) => void): void

  // Outbound: gateway → platform
  send(target: ChannelTarget, message: OutboundMessage): Promise<SendResult>

  // Capabilities
  capabilities(): ChannelCapabilities
}

The NormalizedMessage type abstracts away platform differences—whether a message came from WhatsApp, Telegram, or Discord, the agent runtime sees the same structure: sender identity, message content, attachments, thread context, and channel metadata.
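
As a sketch, the normalized shape might look like the following. Field names are assumptions based on the description above, not OpenClaw's exact type:

```typescript
// Illustrative shape of a normalized inbound message; the real type
// in the codebase may differ.

interface NormalizedMessage {
  channel: string;            // "whatsapp" | "telegram" | ...
  accountId?: string;         // present for multi-account channels
  peerId: string;             // platform-specific sender id
  displayName?: string;
  text: string;
  attachments: { kind: "image" | "audio" | "file"; path: string }[];
  threadId?: string;          // reply/thread context, if the platform has one
  timestamp: string;          // ISO 8601
}

// A Telegram-flavored example of what an adapter might emit:
const example: NormalizedMessage = {
  channel: "telegram",
  peerId: "12345678",
  displayName: "Alice",
  text: "What's on my calendar today?",
  attachments: [],
  timestamp: "2026-02-16T09:00:00Z",
};
```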

Channel Adapter Capabilities

Each adapter exposes different capabilities based on what the underlying platform supports:

| Channel | Transport | Threading | Reactions | Attachments | Voice | Groups | Multi-Account |
|---|---|---|---|---|---|---|---|
| WhatsApp | Baileys (unofficial) | Reply quotes | Yes | Images, docs, audio | Yes | Yes | Single |
| Telegram | grammY framework | Forum topics | Yes | All types | Yes | Yes, forums | Multi-bot |
| Slack | Bolt framework | Native threads | Yes | Files, snippets | No | Channels, DMs | Multi-workspace |
| Discord | discord.js | Threads | Yes | Files, embeds | No | Guilds, DMs | Multi-guild |
| Signal | signal-cli subprocess | Reply quotes | Yes | Images, files | No | Yes | Single |
| iMessage | BlueBubbles HTTP | Reply quotes | Tapbacks | Images, files | No | Yes | Single |
| Teams | Direct API | Thread replies | Yes | Files | No | Channels, chats | Multi-tenant |
| WebChat | Built-in HTTP/WS | No | No | File upload | No | No | N/A |

Extension channels add Matrix (with full federation), Zalo, IRC (including Twitch), Feishu/Lark, Mattermost, and Nextcloud Talk. The plugin architecture means adding a new channel requires implementing the ChannelPlugin interface—the gateway handles everything else.

Channel Configuration

Each channel is configured independently with fine-grained control:

JSON
{
  "channels": {
    "whatsapp": {
      "enabled": true,
      "allowFrom": ["+1234567890"],
      "dm": { "policy": "pairing" },
      "queueMode": "collect",
      "groups": { "allowFrom": ["*"] }
    },
    "slack": {
      "enabled": true,
      "accounts": {
        "workspace-id": {
          "token": "xoxb-...",
          "allowFrom": ["U01234567"]
        }
      }
    }
  }
}

The queueMode: "collect" setting is worth noting—it batches rapid-fire messages (common on WhatsApp) into a single agent turn rather than triggering separate runs for each message. This prevents the agent from responding to a half-typed thought.
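
Collect mode amounts to a debounced batch: each new message restarts a quiet-window timer, and the agent turn fires only once the sender goes quiet. A minimal sketch, with an illustrative window length and names:

```typescript
// Sketch of "collect" queue mode: rapid messages arriving within a
// quiet window are merged into a single agent turn.

class CollectQueue {
  private buffer: string[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private quietMs: number,
    private onTurn: (batched: string) => void,
  ) {}

  push(message: string): void {
    this.buffer.push(message);
    // Restart the quiet-window timer on every new message.
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.quietMs);
  }

  private flush(): void {
    const batched = this.buffer.join("\n");
    this.buffer = [];
    this.timer = null;
    this.onTurn(batched);
  }
}
```

Three WhatsApp messages sent two seconds apart become one batched prompt instead of three separate agent runs.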

The Agent Runtime: Pi Agent Core

The agent runtime is where messages become intelligent responses. OpenClaw uses an embedded instance of Pi Agent Core (@mariozechner/pi-coding-agent) as its AI execution engine, wrapped with OpenClaw-specific tooling, context management, and session handling.

The Agent Loop

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                        AGENT RUNTIME LOOP                               │
│                                                                         │
│  ┌──────────────────┐                                                   │
│  │  1. SESSION       │  Resolve session key → load JSONL transcript     │
│  │     RESOLUTION    │  Acquire write lock                              │
│  └────────┬─────────┘                                                   │
│           │                                                             │
│           ▼                                                             │
│  ┌──────────────────┐                                                   │
│  │  2. CONTEXT       │  Load bootstrap files (SOUL.md, IDENTITY.md,    │
│  │     ASSEMBLY      │  USER.md, MEMORY.md, HEARTBEAT.md)              │
│  │                   │  Inject skills, tool definitions                  │
│  │                   │  Compute token budget, apply compaction           │
│  └────────┬─────────┘                                                   │
│           │                                                             │
│           ▼                                                             │
│  ┌──────────────────┐                                                   │
│  │  3. MODEL         │  Send assembled context to LLM provider          │
│  │     INVOCATION    │  Stream response deltas                          │
│  │                   │  Parse tool calls from response                   │
│  └────────┬─────────┘                                                   │
│           │                                                             │
│           ├───────────── No tool calls? ──► Done (deliver response)     │
│           │                                                             │
│           ▼                                                             │
│  ┌──────────────────┐                                                   │
│  │  4. TOOL          │  Execute requested tools (read, write, exec,    │
│  │     EXECUTION     │  browser, canvas, channel actions)               │
│  │                   │  Apply tool policy (allowlist/denylist)           │
│  │                   │  Sandbox if configured                            │
│  └────────┬─────────┘                                                   │
│           │                                                             │
│           ▼                                                             │
│  ┌──────────────────┐                                                   │
│  │  5. STATE         │  Persist tool results to JSONL                   │
│  │     PERSISTENCE   │  Update session metadata                         │
│  │                   │  Check compaction threshold                       │
│  └────────┬─────────┘                                                   │
│           │                                                             │
│           └───────────── Loop back to step 3                            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Context Assembly

Context assembly is the most nuanced phase of the agent loop. The system must construct a prompt that fits within the model's context window while including all relevant information.

The assembly process loads in priority order:

  1. System prompt with tool definitions and behavioral rules
  2. Bootstrap files — SOUL.md, IDENTITY.md, USER.md, TOOLS.md (trimmed to 20k characters each)
  3. Memory — MEMORY.md (private sessions only) and today's/yesterday's daily logs
  4. Skills — Active skills loaded from workspace, shared, and bundled directories
  5. HEARTBEAT.md — If this is a heartbeat-triggered turn
  6. Session transcript — The JSONL conversation history

Token budgets are computed dynamically. The system reserves 20,000 tokens by default for the model's response and tool results. If the session transcript exceeds available space, auto-compaction triggers: a silent agent turn writes durable notes to memory files, then older conversation turns are pruned. This ensures the agent never loses critical information—it just moves from working memory (context) to long-term memory (files).
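
The budget arithmetic can be sketched as follows, using a rough characters-per-token heuristic in place of a real tokenizer. Everything here besides the 20,000-token reserve is illustrative:

```typescript
// Sketch of the transcript budget check: keep the newest turns that fit
// after reserving headroom for the model's response and tool results.

interface Turn { role: string; content: string }

const RESERVED_TOKENS = 20_000; // default headroom for response + tool results

function approxTokens(text: string): number {
  return Math.ceil(text.length / 4); // crude heuristic, ~4 chars per token
}

function fitTranscript(turns: Turn[], contextWindow: number, fixedPromptTokens: number): Turn[] {
  let budget = contextWindow - RESERVED_TOKENS - fixedPromptTokens;
  const kept: Turn[] = [];
  // Walk newest-to-oldest; older turns fall off first (in the real
  // system, only after a memory flush has preserved what matters).
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = approxTokens(turns[i].content);
    if (cost > budget) break;
    budget -= cost;
    kept.unshift(turns[i]);
  }
  return kept;
}
```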

Model Invocation

The runtime calls the configured LLM provider through Pi Agent Core's abstraction layer. Response streaming is handled through events:

Code
Gateway receives streaming events:
  { stream: "assistant", delta: "Let me check..." }  ← text chunks
  { stream: "tool", name: "read", input: {...} }     ← tool invocation
  { stream: "tool", result: "file contents..." }     ← tool result
  { phase: "end" }                                    ← turn complete

The streaming architecture means users see responses appearing in real-time on their messaging platform, rather than waiting for the full response to generate.

Tool Execution

When the model requests tool use, the runtime applies the tool policy (checking allowlists, denylists, and group-specific restrictions), determines whether sandboxing applies, executes the tool, and feeds the result back into the conversation for the next model turn.

State Persistence

Every turn—user messages, assistant responses, tool calls, tool results—is immediately appended to the session's JSONL file. This append-only design provides durability (a crash mid-response only loses the current incomplete turn) and streamability (new clients can replay the transcript to catch up on conversation state).
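
The append-and-replay pattern is simple enough to sketch directly. Paths and entry shapes here are illustrative, not OpenClaw's exact schema:

```typescript
// Sketch of append-only JSONL persistence with crash-safe replay.

import { appendFileSync, readFileSync } from "node:fs";

interface SessionEntry { role: string; content?: string; ts: string; [k: string]: unknown }

function appendEntry(path: string, entry: SessionEntry): void {
  // One JSON object per line, written as a single append, so a crash
  // can only corrupt the final line.
  appendFileSync(path, JSON.stringify(entry) + "\n");
}

function replay(path: string): SessionEntry[] {
  const entries: SessionEntry[] = [];
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (!line.trim()) continue;
    try {
      entries.push(JSON.parse(line));
    } catch {
      // A torn final line from a crash mid-write is skipped, not fatal.
    }
  }
  return entries;
}
```

Replay is also how a new client catches up: read the file top to bottom and you have the full conversation state.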

Session and Memory Management

JSONL Session Transcripts

Sessions are stored as line-delimited JSON files where each line represents a single message or event:

JSON
{"role":"user","content":"What's on my calendar today?","ts":"2026-02-16T09:00:00Z","channel":"whatsapp","peer":"alice"}
{"role":"assistant","content":"Let me check your calendar.","ts":"2026-02-16T09:00:01Z"}
{"role":"assistant","tool_calls":[{"name":"exec","input":{"command":"gcalcli today"}}],"ts":"2026-02-16T09:00:01Z"}
{"role":"tool","name":"exec","content":"09:30 Team standup\n14:00 Design review\n16:00 1:1 with Jordan","ts":"2026-02-16T09:00:03Z"}
{"role":"assistant","content":"You have three events today:\n- 09:30 Team standup\n- 14:00 Design review\n- 16:00 1:1 with Jordan","ts":"2026-02-16T09:00:04Z"}

The JSONL format is chosen deliberately: it's append-friendly (no need to parse the entire file to add an entry), crash-safe (partial writes only affect the last line), and human-readable (you can inspect sessions with standard text tools). Session metadata—token usage, update timestamps, channel origin—is tracked separately in sessions.json.

Session Scoping

Session keys determine conversation isolation:

| Scope | Key Format | Use Case |
|---|---|---|
| main | agent:&lt;id&gt;:main | Single shared session across all platforms |
| per-peer | agent:&lt;id&gt;:dm:&lt;peerId&gt; | Separate conversation per person |
| per-channel-peer | agent:&lt;id&gt;:&lt;channel&gt;:dm:&lt;peerId&gt; | Separate per person per platform |
| per-account-channel-peer | agent:&lt;id&gt;:&lt;channel&gt;:&lt;account&gt;:dm:&lt;peerId&gt; | Full isolation including multi-account |

For groups and channels, keys follow a similar pattern: agent:<id>:<channel>:group:<groupId>. Cron jobs and webhooks get their own session namespaces (cron:<jobId>, hook:<uuid>).

Identity links bridge sessions across platforms. If Alice messages from both Telegram and Discord, you can configure an identity link that maps both sender IDs to the same session—so context flows between platforms naturally.
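
A sketch of that mapping, with a config shape assumed purely for illustration:

```typescript
// Sketch of identity linking: per-platform sender handles map to one
// canonical identity so session context can follow a person across
// platforms. The config shape is an assumption.

type IdentityLinks = Record<string, string[]>; // canonical name -> ["channel:peerId", ...]

function canonicalPeer(links: IdentityLinks, channel: string, peerId: string): string {
  const handle = `${channel}:${peerId}`;
  for (const [name, handles] of Object.entries(links)) {
    if (handles.includes(handle)) return name;
  }
  return handle; // unlinked senders keep their per-platform identity
}

const links: IdentityLinks = {
  alice: ["telegram:12345678", "discord:987654321098765432"],
};
```

With this in place, resolving the session key from the canonical identity instead of the raw sender id gives Alice one conversation regardless of which app she opens.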

Memory Architecture

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         MEMORY ARCHITECTURE                             │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  WORKING MEMORY (Context Window)                                │   │
│  │                                                                 │   │
│  │  Current session transcript, bootstrap files, active skills     │   │
│  │  Token-limited, auto-compacted when full                       │   │
│  │  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │   │
│  │  Compaction triggers memory flush (silent agentic turn)        │   │
│  └──────────────────────────────┬──────────────────────────────────┘   │
│                                 │                                      │
│                                 ▼                                      │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  SHORT-TERM MEMORY (Daily Logs)                                 │   │
│  │                                                                 │   │
│  │  memory/2026-02-16.md  ← today (loaded at session start)      │   │
│  │  memory/2026-02-15.md  ← yesterday (loaded at session start)  │   │
│  │  memory/2026-02-14.md  ← older (searchable via vector index)  │   │
│  │                                                                 │   │
│  │  Append-only daily files, auto-created by agent                │   │
│  └──────────────────────────────┬──────────────────────────────────┘   │
│                                 │                                      │
│                                 ▼                                      │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  LONG-TERM MEMORY                                               │   │
│  │                                                                 │   │
│  │  MEMORY.md  ← curated, persistent, loaded in private sessions │   │
│  │                                                                 │   │
│  │  Vector index (SQLite + sqlite-vec)                            │   │
│  │  ├─ Indexes MEMORY.md + all daily logs                        │   │
│  │  ├─ Embeddings via cloud APIs or local models                 │   │
│  │  ├─ Semantic search across full memory corpus                 │   │
│  │  └─ File watcher auto-reindexes on changes                   │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

The three-tier design balances immediacy with durability. Working memory (the context window) gives the agent instant access to the current conversation. Short-term memory (daily logs) provides recent context without consuming the full context window. Long-term memory (MEMORY.md and vector search) enables recall across the agent's entire history.

The automatic memory flush before compaction is critical: when the context window fills up, the agent first writes important information to durable files before older conversation turns are pruned. This prevents information loss during routine context management.

An experimental QMD backend provides local-first search combining BM25 full-text search, vector similarity, and reranking—all without requiring external embedding APIs.
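
One common way to fuse keyword and vector rankings is reciprocal rank fusion (RRF). The sketch below illustrates the general idea, not QMD's actual implementation:

```typescript
// Reciprocal rank fusion: combine multiple ranked lists (e.g. BM25 and
// vector similarity) by summing 1/(k + rank) per document. Shown as an
// illustration of rank fusion in general, not QMD's internals.

function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      // Documents ranked highly by any retriever accumulate score.
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

A document that places second in both lists can outrank one that places first in only one, which is exactly the behavior you want when neither retriever is fully trusted.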

Skills and Extensions

Skill File Format

Skills are Markdown documents with YAML frontmatter. Here's a representative example:

Markdown
---
name: image-gen
description: Generate or edit images using a multimodal model
metadata:
  openclaw:
    requires:
      env:
        - IMAGE_API_KEY
      bins:
        - uv
    primaryEnv: IMAGE_API_KEY
user-invocable: true
---

## Image Generation

When the user asks you to generate, edit, or modify images, use the
image generation model via the `uv` tool runner.

### Generating a new image

Run the following command, replacing the prompt with the user's request:

```bash
uv run --with image-sdk image-gen.py --prompt "the user's description"
```

### Editing an existing image

If the user provides an image and asks for modifications, include the source
image path:

```bash
uv run --with image-sdk image-gen.py --edit --source /path/to/image.png --prompt "modifications"
```

### Important notes

- Always confirm the image was generated successfully before responding
- If generation fails, check that IMAGE_API_KEY is set correctly
- Large images may take 10-30 seconds to generate

The frontmatter's requires block defines gating rules: the skill only loads if IMAGE_API_KEY is set in the environment and the uv binary is on PATH. If either requirement fails, the skill silently doesn't load—no errors, no broken functionality.
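
The gating check reduces to two lookups. A sketch, with illustrative names and the binary check injected so the logic stays testable:

```typescript
// Sketch of skill gating: a skill is eligible only when all required
// env vars are set and all required binaries resolve on PATH.

interface SkillRequires { env?: string[]; bins?: string[] }

function skillEligible(
  requires: SkillRequires,
  env: Record<string, string | undefined>,
  hasBin: (name: string) => boolean,
): boolean {
  for (const v of requires.env ?? []) {
    if (!env[v]) return false; // missing key: skip silently, no error
  }
  for (const b of requires.bins ?? []) {
    if (!hasBin(b)) return false;
  }
  return true;
}
```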

Skill Loading and Precedence

Skills load from three directories in priority order:

  1. Workspace skills (~/.openclaw/agents/<agentId>/workspace/skills/) — Highest priority, per-agent overrides
  2. Managed skills (~/.openclaw/skills/) — Shared across agents, installed via ClawHub
  3. Bundled skills (shipped with OpenClaw package) — Lowest priority, 54 directories covering 1Password, Bear, Discord, GitHub, and more

When multiple skills share a name, the highest-priority version wins. This layering lets you customize bundled skills without modifying the OpenClaw installation.
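
The precedence rule is a last-writer-wins merge: load the lowest-priority tier first so higher tiers overwrite on a name clash. A sketch with illustrative types:

```typescript
// Sketch of three-tier skill precedence: bundled < managed < workspace.

interface Skill { name: string; source: "bundled" | "managed" | "workspace" }

function mergeSkills(bundled: Skill[], managed: Skill[], workspace: Skill[]): Map<string, Skill> {
  const merged = new Map<string, Skill>();
  // Lowest priority first, so later (higher-priority) entries win.
  for (const skill of [...bundled, ...managed, ...workspace]) {
    merged.set(skill.name, skill);
  }
  return merged;
}
```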

The Extension System

Extensions live in the extensions/ directory and provide additional channel adapters and capabilities. With 37 extension directories, the community has built adapters for Matrix, Zalo, IRC, Feishu, Mattermost, Nextcloud Talk, and more.

Extensions are loaded as plugins at gateway startup. The plugin architecture (src/plugins/) provides hooks for extending the gateway, adding tools, registering channels, and modifying agent behavior.

ClawHub

ClawHub (clawhub.ai) is the central registry for skills and extensions. Installation is a single command:

Bash
clawhub install <skill-slug>
clawhub update --all
clawhub sync --all

The registry handles versioning, dependency resolution, and updates. It creates a network effect similar to npm or VS Code's extension marketplace—but for AI agent capabilities.

Tool Execution and Sandboxing

Built-in Tools

OpenClaw's tool system gives the agent the ability to interact with the world. The core tools (defined in src/agents/pi-tools.ts and agents/bash-tools.ts):

File operations:

  • read — Read file contents (text or binary)
  • write — Create or overwrite files
  • edit — Apply semantic diffs via apply_patch

Shell execution:

  • exec — Run commands with PTY support, optional approval gates
  • process — Background process management
  • cd — Change working directory (per-session state)

Browser automation:

  • browser.action — Click, type, navigate, wait
  • browser.snapshot — Capture screenshots (integrates with vision models)

Canvas:

  • canvas.push — Send A2UI content to the live canvas
  • canvas.reset — Clear canvas
  • canvas.eval — Execute JavaScript on the canvas

Channel actions:

  • send — Route messages to specific channels
  • Platform-specific tools for Discord guild actions, WhatsApp contacts, Slack reactions

Node tools (for connected mobile devices):

  • camera.snap — Take photos
  • screen.record — Record screen
  • location.get — Get device GPS location
  • notify — Send push notifications

Docker Sandboxing

For security-sensitive deployments, OpenClaw supports Docker-based sandboxing for tool execution:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                        EXECUTION SANDBOXING                             │
│                                                                         │
│  ┌─────────────────────────────────┐  ┌─────────────────────────────┐  │
│  │       TRUSTED ZONE              │  │     SANDBOXED ZONE          │  │
│  │       (Host)                    │  │     (Docker Container)      │  │
│  │                                 │  │                             │  │
│  │  • Gateway process              │  │  • Minimal Debian base     │  │
│  │  • Agent runtime                │  │  • Non-root user           │  │
│  │  • Session state                │  │  • Tool execution          │  │
│  │  • Memory files                 │  │  • Browser (optional)      │  │
│  │  • Configuration                │  │  • Ephemeral filesystem    │  │
│  │                                 │  │                             │  │
│  │  Controls what enters           │  │  Workspace access:         │  │
│  │  the sandbox via:               │  │  • none (sandbox-rooted)   │  │
│  │  • Tool policy                  │  │  • ro (read-only)          │  │
│  │  • Approval gates               │  │  • rw (read-write)         │  │
│  │  • Bind mounts                  │  │                             │  │
│  └─────────────────────────────────┘  └─────────────────────────────┘  │
│                                                                         │
│  Sandbox configuration:                                                 │
│  • mode: off | non-main | all                                          │
│  • scope: session | agent | shared                                     │
│  • workspaceAccess: none | ro | rw                                     │
│  • docker.binds: ["/src:/src:ro"]                                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

The sandbox uses a dedicated Dockerfile.sandbox that builds a minimal Debian image with a non-root sandbox user. The mode setting controls when sandboxing applies: non-main (default) sandboxes non-main sessions while allowing the primary session to execute on the host, all sandboxes everything, and off disables sandboxing entirely.

The scope setting controls container lifecycle: session creates a fresh container per session (most isolated), agent shares one container across all sessions for an agent, and shared uses a single container for everything (least isolated but most efficient).
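
Putting those settings together, a sandbox configuration might look like this (key names follow the diagram above; the exact schema may differ):

JSON
{
  "sandbox": {
    "mode": "non-main",
    "scope": "session",
    "workspaceAccess": "ro",
    "docker": {
      "binds": ["/src:/src:ro"]
    }
  }
}

This sandboxes every non-main session in its own fresh container, with the workspace mounted read-only and one extra read-only bind mount.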

Tool policy (agents/pi-tools.policy.ts) provides an additional layer of control:

JSON
{
  "agents": {
    "list": [{
      "id": "main",
      "tools": {
        "allowlist": ["read", "write", "edit", "exec", "browser.*"],
        "groupPolicy": {
          "allowlist": ["read", "send"]
        }
      }
    }]
  }
}

This configuration lets the agent read, write, edit, execute commands, and use the browser in DM conversations, but restricts it to read-only and message-sending in group chats—a sensible default that prevents the agent from running commands on behalf of group members.
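
Enforcement comes down to matching tool names against the active allowlist, including trailing-wildcard entries like browser.*. A sketch with illustrative names:

```typescript
// Sketch of allowlist matching with trailing-wildcard entries, as in
// the policy example above.

function toolAllowed(tool: string, allowlist: string[]): boolean {
  return allowlist.some((pattern) =>
    pattern.endsWith(".*")
      ? tool.startsWith(pattern.slice(0, -1)) // "browser.*" matches "browser.action"
      : tool === pattern,
  );
}

// Group chats get the tighter groupPolicy allowlist instead of the DM one.
function effectiveAllowlist(isGroup: boolean, dmList: string[], groupList: string[]): string[] {
  return isGroup ? groupList : dmList;
}
```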

Canvas: Live Agent-Controlled Rendering

Canvas provides the agent with a visual output surface beyond text. The Canvas host (src/canvas-host/server.ts) runs as an HTTP server on port 18793, serving a static HTML page that maintains a WebSocket connection for real-time updates.

The agent controls Canvas through four tools:

  • canvas.push — Sends an A2UI (Agent-to-UI) JSON payload that renders as interactive UI. A2UI supports forms, buttons, text, images, charts, and custom layouts.
  • canvas.reset — Clears the canvas to a blank state.
  • canvas.eval — Executes arbitrary JavaScript on the canvas page, enabling dynamic behavior beyond what A2UI templates support.
  • canvas.snapshot — Captures the current canvas state as an image, useful for the agent to verify its own rendering or for sharing with users.

Canvas clients include the macOS app (sidebar or fullscreen view), iOS and Android native apps, and any browser pointed at the canvas host URL. Updates stream in real-time over WebSocket, so changes appear instantly.

Use cases range from practical (live dashboards, data visualization, interactive forms for structured input) to creative (game boards, drawing surfaces, collaborative whiteboards). Canvas transforms OpenClaw from a text-only assistant into something that can present rich, interactive interfaces.

Model Integration and Failover

Multi-Provider Architecture

OpenClaw supports over a dozen LLM providers through a unified abstraction layer. Model selection is configurable per-agent, and the system supports failover chains for reliability.

| Provider Type | Auth | Streaming | Key Feature |
| --- | --- | --- | --- |
| Cloud LLM APIs (multiple major providers) | API key, OAuth | Yes | Primary usage, thinking/reasoning support |
| Enterprise cloud (managed model hosting) | IAM credentials | Yes | Enterprise integration, compliance |
| Model routers (multi-provider proxies) | API key | Yes | Access to 50+ models through one key |
| Open-source model hosts | API key | Yes | Cost-effective open models |
| Local model servers (Ollama, etc.) | HTTP endpoint | Yes | Fully offline operation, maximum privacy |
| Universal proxy layers | API key | Yes | Route to 50+ providers through one interface |
| Edge deployment providers | Platform token | Yes | Low-latency edge inference |
| Web-grounded providers | API key | Yes | Responses grounded in live web data |

Auth Profile Rotation

OpenClaw supports multiple API keys per provider with intelligent rotation (src/agents/auth-profiles.ts). Keys are load-balanced round-robin during normal operation. When a key hits a rate limit or returns an error, it enters a cooldown period with exponential backoff. The system automatically rotates to the next available key.

For providers that support OAuth, token refresh is handled automatically—tokens are refreshed before expiry, with graceful fallback to other keys during the refresh window.

Failover Chains

Model failover ensures the agent stays responsive even when a primary provider has issues:

JSON
{
  "agents": {
    "list": [{
      "id": "main",
      "model": {
        "primary": "provider-a/model-large",
        "fallbacks": ["provider-a/model-medium", "provider-b/model-large"]
      }
    }]
  }
}

If the primary model returns an error (rate limit, server error, timeout), the runtime automatically retries with the next model in the fallback chain. This happens transparently—the user sees a response, potentially from a different model, without knowing about the failover.
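The failover loop can be approximated like this. It's a sketch: completeWithFailover and the error handling are stand-ins, not OpenClaw's actual runtime API:

```typescript
// Sketch of a failover chain: try the primary, then each fallback in order.
type ModelCall = (model: string) => Promise<string>;

async function completeWithFailover(
  models: string[],  // primary first, then fallbacks
  call: ModelCall,
): Promise<{ model: string; text: string }> {
  let lastError: unknown;
  for (const model of models) {
    try {
      // A rate limit, server error, or timeout surfaces as a thrown error here.
      return { model, text: await call(model) };
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next model
    }
  }
  throw lastError ?? new Error("no models configured");
}
```

The real runtime adds streaming and retry budgets on top; the sketch only shows the ordering guarantee that keeps the user-visible response flowing.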

Provider-specific streaming parameters are handled through extra-params.ts, which maps the unified configuration to each provider's API format—token budgets, streaming completion tokens, and content deltas are all translated automatically.
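The shape of that translation can be sketched as a per-provider mapping function. Provider names and parameter fields below are hypothetical, chosen only to show the pattern:

```typescript
// Sketch of unified-to-provider parameter translation.
interface UnifiedParams {
  maxTokens: number;
  thinkingBudget?: number; // optional reasoning-token budget
}

function toProviderParams(provider: string, p: UnifiedParams): Record<string, unknown> {
  switch (provider) {
    case "provider-a":
      // A provider that accepts an explicit thinking budget.
      return {
        max_tokens: p.maxTokens,
        thinking: p.thinkingBudget ? { budget_tokens: p.thinkingBudget } : undefined,
      };
    case "provider-b":
      // A provider that names the same limit differently.
      return { max_completion_tokens: p.maxTokens };
    default:
      return { max_tokens: p.maxTokens };
  }
}
```

Centralizing the mapping means agents configure models once, in one vocabulary, regardless of which provider ends up serving the request.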

Security Model

OpenClaw's security is designed in layers, each addressing a different threat vector.

Defense Layers

Layer 1: DM Pairing (Channel-level)

New senders must be explicitly approved before they can interact with the agent. In pairing mode (the default), unknown senders receive a pairing code that the owner must approve through the CLI or native app. This prevents random contacts from consuming API credits or accessing the agent's capabilities.
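The pairing flow reduces to a small state machine, sketched below with hypothetical function names (the actual implementation lives in the channel layer and persists its state):

```typescript
// Sketch of DM pairing: unknown senders get a code; the owner approves it out of band.
const pending = new Map<string, string>(); // senderId -> pairing code
const approved = new Set<string>();

function onInboundDm(senderId: string): { allowed: boolean; code?: string } {
  if (approved.has(senderId)) return { allowed: true };
  let code = pending.get(senderId);
  if (!code) {
    code = Math.random().toString(36).slice(2, 8).toUpperCase(); // 6-char code
    pending.set(senderId, code);
  }
  return { allowed: false, code }; // the message is held until the owner approves
}

function approve(code: string): boolean {
  for (const [sender, c] of pending) {
    if (c === code) {
      pending.delete(sender);
      approved.add(sender);
      return true;
    }
  }
  return false;
}
```

The owner sees the code through the CLI or native app and approves it there; until then the sender's messages never reach the agent.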

Layer 2: Channel Allowlists (Session-level)

Per-channel allowFrom lists restrict which users or groups can reach the agent. Combined with per-agent routing, this enables multi-tenant deployments where different agents serve different audiences.
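In configuration terms this looks roughly like the following. The channel name and exact key placement are illustrative:

```json
{
  "channels": {
    "telegram": {
      "allowFrom": ["@owner", "@trusted-group"]
    }
  }
}
```

Anyone not on the list is dropped at the channel boundary, before the message ever reaches an agent.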

Layer 3: Tool Policy (Agent-level)

Allowlists and denylists control which tools each agent can use, with separate policies for DM versus group contexts. A conservative default: full tool access in DMs, read-only in groups.

Layer 4: Sandboxing (Execution-level)

Docker containers isolate tool execution from the host system. Configurable workspace access (none, read-only, read-write) limits what sandboxed tools can see.

Layer 5: Approval Gates (User interaction)

Elevated tool executions (like running shell commands) can require explicit user approval before proceeding. The approval manager tracks pending requests and supports approval from any connected client.
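The approval manager's core contract can be sketched as a promise per pending request. Class and method names are illustrative, not the actual API:

```typescript
// Sketch of an approval gate: elevated tool calls block until an explicit decision.
type Decision = "approve" | "deny";

class ApprovalManager {
  private pending = new Map<string, (d: Decision) => void>();

  // Called by the runtime before an elevated tool executes; resolves when decided.
  request(id: string): Promise<Decision> {
    return new Promise((resolve) => this.pending.set(id, resolve));
  }

  // Called from any connected client (CLI, native app) to resolve the request.
  decide(id: string, d: Decision): boolean {
    const resolve = this.pending.get(id);
    if (!resolve) return false; // unknown or already-decided request
    this.pending.delete(id);
    resolve(d);
    return true;
  }
}
```

Because the decision arrives over the gateway, any connected client can answer; the tool call simply awaits the promise and proceeds or aborts.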

Security Comparison

| Aspect | OpenClaw (Self-Hosted) | Cloud AI Assistants |
| --- | --- | --- |
| Data location | Your hardware, your network | Provider's servers |
| Message transit | Direct to messaging APIs | Through provider's infrastructure |
| Memory storage | Local Markdown files | Cloud databases (opaque) |
| Execution control | Full (sandbox, policy, approval) | Provider-determined |
| Audit trail | Full JSONL transcripts on disk | Provider logs (limited access) |
| Model provider | Your choice, changeable anytime | Locked to provider |
| Access control | Pairing + allowlists + tool policy | Account-based |
| Code inspection | Full source available | Closed source |

The threat model documentation (docs/security/THREAT-MODEL-ATLAS.md) covers additional attack vectors: inbound DM injection (where a malicious sender tries to manipulate the agent through crafted messages), channel-based privilege escalation (exploiting differences between DM and group policies), and tool misuse (the agent being tricked into harmful actions). The layered defense approach means compromising any single layer doesn't grant full access.

Sources

Dependencies

  • Baileys — WhatsApp Web API library
  • grammY — Telegram Bot framework
  • BlueBubbles — iMessage bridge
  • sqlite-vec — SQLite vector search extension used for memory indexing

Ecosystem

  • ClawHub — Skills and extensions marketplace

Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
