
OpenClaw Architecture Deep Dive: How a Personal AI Assistant Actually Works

A technical deep dive into OpenClaw's architecture—hub-and-spoke gateway, agent runtime loop, pluggable channel adapters, markdown-based memory, skills system, Docker-sandboxed tool execution, Canvas rendering, and multi-model failover with auth rotation.


Why Architecture Matters for Personal AI

Building an AI chatbot is straightforward. Building an always-on personal AI assistant that connects to a dozen messaging platforms, maintains persistent memory, executes tools safely, supports multiple LLM providers with failover, and scales from a Raspberry Pi to a multi-user deployment—that requires serious architecture.

OpenClaw is the fastest-growing open-source AI project in history (165,000+ GitHub stars in two months), and its architecture is a significant reason why. The system is built as a modular, event-driven platform where each subsystem—messaging, agent runtime, memory, skills, tools, and model integration—operates independently but composes into a cohesive whole.

This post examines every major subsystem in OpenClaw's TypeScript codebase. We'll trace the full lifecycle of a message from arrival on WhatsApp to agent reasoning to response delivery, and we'll explore how each component is designed for reliability, extensibility, and privacy. For the history and overview, see OpenClaw: The Open-Source AI Assistant That Stormed the Internet.

What this covers:

  • Gateway WebSocket control plane and message routing
  • Channel adapter pattern and the ChannelPlugin interface
  • Agent runtime loop (context assembly, model invocation, tool execution, persistence)
  • Session management and JSONL transcript format
  • Memory architecture (working, short-term, long-term, vector search)
  • Skills system and ClawHub integration
  • Tool execution and Docker sandboxing
  • Canvas live rendering surface
  • Multi-model integration with auth rotation and failover
  • Security model and threat defense layers

The Big Picture: System Architecture

Before examining individual subsystems, here's how everything fits together:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         MESSAGING CHANNELS                              │
│  WhatsApp  Telegram  Slack  Discord  Signal  iMessage  Teams  Web      │
│  (Baileys) (grammY)  (Bolt) (djs)   (cli)  (BlueBub) (API)  (HTTP)   │
└────────┬──────┬────────┬──────┬───────┬───────┬────────┬───────┬───────┘
         │      │        │      │       │       │        │       │
         ▼      ▼        ▼      ▼       ▼       ▼        ▼       ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    GATEWAY WEBSOCKET CONTROL PLANE                       │
│                         (127.0.0.1:18789)                               │
│                                                                         │
│  ┌─────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐  │
│  │  Channel    │  │   Session    │  │    Auth &    │  │  Heartbeat │  │
│  │  Managers   │  │   Router     │  │   Pairing    │  │   Daemon   │  │
│  └──────┬──────┘  └──────┬───────┘  └──────────────┘  └─────┬──────┘  │
│         │                │                                    │         │
│         ▼                ▼                                    ▼         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │                     AGENT RUNTIME (Pi Agent Core)               │   │
│  │                                                                 │   │
│  │  ┌───────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐  │   │
│  │  │  Context  │  │  Model   │  │   Tool   │  │    State     │  │   │
│  │  │  Assembly │  │ Invocatn │  │ Execution│  │ Persistence  │  │   │
│  │  └───────────┘  └──────────┘  └──────────┘  └──────────────┘  │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│         │                │                │                             │
└─────────┼────────────────┼────────────────┼─────────────────────────────┘
          │                │                │
          ▼                ▼                ▼
┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   Workspace  │  │ LLM Providers│  │    Docker    │  │    Canvas    │
│              │  │              │  │   Sandbox    │  │    Host      │
│ • SOUL.md   │  │ • Cloud APIs │  │              │  │  (port 18793)│
│ • MEMORY.md │  │ • Self-hosted│  │ • Shell exec │  │              │
│ • Sessions  │  │ • Proxies    │  │ • Browser    │  │ • A2UI JSON  │
│ • Skills    │  │ • Enterprise │  │ • File I/O   │  │ • WebSocket  │
│ • Daily logs│  │ • Local LLMs │  │              │  │ • Live UI    │
│             │  │ • 12+ total  │  │              │  │              │
└──────────────┘  └──────────────┘  └──────────────┘  └──────────────┘

The architecture follows a clear data flow: messages arrive from external platforms, get normalized by channel adapters, route through the gateway to the agent runtime, and produce responses that flow back through the same path. Every component communicates through the gateway's WebSocket control plane.

The Gateway: Hub-and-Spoke Message Routing

The gateway is OpenClaw's central nervous system. Implemented in src/gateway/server.impl.ts, it's a WebSocket server bound to 127.0.0.1:18789 by default that owns all session state, manages channel connections, and coordinates the agent runtime.

WebSocket Control Plane

All communication flows through a single WebSocket endpoint using a request-response protocol with event streaming:

TypeScript
// Client sends a request
{ type: "req", id: "abc123", method: "agent", params: { sessionKey: "...", message: "..." } }

// Gateway sends a response
{ type: "res", id: "abc123", ok: true, payload: { runId: "...", acceptedAt: "..." } }

// Gateway streams events
{ type: "event", event: "agent", payload: { stream: "assistant", delta: "..." }, seq: 1 }
{ type: "event", event: "agent", payload: { stream: "tool", name: "read", ... }, seq: 2 }
{ type: "event", event: "agent", payload: { phase: "end" }, seq: 3 }

Clients connect with a role: operator (CLI, macOS app, web UI) or node (iOS/Android devices providing camera, location, and notification capabilities). The gateway authenticates connections via optional token-based auth (OPENCLAW_GATEWAY_TOKEN environment variable) with challenge-nonce signing for remote connections.
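
A client correlates `res` frames to requests by `id`. A minimal sketch of that bookkeeping, with illustrative types and names rather than the actual OpenClaw client API:

```typescript
// Illustrative sketch of client-side request/response correlation over
// the gateway WebSocket protocol. Frame shapes follow the examples
// above; class and method names are assumptions.

type ReqFrame = { type: "req"; id: string; method: string; params?: unknown };
type ResFrame = { type: "res"; id: string; ok: boolean; payload?: unknown };
type EventFrame = { type: "event"; event: string; payload: unknown; seq: number };
type Frame = ResFrame | EventFrame;

class GatewayClient {
  private nextId = 0;
  private pending = new Map<string, (res: ResFrame) => void>();

  constructor(private sendRaw: (frame: ReqFrame) => void) {}

  // Send a request; the promise settles when the matching "res" arrives.
  request(method: string, params?: unknown): Promise<ResFrame> {
    const id = String(++this.nextId);
    return new Promise((resolve) => {
      this.pending.set(id, resolve);
      this.sendRaw({ type: "req", id, method, params });
    });
  }

  // Feed every inbound frame here; "res" frames settle pending requests,
  // "event" frames would be dispatched to stream handlers.
  handleFrame(frame: Frame): void {
    if (frame.type === "res") {
      const resolve = this.pending.get(frame.id);
      if (resolve) {
        this.pending.delete(frame.id);
        resolve(frame);
      }
    }
  }
}
```

Because events carry a `seq` number, a real client can also detect gaps in the stream and request a replay.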

Message Routing

The gateway's routing logic handles a surprising amount of complexity:

Code
┌────────────────────────────────────────────────────────────────┐
│                    MESSAGE ROUTING                              │
│                                                                │
│  Inbound Message                                               │
│       │                                                        │
│       ▼                                                        │
│  ┌──────────┐    ┌──────────┐    ┌──────────────────────┐     │
│  │ Channel  │───►│  Pairing │───►│  Session Key          │     │
│  │ Adapter  │    │  Check   │    │  Resolution            │     │
│  │          │    │          │    │                        │     │
│  │ Normalize│    │ Allowed? │    │ agent:<id>:<scope>    │     │
│  │ message  │    │ Paired?  │    │                        │     │
│  └──────────┘    └──────────┘    └──────────┬─────────────┘     │
│                                              │                  │
│                                              ▼                  │
│                                   ┌──────────────────┐         │
│                                   │   Session Lane   │         │
│                                   │   Queue          │         │
│                                   │                  │         │
│                                   │  collect/process │         │
│                                   └────────┬─────────┘         │
│                                            │                   │
│                                            ▼                   │
│                                   ┌──────────────────┐         │
│                                   │   Agent Runtime  │         │
│                                   └──────────────────┘         │
└────────────────────────────────────────────────────────────────┘

When a message arrives, the channel adapter normalizes it into a common internal format. The pairing system checks whether the sender is authorized—new senders must be explicitly approved through a pairing code unless the channel is configured with an open DM policy. The session router determines which session the message belongs to based on the configured scope (main, per-peer, per-channel-peer, or per-account-channel-peer). Finally, the message queues into the session lane for agent processing.
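
The session-key resolution step can be sketched as a pure function. Key formats follow the scoping scheme described in this post; the function and type names are illustrative:

```typescript
// Sketch of session-key resolution for the four scopes described above.

type Scope = "main" | "per-peer" | "per-channel-peer" | "per-account-channel-peer";

interface InboundMeta {
  agentId: string;
  channel: string;   // e.g. "whatsapp"
  account?: string;  // set for multi-account channels
  peerId: string;    // normalized sender id
}

function resolveSessionKey(scope: Scope, m: InboundMeta): string {
  switch (scope) {
    case "main":
      return `agent:${m.agentId}:main`;
    case "per-peer":
      return `agent:${m.agentId}:dm:${m.peerId}`;
    case "per-channel-peer":
      return `agent:${m.agentId}:${m.channel}:dm:${m.peerId}`;
    case "per-account-channel-peer":
      return `agent:${m.agentId}:${m.channel}:${m.account ?? "default"}:dm:${m.peerId}`;
  }
}
```

Two messages that resolve to the same key share a transcript; everything else about isolation falls out of this one mapping.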

Gateway Methods

The gateway exposes a comprehensive API through named methods (src/gateway/server-methods.ts):

  • health — Gateway status, model health, channel connectivity
  • status — Active channels, sessions, agent state
  • send — Deliver messages to specific channels
  • agent / agent.wait — Trigger agent turns (fire-and-forget or wait for completion)
  • sessions.list / sessions.reset — Session lifecycle management
  • cron.list / cron.pause — Scheduled job control
  • config.set / config.reload — Live configuration updates without restart
  • exec.approve — Approve sandboxed or elevated command execution

The live config reload capability is particularly valuable—you can change model providers, adjust heartbeat intervals, or modify channel settings without restarting the gateway.

Channel Adapters: The ChannelPlugin Interface

OpenClaw's multi-channel capability rests on a clean adapter pattern. Each messaging platform implements a common interface that handles the platform-specific details of connecting, receiving messages, and sending responses.

The Interface

TypeScript
interface ChannelPlugin {
  // Lifecycle
  start(): Promise<void>
  stop(): Promise<void>

  // Inbound: platform → gateway
  onMessage(handler: (msg: NormalizedMessage) => void): void

  // Outbound: gateway → platform
  send(target: ChannelTarget, message: OutboundMessage): Promise<SendResult>

  // Capabilities
  capabilities(): ChannelCapabilities
}

The NormalizedMessage type abstracts away platform differences—whether a message came from WhatsApp, Telegram, or Discord, the agent runtime sees the same structure: sender identity, message content, attachments, thread context, and channel metadata.
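
As a sketch, the normalized shape might look like the following. Field names are assumptions based on the description above, not OpenClaw's exact type:

```typescript
// Illustrative shape of a normalized inbound message; the real type
// in the codebase may differ.

interface NormalizedMessage {
  channel: string;            // "whatsapp" | "telegram" | ...
  accountId?: string;         // present for multi-account channels
  peerId: string;             // platform-specific sender id
  displayName?: string;
  text: string;
  attachments: { kind: "image" | "audio" | "file"; path: string }[];
  threadId?: string;          // reply/thread context, if the platform has one
  timestamp: string;          // ISO 8601
}

// A Telegram-flavored example of what an adapter might emit:
const example: NormalizedMessage = {
  channel: "telegram",
  peerId: "12345678",
  displayName: "Alice",
  text: "What's on my calendar today?",
  attachments: [],
  timestamp: "2026-02-16T09:00:00Z",
};
```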

Channel Adapter Capabilities

Each adapter exposes different capabilities based on what the underlying platform supports:

| Channel | Transport | Threading | Reactions | Attachments | Voice | Groups | Multi-Account |
|---|---|---|---|---|---|---|---|
| WhatsApp | Baileys (unofficial) | Reply quotes | Yes | Images, docs, audio | Yes | Yes | Single |
| Telegram | grammY framework | Forum topics | Yes | All types | Yes | Yes, forums | Multi-bot |
| Slack | Bolt framework | Native threads | Yes | Files, snippets | No | Channels, DMs | Multi-workspace |
| Discord | discord.js | Threads | Yes | Files, embeds | No | Guilds, DMs | Multi-guild |
| Signal | signal-cli subprocess | Reply quotes | Yes | Images, files | No | Yes | Single |
| iMessage | BlueBubbles HTTP | Reply quotes | Tapbacks | Images, files | No | Yes | Single |
| Teams | Direct API | Thread replies | Yes | Files | No | Channels, chats | Multi-tenant |
| WebChat | Built-in HTTP/WS | No | No | File upload | No | No | N/A |

Extension channels add Matrix (with full federation), Zalo, IRC (including Twitch), Feishu/Lark, Mattermost, and Nextcloud Talk. The plugin architecture means adding a new channel requires implementing the ChannelPlugin interface—the gateway handles everything else.

Channel Configuration

Each channel is configured independently with fine-grained control:

JSON
{
  "channels": {
    "whatsapp": {
      "enabled": true,
      "allowFrom": ["+1234567890"],
      "dm": { "policy": "pairing" },
      "queueMode": "collect",
      "groups": { "allowFrom": ["*"] }
    },
    "slack": {
      "enabled": true,
      "accounts": {
        "workspace-id": {
          "token": "xoxb-...",
          "allowFrom": ["U01234567"]
        }
      }
    }
  }
}

The queueMode: "collect" setting is worth noting—it batches rapid-fire messages (common on WhatsApp) into a single agent turn rather than triggering separate runs for each message. This prevents the agent from responding to a half-typed thought.
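
Collect mode amounts to a debounced batch: each new message restarts a quiet-window timer, and the agent turn fires only once the sender goes quiet. A minimal sketch, with an illustrative window length and names:

```typescript
// Sketch of "collect" queue mode: rapid messages arriving within a
// quiet window are merged into a single agent turn.

class CollectQueue {
  private buffer: string[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(
    private quietMs: number,
    private onTurn: (batched: string) => void,
  ) {}

  push(message: string): void {
    this.buffer.push(message);
    // Restart the quiet-window timer on every new message.
    if (this.timer) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.quietMs);
  }

  private flush(): void {
    const batched = this.buffer.join("\n");
    this.buffer = [];
    this.timer = null;
    this.onTurn(batched);
  }
}
```

Three WhatsApp messages sent two seconds apart become one batched prompt instead of three separate agent runs.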

The Agent Runtime: Pi Agent Core

The agent runtime is where messages become intelligent responses. OpenClaw uses an embedded instance of Pi Agent Core (@mariozechner/pi-coding-agent) as its AI execution engine, wrapped with OpenClaw-specific tooling, context management, and session handling.

The Agent Loop

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                        AGENT RUNTIME LOOP                               │
│                                                                         │
│  ┌──────────────────┐                                                   │
│  │  1. SESSION       │  Resolve session key → load JSONL transcript     │
│  │     RESOLUTION    │  Acquire write lock                              │
│  └────────┬─────────┘                                                   │
│           │                                                             │
│           ▼                                                             │
│  ┌──────────────────┐                                                   │
│  │  2. CONTEXT       │  Load bootstrap files (SOUL.md, IDENTITY.md,    │
│  │     ASSEMBLY      │  USER.md, MEMORY.md, HEARTBEAT.md)              │
│  │                   │  Inject skills, tool definitions                  │
│  │                   │  Compute token budget, apply compaction           │
│  └────────┬─────────┘                                                   │
│           │                                                             │
│           ▼                                                             │
│  ┌──────────────────┐                                                   │
│  │  3. MODEL         │  Send assembled context to LLM provider          │
│  │     INVOCATION    │  Stream response deltas                          │
│  │                   │  Parse tool calls from response                   │
│  └────────┬─────────┘                                                   │
│           │                                                             │
│           ├───────────── No tool calls? ──► Done (deliver response)     │
│           │                                                             │
│           ▼                                                             │
│  ┌──────────────────┐                                                   │
│  │  4. TOOL          │  Execute requested tools (read, write, exec,    │
│  │     EXECUTION     │  browser, canvas, channel actions)               │
│  │                   │  Apply tool policy (allowlist/denylist)           │
│  │                   │  Sandbox if configured                            │
│  └────────┬─────────┘                                                   │
│           │                                                             │
│           ▼                                                             │
│  ┌──────────────────┐                                                   │
│  │  5. STATE         │  Persist tool results to JSONL                   │
│  │     PERSISTENCE   │  Update session metadata                         │
│  │                   │  Check compaction threshold                       │
│  └────────┬─────────┘                                                   │
│           │                                                             │
│           └───────────── Loop back to step 3                            │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Context Assembly

Context assembly is the most nuanced phase of the agent loop. The system must construct a prompt that fits within the model's context window while including all relevant information.

The assembly process loads in priority order:

  1. System prompt with tool definitions and behavioral rules
  2. Bootstrap files — SOUL.md, IDENTITY.md, USER.md, TOOLS.md (trimmed to 20k characters each)
  3. Memory — MEMORY.md (private sessions only) and today's/yesterday's daily logs
  4. Skills — Active skills loaded from workspace, shared, and bundled directories
  5. HEARTBEAT.md — If this is a heartbeat-triggered turn
  6. Session transcript — The JSONL conversation history

Token budgets are computed dynamically. The system reserves 20,000 tokens by default for the model's response and tool results. If the session transcript exceeds available space, auto-compaction triggers: a silent agent turn writes durable notes to memory files, then older conversation turns are pruned. This ensures the agent never loses critical information—it just moves from working memory (context) to long-term memory (files).
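
The budget arithmetic can be sketched as follows, using a rough characters-per-token heuristic in place of a real tokenizer. Everything here besides the 20,000-token reserve is illustrative:

```typescript
// Sketch of the transcript budget check: keep the newest turns that fit
// after reserving headroom for the model's response and tool results.

interface Turn { role: string; content: string }

const RESERVED_TOKENS = 20_000; // default headroom for response + tool results

function approxTokens(text: string): number {
  return Math.ceil(text.length / 4); // crude heuristic, ~4 chars per token
}

function fitTranscript(turns: Turn[], contextWindow: number, fixedPromptTokens: number): Turn[] {
  let budget = contextWindow - RESERVED_TOKENS - fixedPromptTokens;
  const kept: Turn[] = [];
  // Walk newest-to-oldest; older turns fall off first (in the real
  // system, only after a memory flush has preserved what matters).
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = approxTokens(turns[i].content);
    if (cost > budget) break;
    budget -= cost;
    kept.unshift(turns[i]);
  }
  return kept;
}
```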

Model Invocation

The runtime calls the configured LLM provider through Pi Agent Core's abstraction layer. Response streaming is handled through events:

Code
Gateway receives streaming events:
  { stream: "assistant", delta: "Let me check..." }  ← text chunks
  { stream: "tool", name: "read", input: {...} }     ← tool invocation
  { stream: "tool", result: "file contents..." }     ← tool result
  { phase: "end" }                                    ← turn complete

The streaming architecture means users see responses appearing in real-time on their messaging platform, rather than waiting for the full response to generate.

Tool Execution

When the model requests tool use, the runtime applies the tool policy (checking allowlists, denylists, and group-specific restrictions), determines whether sandboxing applies, executes the tool, and feeds the result back into the conversation for the next model turn.

State Persistence

Every turn—user messages, assistant responses, tool calls, tool results—is immediately appended to the session's JSONL file. This append-only design provides durability (a crash mid-response only loses the current incomplete turn) and streamability (new clients can replay the transcript to catch up on conversation state).
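
The append-and-replay pattern is simple enough to sketch directly. Paths and entry shapes here are illustrative, not OpenClaw's exact schema:

```typescript
// Sketch of append-only JSONL persistence with crash-safe replay.

import { appendFileSync, readFileSync } from "node:fs";

interface SessionEntry { role: string; content?: string; ts: string; [k: string]: unknown }

function appendEntry(path: string, entry: SessionEntry): void {
  // One JSON object per line, written as a single append, so a crash
  // can only corrupt the final line.
  appendFileSync(path, JSON.stringify(entry) + "\n");
}

function replay(path: string): SessionEntry[] {
  const entries: SessionEntry[] = [];
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (!line.trim()) continue;
    try {
      entries.push(JSON.parse(line));
    } catch {
      // A torn final line from a crash mid-write is skipped, not fatal.
    }
  }
  return entries;
}
```

Replay is also how a new client catches up: read the file top to bottom and you have the full conversation state.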

Session and Memory Management

JSONL Session Transcripts

Sessions are stored as line-delimited JSON files where each line represents a single message or event:

JSON
{"role":"user","content":"What's on my calendar today?","ts":"2026-02-16T09:00:00Z","channel":"whatsapp","peer":"alice"}
{"role":"assistant","content":"Let me check your calendar.","ts":"2026-02-16T09:00:01Z"}
{"role":"assistant","tool_calls":[{"name":"exec","input":{"command":"gcalcli today"}}],"ts":"2026-02-16T09:00:01Z"}
{"role":"tool","name":"exec","content":"09:30 Team standup\n14:00 Design review\n16:00 1:1 with Jordan","ts":"2026-02-16T09:00:03Z"}
{"role":"assistant","content":"You have three events today:\n- 09:30 Team standup\n- 14:00 Design review\n- 16:00 1:1 with Jordan","ts":"2026-02-16T09:00:04Z"}

The JSONL format is chosen deliberately: it's append-friendly (no need to parse the entire file to add an entry), crash-safe (partial writes only affect the last line), and human-readable (you can inspect sessions with standard text tools). Session metadata—token usage, update timestamps, channel origin—is tracked separately in sessions.json.

Session Scoping

Session keys determine conversation isolation:

| Scope | Key Format | Use Case |
|---|---|---|
| main | agent:&lt;id&gt;:main | Single shared session across all platforms |
| per-peer | agent:&lt;id&gt;:dm:&lt;peerId&gt; | Separate conversation per person |
| per-channel-peer | agent:&lt;id&gt;:&lt;channel&gt;:dm:&lt;peerId&gt; | Separate per person per platform |
| per-account-channel-peer | agent:&lt;id&gt;:&lt;channel&gt;:&lt;account&gt;:dm:&lt;peerId&gt; | Full isolation including multi-account |

For groups and channels, keys follow a similar pattern: agent:<id>:<channel>:group:<groupId>. Cron jobs and webhooks get their own session namespaces (cron:<jobId>, hook:<uuid>).

Identity links bridge sessions across platforms. If Alice messages from both Telegram and Discord, you can configure an identity link that maps both sender IDs to the same session—so context flows between platforms naturally.
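
A sketch of that mapping, with a config shape assumed purely for illustration:

```typescript
// Sketch of identity linking: per-platform sender handles map to one
// canonical identity so session context can follow a person across
// platforms. The config shape is an assumption.

type IdentityLinks = Record<string, string[]>; // canonical name -> ["channel:peerId", ...]

function canonicalPeer(links: IdentityLinks, channel: string, peerId: string): string {
  const handle = `${channel}:${peerId}`;
  for (const [name, handles] of Object.entries(links)) {
    if (handles.includes(handle)) return name;
  }
  return handle; // unlinked senders keep their per-platform identity
}

const links: IdentityLinks = {
  alice: ["telegram:12345678", "discord:987654321098765432"],
};
```

With this in place, resolving the session key from the canonical identity instead of the raw sender id gives Alice one conversation regardless of which app she opens.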

Memory Architecture

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         MEMORY ARCHITECTURE                             │
│                                                                         │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  WORKING MEMORY (Context Window)                                │   │
│  │                                                                 │   │
│  │  Current session transcript, bootstrap files, active skills     │   │
│  │  Token-limited, auto-compacted when full                       │   │
│  │  ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │   │
│  │  Compaction triggers memory flush (silent agentic turn)        │   │
│  └──────────────────────────────┬──────────────────────────────────┘   │
│                                 │                                      │
│                                 ▼                                      │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  SHORT-TERM MEMORY (Daily Logs)                                 │   │
│  │                                                                 │   │
│  │  memory/2026-02-16.md  ← today (loaded at session start)      │   │
│  │  memory/2026-02-15.md  ← yesterday (loaded at session start)  │   │
│  │  memory/2026-02-14.md  ← older (searchable via vector index)  │   │
│  │                                                                 │   │
│  │  Append-only daily files, auto-created by agent                │   │
│  └──────────────────────────────┬──────────────────────────────────┘   │
│                                 │                                      │
│                                 ▼                                      │
│  ┌─────────────────────────────────────────────────────────────────┐   │
│  │  LONG-TERM MEMORY                                               │   │
│  │                                                                 │   │
│  │  MEMORY.md  ← curated, persistent, loaded in private sessions │   │
│  │                                                                 │   │
│  │  Vector index (SQLite + sqlite-vec)                            │   │
│  │  ├─ Indexes MEMORY.md + all daily logs                        │   │
│  │  ├─ Embeddings via cloud APIs or local models                 │   │
│  │  ├─ Semantic search across full memory corpus                 │   │
│  │  └─ File watcher auto-reindexes on changes                   │   │
│  └─────────────────────────────────────────────────────────────────┘   │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

The three-tier design balances immediacy with durability. Working memory (the context window) gives the agent instant access to the current conversation. Short-term memory (daily logs) provides recent context without consuming the full context window. Long-term memory (MEMORY.md and vector search) enables recall across the agent's entire history.

The automatic memory flush before compaction is critical: when the context window fills up, the agent first writes important information to durable files before older conversation turns are pruned. This prevents information loss during routine context management.

An experimental QMD backend provides local-first search combining BM25 full-text search, vector similarity, and reranking—all without requiring external embedding APIs.
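
One common way to fuse keyword and vector rankings is reciprocal rank fusion (RRF). The sketch below illustrates the general idea, not QMD's actual implementation:

```typescript
// Reciprocal rank fusion: combine multiple ranked lists (e.g. BM25 and
// vector similarity) by summing 1/(k + rank) per document. Shown as an
// illustration of rank fusion in general, not QMD's internals.

function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, rank) => {
      // Documents ranked highly by any retriever accumulate score.
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}
```

A document that places second in both lists can outrank one that places first in only one, which is exactly the behavior you want when neither retriever is fully trusted.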

Skills and Extensions

Skill File Format

Skills are Markdown documents with YAML frontmatter. Here's a representative example:

Markdown
---
name: image-gen
description: Generate or edit images using a multimodal model
metadata:
  openclaw:
    requires:
      env:
        - IMAGE_API_KEY
      bins:
        - uv
    primaryEnv: IMAGE_API_KEY
user-invocable: true
---

## Image Generation

When the user asks you to generate, edit, or modify images, use the
image generation model via the `uv` tool runner.

### Generating a new image

Run the following command, replacing the prompt with the user's request:

```bash
uv run --with image-sdk image-gen.py --prompt "the user's description"
```

### Editing an existing image

If the user provides an image and asks for modifications, include the source
image path:

```bash
uv run --with image-sdk image-gen.py --edit --source /path/to/image.png --prompt "modifications"
```

### Important notes

- Always confirm the image was generated successfully before responding
- If generation fails, check that IMAGE_API_KEY is set correctly
- Large images may take 10-30 seconds to generate

The frontmatter's requires block defines gating rules: the skill only loads if IMAGE_API_KEY is set in the environment and the uv binary is on PATH. If either requirement fails, the skill silently doesn't load—no errors, no broken functionality.
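
The gating check reduces to two lookups. A sketch, with illustrative names and the binary check injected so the logic stays testable:

```typescript
// Sketch of skill gating: a skill is eligible only when all required
// env vars are set and all required binaries resolve on PATH.

interface SkillRequires { env?: string[]; bins?: string[] }

function skillEligible(
  requires: SkillRequires,
  env: Record<string, string | undefined>,
  hasBin: (name: string) => boolean,
): boolean {
  for (const v of requires.env ?? []) {
    if (!env[v]) return false; // missing key: skip silently, no error
  }
  for (const b of requires.bins ?? []) {
    if (!hasBin(b)) return false;
  }
  return true;
}
```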

Skill Loading and Precedence

Skills load from three directories in priority order:

  1. Workspace skills (~/.openclaw/agents/<agentId>/workspace/skills/) — Highest priority, per-agent overrides
  2. Managed skills (~/.openclaw/skills/) — Shared across agents, installed via ClawHub
  3. Bundled skills (shipped with OpenClaw package) — Lowest priority, 54 directories covering 1Password, Bear, Discord, GitHub, and more

When multiple skills share a name, the highest-priority version wins. This layering lets you customize bundled skills without modifying the OpenClaw installation.
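
The precedence rule is a last-writer-wins merge: load the lowest-priority tier first so higher tiers overwrite on a name clash. A sketch with illustrative types:

```typescript
// Sketch of three-tier skill precedence: bundled < managed < workspace.

interface Skill { name: string; source: "bundled" | "managed" | "workspace" }

function mergeSkills(bundled: Skill[], managed: Skill[], workspace: Skill[]): Map<string, Skill> {
  const merged = new Map<string, Skill>();
  // Lowest priority first, so later (higher-priority) entries win.
  for (const skill of [...bundled, ...managed, ...workspace]) {
    merged.set(skill.name, skill);
  }
  return merged;
}
```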

The Extension System

Extensions live in the extensions/ directory and provide additional channel adapters and capabilities. With 37 extension directories, the community has built adapters for Matrix, Zalo, IRC, Feishu, Mattermost, Nextcloud Talk, and more.

Extensions are loaded as plugins at gateway startup. The plugin architecture (src/plugins/) provides hooks for extending the gateway, adding tools, registering channels, and modifying agent behavior.

ClawHub

ClawHub (clawhub.ai) is the central registry for skills and extensions. Installation is a single command:

Bash
clawhub install <skill-slug>
clawhub update --all
clawhub sync --all

The registry handles versioning, dependency resolution, and updates. It creates a network effect similar to npm or VS Code's extension marketplace—but for AI agent capabilities.

Tool Execution and Sandboxing

Built-in Tools

OpenClaw's tool system gives the agent the ability to interact with the world. The core tools (defined in src/agents/pi-tools.ts and agents/bash-tools.ts):

File operations:

  • read — Read file contents (text or binary)
  • write — Create or overwrite files
  • edit — Apply semantic diffs via apply_patch

Shell execution:

  • exec — Run commands with PTY support, optional approval gates
  • process — Background process management
  • cd — Change working directory (per-session state)

Browser automation:

  • browser.action — Click, type, navigate, wait
  • browser.snapshot — Capture screenshots (integrates with vision models)

Canvas:

  • canvas.push — Send A2UI content to the live canvas
  • canvas.reset — Clear canvas
  • canvas.eval — Execute JavaScript on the canvas

Channel actions:

  • send — Route messages to specific channels
  • Platform-specific tools for Discord guild actions, WhatsApp contacts, Slack reactions

Node tools (for connected mobile devices):

  • camera.snap — Take photos
  • screen.record — Record screen
  • location.get — Get device GPS location
  • notify — Send push notifications

Docker Sandboxing

For security-sensitive deployments, OpenClaw supports Docker-based sandboxing for tool execution:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                        EXECUTION SANDBOXING                             │
│                                                                         │
│  ┌─────────────────────────────────┐  ┌─────────────────────────────┐  │
│  │       TRUSTED ZONE              │  │     SANDBOXED ZONE          │  │
│  │       (Host)                    │  │     (Docker Container)      │  │
│  │                                 │  │                             │  │
│  │  • Gateway process              │  │  • Minimal Debian base     │  │
│  │  • Agent runtime                │  │  • Non-root user           │  │
│  │  • Session state                │  │  • Tool execution          │  │
│  │  • Memory files                 │  │  • Browser (optional)      │  │
│  │  • Configuration                │  │  • Ephemeral filesystem    │  │
│  │                                 │  │                             │  │
│  │  Controls what enters           │  │  Workspace access:         │  │
│  │  the sandbox via:               │  │  • none (sandbox-rooted)   │  │
│  │  • Tool policy                  │  │  • ro (read-only)          │  │
│  │  • Approval gates               │  │  • rw (read-write)         │  │
│  │  • Bind mounts                  │  │                             │  │
│  └─────────────────────────────────┘  └─────────────────────────────┘  │
│                                                                         │
│  Sandbox configuration:                                                 │
│  • mode: off | non-main | all                                          │
│  • scope: session | agent | shared                                     │
│  • workspaceAccess: none | ro | rw                                     │
│  • docker.binds: ["/src:/src:ro"]                                      │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

The sandbox uses a dedicated Dockerfile.sandbox that builds a minimal Debian image with a non-root sandbox user. The mode setting controls when sandboxing applies: non-main (default) sandboxes non-main sessions while allowing the primary session to execute on the host, all sandboxes everything, and off disables sandboxing entirely.

The scope setting controls container lifecycle: session creates a fresh container per session (most isolated), agent shares one container across all sessions for an agent, and shared uses a single container for everything (least isolated but most efficient).
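
Putting those settings together, a sandbox configuration might look like this (key names follow the diagram above; the exact schema may differ):

JSON
{
  "sandbox": {
    "mode": "non-main",
    "scope": "session",
    "workspaceAccess": "ro",
    "docker": {
      "binds": ["/src:/src:ro"]
    }
  }
}

This sandboxes every non-main session in its own fresh container, with the workspace mounted read-only and one extra read-only bind mount.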

Tool policy (agents/pi-tools.policy.ts) provides an additional layer of control:

JSON
{
  "agents": {
    "list": [{
      "id": "main",
      "tools": {
        "allowlist": ["read", "write", "edit", "exec", "browser.*"],
        "groupPolicy": {
          "allowlist": ["read", "send"]
        }
      }
    }]
  }
}

This configuration lets the agent read, write, edit, execute commands, and use the browser in DM conversations, but restricts it to read-only and message-sending in group chats—a sensible default that prevents the agent from running commands on behalf of group members.
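
Enforcement comes down to matching tool names against the active allowlist, including trailing-wildcard entries like browser.*. A sketch with illustrative names:

```typescript
// Sketch of allowlist matching with trailing-wildcard entries, as in
// the policy example above.

function toolAllowed(tool: string, allowlist: string[]): boolean {
  return allowlist.some((pattern) =>
    pattern.endsWith(".*")
      ? tool.startsWith(pattern.slice(0, -1)) // "browser.*" matches "browser.action"
      : tool === pattern,
  );
}

// Group chats get the tighter groupPolicy allowlist instead of the DM one.
function effectiveAllowlist(isGroup: boolean, dmList: string[], groupList: string[]): string[] {
  return isGroup ? groupList : dmList;
}
```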

Canvas: Live Agent-Controlled Rendering

Canvas provides the agent with a visual output surface beyond text. The Canvas host (src/canvas-host/server.ts) runs as an HTTP server on port 18793, serving a static HTML page that maintains a WebSocket connection for real-time updates.

The agent controls Canvas through four tools:

  • canvas.push — Sends an A2UI (Agent-to-UI) JSON payload that renders as interactive UI. A2UI supports forms, buttons, text, images, charts, and custom layouts.
  • canvas.reset — Clears the canvas to a blank state.
  • canvas.eval — Executes arbitrary JavaScript on the canvas page, enabling dynamic behavior beyond what A2UI templates support.
  • canvas.snapshot — Captures the current canvas state as an image, useful for the agent to verify its own rendering or for sharing with users.

Canvas clients include the macOS app (sidebar or fullscreen view), iOS and Android native apps, and any browser pointed at the canvas host URL. Updates stream in real-time over WebSocket, so changes appear instantly.

Use cases range from practical (live dashboards, data visualization, interactive forms for structured input) to creative (game boards, drawing surfaces, collaborative whiteboards). Canvas transforms OpenClaw from a text-only assistant into something that can present rich, interactive interfaces.

Model Integration and Failover

Multi-Provider Architecture

OpenClaw supports over a dozen LLM providers through a unified abstraction layer. Model selection is configurable per-agent, and the system supports failover chains for reliability.

| Provider Type | Auth | Streaming | Key Feature |
| --- | --- | --- | --- |
| Cloud LLM APIs (multiple major providers) | API key, OAuth | Yes | Primary usage, thinking/reasoning support |
| Enterprise cloud (managed model hosting) | IAM credentials | Yes | Enterprise integration, compliance |
| Model routers (multi-provider proxies) | API key | Yes | Access to 50+ models through one key |
| Open-source model hosts | API key | Yes | Cost-effective open models |
| Local model servers (Ollama, etc.) | HTTP endpoint | Yes | Fully offline operation, maximum privacy |
| Universal proxy layers | API key | Yes | Route to 50+ providers through one interface |
| Edge deployment providers | Platform token | Yes | Low-latency edge inference |
| Web-grounded providers | API key | Yes | Responses grounded in live web data |

Auth Profile Rotation

OpenClaw supports multiple API keys per provider with intelligent rotation (src/agents/auth-profiles.ts). Keys are load-balanced round-robin during normal operation. When a key hits a rate limit or returns an error, it enters a cooldown period with exponential backoff. The system automatically rotates to the next available key.

For providers that support OAuth, token refresh is handled automatically—tokens are refreshed before expiry, with graceful fallback to other keys during the refresh window.

Failover Chains

Model failover ensures the agent stays responsive even when a primary provider has issues:

JSON
{
  "agents": {
    "list": [{
      "id": "main",
      "model": {
        "primary": "provider-a/model-large",
        "fallbacks": ["provider-a/model-medium", "provider-b/model-large"]
      }
    }]
  }
}

If the primary model returns an error (rate limit, server error, timeout), the runtime automatically retries with the next model in the fallback chain. This happens transparently—the user sees a response, potentially from a different model, without knowing about the failover.
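The failover loop can be approximated like this. It's a sketch: completeWithFailover and the error handling are stand-ins, not OpenClaw's actual runtime API:

```typescript
// Sketch of a failover chain: try the primary, then each fallback in order.
type ModelCall = (model: string) => Promise<string>;

async function completeWithFailover(
  models: string[],  // primary first, then fallbacks
  call: ModelCall,
): Promise<{ model: string; text: string }> {
  let lastError: unknown;
  for (const model of models) {
    try {
      // A rate limit, server error, or timeout surfaces as a thrown error here.
      return { model, text: await call(model) };
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next model
    }
  }
  throw lastError ?? new Error("no models configured");
}
```

The real runtime adds streaming and retry budgets on top; the sketch only shows the ordering guarantee that keeps the user-visible response flowing.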

Provider-specific streaming parameters are handled through extra-params.ts, which maps the unified configuration to each provider's API format—token budgets, streaming completion tokens, and content deltas are all translated automatically.
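The shape of that translation can be sketched as a per-provider mapping function. Provider names and parameter fields below are hypothetical, chosen only to show the pattern:

```typescript
// Sketch of unified-to-provider parameter translation.
interface UnifiedParams {
  maxTokens: number;
  thinkingBudget?: number; // optional reasoning-token budget
}

function toProviderParams(provider: string, p: UnifiedParams): Record<string, unknown> {
  switch (provider) {
    case "provider-a":
      // A provider that accepts an explicit thinking budget.
      return {
        max_tokens: p.maxTokens,
        thinking: p.thinkingBudget ? { budget_tokens: p.thinkingBudget } : undefined,
      };
    case "provider-b":
      // A provider that names the same limit differently.
      return { max_completion_tokens: p.maxTokens };
    default:
      return { max_tokens: p.maxTokens };
  }
}
```

Centralizing the mapping means agents configure models once, in one vocabulary, regardless of which provider ends up serving the request.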

Security Model

OpenClaw's security is designed in layers, each addressing a different threat vector.

Defense Layers

Layer 1: DM Pairing (Channel-level)

New senders must be explicitly approved before they can interact with the agent. In pairing mode (the default), unknown senders receive a pairing code that the owner must approve through the CLI or native app. This prevents random contacts from consuming API credits or accessing the agent's capabilities.
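The pairing flow reduces to a small state machine, sketched below with hypothetical function names (the actual implementation lives in the channel layer and persists its state):

```typescript
// Sketch of DM pairing: unknown senders get a code; the owner approves it out of band.
const pending = new Map<string, string>(); // senderId -> pairing code
const approved = new Set<string>();

function onInboundDm(senderId: string): { allowed: boolean; code?: string } {
  if (approved.has(senderId)) return { allowed: true };
  let code = pending.get(senderId);
  if (!code) {
    code = Math.random().toString(36).slice(2, 8).toUpperCase(); // 6-char code
    pending.set(senderId, code);
  }
  return { allowed: false, code }; // the message is held until the owner approves
}

function approve(code: string): boolean {
  for (const [sender, c] of pending) {
    if (c === code) {
      pending.delete(sender);
      approved.add(sender);
      return true;
    }
  }
  return false;
}
```

The owner sees the code through the CLI or native app and approves it there; until then the sender's messages never reach the agent.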

Layer 2: Channel Allowlists (Session-level)

Per-channel allowFrom lists restrict which users or groups can reach the agent. Combined with per-agent routing, this enables multi-tenant deployments where different agents serve different audiences.
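In configuration terms this looks roughly like the following. The channel name and exact key placement are illustrative:

```json
{
  "channels": {
    "telegram": {
      "allowFrom": ["@owner", "@trusted-group"]
    }
  }
}
```

Anyone not on the list is dropped at the channel boundary, before the message ever reaches an agent.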

Layer 3: Tool Policy (Agent-level)

Allowlists and denylists control which tools each agent can use, with separate policies for DM versus group contexts. A conservative default: full tool access in DMs, read-only in groups.

Layer 4: Sandboxing (Execution-level)

Docker containers isolate tool execution from the host system. Configurable workspace access (none, read-only, read-write) limits what sandboxed tools can see.

Layer 5: Approval Gates (User interaction)

Elevated tool executions (like running shell commands) can require explicit user approval before proceeding. The approval manager tracks pending requests and supports approval from any connected client.
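The approval manager's core contract can be sketched as a promise per pending request. Class and method names are illustrative, not the actual API:

```typescript
// Sketch of an approval gate: elevated tool calls block until an explicit decision.
type Decision = "approve" | "deny";

class ApprovalManager {
  private pending = new Map<string, (d: Decision) => void>();

  // Called by the runtime before an elevated tool executes; resolves when decided.
  request(id: string): Promise<Decision> {
    return new Promise((resolve) => this.pending.set(id, resolve));
  }

  // Called from any connected client (CLI, native app) to resolve the request.
  decide(id: string, d: Decision): boolean {
    const resolve = this.pending.get(id);
    if (!resolve) return false; // unknown or already-decided request
    this.pending.delete(id);
    resolve(d);
    return true;
  }
}
```

Because the decision arrives over the gateway, any connected client can answer; the tool call simply awaits the promise and proceeds or aborts.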

Security Comparison

| Aspect | OpenClaw (Self-Hosted) | Cloud AI Assistants |
| --- | --- | --- |
| Data location | Your hardware, your network | Provider's servers |
| Message transit | Direct to messaging APIs | Through provider's infrastructure |
| Memory storage | Local Markdown files | Cloud databases (opaque) |
| Execution control | Full (sandbox, policy, approval) | Provider-determined |
| Audit trail | Full JSONL transcripts on disk | Provider logs (limited access) |
| Model provider | Your choice, changeable anytime | Locked to provider |
| Access control | Pairing + allowlists + tool policy | Account-based |
| Code inspection | Full source available | Closed source |

The threat model documentation (docs/security/THREAT-MODEL-ATLAS.md) covers additional attack vectors: inbound DM injection (where a malicious sender tries to manipulate the agent through crafted messages), channel-based privilege escalation (exploiting differences between DM and group policies), and tool misuse (the agent being tricked into harmful actions). The layered defense approach means compromising any single layer doesn't grant full access.

Sources

Dependencies

  • Baileys — WhatsApp Web API library
  • grammY — Telegram Bot framework
  • BlueBubbles — iMessage bridge
  • sqlite-vec — SQLite vector search extension used for memory indexing

Ecosystem

  • ClawHub — Skills and extensions marketplace

Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
