
Container Architecture

Overview

Aegis uses Docker Compose for orchestrating containerized services. All containers run on the Aegis LXC host and communicate via Docker networks.

Container Inventory

Running Containers

| Container | Image | Status | Purpose |
|---|---|---|---|
| aegis-dashboard | aegis-core-dashboard | Up 8 hours | Main web interface and API |
| aegis-scheduler | aegis-core-scheduler | Up 7 days | Cron jobs and background tasks |
| aegis-docs | squidfunk/mkdocs-material | Up 26 minutes | Documentation site |
| aegis-playwright | ghcr.io/vlazic/playwright-screenshot-api | Up 7 days | Browser automation |
| traefik | traefik:v2.11 | Up 29 minutes | Reverse proxy and TLS |
| falkordb | falkordb/falkordb | Up 7 days | Knowledge graph database |
| open-notebook | lfnovo/open_notebook:v1-latest-single | Up (restart: always) | Research tool |
| code-server | lscr.io/linuxserver/code-server | Up (restart: unless-stopped) | VS Code web editor |

Docker Compose Stacks

Primary Stack: aegis-core

Location: /home/agent/projects/aegis-core/docker-compose.yml

Dashboard Service

Image: aegis-core-dashboard (built from Dockerfile)
Base: Python 3.11-slim
Exposed Ports: 8080:8080

Key Features:
- FastAPI web server
- Stripe integration (revenue)
- WhatsApp command center (Vonage)
- Email digests (Resend)
- Claude Code OAuth integration

Environment Variables:

# Database
POSTGRES_HOST: host.docker.internal
POSTGRES_PORT: 5432
POSTGRES_USER: agent
POSTGRES_DB: aegis

# LLM Backends
ZAI_API_KEY: ${ZAI_API_KEY}
ZAI_BASE_URL: https://api.z.ai/api/anthropic
CLAUDE_CODE_OAUTH_TOKEN: ${CLAUDE_CODE_OAUTH_TOKEN}

# Knowledge Graph
FALKORDB_HOST: host.docker.internal
FALKORDB_PORT: 6379
OLLAMA_BASE_URL: http://host.docker.internal:11434

# Integrations
STRIPE_SECRET_KEY: ${STRIPE_SECRET_KEY}
VONAGE_API_KEY: ${VONAGE_API_KEY}
RESEND_API_KEY: ${RESEND_API_KEY}
PERPLEXITY_API_KEY: ${PERPLEXITY_API_KEY}

Volumes:

- /var/run/docker.sock:/var/run/docker.sock:ro  # Docker management
- /home/agent/memory:/home/agent/memory          # Persistent memory
- /home/agent/downloads:/home/agent/downloads    # File storage
- /home/agent/.secure:/home/agent/.secure:ro     # Credentials

Health Check:

python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"

Traefik Labels:

- "traefik.enable=true"
- "traefik.http.routers.aegis.rule=Host(`aegisagent.ai`)"
- "traefik.http.routers.aegis.entrypoints=websecure"
- "traefik.http.routers.aegis.tls.certresolver=cf"
- "traefik.http.services.aegis.loadbalancer.server.port=8080"

Scheduler Service

Image: aegis-core-scheduler (built from Dockerfile)
Base: Python 3.11-slim
Command: Runs the scheduler daemon in an infinite loop

Purpose:
- Cron job execution (APScheduler)
- Background task processing
- Periodic health checks
- Morning/evening routines

Schedule Examples:

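from apscheduler.schedulers.asyncio import AsyncIOScheduler

# Pin the timezone: otherwise APScheduler uses the container's local zone,
# and the "UTC" times below would silently shift if TZ changes
scheduler = AsyncIOScheduler(timezone="UTC")

# Every 5 minutes: Check system health
@scheduler.scheduled_job('interval', minutes=5)
async def health_check():
    ...

# Daily 06:00 UTC: Morning routine
@scheduler.scheduled_job('cron', hour=6, minute=0)
async def morning_routine():
    ...

# Daily 03:00 UTC: Ingest transcripts to knowledge graph
@scheduler.scheduled_job('cron', hour=3, minute=0)
async def ingest_transcripts():
    ...

# scheduler.start() is then called once the asyncio event loop is running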

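Dependencies:
- Dashboard (must start first)
- Playwright (for screenshots)

A plain depends_on entry only orders container startup. To wait until the dashboard is actually healthy, the dependency needs condition: service_healthy.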

Playwright Service

Image: ghcr.io/vlazic/playwright-screenshot-api:latest
Purpose: Headless browser for visual monitoring and screenshots
Exposed Ports: 3002:3000

API Endpoints:
- GET /health - Health check
- POST /screenshot - Capture website screenshot

Example POST /screenshot request body:

{
  "url": "https://example.com",
  "fullPage": true,
  "format": "png"
}
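The following is a minimal client sketch for the screenshot endpoint that uses only the Python standard library. It assumes the service is reached from the host on the mapped port 3002 (from another container on traefik_proxy it would be http://aegis-playwright:3000). It also assumes that a successful response body is the raw image bytes. The response format and the helper names build_screenshot_request and capture are assumptions, not taken from the service's documentation.

```python
import json
import urllib.request

# Host-mapped port from the compose file (3002 -> container port 3000)
PLAYWRIGHT_URL = "http://localhost:3002"


def build_screenshot_request(url: str, full_page: bool = True,
                             fmt: str = "png") -> urllib.request.Request:
    """Build a POST /screenshot request carrying the JSON body shown above."""
    body = json.dumps({"url": url, "fullPage": full_page, "format": fmt}).encode("utf-8")
    return urllib.request.Request(
        f"{PLAYWRIGHT_URL}/screenshot",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def capture(url: str, out_path: str, timeout: float = 60.0) -> None:
    """Send the request and write the response body (assumed to be image bytes) to disk."""
    req = build_screenshot_request(url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        with open(out_path, "wb") as fh:
            fh.write(resp.read())
```

Example usage: capture("https://example.com", "/home/agent/downloads/example.png").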

Use Cases:
- Visual regression testing
- Competitor page monitoring
- Proof-of-render for debugging

FalkorDB Service

Image: falkordb/falkordb:latest
Purpose: Redis-compatible graph database for the Graphiti knowledge graph
Exposed Ports:
- 6379:6379 (Redis protocol)
- 3001:3000 (Browser UI)

Command:

redis-server --loadmodule /var/lib/falkordb/bin/falkordb.so TIMEOUT 30000

Configuration:
- Query timeout: 30 seconds (TIMEOUT 30000, in milliseconds)
- Max memory: no Redis maxmemory limit configured

Volume: falkordb_data:/data (persistent storage)

Health Check:

redis-cli ping

Browser UI: http://localhost:3001
- Query the graph with Cypher
- Visualize relationships
- Inspect entity properties
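FalkorDB speaks the Redis protocol, so a Cypher query is simply the command GRAPH.QUERY <graph> <cypher>. The sketch below encodes that command as a RESP array using only the standard library and returns the raw, unparsed reply. In practice a client such as redis-py or the falkordb package would handle reply parsing. The graph name "aegis" is a hypothetical placeholder and not necessarily the name Graphiti uses.

```python
import socket


def encode_resp(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings (lengths are byte counts)."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(parts)


def graph_query_raw(cypher: str, graph: str = "aegis",  # hypothetical graph name
                    host: str = "localhost", port: int = 6379) -> bytes:
    """Send GRAPH.QUERY and return the first chunk of the raw RESP reply (unparsed)."""
    with socket.create_connection((host, port), timeout=30) as sock:
        sock.sendall(encode_resp("GRAPH.QUERY", graph, cypher))
        return sock.recv(65536)
```

Example usage: graph_query_raw("MATCH (n) RETURN count(n)"). Large replies may arrive in several chunks, which is one reason a real client library is preferable.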


Traefik Stack

Location: /home/agent/stacks/traefik/docker-compose.yml

Traefik Service

Image: traefik:v2.11
Purpose: Reverse proxy with automatic TLS
Exposed Ports:
- 80:80 (HTTP, redirects to HTTPS)
- 443:443 (HTTPS)
- 8081:8080 (Dashboard)

Command Line Args:

# API Dashboard
- "--api.dashboard=true"
- "--api.insecure=true"  # Dashboard/API on container port 8080 (host 8081), no auth

# Docker provider (reads container labels)
- "--providers.docker=true"
- "--providers.docker.exposedbydefault=false"
- "--providers.docker.network=traefik_proxy"

# File provider (dynamic config written by hand)
- "--providers.file.directory=/etc/traefik/dynamic"
- "--providers.file.watch=true"

# Entrypoints
- "--entrypoints.web.address=:80"
- "--entrypoints.web.http.redirections.entrypoint.to=websecure"
- "--entrypoints.websecure.address=:443"

# Let's Encrypt (DNS challenge via Cloudflare)
- "--certificatesresolvers.cf.acme.dnschallenge=true"
- "--certificatesresolvers.cf.acme.dnschallenge.provider=cloudflare"
- "--certificatesresolvers.cf.acme.email=aegis@richardbankole.com"
- "--certificatesresolvers.cf.acme.storage=/letsencrypt/acme.json"

# Logging
- "--log.level=INFO"

Environment Variables:

CF_DNS_API_TOKEN: ${CF_DNS_API_TOKEN}  # Cloudflare API token for DNS-01

Volumes:

- /var/run/docker.sock:/var/run/docker.sock:ro  # Read Docker state
- ./letsencrypt:/letsencrypt                    # Certificate storage
- ./dynamic:/etc/traefik/dynamic:ro             # File-provider routes

Network: traefik_proxy (external, shared with all services)

Dashboard Access: http://traefik.rbnk.uk (internal only)


Open Notebook Stack

Location: /home/agent/stacks/open-notebook/docker-compose.yml

Open Notebook Service

Image: lfnovo/open_notebook:v1-latest-single
Purpose: AI-powered research tool (Streamlit UI + FastAPI backend)
Exposed Ports:
- 8502:8502 (Streamlit UI)
- 5055:5055 (FastAPI backend)

Environment Variables:

# LLM Backend (Z.ai via OpenAI-compatible endpoint)
OPENAI_COMPATIBLE_BASE_URL: https://api.z.ai/v1
OPENAI_COMPATIBLE_API_KEY: ${ZAI_API_KEY}

# Internal API URL
API_URL: http://localhost:5055

# SurrealDB (embedded database)
SURREAL_URL: file:///mydata
SURREAL_USER: root
SURREAL_PASSWORD: ${SURREAL_PASSWORD}
SURREAL_NAMESPACE: notebook
SURREAL_DATABASE: research

Volumes:

- ./notebook_data:/app/data    # Research artifacts
- ./surreal_data:/mydata       # Embedded database

Traefik Labels:

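# Required because the Docker provider uses exposedbydefault=false
- "traefik.enable=true"

# Streamlit UI
- "traefik.http.routers.notebooks-aegis.rule=Host(`notebooks.aegisagent.ai`)"
- "traefik.http.routers.notebooks-aegis.entrypoints=websecure"
- "traefik.http.routers.notebooks-aegis.tls.certresolver=cf"
- "traefik.http.routers.notebooks-aegis.service=notebooks-aegis"
- "traefik.http.services.notebooks-aegis.loadbalancer.server.port=8502"

# FastAPI backend
- "traefik.http.routers.notebooks-api-aegis.rule=Host(`api.notebooks.aegisagent.ai`)"
- "traefik.http.routers.notebooks-api-aegis.entrypoints=websecure"
- "traefik.http.routers.notebooks-api-aegis.tls.certresolver=cf"
- "traefik.http.routers.notebooks-api-aegis.service=notebooks-api-aegis"
- "traefik.http.services.notebooks-api-aegis.loadbalancer.server.port=5055"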
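Open Notebook exposes more than one Traefik service from a single container. In that case each router must name its service explicitly, and because the Docker provider runs with exposedbydefault=false, every routed container also needs traefik.enable=true. A small generator keeps these details consistent across stacks. The sketch below shows one way to write it. The functions traefik_labels and container_labels are hypothetical conveniences, not part of Traefik or the current stacks. The entrypoint name websecure and resolver name cf come from the Traefik configuration in this document.

```python
def traefik_labels(name: str, host: str, port: int,
                   entrypoint: str = "websecure", resolver: str = "cf") -> list[str]:
    """Return Traefik v2 Docker labels for one router/service pair called `name`."""
    router = f"traefik.http.routers.{name}"
    service = f"traefik.http.services.{name}"
    return [
        f"{router}.rule=Host(`{host}`)",
        f"{router}.entrypoints={entrypoint}",
        f"{router}.tls.certresolver={resolver}",
        # Explicit router -> service link; needed once a container defines several services
        f"{router}.service={name}",
        f"{service}.loadbalancer.server.port={port}",
    ]


def container_labels(*routers: list[str]) -> list[str]:
    """Combine several routers' labels and opt the container in to Traefik."""
    labels = ["traefik.enable=true"]
    for group in routers:
        labels.extend(group)
    return labels


# Example: the two Open Notebook routers
notebook_labels = container_labels(
    traefik_labels("notebooks-aegis", "notebooks.aegisagent.ai", 8502),
    traefik_labels("notebooks-api-aegis", "api.notebooks.aegisagent.ai", 5055),
)
```

Each string can be emitted as one "- ..." entry under the service's labels key.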

Use Cases:
- Long-form research synthesis
- Source aggregation and citation
- Multi-step research workflows
- Knowledge base construction


Code Server Stack

Location: /home/agent/stacks/code-server/docker-compose.yml

Code Server Service

Image: lscr.io/linuxserver/code-server:latest
Purpose: Web-based VS Code for remote development
Exposed Ports: 8443:8443

Environment Variables:

PUID: 1000              # Run as user 'agent'
PGID: 1000              # Run as group 'agent'
TZ: UTC
PASSWORD: ${PASSWORD}   # Web UI password
SUDO_PASSWORD: ${SUDO_PASSWORD}
DEFAULT_WORKSPACE: /home/agent

Volumes:

- /home/agent:/home/agent                    # Full filesystem access
- ./config:/config                           # VS Code settings
- ./config/custom-cont-init.d:/custom-cont-init.d:ro  # Init scripts

Traefik Labels:

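- "traefik.enable=true"
- "traefik.http.routers.code.rule=Host(`code.aegisagent.ai`)"
- "traefik.http.routers.code.entrypoints=websecure"
- "traefik.http.routers.code.tls.certresolver=cf"
- "traefik.http.services.code.loadbalancer.server.port=8443"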

Security Considerations:
- Password-protected (stored in ~/.secure/)
- HTTPS-only (via Traefik TLS)
- Full filesystem access to /home/agent, including ~/.secure (use with caution)


Documentation Stack

Location: /home/agent/stacks/aegis-docs/docker-compose.yml

MkDocs Service

Image: squidfunk/mkdocs-material:latest
Purpose: Live documentation server
Exposed Ports: 8000:8000

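Command (effective command line; the image's entrypoint is already mkdocs, so the compose command: only needs serve --dev-addr 0.0.0.0:8000):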

mkdocs serve --dev-addr 0.0.0.0:8000

Volumes:

- /home/agent/projects/aegis-core/docs:/docs  # Documentation source
- /home/agent/projects/aegis-core/mkdocs.yml:/mkdocs.yml  # Config

Access: http://localhost:8000 (internal only)

Build Pipeline:

# Development
docker compose up -d

# Production build
mkdocs build -d site/


Docker Networks

traefik_proxy

Driver: bridge
Subnet: 172.19.0.0/16

Purpose: Shared ingress network for all public services

Connected Services:
- traefik
- aegis-dashboard
- aegis-scheduler
- aegis-playwright
- falkordb
- open-notebook
- code-server

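Why External?

The network is declared external so that it exists independently of any single Compose project. Each stack (aegis-core, traefik, open-notebook, code-server) joins the same network, and Compose never creates it on up or removes it on down. The trade-off is a one-time manual creation step: Compose refuses to start a stack whose external network does not exist.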

# Create network before starting services
docker network create traefik_proxy

# All services join this network
networks:
  traefik_proxy:
    external: true
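The one-time setup step can be made safe to re-run, for example from a bootstrap script, by creating the network idempotently. The sketch below calls the Docker CLI through subprocess. The function name ensure_network is hypothetical. The run parameter is injectable so the logic can be exercised without a Docker daemon.

```python
import subprocess
from typing import Callable


def ensure_network(name: str = "traefik_proxy",
                   run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> bool:
    """Create the bridge network if it is missing. Returns True if it was created."""
    probe = run(["docker", "network", "inspect", name], capture_output=True)
    if probe.returncode == 0:
        return False  # network already exists; nothing to do
    run(["docker", "network", "create", "--driver", "bridge", name], check=True)
    return True
```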

Default Bridge (docker0)

Subnet: 172.17.0.0/16
Purpose: Default Docker network (not used by Aegis services)


Container Lifecycle

Build Process

Dashboard/Scheduler Images:

cd /home/agent/projects/aegis-core
docker compose build dashboard
docker compose build scheduler

Build Time: ~3 minutes (with cached layers)

Base Layers:
1. Python 3.11-slim
2. System dependencies (curl, git, postgresql-client)
3. Python packages (requirements.txt)
4. Application code

Dockerfile Optimizations:
- Multi-stage build (future optimization)
- Layer caching for dependencies
- .dockerignore excludes .git, __pycache__, .venv

Deployment Workflow

Manual Deployment:

cd /home/agent/projects/aegis-core
docker compose build              # build first; the running containers keep serving
docker compose up -d              # recreates only containers whose image or config changed
docker compose logs -f dashboard

Health Verification:

docker ps  # Check all containers running
curl http://localhost:8080/health  # Check dashboard
docker logs aegis-dashboard --tail 50  # Check for errors
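Right after docker compose up -d, the dashboard may still be inside its 60-second health-check start period, so a single curl can fail spuriously. The polling sketch below retries the same /health endpoint until it answers with a 2xx status or a deadline passes. The function name and the deadline and interval values are illustrative assumptions.

```python
import http.client
import time
import urllib.request


def wait_for_health(url: str = "http://localhost:8080/health",
                    deadline_s: float = 90.0, interval_s: float = 3.0) -> bool:
    """Poll `url` until it returns HTTP 2xx or `deadline_s` seconds elapse."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # refused, timed out, or HTTP error status: not ready yet
        time.sleep(interval_s)
    return False
```

A deploy script can exit non-zero when wait_for_health() returns False and then proceed to the rollback steps.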

Rollback:

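# Before deploying: keep a copy of the currently running image
docker tag aegis-core-dashboard:latest aegis-core-dashboard:backup

# To roll back: point :latest at the backup and recreate the container from it
docker tag aegis-core-dashboard:backup aegis-core-dashboard:latest
docker compose up -d --no-build --force-recreate dashboard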

Resource Limits

Dashboard:
- CPU: 4 cores (usage above this is throttled, not killed)
- Memory: 4GB (hard limit)
- Swap: 2GB

Scheduler:
- CPU: 2 cores (usage above this is throttled, not killed)
- Memory: 2GB (hard limit)
- Swap: 1GB

FalkorDB:
- CPU: 4 cores (usage above this is throttled, not killed)
- Memory: 8GB (hard limit)
- No swap (Redis best practice)

Configuration (docker-compose.yml):

services:
  dashboard:
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4G
        reservations:
          cpus: '2.0'
          memory: 2G
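The Swap figures above are not expressed through deploy.resources. In Compose, swap is controlled by the service-level memswap_limit key, which sets memory plus swap combined. Setting it equal to the memory limit disables swap. The snippet shown here does not include memswap_limit, so the helper below is only a sketch of how the stated limits would translate. The function names are hypothetical.

```python
# Compose-style size suffixes (binary multiples)
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def to_bytes(size: str) -> int:
    """Parse a size such as '4G', '512m' or '0' into bytes."""
    size = size.strip().lower().rstrip("b")
    if size[-1] in UNITS:
        return int(float(size[:-1]) * UNITS[size[-1]])
    return int(size)


def memswap_limit(memory: str, swap: str) -> int:
    """memswap_limit = memory + swap; equal to memory means the container gets no swap."""
    return to_bytes(memory) + to_bytes(swap)


# Limits from the Resource Limits list above
LIMITS = {"dashboard": ("4G", "2G"), "scheduler": ("2G", "1G"), "falkordb": ("8G", "0")}
for service, (mem, swap) in LIMITS.items():
    print(f"{service}: memory={mem} memswap_limit={memswap_limit(mem, swap) // 1024**3}g")
```

For the dashboard this yields memswap_limit of 6g (4G memory plus 2G swap). For FalkorDB it yields 8g, equal to its memory limit, which matches the no-swap rule.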


Container Monitoring

Health Checks

Dashboard:

healthcheck:
  test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 60s

FalkorDB:

healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 30s
  timeout: 10s
  retries: 3

Playwright:

healthcheck:
  test: ["CMD", "wget", "-q", "-O", "-", "http://localhost:3000/health"]
  interval: 30s
  timeout: 10s
  retries: 3
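The interval, retries and start_period settings interact in a specific way. Failures during the start period are not counted unless a check has already succeeded. After that, the container is marked unhealthy only after retries consecutive counted failures. The simplified model below captures that rule. It ignores timeout and real clock time, treating probe i as running at i × interval. It helps to reason about how long a broken dashboard takes to be flagged. The function is illustrative, not Docker's implementation.

```python
def health_status(results: list[bool], interval_s: int = 30,
                  retries: int = 3, start_period_s: int = 0) -> str:
    """Replay probe results (True = pass) and return the resulting health status."""
    status = "starting"
    streak = 0          # consecutive failures that count toward `retries`
    started = False     # set by the first successful probe
    for i, ok in enumerate(results, start=1):
        elapsed = i * interval_s  # simplified timing: probe i runs at i * interval
        if ok:
            status, streak, started = "healthy", 0, True
            continue
        if elapsed <= start_period_s and not started:
            continue  # grace period: failure is not counted
        streak += 1
        if streak >= retries:
            status = "unhealthy"
    return status
```

With the dashboard settings (interval 30s, retries 3, start_period 60s), a container that never passes is flagged at the fifth probe: the first two failures fall inside the grace period.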

Restart Policies

| Service | Policy | Reason |
|---|---|---|
| Dashboard | unless-stopped | Core service, manual intervention for stops |
| Scheduler | unless-stopped | Background jobs, should always run |
| Traefik | unless-stopped | Ingress, critical for access |
| FalkorDB | unless-stopped | Stateful; restarts automatically after a crash, but stays down after a manual stop so it can be inspected |
| Open Notebook | always | User-facing, maximize availability |
| Code Server | unless-stopped | Dev tool, manual stop intentional |

Logging

Log Driver: json-file (default)
Log Rotation:
- Max size: 10MB per file
- Max files: 3
- Total per container: 30MB

Configuration:

logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"

View Logs:

# Real-time
docker logs -f aegis-dashboard

# Last 100 lines
docker logs --tail 100 aegis-dashboard

# With timestamps
docker logs -t aegis-dashboard

# Filter by time
docker logs --since 2024-01-01T00:00:00 aegis-dashboard
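With the json-file driver, each line that docker logs shows is stored on the host as one JSON object with log, stream and time fields. The file is /var/lib/docker/containers/<id>/<id>-json.log and is readable by root. This makes ad-hoc filtering easy when docker logs flags are not enough. The sketch below keeps only stderr entries containing a keyword. The function name and filter criteria are illustrative.

```python
import json
from typing import Iterable, Iterator


def filter_log_lines(lines: Iterable[str], stream: str = "stderr",
                     needle: str = "") -> Iterator[tuple[str, str]]:
    """Yield (timestamp, message) for json-file entries on `stream` containing `needle`."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        entry = json.loads(raw)
        if entry.get("stream") == stream and needle in entry.get("log", ""):
            yield entry["time"], entry["log"].rstrip("\n")
```

Example usage: open the container's -json.log file and iterate filter_log_lines(fh, needle="ERROR").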


Container Security

Unprivileged Containers

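Containers run as non-root users where possible. The dashboard and scheduler are the exception:

Dashboard/Scheduler:

user: root  # Simplest route to Docker socket access (a non-root user added to the socket's group would also work)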

Code Server:

environment:
  - PUID=1000
  - PGID=1000

Read-Only Filesystems

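Sensitive Mounts:

volumes:
  - /var/run/docker.sock:/var/run/docker.sock:ro  # :ro does NOT make the Docker API read-only
  - /home/agent/.secure:/home/agent/.secure:ro    # Read-only secrets

The :ro flag protects the secrets directory from writes. On the Docker socket, however, it only prevents replacing the socket file itself. Any process that can connect to the socket can still use the full Docker API, including starting containers and mounting host paths. Socket access should therefore be treated as root-equivalent on the host.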

Secrets Management

Stored Secrets:
- Environment variables (runtime injection)
- Mounted files (read-only volumes)
- Never committed to git

Access Control:

chmod 600 ~/.secure/*
chown agent:agent ~/.secure/*
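The 600 mode set above is easy to lose when a file is copied or recreated, so a periodic check is useful. The standard-library sketch below lists any file under ~/.secure whose permissions go beyond owner read/write. The function name loose_secret_files is hypothetical.

```python
import stat
from pathlib import Path


def loose_secret_files(directory: str = "~/.secure") -> list[str]:
    """Return files whose mode grants more than owner read/write (0o600)."""
    offenders = []
    for path in Path(directory).expanduser().iterdir():
        if path.is_file():
            mode = stat.S_IMODE(path.stat().st_mode)
            if mode & ~0o600:  # any group/other bit, or owner execute
                offenders.append(f"{path} ({oct(mode)})")
    return sorted(offenders)
```

An empty list means every file already has 600 (or stricter) permissions.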


Container Networking Internals

Host Gateway

Problem: Containers need to reach services on LXC host (PostgreSQL, FalkorDB, Ollama)

Solution: host.docker.internal DNS name

extra_hosts:
  - "host.docker.internal:host-gateway"

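Resolution: host-gateway → 10.10.10.103 on this host. Docker's default target for host-gateway is the docker0 bridge gateway (172.17.0.1 on the default 172.17.0.0/16 subnet), so a different address means the daemon's host-gateway-ip option is set.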

Inter-Container Communication

Within the traefik_proxy network:
- Service discovery via container name
- Example: http://aegis-dashboard:8080
- DNS provided by Docker's embedded DNS server

Between networks:
- Not allowed (isolation)
- Traffic must route through Traefik or published ports


Troubleshooting Containers

Container Won't Start

Diagnosis:

# Check recent logs
docker logs aegis-dashboard --tail 50

# Inspect container config
docker inspect aegis-dashboard

# Check exit code
docker ps -a | grep aegis-dashboard

Common Issues:
- Port already in use: docker ps | grep :8080
- Volume mount fails: check that the host path exists
- Image not found: run docker compose build
- Health check fails: test the endpoint manually

High Resource Usage

Diagnosis:

# Real-time stats
docker stats

# Detailed container stats
docker stats aegis-dashboard --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"

Mitigation:
- Restart the container: docker compose restart dashboard
- Increase limits: edit docker-compose.yml
- Check for memory leaks: docker top aegis-dashboard (docker exec aegis-dashboard ps aux needs procps, which slim images usually omit)

Network Issues

Diagnosis:

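# Check name resolution and connectivity (slim images usually lack ping;
# pg_isready ships with the postgresql-client package installed in the image)
docker exec aegis-dashboard getent hosts host.docker.internal
docker exec aegis-dashboard pg_isready -h host.docker.internal -p 5432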

# Inspect network
docker network inspect traefik_proxy

Common Issues:
- DNS resolution fails: restart the Docker daemon
- Can't reach host: check firewall rules
- Traefik routing broken: check the labels with docker ps --format '{{.Names}}: {{.Labels}}'
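For the dependencies reached through host.docker.internal, a plain TCP connect from inside the container tests exactly what matters: whether each host:port accepts connections. It needs no extra tools in the image. The standard-library sketch below could be saved as a script and run with docker exec aegis-dashboard python. The target list mirrors the dashboard's environment variables. The function names are hypothetical.

```python
import socket

# Dependencies taken from the dashboard's environment variables
TARGETS = {
    "postgres": ("host.docker.internal", 5432),
    "falkordb": ("host.docker.internal", 6379),
    "ollama": ("host.docker.internal", 11434),
}


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections and timeouts
        return False


def check_all(targets: dict[str, tuple[str, int]] = TARGETS) -> dict[str, bool]:
    """Map each dependency name to whether its port is reachable."""
    return {name: can_connect(host, port) for name, (host, port) in targets.items()}
```

A False entry narrows the problem to DNS, the host firewall, or the service itself not listening on that interface.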


Container Maintenance

Image Updates

Security Patches:

# Pull latest images
docker compose pull

# Rebuild custom images
docker compose build

# Restart with new images
docker compose up -d

Update Schedule: Weekly (automated via cron)

Cleanup

Remove Dangling Images:

docker image prune -f

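Remove Unused Volumes (caution: this permanently deletes data; make sure stateful containers such as falkordb are running so their volumes count as in use):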

docker volume prune -f

Remove Stopped Containers:

docker container prune -f

Full Cleanup (also removes every image not used by any container and, with --volumes, unused volumes):

docker system prune -af --volumes

Disk Usage:

docker system df


Future Container Improvements

Q1 2026

  • Multi-stage Dockerfile builds (reduce image size)
  • Container resource limits (prevent OOM)
  • Liveness probes for all services
  • Structured logging (JSON logs)

Q2 2026

  • Kubernetes migration (Helm charts)
  • Init containers for dependency checks
  • Sidecar containers for telemetry
  • Service mesh (Istio/Linkerd)

Q3 2026

  • Immutable infrastructure (no volume mounts)
  • Blue-green deployments
  • Canary releases
  • A/B testing infrastructure