Container Architecture
Overview
Aegis uses Docker Compose for orchestrating containerized services. All containers run on the Aegis LXC host and communicate via Docker networks.
Container Inventory
Running Containers
| Container | Image | Status | Purpose |
|---|---|---|---|
| aegis-dashboard | aegis-core-dashboard | Up 8 hours | Main web interface and API |
| aegis-scheduler | aegis-core-scheduler | Up 7 days | Cron jobs and background tasks |
| aegis-docs | squidfunk/mkdocs-material | Up 26 minutes | Documentation site |
| aegis-playwright | ghcr.io/vlazic/playwright-screenshot-api | Up 7 days | Browser automation |
| traefik | traefik:v2.11 | Up 29 minutes | Reverse proxy and TLS |
| falkordb | falkordb/falkordb | Up 7 days | Knowledge graph database |
| open-notebook | lfnovo/open_notebook:v1-latest-single | Up (always) | Research tool |
| code-server | lscr.io/linuxserver/code-server | Up (unless-stopped) | VS Code web editor |
Docker Compose Stacks
Primary Stack: aegis-core
Location: /home/agent/projects/aegis-core/docker-compose.yml
Dashboard Service
Image: aegis-core-dashboard (built from Dockerfile)
Base: Python 3.11-slim
Exposed Ports: 8080:8080
Key Features:
- FastAPI web server
- Stripe integration (revenue)
- WhatsApp command center (Vonage)
- Email digests (Resend)
- Claude Code OAuth integration
Environment Variables:
# Database
POSTGRES_HOST: host.docker.internal
POSTGRES_PORT: 5432
POSTGRES_USER: agent
POSTGRES_DB: aegis
# LLM Backends
ZAI_API_KEY: ${ZAI_API_KEY}
ZAI_BASE_URL: https://api.z.ai/api/anthropic
CLAUDE_CODE_OAUTH_TOKEN: ${CLAUDE_CODE_OAUTH_TOKEN}
# Knowledge Graph
FALKORDB_HOST: host.docker.internal
FALKORDB_PORT: 6379
OLLAMA_BASE_URL: http://host.docker.internal:11434
# Integrations
STRIPE_SECRET_KEY: ${STRIPE_SECRET_KEY}
VONAGE_API_KEY: ${VONAGE_API_KEY}
RESEND_API_KEY: ${RESEND_API_KEY}
PERPLEXITY_API_KEY: ${PERPLEXITY_API_KEY}
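As an illustration of how the database variables above might be consumed inside the container, the following minimal sketch builds a PostgreSQL connection URL from them. The helper name `postgres_dsn` is hypothetical and not taken from the Aegis codebase; the listing shows no password variable, so the sketch omits one.

```python
import os
from urllib.parse import quote


def postgres_dsn(env=None):
    """Build a PostgreSQL URL from POSTGRES_* variables (hypothetical helper)."""
    env = os.environ if env is None else env
    # Defaults mirror the values listed in the environment block above.
    host = env.get("POSTGRES_HOST", "host.docker.internal")
    port = int(env.get("POSTGRES_PORT", "5432"))
    user = quote(env.get("POSTGRES_USER", "agent"), safe="")
    database = quote(env.get("POSTGRES_DB", "aegis"), safe="")
    return f"postgresql://{user}@{host}:{port}/{database}"
```

Calling `postgres_dsn({})` falls back to the documented defaults; passing a dictionary instead of reading `os.environ` directly keeps the helper easy to test.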
Volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro # Docker management
- /home/agent/memory:/home/agent/memory # Persistent memory
- /home/agent/downloads:/home/agent/downloads # File storage
- /home/agent/.secure:/home/agent/.secure:ro # Credentials
Health Check:
Traefik Labels:
- "traefik.enable=true"
- "traefik.http.routers.aegis.rule=Host(`aegisagent.ai`)"
- "traefik.http.routers.aegis.entrypoints=websecure"
- "traefik.http.routers.aegis.tls.certresolver=cf"
- "traefik.http.services.aegis.loadbalancer.server.port=8080"
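The same five-label pattern recurs for every routed service, so it can be generated rather than hand-written. The sketch below is only an illustration: `traefik_labels` is a hypothetical helper, and it assumes the service name in the last label matches the router name, as it does for the dashboard.

```python
def traefik_labels(router, host, port, certresolver="cf", entrypoint="websecure"):
    """Return Docker labels that route `host` to a container port via Traefik."""
    return [
        "traefik.enable=true",
        f"traefik.http.routers.{router}.rule=Host(`{host}`)",
        f"traefik.http.routers.{router}.entrypoints={entrypoint}",
        f"traefik.http.routers.{router}.tls.certresolver={certresolver}",
        f"traefik.http.services.{router}.loadbalancer.server.port={port}",
    ]
```

`traefik_labels("aegis", "aegisagent.ai", 8080)` reproduces the dashboard labels listed above.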
Scheduler Service
Image: aegis-core-scheduler (built from Dockerfile)
Base: Python 3.11-slim
Command: Runs scheduler daemon in infinite loop
Purpose:
- Cron job execution (APScheduler)
- Background task processing
- Periodic health checks
- Morning/evening routines
Schedule Examples:
# Every 5 minutes: Check system health
@scheduler.scheduled_job('interval', minutes=5)
async def health_check():
    ...

# Daily 06:00 UTC: Morning routine
@scheduler.scheduled_job('cron', hour=6, minute=0)
async def morning_routine():
    ...

# Daily 03:00 UTC: Ingest transcripts to knowledge graph
@scheduler.scheduled_job('cron', hour=3, minute=0)
async def ingest_transcripts():
    ...
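To make the daily schedules concrete, the sketch below computes when a job pinned to a given UTC hour and minute would next fire. It uses only the standard library and is not APScheduler's implementation; `next_daily_run` is a hypothetical name introduced for illustration.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour, minute=0):
    """Return the next UTC datetime at hour:minute strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so the job runs tomorrow.
        candidate += timedelta(days=1)
    return candidate
```

For a clock reading 05:30 UTC, the 06:00 morning routine is due the same day, while the 03:00 ingest job is due the following day.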
Dependencies:
- Dashboard (must start first)
- Playwright (for screenshots)
Playwright Service
Image: ghcr.io/vlazic/playwright-screenshot-api:latest
Purpose: Headless browser for visual monitoring and screenshots
Exposed Ports: 3002:3000
API Endpoints:
- GET /health - Health check
- POST /screenshot - Capture website screenshot
Use Cases:
- Visual regression testing
- Competitor page monitoring
- Proof-of-render for debugging
FalkorDB Service
Image: falkordb/falkordb:latest
Purpose: Redis-compatible graph database for Graphiti knowledge graph
Exposed Ports:
- 6379:6379 (Redis protocol)
- 3001:3000 (Browser UI)
Command:
Configuration:
- Graph timeout: 30 seconds
- Query timeout: 30 seconds
- Max memory: Dynamic (no limit)
Volume: falkordb_data:/data (persistent storage)
Health Check:
Browser UI: http://localhost:3001
- Query graph with Cypher
- Visualize relationships
- Inspect entity properties
Traefik Stack
Location: /home/agent/stacks/traefik/docker-compose.yml
Traefik Service
Image: traefik:v2.11
Purpose: Reverse proxy with automatic TLS
Exposed Ports:
- 80:80 (HTTP, redirects to HTTPS)
- 443:443 (HTTPS)
- 8081:8080 (Dashboard)
Command Line Args:
# API Dashboard
- "--api.dashboard=true"
- "--api.insecure=true" # Dashboard on :8080 without auth
# Docker provider (reads container labels)
- "--providers.docker=true"
- "--providers.docker.exposedbydefault=false"
- "--providers.docker.network=traefik_proxy"
# File provider (routes defined in files rather than container labels)
- "--providers.file.directory=/etc/traefik/dynamic"
- "--providers.file.watch=true"
# Entrypoints
- "--entrypoints.web.address=:80"
- "--entrypoints.web.http.redirections.entrypoint.to=websecure"
- "--entrypoints.websecure.address=:443"
# Let's Encrypt (DNS challenge via Cloudflare)
- "--certificatesresolvers.cf.acme.dnschallenge=true"
- "--certificatesresolvers.cf.acme.dnschallenge.provider=cloudflare"
- "--certificatesresolvers.cf.acme.email=aegis@richardbankole.com"
- "--certificatesresolvers.cf.acme.storage=/letsencrypt/acme.json"
# Logging
- "--log.level=INFO"
Environment Variables:
Volumes:
- /var/run/docker.sock:/var/run/docker.sock:ro # Read Docker state
- ./letsencrypt:/letsencrypt # Certificate storage
- ./dynamic:/etc/traefik/dynamic:ro # File-provider route definitions
Network: traefik_proxy (external, shared with all services)
Dashboard Access: http://traefik.rbnk.uk (internal only)
Open Notebook Stack
Location: /home/agent/stacks/open-notebook/docker-compose.yml
Open Notebook Service
Image: lfnovo/open_notebook:v1-latest-single
Purpose: AI-powered research tool (Streamlit UI + FastAPI backend)
Exposed Ports:
- 8502:8502 (Streamlit UI)
- 5055:5055 (FastAPI backend)
Environment Variables:
# LLM Backend (Z.ai via OpenAI-compatible endpoint)
OPENAI_COMPATIBLE_BASE_URL: https://api.z.ai/v1
OPENAI_COMPATIBLE_API_KEY: ${ZAI_API_KEY}
# Internal API URL
API_URL: http://localhost:5055
# SurrealDB (embedded database)
SURREAL_URL: file:///mydata
SURREAL_USER: root
SURREAL_PASSWORD: ${SURREAL_PASSWORD}
SURREAL_NAMESPACE: notebook
SURREAL_DATABASE: research
Volumes:
Traefik Labels:
# Streamlit UI
- "traefik.http.routers.notebooks-aegis.rule=Host(`notebooks.aegisagent.ai`)"
- "traefik.http.services.notebooks-aegis.loadbalancer.server.port=8502"
# FastAPI backend
- "traefik.http.routers.notebooks-api-aegis.rule=Host(`api.notebooks.aegisagent.ai`)"
- "traefik.http.services.notebooks-api-aegis.loadbalancer.server.port=5055"
Use Cases:
- Long-form research synthesis
- Source aggregation and citation
- Multi-step research workflows
- Knowledge base construction
Code Server Stack
Location: /home/agent/stacks/code-server/docker-compose.yml
Code Server Service
Image: lscr.io/linuxserver/code-server:latest
Purpose: Web-based VS Code for remote development
Exposed Ports: 8443:8443
Environment Variables:
PUID: 1000 # Run as user 'agent'
PGID: 1000 # Run as group 'agent'
TZ: UTC
PASSWORD: ${PASSWORD} # Web UI password
SUDO_PASSWORD: ${SUDO_PASSWORD}
DEFAULT_WORKSPACE: /home/agent
Volumes:
- /home/agent:/home/agent # Full filesystem access
- ./config:/config # VS Code settings
- ./config/custom-cont-init.d:/custom-cont-init.d:ro # Init scripts
Traefik Labels:
- "traefik.http.routers.code.rule=Host(`code.aegisagent.ai`)"
- "traefik.http.services.code.loadbalancer.server.port=8443"
Security Considerations:
- Password-protected (stored in ~/.secure/)
- HTTPS-only (via Traefik TLS)
- Full filesystem access (use with caution)
Documentation Stack
Location: /home/agent/stacks/aegis-docs/docker-compose.yml
MkDocs Service
Image: squidfunk/mkdocs-material:latest
Purpose: Live documentation server
Exposed Ports: 8000:8000
Command:
Volumes:
- /home/agent/projects/aegis-core/docs:/docs # Documentation source
- /home/agent/projects/aegis-core/mkdocs.yml:/mkdocs.yml # Config
Access: http://localhost:8000 (internal only)
Build Pipeline:
Docker Networks
traefik_proxy
Type: Bridge
Subnet: 172.19.0.0/16
Driver: bridge
Purpose: Shared ingress network for all public services
Connected Services:
- traefik
- aegis-dashboard
- aegis-scheduler
- aegis-playwright
- falkordb
- open-notebook
- code-server
Why External?
# Create network before starting services
docker network create traefik_proxy
# All services join this network
networks:
  traefik_proxy:
    external: true
Default Bridge (docker0)
Subnet: 172.17.0.0/16
Purpose: Default Docker network (not used by Aegis services)
Container Lifecycle
Build Process
Dashboard/Scheduler Images:
Build Time: ~3 minutes (cached layers)
Base Layers:
1. Python 3.11-slim
2. System dependencies (curl, git, postgresql-client)
3. Python packages (requirements.txt)
4. Application code
Dockerfile Optimizations:
- Multi-stage build (future optimization)
- Layer caching for dependencies
- .dockerignore excludes .git, __pycache__, .venv
Deployment Workflow
Manual Deployment:
cd /home/agent/projects/aegis-core
docker compose down
docker compose build
docker compose up -d
docker compose logs -f dashboard
Health Verification:
docker ps # Check all containers running
curl http://localhost:8080/health # Check dashboard
docker logs aegis-dashboard --tail 50 # Check for errors
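The manual checks above can also be scripted. The following is a minimal sketch using only the standard library; the function names are hypothetical, and the Playwright URL assumes the host port mapping 3002:3000 shown earlier in this document.

```python
import http.client
import urllib.request


def endpoint_ok(url, timeout=5.0):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS failures and HTTP errors.
        return False


# Health endpoints as published on the host.
ENDPOINTS = {
    "dashboard": "http://localhost:8080/health",
    "playwright": "http://localhost:3002/health",
}


def check_all(endpoints=None):
    """Map each service name to the result of its health probe."""
    endpoints = ENDPOINTS if endpoints is None else endpoints
    return {name: endpoint_ok(url) for name, url in endpoints.items()}
```

Running `check_all()` on the host after `docker compose up -d` returns a dictionary of booleans, one per service, which a deploy script could use to decide whether to roll back.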
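Rollback:
# Before building, keep the current image under a backup tag
docker tag aegis-core-dashboard:latest aegis-core-dashboard:backup
# To roll back, point latest at the backup again and recreate the container
docker tag aegis-core-dashboard:backup aegis-core-dashboard:latest
docker compose up -d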
Resource Limits
Dashboard:
- CPU: 4 cores (soft limit)
- Memory: 4GB (hard limit)
- Swap: 2GB
Scheduler:
- CPU: 2 cores (soft limit)
- Memory: 2GB (hard limit)
- Swap: 1GB
FalkorDB:
- CPU: 4 cores (soft limit)
- Memory: 8GB (hard limit)
- No swap (Redis best practice)
Configuration (docker-compose.yml):
services:
  dashboard:
    deploy:
      resources:
        limits:
          cpus: '4.0'
          memory: 4G
        reservations:
          cpus: '2.0'
          memory: 2G
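A quick sanity check on such a block is that each reservation stays at or below its limit. The sketch below shows one way to verify that; `parse_memory` and `reservation_fits` are hypothetical helpers, and the parser handles only the b, k, m, and g suffixes, treating them as binary multiples.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_memory(value):
    """Convert a Compose-style size such as '4G' or '512m' to bytes."""
    text = value.strip().upper().rstrip("B")
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def reservation_fits(limits, reservations):
    """True when every reservation is no larger than the matching limit."""
    cpu_ok = float(reservations["cpus"]) <= float(limits["cpus"])
    mem_ok = parse_memory(reservations["memory"]) <= parse_memory(limits["memory"])
    return cpu_ok and mem_ok
```

With the dashboard values above, `reservation_fits({"cpus": "4.0", "memory": "4G"}, {"cpus": "2.0", "memory": "2G"})` holds.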
Container Monitoring
Health Checks
Dashboard:
healthcheck:
  test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 60s
FalkorDB:
Playwright:
healthcheck:
  test: ["CMD", "wget", "-q", "-O", "-", "http://localhost:3000/health"]
  interval: 30s
  timeout: 10s
  retries: 3
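The `retries` setting means a single failed probe does not flip a container to unhealthy; Docker requires that many consecutive failures. The sketch below replays a sequence of probe results to show the effect. It is a simplified model and not Docker's code: `health_status` is a hypothetical name, and failures during `start_period` are assumed to have been left out of the input.

```python
def health_status(results, retries=3):
    """Replay probe results (True = pass) and return the resulting status."""
    status = "starting"
    consecutive_failures = 0
    for passed in results:
        if passed:
            # Any successful probe resets the failure streak.
            consecutive_failures = 0
            status = "healthy"
        else:
            consecutive_failures += 1
            if consecutive_failures >= retries:
                status = "unhealthy"
    return status
```

With `interval: 30s` and `retries: 3`, a service that stops responding is flagged after roughly three probe intervals, about 90 seconds, plus the time each probe takes.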
Restart Policies
| Service | Policy | Reason |
|---|---|---|
| Dashboard | unless-stopped | Core service, manual intervention for stops |
| Scheduler | unless-stopped | Background jobs, should always run |
| Traefik | unless-stopped | Ingress, critical for access |
| FalkorDB | unless-stopped | Stateful, needs manual inspection before restart |
| Open Notebook | always | User-facing, maximize availability |
| Code Server | unless-stopped | Dev tool, manual stop intentional |
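The practical difference between the two policies in the table shows up after a manual `docker stop` followed by a Docker daemon restart or host reboot. The sketch below models only that case; `starts_with_daemon` is a hypothetical name, and the other policies (`no`, `on-failure`) are deliberately left out.

```python
def starts_with_daemon(policy, manually_stopped):
    """Whether Docker brings the container back when the daemon restarts."""
    if policy == "always":
        # Restarted even if an operator stopped it beforehand.
        return True
    if policy == "unless-stopped":
        # A manual stop is remembered across daemon restarts.
        return not manually_stopped
    raise ValueError(f"policy not modelled here: {policy}")
```

Under this model, Open Notebook (`always`) returns after a reboot even if it was stopped by hand, while the `unless-stopped` services stay down until started again.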
Logging
Log Driver: json-file (default)
Log Rotation:
- Max size: 10MB per file
- Max files: 3
- Total per container: 30MB
Configuration:
logging:
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"
View Logs:
# Real-time
docker logs -f aegis-dashboard
# Last 100 lines
docker logs --tail 100 aegis-dashboard
# With timestamps
docker logs -t aegis-dashboard
# Filter by time
docker logs --since 2024-01-01T00:00:00 aegis-dashboard
Container Security
Unprivileged Containers
All containers run as non-root users where possible:
Dashboard/Scheduler:
Code Server:
Read-Only Filesystems
Sensitive Mounts:
volumes:
  - /var/run/docker.sock:/var/run/docker.sock:ro # Socket file mounted read-only; Docker API calls through it are not restricted
  - /home/agent/.secure:/home/agent/.secure:ro # Read-only secrets
Secrets Management
Stored Secrets:
- Environment variables (runtime injection)
- Mounted files (read-only volumes)
- Never committed to git
Access Control:
Container Networking Internals
Host Gateway
Problem: Containers need to reach services on LXC host (PostgreSQL, FalkorDB, Ollama)
Solution: host.docker.internal DNS name
Resolution: host-gateway → 10.10.10.103 (bridge gateway)
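Whether a container can actually reach the host services can be confirmed with a plain TCP connection attempt, which needs nothing beyond the Python interpreter already present in the dashboard image. The sketch below is illustrative only; the function names are hypothetical, and the ports are the ones listed in the dashboard's environment variables.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Includes DNS failures, refused connections and timeouts.
        return False


# Host services and their ports as configured for the dashboard.
HOST_SERVICES = {"postgres": 5432, "falkordb": 6379, "ollama": 11434}


def check_host_services(host="host.docker.internal"):
    """Probe every host service and report which ones accept connections."""
    return {name: can_connect(host, port) for name, port in HOST_SERVICES.items()}
```

A `False` for every service usually points at name resolution or the gateway itself, whereas a single `False` suggests that one service is down or bound to a different interface.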
Inter-Container Communication
Within traefik_proxy network:
- Service discovery via container name
- Example: http://aegis-dashboard:8080
- DNS provided by Docker embedded DNS server
Between Networks:
- Not allowed (isolation)
- Must route through Traefik or expose ports
Troubleshooting Containers
Container Won't Start
Diagnosis:
# Check recent logs
docker logs aegis-dashboard --tail 50
# Inspect container config
docker inspect aegis-dashboard
# Check exit code
docker ps -a | grep aegis-dashboard
Common Issues:
- Port already in use: docker ps | grep :8080
- Volume mount fails: Check host path exists
- Image not found: Run docker compose build
- Health check fails: Test endpoint manually
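The exit code shown by `docker ps -a` often narrows the cause before any logs are read. The sketch below maps the conventional codes to short hints; `describe_exit_code` is a hypothetical helper, and the hints are general shell and Docker conventions, not Aegis-specific diagnostics.

```python
import signal


def describe_exit_code(code):
    """Give a short hint for a container exit code."""
    if code == 0:
        return "exited cleanly"
    if code == 125:
        return "docker run itself failed"
    if code == 126:
        return "command found but not executable"
    if code == 127:
        return "command not found"
    if code > 128:
        # Codes above 128 conventionally mean "killed by signal (code - 128)".
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = f"signal {number}"
        hint = " (possible OOM kill or docker kill)" if number == 9 else ""
        return f"terminated by {name}{hint}"
    return "application error"
```

An exit code of 137, for example, corresponds to signal 9 and is worth cross-checking against the memory limits configured for the service.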
High Resource Usage
Diagnosis:
# Real-time stats
docker stats
# Detailed container stats
docker stats aegis-dashboard --no-stream --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"
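The `MemUsage` column is easier to alert on once it is converted to a fraction of the limit. The following sketch parses a cell of the form `used / limit`; the helper names are hypothetical, and it assumes the binary unit suffixes (B, KiB, MiB, GiB, TiB) that `docker stats` prints for memory.

```python
import re

_UNITS = {"B": 1, "KIB": 1024, "MIB": 1024 ** 2, "GIB": 1024 ** 3, "TIB": 1024 ** 4}


def _to_bytes(text):
    """Parse a size such as '1.5GiB' into a number of bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", text)
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit.upper()]


def memory_fraction(mem_usage):
    """Turn a MemUsage cell such as '1GiB / 4GiB' into used/limit."""
    used, limit = mem_usage.split("/")
    return _to_bytes(used) / _to_bytes(limit)
```

A value close to 1.0 for the dashboard would suggest it is approaching its 4GB hard limit.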
Mitigation:
- Restart container: docker compose restart dashboard
- Increase limits: Edit docker-compose.yml
- Check for memory leaks: docker exec aegis-dashboard ps aux
Network Issues
Diagnosis:
# Check network connectivity
docker exec aegis-dashboard ping host.docker.internal
docker exec aegis-dashboard pg_isready -h host.docker.internal -p 5432 # PostgreSQL does not speak HTTP, so use its own client check
# Inspect network
docker network inspect traefik_proxy
Common Issues:
- DNS resolution fails: Restart Docker daemon
- Can't reach host: Check firewall rules
- Traefik routing broken: Check the container's labels with docker inspect aegis-dashboard
Container Maintenance
Image Updates
Security Patches:
# Pull latest images
docker compose pull
# Rebuild custom images
docker compose build
# Restart with new images
docker compose up -d
Update Schedule: Weekly (automated via cron)
Cleanup
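Remove Dangling Images:
docker image prune
Remove Unused Volumes:
docker volume prune
Remove Stopped Containers:
docker container prune
Full Cleanup:
docker system prune -a
Disk Usage:
docker system df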
Future Container Improvements
Q1 2026
- Multi-stage Dockerfile builds (reduce image size)
- Container resource limits (prevent OOM)
- Liveness probes for all services
- Structured logging (JSON logs)
Q2 2026
- Kubernetes migration (Helm charts)
- Init containers for dependency checks
- Sidecar containers for telemetry
- Service mesh (Istio/Linkerd)
Q3 2026
- Immutable infrastructure (no volume mounts)
- Blue-green deployments
- Canary releases
- A/B testing infrastructure