Aegis System Architecture

Comprehensive technical documentation for disaster recovery and system understanding

System Overview

Aegis is an autonomous AI agent platform operating within a contained LXC environment on a Hetzner EX130-R dedicated server. The system combines multi-tier memory, intelligent LLM routing, graph-based workflows, and multi-channel communication.

flowchart TB
    subgraph Internet["Internet"]
        Users[Users]
        APIs[External APIs]
    end

    subgraph Hetzner["Hetzner EX130-R (157.180.63.15)"]
        subgraph Proxmox["Proxmox Host"]
            direction TB

            subgraph Dockerhost["Dockerhost LXC (10.10.10.10)"]
                TraefikMain["Traefik Proxy<br/>TCP Passthrough<br/>:80, :443"]
            end

            subgraph AegisLXC["Aegis LXC (10.10.10.103)"]
                direction TB

                subgraph Docker["Docker Compose Stack"]
                    TraefikLocal["Traefik<br/>Local Routing"]
                    Dashboard["Dashboard<br/>FastAPI<br/>:8080"]
                    Scheduler["Scheduler<br/>APScheduler"]
                    Playwright["Playwright<br/>Screenshots<br/>:3002"]
                    FalkorDB["FalkorDB<br/>Knowledge Graph<br/>:6379"]
                end

                subgraph Host["Host Services"]
                    PostgreSQL["PostgreSQL<br/>:5432"]
                    Ollama["Ollama<br/>Local LLMs<br/>:11434"]
                    FileSystem["Filesystem<br/>/home/agent"]
                end
            end
        end
    end

    Internet --> TraefikMain
    TraefikMain --> TraefikLocal
    TraefikLocal --> Dashboard
    Dashboard --> Scheduler
    Dashboard --> PostgreSQL
    Dashboard --> Ollama
    Dashboard --> FalkorDB
    Dashboard --> Playwright
    Scheduler --> PostgreSQL
    Scheduler --> Playwright

Core Architecture Principles

1. Containment & Autonomy

  • LXC-based isolation with controlled resource limits (110GB memory)
  • Docker Compose for service orchestration
  • Host-gateway networking for PostgreSQL and Ollama access (see the sketch after this list)
  • Traefik for SSL termination and routing
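
The Compose services reach the host-side PostgreSQL and Ollama through the Docker host-gateway. A minimal sketch, assuming the containers map `host.docker.internal` to the host-gateway and that the environment variable, database, and model names shown are placeholders rather than the real configuration:

```python
import os

import psycopg2
import requests

# Assumes the Compose services map host.docker.internal to the Docker host-gateway.
HOST_GATEWAY = os.getenv("HOST_GATEWAY", "host.docker.internal")

# PostgreSQL runs on the LXC host rather than inside Docker, so connect through the gateway.
conn = psycopg2.connect(
    host=HOST_GATEWAY,
    port=5432,
    dbname=os.getenv("AEGIS_DB", "aegis"),        # hypothetical database name
    user=os.getenv("AEGIS_DB_USER", "agent"),     # hypothetical credentials
    password=os.getenv("AEGIS_DB_PASSWORD", ""),
)

# Ollama also runs on the host and exposes its HTTP API on :11434.
resp = requests.post(
    f"http://{HOST_GATEWAY}:11434/api/generate",
    json={"model": "llama3.1", "prompt": "ping", "stream": False},  # model name illustrative
    timeout=60,
)
print(resp.json()["response"])
```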

2. Multi-Tier Cognition

| Tier | Model | Use Case | Rate Limit |
|------|-------|----------|------------|
| 1 | Claude Opus 4.5 | Strategic decisions, architecture | Rare |
| 1.5 | Claude Haiku 4.5 | Fast ops: classify, extract, parse | High |
| 2 | GLM-4.7 via Z.ai | 90%+ operational work | ~8 req/min |
| 3 | Ollama Local | Fallback, offline, vision | Unlimited |
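
The table above maps task types to tiers; the sketch below shows one way that routing could be expressed in code. It is illustrative only: the `TaskKind` categories and model identifiers are assumptions, not the actual aegis-core API.

```python
from enum import Enum


class TaskKind(Enum):
    STRATEGIC = "strategic"      # architecture, irreversible decisions
    FAST_OP = "fast_op"          # classify, extract, parse
    OPERATIONAL = "operational"  # routine day-to-day agent work
    OFFLINE = "offline"          # fallback, offline, vision


# Tier -> (label, model identifier); identifiers are illustrative placeholders.
TIER_MODELS = {
    TaskKind.STRATEGIC: ("tier-1", "claude-opus-4.5"),
    TaskKind.FAST_OP: ("tier-1.5", "claude-haiku-4.5"),
    TaskKind.OPERATIONAL: ("tier-2", "glm-4.7"),
    TaskKind.OFFLINE: ("tier-3", "ollama/local"),
}


def select_model(kind: TaskKind, zai_within_rate_limit: bool = True) -> tuple[str, str]:
    """Pick a tier, dropping to local Ollama when the ~8 req/min Z.ai limit is hit."""
    if kind is TaskKind.OPERATIONAL and not zai_within_rate_limit:
        return TIER_MODELS[TaskKind.OFFLINE]
    return TIER_MODELS[kind]
```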

3. Persistent Memory Architecture

  • Episodic: Events, decisions, interactions (PostgreSQL)
  • Semantic: Knowledge, facts, learnings (PostgreSQL + FalkorDB)
  • Procedural: Workflows, how-tos, templates (filesystem)
  • Cache: LLM responses, tool outputs (in-memory)
  • Knowledge Graph: Entity-relationship mapping (FalkorDB)
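
A rough sketch of how a write might be routed across the tiers listed above, assuming a psycopg2 connection for PostgreSQL and a FalkorDB graph handle; the function, table, and relationship names are hypothetical, not the aegis-core memory API:

```python
import json
from pathlib import Path


def remember(kind: str, content: dict, pg, graph) -> None:
    """Route a memory write to the store that owns that tier (illustrative only)."""
    if kind == "episodic":
        # Events, decisions, and interactions go to PostgreSQL.
        with pg.cursor() as cur:
            cur.execute(
                "INSERT INTO episodic_memory (payload) VALUES (%s)",  # hypothetical table
                (json.dumps(content),),
            )
        pg.commit()
    elif kind == "semantic":
        # Facts are mirrored into FalkorDB as entity-relationship edges.
        graph.query(
            "MERGE (a:Entity {name: $src}) MERGE (b:Entity {name: $dst}) "
            "MERGE (a)-[:RELATES_TO {fact: $fact}]->(b)",
            {"src": content["subject"], "dst": content["object"], "fact": content["fact"]},
        )
    elif kind == "procedural":
        # Workflows and how-tos live as files under /home/agent/memory/procedural.
        path = Path("/home/agent/memory/procedural") / f"{content['name']}.md"
        path.write_text(content["body"])
```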

4. Graph-Based Execution

  • LangGraph-inspired workflow engine
  • PostgreSQL-backed checkpointing for crash recovery
  • Human-in-the-loop interrupt nodes
  • Conditional branching and error handling
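
To make the checkpointing idea concrete, here is a minimal sketch of a LangGraph-style loop that persists state to PostgreSQL after every node so a crashed run can resume from its last checkpoint. The `Graph` stand-in, the `workflow_checkpoints` table, and the uniqueness constraint on `run_id` are assumptions, not the real aegis-core engine:

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Graph:
    """Tiny stand-in for the workflow engine: named nodes plus a router that
    picks the next node (or returns None to finish)."""
    nodes: dict[str, Callable[[dict], dict]]
    router: Callable[[str, dict], Optional[str]]
    entry_point: str


def run_workflow(graph: Graph, state: dict, run_id: str, pg) -> dict:
    node: Optional[str] = state.get("next_node", graph.entry_point)
    while node is not None:
        state = graph.nodes[node](state)            # execute the node
        node = graph.router(node, state)            # conditional branching
        state["next_node"] = node
        with pg.cursor() as cur:                    # checkpoint after every step
            cur.execute(
                "INSERT INTO workflow_checkpoints (run_id, state) VALUES (%s, %s) "
                "ON CONFLICT (run_id) DO UPDATE SET state = EXCLUDED.state",  # assumes UNIQUE(run_id)
                (run_id, json.dumps(state)),
            )
        pg.commit()
        if state.get("interrupt"):                  # human-in-the-loop node pauses the run
            break
    return state
```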

5. Multi-Channel Communication

| Channel | Purpose |
|---------|---------|
| Discord | Primary (status, logs, alerts, journal, tasks) |
| WhatsApp | Two-way command channel (Vonage WABA) |
| Voice | Inbound calls with ASR (Vonage) |
| Telegram | Time-sensitive alerts |
| Email | Gmail triage and automation |
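
Conceptually, outbound messages fan out by category, with Discord as the default. The mapping and function names below are an illustrative sketch, not the actual dispatcher:

```python
from typing import Callable

# Illustrative category -> channel routing; Discord is the primary/default channel.
CHANNEL_FOR = {
    "status": "discord",
    "journal": "discord",
    "task": "discord",
    "command_reply": "whatsapp",   # replies on the two-way Vonage WABA channel
    "urgent_alert": "telegram",    # time-sensitive alerts
    "email_triage": "email",       # Gmail automation
}


def dispatch(category: str, message: str, senders: dict[str, Callable[[str], None]]) -> None:
    """Send a message on the channel registered for its category."""
    channel = CHANNEL_FOR.get(category, "discord")
    senders[channel](message)
```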

Quick Reference

Network Topology

| IP | Host | Role |
|----|------|------|
| 157.180.63.15 | Proxmox Host | Public IP |
| 10.10.10.10 | Dockerhost LXC | Main Traefik (TCP passthrough) |
| 10.10.10.103 | Aegis LXC | This instance |

Public Domains

Primary: aegisagent.ai

  • https://aegisagent.ai - Dashboard
  • https://intel.aegisagent.ai - Geopolitical Intelligence
  • https://notebooks.aegisagent.ai - Open Notebook Research
  • https://code.aegisagent.ai - VS Code
  • https://vnc.aegisagent.ai - VNC Access

Container Stack

| Container | Image | Ports | Purpose |
|-----------|-------|-------|---------|
| aegis-dashboard | aegis-core-dashboard | 8080 | FastAPI web app |
| aegis-scheduler | aegis-core-scheduler | - | APScheduler daemon |
| aegis-playwright | playwright-screenshot-api | 3002 | Browser automation |
| falkordb | falkordb/falkordb | 6379, 3001 | Knowledge graph |
| traefik | traefik:v2.11 | 80, 443 | Reverse proxy |

Resource Usage

| Resource | Total | Used | Available |
|----------|-------|------|-----------|
| Memory | 110 GiB | ~16 GiB | ~93 GiB |
| Disk | 196 GB | 164 GB | 23 GB |
| CPU Cores | 16 | - | - |

Key File Locations

/home/agent/
├── .secure/                    # Credentials (never commit)
├── .claude/                    # Claude Code configuration
│   ├── settings.json           # Agent SDK config
│   ├── history.jsonl           # Session history
│   └── hooks/                  # Pre/post tool hooks
├── memory/                     # Multi-tier memory
│   ├── episodic/               # Events, decisions
│   ├── semantic/               # Knowledge, learnings
│   ├── procedural/             # Workflows, how-tos
│   └── journal/                # Daily journals
├── projects/
│   └── aegis-core/             # Main codebase
│       ├── aegis/              # Python package (78 modules)
│       ├── docker-compose.yml  # Service definitions
│       └── docs/               # Documentation
└── stacks/                     # Docker stacks

Design Philosophy

Value Creation Over Feature Accumulation

Problem: 60-70% of the Aegis codebase (144K lines) is unused.

Solution: Prioritize SHIP (70%) > REACTIVE (20%) > PROACTIVE (10%)

  • Wire built features to commands/workflows/cron
  • Fix bugs in shipped features
  • Run demos, find breakage, fix it

Resource Discipline

  • 110GB memory limit enforced by LXC
  • $50/month API budget via Privacy.com
  • Tier selection protocol minimizes Claude API costs

Three-Strike Debug Protocol

  1. Strike 1: Retry with modified approach
  2. Strike 2: Switch to local reasoning model
  3. Strike 3: STOP, document, escalate to human
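
A compact sketch of the protocol as code; the callables are hypothetical stand-ins for the real retry, local-model, and escalation paths:

```python
def attempt_with_strikes(task, retry_modified, run_local_model, escalate_to_human):
    """Three-strike escalation: modified retry, then local model, then stop and escalate."""
    try:
        return retry_modified(task)          # Strike 1: retry with a modified approach
    except Exception as first_error:
        try:
            return run_local_model(task)     # Strike 2: switch to a local reasoning model
        except Exception as second_error:
            # Strike 3: STOP, document both failures, escalate to the human operator.
            escalate_to_human(task, errors=[first_error, second_error])
            return None
```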

Last Updated: 2026-01-25