Aegis System Architecture
Comprehensive technical documentation for disaster recovery and system understanding
System Overview
Aegis is an autonomous AI agent platform that runs inside an isolated LXC container on a Hetzner EX130-R dedicated server. The system combines multi-tier memory, intelligent LLM routing, graph-based workflows, and multi-channel communication.
flowchart TB
subgraph Internet["Internet"]
Users[Users]
APIs[External APIs]
end
subgraph Hetzner["Hetzner EX130-R (157.180.63.15)"]
subgraph Proxmox["Proxmox Host"]
direction TB
subgraph Dockerhost["Dockerhost LXC (10.10.10.10)"]
TraefikMain["Traefik Proxy<br/>TCP Passthrough<br/>:80, :443"]
end
subgraph AegisLXC["Aegis LXC (10.10.10.103)"]
direction TB
subgraph Docker["Docker Compose Stack"]
TraefikLocal["Traefik<br/>Local Routing"]
Dashboard["Dashboard<br/>FastAPI<br/>:8080"]
Scheduler["Scheduler<br/>APScheduler"]
Playwright["Playwright<br/>Screenshots<br/>:3002"]
FalkorDB["FalkorDB<br/>Knowledge Graph<br/>:6379"]
end
subgraph Host["Host Services"]
PostgreSQL["PostgreSQL<br/>:5432"]
Ollama["Ollama<br/>Local LLMs<br/>:11434"]
FileSystem["Filesystem<br/>/home/agent"]
end
end
end
end
Internet --> TraefikMain
TraefikMain --> TraefikLocal
TraefikLocal --> Dashboard
Dashboard --> Scheduler
Dashboard --> PostgreSQL
Dashboard --> Ollama
Dashboard --> FalkorDB
Dashboard --> Playwright
Scheduler --> PostgreSQL
Scheduler --> Playwright
Core Architecture Principles
1. Containment & Autonomy
- LXC-based isolation with controlled resource limits (110GB memory)
- Docker Compose for service orchestration
- Host-gateway networking for PostgreSQL and Ollama access
- Traefik for SSL termination and routing
2. Multi-Tier Cognition
| Tier | Model | Use Case | Rate Limit |
|---|---|---|---|
| 1 | Claude Opus 4.5 | Strategic decisions, architecture | Rare |
| 1.5 | Claude Haiku 4.5 | Fast ops: classify, extract, parse | High |
| 2 | GLM-4.7 via Z.ai | 90%+ operational work | ~8 req/min |
| 3 | Ollama Local | Fallback, offline, vision | Unlimited |
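The tier table above can be read as a routing policy. The following is a minimal sketch of that reading, not the actual Aegis router: the task categories, the `route` function, and the `SlidingWindowLimiter` class are hypothetical names introduced for illustration, and the limit of 8 requests per 60 seconds is taken from the approximate figure in the table.

```python
from collections import deque
from enum import Enum


class Tier(Enum):
    OPUS = "Claude Opus 4.5"      # Tier 1
    HAIKU = "Claude Haiku 4.5"    # Tier 1.5
    GLM = "GLM-4.7 via Z.ai"      # Tier 2
    LOCAL = "Ollama Local"        # Tier 3


# Hypothetical task categories, loosely based on the "Use Case" column.
STRATEGIC = {"strategy", "architecture"}
FAST_OPS = {"classify", "extract", "parse"}
LOCAL_ONLY = {"vision", "offline"}


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def try_acquire(self, now: float) -> bool:
        # Drop timestamps that have left the window.
        while self._calls and now - self._calls[0] >= self.window:
            self._calls.popleft()
        if len(self._calls) < self.limit:
            self._calls.append(now)
            return True
        return False


def route(task_kind: str, glm_limiter: SlidingWindowLimiter, now: float) -> Tier:
    """Pick a tier for a task; use the local tier when Tier 2 is throttled."""
    if task_kind in STRATEGIC:
        return Tier.OPUS
    if task_kind in FAST_OPS:
        return Tier.HAIKU
    if task_kind in LOCAL_ONLY:
        return Tier.LOCAL
    # Everything else is treated as operational work for Tier 2.
    if glm_limiter.try_acquire(now):
        return Tier.GLM
    return Tier.LOCAL
```

Passing `now` explicitly keeps the sketch deterministic; a real caller would likely pass `time.monotonic()`.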
3. Persistent Memory Architecture
- Episodic: Events, decisions, interactions (PostgreSQL)
- Semantic: Knowledge, facts, learnings (PostgreSQL + FalkorDB)
- Procedural: Workflows, how-tos, templates (filesystem)
- Cache: LLM responses, tool outputs (in-memory)
- Knowledge Graph: Entity-relationship mapping (FalkorDB)
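For disaster recovery it helps to know which memory tiers depend on which backend. The mapping below restates the list above as data; the `Backend`, `MemoryTier`, and `tiers_on` names are hypothetical and are not taken from the Aegis codebase.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    POSTGRESQL = "postgresql"
    FALKORDB = "falkordb"
    FILESYSTEM = "filesystem"
    IN_MEMORY = "in-memory"


@dataclass(frozen=True)
class MemoryTier:
    name: str
    holds: str
    backends: tuple[Backend, ...]


# One entry per bullet in the list above.
MEMORY_TIERS = (
    MemoryTier("episodic", "events, decisions, interactions", (Backend.POSTGRESQL,)),
    MemoryTier("semantic", "knowledge, facts, learnings",
               (Backend.POSTGRESQL, Backend.FALKORDB)),
    MemoryTier("procedural", "workflows, how-tos, templates", (Backend.FILESYSTEM,)),
    MemoryTier("cache", "LLM responses, tool outputs", (Backend.IN_MEMORY,)),
    MemoryTier("knowledge_graph", "entity-relationship mapping", (Backend.FALKORDB,)),
)


def tiers_on(backend: Backend) -> list[str]:
    """Return the names of the memory tiers that depend on `backend`."""
    return [tier.name for tier in MEMORY_TIERS if backend in tier.backends]
```

For example, `tiers_on(Backend.FALKORDB)` lists the tiers to verify after restoring FalkorDB.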
4. Graph-Based Execution
- LangGraph-inspired workflow engine
- PostgreSQL-backed checkpointing for crash recovery
- Human-in-the-loop interrupt nodes
- Conditional branching and error handling
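The four properties above can be illustrated with a small sketch. This is not the Aegis workflow engine: the `Workflow` class and its parameters are hypothetical, a plain dict stands in for the PostgreSQL checkpoint table, and the state is assumed to be JSON-serializable.

```python
import json
from typing import Callable, Optional

# A node mutates the state and returns the next node's name (None ends the run).
Node = Callable[[dict], Optional[str]]


class Workflow:
    def __init__(self, nodes, checkpoints, interrupt_before=(), error_node=None):
        self.nodes = nodes                    # name -> Node
        self.checkpoints = checkpoints        # run_id -> JSON text (stand-in for a table)
        self.interrupt_before = set(interrupt_before)
        self.error_node = error_node          # optional node that handles failures

    def _save(self, run_id, next_node, state):
        self.checkpoints[run_id] = json.dumps({"next": next_node, "state": state})

    def run(self, run_id, start, state=None, approved=False):
        """Run or resume a workflow. Returns (waiting_node_or_None, state)."""
        if run_id in self.checkpoints:        # resume after a crash or a pause
            saved = json.loads(self.checkpoints[run_id])
            current, state = saved["next"], saved["state"]
        else:
            current, state = start, dict(state or {})
            self._save(run_id, current, state)
        while current is not None:
            if current in self.interrupt_before and not approved:
                break                         # pause until a human approves
            approved = False                  # one approval clears one interrupt
            try:
                nxt = self.nodes[current](state)  # branching: the node picks the next step
            except Exception as exc:
                state["error"] = f"{current}: {exc}"
                nxt = self.error_node if current != self.error_node else None
            current = nxt
            self._save(run_id, current, state)  # checkpoint after every node
        return current, state


def draft(state):
    state["draft"] = "hello"
    return "publish"


def publish(state):
    state["published"] = True
    return None


store = {}
wf = Workflow({"draft": draft, "publish": publish}, store, interrupt_before={"publish"})
waiting, state = wf.run("run-1", "draft")                  # pauses before "publish"
waiting, state = wf.run("run-1", "draft", approved=True)   # resumes from the checkpoint
```

Because the checkpoint records the next node and the state after every step, a fresh `Workflow` object pointed at the same store can pick up a run where the previous process stopped.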
5. Multi-Channel Communication
| Channel | Purpose |
|---|---|
| Discord | Primary (status, logs, alerts, journal, tasks) |
| WhatsApp | Two-way command channel (Vonage WABA) |
| Voice | Inbound calls with ASR (Vonage) |
| Telegram | Time-sensitive alerts |
| Email | Gmail triage and automation |
Quick Reference
Network Topology
| IP | Host | Role |
|---|---|---|
| 157.180.63.15 | Proxmox Host | Public IP |
| 10.10.10.10 | Dockerhost LXC | Main Traefik (TCP passthrough) |
| 10.10.10.103 | Aegis LXC | This instance |
Public Domains
Primary: aegisagent.ai
- https://aegisagent.ai - Dashboard
- https://intel.aegisagent.ai - Geopolitical Intelligence
- https://notebooks.aegisagent.ai - Open Notebook Research
- https://code.aegisagent.ai - VS Code
- https://vnc.aegisagent.ai - VNC Access
Container Stack
| Container | Image | Ports | Purpose |
|---|---|---|---|
| aegis-dashboard | aegis-core-dashboard | 8080 | FastAPI web app |
| aegis-scheduler | aegis-core-scheduler | - | APScheduler daemon |
| aegis-playwright | playwright-screenshot-api | 3002 | Browser automation |
| falkordb | falkordb/falkordb | 6379, 3001 | Knowledge graph |
| traefik | traefik:v2.11 | 80, 443 | Reverse proxy |
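After a restore, a quick reachability check against the ports in the table can confirm that the stack is back up. The sketch below is hypothetical and assumes the listed ports are reachable from wherever the script runs, which depends on how the ports are published in `docker-compose.yml`. `aegis-scheduler` is omitted because it exposes no port.

```python
import socket

# Ports from the container table; published host ports may differ (assumption).
SERVICES = {
    "aegis-dashboard": 8080,
    "aegis-playwright": 3002,
    "falkordb": 6379,
    "traefik": 443,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(host: str = "127.0.0.1", probe=port_open) -> dict:
    """Map each service name to whether its port accepted a connection."""
    return {name: probe(host, port) for name, port in SERVICES.items()}
```

The `probe` parameter exists so the check can be exercised without a running stack. Note that an open port only shows that something is listening, not that the service is healthy.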
Resource Usage
| Resource | Total | Used | Available |
|---|---|---|---|
| Memory | 110 GiB | ~16 GiB | ~93 GiB |
| Disk | 196 GB | 164 GB | 23 GB |
| CPU Cores | 16 | - | - |
Key File Locations
/home/agent/
├── .secure/ # Credentials (never commit)
├── .claude/ # Claude Code configuration
│ ├── settings.json # Agent SDK config
│ ├── history.jsonl # Session history
│ └── hooks/ # Pre/post tool hooks
├── memory/ # Multi-tier memory
│ ├── episodic/ # Events, decisions
│ ├── semantic/ # Knowledge, learnings
│ ├── procedural/ # Workflows, how-tos
│ └── journal/ # Daily journals
├── projects/
│ └── aegis-core/ # Main codebase
│ ├── aegis/ # Python package (78 modules)
│ ├── docker-compose.yml # Service definitions
│ └── docs/ # Documentation
└── stacks/ # Docker stacks
Documentation Index
- infrastructure.md - Hetzner server, LXC config, resources
- network.md - Network topology, DNS, SSL/TLS
- containers.md - Docker stack, health checks
- database.md - PostgreSQL schema (137 tables)
- memory.md - Memory system architecture
- llm-routing.md - Cognitive hierarchy
Design Philosophy
Value Creation Over Feature Accumulation
Problem: 60-70% of the Aegis codebase (144K lines) is unused.
Solution: Prioritize SHIP (70%) > REACTIVE (20%) > PROACTIVE (10%)
- Wire built features to commands/workflows/cron
- Fix bugs in shipped features
- Run demos, find breakage, fix it
Resource Discipline
- 110GB memory limit enforced by LXC
- $50/month API budget via Privacy.com
- Tier selection protocol minimizes Claude API costs
Three-Strike Debug Protocol
- Strike 1: Retry with modified approach
- Strike 2: Switch to local reasoning model
- Strike 3: STOP, document, escalate to human
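One way to read the protocol is that each strike is a failed attempt, and each failure triggers the next response in the list. The sketch below encodes that reading; it is an assumption about the intended behavior, and `run_with_three_strikes` and `EscalationRequired` are hypothetical names.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class EscalationRequired(Exception):
    """Raised after the third strike; carries the failure notes for a human."""

    def __init__(self, notes: list) -> None:
        super().__init__("three strikes reached, escalating to human")
        self.notes = notes


def run_with_three_strikes(
    original: Callable[[], T],
    modified: Callable[[], T],
    local_model: Callable[[], T],
) -> T:
    """Try each approach in order; stop and escalate after the third failure."""
    attempts = [
        ("original approach", original),
        ("modified approach", modified),          # response to strike 1
        ("local reasoning model", local_model),   # response to strike 2
    ]
    notes = []
    for strike, (label, attempt) in enumerate(attempts, start=1):
        try:
            return attempt()
        except Exception as exc:
            # Document every failure so the escalation has context.
            notes.append(f"strike {strike}: {label} failed: {exc}")
    raise EscalationRequired(notes)               # strike 3: stop and hand over
```

The collected `notes` serve as the "document" step: whoever handles the escalation sees what was tried and why each attempt failed.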
Last Updated: 2026-01-25