Network Architecture¶
Overview¶
Aegis uses a multi-layered network architecture with TCP passthrough. TLS stays encrypted from the client through every intermediate hop until the Aegis Traefik instance terminates it and routes plain HTTP to individual services inside the LXC.
Network Topology¶
graph TB
Internet[Internet] -->|443/TLS| CF[Cloudflare DNS]
CF -->|157.180.63.15| Proxmox[Proxmox Host]
Proxmox -->|Port Forward| DockerTraefik[Dockerhost Traefik<br/>10.10.10.10]
DockerTraefik -->|TCP Passthrough| AegisTraefik[Aegis Traefik<br/>10.10.10.103]
AegisTraefik -->|Route| Dashboard[Dashboard :8080]
AegisTraefik -->|Route| Code[VS Code :8443]
AegisTraefik -->|Route| Notebook[Open Notebook :8502]
AegisTraefik -->|Route| Intel[Intel Dashboard :8080]
Dashboard -.->|host.docker.internal| Postgres[(PostgreSQL :5432)]
Dashboard -.->|host.docker.internal| FalkorDB[(FalkorDB :6379)]
Dashboard -.->|host.docker.internal| Ollama[Ollama :11434]
style CF fill:#f96,stroke:#333
style DockerTraefik fill:#85c1e9,stroke:#333
style AegisTraefik fill:#85c1e9,stroke:#333
style Postgres fill:#76d7c4,stroke:#333
style FalkorDB fill:#76d7c4,stroke:#333
IP Address Allocation¶
Public Network¶
| Host | IP | Purpose |
|---|---|---|
| Hetzner Server | 157.180.63.15 | Public-facing IP |
Internal Network (10.10.10.0/24)¶
| Host | IP | Purpose | OS |
|---|---|---|---|
| Proxmox Gateway | 10.10.10.1 | Network gateway | Proxmox VE 8.x |
| Dockerhost | 10.10.10.10 | Primary Traefik | Ubuntu 22.04 |
| Aegis LXC | 10.10.10.103 | Aegis container | Ubuntu 24.04 |
VPN Network (Tailscale)¶
| Host | IP | Purpose |
|---|---|---|
| Aegis | 100.114.189.93 | Tailscale mesh VPN |
Docker Networks¶
| Network | Subnet | Purpose |
|---|---|---|
| docker0 | 172.17.0.0/16 | Default Docker bridge |
| traefik_proxy | 172.19.0.0/16 | Traefik ingress network |
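The allocation tables above can be sanity-checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module. The addresses are copied from the tables, while the helper names `hosts_outside` and `overlapping_pairs` are hypothetical and exist only for this illustration.

```python
import ipaddress

# Addresses and subnets copied from the tables above.
INTERNAL_NET = ipaddress.ip_network("10.10.10.0/24")
INTERNAL_HOSTS = {
    "Proxmox Gateway": "10.10.10.1",
    "Dockerhost": "10.10.10.10",
    "Aegis LXC": "10.10.10.103",
}
DOCKER_NETS = {
    "docker0": ipaddress.ip_network("172.17.0.0/16"),
    "traefik_proxy": ipaddress.ip_network("172.19.0.0/16"),
}


def hosts_outside(network, hosts):
    """Return the names of hosts whose address is not inside `network`."""
    return [name for name, ip in hosts.items()
            if ipaddress.ip_address(ip) not in network]


def overlapping_pairs(networks):
    """Return every pair of named networks whose address ranges overlap."""
    names = sorted(networks)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if networks[a].overlaps(networks[b])]
```

Calling `hosts_outside(INTERNAL_NET, INTERNAL_HOSTS)` and `overlapping_pairs({**DOCKER_NETS, "internal": INTERNAL_NET})` should both return empty lists for the layout documented here. A non-empty result would point at a misallocated host or a subnet collision.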
Traffic Flow¶
External Request Flow¶
1. User Request
↓
2. DNS Resolution (Cloudflare)
aegisagent.ai → 157.180.63.15
↓
3. Proxmox Port Forward
443:443 → 10.10.10.10
↓
4. Dockerhost Traefik (SNI Inspection)
- Reads TLS ServerName
- Does NOT terminate TLS
- Forwards to Aegis based on hostname
↓
5. Aegis Traefik (TLS Termination)
- Terminates TLS with Let's Encrypt cert
- Routes to backend service via Docker labels
↓
6. Backend Service
- Responds over HTTP (internal)
↓
7. Response Path (reverse of above)
- TLS encryption applied at Aegis Traefik
- Sent back through Dockerhost (passthrough)
- Returns to user
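Step 4 works because the TLS ClientHello carries the requested hostname in clear text, so a proxy can read it without holding any certificate or key. The sketch below illustrates the idea in Python; it is not how Traefik is implemented. It builds a ClientHello in memory with the standard `ssl` module and then extracts the `server_name` extension by hand. The function names are hypothetical, and the parser assumes the whole ClientHello arrives in a single TLS record.

```python
import ssl
import struct
from typing import Optional


def build_client_hello(hostname: str) -> bytes:
    """Produce the first TLS flight a client would send, without any network I/O."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: no server is attached, only the ClientHello is wanted
    return outgoing.read()


def extract_sni(record: bytes) -> Optional[str]:
    """Return the SNI hostname from a ClientHello record, or None if absent."""
    # TLS record header: content type (1), version (2), length (2)
    if len(record) < 6 or record[0] != 0x16:
        return None
    pos = 5
    if record[pos] != 0x01:  # handshake type 1 = ClientHello
        return None
    pos += 4                 # handshake type (1) + length (3)
    pos += 2 + 32            # client version + random
    pos += 1 + record[pos]   # session id
    (suites_len,) = struct.unpack_from("!H", record, pos)
    pos += 2 + suites_len    # cipher suites
    pos += 1 + record[pos]   # compression methods
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0:    # server_name extension
            # list length (2), name type (1), name length (2), then the name
            (name_len,) = struct.unpack_from("!H", record, pos + 3)
            return record[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None
```

For example, `extract_sni(build_client_hello("intel.aegisagent.ai"))` recovers the hostname while the rest of the handshake stays opaque, which is all a passthrough router needs to pick a backend.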
Internal Service Communication¶
Dashboard Container
↓
host.docker.internal (10.10.10.103)
↓
PostgreSQL (localhost:5432)
FalkorDB (localhost:6379)
Ollama (localhost:11434)
Why host.docker.internal?
- Containers need to reach services on the LXC host
- Docker's special DNS name resolves to host gateway IP
- Defined in docker-compose.yml via extra_hosts
Port Allocation¶
Aegis LXC Ports¶
| Port | Service | Access | Purpose |
|---|---|---|---|
| 22 | SSH | Internal | Remote administration |
| 80 | Traefik HTTP | Public | HTTP redirect to 443 |
| 443 | Traefik HTTPS | Public | TLS ingress |
| 5432 | PostgreSQL | Internal | Database |
| 6379 | FalkorDB | Internal | Knowledge graph |
| 8080 | Dashboard | Internal | Web UI (via Traefik) |
| 8081 | Traefik Dashboard | Internal | Traefik web UI |
| 8502 | Open Notebook | Internal | Research tool (via Traefik) |
| 5055 | Notebook API | Internal | Research API (via Traefik) |
| 8443 | VS Code | Internal | Code editor (via Traefik) |
| 3001 | FalkorDB Browser | Internal | Graph visualization |
| 3002 | Playwright | Internal | Screenshot API |
| 11434 | Ollama | Internal | Local LLM inference |
Public Ports (Proxmox)¶
| Port | Forwarded To | Purpose |
|---|---|---|
| 22 | 10.10.10.103:22 | Aegis SSH |
| 80 | 10.10.10.10:80 | Dockerhost Traefik HTTP |
| 443 | 10.10.10.10:443 | Dockerhost Traefik HTTPS |
| 8006 | Proxmox:8006 | Proxmox Web UI |
Domain Routing¶
Primary Domain: aegisagent.ai¶
| Subdomain | Target | Service | Port |
|---|---|---|---|
| (root) | Aegis Traefik (aegis.rbnk.uk redirects here) | Dashboard | 8080 |
| code | Aegis Traefik (aegis-code.rbnk.uk redirects here) | VS Code | 8443 |
| intel | Aegis Traefik | Intel Dashboard | 8080 |
| notebooks | Aegis Traefik | Open Notebook UI | 8502 |
| api.notebooks | Aegis Traefik | Open Notebook API | 5055 |
| vnc | Aegis Traefik | VNC Access | 6080 |
Legacy Domain: rbnk.uk¶
| Subdomain | Redirect To | Purpose |
|---|---|---|
| aegis | aegisagent.ai | Dashboard (301 permanent) |
| aegis-code | code.aegisagent.ai | VS Code (301 permanent) |
| aegis-notebook | notebooks.aegisagent.ai | Notebook (301 permanent) |
| aegis-notebook-api | api.notebooks.aegisagent.ai | API (301 permanent) |
| traefik | Internal only | Traefik dashboard |
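The redirect table can be read as a simple host lookup. Here is a minimal Python sketch of that mapping. The hostnames come from the table, while the function name `redirect_for`, the `https://` scheme, and the choice to preserve the request path are assumptions made for illustration and are not confirmed by the configuration above.

```python
from typing import Optional, Tuple

# Legacy host -> new host, copied from the table above.
LEGACY_REDIRECTS = {
    "aegis.rbnk.uk": "aegisagent.ai",
    "aegis-code.rbnk.uk": "code.aegisagent.ai",
    "aegis-notebook.rbnk.uk": "notebooks.aegisagent.ai",
    "aegis-notebook-api.rbnk.uk": "api.notebooks.aegisagent.ai",
}


def redirect_for(host: str, path: str = "/") -> Optional[Tuple[int, str]]:
    """Return (301, location) for a legacy host, or None if no redirect applies."""
    target = LEGACY_REDIRECTS.get(host.lower().rstrip("."))
    if target is None:
        return None
    # Assumption: the path is carried over unchanged to the new domain.
    return 301, f"https://{target}{path}"
```

Hosts without an entry, such as `traefik.rbnk.uk`, return `None` and are served (or refused) according to the internal-only rules rather than redirected.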
Internal Domains¶
| Domain | Target | Access |
|---|---|---|
| *.rbnk.uk | 10.10.10.103 | Internal network only |
| localhost | 127.0.0.1 | LXC container only |
DNS Configuration¶
Cloudflare DNS Records¶
A Records:
aegisagent.ai → 157.180.63.15
*.aegisagent.ai → 157.180.63.15
rbnk.uk → 157.180.63.15
*.rbnk.uk → 157.180.63.15
DNS Settings:
- TTL: 10 minutes (for fast failover)
- Proxy Status: DNS only (orange cloud OFF)
- DNSSEC: Enabled
- CAA Records: 0 issue "letsencrypt.org"
Why DNS-only (no proxy)?
- TCP passthrough requires direct connection
- Cloudflare proxy doesn't support custom ports
- We manage TLS ourselves with Traefik + Let's Encrypt
TLS/SSL Configuration¶
Certificate Management¶
- Certificate Authority: Let's Encrypt
- Challenge Type: DNS-01 (via Cloudflare API)
- Renewal: Automatic (30 days before expiry)
Certificate Locations:
- Aegis Traefik: /home/agent/stacks/traefik/letsencrypt/acme.json
- Dockerhost Traefik: /srv/dockerdata/traefik/letsencrypt/acme.json
TLS Settings¶
Minimum TLS Version: 1.2
Cipher Suites: Modern profile
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305
HSTS: Enabled (max-age=31536000)
Certificate Pinning: Not implemented (for flexibility)
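To make these settings concrete, the sketch below builds a client-side `ssl.SSLContext` restricted to the same parameters, which could serve as the basis of a probe that only connects if the server accepts them. This is an illustration, not part of the deployment. The OpenSSL cipher spellings are the standard equivalents of the IANA names listed above, and `make_client_context` is a hypothetical helper name. Note that `set_ciphers` only restricts TLS 1.2 suites; TLS 1.3 suites remain enabled.

```python
import ssl

# OpenSSL spellings of the three TLS 1.2 cipher suites listed above.
CIPHERS = [
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-RSA-CHACHA20-POLY1305",
]

# The HSTS response header corresponding to max-age=31536000 (one year).
HSTS_HEADER = ("Strict-Transport-Security", "max-age=31536000")


def make_client_context() -> ssl.SSLContext:
    """Build a client context that refuses anything below TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_ciphers(":".join(CIPHERS))  # affects TLS 1.2 suites only
    return ctx
```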
Traefik Configuration¶
Dockerhost Traefik (10.10.10.10)¶
Role: TCP passthrough router (SNI-based routing)
Configuration (/srv/dockerdata/traefik/dynamic/aegis-passthrough.yml):
tcp:
  routers:
    aegis-passthrough:
      rule: "HostSNI(`aegisagent.ai`) || HostSNI(`*.aegisagent.ai`) || HostSNI(`*.rbnk.uk`)"
      service: aegis-backend
      tls:
        passthrough: true
  services:
    aegis-backend:
      loadBalancer:
        servers:
          - address: "10.10.10.103:443"
Why Passthrough?
- Aegis manages its own certificates
- End-to-end TLS encryption
- Dockerhost doesn't need to trust Aegis certs
- Simplifies certificate renewal
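The routing decision expressed by the passthrough rule can be sketched as a small matcher. This Python sketch only illustrates the intent of the rule; it does not reproduce Traefik's matcher. In particular, it assumes that `*.` matches one or more leading labels, which may differ from how Traefik evaluates `HostSNI` wildcards. `route_sni` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns and backend copied from the passthrough configuration above.
PASSTHROUGH_PATTERNS = ["aegisagent.ai", "*.aegisagent.ai", "*.rbnk.uk"]
AEGIS_BACKEND = "10.10.10.103:443"


def route_sni(server_name: Optional[str]) -> Optional[str]:
    """Return the backend address for an SNI hostname, or None if nothing matches."""
    if not server_name:
        return None  # clients that send no SNI cannot be routed by hostname
    name = server_name.lower().rstrip(".")
    if any(fnmatchcase(name, pattern) for pattern in PASSTHROUGH_PATTERNS):
        return AEGIS_BACKEND
    return None
```

Under these assumptions the bare `rbnk.uk` apex is not matched, because the rule lists only `*.rbnk.uk`, while both `aegisagent.ai` and its subdomains are forwarded to the Aegis LXC.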
Aegis Traefik (10.10.10.103)¶
Role: TLS termination and HTTP routing
Key Features:
- Docker provider (reads container labels)
- File provider (for non-Docker services)
- Let's Encrypt DNS challenge
- HTTP to HTTPS redirect
- Dashboard on port 8081
Example Service Labels (from docker-compose.yml):
labels:
- "traefik.enable=true"
- "traefik.http.routers.aegis.rule=Host(`aegisagent.ai`)"
- "traefik.http.routers.aegis.entrypoints=websecure"
- "traefik.http.routers.aegis.tls.certresolver=cf"
- "traefik.http.services.aegis.loadbalancer.server.port=8080"
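Because every routed service repeats the same five labels with only the router name, hostname, and port changing, the pattern can be captured in a small generator. The following Python sketch is one way to do that. The helper name `traefik_labels` is hypothetical, and the defaults (`websecure`, `cf`) are taken from the example labels above.

```python
from typing import List


def traefik_labels(router: str, host: str, port: int,
                   entrypoint: str = "websecure",
                   resolver: str = "cf") -> List[str]:
    """Return the Docker labels for one HTTPS-routed service."""
    return [
        "traefik.enable=true",
        f"traefik.http.routers.{router}.rule=Host(`{host}`)",
        f"traefik.http.routers.{router}.entrypoints={entrypoint}",
        f"traefik.http.routers.{router}.tls.certresolver={resolver}",
        f"traefik.http.services.{router}.loadbalancer.server.port={port}",
    ]
```

Calling `traefik_labels("aegis", "aegisagent.ai", 8080)` reproduces the example above. A call such as `traefik_labels("intel", "intel.aegisagent.ai", 8080)` would produce the equivalent set for the Intel Dashboard, assuming a router named `intel`.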
Firewall Rules¶
Proxmox Host Firewall¶
# Allow SSH (from anywhere)
ACCEPT tcp 22
# Allow HTTP/HTTPS (from anywhere)
ACCEPT tcp 80
ACCEPT tcp 443
# Allow Proxmox Web UI (from trusted IPs only)
ACCEPT tcp 8006 source=<trusted-ips>
# Allow Tailscale (from anywhere)
ACCEPT udp 41641
# Drop all other traffic
DROP all
Aegis LXC Firewall (iptables)¶
# Allow established connections
ACCEPT all state=RELATED,ESTABLISHED
# Allow loopback
ACCEPT all -i lo
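# Allow SSH from Tailscale network (Tailscale assigns addresses from the CGNAT range 100.64.0.0/10; 100.0.0.0/8 would also admit public addresses)
ACCEPT tcp 22 -s 100.64.0.0/10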
# Allow HTTP/HTTPS from Dockerhost
ACCEPT tcp 80,443 -s 10.10.10.10
# Allow internal Docker traffic
ACCEPT all -s 172.16.0.0/12
# Drop invalid packets
DROP all state=INVALID
# Log and drop everything else
LOG all
DROP all
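The LXC rule set is evaluated top to bottom, and the first matching rule decides the packet's fate. The sketch below models that first-match behaviour in Python for the source-based rules only. Connection state, loopback, and logging are deliberately left out, and the `Rule` class and `decide` function are hypothetical names. The SSH rule uses Tailscale's CGNAT range, 100.64.0.0/10.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                     # "ACCEPT" or "DROP"
    proto: Optional[str] = None     # None matches any protocol
    ports: frozenset = frozenset()  # empty set matches any port
    source: Optional[str] = None    # CIDR string, or None for any source

    def matches(self, proto: str, port: int, src: str) -> bool:
        if self.proto and self.proto != proto:
            return False
        if self.ports and port not in self.ports:
            return False
        if self.source and ipaddress.ip_address(src) not in ipaddress.ip_network(self.source):
            return False
        return True


# Simplified model of the Aegis LXC rules (stateless subset).
LXC_RULES = [
    Rule("ACCEPT", "tcp", frozenset({22}), "100.64.0.0/10"),        # SSH from Tailscale
    Rule("ACCEPT", "tcp", frozenset({80, 443}), "10.10.10.10/32"),  # HTTP/HTTPS from Dockerhost
    Rule("ACCEPT", source="172.16.0.0/12"),                         # internal Docker traffic
    Rule("DROP"),                                                   # everything else
]


def decide(proto: str, port: int, src: str, rules=LXC_RULES) -> str:
    """Return the action of the first rule that matches the packet."""
    for rule in rules:
        if rule.matches(proto, port, src):
            return rule.action
    return "DROP"  # default policy if no rule matched
```

For instance, `decide("tcp", 443, "10.10.10.10")` is accepted, while the same port from `10.10.10.1` falls through to the final drop rule.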
Load Balancing¶
Current Setup: Single Node¶
- No load balancing: All traffic to 10.10.10.103
- No failover: Single point of failure
- Scaling: Vertical only (more resources to LXC)
Future: Multi-Node with HAProxy¶
Internet → Cloudflare
↓
HAProxy (10.10.10.10)
↓
Round-robin distribution:
├─→ Aegis-01 (10.10.10.103)
└─→ Aegis-02 (10.10.10.104)
Health Checks:
- HTTP GET /health every 5 seconds
- Remove node if 3 consecutive failures
- Re-add node after 2 consecutive successes
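The removal and re-add thresholds describe a small state machine per node. Here is a minimal Python sketch of that logic, independent of HAProxy itself. The class name `NodeHealth` is hypothetical, and the parameter names `fall` and `rise` are borrowed from HAProxy's health-check keywords of the same meaning.

```python
class NodeHealth:
    """Track whether one backend node should receive traffic."""

    def __init__(self, fall: int = 3, rise: int = 2):
        self.fall = fall          # consecutive failures before removal
        self.rise = rise          # consecutive successes before re-adding
        self.in_rotation = True
        self._fails = 0
        self._oks = 0

    def record(self, ok: bool) -> bool:
        """Record one health-check result and return the new rotation state."""
        if ok:
            self._fails = 0
            self._oks += 1
            if not self.in_rotation and self._oks >= self.rise:
                self.in_rotation = True
        else:
            self._oks = 0
            self._fails += 1
            if self.in_rotation and self._fails >= self.fall:
                self.in_rotation = False
        return self.in_rotation
```

A single success resets the failure streak, so two failures followed by a success leave the node in rotation. Only three failures in a row take it out, and it returns after two successes in a row.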
Network Security¶
DDoS Protection¶
Layer 3/4 (Network/Transport):
- Hetzner upstream filtering (100 Gbit/s capacity)
- Synproxy for SYN flood protection
- Connection rate limiting (iptables)
Layer 7 (Application):
- Traefik rate limiting middleware
- API rate limits per customer (Redis-backed)
- Challenge-response for suspicious traffic (future)
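A token bucket is one common way to implement per-customer rate limits like those listed above. The sketch below is an in-memory Python illustration of the technique, not the Redis-backed implementation referred to here. The names `TokenBucket` and `allow_request`, and the default rate and burst values, are hypothetical. Time is passed in explicitly so the behaviour is deterministic.

```python
from typing import Dict


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per customer; a plain dict stands in for a shared store here.
_buckets: Dict[str, TokenBucket] = {}


def allow_request(customer_id: str, now: float,
                  rate: float = 10.0, burst: int = 20) -> bool:
    """Check and consume one token for the given customer."""
    if customer_id not in _buckets:
        _buckets[customer_id] = TokenBucket(rate, burst, now)
    return _buckets[customer_id].allow(now)
```

Because each customer has an independent bucket, one customer exhausting their burst does not affect requests from another.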
Intrusion Detection¶
fail2ban Configuration:
[sshd]
enabled = true
# Ban for 1 hour (fail2ban expects "#" comments on their own line)
bantime = 3600
maxretry = 3
[traefik-auth]
enabled = true
# Ban for 2 hours
bantime = 7200
maxretry = 5
Banned IPs: Stored in ~/.claude/banned_ips.txt
Auto-unban: After ban duration expires
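The interaction of `maxretry`, `bantime`, and auto-unban can be illustrated with a small tracker. This Python sketch is a simplified model and not fail2ban's implementation. It ignores fail2ban's `findtime` window, so failures never age out, and the class name `BanTracker` is hypothetical.

```python
from typing import Dict


class BanTracker:
    """Ban an IP after `maxretry` failures and lift the ban after `bantime` seconds."""

    def __init__(self, maxretry: int, bantime: int):
        self.maxretry = maxretry
        self.bantime = bantime
        self.failures: Dict[str, int] = {}        # ip -> failures since last ban
        self.banned_until: Dict[str, float] = {}  # ip -> time the ban expires

    def is_banned(self, ip: str, now: float) -> bool:
        until = self.banned_until.get(ip)
        if until is None:
            return False
        if now >= until:
            del self.banned_until[ip]  # auto-unban once the duration has passed
            return False
        return True

    def record_failure(self, ip: str, now: float) -> bool:
        """Record a failed attempt and return True if the IP is now banned."""
        if self.is_banned(ip, now):
            return True
        self.failures[ip] = self.failures.get(ip, 0) + 1
        if self.failures[ip] >= self.maxretry:
            self.banned_until[ip] = now + self.bantime
            self.failures[ip] = 0
            return True
        return False
```

With the `[sshd]` values above (`BanTracker(3, 3600)`), the third failed attempt from an address triggers a ban that expires one hour later.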
VPN Access (Tailscale)¶
Tailnet: rickoslyder@
Access Control:
- SSH: Only from the Tailscale network (100.64.0.0/10)
- Admin endpoints: Require Tailscale IP
- MagicDNS: Enabled (aegis.tail-scale.ts.net)
Use Cases:
- Remote SSH when traveling
- Database access (psql over Tailscale)
- Development environment access
Network Monitoring¶
Traffic Analysis¶
Tools:
- iftop: Real-time bandwidth monitoring
- nethogs: Per-process network usage
- tcpdump: Packet capture for debugging
Metrics Collected:
- Bytes in/out per interface
- Packet loss rate
- TCP retransmissions
- DNS query times
Health Checks¶
External Monitoring (via Uptime Kuma):
- https://aegisagent.ai/health (every 60 seconds)
- https://intel.aegisagent.ai/health (every 60 seconds)
- https://notebooks.aegisagent.ai/ (every 300 seconds)
Internal Monitoring (via Prometheus):
- Traefik metrics (requests, errors, latency)
- Docker network metrics
- PostgreSQL connection pool usage
Alerts¶
Trigger Conditions:
- Service down for > 2 minutes
- HTTP 5xx rate > 10% for 5 minutes
- Network latency > 500ms for 10 minutes
- Bandwidth saturation > 80% for 15 minutes
Notification Targets:
- Discord #alerts channel
- Telegram bot
- WhatsApp (for critical alerts only)
Troubleshooting¶
Common Issues¶
"Connection Refused" on aegisagent.ai¶
Diagnosis:
# Check if Traefik is running
docker ps | grep traefik
# Check Traefik logs
docker logs -f traefik
# Verify port 443 is listening
netstat -tlnp | grep :443
Possible Causes:
- Traefik container stopped
- Certificate renewal failed
- Dockerhost passthrough misconfigured
Services Unreachable from Containers¶
Diagnosis:
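# From inside container
ping host.docker.internal
# PostgreSQL does not speak HTTP, so expect a curl error either way: "Connection refused", a timeout, or a resolution failure means the port is unreachable, while an empty or garbled reply means it is open
curl http://host.docker.internal:5432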
# Check if ports are listening on host
netstat -tlnp | grep -E "5432|6379|11434"
Possible Causes:
- PostgreSQL not bound to 0.0.0.0
- Firewall blocking internal connections
- Wrong extra_hosts configuration
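Where `ping` or `curl` are not available inside a container, the same reachability check can be done with a plain TCP connect. The sketch below uses only Python's standard `socket` module. The port numbers are the ones documented above, and the function names `port_open` and `check_services` are hypothetical.

```python
import socket
from typing import Dict

# Host-side services the containers depend on, from the port table.
HOST_SERVICES = {"PostgreSQL": 5432, "FalkorDB": 6379, "Ollama": 11434}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(host: str = "host.docker.internal",
                   services: Dict[str, int] = HOST_SERVICES) -> Dict[str, bool]:
    """Probe every service port on the given host."""
    return {name: port_open(host, port) for name, port in services.items()}
```

Running `check_services()` from inside the Dashboard container returns a dictionary such as `{"PostgreSQL": True, ...}`. A `False` entry narrows the problem to that service's bind address, the firewall, or the `extra_hosts` mapping.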
Let's Encrypt Rate Limit¶
Error: "too many certificates already issued"
Diagnosis:
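# List the domains that have a stored certificate
docker exec traefik cat /letsencrypt/acme.json | jq -r '.cf.Certificates[].domain.main'
# Print each certificate's expiry (the .certificate field is base64-encoded PEM, not a date)
docker exec traefik cat /letsencrypt/acme.json | jq -r '.cf.Certificates[].certificate' | while read -r c; do echo "$c" | base64 -d | openssl x509 -noout -enddate; done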
Solution:
- Wait 7 days for rate limit reset
- Use staging server for testing: --certificatesresolvers.cf.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
Network Performance Optimization¶
Current Optimizations¶
- TCP BBR Congestion Control
- Increased Connection Limits
- Docker Network MTU: 1500 bytes (standard Ethernet)
- Traefik Connection Pooling: Keep-alive enabled
Future Optimizations¶
- HTTP/3 (QUIC) support in Traefik
- WebSocket connection pooling
- gRPC for internal service communication
- Service mesh (Istio/Linkerd) for mTLS
Network Diagram (Detailed)¶
┌─────────────────────────────────────────────────────────────┐
│ INTERNET │
└─────────────────────────┬───────────────────────────────────┘
│
┌───────▼────────┐
│ Cloudflare │
│ DNS (10min TTL)│
└───────┬────────┘
│
157.180.63.15:443 (TLS)
│
┌─────────────────────────▼───────────────────────────────────┐
│ PROXMOX HOST │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Port Forward: 443 → 10.10.10.10:443 │ │
│ └──────────────────────┬───────────────────────────────┘ │
└─────────────────────────┼───────────────────────────────────┘
│
┌───────▼────────┐
│ Dockerhost │
│ 10.10.10.10 │
│ Traefik │
│ (Passthrough) │
└───────┬────────┘
│
TCP Passthrough (TLS intact)
│
┌─────────────────────────▼───────────────────────────────────┐
│ AEGIS LXC (10.10.10.103) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Traefik (TLS Termination) │ │
│ │ ┌────────────────────────────────────────────────┐ │ │
│ │ │ Router: Host(`aegisagent.ai`) │ │ │
│ │ │ TLS: Let's Encrypt (DNS Challenge) │ │ │
│ │ └────────┬───────────────────────────────────────┘ │ │
│ └───────────┼──────────────────────────────────────────┘ │
│ │ │
│ ┌─────────┴─────────┬──────────┬────────────┐ │
│ │ │ │ │ │
│ ┌─▼──────┐ ┌────────▼────┐ ┌──▼──────┐ ┌──▼──────┐ │
│ │Dashboard│ │ VS Code │ │ Notebook│ │ Intel │ │
│ │ :8080 │ │ :8443 │ │ :8502 │ │ :8080 │ │
│ └─────┬───┘ └─────────────┘ └─────────┘ └─────────┘ │
│ │ │
│ │ host.docker.internal (10.10.10.103) │
│ │ │
│ ┌─────▼───────────┬───────────────┬───────────────┐ │
│ │ │ │ │ │
│ ┌▼──────────┐ ┌────▼────────┐ ┌───▼────────┐ ┌────▼─────┐ │
│ │ PostgreSQL│ │ FalkorDB │ │ Ollama │ │ Redis │ │
│ │ :5432 │ │ :6379 │ │ :11434 │ │ :6380 │ │
│ └───────────┘ └─────────────┘ └────────────┘ └──────────┘ │
│ │
│ Docker Networks: │
│ - traefik_proxy (172.19.0.0/16) │
│ - docker0 (172.17.0.0/16) │
│ │
│ Host Network: │
│ - eth0: 10.10.10.103/24 │
│ - tailscale0: 100.114.189.93/32 │
└─────────────────────────────────────────────────────────────┘
Bandwidth Usage¶
Typical Traffic Patterns¶
| Service | Ingress | Egress | Daily Total |
|---|---|---|---|
| Dashboard | 50 MB | 200 MB | 250 MB |
| VS Code | 10 MB | 30 MB | 40 MB |
| Open Notebook | 100 MB | 500 MB | 600 MB |
| API Requests | 200 MB | 800 MB | 1 GB |
| Ollama (internal) | - | - | - |
| Total | ~360 MB | ~1.5 GB | ~2 GB/day |
Monthly Bandwidth: ~60 GB/month (well under 20 TB limit)
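The totals and the monthly estimate follow directly from the per-service figures. The short Python sketch below reproduces the arithmetic, assuming a 30-day month and 1 GB = 1000 MB. The function names are hypothetical.

```python
# Daily (ingress, egress) figures in MB, copied from the table above.
DAILY_MB = {
    "Dashboard": (50, 200),
    "VS Code": (10, 30),
    "Open Notebook": (100, 500),
    "API Requests": (200, 800),
}


def totals(traffic):
    """Return (ingress, egress, combined) daily totals in MB."""
    ingress = sum(i for i, _ in traffic.values())
    egress = sum(e for _, e in traffic.values())
    return ingress, egress, ingress + egress


def monthly_gb(daily_total_mb, days=30):
    """Project a daily total in MB to a monthly figure in GB."""
    return daily_total_mb * days / 1000
```

`totals(DAILY_MB)` gives 360 MB in, 1530 MB out, and 1890 MB combined per day, which projects to about 56.7 GB per month. That is consistent with the rounded ~60 GB figure and is roughly 0.3% of the 20 TB limit.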