Network Architecture

Overview

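Aegis uses a multi-layered network architecture with TCP passthrough: TLS traffic stays encrypted from the internet through the Dockerhost proxy until the Aegis Traefik instance terminates it and routes requests to individual services over internal HTTP.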

Network Topology

graph TB
    Internet[Internet] -->|443/TLS| CF[Cloudflare DNS]
    CF -->|157.180.63.15| Proxmox[Proxmox Host]
    Proxmox -->|Port Forward| DockerTraefik[Dockerhost Traefik<br/>10.10.10.10]
    DockerTraefik -->|TCP Passthrough| AegisTraefik[Aegis Traefik<br/>10.10.10.103]

    AegisTraefik -->|Route| Dashboard[Dashboard :8080]
    AegisTraefik -->|Route| Code[VS Code :8443]
    AegisTraefik -->|Route| Notebook[Open Notebook :8502]
    AegisTraefik -->|Route| Intel[Intel Dashboard :8080]

    Dashboard -.->|host.docker.internal| Postgres[(PostgreSQL :5432)]
    Dashboard -.->|host.docker.internal| FalkorDB[(FalkorDB :6379)]
    Dashboard -.->|host.docker.internal| Ollama[Ollama :11434]

    style CF fill:#f96,stroke:#333
    style DockerTraefik fill:#85c1e9,stroke:#333
    style AegisTraefik fill:#85c1e9,stroke:#333
    style Postgres fill:#76d7c4,stroke:#333
    style FalkorDB fill:#76d7c4,stroke:#333

IP Address Allocation

Public Network

| Host | IP | Purpose |
|------|----|---------|
| Hetzner Server | 157.180.63.15 | Public-facing IP |

Internal Network (10.10.10.0/24)

| Host | IP | Purpose | OS |
|------|----|---------|----|
| Proxmox Gateway | 10.10.10.1 | Network gateway | Proxmox VE 8.x |
| Dockerhost | 10.10.10.10 | Primary Traefik | Ubuntu 22.04 |
| Aegis LXC | 10.10.10.103 | Aegis container | Ubuntu 24.04 |

VPN Network (Tailscale)

| Host | IP | Purpose |
|------|----|---------|
| Aegis | 100.114.189.93 | Tailscale mesh VPN |

Docker Networks

| Network | Subnet | Purpose |
|---------|--------|---------|
| docker0 | 172.17.0.0/16 | Default Docker bridge |
| traefik_proxy | 172.19.0.0/16 | Traefik ingress network |

Traffic Flow

External Request Flow

1. User Request
2. DNS Resolution (Cloudflare)
   aegisagent.ai → 157.180.63.15
3. Proxmox Port Forward
   443:443 → 10.10.10.10
4. Dockerhost Traefik (SNI Inspection)
   - Reads TLS ServerName
   - Does NOT terminate TLS
   - Forwards to Aegis based on hostname
5. Aegis Traefik (TLS Termination)
   - Terminates TLS with Let's Encrypt cert
   - Routes to backend service via Docker labels
6. Backend Service
   - Responds over HTTP (internal)
7. Response Path (reverse of above)
   - TLS encryption applied at Aegis Traefik
   - Sent back through Dockerhost (passthrough)
   - Returns to user

Internal Service Communication

Dashboard Container
    ↓
host.docker.internal (10.10.10.103)
    ↓
    ├─→ PostgreSQL (localhost:5432)
    ├─→ FalkorDB (localhost:6379)
    └─→ Ollama (localhost:11434)

Why host.docker.internal?
- Containers need to reach services on the LXC host
- Docker's special DNS name resolves to the host gateway IP
- Defined in docker-compose.yml via extra_hosts
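The reachability of the host services from inside a container can be checked with a plain TCP connect. The following is a minimal sketch, not part of the Aegis codebase: the function names `port_reachable` and `check_host_services` are hypothetical, and the ports are the ones listed above.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Ports taken from the service list above.
HOST_SERVICES = {"PostgreSQL": 5432, "FalkorDB": 6379, "Ollama": 11434}


def check_host_services(host: str = "host.docker.internal") -> dict[str, bool]:
    """Probe each host service; the default name only resolves inside a
    container that has the extra_hosts mapping."""
    return {name: port_reachable(host, port) for name, port in HOST_SERVICES.items()}
```

Running `check_host_services()` inside the Dashboard container returns one boolean per service. A `False` for every entry usually points at name resolution or the firewall rather than at a single service.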

Port Allocation

Aegis LXC Ports

| Port | Service | Access | Purpose |
|------|---------|--------|---------|
| 22 | SSH | Internal | Remote administration |
| 80 | Traefik HTTP | Public | HTTP redirect to 443 |
| 443 | Traefik HTTPS | Public | TLS ingress |
| 5432 | PostgreSQL | Internal | Database |
| 6379 | FalkorDB | Internal | Knowledge graph |
| 8080 | Dashboard | Internal | Web UI (via Traefik) |
| 8081 | Traefik Dashboard | Internal | Traefik web UI |
| 8502 | Open Notebook | Internal | Research tool (via Traefik) |
| 5055 | Notebook API | Internal | Research API (via Traefik) |
| 8443 | VS Code | Internal | Code editor (via Traefik) |
| 3001 | FalkorDB Browser | Internal | Graph visualization |
| 3002 | Playwright | Internal | Screenshot API |
| 11434 | Ollama | Internal | Local LLM inference |

Public Ports (Proxmox)

| Port | Forwarded To | Purpose |
|------|--------------|---------|
| 22 | 10.10.10.103:22 | Aegis SSH |
| 80 | 10.10.10.10:80 | Dockerhost Traefik HTTP |
| 443 | 10.10.10.10:443 | Dockerhost Traefik HTTPS |
| 8006 | Proxmox:8006 | Proxmox Web UI |

Domain Routing

Primary Domain: aegisagent.ai

| Subdomain | Target | Service | Port |
|-----------|--------|---------|------|
| (root) | aegis.rbnk.uk redirect | Dashboard | 8080 |
| code | aegis-code.rbnk.uk redirect | VS Code | 8443 |
| intel | Aegis Traefik | Intel Dashboard | 8080 |
| notebooks | Aegis Traefik | Open Notebook UI | 8502 |
| api.notebooks | Aegis Traefik | Open Notebook API | 5055 |
| vnc | Aegis Traefik | VNC Access | 6080 |

Legacy Domain: rbnk.uk

| Subdomain | Redirect To | Purpose |
|-----------|-------------|---------|
| aegis | aegisagent.ai | Dashboard (301 permanent) |
| aegis-code | code.aegisagent.ai | VS Code (301 permanent) |
| aegis-notebook | notebooks.aegisagent.ai | Notebook (301 permanent) |
| aegis-notebook-api | api.notebooks.aegisagent.ai | API (301 permanent) |
| traefik | Internal only | Traefik dashboard |

Internal Domains

| Domain | Target | Access |
|--------|--------|--------|
| *.rbnk.uk | 10.10.10.103 | Internal network only |
| localhost | 127.0.0.1 | LXC container only |

DNS Configuration

Cloudflare DNS Records

A Records:

aegisagent.ai           → 157.180.63.15
*.aegisagent.ai         → 157.180.63.15
rbnk.uk                 → 157.180.63.15
*.rbnk.uk               → 157.180.63.15

DNS Settings:
- TTL: 10 minutes (for fast failover)
- Proxy Status: DNS only (orange cloud OFF)
- DNSSEC: Enabled
- CAA Records: 0 issue "letsencrypt.org"

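Why DNS-only (no proxy)?
- TCP passthrough requires a direct connection: the Cloudflare proxy would terminate TLS at its edge instead of passing the original handshake through
- The Cloudflare proxy only forwards a fixed set of HTTP/HTTPS ports, so other forwarded ports such as SSH on 22 would not be reachable through it
- We manage TLS ourselves with Traefik + Let's Encrypt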
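Because the records are DNS-only, every name should resolve straight to the Hetzner address rather than to a Cloudflare edge IP. The sketch below checks that. The function name `check_a_records` and the injectable `resolve` parameter are illustrative assumptions, added so the check can be exercised without network access.

```python
import socket
from typing import Callable, Iterable

EXPECTED_IP = "157.180.63.15"  # public IP from the A records above


def check_a_records(
    names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> dict[str, bool]:
    """Map each name to True if it resolves to EXPECTED_IP, else False."""
    results = {}
    for name in names:
        try:
            results[name] = resolve(name) == EXPECTED_IP
        except OSError:
            # socket.gaierror (resolution failure) is a subclass of OSError.
            results[name] = False
    return results
```

For example, `check_a_records(["aegisagent.ai", "code.aegisagent.ai", "aegis.rbnk.uk"])` should report `True` for every name while the proxy is off. A `False` entry suggests a missing record or a record that has been switched to proxied mode.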

TLS/SSL Configuration

Certificate Management

Certificate Authority: Let's Encrypt
Challenge Type: DNS-01 (via Cloudflare API)
Renewal: Automatic (30 days before expiry)

Certificate Locations:
- Aegis Traefik: /home/agent/stacks/traefik/letsencrypt/acme.json
- Dockerhost Traefik: /srv/dockerdata/traefik/letsencrypt/acme.json

TLS Settings

Minimum TLS Version: 1.2
Cipher Suites: Modern profile
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
- TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305

HSTS: Enabled (max-age=31536000)
Certificate Pinning: Not implemented (for flexibility)
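As a worked check of these settings, the sketch below converts the HSTS `max-age` value into days and builds a client-side TLS context that refuses anything older than TLS 1.2. Both helper names are hypothetical and exist only for this illustration.

```python
import re
import ssl


def hsts_max_age_days(header_value: str) -> float:
    """Convert the max-age directive of a Strict-Transport-Security value to days."""
    match = re.search(r"max-age=(\d+)", header_value, re.IGNORECASE)
    if match is None:
        raise ValueError("no max-age directive found")
    return int(match.group(1)) / 86400  # 86400 seconds per day


def make_client_context() -> ssl.SSLContext:
    """Client context that will not negotiate below TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

`hsts_max_age_days("max-age=31536000")` evaluates to 365.0, so browsers remember the HTTPS-only policy for one year. The context from `make_client_context()` can be passed to a client such as `urllib.request.urlopen(url, context=ctx)` when probing the public endpoints.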

Traefik Configuration

Dockerhost Traefik (10.10.10.10)

Role: TCP passthrough router (SNI-based routing)

Configuration (/srv/dockerdata/traefik/dynamic/aegis-passthrough.yml):

tcp:
  routers:
    aegis-passthrough:
      rule: "HostSNI(`aegisagent.ai`) || HostSNI(`*.aegisagent.ai`) || HostSNI(`*.rbnk.uk`)"
      service: aegis-backend
      tls:
        passthrough: true
  services:
    aegis-backend:
      loadBalancer:
        servers:
          - address: "10.10.10.103:443"

Why Passthrough?
- Aegis manages its own certificates
- TLS stays encrypted all the way from the client to Aegis Traefik
- Dockerhost doesn't need to trust Aegis certs
- Simplifies certificate renewal
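The intent of the passthrough rule can be modelled in a few lines. This sketch approximates the three hostname patterns with shell-style globbing. It is not Traefik's own matcher, and whether `HostSNI` accepts wildcard patterns such as `*.aegisagent.ai` depends on the Traefik version, so treat it as an illustration of which names are meant to reach Aegis. The function name `routes_to_aegis` is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the router rule above.
PASSTHROUGH_PATTERNS = ["aegisagent.ai", "*.aegisagent.ai", "*.rbnk.uk"]


def routes_to_aegis(sni: str) -> bool:
    """Return True if the TLS ServerName would be forwarded to 10.10.10.103:443."""
    name = sni.lower().rstrip(".")  # hostnames are case-insensitive
    return any(fnmatchcase(name, pattern) for pattern in PASSTHROUGH_PATTERNS)
```

Under this model `code.aegisagent.ai` and `aegis.rbnk.uk` are forwarded, while the bare `rbnk.uk` is not, because the rule as written lists only the wildcard form for that domain.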

Aegis Traefik (10.10.10.103)

Role: TLS termination and HTTP routing

Key Features:
- Docker provider (reads container labels)
- File provider (for non-Docker services)
- Let's Encrypt DNS challenge
- HTTP to HTTPS redirect
- Dashboard on port 8081

Example Service Labels (from docker-compose.yml):

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.aegis.rule=Host(`aegisagent.ai`)"
  - "traefik.http.routers.aegis.entrypoints=websecure"
  - "traefik.http.routers.aegis.tls.certresolver=cf"
  - "traefik.http.services.aegis.loadbalancer.server.port=8080"

Firewall Rules

Proxmox Host Firewall

# Allow SSH (from anywhere)
ACCEPT tcp 22

# Allow HTTP/HTTPS (from anywhere)
ACCEPT tcp 80
ACCEPT tcp 443

# Allow Proxmox Web UI (from trusted IPs only)
ACCEPT tcp 8006 source=<trusted-ips>

# Allow Tailscale (from anywhere)
ACCEPT udp 41641

# Drop all other traffic
DROP all

Aegis LXC Firewall (iptables)

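# Allow established connections
iptables -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

# Allow loopback
iptables -A INPUT -i lo -j ACCEPT

# Allow SSH from Tailscale network (CGNAT range 100.64.0.0/10)
iptables -A INPUT -p tcp --dport 22 -s 100.64.0.0/10 -j ACCEPT

# Allow HTTP/HTTPS from Dockerhost
iptables -A INPUT -p tcp -m multiport --dports 80,443 -s 10.10.10.10 -j ACCEPT

# Allow internal Docker traffic
iptables -A INPUT -s 172.16.0.0/12 -j ACCEPT

# Drop invalid packets
iptables -A INPUT -m conntrack --ctstate INVALID -j DROP

# Log and drop everything else
iptables -A INPUT -j LOG
iptables -A INPUT -j DROP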
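The source-based rules above can be sanity-checked offline with the standard `ipaddress` module. This is a sketch only: `source_allowed` is a hypothetical helper, and the addresses are taken from the allocation tables in this document. Tailscale assigns node addresses from the CGNAT range 100.64.0.0/10.

```python
import ipaddress


def source_allowed(src: str, allowed_cidrs: list[str]) -> bool:
    """Return True if src falls inside any of the allowed networks."""
    addr = ipaddress.ip_address(src)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


DOCKER_RANGE = ipaddress.ip_network("172.16.0.0/12")
# Both Docker networks listed earlier sit inside the single /12 rule.
docker_nets_covered = all(
    ipaddress.ip_network(net).subnet_of(DOCKER_RANGE)
    for net in ("172.17.0.0/16", "172.19.0.0/16")
)
```

For instance, `source_allowed("100.114.189.93", ["100.64.0.0/10"])` is true for the Aegis Tailscale address, while a public address such as `8.8.8.8` is rejected by every rule.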

Load Balancing

Current Setup: Single Node

  • No load balancing: All traffic to 10.10.10.103
  • No failover: Single point of failure
  • Scaling: Vertical only (more resources to LXC)

Future: Multi-Node with HAProxy

Internet → Cloudflare
    ↓
HAProxy (10.10.10.10)
    ↓
Round-robin distribution:
  ├─→ Aegis-01 (10.10.10.103)
  └─→ Aegis-02 (10.10.10.104)

Health Checks:
- HTTP GET /health every 5 seconds
- Remove node if 3 consecutive failures
- Re-add node after 2 consecutive successes
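The remove and re-add thresholds describe a small state machine per backend node. The sketch below models that logic only. It is not HAProxy configuration, and the class name `NodeHealth` and its fields are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class NodeHealth:
    fall: int = 3          # consecutive failures before the node is removed
    rise: int = 2          # consecutive successes before it is re-added
    healthy: bool = True   # nodes start in rotation
    _streak: int = 0       # run length of results that contradict `healthy`

    def record(self, ok: bool) -> bool:
        """Feed one health check result; return whether the node is in rotation."""
        if ok == self.healthy:
            self._streak = 0  # result agrees with current state, reset the run
        else:
            self._streak += 1
            threshold = self.rise if ok else self.fall
            if self._streak >= threshold:
                self.healthy = ok
                self._streak = 0
        return self.healthy
```

With one check every 5 seconds, a node that stops responding leaves rotation after the third failed check and needs two clean checks in a row to return. A single success between failures resets the failure count.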

Network Security

DDoS Protection

Layer 3/4 (Network/Transport):
- Hetzner upstream filtering (100 Gbit/s capacity)
- Synproxy for SYN flood protection
- Connection rate limiting (iptables)

Layer 7 (Application):
- Traefik rate limiting middleware
- API rate limits per customer (Redis-backed)
- Challenge-response for suspicious traffic (future)

Intrusion Detection

fail2ban Configuration:

[sshd]
enabled = true
# Ban for 1 hour
bantime = 3600
maxretry = 3

[traefik-auth]
enabled = true
# Ban for 2 hours
bantime = 7200
maxretry = 5

Banned IPs: Stored in ~/.claude/banned_ips.txt
Auto-unban: After ban duration expires

VPN Access (Tailscale)

Tailnet: rickoslyder@
Access Control:
- SSH: Only from Tailscale network (100.64.0.0/10)
- Admin endpoints: Require Tailscale IP
- MagicDNS: Enabled (aegis.tail-scale.ts.net)

Use Cases:
- Remote SSH when traveling
- Database access (psql over Tailscale)
- Development environment access

Network Monitoring

Traffic Analysis

Tools:
- iftop: Real-time bandwidth monitoring
- nethogs: Per-process network usage
- tcpdump: Packet capture for debugging

Metrics Collected:
- Bytes in/out per interface
- Packet loss rate
- TCP retransmissions
- DNS query times

Health Checks

External Monitoring (via Uptime Kuma):
- https://aegisagent.ai/health (every 60 seconds)
- https://intel.aegisagent.ai/health (every 60 seconds)
- https://notebooks.aegisagent.ai/ (every 300 seconds)

Internal Monitoring (via Prometheus):
- Traefik metrics (requests, errors, latency)
- Docker network metrics
- PostgreSQL connection pool usage

Alerts

Trigger Conditions:
- Service down for > 2 minutes
- HTTP 5xx rate > 10% for 5 minutes
- Network latency > 500ms for 10 minutes
- Bandwidth saturation > 80% for 15 minutes
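Each trigger pairs a threshold with a duration, so a single bad sample should not fire an alert. The sketch below shows one way to evaluate the 5xx condition over per-minute samples. The function name `sustained_5xx` and the sample format are assumptions, not the actual alerting rules.

```python
def sustained_5xx(
    samples: list[tuple[int, int]],
    threshold: float = 0.10,
    minutes: int = 5,
) -> bool:
    """samples holds (total_requests, count_5xx) per minute, oldest first.

    Returns True only if the error rate exceeded the threshold in each of
    the most recent `minutes` samples.
    """
    if len(samples) < minutes:
        return False  # not enough history to judge a sustained condition
    recent = samples[-minutes:]
    return all(total > 0 and errors / total > threshold for total, errors in recent)
```

Five consecutive minutes at a 20% error rate trigger the alert. Four bad minutes followed by a healthy one do not, and neither does a rate of exactly 10%, since the condition is strictly greater than the threshold.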

Notification Targets:
- Discord #alerts channel
- Telegram bot
- WhatsApp (for critical alerts only)

Troubleshooting

Common Issues

"Connection Refused" on aegisagent.ai

Diagnosis:

# Check if Traefik is running
docker ps | grep traefik

# Check Traefik logs
docker logs traefik -f

# Verify port 443 is listening
netstat -tlnp | grep :443

Possible Causes:
- Traefik container stopped
- Certificate renewal failed
- Dockerhost passthrough misconfigured

Services Unreachable from Containers

Diagnosis:

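# From inside container
ping host.docker.internal
# PostgreSQL does not speak HTTP, so curl reports an error either way;
# "Connection refused" or a timeout means the port is unreachable
curl http://host.docker.internal:5432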

# Check if ports are listening on host
netstat -tlnp | grep -E "5432|6379|11434"

Possible Causes:
- PostgreSQL not bound to 0.0.0.0
- Firewall blocking internal connections
- Wrong extra_hosts configuration

Let's Encrypt Rate Limit

Error: "too many certificates already issued"

Diagnosis:

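# List the domains that have stored certificates
docker exec traefik cat /letsencrypt/acme.json | jq -r '.cf.Certificates[].domain.main'
# Show the expiry of the first certificate (.certificate holds base64-encoded PEM, not a date)
docker exec traefik cat /letsencrypt/acme.json | jq -r '.cf.Certificates[0].certificate' | base64 -d | openssl x509 -noout -enddate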

Solution:
- Wait 7 days for rate limit reset
- Use staging server for testing: --certificatesresolvers.cf.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory

Network Performance Optimization

Current Optimizations

  1. TCP BBR Congestion Control:

    sysctl net.ipv4.tcp_congestion_control=bbr
    sysctl net.core.default_qdisc=fq
    

  2. Increased Connection Limits:

    sysctl net.core.somaxconn=65535
    sysctl net.ipv4.tcp_max_syn_backlog=8096
    

  3. Docker Network MTU: 1500 bytes (standard Ethernet)

  4. Traefik Connection Pooling: Keep-alive enabled

Future Optimizations

  • HTTP/3 (QUIC) support in Traefik
  • WebSocket connection pooling
  • gRPC for internal service communication
  • Service mesh (Istio/Linkerd) for mTLS

Network Diagram (Detailed)

┌─────────────────────────────────────────────────────────────┐
│                         INTERNET                            │
└─────────────────────────┬───────────────────────────────────┘
                  ┌───────▼────────┐
                  │  Cloudflare    │
                  │ DNS (10min TTL)│
                  └───────┬────────┘
                157.180.63.15:443 (TLS)
┌─────────────────────────▼───────────────────────────────────┐
│                    PROXMOX HOST                             │
│  ┌──────────────────────────────────────────────────────┐   │
│  │           Port Forward: 443 → 10.10.10.10:443        │   │
│  └──────────────────────┬───────────────────────────────┘   │
└─────────────────────────┼───────────────────────────────────┘
                  ┌───────▼────────┐
                  │  Dockerhost    │
                  │  10.10.10.10   │
                  │  Traefik       │
                  │  (Passthrough) │
                  └───────┬────────┘
                  TCP Passthrough (TLS intact)
┌─────────────────────────▼───────────────────────────────────┐
│                    AEGIS LXC (10.10.10.103)                 │
│  ┌──────────────────────────────────────────────────────┐   │
│  │            Traefik (TLS Termination)                 │   │
│  │  ┌────────────────────────────────────────────────┐  │   │
│  │  │  Router: Host(`aegisagent.ai`)                 │  │   │
│  │  │  TLS: Let's Encrypt (DNS Challenge)            │  │   │
│  │  └────────┬───────────────────────────────────────┘  │   │
│  └───────────┼──────────────────────────────────────────┘   │
│              │                                              │
│    ┌─────────┴─────────┬──────────┬────────────┐           │
│    │                   │          │            │           │
│  ┌─▼──────┐  ┌────────▼────┐  ┌──▼──────┐  ┌──▼──────┐    │
│  │Dashboard│  │   VS Code   │  │ Notebook│  │  Intel  │    │
│  │  :8080  │  │    :8443    │  │  :8502  │  │  :8080  │    │
│  └─────┬───┘  └─────────────┘  └─────────┘  └─────────┘    │
│        │                                                    │
│        │ host.docker.internal (10.10.10.103)               │
│        │                                                    │
│  ┌─────▼───────────┬───────────────┬───────────────┐       │
│  │                 │               │               │       │
│ ┌▼──────────┐ ┌────▼────────┐ ┌───▼────────┐ ┌────▼─────┐ │
│ │ PostgreSQL│ │  FalkorDB   │ │   Ollama   │ │  Redis   │ │
│ │   :5432   │ │    :6379    │ │   :11434   │ │  :6380   │ │
│ └───────────┘ └─────────────┘ └────────────┘ └──────────┘ │
│                                                              │
│  Docker Networks:                                           │
│  - traefik_proxy (172.19.0.0/16)                            │
│  - docker0 (172.17.0.0/16)                                  │
│                                                              │
│  Host Network:                                              │
│  - eth0: 10.10.10.103/24                                    │
│  - tailscale0: 100.114.189.93/32                            │
└─────────────────────────────────────────────────────────────┘

Bandwidth Usage

Typical Traffic Patterns

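| Service | Ingress | Egress | Daily Total |
|---------|---------|--------|-------------|
| Dashboard | 50 MB | 200 MB | 250 MB |
| VS Code | 10 MB | 30 MB | 40 MB |
| Open Notebook | 100 MB | 500 MB | 600 MB |
| API Requests | 200 MB | 800 MB | 1 GB |
| Ollama (internal) | - | - | - |
| Total | ~360 MB | ~1.5 GB | ~2 GB/day |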

Monthly Bandwidth: ~60 GB/month (well under the 20 TB limit)
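The totals follow directly from the per-service figures. The arithmetic below reproduces them, assuming decimal units (1 GB = 1000 MB) and a 30-day month. The variable names are illustrative.

```python
# (ingress MB, egress MB) per day, from the table above.
DAILY_MB = {
    "Dashboard": (50, 200),
    "VS Code": (10, 30),
    "Open Notebook": (100, 500),
    "API Requests": (200, 800),
}

ingress_mb = sum(ingress for ingress, _ in DAILY_MB.values())  # 360
egress_mb = sum(egress for _, egress in DAILY_MB.values())     # 1530
daily_mb = ingress_mb + egress_mb                              # 1890
monthly_gb = daily_mb * 30 / 1000                              # about 56.7
limit_share = monthly_gb / 20_000                              # fraction of the 20 TB limit
```

The exact sum is 1890 MB per day, which the table rounds to ~2 GB, and roughly 57 GB per month, consistent with the ~60 GB estimate and well below one percent of the limit.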