Disaster Recovery Overview

This section contains everything needed to recover the Aegis system from complete loss.

Recovery Objectives

| Metric | Target | Notes |
|---|---|---|
| RTO (Recovery Time Objective) | 2 hours | From backup to operational |
| RPO (Recovery Point Objective) | 24 hours | Maximum data loss |
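
The RPO target can be checked mechanically by comparing the age of the newest backup against the 24-hour window. The following is a minimal sketch, assuming the backup timestamp is already known; how it is obtained is not specified in this section, and `backup_within_rpo` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Targets taken from the Recovery Objectives table.
RTO = timedelta(hours=2)
RPO = timedelta(hours=24)


def backup_within_rpo(last_backup: datetime, now: datetime,
                      rpo: timedelta = RPO) -> bool:
    """Return True if restoring from last_backup loses at most `rpo` of data."""
    return now - last_backup <= rpo


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last_backup = now - timedelta(hours=20)  # illustrative value only
    print(backup_within_rpo(last_backup, now))
```

A check like this could run before any restore, so that a stale backup is noticed before it is relied on.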

Critical Systems Priority

  1. PostgreSQL Database - Core data store (137 tables)
  2. Docker Services - Dashboard, Scheduler, FalkorDB
  3. Configuration - MCP servers, credentials, settings
  4. Memory - Journals, semantic knowledge, episodic logs

Recovery Scenarios

Complete System Loss

Full restoration from backup:

  1. Provision new LXC container
  2. Install base packages
  3. Restore PostgreSQL database
  4. Restore configuration files
  5. Start Docker services
  6. Verify integrations

Time: ~2 hours | See: runbook.md
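
The ordered steps for a full restoration could be driven by a small runner that stops at the first failure and respects the 2-hour RTO as a time budget. This is only a sketch: `run_steps` is a hypothetical helper, and the step actions in the usage section are placeholders, since the real commands live in the runbook.

```python
import time
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_steps(steps: List[Step],
              budget_seconds: float = 2 * 60 * 60) -> Tuple[List[str], bool]:
    """Run steps in order; stop at the first failure or once the budget is spent.

    Returns (names of completed steps, True if every step completed).
    """
    start = time.monotonic()
    done: List[str] = []
    for name, action in steps:
        if time.monotonic() - start > budget_seconds:
            return done, False  # budget exhausted before this step
        if not action():
            return done, False  # step reported failure
        done.append(name)
    return done, True


if __name__ == "__main__":
    # Placeholder actions; each would wrap the real procedure.
    steps = [
        ("Provision new LXC container", lambda: True),
        ("Install base packages", lambda: True),
        ("Restore PostgreSQL database", lambda: True),
    ]
    print(run_steps(steps))
```

Stopping at the first failure keeps later steps, such as starting services, from running against a database that was never restored.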

Database Corruption

Restore PostgreSQL from backup:

  1. Stop services
  2. Restore from pg_dump
  3. Run migrations
  4. Verify data integrity
  5. Restart services

Time: ~30 minutes | See: restoration.md
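
The restore step depends on the dump format: plain SQL dumps are replayed with `psql`, while archive dumps go through `pg_restore`. The sketch below only builds the command and does not run it. It assumes the `localhost:5432/aegis` database, treats a `.sql` suffix as a plain-text dump, and uses hypothetical dump file names; authentication is not covered.

```python
from pathlib import Path
from typing import List


def restore_command(dump: str, host: str = "localhost", port: int = 5432,
                    dbname: str = "aegis") -> List[str]:
    """Build (but do not run) the command that restores a pg_dump file."""
    conn = ["-h", host, "-p", str(port), "-d", dbname]
    if Path(dump).suffix == ".sql":
        # Plain-text dump: replay with psql and stop at the first error.
        return ["psql", *conn, "-v", "ON_ERROR_STOP=1", "-f", dump]
    # Archive dump (custom, directory, or tar format): use pg_restore,
    # dropping existing objects before recreating them.
    return ["pg_restore", *conn, "--clean", "--if-exists", dump]


if __name__ == "__main__":
    print(" ".join(restore_command("aegis_backup.dump")))  # hypothetical file name
```

Returning the argument list rather than executing it lets the command be reviewed first and then passed to `subprocess.run` unchanged.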

Container Failures

Rebuild and restart containers:

  1. Check logs for root cause
  2. Rebuild images if needed
  3. Restart containers in order
  4. Verify health checks

Time: ~10 minutes | See: troubleshooting
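
The health-check verification can be sketched as a polling loop over the restarted containers. This assumes each container defines a Docker `HEALTHCHECK`, so that `docker inspect` reports a health status. The container names are not given in this section and must be supplied by the caller, and `unhealthy` is a hypothetical helper.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def docker_health(container: str) -> str:
    """Return Docker's reported health status, e.g. "healthy" or "starting"."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=False,
    )
    # Empty output (no health check defined, or unknown container)
    # is treated by the caller as "not healthy".
    return result.stdout.strip()


def unhealthy(containers: Iterable[str],
              probe: Callable[[str], str] = docker_health,
              attempts: int = 10, delay: float = 3.0) -> List[str]:
    """Poll each container in order; return those that never became healthy."""
    failed: List[str] = []
    for name in containers:
        for attempt in range(attempts):
            if probe(name) == "healthy":
                break
            if attempt < attempts - 1:
                time.sleep(delay)
        else:
            # The loop finished without a break: never reported healthy.
            failed.append(name)
    return failed
```

Passing the containers in start order means a dependency is confirmed healthy before the services that rely on it are checked. The `probe` parameter exists so the loop can be exercised without a Docker daemon.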

Configuration Loss

Restore from backups:

  1. Restore ~/.secure/ from backup
  2. Restore ~/.claude.json
  3. Restore ~/.claude/settings.json
  4. Restart Claude Code

Time: ~15 minutes | See: restoration.md
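
Before restarting Claude Code, it may help to confirm that the restored paths actually exist. The sketch below checks only for presence, not for valid contents, and `missing_config` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# Paths named in the restore steps, relative to the home directory.
REQUIRED = [".secure", ".claude.json", ".claude/settings.json"]


def missing_config(home: Path) -> List[str]:
    """Return the required configuration paths that do not exist under `home`."""
    return [rel for rel in REQUIRED if not (home / rel).exists()]


if __name__ == "__main__":
    missing = missing_config(Path.home())
    print("missing:", missing if missing else "none")
```

An empty result means all three paths are present; anything listed still needs to be restored.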

Backup Locations

| What | Source | Backup Location |
|---|---|---|
| PostgreSQL | localhost:5432/aegis | Daily pg_dump |
| Memory | ~/memory/ | Git repo |
| Configuration | ~/.secure/, ~/.claude/ | Encrypted backup |
| Source Code | ~/projects/aegis-core/ | GitHub |
| Docker Volumes | falkordb_data | Volume backup |

Emergency Contacts

| Channel | Contact | Purpose |
|---|---|---|
| Discord | Guild 1454722052777836546, #alerts | Primary alerts |
| WhatsApp | +447441443388 | Mobile access |
| Telegram | Chat ID 1275129801 | Time-sensitive |
| Email | aegis@richardbankole.com | Async |

Documentation Index

Quick Recovery Checklist

□ Assess damage (what's lost/corrupted)
□ Identify most recent backup
□ Provision infrastructure (if needed)
□ Restore database
□ Restore configuration
□ Start services in order
□ Verify health checks
□ Test integrations
□ Resume operations
□ Document incident

Last Updated: 2026-01-25