Disaster Recovery Overview
This section contains everything needed to recover the Aegis system from complete loss.
Recovery Objectives
| Metric | Target | Notes |
|---|---|---|
| RTO (Recovery Time Objective) | 2 hours | From backup to operational |
| RPO (Recovery Point Objective) | 24 hours | Maximum data loss |
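
As a rough illustration of how these two targets could be checked during or after an incident, the sketch below compares timestamps against the RTO and RPO from the table. The function names are hypothetical and not part of Aegis.

```python
from datetime import datetime, timedelta, timezone

# Targets taken from the Recovery Objectives table.
RTO = timedelta(hours=2)
RPO = timedelta(hours=24)


def backup_meets_rpo(backup_time: datetime, now: datetime, rpo: timedelta = RPO) -> bool:
    """Return True if restoring this backup loses at most `rpo` worth of data."""
    return now - backup_time <= rpo


def recovery_met_rto(started: datetime, operational: datetime, rto: timedelta = RTO) -> bool:
    """Return True if the system was operational again within `rto` of starting the restore."""
    return operational - started <= rto


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    last_backup = now - timedelta(hours=20)  # example value only
    print("RPO satisfied:", backup_meets_rpo(last_backup, now))
```

With a daily dump, the newest backup should normally be under 24 hours old, so a failed RPO check suggests a backup run was missed.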
Critical Systems Priority
- PostgreSQL Database - Core data store (137 tables)
- Docker Services - Dashboard, Scheduler, FalkorDB
- Configuration - MCP servers, credentials, settings
- Memory - Journals, semantic knowledge, episodic logs
Recovery Scenarios
Complete System Loss
Full restoration from backup:
1. Provision new LXC container
2. Install base packages
3. Restore PostgreSQL database
4. Restore configuration files
5. Start Docker services
6. Verify integrations
Time: ~2 hours | See: runbook.md
Database Corruption
Restore PostgreSQL from backup:
1. Stop services
2. Restore from pg_dump
3. Run migrations
4. Verify data integrity
5. Restart services
Time: ~30 minutes | See: restoration.md
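
The restore and verification steps might be scripted roughly as below. This is a sketch under stated assumptions: it assumes the daily dump is in pg_dump's custom format (so `pg_restore` applies; a plain SQL dump would be loaded with `psql` instead) and that all 137 tables live in the `public` schema. The helper names are hypothetical. The migration step is left out because the source does not name the migration tool; restoration.md remains the authoritative procedure.

```python
from typing import List

# Connection details from the Backup Locations table.
DB_HOST, DB_PORT, DB_NAME = "localhost", 5432, "aegis"
EXPECTED_TABLES = 137  # from the Critical Systems Priority list


def pg_restore_cmd(dump_path: str) -> List[str]:
    """Build a pg_restore call. Assumes a custom-format dump (pg_dump -Fc)."""
    return [
        "pg_restore",
        "--host", DB_HOST,
        "--port", str(DB_PORT),
        "--dbname", DB_NAME,
        "--clean", "--if-exists",  # drop existing objects before recreating them
        "--no-owner",
        dump_path,
    ]


def table_count_cmd() -> List[str]:
    """Build a psql call that prints the number of base tables in `public` (assumed schema)."""
    query = (
        "SELECT count(*) FROM information_schema.tables "
        "WHERE table_schema = 'public' AND table_type = 'BASE TABLE'"
    )
    return [
        "psql",
        "--host", DB_HOST,
        "--port", str(DB_PORT),
        "--dbname", DB_NAME,
        "--tuples-only", "--no-align",
        "--command", query,
    ]


def integrity_ok(count_output: str, expected: int = EXPECTED_TABLES) -> bool:
    """Compare psql's table-count output against the expected number of tables."""
    return int(count_output.strip()) == expected
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once services are stopped. A matching table count is only a coarse sanity check, not proof that the data is intact.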
Container Failures
Rebuild and restart containers:
1. Check logs for root cause
2. Rebuild images if needed
3. Restart containers in order
4. Verify health checks
Time: ~10 minutes | See: troubleshooting
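
One way to turn the steps above into an ordered command plan is sketched below. It assumes the services are managed with Docker Compose. The service names and the start order are hypothetical (the source names Dashboard, Scheduler and FalkorDB but does not specify the order).

```python
from typing import List, Sequence

# Hypothetical Compose service names, listed dependencies first.
START_ORDER = ("falkordb", "scheduler", "dashboard")


def restart_plan(services: Sequence[str] = START_ORDER, rebuild: bool = False) -> List[List[str]]:
    """Return the docker commands for: inspect logs, optionally rebuild, start in order, check status."""
    plan: List[List[str]] = []
    # 1. Look at recent logs before touching anything.
    for svc in services:
        plan.append(["docker", "compose", "logs", "--tail", "100", svc])
    # 2. Rebuild images only when requested.
    if rebuild:
        plan.append(["docker", "compose", "build", *services])
    # 3. Start services one at a time, in the given order.
    for svc in services:
        plan.append(["docker", "compose", "up", "-d", svc])
    # 4. List container state, including health where a health check is defined.
    plan.append(["docker", "compose", "ps"])
    return plan


if __name__ == "__main__":
    for cmd in restart_plan(rebuild=True):
        print(" ".join(cmd))
```

Returning the plan as data rather than executing it immediately makes it easy to review the commands before running them with `subprocess.run`.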
Configuration Loss
Restore from backups:
1. Restore ~/.secure/ from backup
2. Restore ~/.claude.json
3. Restore ~/.claude/settings.json
4. Restart Claude Code
Time: ~15 minutes | See: restoration.md
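
The file copies in steps 1 to 3 could be scripted along these lines. This is a minimal sketch that assumes the encrypted backup has already been decrypted into a directory mirroring the home directory layout; `restore_config` is a hypothetical helper, and decryption itself is not shown because the source does not name the tool.

```python
import shutil
from pathlib import Path
from typing import List

# Paths relative to the home directory, taken from the steps above.
CONFIG_ITEMS = [".secure", ".claude.json", ".claude/settings.json"]


def restore_config(backup_root: Path, home: Path) -> List[Path]:
    """Copy each configuration item from a decrypted backup tree into `home`."""
    restored: List[Path] = []
    for rel in CONFIG_ITEMS:
        src, dst = backup_root / rel, home / rel
        if not src.exists():
            raise FileNotFoundError(f"missing from backup: {src}")
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.is_dir():
            # Merge into an existing directory instead of failing if it exists.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            shutil.copy2(src, dst)  # copy2 preserves permission bits and timestamps
        restored.append(dst)
    return restored
```

A call such as `restore_config(Path("/tmp/aegis-restore"), Path.home())` would cover the three restore steps, where `/tmp/aegis-restore` is a placeholder path; restarting Claude Code remains a manual step.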
Backup Locations
| What | Source | Backup Location |
|---|---|---|
| PostgreSQL | localhost:5432/aegis | Daily pg_dump |
| Memory | ~/memory/ | Git repo |
| Configuration | ~/.secure/, ~/.claude/ | Encrypted backup |
| Source Code | ~/projects/aegis-core/ | GitHub |
| Docker Volumes | falkordb_data | Volume backup |
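
The table does not say how the `falkordb_data` volume backup is produced. A common pattern, shown here purely as an assumption, is to mount the volume into a throwaway container and archive it with `tar`. The image name and archive filename below are illustrative choices.

```python
from typing import List

VOLUME = "falkordb_data"  # from the Backup Locations table
IMAGE = "alpine"          # assumption: any small image that ships tar


def volume_backup_cmd(backup_dir: str, volume: str = VOLUME) -> List[str]:
    """Archive the volume into <backup_dir>/<volume>.tar.gz. `backup_dir` must be an absolute path."""
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data:ro",     # mount the volume read-only
        "-v", f"{backup_dir}:/backup",  # bind-mount the host backup directory
        IMAGE, "tar", "czf", f"/backup/{volume}.tar.gz", "-C", "/data", ".",
    ]


def volume_restore_cmd(backup_dir: str, volume: str = VOLUME) -> List[str]:
    """Unpack <backup_dir>/<volume>.tar.gz back into the volume."""
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data",
        "-v", f"{backup_dir}:/backup:ro",
        IMAGE, "tar", "xzf", f"/backup/{volume}.tar.gz", "-C", "/data",
    ]
```

The container using the volume should be stopped before either command runs so the archive captures a consistent state.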
Emergency Contacts
| Channel | Contact | Purpose |
|---|---|---|
| Discord | Guild 1454722052777836546, #alerts | Primary alerts |
| | +447441443388 | Mobile access |
| Telegram | Chat ID 1275129801 | Time-sensitive |
| Email | aegis@richardbankole.com | Async |
Documentation Index
- backup.md - Backup procedures and schedules
- restoration.md - Step-by-step restoration
- runbook.md - Emergency procedures
Quick Recovery Checklist
□ Assess damage (what's lost/corrupted)
□ Identify most recent backup
□ Provision infrastructure (if needed)
□ Restore database
□ Restore configuration
□ Start services in order
□ Verify health checks
□ Test integrations
□ Resume operations
□ Document incident
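
For the "Identify most recent backup" item, a small helper like the following could pick the newest dump file by modification time. The backup directory and the `*.dump` filename pattern are assumptions; the source does not state where the daily dumps are written or how they are named.

```python
from pathlib import Path
from typing import Optional


def latest_backup(backup_dir: Path, pattern: str = "*.dump") -> Optional[Path]:
    """Return the most recently modified file matching `pattern`, or None if there is none."""
    candidates = [p for p in backup_dir.glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

Modification time is only a proxy for backup recency. If the dump filenames embed a date, sorting on that is more robust against files being copied or touched later.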
Last Updated: 2026-01-25