Backups¶
Comprehensive guide to Aegis backup procedures, recovery testing, and disaster recovery.
Backup Strategy¶
What Gets Backed Up¶
| Component | Location | Frequency | Retention | Method |
|---|---|---|---|---|
| PostgreSQL Database | localhost:5432 | Daily | 30 days | pg_dump |
| Knowledge Graph (FalkorDB) | localhost:6379 | Daily | 30 days | RDB snapshots |
| Memory Files | ~/memory/ | Daily | 30 days | Tar archive |
| Configuration | ~/.secure/, ~/.claude.json | Weekly | 90 days | Tar archive |
| Application Code | ~/projects/aegis-core | On commit | Git history | Git |
| Crontab | crontab -l | Weekly | 30 days | Text file |
| Docker Volumes | falkordb_data | Daily | 30 days | Docker volume backup |
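Taken together, the daily and weekly frequencies in this table correspond to the crontab entries set up in the sections below:

```
# Daily
0 1 * * * /home/agent/scripts/backup_postgres.sh >> /home/agent/logs/backup/postgres-backup.log 2>&1
30 1 * * * /home/agent/scripts/backup_falkordb.sh >> /home/agent/logs/backup/falkordb-backup.log 2>&1
0 2 * * * /home/agent/scripts/backup_memory.sh >> /home/agent/logs/backup/memory-backup.log 2>&1
0 4 * * * /home/agent/scripts/backup_offsite.sh >> /home/agent/logs/backup/offsite-backup.log 2>&1
# Weekly (Sunday)
0 3 * * 0 /home/agent/scripts/backup_config.sh >> /home/agent/logs/backup/config-backup.log 2>&1
```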
Backup Locations¶
Primary: /home/agent/logs/backup/
Off-Site (recommended): Set up one of:
- S3: Encrypted backups to AWS S3
- Backblaze B2: Cost-effective object storage
- Rsync: To a remote server (e.g., Dockerhost)
PostgreSQL Backups¶
Manual Database Backup¶
Full Database Dump:
# Dump entire database
pg_dump -U agent -d aegis -F c -f /home/agent/logs/backup/aegis_$(date +%Y%m%d_%H%M%S).dump
# Dump with compression
pg_dump -U agent -d aegis -F c -Z 9 -f /home/agent/logs/backup/aegis_$(date +%Y%m%d).dump
# Plain SQL format (human-readable)
pg_dump -U agent -d aegis > /home/agent/logs/backup/aegis_$(date +%Y%m%d).sql
# Dump specific tables
pg_dump -U agent -d aegis -t episodic_memory -t semantic_memory > /home/agent/logs/backup/memory_$(date +%Y%m%d).sql
Database Backup with Rotation:
#!/bin/bash
# /home/agent/scripts/backup_postgres.sh
BACKUP_DIR="/home/agent/logs/backup"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/aegis_$DATE.dump"
RETENTION_DAYS=30
# Create backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"
# Dump database
pg_dump -U agent -d aegis -F c -Z 9 -f "$BACKUP_FILE"
# Check if backup succeeded
if [ $? -eq 0 ]; then
echo "✓ Database backup completed: $BACKUP_FILE"
echo "Backup size: $(du -h $BACKUP_FILE | cut -f1)"
else
echo "✗ Database backup failed!"
exit 1
fi
# Clean up old backups (>30 days)
find "$BACKUP_DIR" -name "aegis_*.dump" -mtime +$RETENTION_DAYS -delete
echo "Backup retention: keeping last $RETENTION_DAYS days"
Make Script Executable:
chmod +x /home/agent/scripts/backup_postgres.sh
Automated Database Backups¶
Add to Crontab:
# Edit crontab
crontab -e
# Add daily backup at 01:00 AM UTC (during maintenance window)
0 1 * * * /home/agent/scripts/backup_postgres.sh >> /home/agent/logs/backup/postgres-backup.log 2>&1
Verify Cron Job:
# List cron jobs
crontab -l | grep backup
# Test backup script manually
/home/agent/scripts/backup_postgres.sh
# Check backup logs
tail -50 /home/agent/logs/backup/postgres-backup.log
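The retention step in backup_postgres.sh (`find ... -mtime +30 -delete`) can be sanity-checked in isolation. This sketch, using a hypothetical temp directory, fakes one 35-day-old dump and one fresh dump and confirms only the old one is removed:

```shell
# Demo: the -mtime retention logic deletes only files older than the cutoff.
set -eu
RETENTION_DAYS=30
DEMO_DIR=$(mktemp -d)

# Simulate a 35-day-old backup and a fresh one (GNU touch -d)
touch -d "35 days ago" "$DEMO_DIR/aegis_old.dump"
touch "$DEMO_DIR/aegis_new.dump"

# Same retention command as the backup script
find "$DEMO_DIR" -name "aegis_*.dump" -mtime +"$RETENTION_DAYS" -delete

echo "Remaining: $(ls "$DEMO_DIR")"
```

Note that `-mtime +30` matches files modified strictly more than 30 full days ago, so a 30-day-old file survives until day 31.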
Database Restore¶
Restore from Custom Format Dump:
# Drop existing database (CAREFUL!)
dropdb -U agent aegis
# Create new database
createdb -U agent aegis
# Restore from backup
pg_restore -U agent -d aegis /home/agent/logs/backup/aegis_20260125.dump
# Or restore specific tables
pg_restore -U agent -d aegis -t episodic_memory /home/agent/logs/backup/aegis_20260125.dump
Restore from SQL Dump:
# Restore entire database
psql -U agent -d aegis < /home/agent/logs/backup/aegis_20260125.sql
# Restore specific tables (if exported separately)
psql -U agent -d aegis < /home/agent/logs/backup/memory_20260125.sql
Verify Restore:
# Check row counts
psql -U agent -d aegis -c "SELECT
(SELECT count(*) FROM episodic_memory) as episodic,
(SELECT count(*) FROM semantic_memory) as semantic,
(SELECT count(*) FROM workflow_runs) as workflows;"
# Check recent data
psql -U agent -d aegis -c "SELECT * FROM episodic_memory ORDER BY timestamp DESC LIMIT 5;"
FalkorDB (Knowledge Graph) Backups¶
Manual FalkorDB Backup¶
RDB Snapshot (recommended):
# Trigger manual snapshot
docker exec falkordb redis-cli BGSAVE
# Wait for save to complete
docker exec falkordb redis-cli LASTSAVE
# Copy RDB file from container
docker cp falkordb:/data/dump.rdb /home/agent/logs/backup/falkordb_$(date +%Y%m%d).rdb
Alternative: Redis SAVE Command:
# Synchronous save (blocks Redis)
docker exec falkordb redis-cli SAVE
# Copy backup
docker cp falkordb:/data/dump.rdb /home/agent/logs/backup/falkordb_$(date +%Y%m%d).rdb
Automated FalkorDB Backups¶
Backup Script:
#!/bin/bash
# /home/agent/scripts/backup_falkordb.sh
BACKUP_DIR="/home/agent/logs/backup"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/falkordb_$DATE.rdb"
RETENTION_DAYS=30
# Create backup directory
mkdir -p "$BACKUP_DIR"
# Record the last successful save time, then trigger a background save
LAST_SAVE=$(docker exec falkordb redis-cli LASTSAVE)
docker exec falkordb redis-cli BGSAVE
# Wait for the save to complete (poll every 5 seconds, max 60 seconds)
for i in {1..12}; do
sleep 5
NEW_SAVE=$(docker exec falkordb redis-cli LASTSAVE)
if [ "$NEW_SAVE" != "$LAST_SAVE" ]; then
echo "✓ FalkorDB snapshot completed"
break
fi
done
# Copy RDB file
docker cp falkordb:/data/dump.rdb "$BACKUP_FILE"
if [ $? -eq 0 ]; then
echo "✓ FalkorDB backup completed: $BACKUP_FILE"
echo "Backup size: $(du -h $BACKUP_FILE | cut -f1)"
else
echo "✗ FalkorDB backup failed!"
exit 1
fi
# Compress backup
gzip "$BACKUP_FILE"
# Clean up old backups
find "$BACKUP_DIR" -name "falkordb_*.rdb.gz" -mtime +$RETENTION_DAYS -delete
echo "Backup retention: keeping last $RETENTION_DAYS days"
Add to Crontab:
# Daily backup at 01:30 AM UTC
30 1 * * * /home/agent/scripts/backup_falkordb.sh >> /home/agent/logs/backup/falkordb-backup.log 2>&1
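The wait step in backup_falkordb.sh works by recording `LASTSAVE` and polling until the value changes, which signals that the background save finished. The same poll-until-changed pattern can be exercised without Redis; this sketch watches a plain file instead (all names hypothetical):

```shell
# Demo of the poll-until-changed pattern behind BGSAVE/LASTSAVE.
set -eu
STAMP_FILE=$(mktemp)
echo "1000" > "$STAMP_FILE"        # stands in for the initial LASTSAVE value
BEFORE=$(cat "$STAMP_FILE")

# Simulate the background save finishing after about a second
( sleep 1; echo "2000" > "$STAMP_FILE" ) &

SAVED="no"
for i in {1..12}; do               # poll every second, up to 12 tries
    sleep 1
    if [ "$(cat "$STAMP_FILE")" != "$BEFORE" ]; then
        SAVED="yes"
        break
    fi
done
echo "snapshot completed: $SAVED"
```

The bounded loop matters: if the save never completes, the script moves on after the timeout instead of hanging the cron job.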
FalkorDB Restore¶
Restore from RDB File:
# Stop FalkorDB container
cd /home/agent/projects/aegis-core && docker compose stop falkordb
# Uncompress backup if needed
gunzip /home/agent/logs/backup/falkordb_20260125.rdb.gz
# Copy backup to volume
docker cp /home/agent/logs/backup/falkordb_20260125.rdb falkordb:/data/dump.rdb
# Restart container (loads dump.rdb automatically)
cd /home/agent/projects/aegis-core && docker compose start falkordb
# Verify data
docker exec falkordb redis-cli DBSIZE
Memory Files Backup¶
Manual Memory Backup¶
Full Memory Archive:
# Create tar archive of all memory
tar -czf /home/agent/logs/backup/memory_$(date +%Y%m%d).tar.gz \
-C /home/agent memory/
# Verify archive
tar -tzf /home/agent/logs/backup/memory_$(date +%Y%m%d).tar.gz | head -20
Selective Memory Backup:
# Backup specific directories
tar -czf /home/agent/logs/backup/memory_semantic_$(date +%Y%m%d).tar.gz \
-C /home/agent memory/semantic/
tar -czf /home/agent/logs/backup/memory_journal_$(date +%Y%m%d).tar.gz \
-C /home/agent memory/journal/
Automated Memory Backups¶
Backup Script:
#!/bin/bash
# /home/agent/scripts/backup_memory.sh
BACKUP_DIR="/home/agent/logs/backup"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30
mkdir -p "$BACKUP_DIR"
# Backup memory directory
tar -czf "$BACKUP_DIR/memory_$DATE.tar.gz" -C /home/agent memory/
if [ $? -eq 0 ]; then
echo "✓ Memory backup completed: $BACKUP_DIR/memory_$DATE.tar.gz"
echo "Backup size: $(du -h $BACKUP_DIR/memory_$DATE.tar.gz | cut -f1)"
else
echo "✗ Memory backup failed!"
exit 1
fi
# Clean up old backups
find "$BACKUP_DIR" -name "memory_*.tar.gz" -mtime +$RETENTION_DAYS -delete
echo "Backup retention: keeping last $RETENTION_DAYS days"
Add to Crontab:
# Daily backup at 02:00 AM UTC
0 2 * * * /home/agent/scripts/backup_memory.sh >> /home/agent/logs/backup/memory-backup.log 2>&1
Memory Restore¶
Restore Full Memory:
# Backup current memory first
mv /home/agent/memory /home/agent/memory.bak
# Extract backup
tar -xzf /home/agent/logs/backup/memory_20260125.tar.gz -C /home/agent
# Verify restore
ls -la /home/agent/memory/
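Moving the live directory aside first means a corrupt archive never destroys current state, because the `.bak` copy can be moved back. This sketch exercises that move-aside/extract/rollback flow, with temp paths standing in for ~/memory and a real backup:

```shell
# Demo: restore with automatic rollback if extraction fails.
set -u
WORK=$(mktemp -d)
mkdir -p "$WORK/memory"
echo "current" > "$WORK/memory/state.txt"

# Build a stand-in backup archive containing memory/
mkdir -p "$WORK/snapshot/memory"
echo "restored" > "$WORK/snapshot/memory/state.txt"
tar -czf "$WORK/memory_backup.tar.gz" -C "$WORK/snapshot" memory/

# Move the live directory aside, then extract
mv "$WORK/memory" "$WORK/memory.bak"
if tar -xzf "$WORK/memory_backup.tar.gz" -C "$WORK"; then
    echo "restore ok: $(cat "$WORK/memory/state.txt")"
else
    # Roll back to the pre-restore state on failure
    rm -rf "$WORK/memory"
    mv "$WORK/memory.bak" "$WORK/memory"
    echo "restore failed, rolled back"
fi
```

Keep the `.bak` copy until the restore has been verified, then delete it.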
Configuration Backups¶
Manual Configuration Backup¶
Backup Secure Credentials:
# Backup .secure directory
tar -czf /home/agent/logs/backup/secure_$(date +%Y%m%d).tar.gz \
-C /home/agent .secure/
# Backup Claude config
cp ~/.claude.json /home/agent/logs/backup/claude.json.$(date +%Y%m%d)
# Backup SSH keys
tar -czf /home/agent/logs/backup/ssh_$(date +%Y%m%d).tar.gz \
-C /home/agent .ssh/
Backup Crontab:
crontab -l > /home/agent/logs/backup/crontab_$(date +%Y%m%d).txt
Automated Configuration Backups¶
Backup Script:
#!/bin/bash
# /home/agent/scripts/backup_config.sh
BACKUP_DIR="/home/agent/logs/backup"
DATE=$(date +%Y%m%d)
RETENTION_DAYS=90
mkdir -p "$BACKUP_DIR"
# Backup .secure directory
tar -czf "$BACKUP_DIR/secure_$DATE.tar.gz" -C /home/agent .secure/
# Backup Claude config
cp ~/.claude.json "$BACKUP_DIR/claude.json.$DATE"
# Backup crontab
crontab -l > "$BACKUP_DIR/crontab.$DATE.txt"
# Backup SSH keys
tar -czf "$BACKUP_DIR/ssh_$DATE.tar.gz" -C /home/agent .ssh/
echo "✓ Configuration backup completed"
# Clean up old backups (90 days retention)
find "$BACKUP_DIR" -name "secure_*.tar.gz" -mtime +$RETENTION_DAYS -delete
find "$BACKUP_DIR" -name "claude.json.*" -mtime +$RETENTION_DAYS -delete
find "$BACKUP_DIR" -name "crontab.*.txt" -mtime +$RETENTION_DAYS -delete
find "$BACKUP_DIR" -name "ssh_*.tar.gz" -mtime +$RETENTION_DAYS -delete
echo "Backup retention: keeping last $RETENTION_DAYS days"
Add to Crontab:
# Weekly backup on Sunday at 03:00 AM UTC
0 3 * * 0 /home/agent/scripts/backup_config.sh >> /home/agent/logs/backup/config-backup.log 2>&1
Docker Volume Backups¶
Manual Volume Backup¶
Backup FalkorDB Volume:
# Create backup of Docker volume
docker run --rm -v falkordb_data:/data -v /home/agent/logs/backup:/backup \
alpine tar -czf /backup/falkordb_volume_$(date +%Y%m%d).tar.gz -C /data .
# Verify backup
tar -tzf /home/agent/logs/backup/falkordb_volume_$(date +%Y%m%d).tar.gz | head -10
Volume Restore¶
Restore Docker Volume:
# Stop container
cd /home/agent/projects/aegis-core && docker compose stop falkordb
# Restore volume
docker run --rm -v falkordb_data:/data -v /home/agent/logs/backup:/backup \
alpine tar -xzf /backup/falkordb_volume_20260125.tar.gz -C /data
# Start container
cd /home/agent/projects/aegis-core && docker compose start falkordb
Off-Site Backups¶
Rsync to Remote Server¶
Sync to Dockerhost:
#!/bin/bash
# /home/agent/scripts/backup_offsite.sh
BACKUP_DIR="/home/agent/logs/backup"
REMOTE="dockerhost:/mnt/backups/aegis/"
# Sync backups to remote server
rsync -avz --delete \
--exclude="*.log" \
"$BACKUP_DIR/" "$REMOTE"
if [ $? -eq 0 ]; then
echo "✓ Off-site backup completed"
else
echo "✗ Off-site backup failed!"
exit 1
fi
Add to Crontab:
# Daily off-site sync at 04:00 AM UTC
0 4 * * * /home/agent/scripts/backup_offsite.sh >> /home/agent/logs/backup/offsite-backup.log 2>&1
S3 Backup (Optional)¶
Install AWS CLI:
# One option; a system package manager or the official installer also works
pip install awscli
S3 Sync Script:
#!/bin/bash
# /home/agent/scripts/backup_s3.sh
BACKUP_DIR="/home/agent/logs/backup"
S3_BUCKET="s3://aegis-backups/"
# Sync to S3 with encryption
aws s3 sync "$BACKUP_DIR" "$S3_BUCKET" \
--exclude "*.log" \
--storage-class STANDARD_IA \
--server-side-encryption AES256
if [ $? -eq 0 ]; then
echo "✓ S3 backup completed"
else
echo "✗ S3 backup failed!"
exit 1
fi
Recovery Testing¶
Test Database Restore¶
Monthly Recovery Test:
#!/bin/bash
# /home/agent/scripts/test_recovery.sh
echo "=== Testing PostgreSQL Restore ==="
# Use most recent backup
LATEST_BACKUP=$(ls -t /home/agent/logs/backup/aegis_*.dump | head -1)
echo "Testing backup: $LATEST_BACKUP"
# Create test database
createdb -U agent aegis_test
# Restore to test database
pg_restore -U agent -d aegis_test "$LATEST_BACKUP"
if [ $? -eq 0 ]; then
# Verify data
ROWS=$(psql -U agent -d aegis_test -t -c "SELECT count(*) FROM episodic_memory")
echo "✓ Restore successful. Rows in episodic_memory: $ROWS"
# Clean up
dropdb -U agent aegis_test
else
echo "✗ Restore failed!"
exit 1
fi
Schedule Recovery Test:
# Monthly on the 1st at 05:00 AM UTC
0 5 1 * * /home/agent/scripts/test_recovery.sh >> /home/agent/logs/backup/recovery-test.log 2>&1
Verify Backup Integrity¶
Check Backup Files:
# List recent backups with sizes
ls -lh /home/agent/logs/backup/*.dump | tail -7
# Verify tar archives
for file in /home/agent/logs/backup/*.tar.gz; do
tar -tzf "$file" > /dev/null 2>&1 && echo "✓ $file" || echo "✗ $file CORRUPTED"
done
# Check database dump integrity
pg_restore --list /home/agent/logs/backup/aegis_20260125.dump > /dev/null && echo "✓ Valid dump" || echo "✗ Corrupted dump"
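The tar loop above also catches truncated archives, the typical result of a backup job killed mid-write. A self-contained sketch using temp files (names hypothetical) shows the check distinguishing a good archive from a truncated one:

```shell
# Demo: detect a truncated (corrupted) tar.gz among backups.
set -u
WORK=$(mktemp -d)
mkdir -p "$WORK/payload"
echo "data" > "$WORK/payload/file.txt"

tar -czf "$WORK/good.tar.gz" -C "$WORK" payload/
# Truncate a copy to simulate a backup job killed mid-write
head -c 20 "$WORK/good.tar.gz" > "$WORK/bad.tar.gz"

for f in "$WORK"/*.tar.gz; do
    if tar -tzf "$f" > /dev/null 2>&1; then
        echo "OK $(basename "$f")"
    else
        echo "CORRUPTED $(basename "$f")"
    fi
done
```

Listing the archive decompresses the whole stream, so this catches truncation anywhere in the file, not just a damaged header.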
Disaster Recovery¶
Full System Restore¶
Recovery Steps:
1. Install Dependencies: reinstall PostgreSQL, Docker, and the application runtime.
2. Restore Configuration: follow Configuration Backups above (.secure, ~/.claude.json, crontab, SSH keys).
3. Restore Database: follow Database Restore above using the latest dump.
4. Restore FalkorDB: follow FalkorDB Restore above using the latest RDB snapshot.
5. Restore Memory: follow Memory Restore above.
6. Restore Application: clone aegis-core from Git history.
7. Verify Recovery: run the restore verification checks above (row counts, DBSIZE, recent data).
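The order of the recovery steps matters: configuration and credentials come first, data stores next, verification last. A minimal dry-run sketch of that sequence (it only echoes each phase; real commands would call the restore procedures documented above):

```shell
# Dry-run of the disaster-recovery sequence; each echo would be replaced
# by the corresponding restore procedure documented above.
set -eu
STEPS=(
    "install dependencies (PostgreSQL, Docker, runtime)"
    "restore configuration (.secure, .claude.json, crontab, SSH keys)"
    "restore database (pg_restore latest aegis dump)"
    "restore FalkorDB (copy RDB into volume, restart container)"
    "restore memory (extract memory tarball)"
    "restore application (clone aegis-core)"
    "verify recovery (row counts, DBSIZE, recent data)"
)
n=1
for step in "${STEPS[@]}"; do
    echo "Step $n: $step"
    n=$((n + 1))
done
```

Walking through the dry run during a recovery test confirms the runbook order before anything destructive happens.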
Backup Monitoring¶
Check Backup Status¶
Daily Backup Verification:
#!/bin/bash
# /home/agent/scripts/check_backups.sh
# Check if backups ran today
TODAY=$(date +%Y%m%d)
# Check PostgreSQL backup
if ls /home/agent/logs/backup/aegis_$TODAY*.dump 1> /dev/null 2>&1; then
echo "✓ PostgreSQL backup found for today"
else
echo "✗ PostgreSQL backup MISSING for today"
# Alert to Discord
fi
# Check FalkorDB backup
if ls /home/agent/logs/backup/falkordb_$TODAY*.rdb* 1> /dev/null 2>&1; then
echo "✓ FalkorDB backup found for today"
else
echo "✗ FalkorDB backup MISSING for today"
fi
# Check memory backup
if ls /home/agent/logs/backup/memory_$TODAY*.tar.gz 1> /dev/null 2>&1; then
echo "✓ Memory backup found for today"
else
echo "✗ Memory backup MISSING for today"
fi
# Check disk space
DISK_USAGE=$(df -h /home/agent/logs/backup | awk 'NR==2{print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 80 ]; then
echo "⚠️ Backup disk usage high: $DISK_USAGE%"
fi
Add to Morning Routine:
# Run during morning check at 06:30 AM UTC
30 6 * * * /home/agent/scripts/check_backups.sh >> /home/agent/logs/backup/backup-status.log 2>&1
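An alternative to matching today's datestamp in filenames is checking file modification times, which also works for backups whose names don't embed a date. A minimal sketch against a hypothetical temp directory and prefixes:

```shell
# Demo: flag a backup prefix as stale when no matching file is < 24h old.
set -u
WORK=$(mktemp -d)
touch "$WORK/aegis_today.dump"                   # fresh backup
touch -d "3 days ago" "$WORK/falkordb_old.rdb"   # stale backup (GNU touch -d)

check_fresh() {
    local prefix=$1
    if find "$WORK" -name "${prefix}_*" -mtime -1 | grep -q .; then
        echo "OK: recent $prefix backup found"
    else
        echo "MISSING: no $prefix backup in the last 24h"
    fi
}

check_fresh aegis
check_fresh falkordb
```

`-mtime -1` matches files modified within the last 24 hours, so the check passes only when a backup actually landed today.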
Best Practices¶
- 3-2-1 Rule: 3 copies, 2 different media types, 1 off-site
- Test Restores: Test recovery monthly; don't assume backups work
- Automate Everything: Use cron for all backup tasks
- Monitor Backup Success: Alert when a backup fails or goes missing
- Retention Policy: Keep 30 days local, 90 days off-site
- Encrypt Off-Site: Use encryption for all off-site backups
- Document Procedures: Keep this doc updated with any changes
- Version Configuration: Store config backups with application version
- Quick Access: Keep last 7 days of backups for fast recovery
- Disk Space: Monitor backup directory, clean up old backups
Related Documentation¶
- Monitoring - Health checks and alerting
- Maintenance - Routine maintenance tasks
- Troubleshooting - Common issues and solutions