Backups

Comprehensive guide to Aegis backup procedures, recovery testing, and disaster recovery.

Backup Strategy

What Gets Backed Up

| Component                  | Location                   | Frequency | Retention   | Method               |
|----------------------------|----------------------------|-----------|-------------|----------------------|
| PostgreSQL Database        | localhost:5432             | Daily     | 30 days     | pg_dump              |
| Knowledge Graph (FalkorDB) | localhost:6379             | Daily     | 30 days     | RDB snapshots        |
| Memory Files               | ~/memory/                  | Daily     | 30 days     | Tar archive          |
| Configuration              | ~/.secure/, ~/.claude.json | Weekly    | 90 days     | Tar archive          |
| Application Code           | ~/projects/aegis-core      | On commit | Git history | Git                  |
| Crontab                    | crontab -l                 | Weekly    | 30 days     | Text file            |
| Docker Volumes             | falkordb_data              | Daily     | 30 days     | Docker volume backup |
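Each component above gets its own backup script in the sections that follow. If you prefer one cron entry covering the full set, a thin driver can chain them and report per-component status. This is only a sketch: the `true` placeholders stand in for the real scripts (paths shown in comments), which are developed later in this guide.

```shell
#!/bin/bash
# Sketch of a driver that runs each backup tier in order and reports status.
# The `true` placeholders stand in for the per-component scripts below.
FAILED=0

run_backup() {
    local name="$1"; shift
    if "$@"; then
        echo "✓ $name"
    else
        echo "✗ $name failed"
        FAILED=1
    fi
}

run_backup "postgres" true    # /home/agent/scripts/backup_postgres.sh
run_backup "falkordb" true    # /home/agent/scripts/backup_falkordb.sh
run_backup "memory"   true    # /home/agent/scripts/backup_memory.sh

echo "driver exit status: $FAILED"
```

A driver like this keeps the failure of one tier from silently skipping the rest, while still surfacing a non-zero status for monitoring.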

Backup Locations

Primary: /home/agent/logs/backup/

Off-Site (recommended): Set up one of:

- S3: Encrypted backups to AWS S3
- Backblaze B2: Cost-effective object storage
- Rsync: To a remote server (e.g., Dockerhost)

PostgreSQL Backups

Manual Database Backup

Full Database Dump:

# Dump entire database
pg_dump -U agent -d aegis -F c -f /home/agent/logs/backup/aegis_$(date +%Y%m%d_%H%M%S).dump

# Dump with compression
pg_dump -U agent -d aegis -F c -Z 9 -f /home/agent/logs/backup/aegis_$(date +%Y%m%d).dump

# Plain SQL format (human-readable)
pg_dump -U agent -d aegis > /home/agent/logs/backup/aegis_$(date +%Y%m%d).sql

# Dump specific tables
pg_dump -U agent -d aegis -t episodic_memory -t semantic_memory > /home/agent/logs/backup/memory_$(date +%Y%m%d).sql

Database Backup with Rotation:

#!/bin/bash
# /home/agent/scripts/backup_postgres.sh

BACKUP_DIR="/home/agent/logs/backup"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/aegis_$DATE.dump"
RETENTION_DAYS=30

# Create backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Dump database
pg_dump -U agent -d aegis -F c -Z 9 -f "$BACKUP_FILE"

# Check if backup succeeded
if [ $? -eq 0 ]; then
    echo "✓ Database backup completed: $BACKUP_FILE"
    echo "Backup size: $(du -h "$BACKUP_FILE" | cut -f1)"
else
    echo "✗ Database backup failed!"
    exit 1
fi

# Clean up old backups (>30 days)
find "$BACKUP_DIR" -name "aegis_*.dump" -mtime +$RETENTION_DAYS -delete

echo "Backup retention: keeping last $RETENTION_DAYS days"

Make Script Executable:

chmod +x /home/agent/scripts/backup_postgres.sh

Automated Database Backups

Add to Crontab:

# Edit crontab
crontab -e

# Add daily backup at 01:00 AM UTC (during maintenance window)
0 1 * * * /home/agent/scripts/backup_postgres.sh >> /home/agent/logs/backup/postgres-backup.log 2>&1

Verify Cron Job:

# List cron jobs
crontab -l | grep backup

# Test backup script manually
/home/agent/scripts/backup_postgres.sh

# Check backup logs
tail -50 /home/agent/logs/backup/postgres-backup.log

Database Restore

Restore from Custom Format Dump:

# Drop existing database (CAREFUL!)
dropdb -U agent aegis

# Create new database
createdb -U agent aegis

# Restore from backup
pg_restore -U agent -d aegis /home/agent/logs/backup/aegis_20260125.dump

# Or restore specific tables
pg_restore -U agent -d aegis -t episodic_memory /home/agent/logs/backup/aegis_20260125.dump

Restore from SQL Dump:

# Restore entire database
psql -U agent -d aegis < /home/agent/logs/backup/aegis_20260125.sql

# Restore specific tables (if exported separately)
psql -U agent -d aegis < /home/agent/logs/backup/memory_20260125.sql

Verify Restore:

# Check row counts
psql -U agent -d aegis -c "SELECT
  (SELECT count(*) FROM episodic_memory) as episodic,
  (SELECT count(*) FROM semantic_memory) as semantic,
  (SELECT count(*) FROM workflow_runs) as workflows;"

# Check recent data
psql -U agent -d aegis -c "SELECT * FROM episodic_memory ORDER BY timestamp DESC LIMIT 5;"

FalkorDB (Knowledge Graph) Backups

Manual FalkorDB Backup

RDB Snapshot (recommended):

# Trigger manual snapshot
docker exec falkordb redis-cli BGSAVE

# Wait for save to complete
docker exec falkordb redis-cli LASTSAVE

# Copy RDB file from container
docker cp falkordb:/data/dump.rdb /home/agent/logs/backup/falkordb_$(date +%Y%m%d).rdb

Alternative: Redis SAVE Command:

# Synchronous save (blocks Redis)
docker exec falkordb redis-cli SAVE

# Copy backup
docker cp falkordb:/data/dump.rdb /home/agent/logs/backup/falkordb_$(date +%Y%m%d).rdb

Automated FalkorDB Backups

Backup Script:

#!/bin/bash
# /home/agent/scripts/backup_falkordb.sh

BACKUP_DIR="/home/agent/logs/backup"
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="$BACKUP_DIR/falkordb_$DATE.rdb"
RETENTION_DAYS=30

# Create backup directory
mkdir -p "$BACKUP_DIR"

# Record the time of the last completed save, then trigger a background save
PREV_SAVE=$(docker exec falkordb redis-cli LASTSAVE)
docker exec falkordb redis-cli BGSAVE

# Wait for the save to complete (check every 5 seconds, max 60 seconds):
# LASTSAVE advances once the background save finishes
for i in {1..12}; do
    sleep 5
    if [ "$(docker exec falkordb redis-cli LASTSAVE)" != "$PREV_SAVE" ]; then
        echo "✓ FalkorDB snapshot completed"
        break
    fi
done

# Copy RDB file
docker cp falkordb:/data/dump.rdb "$BACKUP_FILE"

if [ $? -eq 0 ]; then
    echo "✓ FalkorDB backup completed: $BACKUP_FILE"
    echo "Backup size: $(du -h "$BACKUP_FILE" | cut -f1)"
else
    echo "✗ FalkorDB backup failed!"
    exit 1
fi

# Compress backup
gzip "$BACKUP_FILE"

# Clean up old backups
find "$BACKUP_DIR" -name "falkordb_*.rdb.gz" -mtime +$RETENTION_DAYS -delete

echo "Backup retention: keeping last $RETENTION_DAYS days"

Add to Crontab:

# Daily backup at 01:30 AM UTC
30 1 * * * /home/agent/scripts/backup_falkordb.sh >> /home/agent/logs/backup/falkordb-backup.log 2>&1

FalkorDB Restore

Restore from RDB File:

# Stop FalkorDB container
cd /home/agent/projects/aegis-core && docker compose stop falkordb

# Uncompress backup if needed
gunzip /home/agent/logs/backup/falkordb_20260125.rdb.gz

# Copy backup to volume
docker cp /home/agent/logs/backup/falkordb_20260125.rdb falkordb:/data/dump.rdb

# Restart container (loads dump.rdb automatically)
cd /home/agent/projects/aegis-core && docker compose start falkordb

# Verify data
docker exec falkordb redis-cli DBSIZE

Memory Files Backup

Manual Memory Backup

Full Memory Archive:

# Create tar archive of all memory
tar -czf /home/agent/logs/backup/memory_$(date +%Y%m%d).tar.gz \
  -C /home/agent memory/

# Verify archive
tar -tzf /home/agent/logs/backup/memory_$(date +%Y%m%d).tar.gz | head -20

Selective Memory Backup:

# Backup specific directories
tar -czf /home/agent/logs/backup/memory_semantic_$(date +%Y%m%d).tar.gz \
  -C /home/agent memory/semantic/

tar -czf /home/agent/logs/backup/memory_journal_$(date +%Y%m%d).tar.gz \
  -C /home/agent memory/journal/

Automated Memory Backups

Backup Script:

#!/bin/bash
# /home/agent/scripts/backup_memory.sh

BACKUP_DIR="/home/agent/logs/backup"
DATE=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=30

mkdir -p "$BACKUP_DIR"

# Backup memory directory
tar -czf "$BACKUP_DIR/memory_$DATE.tar.gz" -C /home/agent memory/

if [ $? -eq 0 ]; then
    echo "✓ Memory backup completed: $BACKUP_DIR/memory_$DATE.tar.gz"
    echo "Backup size: $(du -h "$BACKUP_DIR/memory_$DATE.tar.gz" | cut -f1)"
else
    echo "✗ Memory backup failed!"
    exit 1
fi

# Clean up old backups
find "$BACKUP_DIR" -name "memory_*.tar.gz" -mtime +$RETENTION_DAYS -delete

echo "Backup retention: keeping last $RETENTION_DAYS days"

Add to Crontab:

# Daily backup at 02:00 AM UTC
0 2 * * * /home/agent/scripts/backup_memory.sh >> /home/agent/logs/backup/memory-backup.log 2>&1

Memory Restore

Restore Full Memory:

# Backup current memory first
mv /home/agent/memory /home/agent/memory.bak

# Extract backup
tar -xzf /home/agent/logs/backup/memory_20260125.tar.gz -C /home/agent

# Verify restore
ls -la /home/agent/memory/

Configuration Backups

Manual Configuration Backup

Backup Secure Credentials:

# Backup .secure directory
tar -czf /home/agent/logs/backup/secure_$(date +%Y%m%d).tar.gz \
  -C /home/agent .secure/

# Backup Claude config
cp ~/.claude.json /home/agent/logs/backup/claude.json.$(date +%Y%m%d)

# Backup SSH keys
tar -czf /home/agent/logs/backup/ssh_$(date +%Y%m%d).tar.gz \
  -C /home/agent .ssh/

Backup Crontab:

# Export crontab
crontab -l > /home/agent/logs/backup/crontab.$(date +%Y%m%d).txt

Automated Configuration Backups

Backup Script:

#!/bin/bash
# /home/agent/scripts/backup_config.sh

BACKUP_DIR="/home/agent/logs/backup"
DATE=$(date +%Y%m%d)
RETENTION_DAYS=90

mkdir -p "$BACKUP_DIR"

# Backup .secure directory
tar -czf "$BACKUP_DIR/secure_$DATE.tar.gz" -C /home/agent .secure/

# Backup Claude config
cp ~/.claude.json "$BACKUP_DIR/claude.json.$DATE"

# Backup crontab
crontab -l > "$BACKUP_DIR/crontab.$DATE.txt"

# Backup SSH keys
tar -czf "$BACKUP_DIR/ssh_$DATE.tar.gz" -C /home/agent .ssh/

echo "✓ Configuration backup completed"

# Clean up old backups (90 days retention)
find "$BACKUP_DIR" -name "secure_*.tar.gz" -mtime +$RETENTION_DAYS -delete
find "$BACKUP_DIR" -name "claude.json.*" -mtime +$RETENTION_DAYS -delete
find "$BACKUP_DIR" -name "crontab.*.txt" -mtime +$RETENTION_DAYS -delete
find "$BACKUP_DIR" -name "ssh_*.tar.gz" -mtime +$RETENTION_DAYS -delete

echo "Backup retention: keeping last $RETENTION_DAYS days"

Add to Crontab:

# Weekly backup on Sunday at 03:00 AM UTC
0 3 * * 0 /home/agent/scripts/backup_config.sh >> /home/agent/logs/backup/config-backup.log 2>&1

Docker Volume Backups

Manual Volume Backup

Backup FalkorDB Volume:

# Create backup of Docker volume
docker run --rm -v falkordb_data:/data -v /home/agent/logs/backup:/backup \
  alpine tar -czf /backup/falkordb_volume_$(date +%Y%m%d).tar.gz -C /data .

# Verify backup
tar -tzf /home/agent/logs/backup/falkordb_volume_$(date +%Y%m%d).tar.gz | head -10

Volume Restore

Restore Docker Volume:

# Stop container
cd /home/agent/projects/aegis-core && docker compose stop falkordb

# Restore volume
docker run --rm -v falkordb_data:/data -v /home/agent/logs/backup:/backup \
  alpine tar -xzf /backup/falkordb_volume_20260125.tar.gz -C /data

# Start container
cd /home/agent/projects/aegis-core && docker compose start falkordb

Off-Site Backups

Rsync to Remote Server

Sync to Dockerhost:

#!/bin/bash
# /home/agent/scripts/backup_offsite.sh

BACKUP_DIR="/home/agent/logs/backup"
REMOTE="dockerhost:/mnt/backups/aegis/"

# Sync backups to remote server
rsync -avz --delete \
  --exclude="*.log" \
  "$BACKUP_DIR/" "$REMOTE"

if [ $? -eq 0 ]; then
    echo "✓ Off-site backup completed"
else
    echo "✗ Off-site backup failed!"
    exit 1
fi

Add to Crontab:

# Daily off-site sync at 04:00 AM UTC
0 4 * * * /home/agent/scripts/backup_offsite.sh >> /home/agent/logs/backup/offsite-backup.log 2>&1

S3 Backup (Optional)

Install AWS CLI:

pip install awscli
aws configure

S3 Sync Script:

#!/bin/bash
# /home/agent/scripts/backup_s3.sh

BACKUP_DIR="/home/agent/logs/backup"
S3_BUCKET="s3://aegis-backups/"

# Sync to S3 with encryption
aws s3 sync "$BACKUP_DIR" "$S3_BUCKET" \
  --exclude "*.log" \
  --storage-class STANDARD_IA \
  --server-side-encryption AES256

if [ $? -eq 0 ]; then
    echo "✓ S3 backup completed"
else
    echo "✗ S3 backup failed!"
    exit 1
fi

Recovery Testing

Test Database Restore

Monthly Recovery Test:

#!/bin/bash
# /home/agent/scripts/test_recovery.sh

echo "=== Testing PostgreSQL Restore ==="

# Use most recent backup
LATEST_BACKUP=$(ls -t /home/agent/logs/backup/aegis_*.dump | head -1)
echo "Testing backup: $LATEST_BACKUP"

# Create test database
createdb -U agent aegis_test

# Restore to test database
pg_restore -U agent -d aegis_test "$LATEST_BACKUP"

if [ $? -eq 0 ]; then
    # Verify data
    ROWS=$(psql -U agent -d aegis_test -t -c "SELECT count(*) FROM episodic_memory")
    echo "✓ Restore successful. Rows in episodic_memory: $ROWS"

    # Clean up
    dropdb -U agent aegis_test
else
    echo "✗ Restore failed!"
    exit 1
fi

Schedule Recovery Test:

# Monthly on the 1st at 05:00 AM UTC
0 5 1 * * /home/agent/scripts/test_recovery.sh >> /home/agent/logs/backup/recovery-test.log 2>&1

Verify Backup Integrity

Check Backup Files:

# List recent backups with sizes
ls -lh /home/agent/logs/backup/*.dump | tail -7

# Verify tar archives
for file in /home/agent/logs/backup/*.tar.gz; do
    tar -tzf "$file" > /dev/null 2>&1 && echo "✓ $file" || echo "✗ $file CORRUPTED"
done

# Check database dump integrity
pg_restore --list /home/agent/logs/backup/aegis_20260125.dump > /dev/null && echo "✓ Valid dump" || echo "✗ Corrupted dump"
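Beyond spot checks, a SHA-256 manifest written alongside the backups catches silent corruption before a restore is attempted. A minimal sketch, demonstrated in a scratch directory (point `BACKUP_DIR` at /home/agent/logs/backup in production):

```shell
# Sketch: pair backups with a SHA-256 manifest so corruption is caught early.
# Demonstrated in a scratch directory; use /home/agent/logs/backup for real.
BACKUP_DIR="$(mktemp -d)"
echo "demo" > "$BACKUP_DIR/aegis_demo.dump"

# Generate the manifest next to the backups...
( cd "$BACKUP_DIR" && sha256sum *.dump > SHA256SUMS )

# ...and verify it (prints "<file>: OK" per entry, non-zero exit on mismatch)
( cd "$BACKUP_DIR" && sha256sum -c SHA256SUMS )
```

Ship the manifest with the off-site copy so integrity can also be verified on the remote end.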

Disaster Recovery

Full System Restore

Recovery Steps:

  1. Install Dependencies:

    sudo apt update
    sudo apt install -y postgresql docker.io docker-compose python3 python3-pip
    

  2. Restore Configuration:

    # Extract secure credentials
    tar -xzf /path/to/secure_20260125.tar.gz -C /home/agent
    
    # Restore Claude config
    cp /path/to/claude.json.20260125 ~/.claude.json
    
    # Restore SSH keys
    tar -xzf /path/to/ssh_20260125.tar.gz -C /home/agent
    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/*
    

  3. Restore Database:

    # Create database
    createdb -U agent aegis
    
    # Restore from backup
    pg_restore -U agent -d aegis /path/to/aegis_20260125.dump
    

  4. Restore FalkorDB:

    # Start FalkorDB container
    cd /home/agent/projects/aegis-core
    docker compose up -d falkordb
    
    # Stop and restore data
    docker compose stop falkordb
    docker cp /path/to/falkordb_20260125.rdb falkordb:/data/dump.rdb
    docker compose start falkordb
    

  5. Restore Memory:

    tar -xzf /path/to/memory_20260125.tar.gz -C /home/agent
    

  6. Restore Application:

    # Clone repository
    git clone https://github.com/aegis-agent/aegis-core.git ~/projects/aegis-core
    cd ~/projects/aegis-core
    
    # Restore to specific commit (if known)
    git checkout <commit-hash>
    
    # Build and start
    docker compose up -d
    

  7. Verify Recovery:

    # Check health
    curl http://localhost:8080/health
    
    # Verify data
    psql -U agent -d aegis -c "SELECT count(*) FROM episodic_memory;"
    
    # Check containers
    docker ps
    

Backup Monitoring

Check Backup Status

Daily Backup Verification:

#!/bin/bash
# /home/agent/scripts/check_backups.sh

# Check if backups ran today
TODAY=$(date +%Y%m%d)

# Check PostgreSQL backup
if ls /home/agent/logs/backup/aegis_$TODAY*.dump 1> /dev/null 2>&1; then
    echo "✓ PostgreSQL backup found for today"
else
    echo "✗ PostgreSQL backup MISSING for today"
    # Alert to Discord
fi

# Check FalkorDB backup
if ls /home/agent/logs/backup/falkordb_$TODAY*.rdb* 1> /dev/null 2>&1; then
    echo "✓ FalkorDB backup found for today"
else
    echo "✗ FalkorDB backup MISSING for today"
fi

# Check memory backup
if ls /home/agent/logs/backup/memory_$TODAY*.tar.gz 1> /dev/null 2>&1; then
    echo "✓ Memory backup found for today"
else
    echo "✗ Memory backup MISSING for today"
fi

# Check disk space
DISK_USAGE=$(df -h /home/agent/logs/backup | awk 'NR==2{print $5}' | sed 's/%//')
if [ "$DISK_USAGE" -gt 80 ]; then
    echo "⚠️  Backup disk usage high: $DISK_USAGE%"
fi
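The `# Alert to Discord` placeholder in the script above could be filled with a webhook call along these lines. `DISCORD_WEBHOOK_URL` is an assumption here; store the real URL in ~/.secure/ rather than hardcoding it in the script:

```shell
# Hypothetical Discord notifier for the "# Alert to Discord" placeholder.
# DISCORD_WEBHOOK_URL is assumed to be sourced from ~/.secure/ -- adjust to
# however the real webhook URL is stored.
build_alert_payload() {
    # Discord webhooks accept a JSON body with a "content" field.
    # Naive formatting: assumes the message contains no double quotes.
    printf '{"content": "%s"}' "$1"
}

send_discord_alert() {
    curl -sf -H "Content-Type: application/json" \
        -d "$(build_alert_payload "$1")" \
        "$DISCORD_WEBHOOK_URL"
}

# Example (only sends once the webhook URL is configured):
# send_discord_alert "✗ PostgreSQL backup MISSING for $(date +%Y%m%d)"
```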

Add to Morning Routine:

# Run during morning check at 06:30 AM UTC
30 6 * * * /home/agent/scripts/check_backups.sh >> /home/agent/logs/backup/backup-status.log 2>&1
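The existence check above only looks for a file dated today; a freshness check catches the case where today's run hasn't happened yet or a backup is quietly aging out. A sketch, demonstrated against a scratch directory (use /home/agent/logs/backup in production):

```shell
# Sketch: flag the dumps as stale when the newest copy is older than
# 25 hours (i.e., one missed daily run). Scratch directory for the demo;
# point BACKUP_DIR at /home/agent/logs/backup in production.
BACKUP_DIR="$(mktemp -d)"
touch "$BACKUP_DIR/aegis_demo.dump"

# -mmin -1500 matches files modified within the last 1500 minutes (25 h)
if [ -n "$(find "$BACKUP_DIR" -name 'aegis_*.dump' -mmin -1500 -print -quit)" ]; then
    FRESHNESS="fresh"
else
    FRESHNESS="stale"
fi
echo "PostgreSQL backup status: $FRESHNESS"
```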

Best Practices

  1. 3-2-1 Rule: 3 copies, 2 different media types, 1 off-site
  2. Test Restores: Test recovery monthly, don't assume backups work
  3. Automate Everything: Use cron for all backup tasks
  4. Monitor Backup Success: Alert when backups fail or go missing
  5. Retention Policy: Keep 30 days local, 90 days off-site
  6. Encrypt Off-Site: Use encryption for all off-site backups
  7. Document Procedures: Keep this doc updated with any changes
  8. Version Configuration: Store config backups with application version
  9. Quick Access: Keep last 7 days of backups for fast recovery
  10. Disk Space: Monitor backup directory, clean up old backups
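Point 9 (quick access) can be kept honest with a count-based prune that always retains the newest seven dumps, regardless of age. A sketch, relying on the `aegis_YYYYMMDD_HHMMSS.dump` naming sorting chronologically, demonstrated in a scratch directory:

```shell
# Sketch for point 9: prune quick-access dumps down to the newest 7.
# Relies on aegis_YYYYMMDD_HHMMSS.dump names sorting chronologically.
# Scratch directory for the demo; use /home/agent/logs/backup for real.
BACKUP_DIR="$(mktemp -d)"
for day in 01 02 03 04 05 06 07 08 09; do
    touch "$BACKUP_DIR/aegis_202601${day}_000000.dump"
done

# `head -n -7` (GNU coreutils) prints all but the last 7 entries; with
# ascending sort order those are exactly the oldest dumps.
ls -1 "$BACKUP_DIR"/aegis_*.dump | head -n -7 | xargs -r rm --

REMAINING=$(ls -1 "$BACKUP_DIR"/aegis_*.dump | wc -l)
echo "quick-access dumps kept: $REMAINING"
```

Unlike the `-mtime`-based cleanup in the backup scripts, a count-based prune never leaves you with zero copies after a long outage.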