Maintenance

Comprehensive guide to routine maintenance tasks, optimization procedures, and system cleanup.

Maintenance Window

Daily Maintenance Window: 00:00-06:00 UTC

During this window, the following activities occur:

  • Database vacuuming and analysis
  • Log rotation
  • Backup operations
  • Cache cleanup
  • Knowledge graph consolidation
  • Container health checks
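Scripts that run heavy operations can guard against starting outside the window. A minimal bash sketch (the function name and hour-based check are illustrative, not part of the existing tooling):

```shell
in_maintenance_window() {
    # Accepts an hour (00-23) for testability; defaults to the current UTC hour
    local hour="${1:-$(date -u +%H)}"
    # 00:00-06:00 UTC means hours 00 through 05 inclusive
    # (10# forces base-10 so leading zeros like "08" are not read as octal)
    [ "$((10#$hour))" -lt 6 ]
}

if in_maintenance_window; then
    echo "inside maintenance window"
else
    echo "outside maintenance window"
fi
```

A heavy job would call `in_maintenance_window || exit 0` at the top and reschedule itself rather than run during business hours.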

Daily Maintenance Tasks

Morning Routine (06:00-08:00 UTC)

Automated via /morning Command:

# Morning check includes:
# - System status verification
# - Docker container health
# - Database connectivity
# - Wallet balance check
# - Pending tasks review
# - Discord status update

Manual Verification:

# Check system health
curl -s https://aegisagent.ai/health | jq

# Verify containers are running
docker ps --format "table {{.Names}}\t{{.Status}}"

# Check disk space
df -h /home/agent

# Review backup status
ls -lh /home/agent/logs/backup/ | tail -10

# Check recent errors
grep ERROR /home/agent/logs/*.log | tail -20

Evening Routine (22:00-24:00 UTC)

Automated via /evening Command:

# Evening summary includes:
# - Daily log generation
# - Completed tasks summary
# - Tomorrow's priorities
# - Discord evening update

Manual Tasks:

# Update journal
# Review: ~/memory/journal/$(date +%Y-%m-%d).md

# Commit any changes
cd /home/agent/projects/aegis-core
git status
git add .
git commit -m "Daily updates: $(date +%Y-%m-%d)"

# Sync Beads tasks
~/.local/bin/bd sync

Database Maintenance

PostgreSQL VACUUM and ANALYZE

Manual VACUUM:

# Full vacuum (locks tables, use during maintenance window)
psql -U agent -d aegis -c "VACUUM FULL;"

# Regular vacuum (no locks, safe anytime)
psql -U agent -d aegis -c "VACUUM;"

# Analyze table statistics
psql -U agent -d aegis -c "ANALYZE;"

# Vacuum and analyze together
psql -U agent -d aegis -c "VACUUM ANALYZE;"

Vacuum Specific Tables:

# High-churn tables
psql -U agent -d aegis -c "VACUUM ANALYZE episodic_memory;"
psql -U agent -d aegis -c "VACUUM ANALYZE workflow_runs;"
psql -U agent -d aegis -c "VACUUM ANALYZE api_keys;"

Automated VACUUM:

#!/bin/bash
# /home/agent/scripts/vacuum_postgres.sh

echo "=== PostgreSQL Maintenance ==="
date

# Vacuum and analyze
psql -U agent -d aegis -c "VACUUM ANALYZE;" 2>&1

# Check bloat
psql -U agent -d aegis -c "
  SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
    n_dead_tup
  FROM pg_stat_user_tables
  WHERE n_dead_tup > 1000
  ORDER BY n_dead_tup DESC;
" 2>&1

echo "✓ Vacuum completed"

Add to Crontab:

# Daily at 02:00 AM UTC
0 2 * * * /home/agent/scripts/vacuum_postgres.sh >> /home/agent/logs/maintenance/vacuum.log 2>&1

Database Index Maintenance

Rebuild Indexes:

# Reindex database (use during maintenance window)
psql -U agent -d aegis -c "REINDEX DATABASE aegis;"

# Reindex specific table
psql -U agent -d aegis -c "REINDEX TABLE episodic_memory;"

# Check index usage
psql -U agent -d aegis -c "
  SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
  FROM pg_stat_user_indexes
  WHERE idx_scan = 0
  ORDER BY schemaname, tablename;
"

Automated Monthly Reindex:

# Monthly on the 1st at 03:00 UTC (REINDEX DATABASE takes exclusive locks;
# on PostgreSQL 12+ consider "REINDEX DATABASE CONCURRENTLY aegis" to avoid them)
0 3 1 * * psql -U agent -d aegis -c "REINDEX DATABASE aegis;" >> /home/agent/logs/maintenance/reindex.log 2>&1

Connection Pool Cleanup

Check Active Connections:

# List active connections
psql -U agent -d aegis -c "
  SELECT
    pid,
    usename,
    application_name,
    client_addr,
    state,
    query_start,
    state_change
  FROM pg_stat_activity
  WHERE datname = 'aegis'
  ORDER BY query_start;
"

# Count connections by state
psql -U agent -d aegis -c "
  SELECT state, count(*)
  FROM pg_stat_activity
  WHERE datname = 'aegis'
  GROUP BY state;
"

Kill Idle Connections:

# Kill idle connections (>1 hour)
psql -U agent -d aegis -c "
  SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
  WHERE
    datname = 'aegis'
    AND state = 'idle'
    AND state_change < NOW() - INTERVAL '1 hour';
"

# Kill long-running queries (>10 minutes)
psql -U agent -d aegis -c "
  SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
  WHERE
    datname = 'aegis'
    AND state = 'active'
    AND query_start < NOW() - INTERVAL '10 minutes';
"

Container Maintenance

Container Updates

Update Images:

# Pull latest images
cd /home/agent/projects/aegis-core
docker compose pull

# Rebuild custom images
docker compose build --pull

# Recreate containers with new images
docker compose up -d --force-recreate

# Verify updates
docker images | grep aegis

Prune Unused Images:

# Remove dangling images
docker image prune -f

# Remove unused images (older versions)
docker image prune -a --filter "until=24h" -f

# Check disk space saved
df -h /var/lib/docker

Container Restarts

Graceful Restart:

cd /home/agent/projects/aegis-core

# Restart single container
docker compose restart dashboard

# Restart all containers
docker compose restart

# Stop and start (full cycle)
docker compose down
docker compose up -d

Force Recreate:

cd /home/agent/projects/aegis-core

# Recreate specific container
docker compose up -d --force-recreate dashboard

# Recreate all containers
docker compose up -d --force-recreate

Container Logs Cleanup

Truncate Docker Logs:

# Find large log files (run the glob under sudo: the containers
# directory is not readable by unprivileged users, so an unquoted
# glob expanded by your own shell silently matches nothing)
sudo sh -c 'du -sh /var/lib/docker/containers/*/*.log' | sort -rh | head -10

# Truncate a specific container's log (LogPath resolves the full path directly)
sudo truncate -s 0 "$(docker inspect --format='{{.LogPath}}' aegis-dashboard)"

# Rotate all container logs (docker ps -q prints short IDs, but the
# directories under /var/lib/docker use full IDs, so resolve via LogPath)
for container in $(docker ps -q); do
    log_file=$(docker inspect --format='{{.LogPath}}' "$container")
    if sudo test -f "$log_file"; then
        sudo truncate -s 0 "$log_file"
    fi
done

Cache Cleanup

Application Cache

Clean Scheduler Cache:

# Clear APScheduler job store (if needed)
psql -U agent -d aegis -c "DELETE FROM apscheduler_jobs WHERE next_run_time < NOW() - INTERVAL '7 days';"

Clean Memory Cache:

# Remove old episodic memories (>90 days, low importance)
psql -U agent -d aegis -c "
  DELETE FROM episodic_memory
  WHERE
    timestamp < NOW() - INTERVAL '90 days'
    AND importance < 5;
"

# Vacuum after delete
psql -U agent -d aegis -c "VACUUM ANALYZE episodic_memory;"

FalkorDB Cache

Check Memory Usage:

# Check FalkorDB memory stats
docker exec falkordb redis-cli INFO memory

# Check database size
docker exec falkordb redis-cli DBSIZE

Manual Cleanup (if needed):

# Flush all data (CAREFUL!)
docker exec falkordb redis-cli FLUSHALL

# Compact memory
docker exec falkordb redis-cli MEMORY PURGE
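Because FLUSHALL irreversibly wipes the graph, it is worth wrapping behind an explicit confirmation. A sketch, where `flush_falkordb` and the `CONFIRM_FLUSH` variable are hypothetical names, not existing tooling:

```shell
flush_falkordb() {
    # Refuse the destructive flush unless explicitly confirmed
    if [ "${CONFIRM_FLUSH:-}" != "yes" ]; then
        echo "refusing to flush: set CONFIRM_FLUSH=yes to proceed"
        return 1
    fi
    docker exec falkordb redis-cli FLUSHALL
}
```

Usage: `CONFIRM_FLUSH=yes flush_falkordb` — anything else prints the refusal and exits nonzero.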

Python Cache Cleanup

Clean pip cache:

# Show cache info
pip cache info

# Remove cache
pip cache purge

# Clean __pycache__ directories
find /home/agent/projects/aegis-core -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null
find /home/agent/projects/aegis-core -name "*.pyc" -delete

Log Rotation

Manual Log Rotation

Rotate Application Logs:

#!/bin/bash
# /home/agent/scripts/rotate_logs.sh

LOG_DIR="/home/agent/logs"
DATE=$(date +%Y%m%d)

cd "$LOG_DIR" || exit 1

# Rotate each log file
for log in *.log; do
    if [ -f "$log" ] && [ -s "$log" ]; then
        # Copy with date suffix
        cp "$log" "$log.$DATE"

        # Compress old log
        gzip "$log.$DATE"

        # Truncate current log
        truncate -s 0 "$log"

        echo "✓ Rotated: $log"
    fi
done

# Clean up logs older than 30 days
find "$LOG_DIR" -name "*.log.*.gz" -mtime +30 -delete

echo "✓ Log rotation completed"

Add to Crontab:

# Daily at 00:30 AM UTC
30 0 * * * /home/agent/scripts/rotate_logs.sh >> /home/agent/logs/maintenance/log-rotation.log 2>&1

Configure logrotate

Create logrotate Config:

sudo tee /etc/logrotate.d/aegis <<EOF
/home/agent/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0644 agent agent
    sharedscripts
    postrotate
        # Optional: notify Discord
        echo "Logs rotated: \$(date)" >> /home/agent/logs/maintenance/log-rotation.log
    endscript
}
EOF

Test Configuration:

# Dry run
sudo logrotate -d /etc/logrotate.d/aegis

# Force rotation
sudo logrotate -f /etc/logrotate.d/aegis

Knowledge Graph Maintenance

Consolidation

Automated Consolidation: Knowledge graph consolidation runs daily at 02:00 UTC via cron.
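A quick way to confirm the nightly run actually happened is to check the log's modification time (the log path is the one used later in this section; the `log_fresh` helper and the 26-hour slack are illustrative):

```shell
log_fresh() {
    # True when the file exists and was modified within max_min minutes
    local path="$1" max_min="${2:-1560}"   # 1560 min = 26 h
    [ -f "$path" ] && [ -n "$(find "$path" -mmin "-$max_min")" ]
}

if log_fresh /home/agent/logs/memory-consolidation.log; then
    echo "consolidation ran recently"
else
    echo "stale or missing consolidation log"
fi
```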

Manual Consolidation:

cd /home/agent/projects/aegis-core
source .venv/bin/activate

python -c "
from aegis.memory.consolidation import run_maintenance_consolidation
import asyncio
asyncio.run(run_maintenance_consolidation())
"

Check Consolidation Logs:

tail -50 /home/agent/logs/memory-consolidation.log

Graph Cleanup

Remove Old Episodes:

# Using GraphitiClient
from aegis.memory.graphiti_client import GraphitiClient
import asyncio

async def cleanup_old_episodes():
    client = GraphitiClient()
    await client.initialize()

    # Query old episodes
    # (This is conceptual - Graphiti doesn't have a built-in delete by age)
    # Implement based on your retention policy

asyncio.run(cleanup_old_episodes())

Transcript Ingestion

Automated Ingestion: Runs daily at 03:00 AM UTC via cron.

Manual Ingestion:

cd /home/agent/projects/aegis-core
source .venv/bin/activate
python scripts/ingest_transcripts.py --stats

# Ingest specific date range
python scripts/ingest_transcripts.py --after 2026-01-20

# Dry run
python scripts/ingest_transcripts.py --dry-run

Disk Space Management

Check Disk Usage

Overall Usage:

# Disk usage summary
df -h

# Specific directories
du -sh /home/agent/* | sort -rh | head -10

# Docker disk usage
docker system df

# PostgreSQL size
psql -U agent -d aegis -c "
  SELECT
    pg_size_pretty(pg_database_size('aegis')) as db_size,
    pg_size_pretty(pg_total_relation_size('episodic_memory')) as episodic_size,
    pg_size_pretty(pg_total_relation_size('semantic_memory')) as semantic_size;
"

Clean Up Disk Space

Safe Cleanup:

# Remove old backups (>30 days)
find /home/agent/logs/backup -name "*.dump" -mtime +30 -delete
find /home/agent/logs/backup -name "*.tar.gz" -mtime +30 -delete

# Remove old logs (>30 days)
find /home/agent/logs -name "*.log.*" -mtime +30 -delete

# Clean Docker (no --volumes here: pruning volumes can destroy database data)
docker system prune -a -f

# Clean pip cache
pip cache purge

# Clean apt cache
sudo apt clean
sudo apt autoremove -y

Aggressive Cleanup (if disk >85% full):

# Remove old journal entries (>90 days)
find /home/agent/memory/journal -name "*.md" -mtime +90 -delete

# Compress memory files
find /home/agent/memory/semantic -name "*.md" -exec gzip {} \;

# Remove old screenshots
find /home/agent/projects/aegis-core/data/monitor/screenshots -name "*.png" -mtime +7 -delete

# Vacuum full database
psql -U agent -d aegis -c "VACUUM FULL;"

Performance Optimization

Database Query Optimization

Identify Slow Queries:

# Enable pg_stat_statements
psql -U agent -d aegis -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"

# Find slow queries (column names for PostgreSQL 13+; on 12 and
# earlier they are total_time, mean_time, and max_time)
psql -U agent -d aegis -c "
  SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time
  FROM pg_stat_statements
  WHERE mean_exec_time > 100  -- milliseconds
  ORDER BY mean_exec_time DESC
  LIMIT 20;
"

# Reset stats
psql -U agent -d aegis -c "SELECT pg_stat_statements_reset();"

Add Missing Indexes:

# Check for missing indexes
psql -U agent -d aegis -c "
  SELECT
    schemaname,
    tablename,
    attname,
    n_distinct,
    null_frac
  FROM pg_stats
  WHERE
    schemaname = 'public'
    AND n_distinct > 100
    AND null_frac < 0.1
  ORDER BY n_distinct DESC;
"

# Create indexes as needed
psql -U agent -d aegis -c "CREATE INDEX IF NOT EXISTS idx_episodic_timestamp ON episodic_memory(timestamp DESC);"
psql -U agent -d aegis -c "CREATE INDEX IF NOT EXISTS idx_workflow_status ON workflow_runs(status);"

Container Resource Limits

Add Resource Limits to docker-compose.yml:

services:
  dashboard:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
        reservations:
          cpus: '0.5'
          memory: 1G

Monitor Resource Usage:

# Check current usage
docker stats --no-stream

# Formatted snapshot (docker stats reports current usage only, not history)
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

Scheduled Maintenance Tasks

Cron Schedule Summary

# View all scheduled tasks
crontab -l

# Expected schedule:
# 00:30 - Log rotation
# 01:00 - PostgreSQL backup
# 01:30 - FalkorDB backup
# 02:00 - Database vacuum
# 02:00 - Memory backup
# 02:00 - Knowledge graph consolidation
# 03:00 - Transcript ingestion
# 03:00 - Configuration backup (weekly)
# 04:00 - Off-site backup sync
# 05:00 - Recovery test (monthly)
# 06:00 - Morning routine
# 22:00 - Evening routine
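Cron entries silently disappear when a crontab is replaced wholesale, so it can help to verify the expected jobs are still installed. A sketch (`check_scheduled` is an illustrative helper; the script names are the ones used elsewhere in this doc):

```shell
check_scheduled() {
    # First argument: crontab text; remaining arguments: expected script names
    local crontab_text="$1"; shift
    local missing=0 script
    for script in "$@"; do
        if ! printf '%s' "$crontab_text" | grep -q "$script"; then
            echo "MISSING: $script"
            missing=1
        fi
    done
    return $missing
}

if check_scheduled "$(crontab -l 2>/dev/null)" \
        vacuum_postgres.sh rotate_logs.sh maintenance_master.sh; then
    echo "all expected maintenance jobs scheduled"
fi
```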

Create Maintenance Master Script

#!/bin/bash
# /home/agent/scripts/maintenance_master.sh

echo "=== Aegis Maintenance Master Script ==="
echo "Started: $(date)"

# Database maintenance
echo "Running database maintenance..."
/home/agent/scripts/vacuum_postgres.sh

# Log rotation
echo "Rotating logs..."
/home/agent/scripts/rotate_logs.sh

# Cleanup
echo "Cleaning up disk space..."
docker system prune -f
find /home/agent/logs/backup -name "*.dump" -mtime +30 -delete

# Verify backups
echo "Verifying backups..."
/home/agent/scripts/check_backups.sh

# Health check
echo "Running health check..."
/home/agent/projects/aegis-core/scripts/health_check.sh https://aegisagent.ai

echo "Completed: $(date)"
echo "=== Maintenance Complete ==="

Run Weekly:

# Every Sunday at 01:00 AM UTC
0 1 * * 0 /home/agent/scripts/maintenance_master.sh >> /home/agent/logs/maintenance/master.log 2>&1

Maintenance Checklist

Daily (Automated)

  • Database vacuum
  • Log rotation
  • Container health checks
  • Backup verification
  • Knowledge graph consolidation

Weekly (Manual)

  • Review disk space usage
  • Check for container updates
  • Review error logs
  • Verify backup integrity
  • Review journal entries

Monthly (Manual)

  • Database reindex
  • Test backup restore
  • Review and clean old data
  • Update dependencies
  • Review and optimize slow queries
  • Clean up unused Docker images/volumes

Quarterly (Manual)

  • Review retention policies
  • Audit security credentials
  • Performance benchmark
  • Review and update documentation
  • Infrastructure capacity planning

Troubleshooting Maintenance Issues

Maintenance Window Overrun

Symptom: Maintenance tasks taking >6 hours

Action:

# Check long-running processes
ps aux | grep -E "(vacuum|backup|pg_dump)" | grep -v grep

# Check disk I/O
iostat -x 5 3

# Kill stuck processes (if safe; send SIGTERM first so pg_dump can clean up,
# and escalate to -9 only if the process ignores it)
pkill pg_dump

Disk Space Critical

Symptom: Disk usage >90%

Immediate Action:

# Emergency cleanup (--volumes removes ALL unused volumes, including data
# volumes; confirm required volumes are attached to running containers first)
docker system prune -a --volumes -f
find /home/agent/logs -name "*.log" -size +100M -delete
truncate -s 0 /home/agent/logs/*.log
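Rather than waiting for the 90% symptom, the same cleanup can be gated on a threshold check. A sketch separating the decision from `df` so the logic is testable (the `disk_action` helper and the root mount point are assumptions; adjust to the agent's filesystem):

```shell
disk_action() {
    # Decide from a usage percentage whether cleanup is warranted
    local usage="$1" threshold="${2:-90}"
    if [ "$usage" -ge "$threshold" ]; then
        echo "cleanup"
    else
        echo "ok"
    fi
}

# Wire it to the real filesystem (GNU df; --output=pcent prints just Use%)
usage=$(df --output=pcent / 2>/dev/null | tail -n 1 | tr -dc '0-9')
if [ -n "$usage" ]; then
    echo "root filesystem at ${usage}%: $(disk_action "$usage")"
fi
```

A cron job could run this hourly and trigger the emergency commands above only when it prints `cleanup`.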

Database Bloat

Symptom: Database size growing despite cleanup

Action:

# Check table bloat
psql -U agent -d aegis -c "
  SELECT
    schemaname,
    tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
    pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS index_size
  FROM pg_stat_user_tables
  ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
  LIMIT 10;
"

# Full vacuum during maintenance window
psql -U agent -d aegis -c "VACUUM FULL ANALYZE;"

Best Practices

  1. Always Use Maintenance Window: Run heavy operations 00:00-06:00 UTC
  2. Test Before Production: Test maintenance scripts on test database first
  3. Monitor During Maintenance: Watch logs for errors during maintenance
  4. Automate Everything: Use cron for all routine tasks
  5. Document Changes: Update this doc when changing procedures
  6. Backup Before Major Changes: Always backup before schema changes
  7. Gradual Rollout: Deploy container updates one at a time
  8. Resource Limits: Set memory/CPU limits on containers
  9. Alert on Failures: Send Discord alerts for failed maintenance tasks
  10. Keep It Simple: Prefer simple, reliable scripts over complex solutions
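For practice 9, a failed maintenance task can post to a Discord webhook. A minimal sketch, assuming a `DISCORD_WEBHOOK_URL` environment variable holds the webhook; the `run_task`/`notify_failure` helpers are illustrative, and when the variable is unset the payload is just printed:

```shell
notify_failure() {
    # Build the JSON payload the Discord webhook API expects
    local task="$1" detail="$2" payload
    payload=$(printf '{"content": "Maintenance task failed: %s (%s)"}' "$task" "$detail")
    if [ -n "${DISCORD_WEBHOOK_URL:-}" ]; then
        curl -s -H "Content-Type: application/json" -d "$payload" "$DISCORD_WEBHOOK_URL" >/dev/null
    else
        echo "$payload"
    fi
}

run_task() {
    # Run a command; alert on nonzero exit and pass the status through
    local name="$1" rc; shift
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        notify_failure "$name" "exit code $rc"
    fi
    return $rc
}
```

Usage in a maintenance script: `run_task "vacuum" /home/agent/scripts/vacuum_postgres.sh` — success is silent, failure produces one alert per task.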