Maintenance¶
Comprehensive guide to routine maintenance tasks, optimization procedures, and system cleanup.
Maintenance Window¶
Daily Maintenance Window: 00:00-06:00 UTC
During this window, the following activities occur:

- Database vacuuming and analysis
- Log rotation
- Backup operations
- Cache cleanup
- Knowledge graph consolidation
- Container health checks
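Scripts that should only run inside this window can guard themselves with a small check. A minimal sketch, assuming a plain UTC-hour comparison is sufficient (the `in_window` helper name is an assumption, not an existing script):

```shell
#!/bin/bash
# in_window -- succeed only when the current UTC hour is inside the
# 00:00-06:00 maintenance window. Sketch only; adjust the bound if
# the window changes.
in_window() {
    local hour
    # Force base-10 so hours "08" and "09" are not parsed as octal
    hour=$((10#$(date -u +%H)))
    [ "$hour" -lt 6 ]
}

# Example use at the top of a heavy maintenance script:
# in_window || { echo "Outside maintenance window; aborting." >&2; exit 1; }
```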
Daily Maintenance Tasks¶
Morning Routine (06:00-08:00 UTC)¶
Automated via /morning Command:
# Morning check includes:
# - System status verification
# - Docker container health
# - Database connectivity
# - Wallet balance check
# - Pending tasks review
# - Discord status update
Manual Verification:
# Check system health
curl -s https://aegisagent.ai/health | jq
# Verify containers are running
docker ps --format "table {{.Names}}\t{{.Status}}"
# Check disk space
df -h /home/agent
# Review backup status
ls -lh /home/agent/logs/backup/ | tail -10
# Check recent errors
grep ERROR /home/agent/logs/*.log | tail -20
Evening Routine (22:00-24:00 UTC)¶
Automated via /evening Command:
# Evening summary includes:
# - Daily log generation
# - Completed tasks summary
# - Tomorrow's priorities
# - Discord evening update
Manual Tasks:
# Update journal
# Review: ~/memory/journal/$(date +%Y-%m-%d).md
# Commit any changes
cd /home/agent/projects/aegis-core
git status
git add .
git commit -m "Daily updates: $(date +%Y-%m-%d)"
# Sync Beads tasks
~/.local/bin/bd sync
Database Maintenance¶
PostgreSQL VACUUM and ANALYZE¶
Manual VACUUM:
# Full vacuum (locks tables, use during maintenance window)
psql -U agent -d aegis -c "VACUUM FULL;"
# Regular vacuum (no locks, safe anytime)
psql -U agent -d aegis -c "VACUUM;"
# Analyze table statistics
psql -U agent -d aegis -c "ANALYZE;"
# Vacuum and analyze together
psql -U agent -d aegis -c "VACUUM ANALYZE;"
Vacuum Specific Tables:
# High-churn tables
psql -U agent -d aegis -c "VACUUM ANALYZE episodic_memory;"
psql -U agent -d aegis -c "VACUUM ANALYZE workflow_runs;"
psql -U agent -d aegis -c "VACUUM ANALYZE api_keys;"
Automated VACUUM:
#!/bin/bash
# /home/agent/scripts/vacuum_postgres.sh
echo "=== PostgreSQL Maintenance ==="
date
# Vacuum and analyze
psql -U agent -d aegis -c "VACUUM ANALYZE;" 2>&1
# Check bloat
psql -U agent -d aegis -c "
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size,
n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
" 2>&1
echo "✓ Vacuum completed"
Add to Crontab:
# Daily at 02:00 AM UTC
0 2 * * * /home/agent/scripts/vacuum_postgres.sh >> /home/agent/logs/maintenance/vacuum.log 2>&1
Database Index Maintenance¶
Rebuild Indexes:
# Reindex database (use during maintenance window)
psql -U agent -d aegis -c "REINDEX DATABASE aegis;"
# Reindex specific table
psql -U agent -d aegis -c "REINDEX TABLE episodic_memory;"
# Check index usage
psql -U agent -d aegis -c "
SELECT
schemaname,
tablename,
indexname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, tablename;
"
Automated Monthly Reindex:
# Monthly on the 1st at 03:00 AM UTC
0 3 1 * * psql -U agent -d aegis -c "REINDEX DATABASE aegis;" >> /home/agent/logs/maintenance/reindex.log 2>&1
Connection Pool Cleanup¶
Check Active Connections:
# List active connections
psql -U agent -d aegis -c "
SELECT
pid,
usename,
application_name,
client_addr,
state,
query_start,
state_change
FROM pg_stat_activity
WHERE datname = 'aegis'
ORDER BY query_start;
"
# Count connections by state
psql -U agent -d aegis -c "
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = 'aegis'
GROUP BY state;
"
Kill Idle Connections:
# Kill idle connections (>1 hour)
psql -U agent -d aegis -c "
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE
datname = 'aegis'
AND state = 'idle'
AND state_change < NOW() - INTERVAL '1 hour';
"
# Kill long-running queries (>10 minutes)
psql -U agent -d aegis -c "
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE
datname = 'aegis'
AND state = 'active'
AND query_start < NOW() - INTERVAL '10 minutes';
"
Container Maintenance¶
Container Updates¶
Update Images:
# Pull latest images
cd /home/agent/projects/aegis-core
docker compose pull
# Rebuild custom images
docker compose build --pull
# Recreate containers with new images
docker compose up -d --force-recreate
# Verify updates
docker images | grep aegis
Prune Unused Images:
# Remove dangling images
docker image prune -f
# Remove unused images (older versions)
docker image prune -a --filter "until=24h" -f
# Check disk space saved
df -h /var/lib/docker
Container Restarts¶
Graceful Restart:
cd /home/agent/projects/aegis-core
# Restart single container
docker compose restart dashboard
# Restart all containers
docker compose restart
# Stop and start (full cycle)
docker compose down
docker compose up -d
Force Recreate:
cd /home/agent/projects/aegis-core
# Recreate specific container
docker compose up -d --force-recreate dashboard
# Recreate all containers
docker compose up -d --force-recreate
Container Logs Cleanup¶
Truncate Docker Logs:
# Find large log files
sudo du -sh /var/lib/docker/containers/*/*.log | sort -rh | head -10
# Truncate specific container logs (expand the glob as root; the containers
# directory is not readable by unprivileged users, so a bare sudo truncate
# with a glob fails)
sudo sh -c "truncate -s 0 /var/lib/docker/containers/$(docker inspect --format='{{.Id}}' aegis-dashboard)/*-json.log"
# Rotate all container logs (docker ps -q prints truncated IDs that do not
# match the directory names, so resolve the real path with docker inspect)
for container in $(docker ps -q); do
    log_file=$(docker inspect --format='{{.LogPath}}' "$container")
    if [ -n "$log_file" ] && sudo test -f "$log_file"; then
        sudo truncate -s 0 "$log_file"
    fi
done
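Truncation is a one-off fix; a more durable option is Docker's built-in json-file rotation, set once in `/etc/docker/daemon.json`. The sizes below are illustrative, the `tee` overwrites any existing daemon.json (merge by hand if one exists), and the options only apply to containers created after the restart, so recreate them with `docker compose up -d --force-recreate`:

```shell
# Cap each container log at 3 files of 50 MB (values are assumptions; tune them)
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "3"
  }
}
EOF
sudo systemctl restart docker
```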
Cache Cleanup¶
Application Cache¶
Clean Scheduler Cache:
# Clear stale APScheduler jobs (a recurring job always keeps a future
# next_run_time, so this only removes one-off jobs that fired long ago)
psql -U agent -d aegis -c "DELETE FROM apscheduler_jobs WHERE next_run_time < NOW() - INTERVAL '7 days';"
Clean Memory Cache:
# Remove old episodic memories (>90 days, low importance)
psql -U agent -d aegis -c "
DELETE FROM episodic_memory
WHERE
timestamp < NOW() - INTERVAL '90 days'
AND importance < 5;
"
# Vacuum after delete
psql -U agent -d aegis -c "VACUUM ANALYZE episodic_memory;"
FalkorDB Cache¶
Check Memory Usage:
# Check FalkorDB memory stats
docker exec falkordb redis-cli INFO memory
# Check database size
docker exec falkordb redis-cli DBSIZE
Manual Cleanup (if needed):
# Flush all data (CAREFUL! This deletes the entire knowledge graph)
docker exec falkordb redis-cli FLUSHALL
# Compact memory
docker exec falkordb redis-cli MEMORY PURGE
Python Cache Cleanup¶
Clean pip cache:
# Show cache info
pip cache info
# Remove cache
pip cache purge
# Clean __pycache__ directories
find /home/agent/projects/aegis-core -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null
find /home/agent/projects/aegis-core -name "*.pyc" -delete
Log Rotation¶
Manual Log Rotation¶
Rotate Application Logs:
#!/bin/bash
# /home/agent/scripts/rotate_logs.sh
LOG_DIR="/home/agent/logs"
DATE=$(date +%Y%m%d)
cd "$LOG_DIR"
# Rotate each log file
for log in *.log; do
if [ -f "$log" ] && [ -s "$log" ]; then
# Copy with date suffix
cp "$log" "$log.$DATE"
# Compress old log
gzip "$log.$DATE"
# Truncate current log
truncate -s 0 "$log"
echo "✓ Rotated: $log"
fi
done
# Clean up logs older than 30 days
find "$LOG_DIR" -name "*.log.*.gz" -mtime +30 -delete
echo "✓ Log rotation completed"
Add to Crontab:
# Daily at 00:30 AM UTC
30 0 * * * /home/agent/scripts/rotate_logs.sh >> /home/agent/logs/maintenance/log-rotation.log 2>&1
Configure logrotate¶
Create logrotate Config:
sudo tee /etc/logrotate.d/aegis <<EOF
/home/agent/logs/*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 0644 agent agent
sharedscripts
postrotate
# Optional: notify Discord
echo "Logs rotated: \$(date)" >> /home/agent/logs/maintenance/log-rotation.log
endscript
}
EOF
Test Configuration:
# Dry run
sudo logrotate -d /etc/logrotate.d/aegis
# Force rotation
sudo logrotate -f /etc/logrotate.d/aegis
Knowledge Graph Maintenance¶
Consolidation¶
Automated Consolidation: The knowledge graph runs consolidation daily at 02:00 AM UTC via cron.
Manual Consolidation:
cd /home/agent/projects/aegis-core
source .venv/bin/activate
python -c "
from aegis.memory.consolidation import run_maintenance_consolidation
import asyncio
asyncio.run(run_maintenance_consolidation())
"
Check Consolidation Logs:
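A hedged sketch for inspecting the consolidation output; the default path is an assumption based on the other maintenance logs, so point it at wherever your cron entry actually redirects:

```shell
#!/bin/bash
# check_log FILE -- show the tail of a maintenance log plus its error count.
# The helper name and the example path are hypothetical.
check_log() {
    local log="$1"
    tail -n 20 "$log"
    printf 'errors: %s\n' "$(grep -c ERROR "$log")"
}

# check_log /home/agent/logs/maintenance/consolidation.log
```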
Graph Cleanup¶
Remove Old Episodes:
# Using GraphitiClient
from aegis.memory.graphiti_client import GraphitiClient
import asyncio
async def cleanup_old_episodes():
    client = GraphitiClient()
    await client.initialize()
    # Query old episodes
    # (This is conceptual - Graphiti doesn't have a built-in delete by age)
    # Implement based on your retention policy

asyncio.run(cleanup_old_episodes())
Transcript Ingestion¶
Automated Ingestion: Runs daily at 03:00 AM UTC via cron.
Manual Ingestion:
cd /home/agent/projects/aegis-core
python scripts/ingest_transcripts.py --stats
# Ingest specific date range
python scripts/ingest_transcripts.py --after 2026-01-20
# Dry run
python scripts/ingest_transcripts.py --dry-run
Disk Space Management¶
Check Disk Usage¶
Overall Usage:
# Disk usage summary
df -h
# Specific directories
du -sh /home/agent/* | sort -rh | head -10
# Docker disk usage
docker system df
# PostgreSQL size
psql -U agent -d aegis -c "
SELECT
pg_size_pretty(pg_database_size('aegis')) as db_size,
pg_size_pretty(pg_total_relation_size('episodic_memory')) as episodic_size,
pg_size_pretty(pg_total_relation_size('semantic_memory')) as semantic_size;
"
Clean Up Disk Space¶
Safe Cleanup:
# Remove old backups (>30 days)
find /home/agent/logs/backup -name "*.dump" -mtime +30 -delete
find /home/agent/logs/backup -name "*.tar.gz" -mtime +30 -delete
# Remove old logs (>30 days)
find /home/agent/logs -name "*.log.*" -mtime +30 -delete
# Clean Docker (do NOT add --volumes here -- named volumes hold the
# PostgreSQL and FalkorDB data)
docker system prune -a -f
# Clean pip cache
pip cache purge
# Clean apt cache
sudo apt clean
sudo apt autoremove -y
Aggressive Cleanup (if disk >85% full):
# Remove old journal entries (>90 days)
find /home/agent/memory/journal -name "*.md" -mtime +90 -delete
# Compress memory files
find /home/agent/memory/semantic -name "*.md" -exec gzip {} \;
# Remove old screenshots
find /home/agent/projects/aegis-core/data/monitor/screenshots -name "*.png" -mtime +7 -delete
# Vacuum full database
psql -U agent -d aegis -c "VACUUM FULL;"
Performance Optimization¶
Database Query Optimization¶
Identify Slow Queries:
# Enable pg_stat_statements
psql -U agent -d aegis -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements;"
# Find slow queries (on PostgreSQL 13+; older versions name these columns
# total_time, mean_time, and max_time)
psql -U agent -d aegis -c "
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC
LIMIT 20;
"
# Reset stats
psql -U agent -d aegis -c "SELECT pg_stat_statements_reset();"
Add Missing Indexes:
# Heuristic: surface high-cardinality columns that may benefit from an index
psql -U agent -d aegis -c "
SELECT
schemaname,
tablename,
attname,
n_distinct,
null_frac
FROM pg_stats
WHERE
schemaname = 'public'
AND n_distinct > 100
AND null_frac < 0.1
ORDER BY n_distinct DESC;
"
# Create indexes as needed (CONCURRENTLY avoids blocking writes; it works
# here because psql -c runs each statement outside a transaction block)
psql -U agent -d aegis -c "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_episodic_timestamp ON episodic_memory(timestamp DESC);"
psql -U agent -d aegis -c "CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_workflow_status ON workflow_runs(status);"
Container Resource Limits¶
Add Resource Limits to docker-compose.yml:
services:
dashboard:
deploy:
resources:
limits:
cpus: '2.0'
memory: 4G
reservations:
cpus: '0.5'
memory: 1G
Monitor Resource Usage:
# Check current usage
docker stats --no-stream
# Formatted snapshot (docker stats keeps no history; use cAdvisor or similar for that)
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
Scheduled Maintenance Tasks¶
Cron Schedule Summary¶
# View all scheduled tasks
crontab -l
# Expected schedule:
# 00:30 - Log rotation
# 01:00 - PostgreSQL backup
# 01:30 - FalkorDB backup
# 02:00 - Database vacuum
# 02:00 - Memory backup
# 02:00 - Knowledge graph consolidation
# 03:00 - Transcript ingestion
# 03:00 - Configuration backup (weekly)
# 04:00 - Off-site backup sync
# 05:00 - Recovery test (monthly)
# 06:00 - Morning routine
# 22:00 - Evening routine
Create Maintenance Master Script¶
#!/bin/bash
# /home/agent/scripts/maintenance_master.sh
echo "=== Aegis Maintenance Master Script ==="
echo "Started: $(date)"
# Database maintenance
echo "Running database maintenance..."
/home/agent/scripts/vacuum_postgres.sh
# Log rotation
echo "Rotating logs..."
/home/agent/scripts/rotate_logs.sh
# Cleanup
echo "Cleaning up disk space..."
docker system prune -f
find /home/agent/logs/backup -name "*.dump" -mtime +30 -delete
# Verify backups
echo "Verifying backups..."
/home/agent/scripts/check_backups.sh
# Health check
echo "Running health check..."
/home/agent/projects/aegis-core/scripts/health_check.sh https://aegisagent.ai
echo "Completed: $(date)"
echo "=== Maintenance Complete ==="
Run Weekly:
# Every Sunday at 01:00 AM UTC
0 1 * * 0 /home/agent/scripts/maintenance_master.sh >> /home/agent/logs/maintenance/master.log 2>&1
Maintenance Checklist¶
Daily (Automated)¶
- Database vacuum
- Log rotation
- Container health checks
- Backup verification
- Knowledge graph consolidation
Weekly (Manual)¶
- Review disk space usage
- Check for container updates
- Review error logs
- Verify backup integrity
- Review journal entries
Monthly (Manual)¶
- Database reindex
- Test backup restore
- Review and clean old data
- Update dependencies
- Review and optimize slow queries
- Clean up unused Docker images/volumes
Quarterly (Manual)¶
- Review retention policies
- Audit security credentials
- Performance benchmark
- Review and update documentation
- Infrastructure capacity planning
Troubleshooting Maintenance Issues¶
Maintenance Window Overrun¶
Symptom: Maintenance tasks taking >6 hours
Action:
# Check long-running processes
ps aux | grep -E "(vacuum|backup|pg_dump)" | grep -v grep
# Check disk I/O
iostat -x 5 3
# Kill stuck processes (if safe; a killed pg_dump leaves a partial dump,
# so rerun the backup afterwards)
pkill -9 pg_dump
Disk Space Critical¶
Symptom: Disk usage >90%
Immediate Action:
# Emergency cleanup (drops unused images/networks; --volumes would also
# delete the database volumes, so leave it off unless backups are verified)
docker system prune -a -f
find /home/agent/logs -name "*.log" -size +100M -delete
truncate -s 0 /home/agent/logs/*.log
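Before deleting anything further, identify the biggest offenders rather than guessing; a sketch (the `largest` helper name is an assumption):

```shell
#!/bin/bash
# largest DIR [N] -- the N biggest files/directories under DIR (default 10)
largest() {
    du -ah "$1" 2>/dev/null | sort -rh | head -n "${2:-10}"
}

largest /home/agent 15
```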
Database Bloat¶
Symptom: Database size growing despite cleanup
Action:
# Check table bloat
psql -U agent -d aegis -c "
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
pg_size_pretty(pg_relation_size(schemaname||'.'||tablename)) AS table_size,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS index_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
"
# Full vacuum during maintenance window
psql -U agent -d aegis -c "VACUUM FULL ANALYZE;"
Best Practices¶
- Always Use Maintenance Window: Run heavy operations 00:00-06:00 UTC
- Test Before Production: Test maintenance scripts on test database first
- Monitor During Maintenance: Watch logs for errors during maintenance
- Automate Everything: Use cron for all routine tasks
- Document Changes: Update this doc when changing procedures
- Backup Before Major Changes: Always backup before schema changes
- Gradual Rollout: Deploy container updates one at a time
- Resource Limits: Set memory/CPU limits on containers
- Alert on Failures: Send Discord alerts for failed maintenance tasks
- Keep It Simple: Prefer simple, reliable scripts over complex solutions
Related Documentation¶
- Monitoring - Health checks and alerting
- Backups - Backup and recovery procedures
- Troubleshooting - Common issues and solutions
- Logging - Log management and analysis