Logging¶
Comprehensive guide to Aegis logging infrastructure, log locations, and analysis.
Log Locations¶
Application Logs¶
Directory: /home/agent/logs/
| Log File | Purpose | Rotation | Typical Size |
|---|---|---|---|
| balance-check.log | Wallet balance monitoring | Daily | ~170KB |
| claude-session-worker.log | Claude session management | Daily | ~50KB |
| claude_worker.log | Claude worker tasks | Daily | ~13KB |
| feed-health.log | RSS feed health checks | Daily | ~12KB |
| ideas-channel.log | Discord #ideas monitoring | Daily | ~110KB |
| market-scheduler.log | Market intelligence jobs | Daily | ~760KB |
| memory-consolidation.log | Knowledge graph sync | Daily | ~8KB |
| proactive-feedback.log | Proactive learning feedback | Daily | ~300B |
| proactive-news-scan.log | News aggregation | Daily | ~5KB |
| proactive-pattern-learning.log | Pattern detection | Daily | ~1KB |
| profile-analyzer.log | Profile analysis jobs | Daily | ~100B |
| profile-channel.log | Discord profile analysis | Daily | ~32KB |
| transcript-ingestion.log | Journal/history ingestion | Daily | ~32KB |
Docker Container Logs¶
Access via Docker:
# View container logs
docker logs aegis-dashboard
docker logs aegis-scheduler
docker logs aegis-playwright
docker logs falkordb
# Follow logs in real-time
docker logs -f aegis-dashboard
# Show last N lines
docker logs --tail 100 aegis-dashboard
# Show logs since timestamp
docker logs --since 2026-01-25T10:00:00 aegis-dashboard
# Show logs with timestamps
docker logs -t aegis-dashboard
Log Drivers:
Docker uses the default json-file driver. Logs are stored on the host under /var/lib/docker/containers/&lt;container-id&gt;/&lt;container-id&gt;-json.log.
System Logs¶
Systemd Logs (if running services via systemd):
# View systemd logs
journalctl -u aegis-dashboard.service
# Follow systemd logs
journalctl -u aegis-dashboard.service -f
# Show logs since boot
journalctl -u aegis-dashboard.service -b
Cron Job Logs¶
Crontab Logs:
# View cron logs
grep CRON /var/log/syslog
# Check a specific job's recent output
tail -n 50 /home/agent/logs/transcript-ingestion.log
Active Cron Jobs:
# List current cron jobs
crontab -l
# Example cron log redirect
0 2 * * * cd /home/agent/projects/aegis-core && source .venv/bin/activate && python scripts/ingest_transcripts.py >> /home/agent/logs/transcript-ingestion.log 2>&1
Log Format¶
Structured Logging (structlog)¶
Aegis uses structlog for structured logging throughout the application.
Example Log Entry:
{
  "event": "health_check_completed",
  "timestamp": "2026-01-25T12:34:56.789Z",
  "level": "info",
  "logger": "aegis.scheduler",
  "status": "healthy",
  "duration_ms": 45.3,
  "checks_passed": 5,
  "checks_failed": 0
}
Log Levels:
- DEBUG: Detailed diagnostic information
- INFO: Informational messages (normal operation)
- WARNING: Warning messages (potential issues)
- ERROR: Error messages (failures that don't stop execution)
- CRITICAL: Critical errors (system-threatening issues)
Custom Log Context¶
Adding Context to Logs:
import structlog

logger = structlog.get_logger()

# Log with context
logger.info(
    "task_completed",
    task_id="123",
    duration_ms=1234.5,
    status="success",
)

# Bind context for multiple log calls
log = logger.bind(request_id="abc-123")
log.info("request_received")
log.info("processing_complete")
Log Rotation¶
Current Configuration¶
No Automatic Rotation: Currently, logs in /home/agent/logs/ are NOT automatically rotated.
Manual Rotation (recommended during maintenance window):
# Rotate logs manually
cd /home/agent/logs
for log in *.log; do
    if [ -f "$log" ]; then
        mv "$log" "$log.$(date +%Y%m%d)"
        gzip "$log.$(date +%Y%m%d)"
        touch "$log"
    fi
done
# Clean up old logs (>30 days)
find /home/agent/logs -name "*.log.*.gz" -mtime +30 -delete
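To run the manual loop unattended, a crontab entry can invoke it on a schedule. This is a sketch: the script path below is hypothetical, so save the loop above as a script at that location (or your own) first.

```
# Hypothetical crontab entry: run the rotation loop nightly at 03:00
0 3 * * * /home/agent/scripts/rotate_logs.sh >> /home/agent/logs/rotation.log 2>&1
```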
Setting Up logrotate¶
Create logrotate Configuration:
# Create config file (quote EOF so the heredoc is passed through verbatim)
sudo tee /etc/logrotate.d/aegis <<'EOF'
/home/agent/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0644 agent agent
    sharedscripts
    postrotate
        # Optional: send notification
        /usr/bin/docker exec aegis-dashboard python -c "import structlog; structlog.get_logger().info('logs_rotated')" || true
    endscript
}
EOF
# Test configuration
sudo logrotate -d /etc/logrotate.d/aegis
# Force rotation (testing)
sudo logrotate -f /etc/logrotate.d/aegis
Docker Log Rotation¶
Configure rotation per service in docker-compose.yml using the logging key (driver plus max-size/max-file options), then apply the change by recreating the affected containers with docker compose up -d.
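A sketch of per-service rotation with the json-file driver; the service name is assumed from the container names above and may differ in the actual compose file:

```yaml
services:
  dashboard:
    logging:
      driver: json-file
      options:
        max-size: "10m"   # rotate once the current log reaches 10 MB
        max-file: "3"     # keep at most 3 rotated files per container
```

Note that these options only take effect for newly created containers, so `docker compose up -d` (which recreates containers whose config changed) is required; a plain restart is not enough.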
Error Tracking¶
Error MCP Tools¶
Aegis includes dedicated error tracking tools via the MCP server.
Record Errors:
# Log error with context
mcp__aegis__error_record(
    error_type="DatabaseConnectionError",
    context="Failed to connect to PostgreSQL during health check",
    strike_count=1,
    severity="warning",
)
Search Past Errors:
# Find similar errors
results = mcp__aegis__error_search(
    query="DatabaseConnectionError"
)
# Returns past occurrences, resolutions, patterns
Record Resolution:
# Document how error was fixed
mcp__aegis__error_resolution(
    error_id="err_123",
    resolution="Restarted PostgreSQL service",
    worked=True,
    lessons_learned="Database had reached max connections, needed to increase max_connections setting",
)
Recent Errors:
Three-Strike Protocol¶
Strike 1: Log error, retry with modified approach
mcp__aegis__error_record(
    error_type="APITimeoutError",
    context="OpenAI API timeout after 30s",
    strike_count=1,
)
Strike 2: Escalate, use local model, search for similar errors
# Search for past resolutions
similar = mcp__aegis__error_search(query="APITimeoutError")

# Record escalation
mcp__aegis__error_record(
    error_type="APITimeoutError",
    context="Second timeout, switching to local model",
    strike_count=2,
    severity="error",
)
Strike 3: STOP, document, alert human
# Record critical error
mcp__aegis__error_record(
    error_type="APITimeoutError",
    context="Third consecutive timeout, halting task",
    strike_count=3,
    severity="critical",
)

# Alert to Discord
mcp__discord__discord_send_message(
    channel_id="1455049130614329508",
    content="🚨 Three-strike protocol triggered: APITimeoutError. Task halted, awaiting human input.",
)
Log Analysis¶
Search Logs¶
Grep for Errors:
# Find all errors in last 24 hours
find /home/agent/logs -name "*.log" -mtime -1 -exec grep -H "ERROR" {} \;
# Count errors by type
grep ERROR /home/agent/logs/*.log | cut -d: -f3 | sort | uniq -c | sort -rn
# Find errors in specific log
grep ERROR /home/agent/logs/market-scheduler.log | tail -20
Analyze Patterns:
# Most common log messages
cat /home/agent/logs/*.log | cut -d: -f2- | sort | uniq -c | sort -rn | head -20
# Errors by hour (-h drops filename prefixes; sort so uniq aggregates across files)
grep -h ERROR /home/agent/logs/*.log | cut -d' ' -f1,2 | cut -c1-13 | sort | uniq -c
# Failed tasks
grep -i "failed\|error\|exception" /home/agent/logs/market-scheduler.log | wc -l
Log Aggregation¶
Combine Recent Logs:
# All logs from last hour
find /home/agent/logs -name "*.log" -mmin -60 -exec tail -n 100 {} \; > /tmp/recent_logs.txt
# All errors from today
grep ERROR /home/agent/logs/*.log | grep "$(date +%Y-%m-%d)" > /tmp/errors_today.txt
JSON Log Parsing:
# Extract specific fields (if logs are JSON; -h drops filename prefixes so jq can parse)
grep -h "health_check" /home/agent/logs/*.log | jq -r '.event, .status, .duration_ms'
# Count events by type (skip non-JSON lines)
cat /home/agent/logs/*.log | jq -r '.event // empty' 2>/dev/null | sort | uniq -c | sort -rn
Performance Analysis¶
Slow Operations:
# Find operations >1000ms
grep "duration_ms" /home/agent/logs/*.log | awk -F'duration_ms[: ]*' '{print $2}' | awk '{if($1>1000) print}'
# Average duration by operation (-h keeps jq input parseable; guard against empty input)
grep -h "task_completed" /home/agent/logs/*.log | jq '.duration_ms' | awk '{sum+=$1; count++} END {if (count) print sum/count}'
Real-Time Monitoring¶
Tail Multiple Logs¶
Follow All Application Logs:
# Using tail with wildcard
tail -f /home/agent/logs/*.log
# Using multitail (if installed)
multitail /home/agent/logs/*.log
# Filter for errors only
tail -f /home/agent/logs/*.log | grep --line-buffered ERROR
Watch Specific Events¶
Monitor Health Checks: filter the live log stream for health_check events.
Monitor Scheduler Jobs: follow market-scheduler.log for job start, finish, and error lines.
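A sketch of both, assuming structured event names as in the examples above (Ctrl-C to stop):

```shell
# Health-check events across all application logs, as they arrive
tail -f /home/agent/logs/*.log | grep --line-buffered "health_check"

# Scheduler job activity (job lifecycle and error lines)
tail -f /home/agent/logs/market-scheduler.log | grep --line-buffered -Ei "job|error"
```

The --line-buffered flag matters in pipelines: without it, grep block-buffers its output and matches can lag far behind the live stream.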
Log Shipping (Advanced)¶
Send Logs to External Service¶
Option 1: Syslog:
# Configure rsyslog to forward to external server
# /etc/rsyslog.d/aegis.conf
*.* @@logs.example.com:514
Option 2: Loki (Grafana):
# docker-compose.yml addition
services:
  promtail:
    image: grafana/promtail:latest
    volumes:
      - /home/agent/logs:/logs
      - ./promtail-config.yml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml
Option 3: CloudWatch Logs:
# Ship logs via watchtower (uses boto3 credentials under the hood)
import logging
import watchtower

# Attach the CloudWatch handler to a stdlib logger; structlog can be
# configured to render through stdlib logging so this handler sees its output
logger = logging.getLogger("aegis")
logger.addHandler(watchtower.CloudWatchLogHandler())
Troubleshooting Log Issues¶
No Logs Appearing¶
Check Log Permissions:
# Verify log directory permissions
ls -la /home/agent/logs
# Fix permissions if needed
chown -R agent:agent /home/agent/logs
chmod 755 /home/agent/logs
chmod 644 /home/agent/logs/*.log
Check Disk Space:
# Check if disk is full
df -h /home/agent
# Check largest log files
du -sh /home/agent/logs/* | sort -rh | head -10
Log Flooding¶
Identify Culprit:
# Most recently modified logs (the likely active writers)
ls -ltrh /home/agent/logs/*.log | tail -5
# Watch log growth
watch -n 1 'ls -lh /home/agent/logs/*.log | tail -5'
Temporary Mitigation:
# Truncate large log
truncate -s 0 /home/agent/logs/market-scheduler.log
# Or rotate immediately
mv /home/agent/logs/market-scheduler.log /home/agent/logs/market-scheduler.log.$(date +%Y%m%d)
gzip /home/agent/logs/market-scheduler.log.$(date +%Y%m%d)
touch /home/agent/logs/market-scheduler.log
Missing Docker Logs¶
Check Log Driver:
# Inspect container logging config
docker inspect aegis-dashboard | jq '.[0].HostConfig.LogConfig'
# Check if logs are being written
docker logs aegis-dashboard --since 5m
Fix Lost Logs:
# Restart container to reset logging
cd /home/agent/projects/aegis-core && docker compose restart dashboard
# Or recreate container
cd /home/agent/projects/aegis-core && docker compose up -d --force-recreate dashboard
Best Practices¶
- Use Structured Logging: Always use structlog with JSON format
- Include Context: Add task_id, request_id, user_id to all logs
- Set Appropriate Levels: Don't log INFO as ERROR
- Rotate Regularly: Set up logrotate to prevent disk exhaustion
- Monitor Log Volume: Alert if logs grow >100MB/day unexpectedly
- Sanitize Logs: Never log API keys, passwords, or PII
- Centralize: Consider shipping logs to external service for long-term storage
- Index for Search: Use tools like Loki or Elasticsearch for searchability
- Retention Policy: Keep logs for at least 30 days, critical logs for 90 days
- Regular Review: Check logs during morning routine for anomalies
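For the sanitization point, a structlog processor can mask sensitive keys before rendering. This is a sketch, not the Aegis implementation: the key list and function name are assumptions, and you would register it via structlog.configure as shown in the trailing comment.

```python
SENSITIVE_KEYS = {"api_key", "password", "token", "secret"}

def redact_sensitive(logger, method_name, event_dict):
    """structlog processor: mask values of sensitive keys before output."""
    for key in event_dict:
        if key in SENSITIVE_KEYS:
            event_dict[key] = "[REDACTED]"
    return event_dict

# Register it ahead of the renderer so redaction happens before output, e.g.:
# structlog.configure(processors=[redact_sensitive, structlog.processors.JSONRenderer()])
```

Because a processor is just a function of (logger, method_name, event_dict), it can be unit-tested without any logging setup.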
Related Documentation¶
- Monitoring - Health checks and alerting
- Troubleshooting - Common issues and solutions
- Maintenance - Routine maintenance tasks