
Logging

Comprehensive guide to Aegis logging infrastructure, log locations, and analysis.

Log Locations

Application Logs

Directory: /home/agent/logs/

Log File                        Purpose                      Rotation  Typical Size
balance-check.log               Wallet balance monitoring    Daily     ~170KB
claude-session-worker.log       Claude session management    Daily     ~50KB
claude_worker.log               Claude worker tasks          Daily     ~13KB
feed-health.log                 RSS feed health checks       Daily     ~12KB
ideas-channel.log               Discord #ideas monitoring    Daily     ~110KB
market-scheduler.log            Market intelligence jobs     Daily     ~760KB
memory-consolidation.log        Knowledge graph sync         Daily     ~8KB
proactive-feedback.log          Proactive learning feedback  Daily     ~300B
proactive-news-scan.log         News aggregation             Daily     ~5KB
proactive-pattern-learning.log  Pattern detection            Daily     ~1KB
profile-analyzer.log            Profile analysis jobs        Daily     ~100B
profile-channel.log             Discord profile analysis     Daily     ~32KB
transcript-ingestion.log        Journal/history ingestion    Daily     ~32KB

Docker Container Logs

Access via Docker:

# View container logs
docker logs aegis-dashboard
docker logs aegis-scheduler
docker logs aegis-playwright
docker logs falkordb

# Follow logs in real-time
docker logs -f aegis-dashboard

# Show last N lines
docker logs --tail 100 aegis-dashboard

# Show logs since timestamp
docker logs --since 2026-01-25T10:00:00 aegis-dashboard

# Show logs with timestamps
docker logs -t aegis-dashboard

Log Drivers: Docker uses the default json-file driver. Logs are stored in:

/var/lib/docker/containers/<container_id>/<container_id>-json.log
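Reading these files directly usually requires root, so `docker logs` is the normal interface. If you do need to process them, each line of a json-file driver log is a JSON object with `log`, `stream`, and `time` fields. A minimal parsing sketch (`parse_docker_log_lines` is an illustrative helper, not part of Aegis):

```python
import json

def parse_docker_log_lines(lines):
    """Parse json-file driver entries: one JSON object per line with
    'log' (raw message), 'stream' (stdout/stderr), and 'time' fields."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        entries.append((record["time"], record["stream"], record["log"].rstrip("\n")))
    return entries

sample = '{"log":"server started\\n","stream":"stdout","time":"2026-01-25T10:00:00.000000000Z"}'
print(parse_docker_log_lines([sample]))
```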

System Logs

Systemd Logs (if running services via systemd):

# View systemd logs
journalctl -u aegis-dashboard.service

# Follow systemd logs
journalctl -u aegis-dashboard.service -f

# Show logs since boot
journalctl -u aegis-dashboard.service -b

Cron Job Logs

Crontab Logs:

# View cron logs
grep CRON /var/log/syslog

# Check specific job output
grep "transcript-ingestion" /home/agent/logs/transcript-ingestion.log

Active Cron Jobs:

# List current cron jobs
crontab -l

# Example cron log redirect
0 2 * * * cd /home/agent/projects/aegis-core && source .venv/bin/activate && python scripts/ingest_transcripts.py >> /home/agent/logs/transcript-ingestion.log 2>&1

Log Format

Structured Logging (structlog)

Aegis uses structlog for structured logging throughout the application.

Example Log Entry:

{
  "event": "health_check_completed",
  "timestamp": "2026-01-25T12:34:56.789Z",
  "level": "info",
  "logger": "aegis.scheduler",
  "status": "healthy",
  "duration_ms": 45.3,
  "checks_passed": 5,
  "checks_failed": 0
}

Log Levels:

- DEBUG: Detailed diagnostic information
- INFO: Informational messages (normal operation)
- WARNING: Warning messages (potential issues)
- ERROR: Error messages (failures that don't stop execution)
- CRITICAL: Critical errors (system-threatening issues)
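Because structlog emits one JSON object per line, level filtering needs nothing beyond the standard library. A minimal sketch (`filter_by_level` is an illustrative helper, not part of Aegis):

```python
import json

def filter_by_level(lines, min_level="WARNING"):
    """Keep structlog-style JSON entries at or above min_level."""
    order = ["debug", "info", "warning", "error", "critical"]
    threshold = order.index(min_level.lower())
    kept = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (mixed-format log files)
        if order.index(entry.get("level", "info").lower()) >= threshold:
            kept.append(entry)
    return kept

logs = [
    '{"event": "health_check_completed", "level": "info"}',
    '{"event": "db_timeout", "level": "error"}',
]
print(filter_by_level(logs))  # only the error entry survives
```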

Custom Log Context

Adding Context to Logs:

import structlog

logger = structlog.get_logger()

# Log with context
logger.info(
    "task_completed",
    task_id="123",
    duration_ms=1234.5,
    status="success"
)

# Bind context for multiple log calls
log = logger.bind(request_id="abc-123")
log.info("request_received")
log.info("processing_complete")

Log Rotation

Current Configuration

No Automatic Rotation: Currently, logs in /home/agent/logs/ are NOT automatically rotated.

Manual Rotation (recommended during maintenance window):

# Rotate logs manually
cd /home/agent/logs
for log in *.log; do
  if [ -f "$log" ]; then
    mv "$log" "$log.$(date +%Y%m%d)"
    gzip "$log.$(date +%Y%m%d)"
    touch "$log"
  fi
done

# Clean up old logs (>30 days)
find /home/agent/logs -name "*.log.*.gz" -mtime +30 -delete

Setting Up logrotate

Create logrotate Configuration:

# Create config file
sudo tee /etc/logrotate.d/aegis <<EOF
/home/agent/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0644 agent agent
    sharedscripts
    postrotate
        # Optional: send notification
        /usr/bin/docker exec aegis-dashboard python -c "import structlog; structlog.get_logger().info('logs_rotated')" || true
    endscript
}
EOF

# Test configuration
sudo logrotate -d /etc/logrotate.d/aegis

# Force rotation (testing)
sudo logrotate -f /etc/logrotate.d/aegis

Docker Log Rotation

Configure in docker-compose.yml:

services:
  dashboard:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "5"

Apply Changes:

cd /home/agent/projects/aegis-core
docker compose up -d --force-recreate dashboard

Error Tracking

Error MCP Tools

Aegis includes dedicated error tracking tools via the MCP server.

Record Errors:

# Log error with context
mcp__aegis__error_record(
    error_type="DatabaseConnectionError",
    context="Failed to connect to PostgreSQL during health check",
    strike_count=1,
    severity="warning"
)

Search Past Errors:

# Find similar errors
results = mcp__aegis__error_search(
    query="DatabaseConnectionError"
)

# Returns past occurrences, resolutions, patterns

Record Resolution:

# Document how error was fixed
mcp__aegis__error_resolution(
    error_id="err_123",
    resolution="Restarted PostgreSQL service",
    worked=True,
    lessons_learned="Database had reached max connections, needed to increase max_connections setting"
)

Recent Errors:

# Get recent errors for pattern detection
errors = mcp__aegis__error_recent(limit=50)
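Pattern detection over the recent errors is then a simple frequency count. A sketch, assuming (this shape is not confirmed by the MCP docs) that each returned error is a dict with an `error_type` field:

```python
from collections import Counter

# Hypothetical shape for mcp__aegis__error_recent output: a list of
# dicts carrying at least an "error_type" key (assumed, not confirmed).
recent = [
    {"error_type": "APITimeoutError"},
    {"error_type": "DatabaseConnectionError"},
    {"error_type": "APITimeoutError"},
]

def top_error_types(errors, n=5):
    """Count occurrences of each error_type for pattern detection."""
    return Counter(e["error_type"] for e in errors).most_common(n)

print(top_error_types(recent))
# [('APITimeoutError', 2), ('DatabaseConnectionError', 1)]
```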

Three-Strike Protocol

Strike 1: Log error, retry with modified approach

mcp__aegis__error_record(
    error_type="APITimeoutError",
    context="OpenAI API timeout after 30s",
    strike_count=1
)

Strike 2: Escalate, use local model, search for similar errors

# Search for past resolutions
similar = mcp__aegis__error_search(query="APITimeoutError")

# Record escalation
mcp__aegis__error_record(
    error_type="APITimeoutError",
    context="Second timeout, switching to local model",
    strike_count=2,
    severity="error"
)

Strike 3: STOP, document, alert human

# Record critical error
mcp__aegis__error_record(
    error_type="APITimeoutError",
    context="Third consecutive timeout, halting task",
    strike_count=3,
    severity="critical"
)

# Alert to Discord
mcp__discord__discord_send_message(
    channel_id="1455049130614329508",
    content="🚨 Three-strike protocol triggered: APITimeoutError. Task halted, awaiting human input."
)
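The three strikes above can be sketched as a retry loop. `record_error` below is a plain-Python stand-in for the `mcp__aegis__error_record` tool (MCP tools are not callable from ordinary Python), and `run_with_strikes` is an illustrative wrapper, not part of Aegis:

```python
# Sketch of the three-strike protocol as a retry loop.
def record_error(error_type, context, strike_count, severity="warning"):
    # Stand-in for mcp__aegis__error_record; just prints here.
    print(f"strike {strike_count}: {error_type} ({severity}) - {context}")

def run_with_strikes(task, error_type, max_strikes=3):
    severities = {1: "warning", 2: "error", 3: "critical"}
    for strike in range(1, max_strikes + 1):
        try:
            return task()  # success ends the loop immediately
        except Exception as exc:
            record_error(error_type, str(exc), strike, severities[strike])
    # Strike 3 reached: stop and surface to a human.
    raise RuntimeError(f"{error_type}: halted after {max_strikes} strikes")
```

Usage: `run_with_strikes(call_openai, "APITimeoutError")` retries twice with errors recorded at escalating severity, then raises so the caller can alert a human.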

Log Analysis

Search Logs

Grep for Errors:

# Find all errors in last 24 hours
find /home/agent/logs -name "*.log" -mtime -1 -exec grep -H "ERROR" {} \;

# Count errors by type
grep ERROR /home/agent/logs/*.log | cut -d: -f3 | sort | uniq -c | sort -rn

# Find errors in specific log
grep ERROR /home/agent/logs/market-scheduler.log | tail -20

Analyze Patterns:

# Most common log messages
cat /home/agent/logs/*.log | cut -d: -f2- | sort | uniq -c | sort -rn | head -20

# Errors by hour
grep ERROR /home/agent/logs/*.log | cut -d' ' -f1,2 | cut -c1-13 | uniq -c

# Failed tasks
grep -i "failed\|error\|exception" /home/agent/logs/market-scheduler.log | wc -l

Log Aggregation

Combine Recent Logs:

# All logs from last hour
find /home/agent/logs -name "*.log" -mmin -60 -exec tail -n 100 {} \; > /tmp/recent_logs.txt

# All errors from today
grep ERROR /home/agent/logs/*.log | grep "$(date +%Y-%m-%d)" > /tmp/errors_today.txt

JSON Log Parsing:

# Extract specific fields (if logs are JSON)
grep -h "health_check" /home/agent/logs/*.log | jq -r '.event, .status, .duration_ms'

# Count events by type
cat /home/agent/logs/*.log | jq -r '.event' | sort | uniq -c | sort -rn

Performance Analysis

Slow Operations:

# Find operations >1000ms (-h drops the filename prefix so jq sees clean JSON)
grep -h "duration_ms" /home/agent/logs/*.log | jq 'select(.duration_ms > 1000)'

# Average duration by operation
grep -h "task_completed" /home/agent/logs/*.log | jq '.duration_ms' | awk '{sum+=$1; count++} END {if (count) print sum/count}'
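The same analysis works in stdlib Python when the logs are JSON lines. A sketch (`duration_stats` is an illustrative helper, not part of Aegis):

```python
import json

def duration_stats(lines, slow_ms=1000.0):
    """Return (average duration_ms, slow entries) from JSON log lines."""
    durations, slow = [], []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON lines
        ms = entry.get("duration_ms")
        if isinstance(ms, (int, float)):
            durations.append(ms)
            if ms > slow_ms:
                slow.append(entry)
    avg = sum(durations) / len(durations) if durations else 0.0
    return avg, slow

lines = [
    '{"event": "task_completed", "duration_ms": 100}',
    '{"event": "task_completed", "duration_ms": 2000}',
    "plain text line",
]
print(duration_stats(lines))  # average 1050.0, one slow entry
```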

Real-Time Monitoring

Tail Multiple Logs

Follow All Application Logs:

# Using tail with wildcard
tail -f /home/agent/logs/*.log

# Using multitail (if installed)
multitail /home/agent/logs/*.log

# Filter for errors only
tail -f /home/agent/logs/*.log | grep --line-buffered ERROR

Watch Specific Events

Monitor Health Checks:

# Watch health check results
tail -f /home/agent/logs/*.log | grep --line-buffered "health_check"

Monitor Scheduler Jobs:

# Watch scheduled job execution
docker logs -f aegis-scheduler | grep --line-buffered "Running job"

Log Shipping (Advanced)

Send Logs to External Service

Option 1: Syslog:

# Configure rsyslog to forward to external server
# /etc/rsyslog.d/aegis.conf
*.* @@logs.example.com:514

Option 2: Loki (Grafana):

# docker-compose.yml addition
services:
  promtail:
    image: grafana/promtail:latest
    volumes:
      - /home/agent/logs:/logs
      - ./promtail-config.yml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml

Option 3: CloudWatch Logs:

# Using watchtower (boto3-backed) to ship logs
import logging
import watchtower

# watchtower attaches to the stdlib logging tree; route structlog
# output through stdlib logging and it ships automatically
logging.getLogger().addHandler(watchtower.CloudWatchLogHandler())

Troubleshooting Log Issues

No Logs Appearing

Check Log Permissions:

# Verify log directory permissions
ls -la /home/agent/logs

# Fix permissions if needed
chown -R agent:agent /home/agent/logs
chmod 755 /home/agent/logs
chmod 644 /home/agent/logs/*.log

Check Disk Space:

# Check if disk is full
df -h /home/agent

# Check largest log files
du -sh /home/agent/logs/* | sort -rh | head -10

Log Flooding

Identify Culprit:

# Most recently written logs (likely the flooders)
ls -ltrh /home/agent/logs/*.log | tail -5

# Watch log growth
watch -n 1 'ls -lh /home/agent/logs/*.log | tail -5'

Temporary Mitigation:

# Truncate large log
truncate -s 0 /home/agent/logs/market-scheduler.log

# Or rotate immediately
mv /home/agent/logs/market-scheduler.log /home/agent/logs/market-scheduler.log.$(date +%Y%m%d)
gzip /home/agent/logs/market-scheduler.log.$(date +%Y%m%d)
touch /home/agent/logs/market-scheduler.log

Missing Docker Logs

Check Log Driver:

# Inspect container logging config
docker inspect aegis-dashboard | jq '.[0].HostConfig.LogConfig'

# Check if logs are being written
docker logs aegis-dashboard --since 5m

Fix Lost Logs:

# Restart container to reset logging
cd /home/agent/projects/aegis-core && docker compose restart dashboard

# Or recreate container
cd /home/agent/projects/aegis-core && docker compose up -d --force-recreate dashboard

Best Practices

  1. Use Structured Logging: Always use structlog with JSON format
  2. Include Context: Add task_id, request_id, user_id to all logs
  3. Set Appropriate Levels: Don't log INFO as ERROR
  4. Rotate Regularly: Set up logrotate to prevent disk exhaustion
  5. Monitor Log Volume: Alert if logs grow >100MB/day unexpectedly
  6. Sanitize Logs: Never log API keys, passwords, or PII
  7. Centralize: Consider shipping logs to external service for long-term storage
  8. Index for Search: Use tools like Loki or Elasticsearch for searchability
  9. Retention Policy: Keep logs for at least 30 days, critical logs for 90 days
  10. Regular Review: Check logs during morning routine for anomalies
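Practice 6 (sanitize logs) can be enforced mechanically rather than by convention. A minimal structlog-style processor sketch; `SENSITIVE_KEYS` and `redact` are illustrative names, not part of Aegis:

```python
SENSITIVE_KEYS = {"api_key", "password", "token", "secret", "authorization"}

def redact(event_dict):
    """Mask values whose keys look sensitive. Shaped like the event_dict
    stage of a structlog processor; plug in via
    lambda logger, method, ed: redact(ed)."""
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else value
        for key, value in event_dict.items()
    }

print(redact({"event": "login", "api_key": "sk-123", "user": "alice"}))
```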