Logging¶
Comprehensive guide to Aegis logging infrastructure, log locations, and analysis.
Log Locations¶
Application Logs¶
Directory: /home/agent/logs/
| Log File | Purpose | Rotation | Typical Size |
|---|---|---|---|
| balance-check.log | Wallet balance monitoring | Daily | ~170KB |
| claude-session-worker.log | Claude session management | Daily | ~50KB |
| claude_worker.log | Claude worker tasks | Daily | ~13KB |
| feed-health.log | RSS feed health checks | Daily | ~12KB |
| ideas-channel.log | Discord #ideas monitoring | Daily | ~110KB |
| market-scheduler.log | Market intelligence jobs | Daily | ~760KB |
| memory-consolidation.log | Knowledge graph sync | Daily | ~8KB |
| proactive-feedback.log | Proactive learning feedback | Daily | ~300B |
| proactive-news-scan.log | News aggregation | Daily | ~5KB |
| proactive-pattern-learning.log | Pattern detection | Daily | ~1KB |
| profile-analyzer.log | Profile analysis jobs | Daily | ~100B |
| profile-channel.log | Discord profile analysis | Daily | ~32KB |
| transcript-ingestion.log | Journal/history ingestion | Daily | ~32KB |
Docker Container Logs¶
Access via Docker:
# View container logs
docker logs aegis-dashboard
docker logs aegis-scheduler
docker logs aegis-playwright
docker logs falkordb
# Follow logs in real-time
docker logs -f aegis-dashboard
# Show last N lines
docker logs --tail 100 aegis-dashboard
# Show logs since timestamp
docker logs --since 2026-01-25T10:00:00 aegis-dashboard
# Show logs with timestamps
docker logs -t aegis-dashboard
Log Drivers:
Docker uses the default json-file driver. Logs are stored on the host under /var/lib/docker/containers/&lt;container-id&gt;/&lt;container-id&gt;-json.log.
System Logs¶
Systemd Logs (if running services via systemd):
# View systemd logs
journalctl -u aegis-dashboard.service
# Follow systemd logs
journalctl -u aegis-dashboard.service -f
# Show logs since boot
journalctl -u aegis-dashboard.service -b
Cron Job Logs¶
Crontab Logs:
# View cron logs
grep CRON /var/log/syslog
# Check a specific job's recent output
tail -n 50 /home/agent/logs/transcript-ingestion.log
Active Cron Jobs:
# List current cron jobs
crontab -l
# Example cron log redirect
0 2 * * * cd /home/agent/projects/aegis-core && source .venv/bin/activate && python scripts/ingest_transcripts.py >> /home/agent/logs/transcript-ingestion.log 2>&1
Log Format¶
Structured Logging (structlog)¶
Aegis uses structlog for structured logging throughout the application.
Example Log Entry:
{
  "event": "health_check_completed",
  "timestamp": "2026-01-25T12:34:56.789Z",
  "level": "info",
  "logger": "aegis.scheduler",
  "status": "healthy",
  "duration_ms": 45.3,
  "checks_passed": 5,
  "checks_failed": 0
}
Log Levels:
- DEBUG: Detailed diagnostic information
- INFO: Informational messages (normal operation)
- WARNING: Warning messages (potential issues)
- ERROR: Error messages (failures that don't stop execution)
- CRITICAL: Critical errors (system-threatening issues)
Custom Log Context¶
Adding Context to Logs:
import structlog

logger = structlog.get_logger()

# Log with context
logger.info(
    "task_completed",
    task_id="123",
    duration_ms=1234.5,
    status="success",
)

# Bind context for multiple log calls
log = logger.bind(request_id="abc-123")
log.info("request_received")
log.info("processing_complete")
Log Rotation¶
Current Configuration¶
No Automatic Rotation: Currently, logs in /home/agent/logs/ are NOT automatically rotated.
Manual Rotation (recommended during maintenance window):
# Rotate logs manually
cd /home/agent/logs
for log in *.log; do
    if [ -f "$log" ]; then
        mv "$log" "$log.$(date +%Y%m%d)"
        gzip "$log.$(date +%Y%m%d)"
        touch "$log"
    fi
done
# Clean up old logs (>30 days)
find /home/agent/logs -name "*.log.*.gz" -mtime +30 -delete
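To run the manual loop unattended, a crontab entry can invoke it on a schedule. This is a sketch: the script path below is hypothetical, so save the loop above as a script at that location (or your own) first.

```
# Hypothetical crontab entry: run the rotation loop nightly at 03:00
0 3 * * * /home/agent/scripts/rotate_logs.sh >> /home/agent/logs/rotation.log 2>&1
```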
Setting Up logrotate¶
Create logrotate Configuration:
# Create config file (quote EOF so the heredoc is passed through verbatim)
sudo tee /etc/logrotate.d/aegis <<'EOF'
/home/agent/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 0644 agent agent
    sharedscripts
    postrotate
        # Optional: send notification
        /usr/bin/docker exec aegis-dashboard python -c "import structlog; structlog.get_logger().info('logs_rotated')" || true
    endscript
}
EOF
# Test configuration
sudo logrotate -d /etc/logrotate.d/aegis
# Force rotation (testing)
sudo logrotate -f /etc/logrotate.d/aegis
Docker Log Rotation¶
Configure rotation per service in docker-compose.yml using the logging key (driver plus max-size/max-file options), then apply the change by recreating the affected containers with docker compose up -d.
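A sketch of per-service rotation with the json-file driver; the service name is assumed from the container names above and may differ in the actual compose file:

```yaml
services:
  dashboard:
    logging:
      driver: json-file
      options:
        max-size: "10m"   # rotate once the current log reaches 10 MB
        max-file: "3"     # keep at most 3 rotated files per container
```

Note that these options only take effect for newly created containers, so `docker compose up -d` (which recreates containers whose config changed) is required; a plain restart is not enough.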
Error Tracking¶
Error MCP Tools¶
Aegis includes dedicated error tracking tools via the MCP server.
Record Errors:
# Log error with context
mcp__aegis__error_record(
    error_type="DatabaseConnectionError",
    context="Failed to connect to PostgreSQL during health check",
    strike_count=1,
    severity="warning",
)
Search Past Errors:
# Find similar errors
results = mcp__aegis__error_search(
    query="DatabaseConnectionError"
)
# Returns past occurrences, resolutions, patterns
Record Resolution:
# Document how error was fixed
mcp__aegis__error_resolution(
    error_id="err_123",
    resolution="Restarted PostgreSQL service",
    worked=True,
    lessons_learned="Database had reached max connections, needed to increase max_connections setting",
)
Recent Errors:
Three-Strike Protocol¶
Strike 1: Log error, retry with modified approach
mcp__aegis__error_record(
    error_type="APITimeoutError",
    context="OpenAI API timeout after 30s",
    strike_count=1,
)
Strike 2: Escalate, use local model, search for similar errors
# Search for past resolutions
similar = mcp__aegis__error_search(query="APITimeoutError")

# Record escalation
mcp__aegis__error_record(
    error_type="APITimeoutError",
    context="Second timeout, switching to local model",
    strike_count=2,
    severity="error",
)
Strike 3: STOP, document, alert human
# Record critical error
mcp__aegis__error_record(
    error_type="APITimeoutError",
    context="Third consecutive timeout, halting task",
    strike_count=3,
    severity="critical",
)

# Alert to Discord
mcp__discord__discord_send_message(
    channel_id="1455049130614329508",
    content="🚨 Three-strike protocol triggered: APITimeoutError. Task halted, awaiting human input.",
)
Log Analysis¶
Search Logs¶
Grep for Errors:
# Find all errors in last 24 hours
find /home/agent/logs -name "*.log" -mtime -1 -exec grep -H "ERROR" {} \;
# Count errors by type
grep ERROR /home/agent/logs/*.log | cut -d: -f3 | sort | uniq -c | sort -rn
# Find errors in specific log
grep ERROR /home/agent/logs/market-scheduler.log | tail -20
Analyze Patterns:
# Most common log messages
cat /home/agent/logs/*.log | cut -d: -f2- | sort | uniq -c | sort -rn | head -20
# Errors by hour (-h drops filename prefixes; sort so uniq aggregates across files)
grep -h ERROR /home/agent/logs/*.log | cut -d' ' -f1,2 | cut -c1-13 | sort | uniq -c
# Failed tasks
grep -i "failed\|error\|exception" /home/agent/logs/market-scheduler.log | wc -l
Log Aggregation¶
Combine Recent Logs:
# All logs from last hour
find /home/agent/logs -name "*.log" -mmin -60 -exec tail -n 100 {} \; > /tmp/recent_logs.txt
# All errors from today
grep ERROR /home/agent/logs/*.log | grep "$(date +%Y-%m-%d)" > /tmp/errors_today.txt
JSON Log Parsing:
# Extract specific fields (if logs are JSON; -h drops filename prefixes so jq can parse)
grep -h "health_check" /home/agent/logs/*.log | jq -r '.event, .status, .duration_ms'
# Count events by type (skip non-JSON lines)
cat /home/agent/logs/*.log | jq -r '.event // empty' 2>/dev/null | sort | uniq -c | sort -rn
Performance Analysis¶
Slow Operations:
# Find operations >1000ms
grep "duration_ms" /home/agent/logs/*.log | awk -F'duration_ms[: ]*' '{print $2}' | awk '{if($1>1000) print}'
# Average duration by operation (-h keeps jq input parseable; guard against empty input)
grep -h "task_completed" /home/agent/logs/*.log | jq '.duration_ms' | awk '{sum+=$1; count++} END {if (count) print sum/count}'
Real-Time Monitoring¶
Tail Multiple Logs¶
Follow All Application Logs:
# Using tail with wildcard
tail -f /home/agent/logs/*.log
# Using multitail (if installed)
multitail /home/agent/logs/*.log
# Filter for errors only
tail -f /home/agent/logs/*.log | grep --line-buffered ERROR
Watch Specific Events¶
Monitor Health Checks: filter the live log stream for health_check events.
Monitor Scheduler Jobs: follow market-scheduler.log for job start, finish, and error lines.
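A sketch of both, assuming structured event names as in the examples above (Ctrl-C to stop):

```shell
# Health-check events across all application logs, as they arrive
tail -f /home/agent/logs/*.log | grep --line-buffered "health_check"

# Scheduler job activity (job lifecycle and error lines)
tail -f /home/agent/logs/market-scheduler.log | grep --line-buffered -Ei "job|error"
```

The --line-buffered flag matters in pipelines: without it, grep block-buffers its output and matches can lag far behind the live stream.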
Log Shipping (Advanced)¶
Send Logs to External Service¶
Option 1: Syslog:
# Configure rsyslog to forward to external server
# /etc/rsyslog.d/aegis.conf
*.* @@logs.example.com:514
Option 2: Loki (Grafana):
# docker-compose.yml addition
services:
  promtail:
    image: grafana/promtail:latest
    volumes:
      - /home/agent/logs:/logs
      - ./promtail-config.yml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml
Option 3: CloudWatch Logs:
# Ship logs via watchtower (uses boto3 credentials under the hood)
import logging
import watchtower

# Attach the CloudWatch handler to a stdlib logger; structlog can be
# configured to render through stdlib logging so this handler sees its output
logger = logging.getLogger("aegis")
logger.addHandler(watchtower.CloudWatchLogHandler())
Troubleshooting Log Issues¶
No Logs Appearing¶
Check Log Permissions:
# Verify log directory permissions
ls -la /home/agent/logs
# Fix permissions if needed
chown -R agent:agent /home/agent/logs
chmod 755 /home/agent/logs
chmod 644 /home/agent/logs/*.log
Check Disk Space:
# Check if disk is full
df -h /home/agent
# Check largest log files
du -sh /home/agent/logs/* | sort -rh | head -10
Log Flooding¶
Identify Culprit:
# Most recently modified logs (the likely active writers)
ls -ltrh /home/agent/logs/*.log | tail -5
# Watch log growth
watch -n 1 'ls -lh /home/agent/logs/*.log | tail -5'
Temporary Mitigation:
# Truncate large log
truncate -s 0 /home/agent/logs/market-scheduler.log
# Or rotate immediately
mv /home/agent/logs/market-scheduler.log /home/agent/logs/market-scheduler.log.$(date +%Y%m%d)
gzip /home/agent/logs/market-scheduler.log.$(date +%Y%m%d)
touch /home/agent/logs/market-scheduler.log
Missing Docker Logs¶
Check Log Driver:
# Inspect container logging config
docker inspect aegis-dashboard | jq '.[0].HostConfig.LogConfig'
# Check if logs are being written
docker logs aegis-dashboard --since 5m
Fix Lost Logs:
# Restart container to reset logging
cd /home/agent/projects/aegis-core && docker compose restart dashboard
# Or recreate container
cd /home/agent/projects/aegis-core && docker compose up -d --force-recreate dashboard
Best Practices¶
- Use Structured Logging: Always use structlog with JSON format
- Include Context: Add task_id, request_id, user_id to all logs
- Set Appropriate Levels: Don't log INFO as ERROR
- Rotate Regularly: Set up logrotate to prevent disk exhaustion
- Monitor Log Volume: Alert if logs grow >100MB/day unexpectedly
- Sanitize Logs: Never log API keys, passwords, or PII
- Centralize: Consider shipping logs to external service for long-term storage
- Index for Search: Use tools like Loki or Elasticsearch for searchability
- Retention Policy: Keep logs for at least 30 days, critical logs for 90 days
- Regular Review: Check logs during morning routine for anomalies
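For the sanitization point, a structlog processor can mask sensitive keys before rendering. This is a sketch, not the Aegis implementation: the key list and function name are assumptions, and you would register it via structlog.configure as shown in the trailing comment.

```python
SENSITIVE_KEYS = {"api_key", "password", "token", "secret"}

def redact_sensitive(logger, method_name, event_dict):
    """structlog processor: mask values of sensitive keys before output."""
    for key in event_dict:
        if key in SENSITIVE_KEYS:
            event_dict[key] = "[REDACTED]"
    return event_dict

# Register it ahead of the renderer so redaction happens before output, e.g.:
# structlog.configure(processors=[redact_sensitive, structlog.processors.JSONRenderer()])
```

Because a processor is just a function of (logger, method_name, event_dict), it can be unit-tested without any logging setup.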
Related Documentation¶
- Monitoring - Health checks and alerting
- Troubleshooting - Common issues and solutions
- Maintenance - Routine maintenance tasks