Database Architecture¶
Overview¶
Aegis uses PostgreSQL 16 with the pgvector extension as its primary relational database, providing structured storage, vector similarity search, and full-text search capabilities.
Database Stack¶
PostgreSQL 16¶
Installation: Native (apt package, not containerized)
Host: localhost (LXC host)
Port: 5432
Database: aegis
User: agent
Version Info:
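A quick way to confirm the running server version and installed extensions (sketch):
-- Server version and installed extensions
SELECT version();
SELECT extname, extversion FROM pg_extension ORDER BY extname;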
Why Not Containerized?
- Better performance (direct filesystem access)
- Easier backups (native pg_dump)
- Simpler resource management
- Persistent across container restarts
pgvector Extension¶
Version: 0.5.1+
Purpose: Vector similarity search for embeddings
Installation:
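A typical installation on a Debian/Ubuntu host; the package name postgresql-16-pgvector is the usual PGDG name and is assumed here:
# Install the packaged extension for PostgreSQL 16, then enable it in the database
sudo apt install postgresql-16-pgvector
sudo -u postgres psql -d aegis -c 'CREATE EXTENSION IF NOT EXISTS vector;'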
Capabilities:
- Store embeddings as vector(N) type
- Cosine similarity search (<=> operator)
- Euclidean distance search (<-> operator)
- Inner product search (<#> operator)
- HNSW indexing for fast approximate nearest neighbor search
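A minimal demonstration of the three distance operators (values are illustrative):
-- Cosine distance, Euclidean (L2) distance, and negative inner product
SELECT '[1,2,3]'::vector <=> '[2,3,4]'::vector AS cosine_distance,
       '[1,2,3]'::vector <-> '[2,3,4]'::vector AS l2_distance,
       '[1,2,3]'::vector <#> '[2,3,4]'::vector AS neg_inner_product;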
Database Size¶
| Metric | Value |
|---|---|
| Total Database Size | ~150 MB |
| Largest Table | container_metrics (58 MB) |
| Total Rows | ~50K across all tables |
| Indexes | ~80 indexes |
| Extensions | pgvector, uuid-ossp, pg_stat_statements |
Schema Organization¶
Memory Tables¶
episodic_memory¶
Purpose: Event logs, interactions, experiences with temporal ordering
Schema:
CREATE TABLE episodic_memory (
id SERIAL PRIMARY KEY,
timestamp TIMESTAMPTZ DEFAULT NOW(),
event_type VARCHAR(100) NOT NULL,
summary TEXT NOT NULL,
details JSONB,
context JSONB,
importance memory_importance DEFAULT 'medium',
embedding vector(1536),
tags TEXT[],
related_memories INTEGER[],
created_at TIMESTAMPTZ DEFAULT NOW(),
decision TEXT,
metadata JSONB,
graphiti_synced_at TIMESTAMPTZ
);
Indexes:
- Primary key: id
- HNSW vector index: embedding vector_cosine_ops (for semantic search)
- B-tree: timestamp DESC (recent events)
- B-tree: event_type (filter by type)
- B-tree: importance (priority queries)
- GIN: tags (tag-based search)
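A sketch of the DDL behind these indexes; idx_episodic_embedding matches the index name shown in the query plan later on this page, while the other names are illustrative:
-- Semantic search (HNSW, cosine distance)
CREATE INDEX idx_episodic_embedding ON episodic_memory USING hnsw (embedding vector_cosine_ops);
-- Time, type, and importance filters
CREATE INDEX idx_episodic_timestamp ON episodic_memory (timestamp DESC);
CREATE INDEX idx_episodic_event_type ON episodic_memory (event_type);
CREATE INDEX idx_episodic_importance ON episodic_memory (importance);
-- Tag search
CREATE INDEX idx_episodic_tags ON episodic_memory USING gin (tags);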
Row Count: ~18,682 records
Disk Size: ~10 MB
Common Event Types:
- task_complete: Task completion events
- decision: Decision tracking (linked to decision_outcomes)
- error: Error occurrences
- learning: Lessons learned
- milestone: Significant achievements
Importance Enum:
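memory_importance is a custom enum type; only the 'medium' default is confirmed by the schema above, so the remaining labels in this sketch are assumed:
-- Assumed labels apart from the 'medium' default
CREATE TYPE memory_importance AS ENUM ('low', 'medium', 'high', 'critical');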
semantic_memory¶
Purpose: Factual knowledge, concepts, learnings
Schema:
CREATE TABLE semantic_memory (
id SERIAL PRIMARY KEY,
concept VARCHAR(200) NOT NULL,
content TEXT NOT NULL,
category VARCHAR(100),
source VARCHAR(200),
confidence FLOAT DEFAULT 1.0,
metadata JSONB,
tags TEXT[],
embedding vector(1536),
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
Indexes:
- Primary key: id
- HNSW vector index: embedding
- B-tree: concept (exact lookup)
- B-tree: category (categorical queries)
- B-tree: updated_at DESC (recent updates)
Row Count: ~500 records
Disk Size: ~240 KB
decision_outcomes¶
Purpose: Track decisions with alternatives and outcomes
Schema:
CREATE TABLE decision_outcomes (
id SERIAL PRIMARY KEY,
decision_id INTEGER REFERENCES episodic_memory(id),
task_context TEXT NOT NULL,
alternatives JSONB NOT NULL,
chosen_option TEXT NOT NULL,
reasoning TEXT NOT NULL,
outcome_status VARCHAR(20) DEFAULT 'pending',
outcome_details JSONB,
lessons_learned TEXT,
user_feedback VARCHAR(20),
created_at TIMESTAMPTZ DEFAULT NOW(),
resolved_at TIMESTAMPTZ
);
Outcome Status Values: pending, success, partial, failure
User Feedback Values: positive, negative, neutral
API & Monetization Tables¶
api_keys¶
Purpose: API key management for customer access
Schema:
CREATE TABLE api_keys (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
customer_id VARCHAR(100) NOT NULL,
key_hash TEXT NOT NULL UNIQUE,
key_prefix VARCHAR(20) NOT NULL,
name VARCHAR(200),
tier VARCHAR(50) DEFAULT 'free',
rate_limit INTEGER DEFAULT 60,
monthly_quota INTEGER DEFAULT 1000,
enabled BOOLEAN DEFAULT TRUE,
expires_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
last_used_at TIMESTAMPTZ
);
Indexes:
- Primary key: id
- Unique: key_hash (fast API key lookup)
- B-tree: customer_id (customer queries)
- B-tree: enabled (active keys only)
Disk Size: ~2.3 MB
Tiers: free, developer, pro, enterprise
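A sketch of the lookup an authenticating request performs against this table; how key_hash is derived from the raw key happens in the application and is not specified here:
-- Resolve an incoming key by its precomputed hash ($1); only active, unexpired keys match
SELECT id, customer_id, tier, rate_limit, monthly_quota
FROM api_keys
WHERE key_hash = $1
  AND enabled
  AND (expires_at IS NULL OR expires_at > NOW());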
api_usage_events¶
Purpose: Track API usage for billing and rate limiting
Schema:
CREATE TABLE api_usage_events (
id BIGSERIAL PRIMARY KEY,
api_key_id UUID REFERENCES api_keys(id),
customer_id VARCHAR(100) NOT NULL,
endpoint VARCHAR(200) NOT NULL,
method VARCHAR(10) NOT NULL,
status_code INTEGER NOT NULL,
response_time_ms INTEGER,
tokens_used INTEGER DEFAULT 0,
cost_usd DECIMAL(10, 6) DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT NOW()
);
Partitioning: By month (future optimization)
Retention: 13 months (rolling)
Indexes:
- Primary key: id
- B-tree: api_key_id (per-key metrics)
- B-tree: customer_id (per-customer billing)
- B-tree: created_at DESC (time-series queries)
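A sketch of a per-customer rollup for the current month, using only columns defined above:
-- Monthly request count, token usage, and cost per customer
SELECT customer_id,
       COUNT(*)         AS requests,
       SUM(tokens_used) AS tokens,
       SUM(cost_usd)    AS cost_usd
FROM api_usage_events
WHERE created_at >= date_trunc('month', NOW())
GROUP BY customer_id
ORDER BY cost_usd DESC;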
billing_events¶
Purpose: Track billable events and charges
Schema:
CREATE TABLE billing_events (
id BIGSERIAL PRIMARY KEY,
customer_id VARCHAR(100) NOT NULL,
event_type VARCHAR(50) NOT NULL,
amount_usd DECIMAL(10, 2) NOT NULL,
currency VARCHAR(3) DEFAULT 'USD',
description TEXT,
metadata JSONB,
stripe_invoice_id VARCHAR(100),
created_at TIMESTAMPTZ DEFAULT NOW()
);
Event Types: api_usage, subscription, overage, credit
System Tables¶
claude_sessions¶
Purpose: Track Claude Code sessions and commands
Schema:
CREATE TABLE claude_sessions (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
session_id VARCHAR(100) UNIQUE NOT NULL,
user_id VARCHAR(100),
command TEXT,
status VARCHAR(20) DEFAULT 'running',
started_at TIMESTAMPTZ DEFAULT NOW(),
ended_at TIMESTAMPTZ,
metadata JSONB
);
Disk Size: ~232 KB
container_metrics¶
Purpose: Docker container resource usage over time
Schema:
CREATE TABLE container_metrics (
id BIGSERIAL PRIMARY KEY,
container_name VARCHAR(100) NOT NULL,
cpu_percent FLOAT,
mem_usage_mb FLOAT,
mem_percent FLOAT,
net_io_mb FLOAT,
block_io_mb FLOAT,
timestamp TIMESTAMPTZ DEFAULT NOW()
);
Disk Size: ~58 MB (largest table)
Row Count: ~500K records
Retention: 30 days (rolling)
Indexes:
- Primary key: id
- B-tree: container_name, timestamp DESC (time-series per container)
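A sketch of the 30-day rolling retention cleanup (how and where it is scheduled is not shown here):
-- Drop metrics older than the retention window
DELETE FROM container_metrics
WHERE timestamp < NOW() - INTERVAL '30 days';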
infra_anomaly_alerts¶
Purpose: Infrastructure anomaly detection alerts; tracks unusual system behavior (CPU spikes, memory leaks, disk issues)
Disk Size: ~11 MB
Workflow Tables¶
workflows¶
Purpose: Store workflow definitions and execution state
Schema:
CREATE TABLE workflows (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
name VARCHAR(100) NOT NULL,
graph JSONB NOT NULL,
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
workflow_executions¶
Purpose: Track workflow execution history
Schema:
CREATE TABLE workflow_executions (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
workflow_id UUID REFERENCES workflows(id),
context JSONB NOT NULL,
current_node VARCHAR(100),
status VARCHAR(20) DEFAULT 'running',
error TEXT,
started_at TIMESTAMPTZ DEFAULT NOW(),
completed_at TIMESTAMPTZ
);
Status Values: running, completed, failed, interrupted
Research & Intel Tables¶
deep_briefing_jobs¶
Purpose: Track Deep Research jobs (Gemini/Perplexity integration)
Schema:
CREATE TABLE deep_briefing_jobs (
id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
query TEXT NOT NULL,
status VARCHAR(20) DEFAULT 'pending',
result JSONB,
provider VARCHAR(50),
cost_usd DECIMAL(10, 4),
created_at TIMESTAMPTZ DEFAULT NOW(),
completed_at TIMESTAMPTZ
);
Providers: gemini, perplexity, claude
competitors¶
Purpose: Track competitor websites for monitoring
Schema:
CREATE TABLE competitors (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
domain VARCHAR(200) NOT NULL,
category VARCHAR(100),
last_checked_at TIMESTAMPTZ,
enabled BOOLEAN DEFAULT TRUE,
created_at TIMESTAMPTZ DEFAULT NOW()
);
competitor_pages¶
Purpose: Track specific competitor pages (pricing, features, etc.)
Schema:
CREATE TABLE competitor_pages (
id SERIAL PRIMARY KEY,
competitor_id INTEGER REFERENCES competitors(id),
url TEXT NOT NULL,
page_type VARCHAR(50),
content_hash VARCHAR(64),
screenshot_url TEXT,
last_checked_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW()
);
Page Types: pricing, features, homepage, docs, blog
competitor_changes¶
Purpose: Track detected changes to competitor pages
Schema:
CREATE TABLE competitor_changes (
id SERIAL PRIMARY KEY,
page_id INTEGER REFERENCES competitor_pages(id),
change_type VARCHAR(50),
old_value TEXT,
new_value TEXT,
diff TEXT,
detected_at TIMESTAMPTZ DEFAULT NOW()
);
Performance Optimization¶
Indexing Strategy¶
HNSW Vector Indexes:
- Used for approximate nearest neighbor search
- 10x faster than a sequential scan for similarity queries
- Trade-off: approximate results (typically 90%+ recall) rather than the 100% recall of an exact scan; IVFFlat is the alternative pgvector index type
B-tree Indexes:
- Primary keys and unique constraints (created automatically); foreign key columns are indexed explicitly
- Timestamp columns (time-series queries)
- Frequently filtered columns (event_type, customer_id)
GIN Indexes:
- Array columns (tags)
- JSONB columns (when querying specific keys)
Partial Indexes (future optimization):
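An illustrative example of the kind of partial index planned, restricting the api_keys lookup index to keys that can still authenticate:
-- Hypothetical partial index: smaller and faster than also indexing disabled keys
CREATE INDEX idx_api_keys_active ON api_keys (key_hash) WHERE enabled;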
Query Optimization¶
Explain Analyze Example:
EXPLAIN ANALYZE
SELECT * FROM episodic_memory
WHERE embedding <=> '[0.1, 0.2, ...]'::vector < 0.5
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;
Typical Plan:
Limit (cost=10.23..12.45 rows=10 width=...)
-> Index Scan using idx_episodic_embedding on episodic_memory
Order By: embedding <=> '...'::vector
Filter: (embedding <=> '...'::vector) < 0.5
Slow Query Logging:
-- Enable slow query log (queries > 1 second)
ALTER DATABASE aegis SET log_min_duration_statement = 1000;
Connection Pooling¶
pgBouncer: Not currently used
Application-Level Pooling: psycopg2 connection pool
- Min connections: 2
- Max connections: 10
- Idle timeout: 300 seconds
Configuration (Python):
from psycopg2 import pool
db_pool = pool.ThreadedConnectionPool(
minconn=2,
maxconn=10,
host='localhost',
port=5432,
database='aegis',
user='agent',
password='agent'
)
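Typical usage of the pool (sketch; error handling kept minimal):
conn = db_pool.getconn()
try:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM episodic_memory;")
        print(cur.fetchone()[0])
    conn.commit()
finally:
    # Always return the connection to the pool
    db_pool.putconn(conn)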
Backup Strategy¶
Nightly Backups¶
Schedule: Daily at 02:00 UTC
Script (~/scripts/backup_db.sh):
#!/bin/bash
DATE=$(date +%Y-%m-%d)
BACKUP_DIR="/home/agent/backups/postgres"
mkdir -p "$BACKUP_DIR"
# Compressed custom-format dump (includes large objects)
pg_dump -h localhost -U agent -d aegis \
  -F c -b -v -f "$BACKUP_DIR/aegis_$DATE.dump"
# Retain last 14 days of local backups
find "$BACKUP_DIR" -name "aegis_*.dump" -mtime +14 -delete
Backup Format: Custom format (compressed)
Typical Size: ~30 MB compressed
Retention: 14 days local, 90 days remote (Hetzner Storage Box)
Point-in-Time Recovery¶
WAL Archiving: Not currently enabled
Future Enhancement: Continuous WAL archiving to S3-compatible storage
Restore Procedure¶
Full Restore:
# Drop and recreate database
dropdb -h localhost -U agent aegis
createdb -h localhost -U agent aegis
# Restore from backup
pg_restore -h localhost -U agent -d aegis \
/home/agent/backups/postgres/aegis_2024-01-25.dump
# Verify
psql -h localhost -U agent -d aegis -c "SELECT COUNT(*) FROM episodic_memory;"
Selective Restore:
# Restore single table
pg_restore -h localhost -U agent -d aegis \
-t episodic_memory \
/home/agent/backups/postgres/aegis_2024-01-25.dump
Security¶
Authentication¶
Method: Peer authentication on the local Unix socket; md5 password authentication on loopback TCP. No remote connections are accepted.
pg_hba.conf:
# TYPE DATABASE USER ADDRESS METHOD
local all agent peer
host all agent 127.0.0.1/32 md5
host all agent ::1/128 md5
Why No Remote Access?
- Database on same host as application
- Reduced attack surface
- Better performance (no network overhead)
Encryption¶
At Rest: Not enabled at the database level (covered by LXC filesystem encryption on the host)
In Transit: Not applicable (local-only connections)
Future: Enable pgcrypto for sensitive column encryption
Permissions¶
Database Owner: agent
Application User: agent (same user, simplified security model)
Schema Permissions:
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO agent;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO agent;
Audit Logging¶
pg_stat_statements Extension:
CREATE EXTENSION pg_stat_statements;
-- View top 10 slowest queries
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
Maintenance¶
Vacuum and Analyze¶
Auto Vacuum: Enabled (default PostgreSQL settings)
Manual Vacuum (weekly):
Full Vacuum (monthly):
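Standard invocations for both (sketch; VACUUM FULL takes ACCESS EXCLUSIVE locks, so it belongs in a maintenance window):
-- Weekly: reclaim dead tuples and refresh planner statistics
VACUUM ANALYZE;
-- Monthly: rewrite tables to return space to the OS
VACUUM FULL;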
Reindexing¶
Monthly Reindex (to reduce index bloat):
Concurrent Reindex (zero downtime):
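Standard forms for both (sketch; REINDEX ... CONCURRENTLY requires PostgreSQL 12 or later):
-- Monthly: rebuild all indexes in the database (locks each table while its indexes rebuild)
REINDEX DATABASE aegis;
-- Zero-downtime alternative for a single table
REINDEX TABLE CONCURRENTLY episodic_memory;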
Statistics Updates¶
Analyze After Bulk Inserts:
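For example (table name illustrative):
-- Refresh planner statistics after a large batch load
ANALYZE episodic_memory;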
Migration Management¶
Schema Versioning¶
Tool: Alembic (SQLAlchemy migrations)
Location: /home/agent/projects/aegis-core/alembic/
Current Version: Check with:
Create Migration:
Apply Migration:
Rollback Migration:
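The standard Alembic invocations for the steps above, run from the project directory (the revision message is illustrative):
cd /home/agent/projects/aegis-core
alembic current                                   # show current schema version
alembic revision --autogenerate -m "add table"    # create a migration
alembic upgrade head                              # apply pending migrations
alembic downgrade -1                              # roll back one revision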
Manual Schema Changes¶
Always Use Migrations: Never apply manual DDL in production
Emergency Rollback Procedure:
# Restore from backup
pg_restore -h localhost -U agent -d aegis \
--clean --if-exists \
/home/agent/backups/postgres/aegis_BEFORE_MIGRATION.dump
Monitoring¶
Key Metrics¶
Database Size:
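For example:
-- Total on-disk size of the aegis database
SELECT pg_size_pretty(pg_database_size('aegis'));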
Table Sizes:
SELECT schemaname, tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;
Active Connections:
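For example:
-- Connections currently open against aegis, grouped by state
SELECT state, COUNT(*)
FROM pg_stat_activity
WHERE datname = 'aegis'
GROUP BY state;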
Cache Hit Ratio:
SELECT
sum(heap_blks_hit) / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;
Target: > 0.99 (99% cache hit rate)
Alerts¶
Trigger Conditions:
- Connection count > 80% of max_connections
- Cache hit ratio < 95%
- Slow queries > 5 seconds
- Database size > 80% of allocated storage
- Replication lag > 1 minute (when replication enabled)
Future Database Improvements¶
Q1 2026: Performance¶
- Enable connection pooling (pgBouncer)
- Implement table partitioning (api_usage_events by month)
- Add materialized views for common aggregations
- Optimize vector indexes (tune HNSW parameters)
Q2 2026: High Availability¶
- Set up streaming replication (read replica)
- Enable WAL archiving (point-in-time recovery)
- Implement automatic failover (Patroni)
- Multi-region replication (future)
Q3 2026: Advanced Features¶
- Full-text search with pg_search
- Time-series optimization (TimescaleDB extension)
- Graph queries (Apache AGE extension)
- Column-level encryption (pgcrypto)
Q4 2026: Data Warehouse¶
- Separate OLAP database (ClickHouse/Snowflake)
- ETL pipeline for analytics
- Data lake (S3 + Parquet)
- Business intelligence dashboards