
StackWiz MCP Server

Overview

StackWiz is Aegis's custom MCP server for deploying Docker services to the Dockerhost (10.10.10.10) with automatic Traefik routing, DNS management, and health monitoring. It provides a declarative interface for managing the production stack at rbnk.uk.

Tool Prefix: mcp__stackwiz__

Domain: rbnk.uk

Infrastructure:
- Dockerhost: 10.10.10.10
- Traefik: reverse proxy with automatic HTTPS
- DNS: Cloudflare API integration

Core Concepts

Stacks

A stack is a deployed service with:
- Name: unique identifier (e.g., "aegis-dashboard")
- Image: Docker image (e.g., "aegis/dashboard:latest")
- Port: internal port (e.g., 8080)
- Domain: subdomain under rbnk.uk (e.g., "aegis.rbnk.uk")
- Environment: environment variables
- Volumes: persistent storage
- Networks: Docker networks
- Health Check: HTTP endpoint for monitoring
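To make these fields concrete, here is a minimal client-side model of a stack definition. It is a sketch that assumes you want to validate a spec before calling create_stack; the StackSpec class and its validation rules are hypothetical and not part of StackWiz.

```python
from dataclasses import dataclass, field
from typing import Optional

ZONE = "rbnk.uk"  # the Cloudflare zone StackWiz manages


@dataclass
class StackSpec:
    """Hypothetical client-side model of a stack (not StackWiz's own type)."""

    name: str
    image: str
    port: int
    domain: Optional[str] = None
    environment: dict = field(default_factory=dict)
    volumes: list = field(default_factory=list)
    networks: list = field(default_factory=list)
    health_check: Optional[str] = None

    def __post_init__(self) -> None:
        # The internal port must be a valid TCP port
        if not 0 < self.port < 65536:
            raise ValueError(f"invalid port: {self.port}")
        # Public domains must live under the managed zone
        if self.domain is not None and not self.domain.endswith("." + ZONE):
            raise ValueError(f"{self.domain} is not under {ZONE}")
        # Health checks are HTTP paths such as "/health"
        if self.health_check is not None and not self.health_check.startswith("/"):
            raise ValueError("health_check must be a path starting with '/'")
```

Because the fields mirror create_stack's parameters, a validated spec can be passed on with `mcp__stackwiz__create_stack(**dataclasses.asdict(spec))`.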

Routing

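StackWiz adds Traefik labels to each container it creates, and Traefik routes requests based on those labels. For a stack serving aegis.rbnk.uk on port 8080: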

labels:
  - "traefik.enable=true"
  - "traefik.http.routers.aegis.rule=Host(`aegis.rbnk.uk`)"
  - "traefik.http.routers.aegis.tls.certresolver=letsencrypt"
  - "traefik.http.services.aegis.loadbalancer.server.port=8080"
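The label set follows a fixed pattern, so it can be generated from three values. This sketch assumes StackWiz names the router after the stack and that the ACME resolver in Traefik's static configuration is called letsencrypt; traefik_labels is a hypothetical helper.

```python
def traefik_labels(router: str, domain: str, port: int,
                   certresolver: str = "letsencrypt") -> list:
    """Build Docker labels for one HTTPS router/service pair (hypothetical helper)."""
    return [
        "traefik.enable=true",
        # Route requests whose Host header matches the domain
        f"traefik.http.routers.{router}.rule=Host(`{domain}`)",
        # Obtain the TLS certificate from the named ACME resolver
        f"traefik.http.routers.{router}.tls.certresolver={certresolver}",
        # Forward matched requests to the container's internal port
        f"traefik.http.services.{router}.loadbalancer.server.port={port}",
    ]
```

Calling `traefik_labels("aegis", "aegis.rbnk.uk", 8080)` reproduces the four labels above.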

DNS Management

StackWiz automatically creates or updates Cloudflare A records pointing to the Dockerhost (10.10.10.10).
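Creating or updating a record amounts to an upsert against the Cloudflare v4 API. The sketch below only plans the call and sends nothing; it assumes one A record per name and illustrates the idea rather than StackWiz's actual implementation. plan_a_record is a hypothetical helper, and ttl=1 means "automatic" in Cloudflare's API.

```python
from typing import Optional

API = "https://api.cloudflare.com/client/v4"


def plan_a_record(zone_id: str, fqdn: str, ip: str,
                  existing: list) -> Optional[tuple]:
    """Decide which Cloudflare call an A-record upsert needs.

    `existing` is the `result` list returned by
    GET {API}/zones/{zone_id}/dns_records?type=A&name={fqdn}.
    Returns (method, url, json_body), or None if nothing needs to change.
    """
    body = {"type": "A", "name": fqdn, "content": ip, "ttl": 1, "proxied": False}
    if not existing:
        # No record yet: create one
        return ("POST", f"{API}/zones/{zone_id}/dns_records", body)
    record = existing[0]
    if record.get("content") == ip:
        return None  # already points at the right address
    # Record exists but points elsewhere: overwrite it in place
    return ("PUT", f"{API}/zones/{zone_id}/dns_records/{record['id']}", body)
```

The returned tuple can then be sent with any HTTP client, using an `Authorization: Bearer <CLOUDFLARE_API_TOKEN>` header.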

Operations

create_stack

mcp__stackwiz__create_stack(
    name: str,
    image: str,
    port: int,
    domain: str = None,
    environment: dict = None,
    volumes: list = None,
    networks: list = None,
    health_check: str = None,
    restart_policy: str = "unless-stopped"
)

Create and deploy a new service stack.

Example:

import os

await mcp__stackwiz__create_stack(
    name="aegis-api",
    image="aegis/api:latest",
    port=8000,
    domain="api.aegis.rbnk.uk",
    environment={
        # Read the password at deploy time instead of hardcoding it
        "DATABASE_URL": f"postgresql://aegis:{os.environ['DB_PASSWORD']}@postgres:5432/aegis",
        "REDIS_URL": "redis://redis:6379/0",
        "ENV": "production"
    },
    volumes=[
        "/srv/aegis/data:/data",
        "/srv/aegis/logs:/logs"
    ],
    networks=["aegis-net"],
    health_check="/health"
)

What Happens:
1. Pulls the image to the Dockerhost
2. Creates the container with Traefik labels
3. Starts the container
4. Creates a DNS A record (if a domain is provided)
5. Waits for the health check to pass (if one is provided)
6. Returns the deployment status
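The six steps can be sketched as an async pipeline. The step callables are hypothetical stand-ins for whatever StackWiz does internally; only the ordering and the bounded health-check wait come from the list above.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Step = Callable[[], Awaitable[None]]


async def deploy(pull: Step, create: Step, start: Step,
                 create_dns: Optional[Step] = None,
                 is_healthy: Optional[Callable[[], Awaitable[bool]]] = None,
                 timeout: float = 60.0, interval: float = 2.0) -> dict:
    """Run the create_stack steps in order (all step callables are hypothetical)."""
    await pull()    # 1. pull the image to the Dockerhost
    await create()  # 2. create the container with Traefik labels
    await start()   # 3. start the container
    if create_dns is not None:  # 4. DNS A record, only when a domain is set
        await create_dns()
    health = "unknown"
    if is_healthy is not None:  # 5. poll the health check until it passes or times out
        loop = asyncio.get_running_loop()
        deadline = loop.time() + timeout
        health = "unhealthy"
        while loop.time() < deadline:
            if await is_healthy():
                health = "healthy"
                break
            await asyncio.sleep(interval)
    return {"status": "running", "health": health}  # 6. report (simplified)
```

Bounding the wait with a deadline keeps a broken health endpoint from hanging the deployment indefinitely.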

Returns:

{
  "success": true,
  "stack_name": "aegis-api",
  "container_id": "abc123...",
  "url": "https://api.aegis.rbnk.uk",
  "status": "running",
  "health": "healthy"
}

update_stack

mcp__stackwiz__update_stack(
    name: str,
    image: str = None,
    environment: dict = None,
    volumes: list = None,
    recreate: bool = True
)

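Update an existing stack. If recreate=True, StackWiz stops the old container and starts a new one; Traefik discovers the replacement through its labels, so no routing change is needed. Because the old container stops before the new one starts, expect a brief interruption; use the Blue-Green Deployment pattern below when that matters.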

Example:

# Update image
await mcp__stackwiz__update_stack(
    name="aegis-api",
    image="aegis/api:v2.0.0",
    recreate=True
)

# Update environment
await mcp__stackwiz__update_stack(
    name="aegis-api",
    environment={"LOG_LEVEL": "debug"},
    recreate=True
)

delete_stack

mcp__stackwiz__delete_stack(
    name: str,
    remove_volumes: bool = False,
    remove_dns: bool = True
)

Delete a stack and optionally remove associated resources.

Example:

await mcp__stackwiz__delete_stack(
    name="old-service",
    remove_volumes=True,  # Delete persistent data
    remove_dns=True       # Remove DNS record
)

list_stacks

mcp__stackwiz__list_stacks(
    filter_status: str = None,  # "running", "stopped", "paused"
    filter_network: str = None
)

List all deployed stacks with status.

Returns:

{
  "stacks": [
    {
      "name": "aegis-dashboard",
      "image": "aegis/dashboard:latest",
      "status": "running",
      "url": "https://aegis.rbnk.uk",
      "uptime": "15d 6h 23m",
      "health": "healthy"
    },
    {
      "name": "aegis-api",
      "image": "aegis/api:v2.0.0",
      "status": "running",
      "url": "https://api.aegis.rbnk.uk",
      "uptime": "2h 15m",
      "health": "healthy"
    }
  ]
}
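uptime is returned as a human-readable string. A small parser makes it usable for checks such as spotting stacks that restarted recently. This is a sketch that assumes the d/h/m/s format shown above; both helpers are hypothetical.

```python
import re

_UNITS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def uptime_seconds(uptime: str) -> int:
    """Convert an uptime string such as "15d 6h 23m" into seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)([dhms])", uptime):
        total += int(value) * _UNITS[unit]
    return total


def recently_restarted(stacks: list, max_age_s: int = 3600) -> list:
    """Names of stacks whose uptime is below max_age_s (possible restart loops)."""
    return [s["name"] for s in stacks if uptime_seconds(s["uptime"]) < max_age_s]
```

With the example response, `recently_restarted(result["stacks"], max_age_s=3 * 3600)` would return `["aegis-api"]`.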

get_stack_status

mcp__stackwiz__get_stack_status(name: str)

Get detailed status for a specific stack.

Returns:

{
  "name": "aegis-dashboard",
  "status": "running",
  "health": "healthy",
  "container_id": "abc123...",
  "image": "aegis/dashboard:latest",
  "created_at": "2026-01-10T10:00:00Z",
  "uptime": "15d 6h 23m",
  "url": "https://aegis.rbnk.uk",
  "ports": ["8080:8080"],
  "networks": ["aegis-net"],
  "volumes": ["/srv/aegis/data:/data"],
  "environment": {
    "ENV": "production"
  },
  "resource_usage": {
    "cpu_percent": 2.5,
    "memory_usage": "256MB",
    "memory_limit": "512MB"
  }
}
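resource_usage reports memory as strings. Assuming sizes such as "256MB" use binary multiples (StackWiz does not document this), usage can be expressed as a percentage of the limit with two hypothetical helpers:

```python
import re

_FACTORS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Parse a size string like "256MB" (binary multiples assumed)."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", size, re.IGNORECASE)
    if not m:
        raise ValueError(f"unrecognised size: {size!r}")
    return int(float(m.group(1)) * _FACTORS[m.group(2).upper()])


def memory_percent(status: dict) -> float:
    """Memory usage as a percentage of the container's limit."""
    usage = status["resource_usage"]
    return 100.0 * to_bytes(usage["memory_usage"]) / to_bytes(usage["memory_limit"])
```

For the response above, memory_percent returns 50.0.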

restart_stack

mcp__stackwiz__restart_stack(name: str)

Restart a stack (graceful shutdown then start).

get_stack_logs

mcp__stackwiz__get_stack_logs(
    name: str,
    tail: int = 100,
    since: str = None,
    follow: bool = False
)

Get logs for a stack.

Example:

# Get last 100 lines
logs = await mcp__stackwiz__get_stack_logs("aegis-dashboard", tail=100)

# Get logs since timestamp
logs = await mcp__stackwiz__get_stack_logs(
    "aegis-api",
    since="2026-01-25T10:00:00Z"
)
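since takes a UTC timestamp in the format used above. A hypothetical helper produces one for "the last N minutes":

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def since_minutes_ago(minutes: int, now: Optional[datetime] = None) -> str:
    """Return a UTC timestamp like "2026-01-25T10:00:00Z" for `minutes` ago."""
    now = now or datetime.now(timezone.utc)
    then = now.astimezone(timezone.utc) - timedelta(minutes=minutes)
    return then.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Usage: `logs = await mcp__stackwiz__get_stack_logs("aegis-api", since=since_minutes_ago(30))`.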

DNS Management

create_dns_record

mcp__stackwiz__create_dns_record(
    subdomain: str,
    ip: str = "10.10.10.10",  # Defaults to Dockerhost
    type: str = "A",
    proxied: bool = False
)

Create a DNS record in Cloudflare.

Example:

# Create A record
await mcp__stackwiz__create_dns_record(
    subdomain="new-service",
    ip="10.10.10.10"
)
# Result: new-service.rbnk.uk → 10.10.10.10
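create_stack takes a full domain while the DNS tools take a subdomain. A hypothetical helper converts between the two, assuming the DNS tools accept multi-label subdomains such as "api.aegis":

```python
ZONE = "rbnk.uk"


def subdomain_for(domain: str, zone: str = ZONE) -> str:
    """Strip the zone suffix: "api.aegis.rbnk.uk" -> "api.aegis"."""
    suffix = "." + zone
    if not domain.endswith(suffix):
        raise ValueError(f"{domain} is not under {zone}")
    return domain[: -len(suffix)]
```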

update_dns_record

mcp__stackwiz__update_dns_record(
    subdomain: str,
    ip: str = None,
    proxied: bool = None
)

Update an existing DNS record.

delete_dns_record

mcp__stackwiz__delete_dns_record(subdomain: str)

Delete a DNS record.

list_dns_records

mcp__stackwiz__list_dns_records(filter: str = None)

List all DNS records for rbnk.uk.

Health Monitoring

health_check

mcp__stackwiz__health_check(
    name: str,
    endpoint: str = "/health"
)

Performs an HTTP health check against a stack's endpoint (default /health).

Returns:

{
  "stack_name": "aegis-dashboard",
  "healthy": true,
  "status_code": 200,
  "response_time_ms": 45,
  "checked_at": "2026-01-25T12:00:00Z"
}
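When debugging, it can help to reproduce the probe independently. This standard-library sketch returns a dict shaped like the one above; it treats any 2xx response as healthy, which is an assumption rather than StackWiz's documented rule, and check_health is a hypothetical helper.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def check_health(url: str, timeout: float = 5.0) -> dict:
    """Probe an HTTP endpoint; a 2xx response counts as healthy."""
    started = time.monotonic()
    result = {"checked_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")}
    try:
        # urlopen follows redirects, so the final response status is reported
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            result["status_code"] = resp.status
            result["healthy"] = 200 <= resp.status < 300
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses raise HTTPError
        result["status_code"] = exc.code
        result["healthy"] = False
    except (urllib.error.URLError, OSError) as exc:  # DNS failure, refused, timeout
        result["healthy"] = False
        result["error"] = str(exc)
    result["response_time_ms"] = round((time.monotonic() - started) * 1000)
    return result
```

HTTPError is caught before URLError because it is a subclass of URLError.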

health_check_all

mcp__stackwiz__health_check_all()

Run health checks on all stacks with configured health endpoints.

Returns:

{
  "healthy": 3,
  "unhealthy": 1,
  "unknown": 2,
  "stacks": [
    {
      "name": "aegis-dashboard",
      "healthy": true,
      "response_time_ms": 45
    },
    {
      "name": "aegis-api",
      "healthy": false,
      "error": "Connection timeout"
    }
  ]
}
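A hypothetical helper turns this result into a one-line alert naming the stacks that reported healthy: false. Entries without a healthy field (the "unknown" ones) are left out of the failing list.

```python
def summarize(report: dict) -> str:
    """One-line summary of a health_check_all() result, naming failing stacks."""
    failing = [
        f"{s['name']} ({s.get('error', 'unhealthy')})"
        for s in report.get("stacks", [])
        if s.get("healthy") is False
    ]
    line = (f"{report.get('healthy', 0)} healthy, "
            f"{report.get('unhealthy', 0)} unhealthy, "
            f"{report.get('unknown', 0)} unknown")
    return line + (f"; failing: {', '.join(failing)}" if failing else "")
```

For the response above this yields "3 healthy, 1 unhealthy, 2 unknown; failing: aegis-api (Connection timeout)".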

Deployment Patterns

Standard Web Service

# Deploy a web application
await mcp__stackwiz__create_stack(
    name="web-app",
    image="myapp:latest",
    port=3000,
    domain="app.rbnk.uk",
    environment={
        "NODE_ENV": "production",
        "DATABASE_URL": "postgresql://..."
    },
    volumes=["/srv/app/uploads:/app/uploads"],
    health_check="/api/health"
)

API Service

# Deploy FastAPI backend
await mcp__stackwiz__create_stack(
    name="api-backend",
    image="api:v1.0.0",
    port=8000,
    domain="api.example.rbnk.uk",
    environment={
        "ENV": "production",
        "WORKERS": "4"
    },
    networks=["backend-net"],
    health_check="/health"
)

Database Service (No Public Domain)

# Deploy PostgreSQL (internal only)
import os

await mcp__stackwiz__create_stack(
    name="postgres-db",
    image="postgres:16-alpine",
    port=5432,
    domain=None,  # No public domain
    environment={
        "POSTGRES_PASSWORD": os.environ["DB_PASSWORD"],  # from ~/.secure/, never hardcoded
        "POSTGRES_DB": "appdb"
    },
    volumes=["/srv/postgres/data:/var/lib/postgresql/data"],
    networks=["backend-net"]
)

Blue-Green Deployment

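import asyncio

# Deploy the new version (green) next to the live blue stack, on a staging hostname
await mcp__stackwiz__create_stack(
    name="app-green",
    image="app:v2.0.0",
    port=3000,
    domain="app-green.rbnk.uk",
    health_check="/health"
)

# Wait for green to report healthy (up to ~60 s); the status is a dict
for _ in range(30):
    status = await mcp__stackwiz__get_stack_status("app-green")
    if status["health"] == "healthy":
        break
    await asyncio.sleep(2)
else:
    raise RuntimeError("app-green never became healthy; blue is still serving")

# Cut over. Every stack resolves to the Dockerhost (10.10.10.10), so a DNS
# update cannot move traffic between them; Traefik picks the container by its
# Host() rule. Move app.rbnk.uk to the verified image, keeping the DNS record.
# Expect a brief gap between the delete and the create.
await mcp__stackwiz__delete_stack("app-blue", remove_dns=False)
await mcp__stackwiz__create_stack(
    name="app",
    image="app:v2.0.0",
    port=3000,
    domain="app.rbnk.uk",
    health_check="/health"
)

# Retire the staging stack and its app-green.rbnk.uk record
await mcp__stackwiz__delete_stack("app-green")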

Integration with Aegis

Deployment Workflow

from aegis.workflows import run_workflow

# Run deployment workflow with approval gate
result = await run_workflow(
    "deployment-approval",
    {
        "service": "aegis-dashboard",
        "image": "aegis/dashboard:v2.0.0",
        "environment": "production"
    }
)

# If approved, deploy via StackWiz
if result.context.get("approved"):
    await mcp__stackwiz__update_stack(
        name="aegis-dashboard",
        image="aegis/dashboard:v2.0.0",
        recreate=True
    )

Automated Deployment

from aegis.agents import spawn_agent

# Spawn executor agent for deployment
result = await spawn_agent(
    task="Deploy new API version to production",
    template="executor",
    context={
        "service": "aegis-api",
        "image": "aegis/api:v2.1.0",
        "run_tests": True,
        "notify_discord": True
    }
)

Health Monitoring

# In /home/agent/.claude/commands/morning.md

# Check all stack health (the result is a dict, as shown under health_check_all)
health = await mcp__stackwiz__health_check_all()

if health["unhealthy"] > 0:
    await mcp__discord__discord_send(
        channel_id=ALERTS_CHANNEL,  # alerts channel ID, defined elsewhere
        content=f"⚠️ {health['unhealthy']} unhealthy services detected"
    )

Configuration

Cloudflare API

StackWiz requires Cloudflare credentials in ~/.secure/:

# ~/.secure/cloudflare.env
export CLOUDFLARE_API_TOKEN="your-api-token"
export CLOUDFLARE_ZONE_ID="rbnk.uk-zone-id"

The MCP server entry then points StackWiz at that file:

{
  "mcpServers": {
    "stackwiz": {
      "command": "python",
      "args": ["-m", "stackwiz_mcp.server"],
      "env": {
        "CLOUDFLARE_API_TOKEN_FILE": "/home/agent/.secure/cloudflare.env"
      }
    }
  }
}
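The configuration passes a file path rather than the token itself. How StackWiz reads that file is not documented here; assuming it parses shell-style export lines, a minimal reader looks like this (read_env_file is a hypothetical helper):

```python
import shlex


def read_env_file(text: str) -> dict:
    """Parse `export KEY="value"` or `KEY=value` lines, skipping blanks and comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment
        # shlex strips the shell-style quotes around the value
        parts = shlex.split(value)
        env[key.strip()] = parts[0] if parts else ""
    return env
```

Reading the file with `read_env_file(Path(path).read_text())` keeps the token out of the MCP configuration itself.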

Dockerhost SSH

StackWiz uses SSH to connect to Dockerhost:

# ~/.ssh/config
Host dockerhost
    HostName 10.10.10.10
    User aegis
    IdentityFile ~/.ssh/id_ed25519_github
    # Accept the host key on first connect, refuse to connect if it later changes
    StrictHostKeyChecking accept-new

Traefik Configuration

On Dockerhost, Traefik is configured at:

/srv/dockerdata/traefik/traefik.yml
/srv/dockerdata/traefik/dynamic/

Security Considerations

1. SSH Key Management

StackWiz uses SSH for Dockerhost access. Make sure the private key is readable only by its owner:

chmod 600 ~/.ssh/id_ed25519_github

2. Environment Variables

Never commit secrets to Git. Read them from environment variables at deploy time:

import os

await mcp__stackwiz__create_stack(
    name="app",
    # image and port omitted for brevity
    environment={
        "API_KEY": os.environ["APP_API_KEY"],  # From ~/.secure/
        "DB_PASSWORD": os.environ["DB_PASSWORD"]
    }
)
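os.environ[...] raises KeyError on the first missing variable only. A hypothetical env_from helper checks every variable up front and reports all missing names at once, mapping container keys to host variable names:

```python
import os


def env_from(mapping: dict) -> dict:
    """Build a container environment from host env vars: {container_key: host_var}."""
    missing = [var for var in mapping.values() if var not in os.environ]
    if missing:
        # Fail before deploying, listing every missing variable
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return {key: os.environ[var] for key, var in mapping.items()}
```

Usage: `environment=env_from({"API_KEY": "APP_API_KEY", "DB_PASSWORD": "DB_PASSWORD"})`.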

3. Network Isolation

Use Docker networks to isolate services:

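# Backend service: private network only, no Traefik route or DNS record
await mcp__stackwiz__create_stack(
    name="api",
    # image and port omitted for brevity
    networks=["backend-net"],
    domain=None
)

# Frontend: exposed through Traefik at app.rbnk.uk,
# and joined to backend-net so it can reach the API
await mcp__stackwiz__create_stack(
    name="web",
    # image and port omitted for brevity
    networks=["frontend-net", "backend-net"],
    domain="app.rbnk.uk"
)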

4. Volume Permissions

Make sure mounted host directories are owned by the user the container runs as (UID/GID 1000 in this example):

# On Dockerhost
sudo chown -R 1000:1000 /srv/app/data
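Before starting a container, you can verify ownership on the Dockerhost. This sketch assumes the container runs as 1000:1000, matching the chown above; wrong_owner is a hypothetical helper that lists every path with a different owner.

```python
from pathlib import Path


def wrong_owner(root: str, uid: int = 1000, gid: int = 1000) -> list:
    """List paths under root (inclusive) that are not owned by uid:gid."""
    bad = []
    for path in [Path(root), *Path(root).rglob("*")]:
        st = path.lstat()  # lstat: do not follow symlinks out of the tree
        if (st.st_uid, st.st_gid) != (uid, gid):
            bad.append(str(path))
    return bad
```

An empty result for `wrong_owner("/srv/app/data")` means the chown covered the whole tree.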

Performance Tips

1. Image Size

Keep images small for faster deployments:

# Use multi-stage builds
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Final stage: only the installed packages and the application code
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY . .
CMD ["python", "app.py"]

2. Health Checks

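Define health checks so that failures are detected quickly:

await mcp__stackwiz__create_stack(
    name="api",
    # image and port omitted for brevity
    health_check="/health",
    # create_stack waits for this endpoint; if it is also the container's
    # Docker HEALTHCHECK, Traefik skips the container until it reports healthy
)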

3. Resource Limits

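Set resource limits to prevent runaway containers. The create_stack signature documented above has no resource-limit parameters; the mem_limit and cpus arguments below assume a StackWiz build that forwards extra container options:

await mcp__stackwiz__create_stack(
    name="app",
    image="app:latest",
    # port omitted for brevity
    mem_limit="512m",  # hypothetical: not in the documented signature
    cpus="0.5"         # hypothetical: not in the documented signature
)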
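If StackWiz does not forward limits, Docker's own docker update command can apply them to a running container. This sketch assumes the container is named after the stack and uses the dockerhost SSH alias from the configuration section; docker_update_cmd and apply_limits are hypothetical helpers. Limits set this way are lost whenever StackWiz recreates the container.

```python
import shlex
import subprocess


def docker_update_cmd(container: str, memory: str = "512m", cpus: float = 0.5,
                      host: str = "dockerhost") -> list:
    """argv that applies memory/CPU limits to a running container over SSH."""
    remote = ["docker", "update",
              "--memory", memory,
              "--memory-swap", memory,  # equal to --memory: no extra swap allowed
              "--cpus", str(cpus),
              container]
    # ssh sends its trailing arguments as one remote command line, so quote them
    return ["ssh", host, shlex.join(remote)]


def apply_limits(container: str, **kwargs) -> None:
    """Run the command; raises CalledProcessError if docker update fails."""
    subprocess.run(docker_update_cmd(container, **kwargs), check=True)
```

Usage: `apply_limits("app", memory="512m", cpus=0.5)`.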

Troubleshooting

Deployment Fails

# Check stack logs
logs = await mcp__stackwiz__get_stack_logs("failed-service", tail=200)

# Check stack status
status = await mcp__stackwiz__get_stack_status("failed-service")

# Delete and redeploy (remove_volumes=True also deletes persistent data; drop it to keep the data)
await mcp__stackwiz__delete_stack("failed-service", remove_volumes=True)
await mcp__stackwiz__create_stack(...)

DNS Not Resolving

# Check the DNS record exists (shell)
dig +short subdomain.rbnk.uk

# Create it manually if missing (Python)
await mcp__stackwiz__create_dns_record("subdomain")

Health Check Fails

# Check the health endpoint manually (Python; httpx.get is synchronous)
import httpx
response = httpx.get("https://service.rbnk.uk/health", timeout=10)
print(response.status_code, response.text)

# Check Traefik logs on Dockerhost (shell)
ssh dockerhost "docker logs --tail 100 traefik"

Connection Timeout

# Test SSH connection
ssh dockerhost "echo 'Connected'"

# Check Dockerhost firewall
ssh dockerhost "sudo ufw status"
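A raw TCP check helps separate network or firewall problems from SSH authentication problems: if port 22 is unreachable, the SSH keys are not the issue. port_open is a hypothetical helper, and checking 443 as well assumes Traefik serves HTTPS on the standard port.

```python
import socket


def port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Usage: `port_open("10.10.10.10", 22)` for SSH and `port_open("10.10.10.10", 443)` for Traefik.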
