StackWiz MCP Server
Overview
StackWiz is Aegis's custom MCP server for deploying Docker services to the Dockerhost (10.10.10.10) with automatic Traefik routing, DNS management, and health monitoring. It provides a declarative interface for managing the production stack at rbnk.uk.
Tool Prefix: mcp__stackwiz__
Domain: rbnk.uk
Infrastructure:
- Dockerhost: 10.10.10.10
- Traefik: Reverse proxy with automatic HTTPS
- DNS: Cloudflare API integration
Core Concepts
Stacks
A stack is a deployed service with:
- Name: Unique identifier (e.g., "aegis-dashboard")
- Image: Docker image (e.g., "aegis/dashboard:latest")
- Port: Internal port (e.g., 8080)
- Domain: Subdomain under rbnk.uk (e.g., "aegis.rbnk.uk")
- Environment: Environment variables
- Volumes: Persistent storage
- Networks: Docker networks
- Health Check: HTTP endpoint for monitoring
Routing
Traefik automatically routes based on labels:
labels:
- "traefik.enable=true"
- "traefik.http.routers.aegis.rule=Host(`aegis.rbnk.uk`)"
- "traefik.http.routers.aegis.tls.certresolver=letsencrypt"
- "traefik.http.services.aegis.loadbalancer.server.port=8080"
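StackWiz attaches these labels when it creates the container. To make the pattern explicit, here is a minimal sketch that builds the same four labels from a router name, a domain, and a port. It assumes the router and the service share one name, as `aegis` does above. `traefik_labels` is a hypothetical helper for illustration, not a StackWiz tool.

```python
def traefik_labels(router: str, domain: str, port: int,
                   certresolver: str = "letsencrypt") -> list[str]:
    """Build Traefik Docker labels for one HTTPS router (hypothetical helper)."""
    return [
        "traefik.enable=true",
        # Route requests whose Host header matches the domain
        f"traefik.http.routers.{router}.rule=Host(`{domain}`)",
        # Obtain the TLS certificate through the named certificate resolver
        f"traefik.http.routers.{router}.tls.certresolver={certresolver}",
        # Container port that Traefik forwards traffic to
        f"traefik.http.services.{router}.loadbalancer.server.port={port}",
    ]


for label in traefik_labels("aegis", "aegis.rbnk.uk", 8080):
    print(label)
```

With the arguments `("aegis", "aegis.rbnk.uk", 8080)`, this reproduces the four labels listed above.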
DNS Management
StackWiz automatically creates/updates Cloudflare A records pointing to Dockerhost.
Operations
create_stack
mcp__stackwiz__create_stack(
name: str,
image: str,
port: int,
domain: str = None,
environment: dict = None,
volumes: list = None,
networks: list = None,
health_check: str = None,
restart_policy: str = "unless-stopped"
)
Create and deploy a new service stack.
Example:
await mcp__stackwiz__create_stack(
name="aegis-api",
image="aegis/api:latest",
port=8000,
domain="api.aegis.rbnk.uk",
environment={
"DATABASE_URL": "postgresql://aegis:password@postgres:5432/aegis",
"REDIS_URL": "redis://redis:6379/0",
"ENV": "production"
},
volumes=[
"/srv/aegis/data:/data",
"/srv/aegis/logs:/logs"
],
networks=["aegis-net"],
health_check="/health"
)
What Happens:
1. Pulls the image to the Dockerhost
2. Creates the container with Traefik labels
3. Starts the container
4. Creates a DNS A record (if a domain is provided)
5. Waits for the health check to pass (if one is provided)
6. Returns the deployment status
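Steps 1 to 3 roughly match a `docker pull` followed by `docker run` on the Dockerhost. The sketch below builds the equivalent Docker CLI commands from `create_stack`-style arguments. It assumes StackWiz drives the Docker CLI, although it may use the Docker API instead. `docker_commands` and its `labels` parameter are hypothetical. Extra networks are attached with `docker network connect` afterwards, so the sketch also works on Docker versions that accept only one `--network` at `docker run` time.

```python
from typing import Optional


def docker_commands(name: str, image: str, environment: Optional[dict] = None,
                    volumes: Optional[list] = None, networks: Optional[list] = None,
                    labels: Optional[list] = None,
                    restart_policy: str = "unless-stopped") -> list[list[str]]:
    """Return Docker CLI commands approximating steps 1-3 (hypothetical helper)."""
    networks = networks or []
    run = ["docker", "run", "-d", "--name", name, "--restart", restart_policy]
    for key, value in (environment or {}).items():
        run += ["-e", f"{key}={value}"]
    for volume in volumes or []:
        run += ["-v", volume]            # "host_path:container_path"
    if networks:
        run += ["--network", networks[0]]
    for label in labels or []:
        run += ["--label", label]        # e.g. the Traefik routing labels
    run.append(image)
    commands = [["docker", "pull", image], run]
    # Attach any additional networks once the container exists
    for network in networks[1:]:
        commands.append(["docker", "network", "connect", network, name])
    return commands


for cmd in docker_commands("aegis-api", "aegis/api:latest",
                           environment={"ENV": "production"},
                           volumes=["/srv/aegis/data:/data"],
                           networks=["aegis-net"]):
    print(" ".join(cmd))
```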
Returns:
{
"success": true,
"stack_name": "aegis-api",
"container_id": "abc123...",
"url": "https://api.aegis.rbnk.uk",
"status": "running",
"health": "healthy"
}
update_stack
mcp__stackwiz__update_stack(
name: str,
image: str = None,
environment: dict = None,
volumes: list = None,
recreate: bool = True
)
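Update an existing stack. If recreate=True, StackWiz stops the old container and starts a new one with the updated settings, and Traefik discovers the new container through its labels. Because the old container stops before the new one is running, expect a brief interruption rather than true zero-downtime.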
Example:
# Update image
await mcp__stackwiz__update_stack(
name="aegis-api",
image="aegis/api:v2.0.0",
recreate=True
)
# Update environment
await mcp__stackwiz__update_stack(
name="aegis-api",
environment={"LOG_LEVEL": "debug"},
recreate=True
)
delete_stack
Delete a stack and optionally remove associated resources.
Example:
await mcp__stackwiz__delete_stack(
name="old-service",
remove_volumes=True, # Delete persistent data
remove_dns=True # Remove DNS record
)
list_stacks
mcp__stackwiz__list_stacks(
filter_status: str = None, # "running", "stopped", "paused"
filter_network: str = None
)
List all deployed stacks with status.
Returns:
{
"stacks": [
{
"name": "aegis-dashboard",
"image": "aegis/dashboard:latest",
"status": "running",
"url": "https://aegis.rbnk.uk",
"uptime": "15d 6h 23m",
"health": "healthy"
},
{
"name": "aegis-api",
"image": "aegis/api:v2.0.0",
"status": "running",
"url": "https://api.aegis.rbnk.uk",
"uptime": "2h 15m",
"health": "healthy"
}
]
}
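The `uptime` field is a human-readable string. To sort or compare stacks by uptime, a small parser helps. This sketch assumes the format is whitespace-separated `<n>d`, `<n>h`, `<n>m` (and possibly `<n>s`) tokens, as in the sample response above.

```python
import re
from datetime import timedelta

_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_uptime(text: str) -> timedelta:
    """Convert an uptime string such as '15d 6h 23m' into a timedelta."""
    parts = {}
    for token in text.split():
        match = re.fullmatch(r"(\d+)([dhms])", token)
        if match is None:
            raise ValueError(f"unrecognised uptime token: {token!r}")
        parts[_UNITS[match.group(2)]] = int(match.group(1))
    return timedelta(**parts)


stacks = [
    {"name": "aegis-dashboard", "uptime": "15d 6h 23m"},
    {"name": "aegis-api", "uptime": "2h 15m"},
]
# The stack with the shortest uptime was (re)started most recently
newest = min(stacks, key=lambda s: parse_uptime(s["uptime"]))
print(newest["name"])  # aegis-api
```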
get_stack_status
Get detailed status for a specific stack.
Returns:
{
"name": "aegis-dashboard",
"status": "running",
"health": "healthy",
"container_id": "abc123...",
"image": "aegis/dashboard:latest",
"created_at": "2026-01-10T10:00:00Z",
"uptime": "15d 6h 23m",
"url": "https://aegis.rbnk.uk",
"ports": ["8080:8080"],
"networks": ["aegis-net"],
"volumes": ["/srv/aegis/data:/data"],
"environment": {
"ENV": "production"
},
"resource_usage": {
"cpu_percent": 2.5,
"memory_usage": "256MB",
"memory_limit": "512MB"
}
}
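`resource_usage` reports memory as strings with units. This sketch turns them into a utilisation percentage. It assumes the `KB`/`MB`/`GB` suffixes shown in the sample, read as binary multiples. The real server may format values differently (for example `MiB`), but the ratio is unaffected as long as both fields use the same unit.

```python
import re

_FACTORS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Parse sizes like '256MB' into bytes (binary multiples assumed)."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([KMG]?B)\s*", size.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {size!r}")
    return int(float(match.group(1)) * _FACTORS[match.group(2)])


def memory_percent(resource_usage: dict) -> float:
    """Share of the container's memory limit currently in use, in percent."""
    used = to_bytes(resource_usage["memory_usage"])
    limit = to_bytes(resource_usage["memory_limit"])
    return round(100 * used / limit, 1)


usage = {"cpu_percent": 2.5, "memory_usage": "256MB", "memory_limit": "512MB"}
print(memory_percent(usage))  # 50.0
```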
restart_stack
Restart a stack (graceful shutdown then start).
get_stack_logs
mcp__stackwiz__get_stack_logs(
name: str,
tail: int = 100,
since: str = None,
follow: bool = False
)
Get logs for a stack.
Example:
# Get last 100 lines
logs = await mcp__stackwiz__get_stack_logs("aegis-dashboard", tail=100)
# Get logs since timestamp
logs = await mcp__stackwiz__get_stack_logs(
"aegis-api",
since="2026-01-25T10:00:00Z"
)
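In the example above, `since` is an ISO 8601 UTC timestamp. This sketch computes a relative window (for example, the last hour) in that same `YYYY-MM-DDTHH:MM:SSZ` form. This page does not document whether StackWiz also accepts relative values such as `1h`, the way `docker logs --since` does.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def since_timestamp(window: timedelta, now: Optional[datetime] = None) -> str:
    """Return the UTC start of a look-back window as 'YYYY-MM-DDTHH:MM:SSZ'."""
    now = now or datetime.now(timezone.utc)
    start = (now - window).astimezone(timezone.utc)
    return start.strftime("%Y-%m-%dT%H:%M:%SZ")


fixed_now = datetime(2026, 1, 25, 11, 0, tzinfo=timezone.utc)
print(since_timestamp(timedelta(hours=1), now=fixed_now))  # 2026-01-25T10:00:00Z
```

In practice you would omit `now` and pass the result as `since=since_timestamp(timedelta(hours=1))`.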
DNS Management
create_dns_record
mcp__stackwiz__create_dns_record(
subdomain: str,
ip: str = "10.10.10.10", # Defaults to Dockerhost
type: str = "A",
proxied: bool = False
)
Create a DNS record in Cloudflare.
Example:
# Create A record
await mcp__stackwiz__create_dns_record(
subdomain="new-service",
ip="10.10.10.10"
)
# Result: new-service.rbnk.uk → 10.10.10.10
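This operation corresponds to Cloudflare's DNS records API. The sketch below builds the equivalent request with the standard library but does not send it. It assumes StackWiz calls the v4 endpoint `POST /zones/{zone_id}/dns_records` with a bearer token. In Cloudflare's API, `ttl: 1` means "automatic". The token and zone ID values are placeholders.

```python
import json
import urllib.request

CF_API = "https://api.cloudflare.com/client/v4"


def build_dns_request(token: str, zone_id: str, subdomain: str,
                      ip: str = "10.10.10.10", record_type: str = "A",
                      proxied: bool = False,
                      zone: str = "rbnk.uk") -> urllib.request.Request:
    """Build (but do not send) a Cloudflare 'create DNS record' request."""
    body = {
        "type": record_type,
        "name": f"{subdomain}.{zone}",   # fully qualified record name
        "content": ip,
        "ttl": 1,                        # 1 = automatic TTL
        "proxied": proxied,
    }
    return urllib.request.Request(
        f"{CF_API}/zones/{zone_id}/dns_records",
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_dns_request("example-token", "example-zone-id", "new-service")
# urllib.request.urlopen(req) would send it; the JSON reply has a "success" flag
```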
update_dns_record
Update an existing DNS record.
delete_dns_record
Delete a DNS record.
list_dns_records
List all DNS records for rbnk.uk.
Health Monitoring
health_check
Perform a health check on a stack.
Returns:
{
"stack_name": "aegis-dashboard",
"healthy": true,
"status_code": 200,
"response_time_ms": 45,
"checked_at": "2026-01-25T12:00:00Z"
}
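A single check is essentially an HTTP GET against the stack's health path with a timeout. The sketch below uses only the standard library and returns a result shaped like the one above. Treating any 2xx status as healthy, and the 5-second default timeout, are assumptions rather than documented StackWiz behaviour.

```python
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone


def check_health(stack_name: str, url: str, timeout: float = 5.0) -> dict:
    """GET a health endpoint and report status, latency and timestamp."""
    result = {"stack_name": stack_name, "healthy": False, "status_code": None}
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            result["status_code"] = response.status
            result["healthy"] = 200 <= response.status < 300
    except urllib.error.HTTPError as exc:   # server answered with 4xx/5xx
        result["status_code"] = exc.code
    except OSError as exc:                  # URLError, refused connection, timeout
        result["error"] = str(exc)
    result["response_time_ms"] = round((time.perf_counter() - start) * 1000)
    result["checked_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return result
```

Probing the public URL (for example `check_health("aegis-dashboard", "https://aegis.rbnk.uk/health")`) also exercises DNS, TLS, and Traefik routing, not just the container.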
health_check_all
Run health checks on all stacks with configured health endpoints.
Returns:
{
"healthy": 3,
"unhealthy": 1,
"unknown": 2,
"stacks": [
{
"name": "aegis-dashboard",
"healthy": true,
"response_time_ms": 45
},
{
"name": "aegis-api",
"healthy": false,
"error": "Connection timeout"
}
]
}
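The summary counts can be derived from per-stack results. This sketch assumes `healthy` is `True`/`False` for checked stacks and counts stacks with no result (for example, no health endpoint configured) as `unknown`. This page does not document StackWiz's exact rule for `unknown`.

```python
from typing import Optional


def summarize(results: dict[str, Optional[bool]]) -> dict:
    """Tally per-stack health: True=healthy, False=unhealthy, None=unknown."""
    summary = {"healthy": 0, "unhealthy": 0, "unknown": 0}
    for healthy in results.values():
        if healthy is None:
            summary["unknown"] += 1
        elif healthy:
            summary["healthy"] += 1
        else:
            summary["unhealthy"] += 1
    return summary


print(summarize({"aegis-dashboard": True, "aegis-api": False, "postgres-db": None}))
# {'healthy': 1, 'unhealthy': 1, 'unknown': 1}
```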
Deployment Patterns
Standard Web Service
# Deploy a web application
await mcp__stackwiz__create_stack(
name="web-app",
image="myapp:latest",
port=3000,
domain="app.rbnk.uk",
environment={
"NODE_ENV": "production",
"DATABASE_URL": "postgresql://..."
},
volumes=["/srv/app/uploads:/app/uploads"],
health_check="/api/health"
)
API Service
# Deploy FastAPI backend
await mcp__stackwiz__create_stack(
name="api-backend",
image="api:v1.0.0",
port=8000,
domain="api.example.rbnk.uk",
environment={
"ENV": "production",
"WORKERS": "4"
},
networks=["backend-net"],
health_check="/health"
)
Database Service (No Public Domain)
# Deploy PostgreSQL (internal only); the password comes from the environment
import os
await mcp__stackwiz__create_stack(
    name="postgres-db",
    image="postgres:16-alpine",
    port=5432,
    domain=None,  # No public domain
    environment={
        "POSTGRES_PASSWORD": os.environ["POSTGRES_PASSWORD"],  # never hard-code secrets
        "POSTGRES_DB": "appdb"
    },
    volumes=["/srv/postgres/data:/var/lib/postgresql/data"],
    networks=["backend-net"]
)
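Blue-Green Deployment
import asyncio
# Deploy new version (green) alongside the running blue stack
await mcp__stackwiz__create_stack(
    name="app-green",
    image="app:v2.0.0",
    port=3000,
    domain="app-green.rbnk.uk",
    health_check="/health"
)
# Poll until green reports healthy (get_stack_status returns a dict)
for _ in range(30):
    status = await mcp__stackwiz__get_stack_status("app-green")
    if status["health"] == "healthy":
        break
    await asyncio.sleep(10)
if status["health"] == "healthy":
    # Blue and green both run on the Dockerhost, so the app.rbnk.uk A record
    # already points at 10.10.10.10 and a DNS update alone moves no traffic.
    # The cut-over is Traefik's Host(`app.rbnk.uk`) rule, which must be
    # moved from the blue container to the green one.
    # Once app.rbnk.uk routes to green, delete the old version (blue)
    await mcp__stackwiz__delete_stack("app-blue")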
Integration with Aegis
Deployment Workflow
from aegis.workflows import run_workflow
# Run deployment workflow with approval gate
result = await run_workflow(
    "deployment-approval",
    {
        "service": "aegis-dashboard",
        "image": "aegis/dashboard:v2.0.0",
        "environment": "production"
    }
)
# If approved, deploy via StackWiz
if result.context.get("approved"):
    await mcp__stackwiz__update_stack(
        name="aegis-dashboard",
        image="aegis/dashboard:v2.0.0",
        recreate=True
    )
Automated Deployment
from aegis.agents import spawn_agent
# Spawn executor agent for deployment
result = await spawn_agent(
task="Deploy new API version to production",
template="executor",
context={
"service": "aegis-api",
"image": "aegis/api:v2.1.0",
"run_tests": True,
"notify_discord": True
}
)
Health Monitoring
# In /home/agent/.claude/commands/morning.md
# Check all stack health (health_check_all returns a dict)
health = await mcp__stackwiz__health_check_all()
if health["unhealthy"] > 0:
    await mcp__discord__discord_send(
        channel_id=ALERTS_CHANNEL,
        content=f"⚠️ {health['unhealthy']} unhealthy services detected"
    )
Configuration
Cloudflare API
StackWiz requires Cloudflare credentials in ~/.secure/:
# ~/.secure/cloudflare.env
export CLOUDFLARE_API_TOKEN="your-api-token"
export CLOUDFLARE_ZONE_ID="rbnk.uk-zone-id"
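Then register the server in the MCP configuration; the env entry tells StackWiz where the credentials file lives:
{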
"mcpServers": {
"stackwiz": {
"command": "python",
"args": ["-m", "stackwiz_mcp.server"],
"env": {
"CLOUDFLARE_API_TOKEN_FILE": "/home/agent/.secure/cloudflare.env"
}
}
}
}
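Because the config passes the env file's path rather than the token itself, the server presumably reads and parses the file on its own. This sketch parses `export KEY="value"` lines like those in cloudflare.env without invoking a shell. `parse_env_text` and `parse_env_file` are hypothetical helpers, and StackWiz's own loader may behave differently.

```python
import shlex
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse shell-style 'export KEY="value"' lines into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comments
        tokens = shlex.split(line)        # strips the shell quoting
        if tokens and tokens[0] == "export":
            tokens = tokens[1:]
        for token in tokens:
            key, sep, value = token.partition("=")
            if sep:
                env[key] = value
    return env


def parse_env_file(path: str) -> dict[str, str]:
    return parse_env_text(Path(path).expanduser().read_text())


sample = ('export CLOUDFLARE_API_TOKEN="your-api-token"\n'
          'export CLOUDFLARE_ZONE_ID="rbnk.uk-zone-id"\n')
print(parse_env_text(sample)["CLOUDFLARE_ZONE_ID"])  # rbnk.uk-zone-id
```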
Dockerhost SSH
StackWiz uses SSH to connect to Dockerhost:
# ~/.ssh/config
Host dockerhost
HostName 10.10.10.10
User aegis
IdentityFile ~/.ssh/id_ed25519_github
StrictHostKeyChecking no
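With the dockerhost alias in place, a remote Docker command is a plain `ssh` invocation. This sketch builds one safely. It assumes StackWiz shells out to `ssh`; it could equally set `DOCKER_HOST=ssh://aegis@10.10.10.10`, which the Docker CLI supports. Because `ssh` joins its remote arguments into a single string for the remote shell, each argument is quoted with `shlex.join`. `BatchMode=yes` makes ssh fail instead of prompting for a password. Both helper names are hypothetical.

```python
import shlex
import subprocess


def remote_docker(*args: str, host: str = "dockerhost") -> list[str]:
    """Build an ssh command that runs `docker <args>` on the Dockerhost."""
    remote = shlex.join(["docker", *args])   # quote for the remote shell
    return ["ssh", "-o", "BatchMode=yes", host, remote]


def run_remote(*args: str, timeout: float = 60) -> str:
    """Run the command and return stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(remote_docker(*args), capture_output=True,
                               text=True, timeout=timeout, check=True)
    return completed.stdout


print(remote_docker("ps", "--format", "{{.Names}} {{.Status}}"))
```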
Traefik Configuration
On Dockerhost, Traefik is configured at:
Security Considerations
1. SSH Key Management
StackWiz uses SSH for Dockerhost access. Ensure the key is secure:
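One concrete check: OpenSSH refuses a user's private key if group or others have any access to it, so the key should be mode 0600 or stricter. This minimal sketch tests that. The default path is taken from the SSH config above, and both helper names are hypothetical.

```python
import stat
from pathlib import Path


def mode_is_private(mode: int) -> bool:
    """True if no group/other permission bits are set (e.g. 0600, 0400)."""
    return (stat.S_IMODE(mode) & 0o077) == 0


def check_key(path: str = "~/.ssh/id_ed25519_github") -> bool:
    """Report whether the private key file has safe permissions."""
    key = Path(path).expanduser()
    ok = mode_is_private(key.stat().st_mode)
    if not ok:
        print(f"{key} is too open; fix with: chmod 600 {key}")
    return ok
```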
2. Environment Variables
Never commit secrets to Git. Pass them in from environment variables instead:
import os
await mcp__stackwiz__create_stack(
    name="app",
    environment={
        "API_KEY": os.environ["APP_API_KEY"],  # loaded from ~/.secure/
        "DB_PASSWORD": os.environ["DB_PASSWORD"]
    }
)
3. Network Isolation
Use Docker networks to isolate services:
# Backend service: no domain, so no Traefik route or public DNS record
await mcp__stackwiz__create_stack(
    name="api",
    networks=["backend-net"],  # other containers reach it over backend-net
    domain=None
)
# Frontend: public domain, and also joins backend-net to reach the API
await mcp__stackwiz__create_stack(
    name="web",
    networks=["frontend-net", "backend-net"],
    domain="app.rbnk.uk"
)
4. Volume Permissions
Set correct permissions on mounted volumes:
Performance Tips
1. Image Size
Keep images small for faster deployments:
# Use multi-stage builds: install dependencies in a builder stage,
# then copy only the installed packages into the final image
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
CMD ["python", "app.py"]
2. Health Checks
Define health checks for faster failure detection:
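A health_check path helps only if the service answers it quickly and cheaply. Below is a minimal standard-library sketch of a service that exposes /health, matching health_check="/health" in the examples above. The JSON body and port 8000 are illustrative assumptions. A production endpoint would typically also verify its dependencies, such as the database connection.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def health_response(path: str) -> tuple[int, bytes]:
    """Map a request path to (status, body); only /health is served."""
    if path == "/health":
        return 200, json.dumps({"status": "ok"}).encode()
    return 404, b"not found"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = health_response(self.path)
        content_type = "application/json" if status == 200 else "text/plain"
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    # Bind all interfaces so Traefik can reach the container port
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Call serve() from the container's entrypoint. Keeping the routing logic in health_response makes it testable without starting a server.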
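3. Resource Limits
Set resource limits to prevent runaway containers:
await mcp__stackwiz__create_stack(
    name="app",
    image="app:latest",
    # mem_limit and cpus are not in the create_stack signature documented
    # above; confirm StackWiz accepts them before relying on this
    mem_limit="512m",
    cpus="0.5"
)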
Troubleshooting
Deployment Fails
# Check stack logs
logs = await mcp__stackwiz__get_stack_logs("failed-service", tail=200)
# Check stack status
status = await mcp__stackwiz__get_stack_status("failed-service")
# Delete and redeploy (remove_volumes=True would also delete persistent data)
await mcp__stackwiz__delete_stack("failed-service", remove_volumes=False)
await mcp__stackwiz__create_stack(...)
DNS Not Resolving
# Check DNS record exists
dig +short subdomain.rbnk.uk
# Manually create if missing
await mcp__stackwiz__create_dns_record("subdomain")
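Health Check Fails
# Check health endpoint manually (httpx.get is synchronous, so no await)
import httpx
response = httpx.get("https://service.rbnk.uk/health", timeout=10)
print(response.status_code, response.text)
# Check Traefik logs on Dockerhost (--tail covers both stdout and stderr)
ssh dockerhost "docker logs --tail 100 traefik"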
Connection Timeout
# Test SSH connection
ssh dockerhost "echo 'Connected'"
# Check Dockerhost firewall
ssh dockerhost "sudo ufw status"
Next Steps
- Docker MCP - Local container management
- Aegis MCP - Workflow orchestration
- MCP Overview - Protocol basics