Runbooks

Step-by-step operational procedures for FluxiQ Core.

Deployment

Deploy API Server

bash
# 1. Build and push image
gcloud builds submit backend/ \
  --tag southamerica-east1-docker.pkg.dev/PROJECT_ID/fluxiq/api:v1.2.0

# 2. Deploy to Cloud Run
gcloud run deploy fluxiq-api \
  --image=southamerica-east1-docker.pkg.dev/PROJECT_ID/fluxiq/api:v1.2.0 \
  --region=southamerica-east1

# 3. Verify health
curl -s https://api.fluxiq.com.br/health | jq .

# 4. Monitor for 15 minutes in Grafana
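
Step 3 above checks health once. Right after a deploy, the first request can land before the new revision is serving, so a short polling loop may be more reliable. The sketch below is one possible way to do that; the function name `wait_for_health` and the retry count and delay are illustrative assumptions, not part of FluxiQ tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a health endpoint until it returns HTTP 200
# or the attempts run out.
# Usage: wait_for_health URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_health() {
  url=$1
  attempts=${2:-10}
  delay=${3:-6}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    code=$(curl -s -o /dev/null -w '%{http_code}' "$url") || code=000
    if [ "$code" = "200" ]; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    echo "attempt $i/$attempts: HTTP $code" >&2
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (not run automatically):
#   wait_for_health https://api.fluxiq.com.br/health 10 6 || echo "consider rollback"
```

A non-zero exit status can be used to gate the rollback procedure in the next section.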

Rollback Deployment

bash
# 1. List recent revisions
gcloud run revisions list --service=fluxiq-api --region=southamerica-east1 --limit=5

# 2. Route 100% traffic to previous revision
gcloud run services update-traffic fluxiq-api \
  --to-revisions=fluxiq-api-PREVIOUS_REVISION=100 \
  --region=southamerica-east1

# 3. Verify rollback
curl -s https://api.fluxiq.com.br/health | jq .version
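
Steps 1 and 2 above can be combined so the revision name does not have to be copied by hand. The following is a minimal sketch, assuming the second-newest revision is the one to roll back to; that is not always true (for example after several failed deploys), so check the list first. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the second line of a newest-first revision list read from stdin.
previous_revision() {
  sed -n '2p'
}

# Hypothetical helper: route 100% of traffic to the second-newest revision.
# Usage: rollback_to_previous SERVICE REGION
rollback_to_previous() {
  service=$1
  region=$2
  # Sort newest first and print only the revision names.
  prev=$(gcloud run revisions list --service="$service" --region="$region" \
    --sort-by="~metadata.creationTimestamp" \
    --format="value(metadata.name)" --limit=5 | previous_revision)
  if [ -z "$prev" ]; then
    echo "no previous revision found" >&2
    return 1
  fi
  echo "routing 100% of traffic to $prev"
  gcloud run services update-traffic "$service" \
    --to-revisions="$prev=100" --region="$region"
}

# Example (not run automatically):
#   rollback_to_previous fluxiq-api southamerica-east1
```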

Database Operations

Run Migrations

bash
# 1. Execute the migration job
gcloud run jobs execute fluxiq-migrate --region=southamerica-east1
# 2. Read the job's most recent log entries
gcloud logging read "resource.labels.job_name=fluxiq-migrate" --limit=10

Database Backup

bash
# On-demand backup
gcloud sql backups create --instance=fluxiq-db

# List backups
gcloud sql backups list --instance=fluxiq-db --limit=5

# Point-in-time recovery (clones to a new instance; requires PITR enabled on fluxiq-db)
gcloud sql instances clone fluxiq-db fluxiq-db-restored \
  --point-in-time="2026-02-03T12:00:00Z"

Emergency Database Access

bash
# Open an SSH session to the admin VM through an IAP tunnel
gcloud compute ssh fluxiq-admin --zone=southamerica-east1-a --tunnel-through-iap
# On the VM:
PGPASSWORD='PASSWORD' psql -h DB_IP -U fluxiq -d fluxiq

TigerBeetle Operations

Check TigerBeetle Health

bash
kubectl get pods -n tigerbeetle
kubectl logs -n tigerbeetle deployment/tigerbeetle --tail=50

Enable TigerBeetle Backend

bash
# 1. Verify TigerBeetle is healthy
kubectl get pods -n tigerbeetle

# 2. Enable in ledger service
kubectl set env deployment/ledger -n fluxiq \
  TIGERBEETLE_ENABLED=true \
  LEDGER_BACKEND=tigerbeetle

# 3. Monitor circuit breaker (should stay at 0 = closed)

Switch to Fallback (Manual)

bash
kubectl set env deployment/ledger -n fluxiq \
  TIGERBEETLE_ENABLED=false \
  LEDGER_BACKEND=midaz
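
The enable and fallback procedures differ only in the two values they set, so they can share one wrapper that also waits for the rollout and reports the active backend. This is a sketch under the assumption that both variables are set as plain values on the `ledger` deployment; the function names and the 120-second timeout are illustrative.

```shell
#!/usr/bin/env bash
# Print the LEDGER_BACKEND value currently defined on the ledger deployment.
current_ledger_backend() {
  kubectl set env deployment/ledger -n fluxiq --list \
    | sed -n 's/^LEDGER_BACKEND=//p'
}

# Hypothetical helper: switch the ledger backend and wait for the rollout.
# Usage: switch_ledger_backend tigerbeetle|midaz
switch_ledger_backend() {
  backend=$1
  case "$backend" in
    tigerbeetle) enabled=true ;;
    midaz)       enabled=false ;;
    *) echo "unknown backend: $backend" >&2; return 2 ;;
  esac
  kubectl set env deployment/ledger -n fluxiq \
    TIGERBEETLE_ENABLED="$enabled" LEDGER_BACKEND="$backend" || return 1
  # Block until the new pods are rolled out (or the timeout expires).
  kubectl rollout status deployment/ledger -n fluxiq --timeout=120s
}

# Example (not run automatically):
#   switch_ledger_backend midaz && current_ledger_backend
```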

NATS Operations

Check Stream Health

bash
kubectl exec -it nats-0 -- nats stream ls
kubectl exec -it nats-0 -- nats stream info FLUXIQ
kubectl exec -it nats-0 -- nats consumer ls FLUXIQ

Purge Stream (Emergency)

bash
# WARNING: This deletes all unprocessed messages
kubectl exec -it nats-0 -- nats stream purge FLUXIQ --force
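
Because `--force` skips the NATS CLI's own confirmation prompt, a small guard that makes the operator retype the stream name can reduce the chance of purging by accident. The sketch below is one way to do this; `confirm_purge` and `purge_stream` are hypothetical names.

```shell
#!/usr/bin/env bash
# Ask the operator to retype the stream name; succeed only on an exact match.
confirm_purge() {
  stream=$1
  printf 'Type the stream name (%s) to confirm purge: ' "$stream" >&2
  IFS= read -r answer || answer=""
  [ "$answer" = "$stream" ]
}

# Hypothetical helper: purge a stream only after confirmation.
# Usage: purge_stream STREAM
purge_stream() {
  stream=$1
  if confirm_purge "$stream"; then
    # No -it here: with --force the purge is non-interactive.
    kubectl exec nats-0 -- nats stream purge "$stream" --force
  else
    echo "purge of $stream aborted" >&2
    return 1
  fi
}

# Example (not run automatically):
#   purge_stream FLUXIQ
```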

Scaling Operations

Scale Cloud Run Services

bash
# Scale up for high traffic
gcloud run services update fluxiq-api \
  --min-instances=5 --max-instances=200 \
  --region=southamerica-east1

# Scale down after traffic spike
gcloud run services update fluxiq-api \
  --min-instances=1 --max-instances=100 \
  --region=southamerica-east1

Incident Response

Both Backends Down

Severity: CRITICAL | Time to resolve: <15 minutes

  1. Check TigerBeetle pod status:

    bash
    kubectl get pods -n tigerbeetle
    kubectl describe pod -n tigerbeetle -l app=tigerbeetle
  2. Check PostgreSQL status:

    bash
    gcloud sql instances describe fluxiq-db --format="value(state)"
  3. If TigerBeetle is down, restart it:

    bash
    kubectl rollout restart deployment/tigerbeetle -n tigerbeetle
  4. If PostgreSQL is down, failover:

    bash
    gcloud sql instances failover fluxiq-db
  5. Monitor recovery via Grafana dashboard
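
The checks in steps 1 to 4 can be condensed into a quick triage helper that reports which recovery step applies. The sketch below assumes that a TigerBeetle pod count of zero in the Running phase means "down" and that any Cloud SQL state other than RUNNABLE means "down". Both are simplifications (a Running pod is not necessarily Ready), and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Count TigerBeetle pods whose phase is Running.
tigerbeetle_running() {
  kubectl get pods -n tigerbeetle -l app=tigerbeetle \
    --field-selector=status.phase=Running --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}

# Cloud SQL reports RUNNABLE for an instance that is up.
postgres_state() {
  gcloud sql instances describe fluxiq-db --format="value(state)"
}

# Map the two probe results to the runbook steps to run next.
# Usage: next_action TB_RUNNING_COUNT PG_STATE
next_action() {
  tb_running=$1
  pg_state=$2
  actions=""
  if [ "$tb_running" -eq 0 ]; then
    actions="restart-tigerbeetle"
  fi
  if [ "$pg_state" != "RUNNABLE" ]; then
    actions="${actions:+$actions }failover-postgres"
  fi
  echo "${actions:-monitor-only}"
}

# Example (not run automatically):
#   next_action "$(tigerbeetle_running)" "$(postgres_state)"
```

The helper only prints a recommendation; the restart and failover commands from steps 3 and 4 are still run by the operator.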

Maintenance Mode

bash
# Enable maintenance mode
kubectl set env deployment/fluxiq-api MAINTENANCE_MODE=true

# Disable maintenance mode
kubectl set env deployment/fluxiq-api MAINTENANCE_MODE=false

During maintenance mode:

  • Health checks continue to pass
  • All API requests return 503 with maintenance message
  • Webhooks are queued in NATS for later processing

Regular Maintenance

Weekly

  • [ ] Review error rate trends
  • [ ] Check NATS consumer lag
  • [ ] Verify backup integrity
  • [ ] Review security alerts

Monthly

  • [ ] Rotate API keys and secrets
  • [ ] Review and optimize slow queries
  • [ ] Update dependencies (security patches)
  • [ ] Review infrastructure costs

Quarterly

  • [ ] Full security audit
  • [ ] Load testing with updated traffic patterns
  • [ ] Review and update SLO targets
  • [ ] Update documentation

FluxiQ Core - PIX Payment Gateway