Runbooks
Step-by-step operational procedures for FluxiQ Core.
Deployment
Deploy API Server
bash
# 1. Build and push image
gcloud builds submit backend/ \
--tag southamerica-east1-docker.pkg.dev/PROJECT_ID/fluxiq/api:v1.2.0
# 2. Deploy to Cloud Run
gcloud run deploy fluxiq-api \
--image=southamerica-east1-docker.pkg.dev/PROJECT_ID/fluxiq/api:v1.2.0 \
--region=southamerica-east1
# 3. Verify health
curl -s https://api.fluxiq.com.br/health | jq .
# 4. Monitor for 15 minutes in Grafana
Rollback Deployment
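The rollback steps in this section need the name of the revision that should receive traffic. The sketch below is one way to look it up. It assumes that the second-newest revision is the last known-good one, which may not hold if several bad revisions were deployed in a row; `previous_revision` is a hypothetical helper name, not part of the FluxiQ tooling.

```bash
# Sketch: print the name of the second-newest revision of a Cloud Run service.
# Assumption: the second-newest revision is the one to roll back to.
previous_revision() {
  local service="$1" region="$2" name
  # List the two newest revisions, newest first, and keep the second line.
  name="$(gcloud run revisions list \
    --service="$service" \
    --region="$region" \
    --sort-by="~metadata.creationTimestamp" \
    --format="value(metadata.name)" \
    --limit=2 | sed -n '2p')"
  if [ -z "$name" ]; then
    echo "no previous revision found for ${service}" >&2
    return 1
  fi
  printf '%s\n' "$name"
}
```

The printed name can then be passed to `--to-revisions`, for example `--to-revisions="$(previous_revision fluxiq-api southamerica-east1)=100"`. Checking the name against the revision list before routing traffic is still advisable.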
bash
# 1. List recent revisions
gcloud run revisions list --service=fluxiq-api --region=southamerica-east1 --limit=5
# 2. Route 100% traffic to previous revision
gcloud run services update-traffic fluxiq-api \
--to-revisions=fluxiq-api-PREVIOUS_REVISION=100 \
--region=southamerica-east1
# 3. Verify rollback
curl -s https://api.fluxiq.com.br/health | jq .version
Database Operations
Run Migrations
bash
gcloud run jobs execute fluxiq-migrate --region=southamerica-east1
gcloud logging read "resource.labels.job_name=fluxiq-migrate" --limit=10
Database Backup
bash
# On-demand backup
gcloud sql backups create --instance=fluxiq-db
# List backups
gcloud sql backups list --instance=fluxiq-db --limit=5
# Point-in-time restore (clones into a new instance; requires point-in-time recovery to be enabled)
gcloud sql instances clone fluxiq-db fluxiq-db-restored \
--point-in-time="2026-02-03T12:00:00Z"
Emergency Database Access
bash
gcloud compute ssh fluxiq-admin --zone=southamerica-east1-a --tunnel-through-iap
# On the VM:
PGPASSWORD='PASSWORD' psql -h DB_IP -U fluxiq -d fluxiq
TigerBeetle Operations
Check TigerBeetle Health
bash
kubectl get pods -n tigerbeetle
kubectl logs -n tigerbeetle deployment/tigerbeetle --tail=50
Enable TigerBeetle Backend
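The health check and the backend switch in this procedure can be tied together so that the ledger is only reconfigured once TigerBeetle is actually ready. The following is a minimal sketch, assuming the pods carry the `app=tigerbeetle` label used elsewhere in this runbook; the function name `enable_tigerbeetle` and the timeout values are illustrative choices, not established settings.

```bash
# Sketch: switch the ledger to TigerBeetle only if its pods are Ready.
# Assumptions: pods are labelled app=tigerbeetle; timeouts are placeholders.
enable_tigerbeetle() {
  # Block until every TigerBeetle pod reports Ready, or give up after 60s.
  if ! kubectl wait --for=condition=Ready pod \
        -l app=tigerbeetle -n tigerbeetle --timeout=60s; then
    echo "TigerBeetle pods are not Ready; ledger backend left unchanged" >&2
    return 1
  fi
  # Flip the ledger service, then wait for the new pods to roll out.
  kubectl set env deployment/ledger -n fluxiq \
    TIGERBEETLE_ENABLED=true LEDGER_BACKEND=tigerbeetle &&
    kubectl rollout status deployment/ledger -n fluxiq --timeout=120s
}
```

A non-zero exit status means the switch did not complete. The circuit breaker still needs to be watched afterwards, since pod readiness alone does not prove that the ledger can reach TigerBeetle.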
bash
# 1. Verify TigerBeetle is healthy
kubectl get pods -n tigerbeetle
# 2. Enable in ledger service
kubectl set env deployment/ledger -n fluxiq \
TIGERBEETLE_ENABLED=true \
LEDGER_BACKEND=tigerbeetle
# 3. Monitor circuit breaker (should stay at 0 = closed)
Switch to Fallback (Manual)
bash
kubectl set env deployment/ledger -n fluxiq \
TIGERBEETLE_ENABLED=false \
LEDGER_BACKEND=midaz
NATS Operations
Check Stream Health
bash
kubectl exec -it nats-0 -- nats stream ls
kubectl exec -it nats-0 -- nats stream info FLUXIQ
kubectl exec -it nats-0 -- nats consumer ls FLUXIQ
Purge Stream (Emergency)
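Because a purge cannot be undone, it may be worth wrapping the command in an explicit confirmation step. The sketch below asks the operator to type the stream name before anything is deleted; `confirm_purge` is a hypothetical helper, and the pod name `nats-0` is taken from the commands in this runbook.

```bash
# Sketch: require the operator to type the stream name before purging it.
# Assumption: the NATS CLI is available inside the nats-0 pod.
confirm_purge() {
  local stream="$1" answer
  printf 'Type the stream name (%s) to purge ALL unprocessed messages: ' "$stream" >&2
  read -r answer
  if [ "$answer" != "$stream" ]; then
    echo "Confirmation did not match; nothing was purged." >&2
    return 1
  fi
  # No -t here: the wrapper may run without a terminal attached.
  kubectl exec nats-0 -- nats stream purge "$stream" --force
}
```

Calling `confirm_purge FLUXIQ` prompts once and exits with a non-zero status if the typed name differs.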
bash
# WARNING: This deletes all unprocessed messages
kubectl exec -it nats-0 -- nats stream purge FLUXIQ --force
Scaling Operations
Scale Cloud Run Services
bash
# Scale up for high traffic
gcloud run services update fluxiq-api \
--min-instances=5 --max-instances=200 \
--region=southamerica-east1
# Scale down after traffic spike
gcloud run services update fluxiq-api \
--min-instances=1 --max-instances=100 \
--region=southamerica-east1
Incident Response
Both Backends Down
Severity: CRITICAL | Time to resolve: <15 minutes
Check TigerBeetle pod status:
bash
kubectl get pods -n tigerbeetle
kubectl describe pod -n tigerbeetle -l app=tigerbeetle
Check PostgreSQL status:
bash
gcloud sql instances describe fluxiq-db --format="value(state)"
If TigerBeetle is down, restart it:
bash
kubectl rollout restart deployment/tigerbeetle -n tigerbeetle
If PostgreSQL is down, trigger a failover (this requires a high-availability instance):
bash
gcloud sql instances failover fluxiq-db
Monitor recovery via Grafana dashboard
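The checks in this procedure can be combined into a single triage helper that reports which backend looks unhealthy and which command the runbook suggests. This is a sketch under stated assumptions: a pod phase other than `Running` is treated as unhealthy (a coarse signal, since a running pod can still fail its readiness probe), `RUNNABLE` is treated as the healthy Cloud SQL state, and `triage_backends` is a hypothetical name. It only prints suggestions and does not execute any remediation.

```bash
# Sketch: report which ledger backend looks unhealthy and the suggested fix.
# Assumptions: phase "Running" and state "RUNNABLE" mean healthy.
triage_backends() {
  local phases phase db_state tb_ok=yes rc=0
  phases="$(kubectl get pods -n tigerbeetle -l app=tigerbeetle \
    -o jsonpath='{.items[*].status.phase}')"
  db_state="$(gcloud sql instances describe fluxiq-db --format='value(state)')"

  # No pods at all, or any pod not Running, counts as unhealthy.
  [ -n "$phases" ] || tb_ok=no
  for phase in $phases; do
    [ "$phase" = "Running" ] || tb_ok=no
  done

  if [ "$tb_ok" = no ]; then
    echo "TigerBeetle unhealthy -> kubectl rollout restart deployment/tigerbeetle -n tigerbeetle"
    rc=1
  fi
  if [ "$db_state" != "RUNNABLE" ]; then
    echo "PostgreSQL state '${db_state:-unknown}' -> gcloud sql instances failover fluxiq-db"
    rc=1
  fi
  [ "$rc" -ne 0 ] || echo "Both backends report healthy"
  return "$rc"
}
```

Keeping the remediation commands as printed suggestions leaves the decision with the on-call engineer, which seems safer for a CRITICAL incident than restarting or failing over automatically.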
Maintenance Mode
bash
# Enable maintenance mode
kubectl set env deployment/fluxiq-api MAINTENANCE_MODE=true
# Disable maintenance mode
kubectl set env deployment/fluxiq-api MAINTENANCE_MODE=false
During maintenance mode:
- Health checks continue to pass
- All API requests return 503 with maintenance message
- Webhooks are queued in NATS for later processing
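The first two behaviours listed above can be spot-checked from the command line. The sketch below assumes that a passing health check means HTTP 200 on `/health`; the API path passed as the second argument is a placeholder that the caller must supply, and both function names are hypothetical.

```bash
# Sketch: confirm that /health still passes while API requests return 503.
# Assumption: "health checks pass" means HTTP 200 on /health.
http_status() {
  # Print only the HTTP status code of a GET request.
  curl -s -o /dev/null -w '%{http_code}' "$1"
}

verify_maintenance_mode() {
  local base_url="$1" api_path="$2" health api
  health="$(http_status "${base_url}/health")"
  api="$(http_status "${base_url}${api_path}")"
  if [ "$health" = "200" ] && [ "$api" = "503" ]; then
    echo "maintenance mode active (health=${health}, api=${api})"
    return 0
  fi
  echo "unexpected state (health=${health}, api=${api})" >&2
  return 1
}
```

For example, `verify_maintenance_mode https://api.fluxiq.com.br /some/api/path`, where `/some/api/path` stands in for any real API route. The webhook queueing behaviour is not covered by this check.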
Regular Maintenance
Weekly
- [ ] Review error rate trends
- [ ] Check NATS consumer lag
- [ ] Verify backup integrity
- [ ] Review security alerts
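For the backup item in the weekly list, a quick first signal is whether the most recent backup run finished successfully. The sketch below assumes that sorting by `windowStartTime` puts the latest run first; `latest_backup_ok` is a hypothetical helper. A `SUCCESSFUL` status does not prove the backup is restorable, so a periodic test restore is still needed to verify integrity.

```bash
# Sketch: check that the newest Cloud SQL backup run reports SUCCESSFUL.
# Assumption: sorting by windowStartTime (descending) yields the latest run.
latest_backup_ok() {
  local instance="$1" status
  status="$(gcloud sql backups list --instance="$instance" \
    --sort-by="~windowStartTime" --limit=1 --format='value(status)')"
  if [ "$status" = "SUCCESSFUL" ]; then
    echo "latest backup for ${instance}: SUCCESSFUL"
    return 0
  fi
  echo "latest backup for ${instance}: ${status:-none found}" >&2
  return 1
}
```

Running `latest_backup_ok fluxiq-db` exits with a non-zero status when the latest run failed or no backup exists, which makes it easy to wire into a scheduled check.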
Monthly
- [ ] Rotate API keys and secrets
- [ ] Review and optimize slow queries
- [ ] Update dependencies (security patches)
- [ ] Review infrastructure costs
Quarterly
- [ ] Full security audit
- [ ] Load testing with updated traffic patterns
- [ ] Review and update SLO targets
- [ ] Update documentation