Troubleshooting
Common issues and their solutions for FluxiQ Core.
Application Issues
Service Not Starting
Symptom: Cloud Run service fails to start, health check failures.
Check logs:
gcloud logging read "resource.labels.service_name=fluxiq-api" --limit=20
Common causes:
- Database connection failed: Verify Cloud SQL IP and VPC connector
- Missing environment variable: Check all required env vars are set
- Port mismatch: Ensure the app listens on $PORT (default 8080 for Cloud Run)
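A port mismatch usually comes from a server that binds to a hard-coded port. The following is a minimal sketch of reading the port from the environment, assuming a Python service; `resolve_port` is a hypothetical helper name and not part of FluxiQ Core.

```python
import os


def resolve_port(env=None, default=8080):
    """Return the port to listen on: the PORT variable if set, else the default."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "")
    if not raw:
        return default
    port = int(raw)  # raises ValueError if PORT is not numeric
    if not 1 <= port <= 65535:
        raise ValueError(f"PORT out of range: {port}")
    return port


if __name__ == "__main__":
    # Bind to all interfaces so the platform's health checks can reach the app.
    print(f"listening on 0.0.0.0:{resolve_port()}")
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.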
High Latency
Symptom: P95 latency exceeds 200ms.
Investigation steps:
- Check TigerBeetle health: ledger_backend_health metric
- Check PostgreSQL query duration: pg_query_duration_seconds
- Check Redis connection count: redis_connections_active
- Check NATS consumer lag: nats_consumer_lag
Common causes:
- Database connection pool exhaustion: Increase DATABASE_POOL_SIZE
- TigerBeetle circuit open: Traffic routed to slower fallback
- Cold start: Increase MIN_INSTANCES to avoid cold starts
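To confirm the symptom from raw request timings, the 200 ms P95 threshold can be checked by hand. This is a minimal sketch using the nearest-rank percentile definition; it assumes latencies are available as a list of millisecond values, and the function names are hypothetical. Monitoring systems often use interpolation or histogram buckets instead, so their figures may differ slightly.

```python
def p95(samples_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies in ms."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def exceeds_threshold(samples_ms, threshold_ms=200):
    """True when the P95 latency is above the threshold (200 ms by default)."""
    return p95(samples_ms) > threshold_ms
```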
429 Rate Limit Errors
Symptom: Clients receiving 429 Too Many Requests.
Solution: Check rate limiting configuration. Health check endpoints should be excluded from rate limiting.
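The exclusion of health checks can be illustrated with a small limiter. This is a sketch of a fixed-window counter under stated assumptions, not FluxiQ Core's actual rate limiter: the class name, the `/health` path, and the limits are all hypothetical.

```python
import time


class FixedWindowLimiter:
    """Per-client fixed-window counter that never limits exempt paths."""

    def __init__(self, limit, window_seconds, exempt_paths=(), clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.exempt = frozenset(exempt_paths)
        self.clock = clock
        self._counters = {}  # client_id -> (window_start, request_count)

    def check(self, client_id, path):
        """Return the HTTP status to send: 200 if allowed, 429 if limited."""
        if path in self.exempt:
            return 200  # health checks are not counted at all
        now = self.clock()
        start, count = self._counters.get(client_id, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the previous window has ended
        if count >= self.limit:
            return 429
        self._counters[client_id] = (start, count + 1)
        return 200
```

If health probes share a client identity with real traffic and are not exempt, they consume quota and can trigger 429 responses for legitimate requests.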
PIX Issues
Duplicate Payments
Prevention:
- Always use idempotency_key for PIX payouts
- Check for existing transactions before creating new ones
- Use database-level unique constraints on reference codes
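The prevention steps above can be combined so that the unique constraint is the final safeguard. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL; the `payouts` table, its columns other than `idempotency_key`, and the `create_payout` function are hypothetical.

```python
import sqlite3


def create_payout(conn, idempotency_key, amount_cents):
    """Insert a payout once per idempotency key; return (payout_id, created)."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO payouts (idempotency_key, amount_cents) VALUES (?, ?)",
                (idempotency_key, amount_cents),
            )
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # The key already exists: return the earlier payout instead of paying twice.
        row = conn.execute(
            "SELECT id FROM payouts WHERE idempotency_key = ?", (idempotency_key,)
        ).fetchone()
        return row[0], False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payouts ("
    " id INTEGER PRIMARY KEY,"
    " idempotency_key TEXT NOT NULL UNIQUE,"
    " amount_cents INTEGER NOT NULL)"
)
first = create_payout(conn, "order-123", 5000)   # new payout is created
second = create_payout(conn, "order-123", 5000)  # same key: existing payout returned
```

Relying on the constraint rather than a prior lookup alone avoids the race where two concurrent requests both find no existing row.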
Payout Stuck in Processing
Symptom: PIX payout remains in processing status for more than 5 minutes.
Investigation:
# Check NATS consumer lag
nats consumer info FLUXIQ payout
# Check provider status
curl -s https://api.provider.com.br/health
Resolution:
- Check if the provider received the request
- If not, retry the payout: POST /api/v1/pix/payouts/{id}/retry
- If the provider confirms but the status is not updated, sync the status manually
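The resolution steps above form a small decision tree, sketched here as a pure function. The function name, its parameters, and the returned labels are hypothetical; the "investigate" branch covers a case the steps above do not describe and is an assumption.

```python
def next_action(age_seconds, provider_received, provider_confirmed):
    """Pick a resolution step for a payout that is still in processing."""
    if age_seconds <= 300:
        return "wait"  # not yet past the 5-minute threshold from the symptom
    if not provider_received:
        return "retry"  # POST /api/v1/pix/payouts/{id}/retry
    if provider_confirmed:
        return "sync"  # provider is done; update the local status manually
    # Assumption: the provider has the request but has not confirmed it yet.
    return "investigate"
```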
Infrastructure Issues
TigerBeetle Connection Failure
Symptom: ledger_circuit_breaker_state = 1 (Open)
Investigation:
kubectl get pods -n tigerbeetle
kubectl logs -n tigerbeetle deployment/tigerbeetle --tail=50
Common causes:
- Version mismatch: Client version must be <= server version
- Disk full: TigerBeetle needs disk space for WAL
- OOM killed: Increase memory limits
Automatic recovery: The circuit breaker attempts recovery after a 30-second timeout.
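The recovery behavior can be pictured with a generic circuit breaker. This is a sketch of the common closed/open/half-open pattern, not FluxiQ Core's implementation: only the value 1 for Open and the 30-second timeout come from this page, while the other state values, the failure threshold, and all names are assumptions.

```python
import time

# 1 = Open matches the metric above; 0 and 2 are assumed values.
CLOSED, OPEN, HALF_OPEN = 0, 1, 2


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = CLOSED
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        """Return True if the primary backend may be tried."""
        if self.state == OPEN and self.clock() - self.opened_at >= self.recovery_timeout:
            # Timeout elapsed: allow trial requests until one succeeds or fails.
            self.state = HALF_OPEN
        return self.state != OPEN

    def record_success(self):
        self.state = CLOSED
        self.failures = 0

    def record_failure(self):
        self.failures += 1
        # A failed trial request, or too many failures, (re)opens the circuit.
        if self.state == HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = OPEN
            self.opened_at = self.clock()
```

While the circuit is open, callers would route to the fallback path, which matches the latency increase described under High Latency.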
NATS Connection Issues
Symptom: Messages not being processed, consumer lag increasing.
kubectl exec -it nats-0 -- nats server info
kubectl exec -it nats-0 -- nats stream info FLUXIQ
kubectl exec -it nats-0 -- nats consumer info FLUXIQ default
Redis Memory Issues
Symptom: Redis approaching memory limit.
redis-cli info memory
Solution:
- Review TTL settings for cached data
- Clear expired idempotency keys
- Scale up Redis instance if needed
PostgreSQL Issues
Symptom: Slow queries or connection errors.
# Check active connections via admin VM
psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC
LIMIT 10;"
Emergency Procedures
Complete Service Outage
- Check GCP status page: https://status.cloud.google.com/
- Verify load balancer health: gcloud compute backend-services get-health
- Check Cloud Run service status: gcloud run services describe fluxiq-api
- Review recent deployments: gcloud run revisions list --service=fluxiq-api --limit=5
- Roll back if a recent deployment caused the issue
Security Incident
- Rotate all API keys and secrets immediately
- Revoke compromised JWT tokens (add to Redis blacklist)
- Review access logs for unauthorized activity
- Notify affected merchants
- Document incident and conduct post-mortem
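For the token revocation step above, a blacklist entry only needs to live until the token itself would have expired. The sketch below models that with an in-memory dictionary standing in for Redis; the `Blacklist` class is hypothetical, and it assumes tokens carry the standard `jti` (token ID) and `exp` (expiry) claims.

```python
import time


class Blacklist:
    """In-memory stand-in for a Redis-backed blacklist of revoked token IDs."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self._expires = {}  # jti -> unix time at which the entry can be dropped

    def revoke(self, jti, exp):
        """Blacklist a token until its own expiry time (the JWT exp claim)."""
        if exp > self.clock():
            self._expires[jti] = exp
        # Tokens that have already expired need no entry.

    def is_revoked(self, jti):
        exp = self._expires.get(jti)
        if exp is None:
            return False
        if exp <= self.clock():
            del self._expires[jti]  # mimic Redis key expiry
            return False
        return True
```

Tying each entry's lifetime to the token's expiry keeps the blacklist from growing without bound, which also helps with the Redis memory pressure described earlier.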