Troubleshooting

Common issues and their solutions for FluxiQ Core.

Application Issues

Service Not Starting

Symptom: The Cloud Run service fails to start or fails its health checks.

Check logs:

bash
gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="fluxiq-api"' --limit=20

Common causes:

  1. Database connection failed — Verify Cloud SQL IP and VPC connector
  2. Missing environment variable — Check all required env vars are set
  3. Port mismatch — Ensure the app listens on $PORT (default 8080 for Cloud Run)
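
The second cause above can be checked before deploying. A minimal sketch, assuming bash; the variable names in the usage comment are placeholders, not the confirmed list of variables FluxiQ Core requires:

```bash
# Report required environment variables that are unset or empty.
# Usage: missing_env VAR1 VAR2 ...
# Prints one missing name per line; returns non-zero if any are missing.
missing_env() {
  local name status=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      printf '%s\n' "$name"
      status=1
    fi
  done
  return "$status"
}

# Example (placeholder names, replace with your service's real list):
# missing_env PORT DATABASE_URL DATABASE_POOL_SIZE || echo "set the variables listed above"
```

Running this in the container entrypoint makes a missing variable show up as a named line in the logs instead of a generic startup failure.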

High Latency

Symptom: P95 latency exceeds 200ms.

Investigation steps:

  1. Check TigerBeetle health: ledger_backend_health metric
  2. Check PostgreSQL query duration: pg_query_duration_seconds
  3. Check Redis connection count: redis_connections_active
  4. Check NATS consumer lag: nats_consumer_lag

Common causes:

  1. Database connection pool exhaustion — Increase DATABASE_POOL_SIZE
  2. TigerBeetle circuit open — Traffic routed to slower fallback
  3. Cold start — Increase MIN_INSTANCES to avoid cold starts
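
To read the metrics from the investigation steps without a dashboard, the sketch below pulls a single metric out of Prometheus text-format output. It assumes the service exposes these metrics in the Prometheus exposition format without sample timestamps; the METRICS_URL variable in the usage comment is hypothetical.

```bash
# Print the value of every sample of one metric from Prometheus text format on stdin.
# Assumes samples carry no trailing timestamp, so the last field is the value.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                      # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)         # drop the label set, if any
      if (metric == name) print $NF    # last field is the sample value
    }
  '
}

# Example (METRICS_URL is a hypothetical endpoint):
# curl -s "$METRICS_URL" | metric_value nats_consumer_lag
```

Checking ledger_backend_health first is a reasonable order, since an open TigerBeetle circuit routes traffic to the slower fallback and can explain the latency on its own.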

429 Rate Limit Errors

Symptom: Clients receiving 429 Too Many Requests.

Solution: Review the rate limiting configuration and confirm the limits match expected client traffic. Exclude health check endpoints from rate limiting.
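
One way to confirm that a health check endpoint is excluded is to send it a burst of requests and count the 429 responses. This is a sketch only; the URL in the usage comment is a placeholder, and the burst size should stay well above the configured limit for the check to mean anything.

```bash
# Print the HTTP status of N sequential GET requests to URL, one per line.
probe_statuses() {
  local url=$1 n=$2 i
  for ((i = 0; i < n; i++)); do
    curl -s -o /dev/null -w '%{http_code}\n' "$url"
  done
}

# Count how many status codes on stdin are exactly 429.
count_429() {
  # grep -c exits non-zero when the count is 0, so mask that exit status.
  grep -c '^429$' || true
}

# Example (placeholder URL): a result of 0 suggests the endpoint is not rate limited.
# probe_statuses "https://api.example.com/health" 50 | count_429
```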

PIX Issues

Duplicate Payments

Prevention:

  • Always use idempotency_key for PIX payouts
  • Check for existing transactions before creating new ones
  • Use database-level unique constraints on reference codes
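
An idempotency key only prevents duplicates if a retry sends the same key as the original request. The sketch below derives the key from the merchant-side reference so that it is stable across retries. The endpoint path, the body field names other than idempotency_key, and the FLUXIQ_API_TOKEN variable are assumptions for illustration, not the documented API.

```bash
# Derive a stable 64-character hex key from a payout reference.
# The same reference always yields the same key, so a retry reuses it.
idempotency_key_for() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# Send a payout request carrying the key (hypothetical path and body fields).
# The reference is interpolated into JSON as-is, so it must not contain quotes.
create_payout() {
  local base_url=$1 reference=$2 amount=$3 key
  key=$(idempotency_key_for "$reference")
  curl -sS -X POST "$base_url/api/v1/pix/payouts" \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${FLUXIQ_API_TOKEN:-}" \
    -d "{\"idempotency_key\":\"$key\",\"reference\":\"$reference\",\"amount\":$amount}"
}

# Example (placeholder host and reference):
# create_payout "https://api.example.com" "order-1001" 1500
```

Generating a fresh random key on every attempt would defeat the purpose, because the server would see each retry as a new payout.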

Payout Stuck in Processing

Symptom: PIX payout remains in processing status for more than 5 minutes.

Investigation:

bash
# Check NATS consumer lag
nats consumer info FLUXIQ payout

# Check provider status
curl -s https://api.provider.com.br/health

Resolution:

  1. Check if the provider received the request
  2. If not, retry the payout: POST /api/v1/pix/payouts/{id}/retry
  3. If the provider confirms the payout but the status was not updated, sync the status manually
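
The resolution steps above can be sketched as a small decision helper plus the retry call. The "wait" outcome for a payout the provider received but has not yet confirmed is an assumption, as are the base URL and the FLUXIQ_API_TOKEN variable; only the retry path comes from the steps above.

```bash
# Decide the next step from two observations, each "yes" or "no".
next_action() {
  local provider_received=$1 provider_confirmed=$2
  if [ "$provider_received" != "yes" ]; then
    echo retry          # provider never saw the request
  elif [ "$provider_confirmed" = "yes" ]; then
    echo manual_sync    # provider is done, local status is stale
  else
    echo wait           # assumption: still in flight at the provider
  fi
}

# Call the retry endpoint for one payout.
retry_payout() {
  local base_url=$1 payout_id=$2
  curl -sS -X POST "$base_url/api/v1/pix/payouts/$payout_id/retry" \
    -H "Authorization: Bearer ${FLUXIQ_API_TOKEN:-}"
}

# Example (placeholder host and payout id):
# [ "$(next_action no no)" = retry ] && retry_payout "https://api.example.com" "PAYOUT_ID"
```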

Infrastructure Issues

TigerBeetle Connection Failure

Symptom: ledger_circuit_breaker_state = 1 (Open)

Investigation:

bash
kubectl get pods -n tigerbeetle
kubectl logs -n tigerbeetle deployment/tigerbeetle --tail=50

Common causes:

  1. Version mismatch — Client version must be <= server version
  2. Disk full — TigerBeetle needs disk space for WAL
  3. OOM killed — Increase memory limits

Automatic recovery: The circuit breaker attempts recovery after a 30-second timeout.
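
To check the OOM cause directly, the sketch below lists the last termination reason of each pod and filters for OOMKilled. It assumes the tigerbeetle namespace used in the commands above and that kubectl is already pointed at the right cluster.

```bash
# Print "pod-name<TAB>reason" for each pod, where reason is why a container
# last terminated (empty if it never has).
last_termination_reasons() {
  kubectl get pods -n tigerbeetle \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}'
}

# Read that output on stdin and print the names of pods that were OOM killed.
oom_killed_pods() {
  awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
}

# Example:
# last_termination_reasons | oom_killed_pods
```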

NATS Connection Issues

Symptom: Messages not being processed, consumer lag increasing.

bash
kubectl exec -it nats-0 -- nats server info
kubectl exec -it nats-0 -- nats stream info FLUXIQ
kubectl exec -it nats-0 -- nats consumer info FLUXIQ default

Redis Memory Issues

Symptom: Redis approaching memory limit.

bash
redis-cli info memory

Solution:

  1. Review TTL settings for cached data
  2. Clear expired idempotency keys
  3. Scale up Redis instance if needed
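
To quantify "approaching memory limit", the sketch below turns the output of redis-cli info memory into a usage percentage, using the used_memory and maxmemory fields that Redis reports. A maxmemory of 0 means no limit is configured, which is printed as "unlimited".

```bash
# Read `redis-cli info memory` output on stdin and print used memory as a
# whole-number percentage of maxmemory, or "unlimited" if no limit is set.
redis_memory_pct() {
  # INFO lines end in CRLF, so strip carriage returns before parsing.
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max > 0) printf "%d\n", used * 100 / max
      else print "unlimited"
    }
  '
}

# Example:
# redis-cli info memory | redis_memory_pct
```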

PostgreSQL Issues

Symptom: Slow queries or connection errors.

bash
# Check active connections via admin VM
psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query
         FROM pg_stat_activity
         WHERE state != 'idle'
         ORDER BY duration DESC
         LIMIT 10;"

Emergency Procedures

Complete Service Outage

  1. Check GCP status page: https://status.cloud.google.com/
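  2. Verify load balancer health: gcloud compute backend-services get-health BACKEND_SERVICE_NAME --global (substitute the backend service name; use --region=REGION instead of --global for a regional load balancer)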
  3. Check Cloud Run service status: gcloud run services describe fluxiq-api
  4. Review recent deployments: gcloud run revisions list --service=fluxiq-api --limit=5
  5. Roll back if a recent deployment caused the issue
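
On Cloud Run, one way to perform the rollback in the last step is to shift all traffic to a known-good revision with gcloud run services update-traffic. The sketch below only prints the command so it can be reviewed before running; the revision name in the example is a placeholder taken from the output of the revisions list step.

```bash
# Print (do not run) the command that sends 100% of traffic to one revision.
rollback_command() {
  local service=$1 revision=$2
  printf 'gcloud run services update-traffic %s --to-revisions=%s=100\n' \
    "$service" "$revision"
}

# Example (placeholder revision name):
# rollback_command fluxiq-api fluxiq-api-00041-abc
```

Printing first is a deliberate choice for an outage, where a mistyped revision name would otherwise extend the incident.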

Security Incident

  1. Rotate all API keys and secrets immediately
  2. Revoke compromised JWT tokens (add to Redis blacklist)
  3. Review access logs for unauthorized activity
  4. Notify affected merchants
  5. Document incident and conduct post-mortem
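
For the token revocation step, a blacklist entry only needs to live until the token would have expired anyway. The sketch below computes that lifetime and writes the entry with an expiry. The jwt:blacklist: key prefix and the use of the token's jti claim as the identifier are assumptions; match them to whatever the application actually checks.

```bash
# Seconds from NOW until EXP (both Unix timestamps), floored at 0.
# NOW defaults to the current time.
blacklist_ttl() {
  local exp=$1 now=${2:-$(date +%s)}
  local ttl=$((exp - now))
  [ "$ttl" -gt 0 ] || ttl=0
  echo "$ttl"
}

# Add a token id to the Redis blacklist until the token's own expiry.
# Tokens that have already expired need no entry.
revoke_token() {
  local jti=$1 exp=$2 ttl
  ttl=$(blacklist_ttl "$exp")
  if [ "$ttl" -gt 0 ]; then
    redis-cli SET "jwt:blacklist:$jti" 1 EX "$ttl"   # hypothetical key format
  fi
}

# Example (placeholder token id and expiry):
# revoke_token "TOKEN_JTI" 1767225600
```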

FluxiQ Core - PIX Payment Gateway