Troubleshooting

Common issues and their solutions for FluxiQ Core.

Application Issues

Service Not Starting

Symptom: The Cloud Run service fails to start or fails its health checks.

Check logs:

bash

gcloud logging read 'resource.type="cloud_run_revision" AND resource.labels.service_name="fluxiq-api"' --limit=20

Common causes:

Database connection failed: verify the Cloud SQL IP address and the VPC connector configuration
Missing environment variable: check that every required environment variable is set
Port mismatch: ensure the app listens on $PORT (default 8080 for Cloud Run)
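The environment check above can be scripted. The following is a minimal sketch, not part of FluxiQ Core: the function name `missing_env_vars` is hypothetical, and the list of required variables is an assumption because this page does not enumerate them.

```bash
#!/usr/bin/env bash
# Print the name of every listed variable that is unset or empty.
# Returns non-zero if at least one is missing.
# NOTE: which variables are required is an assumption; pass your own list.
missing_env_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_env_vars PORT DATABASE_POOL_SIZE` prints the names that are not set in the current shell, which can be compared against the variables configured on the Cloud Run service.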

High Latency

Symptom: P95 latency exceeds 200ms.

Investigation steps:

Check TigerBeetle health: ledger_backend_health metric
Check PostgreSQL query duration: pg_query_duration_seconds
Check Redis connection count: redis_connections_active
Check NATS consumer lag: nats_consumer_lag

Common causes:

Database connection pool exhaustion: increase DATABASE_POOL_SIZE
TigerBeetle circuit breaker open: traffic is routed to the slower fallback path
Cold start: increase MIN_INSTANCES so that warm instances are always available
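To read the metrics named in the investigation steps from the command line, something like the sketch below may help. It assumes the metrics are exposed in Prometheus text exposition format, which this page does not state. The function name `metric_value` is hypothetical, and the parser is deliberately simple: it ignores labels and assumes samples carry no trailing timestamp.

```bash
#!/usr/bin/env bash
# Print the value of the first sample whose metric name matches $1 exactly.
# Reads Prometheus text exposition format on stdin (an assumption about how
# the metrics are exposed). Labels are ignored; timestamps are not supported.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                  # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/\{.*/, "", metric)      # drop any {label="..."} suffix
      if (metric == name) { print $NF; exit }
    }'
}
```

Usage would look like `curl -s "$METRICS_URL" | metric_value nats_consumer_lag`, where `METRICS_URL` is a placeholder for wherever your deployment serves metrics.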

429 Rate Limit Errors

Symptom: Clients receiving 429 Too Many Requests.

Solution: Review the rate limiting configuration and confirm that the limits match expected client traffic. Health check endpoints should be excluded from rate limiting.
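One way to confirm that a health check endpoint is excluded is to call it repeatedly and count the 429 responses. The sketch below only does the counting; the helper name `count_429` and the `HEALTH_URL` placeholder are assumptions, not FluxiQ names.

```bash
#!/usr/bin/env bash
# Count 429 responses in a list of HTTP status codes read from stdin
# (one code per line). Prints 0 when there are none.
count_429() {
  grep -c '^429$' || true   # grep -c exits non-zero on zero matches; keep going
}

# Hypothetical usage (HEALTH_URL is a placeholder, not a documented endpoint):
#   for i in $(seq 1 50); do
#     curl -s -o /dev/null -w '%{http_code}\n' "$HEALTH_URL"
#   done | count_429
```

A result above zero for a health endpoint suggests the exclusion is not in effect.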

PIX Issues

Duplicate Payments

Prevention:

Always use idempotency_key for PIX payouts
Check for existing transactions before creating new ones
Use database-level unique constraints on reference codes
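A deterministic idempotency key makes a retried request carry the same key as the original. The following is a sketch under assumptions: the function name and the choice of fields (merchant ID, reference code, amount in cents) are illustrative, and the exact fields that should define "the same payout" depend on how the API interprets idempotency_key.

```bash
#!/usr/bin/env bash
# Derive a stable idempotency key from the fields that identify one payout.
# The field choice is an assumption for illustration only.
payout_idempotency_key() {
  local merchant_id="$1" reference="$2" amount_cents="$3"
  # Same inputs always hash to the same 64-character hex key.
  printf '%s|%s|%s' "$merchant_id" "$reference" "$amount_cents" \
    | sha256sum | cut -d' ' -f1
}
```

Calling `payout_idempotency_key m_1 ORDER-42 1500` twice yields the same key, so a client retry after a timeout does not create a second payout as long as the server enforces the key.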

Payout Stuck in Processing

Symptom: PIX payout remains in processing status for more than 5 minutes.

Investigation:

bash

# Check NATS consumer lag
nats consumer info FLUXIQ payout

# Check provider status
curl -s https://api.provider.com.br/health

Resolution:

Check whether the provider received the request
If it did not, retry the payout: POST /api/v1/pix/payouts/{id}/retry
If the provider confirms the payout but the status has not been updated, sync the status manually
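The resolution steps above amount to a small decision table, sketched below. The function name, the "yes"/"no" input, the provider status value "confirmed", and the "wait" outcome are all assumptions added for illustration; only the retry and manual sync actions come from this page.

```bash
#!/usr/bin/env bash
# Map the findings from the investigation to the next action.
#   $1: "yes" if the provider received the request, anything else otherwise
#   $2: payout status reported by the provider (hypothetical value names)
payout_next_action() {
  local provider_received="$1" provider_status="${2:-unknown}"
  if [ "$provider_received" != "yes" ]; then
    echo "retry"         # POST /api/v1/pix/payouts/{id}/retry
  elif [ "$provider_status" = "confirmed" ]; then
    echo "manual-sync"   # provider is done; the local status is stale
  else
    echo "wait"          # assumption: provider still working, check again later
  fi
}
```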

Infrastructure Issues

TigerBeetle Connection Failure

Symptom: ledger_circuit_breaker_state = 1 (Open)

Investigation:

bash

kubectl get pods -n tigerbeetle
kubectl logs -n tigerbeetle deployment/tigerbeetle --tail=50

Common causes:

Version mismatch: the client version must be <= the server version
Disk full: TigerBeetle needs free disk space for its write-ahead log (WAL)
OOM killed: increase the pod's memory limits

Automatic recovery: The circuit breaker attempts recovery after a 30-second timeout.
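After fixing the underlying cause, it can be useful to wait for the breaker to leave the Open state rather than checking by hand. The sketch below polls any command that prints the current value of ledger_circuit_breaker_state. The function name is hypothetical, and because this page only documents 1 as Open, the code treats every other value as "no longer open" without naming it.

```bash
#!/usr/bin/env bash
# Poll a command that prints the breaker state until it stops printing "1" (Open).
#   $1: maximum number of checks
#   $2: seconds to sleep between checks
#   remaining arguments: command that prints the current state
wait_for_circuit_recovery() {
  local attempts="$1" interval="$2" state i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    state="$("$@")"
    if [ "$state" != "1" ]; then
      echo "recovered after $i check(s), state=$state"
      return 0
    fi
    sleep "$interval"
  done
  echo "still open after $attempts check(s)" >&2
  return 1
}
```

With the 30-second recovery timeout in mind, a call such as `wait_for_circuit_recovery 6 10 read_breaker_state` would cover about a minute, where `read_breaker_state` stands in for however you query the metric.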

NATS Connection Issues

Symptom: Messages are not being processed and consumer lag is increasing.

bash

kubectl exec -it nats-0 -- nats server info
kubectl exec -it nats-0 -- nats stream info FLUXIQ
kubectl exec -it nats-0 -- nats consumer info FLUXIQ default
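To tell whether lag is actually growing, compare several samples taken a few seconds apart. The helper below is a sketch that classifies a series of lag values. How the samples are collected is left open; one option, stated here as an assumption, is the pending message count from the consumer info output.

```bash
#!/usr/bin/env bash
# Read lag samples (one integer per line, oldest first) and report the trend
# by comparing the first and last sample.
lag_trend() {
  awk '
    NR == 1 { first = $1 }
    { last = $1 }
    END {
      if (NR < 2)            print "insufficient-data"
      else if (last > first) print "increasing"
      else if (last < first) print "decreasing"
      else                   print "flat"
    }'
}
```

A steadily "increasing" result while publishers are active points at a stalled or undersized consumer rather than a brief burst.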

Redis Memory Issues

Symptom: Redis approaching memory limit.

bash

redis-cli info memory

Solution:

Review TTL settings for cached data
Clear expired idempotency keys
Scale up Redis instance if needed
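The output of the memory command above can be reduced to a single utilization figure. This sketch parses the `used_memory` and `maxmemory` fields of the INFO memory section; the function name is hypothetical, and managed Redis offerings may report limits differently, so treat the result as a rough indicator.

```bash
#!/usr/bin/env bash
# Read `redis-cli info memory` output on stdin and print used memory as a
# whole-number percentage of maxmemory, or "unlimited" when maxmemory is 0.
redis_memory_pct() {
  # INFO lines end in CRLF, so strip carriage returns before parsing.
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      if (max + 0 == 0) print "unlimited"
      else printf "%d\n", used * 100 / max
    }'
}
```

Usage: `redis-cli info memory | redis_memory_pct`.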

PostgreSQL Issues

Symptom: Slow queries or connection errors.

bash

# List the 10 longest-running non-idle queries (run from the admin VM)
psql -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query
         FROM pg_stat_activity
         WHERE state != 'idle'
         ORDER BY duration DESC
         LIMIT 10;"

Emergency Procedures

Complete Service Outage

Check GCP status page: https://status.cloud.google.com/
Verify load balancer health: gcloud compute backend-services get-health
Check Cloud Run service status: gcloud run services describe fluxiq-api
Review recent deployments: gcloud run revisions list --service=fluxiq-api --limit=5
Roll back if a recent deployment caused the issue
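For the rollback step, the sketch below builds the traffic-shifting command without running it, so the target can be reviewed first. It assumes the revision names arrive newest first and that the revision immediately before the current one is a safe target; neither is guaranteed, so check the list before executing anything. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Given revision names on stdin (assumed newest first), print the command that
# would send all traffic to the previous revision. Nothing is executed.
rollback_command() {
  local service="$1" previous
  previous="$(sed -n '2p')"   # second line = revision before the newest one
  if [ -z "$previous" ]; then
    echo "no previous revision to roll back to" >&2
    return 1
  fi
  echo "gcloud run services update-traffic $service --to-revisions=$previous=100"
}
```

It could be fed with `gcloud run revisions list --service=fluxiq-api --limit=5 --format='value(metadata.name)' | rollback_command fluxiq-api`, then the printed command run manually once confirmed.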

Security Incident

Rotate all API keys and secrets immediately
Revoke compromised JWTs (add them to the Redis blacklist)
Review access logs for unauthorized activity
Notify affected merchants
Document incident and conduct post-mortem
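For the token revocation step, a blacklist entry only needs to live until the token itself expires. The sketch below prints a Redis command with a matching expiry. The key prefix `jwt:blacklist:`, the use of the `jti` claim as identifier, and the function name are assumptions; the actual blacklist layout used by FluxiQ Core is not described on this page.

```bash
#!/usr/bin/env bash
# Print the redis-cli command that blacklists one token until it expires.
#   $1: token ID (jti claim)
#   $2: token expiry (exp claim, Unix seconds)
#   $3: current time in Unix seconds (defaults to now)
# The key prefix is hypothetical; match it to your application's scheme.
blacklist_command() {
  local jti="$1" exp="$2" now="${3:-$(date +%s)}" ttl
  ttl=$(( exp - now ))
  if [ "$ttl" -le 0 ]; then
    echo "token already expired, nothing to blacklist" >&2
    return 1
  fi
  echo "redis-cli SET jwt:blacklist:$jti 1 EX $ttl"
}
```

Setting the expiry to the remaining token lifetime keeps the blacklist from growing without bound, which also helps with the Redis memory concerns covered earlier.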
