Monitoring

FluxiQ Core exposes Prometheus metrics, ships with Grafana dashboards, and defines alert policies for the application, the ledger, and the supporting infrastructure.

Metrics Overview

Application Metrics

Metric	Type	Description
`fluxiq_http_requests_total`	Counter	Total HTTP requests by method, path, status
`fluxiq_http_request_duration_seconds`	Histogram	Request latency distribution
`fluxiq_pix_transactions_total`	Counter	PIX transactions by type and status
`fluxiq_pix_transaction_amount_total`	Counter	Total PIX transaction amount (centavos)
`fluxiq_account_balance`	Gauge	Current account balance
`fluxiq_active_connections`	Gauge	Active WebSocket connections
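To make the "by method, path, status" wording concrete, here is a minimal sketch of how a labeled counter such as `fluxiq_http_requests_total` appears in the Prometheus text exposition format. The `LabeledCounter` class and the `/charges` path are hypothetical and exist only for illustration. A real service would normally use a Prometheus client library, which also handles label-value escaping.

```python
from collections import defaultdict


class LabeledCounter:
    """Toy stand-in for a Prometheus counter with labels (illustration only)."""

    def __init__(self, name, label_names):
        self.name = name
        self.label_names = tuple(label_names)
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        # Counters only go up, so negative increments are rejected.
        if amount < 0:
            raise ValueError("counter increments must be non-negative")
        key = tuple(str(labels[n]) for n in self.label_names)
        self._values[key] += amount

    def render(self):
        """Return all samples as Prometheus text exposition lines."""
        lines = [f"# TYPE {self.name} counter"]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {value:g}")
        return "\n".join(lines)


requests_total = LabeledCounter(
    "fluxiq_http_requests_total", ["method", "path", "status"]
)
requests_total.inc(method="GET", path="/health", status=200)
requests_total.inc(method="GET", path="/health", status=200)
requests_total.inc(method="POST", path="/charges", status=500)  # hypothetical path
print(requests_total.render())
```

Each distinct label combination becomes its own time series, so keeping `path` to route templates rather than raw URLs helps limit cardinality.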

Ledger Metrics

Metric	Type	Description
`ledger_operations_total`	Counter	Operations by backend (tigerbeetle/fallback) and status
`ledger_operation_duration_seconds`	Histogram	Ledger operation latency
`ledger_circuit_breaker_state`	Gauge	Circuit breaker state (0=closed, 1=open, 2=half-open)
`ledger_fallback_total`	Counter	Fallback events
`ledger_backend_health`	Gauge	Backend health (1=healthy, 0=unhealthy)
`ledger_active_backend`	Gauge	Currently active backend
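The gauge encoding for `ledger_circuit_breaker_state` (0=closed, 1=open, 2=half-open) can be illustrated with a small state machine. This is a generic circuit breaker sketch, not FluxiQ's implementation: the class name, the failure threshold, and the reset timeout are all assumptions.

```python
import time
from enum import IntEnum


class BreakerState(IntEnum):
    # Values mirror the ledger_circuit_breaker_state gauge encoding.
    CLOSED = 0
    OPEN = 1
    HALF_OPEN = 2


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.reset_timeout = reset_timeout          # seconds, hypothetical default
        self._clock = clock
        self._failures = 0
        self._opened_at = None
        self.state = BreakerState.CLOSED

    def allow_request(self):
        """Return True if the primary backend may be tried."""
        if self.state is BreakerState.OPEN:
            if self._clock() - self._opened_at >= self.reset_timeout:
                # Timeout elapsed: start probing the primary backend again.
                self.state = BreakerState.HALF_OPEN
                return True
            return False  # caller should use the fallback backend
        return True

    def record_success(self):
        self._failures = 0
        self.state = BreakerState.CLOSED

    def record_failure(self):
        self._failures += 1
        # A failed probe in half-open reopens the breaker immediately.
        if self.state is BreakerState.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = BreakerState.OPEN
            self._opened_at = self._clock()

    def gauge_value(self):
        """Value to export as the circuit breaker state gauge."""
        return int(self.state)
```

In a setup like this, every call that is rejected while the breaker is open would be routed to the fallback backend and counted in `ledger_fallback_total`.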

Infrastructure Metrics

Metric	Type	Description
`nats_messages_total`	Counter	NATS messages by stream and subject
`nats_consumer_lag`	Gauge	Consumer message lag
`redis_connections_active`	Gauge	Active Redis connections
`redis_memory_used_bytes`	Gauge	Redis memory usage
`pg_connections_active`	Gauge	Active PostgreSQL connections
`pg_query_duration_seconds`	Histogram	Query latency

Grafana Dashboards

Main Operations Dashboard

The primary dashboard provides a real-time view of system health:

Request rate (TPS) with SLO overlay
Error rate by status code
P50/P95/P99 latency
PIX In/Out transaction volume
Active accounts gauge
Revenue tracking (daily/monthly)

Import: infrastructure/monitoring/main-operations-dashboard.json
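Percentile panels such as P50/P95/P99 are usually estimated from histogram buckets like those of `fluxiq_http_request_duration_seconds`. The sketch below shows one way to do that estimation, using linear interpolation inside the matching bucket (similar in spirit to PromQL's `histogram_quantile()`). The bucket boundaries and counts are made up, and the dashboard's actual queries may differ.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Quantile falls in the +Inf bucket: report the last finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical cumulative buckets: (upper bound in seconds, observations <= bound)
latency_buckets = [(0.01, 50), (0.05, 90), (0.1, 98), (0.5, 100), (math.inf, 100)]

for q in (0.50, 0.95, 0.99):
    print(f"P{q * 100:.0f}: {histogram_quantile(q, latency_buckets) * 1000:.1f} ms")
```

Because the estimate interpolates within a bucket, its accuracy depends on how closely the bucket boundaries bracket the latencies of interest, for example around the 200ms P95 target.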

TigerBeetle Performance Dashboard

Transfer throughput (TPS)
Account lookup latency
Batch size distribution
io_uring operations
Memory usage and disk I/O

Import: infrastructure/monitoring/tigerbeetle-performance-dashboard.json

Ledger Fallback Dashboard

Circuit breaker state (gauge)
Active backend indicator
Operations by backend (timeseries)
Fallback rate and error rate
P50/P95/P99 latency by backend
Recent fallback events

Import: infrastructure/monitoring/ledger-fallback-dashboard.json

Alert Policies

Critical Alerts

Alert	Condition	Action
Both Backends Down	TigerBeetle AND fallback unhealthy for 30s	Page on-call
High Error Rate	>5% error rate for 5 minutes	Page on-call
Database Down	PostgreSQL unreachable for 1 minute	Page on-call
Zero Transactions	No PIX transactions for 15 minutes (business hours)	Investigate
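A condition such as "High Error Rate" is typically computed from counter increases over a window rather than from raw counter values. The sketch below is an assumption-laden illustration: it treats the samples as successive scrapes of `fluxiq_http_requests_total`, split into all requests and error responses. Which status codes count as errors is not stated here, so that split is hypothetical.

```python
def increase(samples):
    """Total increase of a counter across samples, tolerating resets to zero."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter began again at zero.
        total += (cur - prev) if cur >= prev else cur
    return total


def error_ratio(total_samples, error_samples):
    """Share of requests in the window that ended in an error."""
    requests = increase(total_samples)
    return increase(error_samples) / requests if requests else 0.0


# Hypothetical scrapes taken over a 5-minute window
all_requests = [1000, 1400, 2000]
error_requests = [10, 40, 70]

ratio = error_ratio(all_requests, error_requests)
print(f"error rate: {ratio:.1%}, above 5% threshold: {ratio > 0.05}")
```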

Warning Alerts

Alert	Condition	Action
Circuit Breaker Open	TigerBeetle circuit open for 1 minute	Monitor
High Fallback Rate	>10 fallback events/min for 2 minutes	Investigate
P95 Latency High	>200ms for 5 minutes	Investigate
NATS Consumer Lag	>1000 messages lag for 5 minutes	Check workers
Redis Memory High	>80% capacity	Scale up
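Most conditions above pair a threshold with a duration ("for 5 minutes"), meaning the alert should fire only if the breach is sustained. A minimal sketch of that evaluation follows, assuming evenly spaced samples and a strict greater-than comparison. The function name and sample data are hypothetical, and the alerting system in use may evaluate durations differently.

```python
def is_firing(samples, threshold, hold_seconds):
    """Return True if the latest breach has lasted at least hold_seconds.

    samples: list of (unix_timestamp, value) pairs ordered by time.
    """
    breach_started = None
    last_ts = None
    for ts, value in samples:
        last_ts = ts
        if value > threshold:
            if breach_started is None:
                breach_started = ts  # first sample of a new breach
        else:
            breach_started = None    # any recovery resets the timer
    return breach_started is not None and last_ts - breach_started >= hold_seconds


# P95 latency in seconds, sampled once per minute (made-up values)
sustained = [(0, 0.15), (60, 0.25), (120, 0.30), (180, 0.28),
             (240, 0.26), (300, 0.31), (360, 0.27)]
with_dip = [(0, 0.25), (60, 0.25), (120, 0.30), (180, 0.18),
            (240, 0.26), (300, 0.31), (360, 0.27)]

print(is_firing(sustained, threshold=0.2, hold_seconds=300))  # True
print(is_firing(with_dip, threshold=0.2, hold_seconds=300))   # False
```

Requiring a sustained breach trades detection speed for fewer false pages, which is why the critical "Both Backends Down" alert uses a much shorter window (30s) than the latency warning.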

Health Checks

Application Health

http

GET /health

json

{
  "status": "healthy",
  "version": "1.2.0",
  "uptime": 864000,
  "checks": {
    "database": "ok",
    "redis": "ok",
    "nats": "ok",
    "tigerbeetle": "ok"
  }
}
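A monitoring script can use the `checks` object to report which dependency is failing. The sketch below parses a response body of the shape shown above. The `"timeout"` value and the `"degraded"` status are hypothetical, since only the `"ok"` and `"healthy"` values appear in the example response.

```python
import json


def failing_checks(body):
    """Return the names of dependencies whose check is anything other than "ok"."""
    payload = json.loads(body)
    checks = payload.get("checks", {})
    return sorted(name for name, state in checks.items() if state != "ok")


healthy_body = """
{"status": "healthy", "version": "1.2.0", "uptime": 864000,
 "checks": {"database": "ok", "redis": "ok", "nats": "ok", "tigerbeetle": "ok"}}
"""

# Hypothetical degraded response, for illustration only
degraded_body = """
{"status": "degraded", "version": "1.2.0", "uptime": 864000,
 "checks": {"database": "ok", "redis": "timeout", "nats": "ok", "tigerbeetle": "ok"}}
"""

print(failing_checks(healthy_body))   # []
print(failing_checks(degraded_body))  # ['redis']
```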

Readiness and Liveness

GET /ready: returns 200 when all dependencies are connected
GET /live: returns 200 if the process is running (Kubernetes liveness probe)
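The difference between the two probes can be sketched as a single decision function. This is an illustration under assumptions: the 503 status for a not-ready instance is a common convention but is not confirmed for FluxiQ Core, and the `probe_status` helper is hypothetical.

```python
from http import HTTPStatus


def probe_status(path, dependencies):
    """Return the HTTP status for a probe path.

    dependencies: mapping of dependency name -> True if currently connected.
    """
    if path == "/live":
        # Liveness only asserts that the process can answer at all.
        return HTTPStatus.OK
    if path == "/ready":
        # Readiness requires every dependency to be connected.
        if all(dependencies.values()):
            return HTTPStatus.OK
        return HTTPStatus.SERVICE_UNAVAILABLE  # assumed status for "not ready"
    return HTTPStatus.NOT_FOUND


deps = {"database": True, "redis": True, "nats": False, "tigerbeetle": True}
print(int(probe_status("/live", deps)))   # 200
print(int(probe_status("/ready", deps)))  # 503
```

Keeping liveness independent of dependencies avoids restarting a healthy process just because a downstream service is temporarily unavailable.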

Log Aggregation

Structured JSON logs are shipped to Cloud Logging:

json

{
  "timestamp": "2026-02-03T12:00:00.000Z",
  "level": "info",
  "message": "PIX charge paid",
  "request_id": "req_01HQGX...",
  "charge_id": "chg_01HQGX...",
  "amount": 15000,
  "duration_ms": 8,
  "backend": "tigerbeetle"
}
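Entries of this shape can be produced with a custom formatter on top of a standard logging library. The following is a minimal sketch using Python's `logging` module. The `JsonFormatter` class, the `fields` convention for extra attributes, and the logger name are assumptions for illustration, not FluxiQ's actual logging setup.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record):
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": timestamp.isoformat(timespec="milliseconds").replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        # Structured fields passed via extra={"fields": {...}} are merged in.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


logger = logging.getLogger("fluxiq.example")  # hypothetical logger name
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info(
    "PIX charge paid",
    extra={"fields": {"amount": 15000, "duration_ms": 8, "backend": "tigerbeetle"}},
)
```

Emitting one JSON object per line keeps fields such as `duration_ms` individually queryable once the logs are ingested.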

Useful Log Queries

bash

# All errors in the last hour (without --freshness, gcloud reads the last day)
gcloud logging read 'severity>=ERROR AND resource.type="cloud_run_revision"' --freshness=1h --limit=50

# PIX transaction logs
gcloud logging read 'jsonPayload.message=~"PIX"' --limit=20

# Slow operations (duration_ms > 100)
gcloud logging read 'jsonPayload.duration_ms>100' --limit=20

SLO Targets

Metric	Target	Current
Availability	99.95%	99.99%
P95 Latency	<200ms	45ms
Error Rate	<0.1%	0.03%
PIX Processing Time	<5s	<2s
Failover Time	<10s	<5s
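An availability target translates directly into an error budget, that is, the amount of downtime the SLO tolerates over a window. The arithmetic below is a sketch that assumes a 30-day rolling window, which this page does not specify.

```python
def error_budget_minutes(slo, window_days=30):
    """Downtime allowed by an availability SLO over the window, in minutes."""
    return (1 - slo) * window_days * 24 * 60


target = 0.9995   # Availability target from the table above
current = 0.9999  # Current availability from the table above

budget = error_budget_minutes(target)
consumed = error_budget_minutes(current)

print(f"budget:   {budget:.1f} min per 30 days")
print(f"consumed: {consumed:.2f} min ({consumed / budget:.0%} of budget)")
```

Under that assumption, the 99.95% target allows about 21.6 minutes of downtime per 30 days, and the current 99.99% availability corresponds to roughly a fifth of that budget being spent.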
