Monitoring and metrics

Network-level and upstream-level metrics are available via Prometheus and Grafana.

To enable metrics via config:

erpc.yaml
# ...
metrics:
  enabled: true
  listenV4: true
  hostV4: "0.0.0.0"
  listenV6: false
  hostV6: "[::]"
  port: 4001

Refer to erpc/docker-compose.yml and erpc/monitoring for ready-made templates to bring up monitoring.
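Once the metrics listener is enabled, Prometheus needs a scrape job that points at it. The following is a minimal sketch of a prometheus.yml, not the template shipped in erpc/monitoring. The job name, the 15s interval, and the `localhost` host are assumptions; the port matches `metrics.port` in the config above, and the sketch assumes eRPC serves metrics on Prometheus's default `/metrics` path.

```yaml
# prometheus.yml (illustrative sketch; job name, interval and host are assumptions)
scrape_configs:
  - job_name: "erpc"
    scrape_interval: 15s
    # metrics_path is left unset, so Prometheus uses its default: /metrics
    static_configs:
      # Port 4001 comes from metrics.port in erpc.yaml; the host is hypothetical
      - targets: ["localhost:4001"]
```

If Prometheus runs in a container, replace `localhost` with an address that is reachable from inside that container.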

Available metrics

For the full list of available metrics, see the source code in erpc/health/metrics.go.

[Image: eRPC Grafana dashboard]

Here is a list of some of the most important metrics:

| Metric | Type | Description |
| --- | --- | --- |
| `erpc_upstream_request_total` | Counter | Total number of actual requests to upstreams. |
| `erpc_upstream_request_duration_seconds` | Histogram | Duration of requests to upstreams. |
| `erpc_upstream_request_errors_total` | Counter | Total number of errors for requests to upstreams. |
| `erpc_upstream_request_self_rate_limited_total` | Counter | Total number of self-imposed rate limited requests before sending to upstreams. |
| `erpc_upstream_request_remote_rate_limited_total` | Counter | Total number of remote rate limited requests by upstreams. |
| `erpc_upstream_request_skipped_total` | Counter | Total number of requests skipped by upstreams. |
| `erpc_upstream_request_missing_data_error_total` | Counter | Total number of requests where upstream is missing data or not synced yet. |
| `erpc_upstream_request_empty_response_total` | Counter | Total number of empty responses from upstreams. |
| `erpc_upstream_block_head_lag` | Gauge | Total number of blocks (head) behind the most up-to-date upstream. |
| `erpc_upstream_finalization_lag` | Gauge | Total number of finalized blocks behind the most up-to-date upstream. |
| `erpc_upstream_score_overall` | Gauge | Overall score of upstreams. |
| `erpc_upstream_latest_block_number` | Gauge | Latest block number of upstreams. |
| `erpc_upstream_finalized_block_number` | Gauge | Finalized block number of upstreams. |
| `erpc_upstream_cordoned` | Gauge | Whether upstream is excluded from routing by selection policy (0 = uncordoned, 1 = cordoned). |
| `erpc_network_request_received_total` | Counter | Total number of requests received by the network. |
| `erpc_network_multiplexed_request_total` | Counter | Total number of multiplexed requests received by the network. |
| `erpc_network_failed_request_total` | Counter | Total number of failed requests received by the network. |
| `erpc_network_request_self_rate_limited_total` | Counter | Total number of self-imposed rate limited requests before sending to upstreams. |
| `erpc_network_successful_request_total` | Counter | Total number of successful requests received by the network. |
| `erpc_network_cache_hits_total` | Counter | Total number of cache hits for requests received by the network. |
| `erpc_network_cache_misses_total` | Counter | Total number of cache misses for requests received by the network. |
| `erpc_network_request_duration_seconds` | Histogram | Duration of requests received by the network. |
| `erpc_project_request_self_rate_limited_total` | Counter | Total number of self-imposed rate limited requests towards the project. |
| `erpc_rate_limiter_budget_max_count` | Gauge | Maximum number of requests allowed per second for a rate limiter budget. |
| `erpc_auth_request_self_rate_limited_total` | Counter | Total number of self-imposed rate limited requests due to auth config for a project. |
| `erpc_cache_set_success_total` | Counter | Total number of cache set operations. |
| `erpc_cache_set_error_total` | Counter | Total number of cache set errors. |
| `erpc_cache_set_skipped_total` | Counter | Total number of cache set skips. |
| `erpc_cache_get_success_hit_total` | Counter | Total number of cache get hits. |
| `erpc_cache_get_success_miss_total` | Counter | Total number of cache get misses. |
| `erpc_cache_get_error_total` | Counter | Total number of cache get errors. |
| `erpc_cache_get_skipped_total` | Counter | Total number of cache get skips (i.e. no matching policy found). |
| `erpc_cors_requests_total` | Counter | Total number of CORS requests received. |
| `erpc_cors_preflight_requests_total` | Counter | Total number of CORS preflight requests received. |
| `erpc_cors_disallowed_origin_total` | Counter | Total number of CORS requests from disallowed origins. |

PromQL examples

# Request rate per second by network over last 5 minutes
sum(rate(erpc_network_request_received_total{}[5m])) by (network)
 
# Total requests over the last 24 hours by project and network
sum(increase(erpc_network_request_received_total{}[24h])) by (project, network)
 
# Top 5 project/network pairs by request rate over the last 5 minutes
topk(5, sum(rate(erpc_network_request_received_total{}[5m])) by (project, network))
 
# Error rate percentage by network and upstream
100 * sum(rate(erpc_upstream_request_errors_total{}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_total{}[5m])) by (network, upstream)
 
# Top error types in the last hour
topk(10, sum(increase(erpc_upstream_request_errors_total{}[1h])) by (error))
 
# Missing data errors by network and upstream
sum(rate(erpc_upstream_request_missing_data_error_total{}[5m])) by (network, upstream)
 
# 95th percentile request duration by network
histogram_quantile(0.95, sum(rate(erpc_network_request_duration_seconds_bucket{}[5m])) by (le,network))
 
# Average request duration for eth_call methods
sum(rate(erpc_upstream_request_duration_seconds_sum{category="eth_call"}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_duration_seconds_count{category="eth_call"}[5m])) by (network, upstream)
 
# Identify slow upstreams (avg duration > 500ms)
sum(rate(erpc_upstream_request_duration_seconds_sum{}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_duration_seconds_count{}[5m])) by (network, upstream) > 0.5
 
# Cache hit ratio by network
sum(rate(erpc_network_cache_hits_total{}[5m])) by (network) /
(
  sum(rate(erpc_network_cache_hits_total{}[5m])) by (network) +
  sum(rate(erpc_network_cache_misses_total{}[5m])) by (network)
)
 
# Cache miss rate for eth_getBlockByNumber
rate(erpc_network_cache_misses_total{category="eth_getBlockByNumber"}[5m])
 
# Self rate-limited requests by project and network
sum(rate(erpc_network_request_self_rate_limited_total{}[5m])) by (project,network)
 
# Authentication rate limiting for the "jwt" strategy, by project
sum(rate(erpc_auth_request_self_rate_limited_total{strategy="jwt"}[5m])) by (project)
 
# Remote rate limiting from upstreams
sum(rate(erpc_upstream_request_remote_rate_limited_total{}[5m])) by (upstream)
 
# Block lag by network and upstream
max(erpc_upstream_block_head_lag) by (network,upstream)
 
# Finalization lag alert (lag > 5 blocks)
max(erpc_upstream_finalization_lag) by (network) > 5
 
# Block height difference between upstreams
max(erpc_upstream_latest_block_number) by (network) -
min(erpc_upstream_latest_block_number) by (network)
 
# Overall upstream health score
avg(erpc_upstream_score_overall) by (network, upstream)
 
# CORS issues by origin
sum(rate(erpc_cors_disallowed_origin_total{}[5m])) by (project, origin)
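
Any of the expressions above can be turned into a Prometheus alerting rule. The following is a minimal sketch of a rules file that reuses the finalization lag query. The file name, group name, alert name, `for` duration, severity label, and annotation text are all assumptions to adjust for your setup.

```yaml
# erpc-alerts.yml (illustrative sketch; names, duration and labels are assumptions)
groups:
  - name: erpc
    rules:
      - alert: ErpcFinalizationLagHigh
        # Same expression as the "Finalization lag alert" example above
        expr: max(erpc_upstream_finalization_lag) by (network) > 5
        # Fire only if the condition holds continuously for 5 minutes
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Finalization lag above 5 blocks on network {{ $labels.network }}"
```

For Prometheus to load the rule, the file has to be listed under `rule_files` in prometheus.yml.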