Monitoring and metrics

Network-level and upstream-level metrics are available via Prometheus and Grafana.

To enable metrics via config:

erpc.yaml
# ...
metrics:
  enabled: true
  listenV4: true
  hostV4: "0.0.0.0"
  listenV6: false
  hostV6: "[::]"
  port: 4001
  errorLabelMode: "verbose" # Optional: "verbose" (default) or "compact"
  histogramBuckets: "0.01,0.1,0.5,1,5,10,60,300" # Optional: custom histogram buckets
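
Once the endpoint is enabled, Prometheus still needs a scrape job that points at it. The following is a minimal sketch of a prometheus.yml fragment, not a configuration taken from the eRPC repository. It assumes eRPC is reachable under the hypothetical hostname erpc, listens on the port configured above, and serves metrics on the Prometheus default path /metrics. The job name and scrape interval are also illustrative.

```yaml
# prometheus.yml (illustrative fragment)
scrape_configs:
  - job_name: "erpc"           # hypothetical job name
    scrape_interval: 15s       # assumption: adjust to your retention and resolution needs
    metrics_path: /metrics     # Prometheus default; assumed to match the eRPC endpoint
    static_configs:
      - targets: ["erpc:4001"] # hypothetical host, port from metrics.port above
```

If Prometheus runs outside the eRPC host or container network, replace the target with an address that is routable from Prometheus.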

Reducing Metrics Cardinality

eRPC provides two configuration options to help reduce metrics cardinality, which can significantly reduce the storage requirements and improve the query performance of your monitoring system.

Error Label Mode

The errorLabelMode setting controls how much error detail is included in metric labels:

  • verbose: Uses the full error message as the label value (default for backward compatibility)
  • compact: Uses only the error type as the label value, reducing cardinality significantly

erpc.yaml
metrics:
  errorLabelMode: "compact" # "verbose" or "compact"

Histogram Buckets

You can customize histogram buckets (a comma-separated list of upper bounds, in seconds) to reduce cardinality and focus on relevant latency ranges:

erpc.yaml
metrics:
  histogramBuckets: "0.01,0.1,0.5,1,5,10,60,300"

Each bucket is exported as its own time series for every label combination, so using fewer buckets, concentrated on the latency ranges you care about, directly reduces the number of time series stored in your monitoring system.
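
As a rough illustration of the effect, the sketch below uses a shorter bucket list. The values are hypothetical and not a recommendation. The arithmetic in the comments relies on standard Prometheus histogram behavior: besides one series per configured bucket, each histogram exposes an implicit +Inf bucket plus _sum and _count series.

```yaml
# erpc.yaml (illustrative values only)
metrics:
  # 8 buckets (the list shown above): 8 + 1 (+Inf) + 2 (_sum, _count) = 11 series per label combination
  # 4 buckets (the list below):       4 + 1 (+Inf) + 2 (_sum, _count) =  7 series per label combination
  # With 1,000 label combinations that is 11,000 vs 7,000 series per histogram metric.
  histogramBuckets: "0.05,0.5,2,10"
```

The trade-off is resolution: histogram_quantile interpolates within a bucket, so quantile estimates become coarser as buckets get wider.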

Refer to erpc/docker-compose.yml and erpc/monitoring for ready-made templates to bring up monitoring.

Available metrics

For the full list of available metrics, check the source code in erpc/health/metrics.go.

eRPC Grafana Dashboard

Here is a list of some of the most important metrics:

| Metric | Type | Description |
| --- | --- | --- |
| `erpc_upstream_request_total` | Counter | Total number of actual requests to upstreams. |
| `erpc_upstream_request_duration_seconds` | Histogram | Duration of requests to upstreams. |
| `erpc_upstream_request_errors_total` | Counter | Total number of errors for requests to upstreams. |
| `erpc_upstream_request_self_rate_limited_total` | Counter | Total number of self-imposed rate limited requests before sending to upstreams. |
| `erpc_upstream_request_remote_rate_limited_total` | Counter | Total number of remote rate limited requests by upstreams. |
| `erpc_upstream_request_skipped_total` | Counter | Total number of requests skipped by upstreams. |
| `erpc_upstream_request_missing_data_error_total` | Counter | Total number of requests where upstream is missing data or not synced yet. |
| `erpc_upstream_request_empty_response_total` | Counter | Total number of empty responses from upstreams. |
| `erpc_upstream_block_head_lag` | Gauge | Total number of blocks (head) behind the most up-to-date upstream. |
| `erpc_upstream_finalization_lag` | Gauge | Total number of finalized blocks behind the most up-to-date upstream. |
| `erpc_upstream_score_overall` | Gauge | Overall score of upstreams. |
| `erpc_upstream_latest_block_number` | Gauge | Latest block number of upstreams. |
| `erpc_upstream_finalized_block_number` | Gauge | Finalized block number of upstreams. |
| `erpc_upstream_cordoned` | Gauge | Whether upstream is excluded from routing by selection policy. (0=uncordoned or 1=cordoned) |
| `erpc_upstream_stale_latest_block_total` | Counter | Total number of times an upstream returned a stale latest block number (vs others). |
| `erpc_upstream_stale_finalized_block_total` | Counter | Total number of times an upstream returned a stale finalized block number (vs others). |
| `erpc_upstream_evm_get_logs_stale_upper_bound_total` | Counter | Total number of times eth_getLogs was skipped due to upstream latest block being less than requested toBlock. |
| `erpc_upstream_evm_get_logs_stale_lower_bound_total` | Counter | Total number of times eth_getLogs was skipped due to fromBlock being less than upstream's available block range. |
| `erpc_upstream_evm_get_logs_range_exceeded_auto_splitting_threshold_total` | Counter | Total number of times eth_getLogs request exceeded the block range threshold and needed splitting (based on upstream config for "upstream.evm.getLogsAutoSplittingRangeThreshold"). |
| `erpc_upstream_evm_get_logs_forced_splits_total` | Counter | Total number of eth_getLogs request splits by dimension (block_range, addresses, topics), due to a complaint/error from upstream (e.g. "Returned too many results use a smaller block range"). |
| `erpc_upstream_evm_get_logs_split_success_total` | Counter | Total number of successful split eth_getLogs sub-requests. |
| `erpc_upstream_evm_get_logs_split_failure_total` | Counter | Total number of failed split eth_getLogs sub-requests. |
| `erpc_upstream_latest_block_polled_total` | Counter | Total number of times the latest block was proactively polled from an upstream. |
| `erpc_upstream_finalized_block_polled_total` | Counter | Total number of times the finalized block was proactively polled from an upstream. |
| `erpc_network_request_received_total` | Counter | Total number of requests received by the network. |
| `erpc_network_multiplexed_request_total` | Counter | Total number of multiplexed requests received by the network. |
| `erpc_network_failed_request_total` | Counter | Total number of failed requests received by the network. |
| `erpc_network_request_self_rate_limited_total` | Counter | Total number of self-imposed rate limited requests before sending to upstreams. |
| `erpc_network_successful_request_total` | Counter | Total number of successful requests received by the network. |
| `erpc_network_cache_hits_total` | Counter | Total number of cache hits for requests received by the network. |
| `erpc_network_cache_misses_total` | Counter | Total number of cache misses for requests received by the network. |
| `erpc_network_request_duration_seconds` | Histogram | Duration of requests received by the network. |
| `erpc_project_request_self_rate_limited_total` | Counter | Total number of self-imposed rate limited requests towards the project. |
| `erpc_rate_limiter_budget_max_count` | Gauge | Maximum number of requests allowed per second for a rate limiter budget. |
| `erpc_auth_request_self_rate_limited_total` | Counter | Total number of self-imposed rate limited requests due to auth config for a project. |
| `erpc_cache_set_success_total` | Counter | Total number of cache set operations. |
| `erpc_cache_set_error_total` | Counter | Total number of cache set errors. |
| `erpc_cache_set_skipped_total` | Counter | Total number of cache set skips. |
| `erpc_cache_get_success_hit_total` | Counter | Total number of cache get hits. |
| `erpc_cache_get_success_miss_total` | Counter | Total number of cache get misses. |
| `erpc_cache_get_error_total` | Counter | Total number of cache get errors. |
| `erpc_cache_get_skipped_total` | Counter | Total number of cache get skips (i.e. no matching policy found). |
| `erpc_cors_requests_total` | Counter | Total number of CORS requests received. |
| `erpc_cors_preflight_requests_total` | Counter | Total number of CORS preflight requests received. |
| `erpc_cors_disallowed_origin_total` | Counter | Total number of CORS requests from disallowed origins. |

PromQL examples

# Request rate per second by network over last 5 minutes
sum(rate(erpc_network_request_received_total{}[5m])) by (network)
 
# Total daily requests by project and network
sum(increase(erpc_network_request_received_total{}[24h])) by (project, network)
 
# Top 5 project/network pairs by request rate
topk(5, sum(rate(erpc_network_request_received_total{}[5m])) by (project, network))
 
# Error rate percentage by network and upstream
100 * sum(rate(erpc_upstream_request_errors_total{}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_total{}[5m])) by (network, upstream)
 
# Top 10 error types in the last hour
topk(10, sum(increase(erpc_upstream_request_errors_total{}[1h])) by (error))
 
# Missing data errors by network and upstream
sum(rate(erpc_upstream_request_missing_data_error_total{}[5m])) by (network, upstream)
 
# 95th percentile request duration by network
histogram_quantile(0.95, sum(rate(erpc_network_request_duration_seconds_bucket{}[5m])) by (le,network))
 
# Average request duration for eth_call methods
sum(rate(erpc_upstream_request_duration_seconds_sum{category="eth_call"}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_duration_seconds_count{category="eth_call"}[5m])) by (network, upstream)
 
# Identify slow upstreams (avg duration > 500ms)
sum(rate(erpc_upstream_request_duration_seconds_sum{}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_duration_seconds_count{}[5m])) by (network, upstream) > 0.5
 
# Cache hit ratio by network
sum(rate(erpc_network_cache_hits_total{}[5m])) by (network) /
(
  sum(rate(erpc_network_cache_hits_total{}[5m])) by (network) +
  sum(rate(erpc_network_cache_misses_total{}[5m])) by (network)
)
 
# Cache miss rate for eth_getBlockByNumber
rate(erpc_network_cache_misses_total{category="eth_getBlockByNumber"}[5m])
 
# Self rate-limited requests by project and network
sum(rate(erpc_network_request_self_rate_limited_total{}[5m])) by (project,network)
 
# Authentication rate limiting by strategy
sum(rate(erpc_auth_request_self_rate_limited_total{strategy="jwt"}[5m])) by (project)
 
# Remote rate limiting from upstreams
sum(rate(erpc_upstream_request_remote_rate_limited_total{}[5m])) by (upstream)
 
# Block lag by network and upstream
max(erpc_upstream_block_head_lag) by (network,upstream)
 
# Finalization lag alert (lag > 5 blocks)
max(erpc_upstream_finalization_lag) by (network) > 5
 
# Block height difference between upstreams
max(erpc_upstream_latest_block_number) by (network) -
min(erpc_upstream_latest_block_number) by (network)
 
# Overall upstream health score
avg(erpc_upstream_score_overall) by (network, upstream)
 
# CORS issues by origin
sum(rate(erpc_cors_disallowed_origin_total{}[5m])) by (project, origin)
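
Queries like the ones above can also drive alerts. The following is a minimal sketch of a Prometheus alerting rule file built from two of those expressions. The file name, group name, alert names, thresholds, "for" durations, and severity labels are all illustrative assumptions rather than values recommended by eRPC.

```yaml
# erpc-alerts.yml (hypothetical file name; thresholds are illustrative)
groups:
  - name: erpc-examples
    rules:
      # Fires when an upstream's error rate stays above 5% for 10 minutes
      - alert: ErpcUpstreamHighErrorRate
        expr: |
          100 * sum(rate(erpc_upstream_request_errors_total[5m])) by (network, upstream)
            / sum(rate(erpc_upstream_request_total[5m])) by (network, upstream) > 5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Upstream {{ $labels.upstream }} on {{ $labels.network }} has an error rate above 5%"
      # Fires when finalization lag on a network stays above 5 blocks for 5 minutes
      - alert: ErpcFinalizationLag
        expr: max(erpc_upstream_finalization_lag) by (network) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Finalization lag on {{ $labels.network }} is above 5 blocks"
```

To load the rules, reference the file under rule_files in prometheus.yml. Appropriate thresholds depend on the chain's block time and on how noisy each upstream is.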