Monitoring and metrics

Network-level and upstream-level metrics are available via Prometheus (opens in a new tab) and Grafana (opens in a new tab).

To enable metrics via config:

erpc.yaml

# ...
metrics:
  enabled: true
  listenV4: true
  hostV4: "0.0.0.0"
  listenV6: false
  hostV6: "[::]"
  port: 4001
  errorLabelMode: "verbose" # Optional: "verbose" (default) or "compact"
  histogramBuckets: "0.01,0.1,0.5,1,5,10,60,300" # Optional: custom histogram buckets

Reducing Metrics Cardinality

eRPC provides two configuration options to help reduce metrics cardinality, which can significantly decrease the storage requirements and query performance of your monitoring system.

Error Label Mode

The errorLabelMode setting controls how detailed error information is included in metrics labels:

verbose: Uses the full error message as labels (default for backward compatibility)
compact: Uses only the error type as labels, reducing cardinality significantly

erpc.yaml

metrics:
  errorLabelMode: "compact" # "verbose" or "compact"

Histogram Buckets

You can customize histogram buckets to reduce cardinality and focus on relevant latency ranges:

erpc.yaml

metrics:
  histogramBuckets: "0.01,0.1,0.5,1,5,10,60,300"

Setting fewer buckets or focusing on relevant latency ranges can significantly reduce the number of time series stored in your monitoring system.

Refer to erpc/docker-compose.yml (opens in a new tab) and erpc/monitoring (opens in a new tab) for ready-made templates to bring up montoring.

Available metrics

To get full list of available metrics check the source code of erpc/health/metrics.go (opens in a new tab).

eRPC Grafana Dashboard

Here is a list of some of the most important metrics:

Metric	Type	Description
erpc_upstream_request_total	Counter	Total number of actual requests to upstreams.
erpc_upstream_request_duration_seconds	Histogram	Duration of requests to upstreams.
erpc_upstream_request_errors_total	Counter	Total number of errors for requests to upstreams.
erpc_upstream_request_self_rate_limited_total	Counter	Total number of self-imposed rate limited requests before sending to upstreams.
erpc_upstream_request_remote_rate_limited_total	Counter	Total number of remote rate limited requests by upstreams.
erpc_upstream_request_skipped_total	Counter	Total number of requests skipped by upstreams.
erpc_upstream_request_missing_data_error_total	Counter	Total number of requests where upstream is missing data or not synced yet.
erpc_upstream_request_empty_response_total	Counter	Total number of empty responses from upstreams.
erpc_upstream_block_head_lag	Gauge	Total number of blocks (head) behind the most up-to-date upstream.
erpc_upstream_finalization_lag	Gauge	Total number of finalized blocks behind the most up-to-date upstream.
erpc_upstream_score_overall	Gauge	Overall score of upstreams.
erpc_upstream_latest_block_number	Gauge	Latest block number of upstreams.
erpc_upstream_finalized_block_number	Gauge	Finalized block number of upstreams.
erpc_upstream_cordoned	Gauge	Whether upstream is excluded from routing by selection policy. (0=uncordoned or 1=cordoned)
erpc_upstream_stale_latest_block_total	Counter	Total number of times an upstream returned a stale latest block number (vs others).
erpc_upstream_stale_finalized_block_total	Counter	Total number of times an upstream returned a stale finalized block number (vs others).
erpc_upstream_evm_get_logs_stale_upper_bound_total	Counter	Total number of times eth_getLogs was skipped due to upstream latest block being less than requested toBlock.
erpc_upstream_evm_get_logs_stale_lower_bound_total	Counter	Total number of times eth_getLogs was skipped due to fromBlock being less than upstream's available block range.
erpc_upstream_evm_get_logs_range_exceeded_auto_splitting_threshold_total	Counter	Total number of times eth_getLogs request exceeded the block range threshold and needed splitting (based on upstream config for "upstream.evm.getLogsAutoSplittingRangeThreshold").
erpc_upstream_evm_get_logs_forced_splits_total	Counter	Total number of eth_getLogs request splits by dimension (block_range, addresses, topics), due to a complain/error from upstream (e.g. "Returned too many results use a smaller block range").
erpc_upstream_evm_get_logs_split_success_total	Counter	Total number of successful split eth_getLogs sub-requests.
erpc_upstream_evm_get_logs_split_failure_total	Counter	Total number of failed split eth_getLogs sub-requests.
erpc_upstream_latest_block_polled_total	Counter	Total number of times the latest block was pro-actively polled from an upstream.
erpc_upstream_finalized_block_polled_total	Counter	Total number of times the finalized block was pro-actively polled from an upstream.
erpc_network_request_received_total	Counter	Total number of requests received by the network.
erpc_network_multiplexed_request_total	Counter	Total number of multiplexed requests received by the network.
erpc_network_failed_request_total	Counter	Total number of failed requests received by the network.
erpc_network_request_self_rate_limited_total	Counter	Total number of self-imposed rate limited requests before sending to upstreams.
erpc_network_successful_request_total	Counter	Total number of successful requests received by the network.
erpc_network_cache_hits_total	Counter	Total number of cache hits for requests received by the network.
erpc_network_cache_misses_total	Counter	Total number of cache misses for requests received by the network.
erpc_network_request_duration_seconds	Histogram	Duration of requests received by the network.
erpc_project_request_self_rate_limited_total	Counter	Total number of self-imposed rate limited requests towards the project.
erpc_rate_limiter_budget_max_count	Gauge	Maximum number of requests allowed per second for a rate limiter budget
erpc_auth_request_self_rate_limited_total	Counter	Total number of self-imposed rate limited requests due to auth config for a project.
erpc_cache_set_success_total	Counter	Total number of cache set operations.
erpc_cache_set_error_total	Counter	Total number of cache set errors.
erpc_cache_set_skipped_total	Counter	Total number of cache set skips.
erpc_cache_get_success_hit_total	Counter	Total number of cache get hits.
erpc_cache_get_success_miss_total	Counter	Total number of cache get misses.
erpc_cache_get_error_total	Counter	Total number of cache get errors.
erpc_cache_get_skipped_total	Counter	Total number of cache get skips (i.e. no matching policy found).
erpc_cors_requests_total	Counter	Total number of CORS requests received.
erpc_cors_preflight_requests_total	Counter	Total number of CORS preflight requests received.
erpc_cors_disallowed_origin_total	Counter	Total number of CORS requests from disallowed origins.

PromQL examples

# Request rate per second by network over last 5 minutes
sum(rate(erpc_network_request_received_total{}[5m])) by (network)
 
# Total daily requests by project and network
sum(increase(erpc_network_request_received_total{}[24h])) by (project, network)
 
# Top 5 project and networks by request volume
topk(5, sum(rate(erpc_network_request_received_total{}[5m])) by (project, network))
 
# Error rate percentage by network and upstream
100 * sum(rate(erpc_upstream_request_errors_total{}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_total{}[5m])) by (network, upstream)
 
# Top error types in the last hour
topk(10, sum(increase(erpc_upstream_request_errors_total{}[1h])) by (error))
 
# Missing data errors by network and upstream
sum(rate(erpc_upstream_request_missing_data_error_total{}[5m])) by (network, upstream)
 
# 95th percentile request duration by network
histogram_quantile(0.95, sum(rate(erpc_network_request_duration_seconds_bucket{}[5m])) by (le,network))
 
# Average request duration for eth_call methods
sum(rate(erpc_upstream_request_duration_seconds_sum{category="eth_call"}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_duration_seconds_count{category="eth_call"}[5m])) by (network, upstream)
 
# Identify slow upstreams (avg duration > 500ms)
sum(rate(erpc_upstream_request_duration_seconds_sum{}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_duration_seconds_count{}[5m])) by (network, upstream) > 0.5
 
# Cache hit ratio by network
sum(rate(erpc_network_cache_hits_total{}[5m])) by (network) /
(
  sum(rate(erpc_network_cache_hits_total{}[5m])) by (network) +
  sum(rate(erpc_network_cache_misses_total{}[5m])) by (network)
)
 
# Cache miss rate for eth_getBlockByNumber
rate(erpc_network_cache_misses_total{category="eth_getBlockByNumber"}[5m])
 
# Self rate-limited requests by project and network
sum(rate(erpc_network_request_self_rate_limited_total{}[5m])) by (project,network)
 
# Authentication rate limiting by strategy
sum(rate(erpc_auth_request_self_rate_limited_total{strategy="jwt"}[5m])) by (project)
 
# Remote rate limiting from upstreams
sum(rate(erpc_upstream_request_remote_rate_limited_total{}[5m])) by (upstream)
 
# Block lag by network and upstream
max(erpc_upstream_block_head_lag) by (network,upstream)
 
# Finalization lag alert (lag > 5 blocks)
max(erpc_upstream_finalization_lag) by (network) > 5
 
# Block height difference between upstreams
max(erpc_upstream_latest_block_number) by (network) -
min(erpc_upstream_latest_block_number) by (network)
 
# Overall upstream health score
avg(erpc_upstream_score_overall) by (network, upstream)
 
# CORS issues by origin
sum(rate(erpc_cors_disallowed_origin_total{}[5m])) by (project, origin)

Production Tracing