Monitoring and metrics
Network-level and upstream-level metrics are available via Prometheus (opens in a new tab) and Grafana (opens in a new tab).
To enable metrics via config:
# ...
metrics:
enabled: true
listenV4: true
hostV4: "0.0.0.0"
listenV6: false
hostV6: "[::]"
port: 4001
errorLabelMode: "verbose" # Optional: "verbose" (default) or "compact"
histogramBuckets: "0.01,0.1,0.5,1,5,10,60,300" # Optional: custom histogram buckets
Reducing Metrics Cardinality
eRPC provides two configuration options to help reduce metrics cardinality, which can significantly decrease the storage requirements and query performance of your monitoring system.
Error Label Mode
The errorLabelMode
setting controls how detailed error information is included in metrics labels:
verbose
: Uses the full error message as labels (default for backward compatibility)compact
: Uses only the error type as labels, reducing cardinality significantly
metrics:
errorLabelMode: "compact" # "verbose" or "compact"
Histogram Buckets
You can customize histogram buckets to reduce cardinality and focus on relevant latency ranges:
metrics:
histogramBuckets: "0.01,0.1,0.5,1,5,10,60,300"
Setting fewer buckets or focusing on relevant latency ranges can significantly reduce the number of time series stored in your monitoring system.
Refer to erpc/docker-compose.yml (opens in a new tab) and erpc/monitoring (opens in a new tab) for ready-made templates to bring up montoring.
Available metrics
To get full list of available metrics check the source code of erpc/health/metrics.go (opens in a new tab).
Here is a list of some of the most important metrics:
Metric | Type | Description |
---|---|---|
erpc_upstream_request_total | Counter | Total number of actual requests to upstreams. |
erpc_upstream_request_duration_seconds | Histogram | Duration of requests to upstreams. |
erpc_upstream_request_errors_total | Counter | Total number of errors for requests to upstreams. |
erpc_upstream_request_self_rate_limited_total | Counter | Total number of self-imposed rate limited requests before sending to upstreams. |
erpc_upstream_request_remote_rate_limited_total | Counter | Total number of remote rate limited requests by upstreams. |
erpc_upstream_request_skipped_total | Counter | Total number of requests skipped by upstreams. |
erpc_upstream_request_missing_data_error_total | Counter | Total number of requests where upstream is missing data or not synced yet. |
erpc_upstream_request_empty_response_total | Counter | Total number of empty responses from upstreams. |
erpc_upstream_block_head_lag | Gauge | Total number of blocks (head) behind the most up-to-date upstream. |
erpc_upstream_finalization_lag | Gauge | Total number of finalized blocks behind the most up-to-date upstream. |
erpc_upstream_score_overall | Gauge | Overall score of upstreams. |
erpc_upstream_latest_block_number | Gauge | Latest block number of upstreams. |
erpc_upstream_finalized_block_number | Gauge | Finalized block number of upstreams. |
erpc_upstream_cordoned | Gauge | Whether upstream is excluded from routing by selection policy. (0=uncordoned or 1=cordoned) |
erpc_upstream_stale_latest_block_total | Counter | Total number of times an upstream returned a stale latest block number (vs others). |
erpc_upstream_stale_finalized_block_total | Counter | Total number of times an upstream returned a stale finalized block number (vs others). |
erpc_upstream_evm_get_logs_stale_upper_bound_total | Counter | Total number of times eth_getLogs was skipped due to upstream latest block being less than requested toBlock. |
erpc_upstream_evm_get_logs_stale_lower_bound_total | Counter | Total number of times eth_getLogs was skipped due to fromBlock being less than upstream's available block range. |
erpc_upstream_evm_get_logs_range_exceeded_auto_splitting_threshold_total | Counter | Total number of times eth_getLogs request exceeded the block range threshold and needed splitting (based on upstream config for "upstream.evm.getLogsAutoSplittingRangeThreshold"). |
erpc_upstream_evm_get_logs_forced_splits_total | Counter | Total number of eth_getLogs request splits by dimension (block_range, addresses, topics), due to a complain/error from upstream (e.g. "Returned too many results use a smaller block range"). |
erpc_upstream_evm_get_logs_split_success_total | Counter | Total number of successful split eth_getLogs sub-requests. |
erpc_upstream_evm_get_logs_split_failure_total | Counter | Total number of failed split eth_getLogs sub-requests. |
erpc_upstream_latest_block_polled_total | Counter | Total number of times the latest block was pro-actively polled from an upstream. |
erpc_upstream_finalized_block_polled_total | Counter | Total number of times the finalized block was pro-actively polled from an upstream. |
erpc_network_request_received_total | Counter | Total number of requests received by the network. |
erpc_network_multiplexed_request_total | Counter | Total number of multiplexed requests received by the network. |
erpc_network_failed_request_total | Counter | Total number of failed requests received by the network. |
erpc_network_request_self_rate_limited_total | Counter | Total number of self-imposed rate limited requests before sending to upstreams. |
erpc_network_successful_request_total | Counter | Total number of successful requests received by the network. |
erpc_network_cache_hits_total | Counter | Total number of cache hits for requests received by the network. |
erpc_network_cache_misses_total | Counter | Total number of cache misses for requests received by the network. |
erpc_network_request_duration_seconds | Histogram | Duration of requests received by the network. |
erpc_project_request_self_rate_limited_total | Counter | Total number of self-imposed rate limited requests towards the project. |
erpc_rate_limiter_budget_max_count | Gauge | Maximum number of requests allowed per second for a rate limiter budget |
erpc_auth_request_self_rate_limited_total | Counter | Total number of self-imposed rate limited requests due to auth config for a project. |
erpc_cache_set_success_total | Counter | Total number of cache set operations. |
erpc_cache_set_error_total | Counter | Total number of cache set errors. |
erpc_cache_set_skipped_total | Counter | Total number of cache set skips. |
erpc_cache_get_success_hit_total | Counter | Total number of cache get hits. |
erpc_cache_get_success_miss_total | Counter | Total number of cache get misses. |
erpc_cache_get_error_total | Counter | Total number of cache get errors. |
erpc_cache_get_skipped_total | Counter | Total number of cache get skips (i.e. no matching policy found). |
erpc_cors_requests_total | Counter | Total number of CORS requests received. |
erpc_cors_preflight_requests_total | Counter | Total number of CORS preflight requests received. |
erpc_cors_disallowed_origin_total | Counter | Total number of CORS requests from disallowed origins. |
PromQL examples
# Request rate per second by network over last 5 minutes
sum(rate(erpc_network_request_received_total{}[5m])) by (network)
# Total daily requests by project and network
sum(increase(erpc_network_request_received_total{}[24h])) by (project, network)
# Top 5 project and networks by request volume
topk(5, sum(rate(erpc_network_request_received_total{}[5m])) by (project, network))
# Error rate percentage by network and upstream
100 * sum(rate(erpc_upstream_request_errors_total{}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_total{}[5m])) by (network, upstream)
# Top error types in the last hour
topk(10, sum(increase(erpc_upstream_request_errors_total{}[1h])) by (error))
# Missing data errors by network and upstream
sum(rate(erpc_upstream_request_missing_data_error_total{}[5m])) by (network, upstream)
# 95th percentile request duration by network
histogram_quantile(0.95, sum(rate(erpc_network_request_duration_seconds_bucket{}[5m])) by (le,network))
# Average request duration for eth_call methods
sum(rate(erpc_upstream_request_duration_seconds_sum{category="eth_call"}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_duration_seconds_count{category="eth_call"}[5m])) by (network, upstream)
# Identify slow upstreams (avg duration > 500ms)
sum(rate(erpc_upstream_request_duration_seconds_sum{}[5m])) by (network, upstream) /
sum(rate(erpc_upstream_request_duration_seconds_count{}[5m])) by (network, upstream) > 0.5
# Cache hit ratio by network
sum(rate(erpc_network_cache_hits_total{}[5m])) by (network) /
(
sum(rate(erpc_network_cache_hits_total{}[5m])) by (network) +
sum(rate(erpc_network_cache_misses_total{}[5m])) by (network)
)
# Cache miss rate for eth_getBlockByNumber
rate(erpc_network_cache_misses_total{category="eth_getBlockByNumber"}[5m])
# Self rate-limited requests by project and network
sum(rate(erpc_network_request_self_rate_limited_total{}[5m])) by (project,network)
# Authentication rate limiting by strategy
sum(rate(erpc_auth_request_self_rate_limited_total{strategy="jwt"}[5m])) by (project)
# Remote rate limiting from upstreams
sum(rate(erpc_upstream_request_remote_rate_limited_total{}[5m])) by (upstream)
# Block lag by network and upstream
max(erpc_upstream_block_head_lag) by (network,upstream)
# Finalization lag alert (lag > 5 blocks)
max(erpc_upstream_finalization_lag) by (network) > 5
# Block height difference between upstreams
max(erpc_upstream_latest_block_number) by (network) -
min(erpc_upstream_latest_block_number) by (network)
# Overall upstream health score
avg(erpc_upstream_score_overall) by (network, upstream)
# CORS issues by origin
sum(rate(erpc_cors_disallowed_origin_total{}[5m])) by (project, origin)