Integrity

When requesting latest block or logs data from RPC nodes, there's no guarantee that the responding node has the most recent data. eRPC has a built-in mechanism to track highest known block number across all upstreams and enforce additional checks on certain methods such as:

eth_getBlockByNumber
eth_blockNumber
eth_getLogs

How does eRPC handle data integrity?

There are various mechanisms that aim to keep the data integrity as high as possible:

Retry directives such as "retry-empty" or "retry-pending" where request will be sent to another node if one returns an empty or pending response.
Integrity module utilizes various components (e.g. EVM state poller) running for every Upstream, to track latest/finalized blocks and intercept certain methods (e.g. eth_blockNumber or eth_getBlockByNumber(latest)) to ensure the highest known block number is used. This behavior is essential for use-cases like "indexers", to always receive the highest known block number.
Selection policy is used to exclude nodes that have too much block-head lag.

Config

In this page we focus on the (2.) integrity module and its configuration options:

projects:
  - id: main
 
    # Enable for all networks:
    networkDefaults:
      evm:
        integrity:
          # eth_blockNumber and eth_getBlockByNumber if the chosen upstream returns a block older than the highest known block,
          # it will return the highest known block instead.
          # This _might_ cause an extra upstream request for eth_getBlockByNumber(latest) calls.
          enforceHighestBlock: true
 
          # When eth_getLogs end-range is a specific blockNumber ensure only upstreams whose
          # latest block is equal or more handle it, otherwise return an error (i.e. MissingData) for client to retry.
          # This might cause more requests if evm state poller has lagging latest block vs eth_getLogs end-range;
          # it _might_ send an additional eth_blockNumber to other upstreams to see which one definitively has the requested range.
          enforceGetLogsBlockRange: true
 
    # Enable for a specific network:
    networks:
      - type: evm
        evm:
          chainId: 137
          integrity:
            enforceHighestBlock: true
            enforceGetLogsBlockRange: true

How highest-known block number is tracked?

For every upstream, eRPC has a running state poller running in the background to fetch the latest/finalized block number regularly (plus ad-hoc scenarios explained below). The state poller interval can be configured with:

projects:
  - id: main
 
    # Configure for all upstreams:
    upstreamDefaults:
      evm:
        statePollerInterval: 30s # Default: 30s
        statePollerDebounce: 5s # Default: 5s (or equal to block time if the chainId is a known chain)
 
    # Configure per-upstream:
    upstreams:
      - endpoint: https://xxxxxxxxxx.xyz/chain-137
        evm:
          chainId: 42161
          statePollerInterval: 30s
          statePollerDebounce: 5s

Additionally these are the ad-hoc scenarios where highest-known block number is updated:

When a call to eth_blockNumber is made, the result will be checked against existing highest-known block number, and if it's higher it updates the state poller proactivley.
When a call to eth_getBlockByNumber(latest) is made, the result will be checked against existing highest-known block number, and if it's higher it updates the state poller proactivley.
When a call to eth_getLogs is made and range-end is a block higher than chosen upstream's latest block, a proactive poll happens on state poller (respecting the statePollerDebounce interval) to update the highest-known block number. If still upstream is behind the requested range-end then it will be skipped.

In the config above statePollerInterval is the interval to poll the latest/finalized block number from the upstream. statePollerDebounce is the minimum interval required before next poll happens, this is mainly useful to prevent redundant calls when proactive 'fetch latest block' polls are triggered, as explained for eth_getLogs. The statePollerDebounce value ideally must be equal to block time of the chain (or lower to be safe).

How is large ranges handled?

eRPC has three mechanisms to deal with large ranges for eth_getLogs:

Manual splitting based on upstream.evm.getLogsAutoSplittingRangeThreshold config per-upstream when you set what is the max block range they support.
Automatic splitting when a node returns an error that indicates the range is too large.
Hard limit on below dimensions set per upstream (or globally with upstreamDefaults) will reject getLogs request if it is beyond the limit and returns an error to the caller of eRPC.
- Max allowed range: set with upstream.evm.GetLogsMaxAllowedRange
- Max allowed addresses: set with upstream.evm.GetLogsMaxAllowedAddresses
- Max allowed topics: set with upstream.evm.GetLogsMaxAllowedTopics

`eth_blockNumber` behavior

This method is intercepted by the integrity module to ensure the highest known block number is returned.

First request will be sent to the chosen upstream.
The response will be checked against highest-known block number (across all upstreams state pollers).
- During this check if any upstream's "last update to latest block" is older than statePollerDebounce it will try to fetch the latest block from the upstream (i.e. we make sure we have most recent view of upstream's latest block).
If the response is older than the highest-known block number, the response will be replaced with the highest-known block number.

Relevant Prometheus metrics:

erpc_upstream_stale_latest_block_total - Total number of times an upstream returned a stale latest block number (vs others).
erpc_upstream_stale_finalized_block_total - Total number of times an upstream returned a stale finalized block number (vs others).

`eth_getBlockByNumber` behavior

This method is intercepted by the integrity module to ensure the highest known block number is used for "latest" and "finalized" block tags.

First request will be sent to the chosen upstream.
The response will be checked against highest-known block number (across all upstreams state pollers).
- During this check if any upstream's "last update to latest block" is older than statePollerDebounce it will try to fetch the latest block from the upstream (i.e. we make sure we have most recent view of upstream's latest block).
If the response is older than the highest-known block number, then the request will be sent to all 'other' upstreams (excluding the current one).

Relevant Prometheus metrics:

erpc_upstream_stale_latest_block_total - Total number of times an upstream returned a stale latest block number (vs others).
erpc_upstream_stale_finalized_block_total - Total number of times an upstream returned a stale finalized block number (vs others).

`eth_getLogs` behavior

This method is intercepted by the integrity module to ensure fromBlock and toBlock parameters are within the available block range of the chosen upstream.

We check the toBlock against the chosen upstream's latest block number. If latest block is stale (previously updated < statePollerDebounce) then we force a poll to update the latest block number.
If the toBlock is still higher than the latest block number, then we skip to the next upstream.
We check the fromBlock is within the range of the chosen upstream relative to its "maxAvailableRecentBlocks" if configured, otherwise we skip to the next upstream.
If both fromBlock and toBlock are within the range of the chosen upstream, then request is sent to the chosen upstream.

Relevant Prometheus metrics:

erpc_upstream_evm_get_logs_stale_upper_bound_total - Total number of times eth_getLogs was skipped due to upstream latest block being less than requested toBlock.
erpc_upstream_evm_get_logs_stale_lower_bound_total - Total number of times eth_getLogs was skipped due to fromBlock being less than upstream's available block range.
erpc_upstream_evm_get_logs_range_exceeded_auto_splitting_threshold_total - Total number of times eth_getLogs request exceeded the block range threshold and needed splitting (based on upstream config for "upstream.evm.getLogsAutoSplittingRangeThreshold").
erpc_upstream_evm_get_logs_forced_splits_total - Total number of eth_getLogs request splits by dimension (block_range, addresses, topics), due to a complain/error from upstream (e.g. "Returned too many results use a smaller block range").

Why eRPC?

Integrity

How does eRPC handle data integrity?

Config

How highest-known block number is tracked?

How is large ranges handled?

eth_blockNumber behavior

eth_getBlockByNumber behavior

eth_getLogs behavior

`eth_blockNumber` behavior

`eth_getBlockByNumber` behavior

`eth_getLogs` behavior