Failsafe

Failsafe policies help with intermittent issues and increase resiliency. They can be configured at both Network and Upstream levels, with support for per-method configuration.

Available policies

timeout: prevents requests from hanging indefinitely
retry: recovers from transient failures
hedge: runs parallel requests when upstreams are slow
circuitBreaker: temporarily removes failing upstreams
consensus: verifies multiple upstreams agree on results
Integrity increases data quality for specific methods

Per-method configuration

Failsafe policies can optionally be configured per-method using matchMethod and matchFinality fields. This allows fine-tuned behavior for different RPC methods and different block finality states.

matchMethod: Pattern to match RPC methods (a matcher supports wildcards * and OR operator |)
matchFinality: Array of finality states to match

When multiple failsafe configs are defined, they are evaluated in order and the first matching config is used.

Finality States

The matchFinality field can match against these data finality states:

finalized: Data from blocks that are confirmed as finalized and safe from reorgs. This is determined by comparing the block number with the upstream's finalized block.
- Example methods: eth_getBlockByNumber (for old blocks), eth_getLogs (for finalized ranges)
- Use case: Can have relaxed failsafe policies since data won't change
unfinalized: Data from recent blocks that could still be reorganized. Also includes any data from pending blocks.
- Example methods: eth_getBlockByNumber("latest"), eth_call with recent blocks
- Use case: May need more aggressive retries and shorter timeouts
realtime: Data that changes frequently, typically with every new block.
- Example methods: eth_blockNumber, eth_gasPrice, eth_maxPriorityFeePerGas, net_peerCount
- Use case: Often needs fast timeouts and may benefit from hedging
unknown: When the block number cannot be determined from the request/response.
- Example methods: eth_getTransactionByHash, trace_transaction, debug_traceTransaction
- Use case: Data is typically immutable once included, but block context is unknown

erpc.yaml

projects:
  - id: main
    upstreams:
      - id: my-upstream
        failsafe:
          # Default policy for all methods
          - matchMethod: "*"        # matches any method (default if omitted)
            timeout:
              duration: 30s
            retry:
              maxAttempts: 3
          
          # Fast timeout for simple queries
          - matchMethod: "eth_getBlock*|eth_getTransaction*"
            timeout:
              duration: 5s
            retry:
              maxAttempts: 2
              delay: 100ms
          
          # Longer timeout for heavy trace methods
          - matchMethod: "trace_*|debug_*"
            timeout:
              duration: 60s
            retry:
              maxAttempts: 1    # expensive operations, minimize retries
          
          # Different policy for finalized vs unfinalized data
          - matchMethod: "eth_call|eth_estimateGas"
            matchFinality: ["unfinalized", "realtime"]
            timeout:
              duration: 10s
            retry:
              maxAttempts: 5    # unfinalized data changes frequently, retry more

`timeout` policy

Sets a timeout for requests. Network-level timeout applies to the entire request lifecycle (including retries), while upstream-level timeout applies to each individual attempt.

erpc.yaml

projects:
  - id: main
    networks:
      - architecture: evm
        evm:
          chainId: 42161
        failsafe:
          - matchMethod: "*"
            timeout:
              duration: 30s        # Total time including all retries
    
    upstreams:
      - id: blastapi-chain-42161
        failsafe:
          - matchMethod: "*"
            timeout:
              duration: 15s        # Per-attempt timeout

`retry` policy

Automatically retries failed requests with configurable backoff strategies.

Retryable Errors

5xx server errors (intermittent issues)
408 request timeout
429 rate limit exceeded
Empty responses for certain methods (e.g., eth_getLogs when node is lagging)

Non-Retryable Errors

4xx client errors (invalid requests)
Unsupported method errors

erpc.yaml

projects:
  - id: main
    upstreams:
      - id: my-upstream
        failsafe:
          - matchMethod: "*"
            retry:
              maxAttempts: 3        # Total attempts (initial + 2 retries)
              delay: 1000ms         # Initial delay between retries
              backoffMaxDelay: 10s  # Maximum delay after backoff
              backoffFactor: 0.3    # Exponential backoff multiplier
              jitter: 500ms         # Random jitter (0-500ms) to prevent thundering herd

`hedge` policy

Starts parallel requests when an upstream is slow to respond. Highly recommended at network level for optimal performance.

Quantile-based hedging (recommended) uses response time statistics to determine optimal hedge timing, while fixed-delay hedging uses a static delay.

erpc.yaml

projects:
  - id: main
    networks:
      - architecture: evm
        evm:
          chainId: 1
        failsafe:
          - matchMethod: "*"
            hedge:
              # Quantile-based (recommended): hedge after p99 response time
              quantile: 0.99
              minDelay: 100ms     # Minimum wait before hedging
              maxDelay: 2s        # Maximum wait before hedging
              maxCount: 1         # Max parallel hedged requests
              
              # Alternative: Fixed-delay hedging
              # delay: 500ms
              # maxCount: 1

Monitor effectiveness via Prometheus metrics:

erpc_network_hedged_request_total - total hedged requests
erpc_network_hedge_discards_total - wasted hedges (original responded first)

`circuitBreaker` policy

Temporarily removes consistently failing upstreams to allow recovery time.

Circuit breaker states:

Closed: Normal operation, upstream is healthy
Open: Upstream is failing, temporarily removed from rotation
Half-open: Testing if upstream has recovered with limited traffic

erpc.yaml

projects:
  - id: main
    upstreams:
      - id: my-upstream
        failsafe:
          - matchMethod: "*"
            circuitBreaker:
              # Open circuit when 80% (160/200) of recent requests fail
              failureThresholdCount: 160
              failureThresholdCapacity: 200
              halfOpenAfter: 60s          # Try recovery after 1 minute
              # Close circuit when 80% (8/10) succeed in half-open state
              successThresholdCount: 8
              successThresholdCapacity: 10

Real-World Examples

High-Performance DeFi Configuration

erpc.yaml

projects:
  - id: defi-prod
    networks:
      - architecture: evm
        evm:
          chainId: 1
        failsafe:
          # Aggressive hedging for all methods
          - matchMethod: "*"
            hedge:
              quantile: 0.9      # p90 latency
              minDelay: 50ms
              maxCount: 2        # Up to 2 parallel hedges
            timeout:
              duration: 10s
    
    upstreams:
      - id: primary-node
        failsafe:
          # Price feeds need fast response
          - matchMethod: "eth_call"
            matchFinality: ["latest"]
            timeout:
              duration: 1s
            retry:
              maxAttempts: 1     # No time for retries
          
          # Block data can be slower but must succeed
          - matchMethod: "eth_getBlock*"
            timeout:
              duration: 5s
            retry:
              maxAttempts: 5
              delay: 100ms

Finality-Based Configuration

erpc.yaml

failsafe:
  # Finalized data: relaxed policies
  - matchMethod: "*"
    matchFinality: ["finalized"]
    timeout:
      duration: 30s
    retry:
      maxAttempts: 5
      backoffFactor: 2
  
  # Unfinalized data: aggressive timeouts
  - matchMethod: "*"
    matchFinality: ["unfinalized"]
    timeout:
      duration: 5s
    retry:
      maxAttempts: 2
      delay: 100ms
  
  # Realtime data: fast with hedging
  - matchMethod: "*"
    matchFinality: ["realtime"]
    timeout:
      duration: 2s
    hedge:
      delay: 500ms
      maxCount: 1
  
  # Unknown finality: moderate settings
  - matchMethod: "*"
    matchFinality: ["unknown"]
    timeout:
      duration: 15s
    retry:
      maxAttempts: 3

Indexer Configuration

erpc.yaml

projects:
  - id: indexer
    upstreams:
      - id: archive-node
        failsafe:
          # Bulk log queries need long timeouts
          - matchMethod: "eth_getLogs"
            timeout:
              duration: 120s
            retry:
              maxAttempts: 3
              backoffFactor: 2
          
          # Trace methods are expensive but critical
          - matchMethod: "trace_*|arbtrace_*"
            timeout:
              duration: 180s
            retry:
              maxAttempts: 2
            circuitBreaker:
              failureThresholdCount: 10    # More tolerant for slow methods
              failureThresholdCapacity: 20
              halfOpenAfter: 5m

Disabling Policies

To disable any policy, set it to null or ~ (YAML):

erpc.yaml

failsafe:
  - matchMethod: "*"
    hedge: ~              # Disable hedging
    circuitBreaker: ~     # Disable circuit breaker

CORS Circuit breaker

Failsafe

Available policies

Per-method configuration

Finality States

timeout policy

retry policy

Retryable Errors

Non-Retryable Errors

hedge policy

circuitBreaker policy

Real-World Examples

High-Performance DeFi Configuration

Finality-Based Configuration

Indexer Configuration

Disabling Policies

`timeout` policy

`retry` policy

`hedge` policy

`circuitBreaker` policy