Failsafe
Failsafe policies handle intermittent upstream issues: timeouts, slowdowns, rate limits, transient errors, and disagreement between upstreams. They live on networks (the whole request lifecycle, including failover across upstreams) and on upstreams (one attempt against one endpoint).
Six policies
Each one has its own page. This page covers what's common to all of them: scoping, defaults, the observability layer that records every attempt.
- Timeout — bound how long a request may take. Fixed or quantile-adaptive.
- Retry — replay transient failures with backoff. Empty-result and block-unavailable get separate knobs.
- Hedge — race a backup request when the primary is slow.
- Circuit breaker — temporarily remove an upstream after sustained failure.
- Consensus — query multiple upstreams in parallel and require agreement.
- Integrity — empty-response handling and data-correctness checks.
Scoping each policy
Every entry in failsafe[] can be scoped by method and by finality. Entries are evaluated in order — the first whose matchMethod + matchFinality matches the request wins.
- `matchMethod`: matcher syntax with `*` (wildcard), `|` (OR), and `!` (NOT). E.g. `"eth_call|trace_*"`, `"!debug_*"`.
- `matchFinality`: list of finality states. Omit to match every finality.
```yaml
projects:
  - id: main
    networks:
      - architecture: evm
        evm:
          chainId: 1
        failsafe:
          # Per-method scoping: heavy methods get a longer ceiling.
          - matchMethod: "trace_*|debug_*"
            timeout: { duration: 60s }
            retry: { maxAttempts: 1 }
          # Finality scoping: realtime data deserves shorter timeouts and aggressive hedges.
          - matchMethod: "*"
            matchFinality: ["realtime", "unfinalized"]
            timeout: { duration: 5s }
            retry: { maxAttempts: 3, delay: 100ms }
            hedge: { quantile: 0.9, minDelay: 50ms, maxCount: 2 }
          # Catch-all (matched last).
          - matchMethod: "*"
            timeout: { duration: 30s }
            retry: { maxAttempts: 3, delay: 0ms }
```

Finality states
matchFinality accepts these four values. There is no latest value — that's a block tag, not a finality state. Using matchFinality: ["latest"] silently never matches.
| State | What it means |
|---|---|
| `finalized` | Block past the chain's finalization horizon. Safe from reorgs. E.g. `eth_getBlockByNumber` on an old block, finalized `eth_getLogs` ranges. Relaxed failsafe is fine. |
| `unfinalized` | Recent block that could still reorg. Pending-block data also counts. E.g. `eth_getBlockByNumber("latest")` on a fresh block. May need more aggressive retries and shorter timeouts. |
| `realtime` | Data that updates every block: `eth_blockNumber`, `eth_gasPrice`, `eth_maxPriorityFeePerGas`, `net_peerCount`. Short timeouts + hedge are common. |
| `unknown` | Block number not derivable from request/response: `eth_getTransactionByHash`, `trace_transaction`, `debug_traceTransaction`. Data is typically immutable once mined; block context just isn't surfaced. |
matchFinality: ["latest"] is invalid. latest is a block tag, not a finality state. Use realtime or unfinalized instead.
Where each policy is valid
| Policy | Network level | Upstream level | Notes |
|---|---|---|---|
| `timeout` | ✅ | ✅ | Network timeout covers the full lifecycle (including every upstream retry). Upstream timeout bounds one attempt. |
| `retry` | ✅ | ✅ | Network-level retries rotate across upstreams. Upstream-level retries hit the same upstream. Empty-result retries (`emptyResultAccept`, etc.) only fire at the network level. |
| `hedge` | ✅ | (no-op) | Hedge races across upstreams; setting it at the upstream level is meaningless. |
| `circuitBreaker` | ❌ | ✅ | Trips one upstream out of the rotation; the network's selection policy then routes elsewhere. |
| `consensus` | ✅ | ✅ | Most commonly network-level; per-upstream usage is rare. |
Disabling a policy
Set the policy's value to null (YAML) or undefined/omit (TypeScript) to opt out of a default that would otherwise apply.
```yaml
failsafe:
  - matchMethod: "*"
    timeout: { duration: 30s }
    retry: null   # explicitly disable retry on this method
```

Per-attempt observability
Every request carries a full execution trace exposed via trace spans, Prometheus metrics, and HTTP response headers. Useful for debugging retry/hedge/consensus decisions without server-side traces.
Trace span attributes
The Network.Forward span carries:
- `execution.attempts` / `execution.retries` / `execution.hedges` (totals across all scopes)
- `execution.network_attempts` / `execution.network_retries` / `execution.network_hedges`
- `upstreams.tried`: ordered list of upstream IDs touched
- `upstreams.outcomes`: per-attempt outcome, one of `success` / `empty` / `transport_error` / `server_error` / `client_error` / `rate_limited` / `missing_data` / `exec_revert` / `block_unavailable` / `breaker_open` / `cancelled` / `timeout` / `skipped`
- `upstreams.reasons`: why each upstream was selected, one of `primary` / `retry` / `hedge` / `consensus_slot` / `sweep`
- `upstreams.durations_ms`
Each individual attempt also produces `Upstream.tryForward.SendRequest` and `Upstream.forwardAttempt` child spans with `upstream.id`, `request.method`, attempt counters, and the per-attempt outcome classification.
HTTP response headers
The same trace is mirrored into HTTP response headers for client-side debugging. Headers are emitted on every response path (success, JSON-RPC error, validation reject, auth reject, rate-limit). Default mode is all.
| Header | Mode | Description |
|---|---|---|
| `X-ERPC-Cache` | `summary`, `all` | `HIT` / `MISS` |
| `X-ERPC-Upstream` | `summary`, `all` | Winning upstream ID (single-winner case) |
| `X-ERPC-Duration` | `summary`, `all` | Wall-clock ms |
| `X-ERPC-Attempts` | `summary`, `all` | Total physical operations across all scopes (Upstream + Cache) |
| `X-ERPC-Upstream-Attempts` / `-Retries` / `-Hedges` | `summary`, `all` | Upstream-scope counters |
| `X-ERPC-Network-Attempts` / `-Retries` / `-Hedges` | `summary`, `all` | Network-scope rotation and retry counters |
| `X-ERPC-Cache-Attempts` / `-Retries` / `-Hedges` | `summary`, `all` (when non-zero) | Cache-scope counters |
| `X-ERPC-Consensus-Slots` / `-Disputes` / `-Low-Participants` | `all` (when non-zero) | Consensus participation counters |
| `X-ERPC-Upstreams` | `all` | Per-attempt participation log (see format below) |
`X-ERPC-Upstreams` format: each segment is `<id>=<reason>:<outcome>:<duration>ms[:won]`, joined by `;`:

```
X-ERPC-Upstreams: alchemy=primary:success:50ms:won;quicknode=hedge:timeout:5000ms;drpc=consensus_slot:exec_revert:20ms
```

`:won` is present when this attempt contributed to the final response. For single-winner requests exactly one segment carries `:won`; for consensus every participant in the winning agreement group does.
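For client-side debugging it can help to turn the `X-ERPC-Upstreams` value into structured records. The following is a minimal sketch based only on the segment format described above; `UpstreamAttempt` and `parseUpstreamsHeader` are hypothetical names, not part of any eRPC client library, and the sketch assumes upstream IDs contain neither `;` nor `=`.

```typescript
// Hypothetical record type for one segment of the X-ERPC-Upstreams header.
interface UpstreamAttempt {
  id: string;
  reason: string;     // e.g. "primary", "retry", "hedge", "consensus_slot", "sweep"
  outcome: string;    // e.g. "success", "timeout", "exec_revert"
  durationMs: number;
  won: boolean;       // true when the segment carries the ":won" suffix
}

// Parses "<id>=<reason>:<outcome>:<duration>ms[:won]" segments joined by ";".
// Assumption: upstream IDs contain neither ";" nor "=".
function parseUpstreamsHeader(value: string): UpstreamAttempt[] {
  return value
    .split(";")
    .filter((segment) => segment.length > 0)
    .map((segment) => {
      const eq = segment.indexOf("=");
      if (eq < 0) {
        throw new Error(`malformed segment: ${segment}`);
      }
      const id = segment.slice(0, eq);
      const [reason, outcome, duration, flag] = segment.slice(eq + 1).split(":");
      if (
        reason === undefined ||
        outcome === undefined ||
        duration === undefined ||
        !duration.endsWith("ms")
      ) {
        throw new Error(`malformed segment: ${segment}`);
      }
      return {
        id,
        reason,
        outcome,
        durationMs: Number(duration.slice(0, -2)),
        won: flag === "won",
      };
    });
}

const attempts = parseUpstreamsHeader(
  "alchemy=primary:success:50ms:won;quicknode=hedge:timeout:5000ms;drpc=consensus_slot:exec_revert:20ms",
);
const winners = attempts.filter((a) => a.won).map((a) => a.id);
```

For the example header, `winners` contains only `alchemy`, which matches the single-winner rule.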
Toggle via server.executionHeaders:
```yaml
server:
  executionHeaders: all        # default: full per-attempt trace
  # executionHeaders: summary  # counters only (no X-ERPC-Upstreams slice)
  # executionHeaders: off      # no X-ERPC-* headers at all
```

Common pitfalls
- `matchFinality: ["latest"]` silently never matches. Valid values are `finalized`, `unfinalized`, `realtime`, `unknown`.
- Mixing upstream + network retry without thinking about the product: `upstream.retry.maxAttempts: 3` × `network.retry.maxAttempts: 3` = up to 9 attempts per request. Easy to accidentally 9× your upstream traffic.
- Network timeout shorter than `upstream.timeout × maxAttempts`: the network gives up before the upstream's retry budget is exhausted. Set the network timeout generously.
- `circuitBreaker` at network level is silently ignored; it is only valid at the upstream level.
- `emptyResultIgnore` is deprecated; rename it to `emptyResultAccept`. For network-wide empty-retry control, use `directiveDefaults.retryEmpty: false` (or per-request `?retryEmpty=false`).
- The single-object legacy `failsafe: { ... }` form is still accepted, but the array form with `matchMethod: "*"` is canonical. The single-object form is implicitly `matchMethod: "*"`.
- `retry.delay: 0ms` doesn't disable retry; it means "no wait between attempts". Use `maxAttempts: 1` to disable retry entirely.
- Write methods aren't retried even when retry is configured. Set `network.evm.idempotentTransactionBroadcast: true` if you want `eth_sendRawTransaction` to be safe under retry/hedge.
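The retry-product and timeout pitfalls above come down to two small calculations. The sketch below works them through; `RetryBudget` and both function names are hypothetical, and the timeout check deliberately ignores retry delays and backoff, so treat it as a lower bound rather than an exact model of eRPC's behavior.

```typescript
// Hypothetical summary of the four numbers that interact across scopes.
interface RetryBudget {
  upstreamMaxAttempts: number;
  networkMaxAttempts: number;
  upstreamTimeoutMs: number;
  networkTimeoutMs: number;
}

// Upper bound on physical attempts: each network-level attempt may trigger
// a full round of upstream-level attempts.
function worstCaseAttempts(b: RetryBudget): number {
  return b.upstreamMaxAttempts * b.networkMaxAttempts;
}

// True when the network timeout leaves room for one upstream to exhaust its
// own retry budget (delays and backoff are ignored in this sketch).
function networkTimeoutCoversUpstreamRetries(b: RetryBudget): boolean {
  return b.networkTimeoutMs >= b.upstreamTimeoutMs * b.upstreamMaxAttempts;
}

const tight: RetryBudget = {
  upstreamMaxAttempts: 3,
  networkMaxAttempts: 3,
  upstreamTimeoutMs: 5_000,
  networkTimeoutMs: 10_000,
};
const generous: RetryBudget = { ...tight, networkTimeoutMs: 30_000 };

const attemptsUpperBound = worstCaseAttempts(tight);              // 3 × 3
const tightIsSafe = networkTimeoutCoversUpstreamRetries(tight);   // 10s vs 15s needed
const generousIsSafe = networkTimeoutCoversUpstreamRetries(generous);
```

With these numbers the tight budget allows up to 9 attempts but cuts the upstream off after 10s of a possible 15s, while the 30s network timeout covers it.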
Failsafe scoping & observability reference
FailsafeConfig — top-level fields
| Field | Type | Notes |
|---|---|---|
| `matchMethod` | `string` | Matcher pattern. Defaults to `"*"`. Supports `*` (wildcard), `\|` (OR), `!` (NOT). |
| `matchFinality` | `("finalized"\|"unfinalized"\|"realtime"\|"unknown")[]` | When omitted, matches every finality. Do not use `"latest"` here; that's a block tag, not a state. |
| `timeout` | `TimeoutPolicyConfig` | See Timeout policy. |
| `retry` | `RetryPolicyConfig` | See Retry policy. |
| `hedge` | `HedgePolicyConfig` | See Hedge policy. |
| `circuitBreaker` | `CircuitBreakerPolicyConfig` | Upstream-only. See Circuit breaker. |
| `consensus` | `ConsensusPolicyConfig` | See Consensus. |
Each policy is independent — you can set any subset on a single failsafe[] entry. Evaluation order within failsafe[] is top-to-bottom; first match wins.
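The first-match-wins rule can be illustrated with a short sketch. This is not eRPC's implementation: `FailsafeEntry`, `matchesMethod`, and `selectEntry` are hypothetical names, the `label` field exists only for the demo, and the matcher is a simplified reading of the `*`, `|`, and `!` syntax (each `|`-separated alternative is tested independently, with a leading `!` negating that alternative).

```typescript
type Finality = "finalized" | "unfinalized" | "realtime" | "unknown";

// Hypothetical model: only the fields that take part in matching.
interface FailsafeEntry {
  matchMethod?: string;        // defaults to "*"
  matchFinality?: Finality[];  // omitted = matches every finality
  label: string;               // illustrative only, not a real config field
}

// Simplified matcher (an assumption, not the real one): "|" separates
// alternatives, a leading "!" negates one alternative, "*" matches any run
// of characters.
function matchesMethod(pattern: string, method: string): boolean {
  return pattern.split("|").some((alt) => {
    const negated = alt.startsWith("!");
    const glob = negated ? alt.slice(1) : alt;
    const escaped = glob
      .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
      .replace(/\*/g, ".*");
    const hit = new RegExp(`^${escaped}$`).test(method);
    return negated ? !hit : hit;
  });
}

// Walks the entries top-to-bottom and returns the first one whose method
// pattern and finality list both match.
function selectEntry(
  entries: FailsafeEntry[],
  method: string,
  finality: Finality,
): FailsafeEntry | undefined {
  return entries.find(
    (e) =>
      matchesMethod(e.matchMethod ?? "*", method) &&
      (e.matchFinality === undefined || e.matchFinality.includes(finality)),
  );
}

const entries: FailsafeEntry[] = [
  { matchMethod: "trace_*|debug_*", label: "heavy" },
  { matchMethod: "*", matchFinality: ["realtime", "unfinalized"], label: "fresh" },
  { matchMethod: "*", label: "catch-all" },
];

const forTrace = selectEntry(entries, "trace_block", "finalized");
const forHead = selectEntry(entries, "eth_blockNumber", "realtime");
const forLogs = selectEntry(entries, "eth_getLogs", "finalized");
```

Because order decides the winner, a catch-all `matchMethod: "*"` entry placed first would shadow every entry below it, so the most specific entries should come first.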
Where each policy lives — at a glance
| Policy | Network | Upstream | Cache (failsafeForGets/failsafeForSets) |
|---|---|---|---|
| `timeout` | ✅ | ✅ | ✅ |
| `retry` | ✅ | ✅ | ✅ |
| `hedge` | ✅ | (no-op) | ✅ |
| `circuitBreaker` | ❌ | ✅ | ❌ |
| `consensus` | ✅ | (rare) | ❌ |
Retryable vs non-retryable errors (canonical list)
Retryable (retry will replay these):
- HTTP `5xx` from the upstream
- HTTP `408` (request timeout)
- HTTP `429` (rate limit), but prefer `rateLimitAutoTune` for sustained pressure
- Network errors (TCP reset, DNS failure)
- Empty responses for methods NOT in `retry.emptyResultAccept`, when the `retryEmpty` directive is set
- Block-unavailable conditions where the request's block reference is beyond every upstream's known head
Non-retryable (single-attempt; never retried):
- HTTP `4xx` other than `408` / `429`
- `MethodNotSupported` from the upstream
- Empty responses for methods in `retry.emptyResultAccept` at or below the `emptyResultConfidence` horizon
- Write methods (`eth_sendRawTransaction`, `eth_sendTransaction`), unless `evm.idempotentTransactionBroadcast` is enabled on the network
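The HTTP-status part of the two lists above can be restated as a single predicate. This is a sketch of the rule as documented here, not code taken from eRPC; `isRetryableHttpStatus` is a hypothetical name, and it covers status codes only (empty results, block availability, and write methods are decided separately).

```typescript
// Mirrors the documented rule: 5xx, 408, and 429 are retryable;
// every other status (including the remaining 4xx) is not.
function isRetryableHttpStatus(status: number): boolean {
  if (status >= 500 && status <= 599) {
    return true;
  }
  return status === 408 || status === 429;
}

const retryable = [500, 503, 408, 429].filter(isRetryableHttpStatus);
const rejected = [400, 401, 404].filter((s) => !isRetryableHttpStatus(s));
```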
Per-method scoping recipes
Different policy per finality:
```yaml
failsafe:
  - matchMethod: "*"
    matchFinality: ["realtime", "unfinalized"]
    timeout: { duration: 5s }
    retry: { maxAttempts: 3, delay: 100ms }
  - matchMethod: "*"
    matchFinality: ["finalized"]
    timeout: { duration: 60s }   # tolerate long backfill reads
    retry: { maxAttempts: 5, delay: 200ms }
  - matchMethod: "*"
    matchFinality: ["unknown"]   # tx-hash keyed (receipts, traces by hash)
    timeout: { duration: 30s }
    retry: { maxAttempts: 3 }
```

Different policy per method group:
```yaml
failsafe:
  - matchMethod: "trace_*|debug_*"   # expensive: don't multiply
    timeout: { duration: 60s }
    retry: { maxAttempts: 1 }
  - matchMethod: "eth_getLogs"
    timeout: { duration: 30s }
    retry: { maxAttempts: 3, delay: 100ms }
  - matchMethod: "*"
    timeout: { duration: 15s }
    retry: { maxAttempts: 3 }
```

Real-world example: high-throughput DeFi with hedging
Aggressive hedge across upstreams at network level; per-method fine-tuning at upstream level.
```yaml
projects:
  - id: defi-prod
    networks:
      - architecture: evm
        evm: { chainId: 1 }
        failsafe:
          - matchMethod: "*"
            hedge: { quantile: 0.9, minDelay: 50ms, maxCount: 2 }
            timeout: { duration: 10s }
    upstreams:
      - id: primary-node
        endpoint: https://primary.example
        failsafe:
          # Price-feed reads: fast and unforgiving
          - matchMethod: "eth_call"
            matchFinality: ["realtime", "unfinalized"]
            timeout: { duration: 1s }
            retry: { maxAttempts: 1 }
          # Block lookups: slower but must succeed
          - matchMethod: "eth_getBlock*"
            timeout: { duration: 5s }
            retry: { maxAttempts: 5, delay: 100ms }
```

Real-world example: indexer chasing tip with broad empty retries
Network-wide retry-empty (caches will catch up shortly), tight per-method scoping for the long-tail backfill.
```yaml
projects:
  - id: indexer
    networks:
      - architecture: evm
        evm: { chainId: 1 }
        directiveDefaults:
          retryEmpty: true   # treat empty as retryable across the board
        failsafe:
          - matchMethod: "eth_getLogs|eth_call"
            retry:
              maxAttempts: 5
              delay: 100ms
              backoffFactor: 1.2
              jitter: 50ms
              emptyResultConfidence: finalizedBlock
              emptyResultMaxAttempts: 2
          - matchMethod: "*"
            retry: { maxAttempts: 3, delay: 200ms }
            timeout: { duration: 30s }
```