# Failsafe

> Source: https://docs.erpc.cloud/config/failsafe
> Six composable failsafe policies that keep every RPC request succeeding, even when upstreams are slow, wrong, or temporarily down.
> Format: machine-readable markdown export of the docs page above.
> All collapsible AI sections are inlined and fully expanded.

A broken upstream becomes invisible to your callers. eRPC wraps every request in six independently tunable policies (retry, hedge, timeout, circuit breaker, consensus, and integrity) that run in a fixed, deterministic chain. Configure once per method or finality tier; let eRPC handle the rest.

- **[Timeout](/config/failsafe/timeout.llms.txt)**: Bound how long a request may take (static or quantile-adaptive) at the network lifecycle or single-upstream level.
- **[Retry](/config/failsafe/retry.llms.txt)**: Replay transient failures with exponential backoff and jitter. Separate knobs for hard errors vs. data-unavailability conditions.
- **[Hedge](/config/failsafe/hedge.llms.txt)**: Race a speculative duplicate request to a second upstream when the primary is slow, using a quantile-adaptive delay with min/max guardrails.
- **[Circuit breaker](/config/failsafe/circuit-breaker.llms.txt)**: Per-upstream state machine that stops routing to a failing endpoint after a rolling-window failure threshold, then self-heals.
- **[Consensus](/config/failsafe/consensus.llms.txt)**: Fan out to N upstreams, group identical responses by canonical hash, and select a winner, with misbehavior tracking and punishment.
- **[Integrity](/config/failsafe/integrity.llms.txt)**: Layered EVM response-validation rules that discard stale, malformed, or logically inconsistent upstream responses before they reach callers.

## Agent reference

Copy one of these prompts into your AI agent session (Claude Code, Cursor, …). Each one points the agent at this page's machine-readable reference so it can do the work correctly:

**Prompt Example #1: make my RPC survive provider outages**

```text
My app uses a single RPC provider and goes down when it has an outage. Configure eRPC's failsafe layer in my eRPC config (retry, hedge, timeout, and circuit breaker) so requests automatically route around failures without my callers noticing.

Read the full reference: https://docs.erpc.cloud/config/failsafe.llms.txt
```

**Prompt Example #2: tune failsafe policies for a latency-sensitive workload**

```text
My frontend makes a lot of eth_call and eth_getLogs requests and I want to minimize p99 latency while still having retry coverage for transient errors. Tune the failsafe entries in my eRPC config, including per-method matchMethod and matchFinality, so realtime calls get short timeouts and archival calls get longer budgets.

Reference: https://docs.erpc.cloud/config/failsafe.llms.txt
```

**Prompt Example #3: debug ErrFailsafeConfiguration at startup**

```text
eRPC is crashing at startup with ErrFailsafeConfiguration. Inspect my eRPC config and find any circuit breaker blocks placed at network scope or consensus blocks placed at upstream scope; these are the two scope mismatch errors that cause this. Fix each one and explain why the scope restriction exists.

Reference: https://docs.erpc.cloud/config/failsafe.llms.txt
```

**Prompt Example #4: add alerts for failsafe effectiveness**

```text
I want Prometheus alerts to catch when eRPC's failsafe layer is under stress: high retry rates, frequent circuit-breaker trips, or hedge fire rates spiking. Using the metrics in my eRPC config's Prometheus output, write PromQL alert expressions for each condition and explain the thresholds.

Reference: https://docs.erpc.cloud/config/failsafe.llms.txt
```

---

### Resilience: full agent reference

### How it works

**Executor composition.** The non-consensus executor chain is `retry(hedge(runUpstreamSweep))`; with consensus active it becomes `consensus(retry(hedge(tryOneUpstream)))`. The network-scope timeout wraps the entire `networkExecutor.Run` call, so it bounds ALL retries and hedges, not individual attempts. ([`erpc/network_executor.go:L183-L203`](https://github.com/erpc/erpc/blob/main/erpc/network_executor.go#L183-L203))

**Scope restrictions.**

- Circuit breakers are **upstream-scope only**. Placing a `circuitBreaker` block in a network-level `failsafe` entry causes startup failure with `ErrFailsafeConfiguration`. ([`erpc/network_executor.go:L68-L73`](https://github.com/erpc/erpc/blob/main/erpc/network_executor.go#L68-L73))
- Consensus is **network-scope only**. Placing a `consensus` block in an upstream-level `failsafe` entry causes startup failure. ([`upstream/upstream_executor.go:L46-L53`](https://github.com/erpc/erpc/blob/main/upstream/upstream_executor.go#L46-L53))

**Failsafe entry matching (`matchMethod` + `matchFinality`).** For each request, one `failsafe[]` entry is selected by a 4-tier priority (`SelectExecutor` in `common/match.go`):

1. Specific method + specific finality (highest priority)
2. Specific method, any finality
3. Wildcard method (`"*"`), specific finality
4. Wildcard method, any finality: the catch-all (lowest priority)

Tiers are checked in that order, so a higher tier beats a lower one regardless of where the entries sit in the config. Within each tier, the first matching entry in config order wins. ([`common/match.go:L18-L78`](https://github.com/erpc/erpc/blob/main/common/match.go#L18-L78))

**`matchFinality` valid values.** `finalized`, `unfinalized`, `realtime`, `unknown`. A value like `"latest"` is not a valid finality token and silently never matches any request.
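
The 4-tier selection described above can be sketched in a few lines of TypeScript. This is only an illustration of the documented priority order, not the code in `common/match.go`: the `FailsafeEntry` shape, the `label` field, `globMatch`, and `selectEntry` are hypothetical names, the glob handling is reduced to `*`, and treating any pattern other than a bare `"*"` as a "specific method" is an assumption.

```typescript
// Illustrative sketch only; names and types here are hypothetical.
type Finality = "finalized" | "unfinalized" | "realtime" | "unknown";

interface FailsafeEntry {
  label: string;               // for the example output only
  matchMethod?: string;        // missing or "*" is treated as wildcard (assumption)
  matchFinality?: Finality[];  // missing or empty is treated as "any finality" (assumption)
}

// Minimal glob: only "*" is supported. eRPC's real matcher is richer.
function globMatch(pattern: string, value: string): boolean {
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`).test(value);
}

function selectEntry(
  entries: FailsafeEntry[],
  method: string,
  finality: Finality,
): FailsafeEntry | undefined {
  const wildcard = (e: FailsafeEntry) =>
    e.matchMethod === undefined || e.matchMethod === "*";
  const anyFinality = (e: FailsafeEntry) =>
    e.matchFinality === undefined || e.matchFinality.length === 0;

  // Tier predicates, highest priority first.
  const tiers: Array<(e: FailsafeEntry) => boolean> = [
    (e) => !wildcard(e) && !anyFinality(e), // 1. specific method + specific finality
    (e) => !wildcard(e) && anyFinality(e),  // 2. specific method, any finality
    (e) => wildcard(e) && !anyFinality(e),  // 3. wildcard method, specific finality
    (e) => wildcard(e) && anyFinality(e),   // 4. catch-all
  ];

  for (const inTier of tiers) {
    // Within a tier, config order decides.
    for (const e of entries) {
      if (!inTier(e)) continue;
      const methodOk = globMatch(e.matchMethod ?? "*", method);
      const finalityOk =
        anyFinality(e) || (e.matchFinality ?? []).includes(finality);
      if (methodOk && finalityOk) return e;
    }
  }
  return undefined;
}

// The catch-all is listed first, yet the more specific entries still win.
const exampleEntries: FailsafeEntry[] = [
  { label: "catch-all", matchMethod: "*" },
  { label: "finalized-any-method", matchMethod: "*", matchFinality: ["finalized"] },
  { label: "getLogs-finalized", matchMethod: "eth_getLogs", matchFinality: ["finalized"] },
];
```

In this sketch, `selectEntry(exampleEntries, "eth_getLogs", "finalized")` picks the `getLogs-finalized` entry even though it is last in the array, while `selectEntry(exampleEntries, "eth_getLogs", "realtime")` falls through to the catch-all because neither finality-specific entry applies.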

**Defaults merge algorithm.** When a network has its own `failsafe` array AND `networkDefaults.failsafe` is also set, each entry in the network array is matched against the defaults array using the same `WildcardMatch + MatchFinalities` algorithm. When a match is found, unset sub-fields (retry, hedge, timeout, etc.) inherit from the matched default entry. A catch-all default (`matchMethod: "*"`) acts as a universal base for all network-specific entries. If the network has no `failsafe` entries but the defaults do, the entire defaults array is cloned. ([`common/defaults.go:L1793-L1832`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L1793-L1832))

**Network-scope timeout is lifecycle-scoped, not per-attempt.** A 500ms network timeout with `maxAttempts: 3` has a 500ms total budget shared across all 3 attempts. There is no per-attempt timeout at the network scope.

**`retryEmpty` directive is the master gate** for all empty/missing-data retries. When off (`false`, the default), neither the network scope nor the upstream scope retries empty/missing-data results, and the empty-as-error conversion hook is also suppressed. ([`erpc/network_executor.go:L401-L404`](https://github.com/erpc/erpc/blob/main/erpc/network_executor.go#L401-L404))

### Worked examples

**1. Standard production failsafe for a latency-sensitive workload.** The adaptive hedge fires once a request outlasts the method's p70 latency (clamped between 100ms and 2s); the network retry policy allows up to 5 attempts on errors; the 30s network timeout bounds the entire request lifecycle:

**Config path:** `projects[].networks[].failsafe[]`

**YAML (`erpc.yaml`):**

```yaml
failsafe:
  - matchMethod: "*"
    timeout:
      duration: 30s
    retry:
      maxAttempts: 5
      delay: 200ms
      backoffFactor: 1.5
      backoffMaxDelay: 5s
    hedge:
      delay:
        quantile: 0.7
        min: 100ms
        max: 2s
      maxCount: 1
```

**TypeScript (`erpc.ts`):**

```typescript
failsafe: [{
  matchMethod: "*",
  timeout: { duration: "30s" },
  retry: {
    maxAttempts: 5,
    delay: "200ms",
    backoffFactor: 1.5,
    backoffMaxDelay: "5s",
  },
  hedge: {
    delay: { quantile: 0.7, min: "100ms", max: "2s" },
    maxCount: 1,
  },
}]
```

**2. Per-method finality tuning.** `eth_getLogs` on finalized data gets a longer timeout and more attempts; all other finalized methods get a tighter budget:

**Config path:** `projects[].networks[].failsafe[]`

**YAML (`erpc.yaml`):**

```yaml
failsafe:
  - matchMethod: "eth_getLogs"
    matchFinality: [finalized]
    timeout:
      duration: 60s
    retry:
      maxAttempts: 5
  - matchMethod: "*"
    matchFinality: [finalized]
    timeout:
      duration: 10s
    retry:
      maxAttempts: 3
```

**TypeScript (`erpc.ts`):**

```typescript
failsafe: [
  {
    matchMethod: "eth_getLogs",
    matchFinality: ["finalized"],
    timeout: { duration: "60s" },
    retry: { maxAttempts: 5 },
  },
  {
    matchMethod: "*",
    matchFinality: ["finalized"],
    timeout: { duration: "10s" },
    retry: { maxAttempts: 3 },
  },
]
```

**3. Upstream-scope circuit breaker.** Upstream-level failsafe adds an auto-healing circuit breaker; the network-level failsafe above still applies for retries and hedging:

**Config path:** `projects[].upstreams[].failsafe[]`

**YAML (`erpc.yaml`):**

```yaml
failsafe:
  - matchMethod: "*"
    circuitBreaker:
      failureThresholdCount: 20
      failureThresholdCapacity: 80
      halfOpenAfter: 5m
      successThresholdCount: 8
      successThresholdCapacity: 200
```

**TypeScript (`erpc.ts`):**

```typescript
failsafe: [{
  matchMethod: "*",
  circuitBreaker: {
    failureThresholdCount: 20,
    failureThresholdCapacity: 80,
    halfOpenAfter: "5m",
    successThresholdCount: 8,
    successThresholdCapacity: 200,
  },
}]
```

### Best practices

- Place **circuit breakers at upstream scope only**; placing them at network scope causes `ErrFailsafeConfiguration` at startup.
- Place **consensus at network scope only**; placing it at upstream scope also causes `ErrFailsafeConfiguration` at startup.
- Always use a **catch-all defaults entry** (`matchMethod: "*"`) in `networkDefaults.failsafe` so method-specific entries inherit the base policy without repeating every field.
- The **network-scope timeout** wraps all retries and hedges. Always set it shorter than `server.maxTimeout` (default 150s), or it will never fire on its own.
- Stacking `upstreams[].failsafe[].retry.maxAttempts: 3` with `networks[].failsafe[].retry.maxAttempts: 3` can produce up to 9 upstream calls per request (3 × 3), so size budgets accordingly.
- `emptyResultMaxAttempts` (default 2) is a **shared counter across ALL network retry rounds**, not per-round. With default `maxAttempts=5`, empty-type retries are capped at 1 extra call total.
- Enable `retryEmpty: true` in `directiveDefaults` (or per-request via `X-ERPC-Retry-Empty: true`) if you need missing-data retries; it is off by default.

### Edge cases & gotchas

1. `matchFinality: ["latest"]` silently never matches. Valid values: `finalized`, `unfinalized`, `realtime`, `unknown`.
2. Stacking upstream retry (`maxAttempts: 3`) with network retry (`maxAttempts: 3`) can produce up to 9 upstream calls per request.
3. Write methods (`eth_sendTransaction`, filter-creation methods) are excluded from hedging. `eth_sendRawTransaction` is intentionally NOT excluded because it supports idempotent broadcast.
4. `emptyResultMaxAttempts` is a shared counter across all network retry rounds, not per-round. With the default value of 2, one original attempt plus one empty-type retry fires: at most 1 empty-type retry across the entire request lifetime.
5. Consensus `ignoreFields` is a set-replacement, not a merge. Setting any entry replaces the entire built-in default map (which suppresses `blockTimestamp` disagreements for `eth_getLogs` and receipts methods).
6. The circuit breaker's rolling window must fill to `failureThresholdCapacity` (default 80) before it can trip, so the breaker stays Closed during early startup regardless of error rate.
7. `ErrFailsafeConfiguration` is startup-only: any scope mismatch (circuit breaker at network, consensus at upstream) aborts startup before any request is served.
8. Failsafe defaults merge uses **wildcard method matching**, not exact match. A default entry with `matchMethod: "eth_*"` will provide defaults for any network entry whose method starts with `eth_`. A catch-all `matchMethod: "*"` is required to cover all entries uniformly.
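
Several of the gotchas above are mechanical enough to check before deploying a config. The following is a hedged sketch of such a pre-flight check, not part of eRPC: `lintFailsafe`, `worstCaseUpstreamCalls`, and the trimmed-down `FailsafeEntry` shape are hypothetical, and the call-count bound deliberately ignores hedged requests, which can add further upstream calls.

```typescript
// Hypothetical pre-flight lint for a parsed failsafe array; not an eRPC API.
const VALID_FINALITIES = ["finalized", "unfinalized", "realtime", "unknown"];

type Scope = "network" | "upstream";

// Only the fields this sketch inspects; real entries carry more.
interface FailsafeEntry {
  matchMethod?: string;
  matchFinality?: string[];
  circuitBreaker?: Record<string, unknown>;
  consensus?: Record<string, unknown>;
}

function lintFailsafe(scope: Scope, entries: FailsafeEntry[]): string[] {
  const problems: string[] = [];
  entries.forEach((entry, i) => {
    // Gotcha 1: unknown finality tokens silently never match.
    for (const token of entry.matchFinality ?? []) {
      if (!VALID_FINALITIES.includes(token)) {
        problems.push(`entry ${i}: matchFinality "${token}" never matches`);
      }
    }
    // Gotcha 7: scope mismatches abort startup with ErrFailsafeConfiguration.
    if (scope === "network" && entry.circuitBreaker !== undefined) {
      problems.push(`entry ${i}: circuitBreaker is upstream-scope only`);
    }
    if (scope === "upstream" && entry.consensus !== undefined) {
      problems.push(`entry ${i}: consensus is network-scope only`);
    }
  });
  return problems;
}

// Gotcha 2: stacked retry policies multiply (hedges not counted here).
function worstCaseUpstreamCalls(
  networkMaxAttempts: number,
  upstreamMaxAttempts: number,
): number {
  return networkMaxAttempts * upstreamMaxAttempts;
}

const networkProblems = lintFailsafe("network", [
  { matchMethod: "*", matchFinality: ["latest"], circuitBreaker: {} },
]);
const stackedCalls = worstCaseUpstreamCalls(3, 3);
```

Here `networkProblems` holds two findings (the invalid `"latest"` token and the misplaced circuit breaker), and `stackedCalls` is 9, matching the 3 × 3 case described above. A check like this could run in CI against `erpc.ts` before eRPC itself rejects the config at startup.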

### Observability

| Metric | Type | When it fires |
|--------|------|---------------|
| `erpc_network_request_received_total` | counter | Every inbound request reaching the network executor |
| `erpc_network_timeout_fired_total` | counter | Network-scope or upstream-scope failsafe timeout exceeded |
| `erpc_network_retry_attempt_total` | counter | Every network-scope retry round that fires |
| `erpc_network_hedged_request_total` | counter | Each hedge attempt fired |
| `erpc_network_hedge_discards_total` | counter | Losing hedge response cancelled |
| `erpc_upstream_breaker_state_change_total` | counter | Circuit breaker state transition (Closed/Open/HalfOpen) |
| `erpc_consensus_misbehavior_detected_total` | counter | Upstream response differed from consensus group |

### Source code entry points

- [`erpc/network_executor.go:L183-L203`](https://github.com/erpc/erpc/blob/main/erpc/network_executor.go#L183-L203): executor composition (consensus / retry / hedge / sweep wiring)
- [`common/match.go:L18-L78`](https://github.com/erpc/erpc/blob/main/common/match.go#L18-L78): `SelectExecutor` 4-tier failsafe entry selection by (matchMethod, matchFinality)
- [`common/defaults.go:L1793-L1832`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L1793-L1832): failsafe entry merge algorithm (network scope)
- [`common/defaults.go:L1621-L1672`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L1621-L1672): failsafe entry merge algorithm (upstream scope)
- [`erpc/network_executor.go:L68-L73`](https://github.com/erpc/erpc/blob/main/erpc/network_executor.go#L68-L73): circuit breaker network-scope rejection at startup
- [`upstream/upstream_executor.go:L46-L53`](https://github.com/erpc/erpc/blob/main/upstream/upstream_executor.go#L46-L53): consensus upstream-scope rejection at startup
- [`common/errors.go:L1573-L1584`](https://github.com/erpc/erpc/blob/main/common/errors.go#L1573-L1584): `ErrFailsafeConfiguration` type (startup-only)

### Related pages

- [Retry](/config/failsafe/retry.llms.txt): rotate across upstreams with exponential backoff and jitter.
- [Hedge](/config/failsafe/hedge.llms.txt): speculative racing to cut tail latency.
- [Timeout](/config/failsafe/timeout.llms.txt): lifecycle-scoped and per-upstream time budgets.
- [Circuit breaker](/config/failsafe/circuit-breaker.llms.txt): automatic upstream quarantine with self-healing.
- [Consensus](/config/failsafe/consensus.llms.txt): multi-upstream agreement with misbehavior punishment.
- [Integrity](/config/failsafe/integrity.llms.txt): EVM response validation to discard stale or malformed data.
- [Rate limiters](/config/rate-limiters.llms.txt): cap hedge and retry cost on expensive vendors.
- [Selection & scoring](/config/projects/selection-policies.llms.txt): controls which upstream each failsafe attempt targets.
- [Survive provider outages](/use-cases/survive-provider-outages.llms.txt): the outcome this layer serves.

---

## Navigation (machine-readable surface)

- Up: [All pages index](https://docs.erpc.cloud/llms.txt)
- Root index of every page: [llms.txt](https://docs.erpc.cloud/llms.txt) · everything in one file: [llms-full.txt](https://docs.erpc.cloud/llms-full.txt)

### Child pages

- [Circuit breaker](https://docs.erpc.cloud/config/failsafe/circuit-breaker.llms.txt): When an upstream starts failing, eRPC stops sending it traffic automatically, and quietly brings it back once it recovers.
- [Consensus](https://docs.erpc.cloud/config/failsafe/consensus.llms.txt): Fan out every request to multiple providers simultaneously, agree on a single canonical answer, and automatically flag (or silence) the ones that lie.
- [Hedge](https://docs.erpc.cloud/config/failsafe/hedge.llms.txt): When a provider is having a slow moment, eRPC quietly races a backup request, so your slowest responses simply disappear.
- [Integrity checks](https://docs.erpc.cloud/config/failsafe/integrity.llms.txt): eRPC silently discards stale or structurally broken upstream responses and retries on another provider, so callers always get the correct answer.
- [Retry](https://docs.erpc.cloud/config/failsafe/retry.llms.txt): When a provider misbehaves, eRPC automatically rotates to the next one, and paces retries for missing data to match the chain's own block time.
- [Timeout](https://docs.erpc.cloud/config/failsafe/timeout.llms.txt): Give every request a hard latency budget. Three nested layers keep stalled upstreams from tying up your connections indefinitely.

### Sibling pages

- [Authentication](https://docs.erpc.cloud/config/auth.llms.txt): Lock down every request with a token, JWT, wallet signature, or IP allowlist, and bind each identity to its own rate-limit budget.
- [Example config](https://docs.erpc.cloud/config/example.llms.txt): A production-ready starting point you can copy today, plus a complete annotated reference of every config section, with caching, failover, hedging, rate limits, and observability included.
- [Matcher syntax](https://docs.erpc.cloud/config/matcher.llms.txt): One pattern engine everywhere, with globs, boolean logic, and hex ranges that work identically across cache policies, failsafe rules, rate limits, method filters, and routing directives.
- [Projects](https://docs.erpc.cloud/config/projects.llms.txt): One eRPC, many tenants. Each project gets its own networks, upstreams, auth, and budgets.
- [Rate Limiters](https://docs.erpc.cloud/config/rate-limiters.llms.txt): Stop a runaway caller or a misbehaving provider from affecting everyone else. eRPC applies independent request budgets at four layers and self-tunes outbound limits automatically.
- [Server](https://docs.erpc.cloud/config/server.llms.txt): eRPC's front door, with dual-stack listeners, TLS/mTLS, a hard global timeout, gzip, drain-aware shutdown, and domain aliasing so any Host header routes to the right chain without touching a URL path.