# Survive provider outages

> Source: https://docs.erpc.cloud/use-cases/survive-provider-outages
> Keep serving traffic when an RPC provider slows down, rate-limits you, or disappears entirely.
> Format: machine-readable markdown export of the docs page above.
> All collapsible AI sections are inlined and fully expanded.


Every RPC provider has a bad day. With eRPC in front, your users never find out:
slow answers get raced against a second provider, errors fail over automatically,
and an upstream that keeps misbehaving is quietly benched until it recovers.
You ship one endpoint; eRPC turns a pool of imperfect providers into something
that behaves like a perfect one.

- **[Retry](/config/failsafe/retry.llms.txt)** — Errors fail over to the next-best upstream, automatically.
- **[Hedge](/config/failsafe/hedge.llms.txt)** — Slow request? A backup race starts before the user notices.
- **[Timeout](/config/failsafe/timeout.llms.txt)** — Nothing hangs — every request has a hard ceiling.
- **[Circuit breaker](/config/failsafe/circuit-breaker.llms.txt)** — Repeat offenders get benched until they behave again.
- **[Cordoning](/operation/cordoning.llms.txt)** — See why an upstream was benched, or bench one yourself.
- **[Healthcheck](/operation/healthcheck.llms.txt)** — Tell your load balancer the truth about readiness.

All of the above in one place — illustrative, not a tuned production config:

**Config path:** `projects[]`

**YAML — `erpc.yaml`:**

```yaml
projects:
  - id: main
    # applies to every chain in this project
    networkDefaults:
      failsafe:
        - matchMethod: "*"
          # nothing hangs: hard ceiling per request
          timeout:
            duration: 30s
          # errors fail over to the next-best upstream
          retry:
            maxAttempts: 3
          # slow answers get raced at their p70 latency
          hedge:
            delay: { quantile: 0.7, min: 100ms, max: 2s }
            maxCount: 1
    upstreamDefaults:
      failsafe:
        - matchMethod: "*"
          # one in-place retry per upstream before rotating away
          retry:
            maxAttempts: 1
          # bench repeat offenders, probe again after 5m
          circuitBreaker:
            failureThresholdCount: 20
            failureThresholdCapacity: 80
            halfOpenAfter: 5m
            successThresholdCount: 8
# Cordoning is automatic (no config); /healthcheck reflects readiness.
```

**TypeScript — `erpc.ts`:**

```typescript
projects: [{
  id: "main",
  // applies to every chain in this project
  networkDefaults: {
    failsafe: [{
      matchMethod: "*",
      // nothing hangs: hard ceiling per request
      timeout: { duration: "30s" },
      // errors fail over to the next-best upstream
      retry: { maxAttempts: 3 },
      // slow answers get raced at their p70 latency
      hedge: { delay: { quantile: 0.7, min: "100ms", max: "2s" }, maxCount: 1 },
    }],
  },
  upstreamDefaults: {
    failsafe: [{
      matchMethod: "*",
      // one in-place retry per upstream before rotating away
      retry: { maxAttempts: 1 },
      // bench repeat offenders, probe again after 5m
      circuitBreaker: {
        failureThresholdCount: 20,
        failureThresholdCapacity: 80,
        halfOpenAfter: "5m",
        successThresholdCount: 8,
      },
    }],
  },
}]
```
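
To make the `hedge.delay` line above concrete, here is a minimal sketch of how a quantile-based delay with `min`/`max` clamps could be derived from recent latency samples. It is an illustration under stated assumptions, not eRPC's implementation: `hedgeDelayMs`, the `HedgeDelay` shape, the nearest-rank quantile, and the empty-sample fallback are all hypothetical.

```typescript
// Hypothetical helper (not part of eRPC): derive a hedge delay from recent
// latency samples using a nearest-rank quantile, then clamp it to [min, max].
interface HedgeDelay {
  quantile: number; // 0.7 means "p70"
  minMs: number;    // floor, mirrors `min: 100ms` in the config above
  maxMs: number;    // ceiling, mirrors `max: 2s` in the config above
}

function hedgeDelayMs(latenciesMs: number[], cfg: HedgeDelay): number {
  // Assumption: with no samples yet, fall back to the floor.
  if (latenciesMs.length === 0) return cfg.minMs;
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  // Nearest-rank: the smallest sample with at least `quantile` of all samples at or below it.
  const rank = Math.max(1, Math.ceil(cfg.quantile * sorted.length));
  const q = sorted[rank - 1];
  return Math.min(cfg.maxMs, Math.max(cfg.minMs, q));
}

const hedgeCfg: HedgeDelay = { quantile: 0.7, minMs: 100, maxMs: 2000 };

// Very fast upstream: p70 is below the floor, so the floor applies.
const fastDelay = hedgeDelayMs([20, 30, 40, 50, 60, 70, 80, 90, 95, 99], hedgeCfg);
// Typical upstream: p70 falls inside the clamp range and is used as-is.
const typicalDelay = hedgeDelayMs([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000], hedgeCfg);
// Degraded upstream: p70 is above the ceiling, so the ceiling applies.
const slowDelay = hedgeDelayMs([3000, 4000, 5000], hedgeCfg);

console.log({ fastDelay, typicalDelay, slowDelay });
```

The clamps are what keep a quantile-based delay safe: the floor stops a very fast upstream from hedging almost every request, and the ceiling stops a degraded upstream from pushing the backup request out so far that it no longer helps.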

## Agent reference

Copy one of these prompts into your AI agent session (Claude Code, Cursor, …) — each one
points the agent at this page's machine-readable reference so it can do the work correctly:

**Prompt Example #1: make my RPC layer survive provider outages**

```text
I want my app to keep serving traffic even when an RPC provider slows down,
rate-limits me, or goes offline. Update my eRPC config with retry,
hedge, timeout, and circuit breaker so outages are transparent to my users.
Read the reference and follow the child-page links inside it:
https://docs.erpc.cloud/use-cases/survive-provider-outages.llms.txt
```

**Prompt Example #2: add circuit breaking to bench a flaky provider automatically**

```text
One of my RPC providers is intermittently returning errors and degrading my
app. Add a circuit breaker to my eRPC config so eRPC benches that upstream after
repeated failures and probes it again once it recovers, without any manual
intervention. Reference:
https://docs.erpc.cloud/use-cases/survive-provider-outages.llms.txt
```

**Prompt Example #3: tune failover speed to minimize user-visible errors**

```text
My eRPC config already has retry and hedge configured, but
users still see occasional errors during provider incidents. Review the
timeout, retry backoff, and hedge delay settings and tighten them so failover
is as fast as possible without excessive upstream cost. Reference:
https://docs.erpc.cloud/use-cases/survive-provider-outages.llms.txt
```

---

### Survive outages — agent starting points

This page sells the outcome; implementation lives in the feature pages. Fetch their
machine-readable companions and combine policies per method:

- [Retry](/config/failsafe/retry.llms.txt) — scopes (network vs upstream), empty-result handling, backoff math.
- [Hedge](/config/failsafe/hedge.llms.txt) — fixed vs quantile delays, clamps, loser cancellation.
- [Timeout](/config/failsafe/timeout.llms.txt) — the three-level timeout hierarchy and dynamic quantile mode.
- [Circuit breaker](/config/failsafe/circuit-breaker.llms.txt) — thresholds, half-open probing, and how it complements selection-policy exclusion (cordoning is a third, independent mechanism).
- [Cordoning](/operation/cordoning.llms.txt) — every cordon reason and how to inspect/override it.
- [Healthcheck](/operation/healthcheck.llms.txt) — eval strategies and load-balancer integration.
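
As a rough mental model for the circuit-breaker thresholds listed above, the sketch below implements a count-based breaker in plain TypeScript. It is not eRPC's code: the class name `CircuitBreakerSketch`, the sliding-window bookkeeping, and the rule that any failed probe re-opens the breaker are simplifying assumptions. Only the four field names come from the config on this page.

```typescript
type BreakerState = "closed" | "open" | "halfOpen";

interface BreakerConfig {
  failureThresholdCount: number;    // failures needed to open...
  failureThresholdCapacity: number; // ...within this many most recent results
  halfOpenAfterMs: number;          // how long to stay open before probing
  successThresholdCount: number;    // probe successes needed to close again
}

// Hypothetical sketch of a count-based circuit breaker for one upstream.
class CircuitBreakerSketch {
  private cfg: BreakerConfig;
  private now: () => number;
  private state: BreakerState = "closed";
  private window: boolean[] = []; // true = failure, newest last
  private openedAt = 0;
  private probeSuccesses = 0;

  constructor(cfg: BreakerConfig, now: () => number = Date.now) {
    this.cfg = cfg;
    this.now = now;
  }

  // Whether a request may be sent to this upstream right now.
  allow(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.cfg.halfOpenAfterMs) {
      this.state = "halfOpen";
      this.probeSuccesses = 0;
    }
    return this.state !== "open";
  }

  // Feed back the outcome of a request that was allowed through.
  record(success: boolean): void {
    if (this.state === "open") return; // late results while benched are ignored
    if (this.state === "halfOpen") {
      if (!success) {
        this.trip(); // assumption: a single failed probe re-opens the breaker
      } else if (++this.probeSuccesses >= this.cfg.successThresholdCount) {
        this.state = "closed";
        this.window = [];
      }
      return;
    }
    this.window.push(!success);
    if (this.window.length > this.cfg.failureThresholdCapacity) this.window.shift();
    const failures = this.window.filter(Boolean).length;
    if (failures >= this.cfg.failureThresholdCount) this.trip();
  }

  current(): BreakerState {
    return this.state;
  }

  private trip(): void {
    this.state = "open";
    this.openedAt = this.now();
  }
}

// Usage with a fake clock so the 5-minute wait does not have to elapse for real.
let fakeClockMs = 0;
const breaker = new CircuitBreakerSketch(
  { failureThresholdCount: 20, failureThresholdCapacity: 80, halfOpenAfterMs: 5 * 60 * 1000, successThresholdCount: 8 },
  () => fakeClockMs,
);
for (let i = 0; i < 20; i++) breaker.record(false); // 20 failures within the last 80 results
const stateAfterFailures = breaker.current();
const allowedWhileOpen = breaker.allow();
fakeClockMs += 5 * 60 * 1000;                        // halfOpenAfter elapses
const probeAllowed = breaker.allow();
for (let i = 0; i < 8; i++) breaker.record(true);    // 8 successful probes
const stateAfterProbes = breaker.current();
```

Injecting the clock is a testing convenience of this sketch only. The point is the state cycle: closed until the failure threshold is hit, open (benched) for `halfOpenAfter`, then half-open until enough probes succeed.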

Composition rules an agent must know: failsafe entries are matched per request via
`matchMethod`/`matchFinality`; at network scope the executor chain nests
`timeout(consensus(retry(hedge(upstreams))))`; retries at network scope rotate across
upstreams while upstream-scope retries re-attempt the same one. Full ordering and
per-policy field tables are in each feature page's agent section.
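
The nesting described above can be pictured with a small, self-contained sketch. It is a simplified model rather than eRPC's executor: consensus is left out, losing requests are not cancelled, and every name here (`withTimeout`, `withRotatingRetry`, `hedged`, and the three fake upstreams) is hypothetical.

```typescript
type Upstream = (method: string) => Promise<string>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Innermost layer: start the primary; if it has not settled after delayMs,
// start a backup as well and take the first success.
function hedged(primary: Upstream, backup: Upstream, delayMs: number): Upstream {
  return (method) =>
    new Promise<string>((resolve, reject) => {
      let inFlight = 1;
      const settleError = (err: unknown) => {
        if (--inFlight === 0) reject(err); // fail only once every started request has failed
      };
      const timer = setTimeout(() => {
        inFlight += 1;
        backup(method).then(resolve, settleError);
      }, delayMs);
      primary(method).then(
        (value) => { clearTimeout(timer); resolve(value); },
        (err) => { clearTimeout(timer); settleError(err); },
      );
    });
}

// Middle layer: network-scope retry rotates to the next upstream on each attempt.
function withRotatingRetry(upstreams: Upstream[], maxAttempts: number, hedgeDelayMs: number): Upstream {
  return async (method) => {
    let lastError: unknown;
    for (let attempt = 0; attempt < maxAttempts; attempt++) {
      const primary = upstreams[attempt % upstreams.length];
      const backup = upstreams[(attempt + 1) % upstreams.length];
      try {
        return await hedged(primary, backup, hedgeDelayMs)(method);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  };
}

// Outermost layer: a hard ceiling over all retries and hedges combined.
function withTimeout(inner: Upstream, ms: number): Upstream {
  return (method) =>
    new Promise<string>((resolve, reject) => {
      const timer = setTimeout(() => reject(new Error(`timeout after ${ms}ms`)), ms);
      inner(method).then(
        (value) => { clearTimeout(timer); resolve(value); },
        (err) => { clearTimeout(timer); reject(err); },
      );
    });
}

// Three fake upstreams: one that is down, one that is slow, one that is healthy.
const downUpstream: Upstream = async () => { throw new Error("upstream down"); };
const slowUpstream: Upstream = async () => { await sleep(200); return "slow"; };
const fastUpstream: Upstream = async () => { await sleep(5); return "fast"; };

async function demo(): Promise<string> {
  const network = withTimeout(
    withRotatingRetry([downUpstream, slowUpstream, fastUpstream], 3, 20),
    1000,
  );
  return network("eth_blockNumber");
}

demo().then((answer) => console.log("served by:", answer));
```

In this run the first attempt fails on the down upstream, the retry rotates to the slow one, and the hedge starts the healthy upstream after 20ms, so the caller gets an answer well inside the outer timeout without seeing the failure.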

---


## Navigation (machine-readable surface)

- Up: [All pages index](https://docs.erpc.cloud/llms.txt)
- Root index of every page: [llms.txt](https://docs.erpc.cloud/llms.txt) · everything in one file: [llms-full.txt](https://docs.erpc.cloud/llms-full.txt)

### Sibling pages

- [Cut RPC cost & latency](https://docs.erpc.cloud/use-cases/cut-costs-and-latency.llms.txt) — Serve repeated questions from cache, deduplicate identical requests, and stop paying providers for the same answer twice.
- [How eRPC works](https://docs.erpc.cloud/use-cases/how-it-works.llms.txt) — Every JSON-RPC call travels a battle-tested pipeline — auth, smart caching, parallel hedging, multi-upstream consensus — and arrives with full diagnostic headers. Zero glue code required.
- [Lock it down](https://docs.erpc.cloud/use-cases/lock-it-down.llms.txt) — Keys, JWTs, sign-in with Ethereum, per-user rate limits — your RPC endpoint stops being a free-for-all.
- [Scale chains & providers](https://docs.erpc.cloud/use-cases/scale-chains-and-providers.llms.txt) — One config line per provider, every chain they support — and the best upstream wins each request.
- [See everything](https://docs.erpc.cloud/use-cases/see-everything.llms.txt) — Per-request metrics, traces, and honest healthchecks — know about problems before your users do.
- [Trust the data](https://docs.erpc.cloud/use-cases/trust-the-data.llms.txt) — Don't let one misbehaving node feed your app a wrong answer — verify, cross-check, and enforce integrity automatically.