Timeout policy
The timeout policy puts a ceiling on how long eRPC waits for a result. It lives in two places: on the network (wrapping the entire request lifecycle, including all retries and failover across upstreams) and on each upstream (bounding a single attempt against a single endpoint). Too short a timeout produces false failures; too long a timeout lets bad tail latency propagate to callers.
Full configuration
The two forms below show a network-level timeout and an upstream-level timeout side by side. Both use the object form of duration (an AdaptiveDuration) to enable quantile-adaptive behavior.
```yaml
projects:
  - id: main
    networks:
      - architecture: evm
        evm:
          chainId: 1
        failsafe:
          - matchMethod: '*'
            timeout:
              duration:
                base: 5s       # static floor — always wait at least this long
                quantile: 0.99 # add observed p99 latency on top of base
                min: 500ms     # floor for the adaptive component (cold-start guard)
                max: 30s       # ceiling — never wait longer than this total
          - matchMethod: 'trace_*|debug_*'
            timeout:
              duration: 60s    # scalar shorthand: just a fixed base, no quantile
    upstreams:
      - id: my-node
        endpoint: https://rpc.example.com
        failsafe:
          - matchMethod: '*'
            timeout:
              duration:
                base: 0s       # no fixed floor; rely entirely on the quantile
                quantile: 0.95 # timeout at p95 of this upstream's observed latency
                min: 200ms     # never fire before 200ms (protects fast cache hits)
                max: 10s       # ceiling per attempt
          - matchMethod: 'eth_getLogs'
            timeout:
              duration: 25s    # getLogs can be slow — fixed ceiling, no adaptation
```

The scalar shorthand `duration: 30s` is equivalent to `duration: { base: '30s' }`. It sets only the `base` field and leaves all other `AdaptiveDuration` fields unset (no quantile adaptation).
How it works
Fixed mode. When duration is a scalar or an object with only base set (no quantile), the timeout is a hard constant. On the network, that constant bounds the entire lifecycle — the request is cancelled and an error is returned to the caller if any combination of upstream attempts + retries + hedges hasn't resolved by then. On an upstream, it bounds one attempt; if that attempt times out, the upstream's retry or the network's failover can still try elsewhere.
Dynamic (quantile-adaptive) mode. When quantile is set, the effective timeout is computed on every request as:
```
effective = clamp(base + quantile_value, min, max)
```

where `quantile_value` is the rolling latency percentile for that specific (upstream, method) pair. The `base` offset lets you add a constant buffer on top of the percentile — for example `base: 500ms, quantile: 0.95` means "fire at p95 + 500 ms". When only `quantile` is set with no `base`, the timeout is purely driven by observed latency.
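As a worked example (the observed latency figure here is hypothetical), suppose the tracked p95 for a given (upstream, method) pair is 800 ms:

```yaml
timeout:
  duration:
    base: 500ms     # constant buffer on top of the percentile
    quantile: 0.95  # hypothetical observed p95 = 800ms
    min: 200ms
    max: 10s
# effective = clamp(500ms + 800ms, 200ms, 10s) = 1300ms
# If p95 later degrades to 12s: clamp(500ms + 12s, 200ms, 10s) = 10s (capped by max)
```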
Cold start. Before any latency samples exist for a (upstream, method) pair, the quantile tracker returns zero. In that case the adaptive component falls back to min (if set) so the request isn't immediately killed with a near-zero timeout. The effective timeout on cold start is therefore clamp(base + min, min, max).
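A cold-start sketch (values hypothetical) showing why `min` matters before any samples exist:

```yaml
timeout:
  duration:
    base: 0s
    quantile: 0.95
    min: 200ms
    max: 10s
# No samples yet: the quantile tracker returns zero, so min stands in
# effective = clamp(0s + 200ms, 200ms, 10s) = 200ms
# Without min, this would be clamp(0s + 0s, unset, 10s) = 0s — a near-zero
# timeout that kills the first requests of a fresh process
```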
Per-method, per-upstream tracking. Each (upstream, method) pair maintains its own latency histogram independently. A quantile timeout on eth_call won't be influenced by the latency profile of eth_getLogs. If you set a quantile timeout at the network level, note that the network has no single "upstream" — the latency tracked there is end-to-end wall time across whatever upstreams were used for that method.
Network vs upstream interaction. The network timeout is the outer boundary; upstream timeouts are inner boundaries on individual attempts. If you configure both, the upstream timeout fires first (cancels the attempt), then the network's retry or hedge can try the next upstream. The network timeout fires if the whole sequence hasn't resolved in time. A common misconfiguration is setting the network timeout too short relative to the upstream timeout times the number of retry attempts — this silently kills the retry budget.
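A sketch of a budget that leaves room for retries (the 3-attempt retry policy here is an assumption for illustration; see the Retry page for exact fields):

```yaml
networks:
  - architecture: evm
    evm:
      chainId: 1
    failsafe:
      - matchMethod: '*'
        timeout:
          duration: 35s    # outer bound: ≥ 10s × 3 attempts, plus slack for failover
upstreams:
  - id: my-node
    endpoint: https://rpc.example.com
    failsafe:
      - matchMethod: '*'
        timeout:
          duration: 10s    # per-attempt bound
        retry:
          maxAttempts: 3   # hypothetical: worst-case upstream budget is 30s
```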
What happens on timeout. An upstream timeout classifies the attempt as a retryable error (same as a transport failure). The network's retry policy and selection policy can then route to a different upstream. A network timeout cancels all in-flight attempts and returns an error to the caller; no further retries happen.
Defaults
| Field | Default | Notes |
|---|---|---|
| `duration` (network) | 120s (static) | Applied when no timeout is configured on the network's failsafe entry. |
| `duration` (upstream) | 60s (static) | Applied when no timeout is configured on the upstream's failsafe entry. |
| `base` | unset | Zero offset when using the object form without a `base`. |
| `quantile` | unset | Quantile adaptation is off unless you set this. |
| `min` | unset | No floor unless specified. On cold start with `quantile` set and no `min`, falls back to zero — requests can time out almost instantly. |
| `max` | unset | No ceiling unless specified. |
When quantile is set and neither min nor base is set, the cold-start timeout is effectively zero until at least one latency sample exists. Always set min or base when using quantile mode.
Gotchas
- Network timeout shorter than `upstream.timeout × maxAttempts`. If the upstream is configured with a 10 s timeout and `retry.maxAttempts: 3`, the worst-case upstream budget is 30 s. A network timeout of 15 s will cut that short, dropping the third attempt before it can complete. Set the network timeout to at least `upstream.timeout.max × retry.maxAttempts` — or accept the tradeoff explicitly. This is the most common timeout misconfiguration and the hardest to diagnose because it manifests as intermittent failures under load rather than consistent errors.
- `quantile` alone without `base` or `min`. A bare `{ quantile: 0.99 }` with no `base` and no `min` works correctly at steady state but will time out almost immediately on the very first few requests of a cold process. Always pair it with at least `min` or `base`.
- `base` alone (scalar or object) is not adaptive. If you write `duration: { base: 30s }` there is no quantile adaptation — it's identical to the scalar `duration: 30s`. Quantile adaptation only engages when `quantile > 0`.
- `min` too low on fast upstreams. If an upstream usually responds in 5 ms (e.g., it's cache-hitting at the RPC provider) and you set `min: 10ms`, the quantile will compress toward that minimum and any request that misses the cache (200 ms+) will time out. Set `min` to a value that accommodates both the fast and slow paths for the upstream — or don't set `min` and let the quantile find its own floor.
- Heavy methods need their own entry. `trace_*`, `debug_*`, and `eth_getLogs` over large block ranges can take 10–60 s on a lightly loaded archive node. A catch-all `matchMethod: '*'` entry with a 5 s timeout will reject every one of those. Add a dedicated entry before the wildcard entry (first match wins):

  ```yaml
  failsafe:
    - matchMethod: 'trace_*|debug_*'
      timeout: { duration: 120s }
    - matchMethod: 'eth_getLogs'
      timeout: { duration: 30s }
    - matchMethod: '*'
      timeout: { duration: 5s }
  ```

- Timeout doesn't disable retry. A timeout fires on an attempt; if the network or upstream retry policy allows another attempt, it will happen. Set `retry.maxAttempts: 1` on the same failsafe entry to get "one shot, then give up" behavior.
- `duration: null` disables the timeout entirely. This is valid if you want to inherit only the retry policy from a failsafe entry. Without any timeout the request will hang until the upstream closes the connection or the caller disconnects.
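The last two gotchas can be expressed as a small config sketch (the method choices here are hypothetical):

```yaml
failsafe:
  - matchMethod: 'eth_sendRawTransaction'
    timeout:
      duration: 10s
    retry:
      maxAttempts: 1   # one shot: on timeout, give up instead of retrying
  - matchMethod: '*'
    timeout:
      duration: null   # timeout disabled — the request waits until the upstream
                       # closes the connection or the caller disconnects
```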
Metrics
`erpc_network_timeout_duration_seconds` is a histogram of the dynamically computed effective timeout per request, labeled by method. This metric is only populated in quantile mode — fixed timeouts don't emit it because there's nothing dynamic to observe.
```
# P99 effective timeout per method (last 5 min)
histogram_quantile(0.99,
  sum by (method, le) (
    rate(erpc_network_timeout_duration_seconds_bucket[5m])
  )
)
```

```
# Alert when p50 effective timeout drops below 500ms (possible cold-start or config problem)
histogram_quantile(0.50,
  sum by (method, le) (
    rate(erpc_network_timeout_duration_seconds_bucket[5m])
  )
) < 0.5
```

See also
- Failsafe overview — scoping rules, finality states, where each policy is valid
- Retry — composes with timeout; timeout fires per attempt, retry decides whether to try again
- Hedge — speculative parallel copies when a single upstream is slow; pairs well with a tight network timeout
Timeout reference
TimeoutPolicyConfig — every field
| Field | Type | Default | Notes |
|---|---|---|---|
| `duration` | Duration or AdaptiveDuration | none (system default applied) | The timeout spec. Accepts a scalar string (`"30s"`) or an object `{ base, quantile, min, max }`. The scalar sets `base` only; no quantile adaptation. |
AdaptiveDuration — object form fields (when duration is an object)
| Field | Type | Default | Notes |
|---|---|---|---|
| `base` | Duration | 0 | Static base added to the adaptive component. Scalar shorthand (`duration: "30s"`) sets only this field. |
| `quantile` | float64 | unset | Latency percentile (0 < q < 1). When set, the observed latency at that quantile for the (upstream, method) pair is added to `base`. 0.99 is typical; 0.95 for tighter tails. Requires `base` or `max` to be set (validation error otherwise). |
| `min` | Duration | unset | Floor for the `base + adaptive` result. Also used as the cold-start fallback adaptive value when `quantile > 0` and no samples exist yet. |
| `max` | Duration | unset | Ceiling for the `base + adaptive` result. When `quantile` is set and `base`/`duration` is omitted, acts as the cold-start fallback. |
Resolution formula (when `quantile > 0`):

```
adaptive  = quantile_value_from_histogram   (or min if no samples yet)
effective = clamp(base + adaptive, min, max)
```

When `quantile == 0`: `effective = base` exactly (no clamping applied).
Legacy flat form (still accepted)
The pre-`AdaptiveDuration` wire format `{ duration, quantile, minDuration, maxDuration }` is still accepted and silently folded into the new object form at parse time:

```yaml
# Legacy — still works
timeout:
  duration: 5s
  quantile: 0.99
  minDuration: 200ms
  maxDuration: 30s
```

```yaml
# Equivalent new form
timeout:
  duration:
    base: 5s
    quantile: 0.99
    min: 200ms
    max: 30s
```

Prefer the new object form in new configs. The flat form emits a deprecation notice in debug logs.
Where timeout is valid
| Level | Effect |
|---|---|
| `projects[].networks[].failsafe[]` | Bounds the entire request lifecycle: all upstream attempts, retries, and hedges. The outer hard limit. |
| `projects[].upstreams[].failsafe[]` | Bounds a single attempt against one upstream. Does not stop the network from retrying or hedging on another upstream. |
Interaction with other policies
- Retry: timeout fires per attempt. If the attempt times out and `retry.maxAttempts > 1`, the retry policy can start another attempt (on a different upstream at the network level; the same upstream at the upstream level). The network timeout is still the outer bound — once it fires, no more attempts happen.
- Hedge: a hedge spawned after the hedge delay gets its own upstream-level timeout (if configured). The network timeout covers the whole hedge fan-out. If the network timeout fires before any hedge or primary resolves, all in-flight requests are cancelled.
- Circuit breaker: a timed-out attempt increments the circuit breaker's failure counter for that upstream, same as any other failed attempt.
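Putting these interactions together, a minimal sketch combining timeout with retry and hedge on one network entry — the `retry` and `hedge` field names here are assumptions for illustration; check the respective policy pages for the exact schema:

```yaml
networks:
  - architecture: evm
    evm:
      chainId: 1
    failsafe:
      - matchMethod: '*'
        timeout:
          duration: 30s    # outer bound: covers all attempts, retries, and hedges
        retry:
          maxAttempts: 3   # hypothetical: each timed-out attempt can fail over
        hedge:
          delay: 500ms     # hypothetical: speculative second request if the first is slow
          maxCount: 1
```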