Production guidelines
Practical recommendations for running eRPC in production, from container sizing through zero-downtime rollouts and instance identification.
What this page covers:
- Memory usage and Go GC tuning (GOGC, GOMEMLIMIT)
- Failsafe policies (retry, timeout, hedge)
- Caching database selection
- Horizontal scaling with shared state
- Explicit chain ID configuration
- Zero-downtime healthcheck rollout (Cilium/Envoy drain pattern)
- Custom response headers for instance identification
- includeErrorDetails in production
- trustedIPForwarders and trustedIPHeaders behind a load-balancer or CDN
Memory and GC tuning
The largest memory contributor in eRPC is the size of RPC responses. Common calls like eth_getBlockByNumber or eth_getTransactionReceipt are typically under 1 MB; heavy calls like debug_traceTransaction can reach 50 MB. Most deployments see ~256 MB RSS at modest load.
Start with a generous limit (e.g. 16 GB) while routing real traffic, then lower it once you know your p99 working set.
To prevent OOM-kills on Kubernetes, add both env vars to your container spec:
# Trigger GC when heap grows by 30 % (default is 100 %)
GOGC=30
# Soft memory limit for the Go runtime: GC works harder as Go-managed memory approaches 2 GiB; tune to ~80 % of your container memory limit
# WARNING: set this too low and GC will thrash; combine with GOGC for best results
GOMEMLIMIT=2GiB

Example Docker run:
docker run -e GOGC=30 -e GOMEMLIMIT=2GiB ghcr.io/erpc/erpc:latest \
  erpc start -c /etc/erpc/erpc.yaml

Kubernetes container spec snippet:
env:
  - name: GOGC
    value: "30"
  - name: GOMEMLIMIT
    value: "2GiB"
resources:
  limits:
    memory: "2.5Gi"
  requests:
    memory: "512Mi"

Failsafe policies
Configure retry at both network and upstream scopes:
- Network-level retry rotates to a different upstream on a transient failure. Even with a single upstream it's worth enabling. Set maxAttempts ≈ number of upstreams.
- Upstream-level retry covers per-attempt flakiness within the same upstream. Use 2–5 maxAttempts.
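The interaction between the two retry scopes can be modeled with a short sketch. This is an illustrative model only, not eRPC's implementation: `call_with_retries`, `TransientError`, and the two fake upstreams are hypothetical names, and backoff delays are left out.

```python
from typing import Callable, Optional, Sequence


class TransientError(Exception):
    """Hypothetical stand-in for a retryable upstream failure."""


def call_with_retries(
    upstreams: Sequence[Callable[[], str]],
    network_max_attempts: int,
    upstream_max_attempts: int,
) -> str:
    """Try upstreams in rotation; retry each one a few times before moving on."""
    last_error: Optional[Exception] = None
    for attempt in range(network_max_attempts):
        # Network scope: every attempt moves on to the next upstream.
        upstream = upstreams[attempt % len(upstreams)]
        for _ in range(upstream_max_attempts):
            # Upstream scope: absorb flakiness within the same upstream.
            try:
                return upstream()
            except TransientError as err:
                last_error = err
    raise last_error or RuntimeError("no attempts were made")


calls = []


def flaky() -> str:
    calls.append("flaky")
    raise TransientError("connection reset")


def healthy() -> str:
    calls.append("healthy")
    return "0x10"


result = call_with_retries(
    [flaky, healthy], network_max_attempts=2, upstream_max_attempts=2
)
print(result, calls)
```

With two upstreams and a network-level `maxAttempts` of 2, the flaky upstream is tried twice at upstream scope before the request rotates to the healthy one.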
Set a timeout that matches your request profile. For standard EVM calls a 3 s default is safe; for heavy trace or getLogs calls allow 10 s or more. Set quantile: 0.99 on the upstream-scope timeout to auto-tune per method.
Enable the hedge policy for latency-sensitive reads. With delay: 500ms, eRPC races a second upstream once the primary has been quiet for 500 ms and returns the first kept response — at the cost of duplicate traffic for slow requests. Hedge attempts are excluded from per-upstream scoring and from the circuit breaker.
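The hedge behavior described above can be sketched as a race between two attempts. This is a simplified model under stated assumptions, not eRPC's code: `hedged_call` and the two fake upstreams are hypothetical, only one hedge attempt is launched, and error handling is omitted (the first attempt to finish wins even if it failed).

```python
import asyncio
from typing import Awaitable, Callable

started = []


async def hedged_call(
    primary: Callable[[], Awaitable[str]],
    secondary: Callable[[], Awaitable[str]],
    hedge_delay: float,
) -> str:
    first = asyncio.ensure_future(primary())
    # Give the primary a head start of hedge_delay seconds.
    done, _ = await asyncio.wait({first}, timeout=hedge_delay)
    if done:
        return first.result()
    # Primary is still quiet: race a second upstream against it.
    second = asyncio.ensure_future(secondary())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the slower attempt is discarded
    return done.pop().result()


async def slow() -> str:
    started.append("slow")
    await asyncio.sleep(0.5)
    return "slow"


async def fast() -> str:
    started.append("fast")
    await asyncio.sleep(0.01)
    return "fast"


# Slow primary: the hedge fires after 50 ms and the second upstream wins.
hedged = asyncio.run(hedged_call(slow, fast, hedge_delay=0.05))
started_when_hedged = list(started)

# Fast primary: it answers before the delay, so no duplicate request is sent.
started.clear()
direct = asyncio.run(hedged_call(fast, slow, hedge_delay=0.05))
started_when_direct = list(started)
```

The second run shows why the duplicate-traffic cost applies only to slow requests: a primary that answers within the delay never triggers a second attempt.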
Use consensus for high-trust reads (gas price, nonce, contract calls during write paths). Set maxWaitOnResult to bound tail latency when one participant lags.
Execution trace headers (X-ERPC-Upstreams-Tried, X-ERPC-Upstreams-Outcomes, X-ERPC-Upstreams-Reasons, X-ERPC-Upstreams-Durations-Ms, X-ERPC-Upstreams-Flags) ship by default — clients can debug retry/hedge/consensus decisions without server-side traces. Disable with server.executionHeaders: off if you want zero diagnostic leakage.
Caching database
Large read-heavy workloads (e.g. indexing 100 M Arbitrum blocks) require substantial cache storage. Start with Redis; switch to PostgreSQL when cached data exceeds available memory.
eRPC degrades gracefully if the cache backend is unavailable — it falls back to live upstream calls with no impact on availability.
See Database for connector configuration. eRPC Cloud offers the most cost-effective caching for multi-tenant deployments.
Horizontal scaling
Run multiple eRPC replicas with a shared Redis connector to synchronize latest/finalized block numbers across instances. Without shared state, each replica polls independently, increasing upstream requests.
See Shared State. Even when Redis is temporarily unavailable, eRPC continues serving requests using local state tracking.
Explicitly configure chain ID
Auto-detected chain IDs add one upstream call per network at startup and slow rolling restarts. Set evm.chainId explicitly in your configuration to avoid this.
Healthcheck and zero-downtime rollout
Configure a Healthcheck readiness probe so your orchestrator stops routing to a pod before it shuts down.
Cilium / Envoy drain pattern
When using Cilium with Envoy (Ingress or Gateway API), set both shutdown wait fields to 30 s:
server:
  waitBeforeShutdown: 30s  # pod marked draining; readiness probe fails
  waitAfterShutdown: 30s   # process stays alive until Envoy drains its connections

Shorter values allow Envoy to reuse a connection after the listener closes, or route to a pod that has already exited. Adjust to match your own probe intervals.
Custom response headers
Use server.responseHeaders to stamp every HTTP response with instance metadata for quick debugging without opening a trace:
server:
  responseHeaders:
    X-ERPC-Region: ${FLY_REGION}       # Fly.io region
    X-ERPC-Machine: ${FLY_MACHINE_ID}  # Fly.io machine ID
    # Kubernetes:
    # X-ERPC-Pod: ${HOSTNAME}

Headers with empty values (after env-var expansion) are automatically omitted. Combine with custom trace attributes for full observability.
Error detail visibility
By default eRPC includes upstream error details in responses. In production, set includeErrorDetails: false to avoid leaking internal endpoint URLs, API key fragments, or upstream error messages to end-users:
server:
  includeErrorDetails: false

Trusted IP forwarding
When eRPC runs behind a load-balancer or CDN, the real client IP is in a forwarded header. Configure trustedIPForwarders (CIDR ranges of your LB/CDN) and trustedIPHeaders (the header name to read):
server:
  trustedIPForwarders:
    - "10.0.0.0/8"        # cluster-internal LB CIDR
    - "172.16.0.0/12"
  trustedIPHeaders:
    - "X-Forwarded-For"
    - "CF-Connecting-IP"  # Cloudflare

Without this, IP-based rate limits and network auth strategies see the LB address rather than the real client.
Full production guide reference
Memory / GC tuning
eRPC is a Go process. The runtime's default GC target (GOGC=100) is appropriate for development but often too loose for containers with hard memory limits.
Recommended production pair:
GOGC=30          # run GC after heap grows 30 %: smaller heap, more frequent collections
GOMEMLIMIT=2GiB  # soft ceiling: GC works harder as memory use nears this value

Set GOMEMLIMIT to ~80 % of your container memory limit. For example: 2 GiB limit → GOMEMLIMIT=1600MiB. Setting it equal to the limit leaves no headroom and risks GC thrash or OOM from transient allocation bursts.
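The 80 % rule is easy to compute when generating deployment manifests. The helper below is a hypothetical sketch (`gomemlimit_for` is not part of eRPC or Go); it assumes the container limit is known in MiB and emits a value with the `MiB` suffix that the Go runtime accepts for GOMEMLIMIT.

```python
def gomemlimit_for(container_limit_mib: int, percent: int = 80) -> str:
    """Return a GOMEMLIMIT value at `percent` of the container memory limit."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 1 and 99")
    # Integer arithmetic rounds down, which errs on the side of more headroom.
    return f"{container_limit_mib * percent // 100}MiB"


limit_2g = gomemlimit_for(2048)    # container limit of 2 GiB
limit_2_5g = gomemlimit_for(2560)  # container limit of 2.5 GiB
print(limit_2g, limit_2_5g)
```

For a 2 GiB limit this yields 1638MiB; the 1600MiB figure above is the same idea rounded down to a convenient number. For a 2.5 GiB limit it yields 2048MiB, which matches the GOMEMLIMIT=2GiB pairing.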
Caution: GOGC < 10 causes GC thrashing — the runtime spends most CPU collecting, not serving requests. Values of 20–50 are the practical floor.
If you have abundant RAM and want to reduce CPU overhead, raise GOGC (e.g. 200). The heap will grow larger but GC runs less often.
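The effect of GOGC on heap growth can be approximated with simple arithmetic. The sketch below uses the simplified rule from the Go GC guide, target heap ≈ live heap × (1 + GOGC/100), and ignores GC roots and any GOMEMLIMIT cap. The 300 MiB live heap is a made-up figure for illustration.

```python
def next_gc_target_mib(live_heap_mib: int, gogc: int) -> int:
    """Approximate heap size at which the next GC cycle starts."""
    return live_heap_mib * (100 + gogc) // 100


live = 300  # hypothetical live heap after the last collection, in MiB

default_target = next_gc_target_mib(live, 100)  # Go default
tuned_target = next_gc_target_mib(live, 30)     # value recommended above
relaxed_target = next_gc_target_mib(live, 200)  # abundant RAM, fewer GC cycles
print(default_target, tuned_target, relaxed_target)
```

With the default of 100 the heap may reach about 600 MiB before a collection starts, which already exceeds a 512 MiB container limit, while GOGC=30 starts collecting at about 390 MiB.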
Healthcheck rollout pattern
eRPC's shutdown sequence:
- Receive SIGTERM.
- Stop accepting new connections (waitBeforeShutdown delay; readiness probe starts failing).
- Drain in-flight requests.
- Wait waitAfterShutdown (keeps the process alive so the LB/proxy can close open connections).
- Exit 0.
For Kubernetes with Cilium/Envoy, both values should be at least 30 s:
server:
  waitBeforeShutdown: 30s
  waitAfterShutdown: 30s

The readiness probe should return unhealthy within 10 s of SIGTERM (before waitBeforeShutdown expires) so the orchestrator removes the endpoint before connections are refused.
Kubernetes terminationGracePeriodSeconds must be greater than waitBeforeShutdown + waitAfterShutdown + time to drain. Set it to at least 90 s for the 30 s + 30 s pattern above.
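The grace-period rule is a one-line inequality, sketched here as a check that could run in CI against deployment values. `grace_period_ok` is a hypothetical helper and the 20 s drain time is an assumed figure; measure your own.

```python
def grace_period_ok(
    grace_s: int, wait_before_s: int, wait_after_s: int, drain_s: int
) -> bool:
    """True when SIGKILL cannot arrive before eRPC finishes shutting down."""
    return grace_s > wait_before_s + wait_after_s + drain_s


# 30 s + 30 s shutdown waits, with an assumed 20 s to drain in-flight requests.
recommended = grace_period_ok(90, 30, 30, 20)
# Kubernetes defaults terminationGracePeriodSeconds to 30 s.
kubernetes_default = grace_period_ok(30, 30, 30, 20)
print(recommended, kubernetes_default)
```

The Kubernetes default of 30 s fails the check for this pattern, so the field has to be set explicitly on the pod spec.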
responseHeaders for instance identification
server.responseHeaders is a map of header name → value. Values support ${VAR} env-var expansion. Headers with an empty value after expansion are silently omitted (safe to use with optional env vars).
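The expand-then-omit behavior can be modeled in a few lines. This is a sketch of the semantics described above, not eRPC's implementation: `expand_headers` is a hypothetical function, and treating an unset variable as an empty string is an assumption.

```python
import re
from typing import Dict, Mapping

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_headers(
    headers: Mapping[str, str], env: Mapping[str, str]
) -> Dict[str, str]:
    """Expand ${VAR} references and drop headers that end up empty."""
    expanded = {}
    for name, template in headers.items():
        # Assumption: an unset variable expands to the empty string.
        value = _VAR.sub(lambda m: env.get(m.group(1), ""), template)
        if value:
            expanded[name] = value
    return expanded


configured = {
    "X-ERPC-Region": "${FLY_REGION}",
    "X-ERPC-Pod": "${HOSTNAME}",
}
# On Kubernetes only HOSTNAME is set, so the Fly.io header disappears.
sent = expand_headers(configured, {"HOSTNAME": "erpc-7d9f"})
print(sent)
```

This is why one config file can list headers for several platforms: only the ones whose variables exist on the current platform are emitted.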
Useful headers:
| Header | Env var | Platform |
|---|---|---|
| X-ERPC-Region | ${FLY_REGION} | Fly.io |
| X-ERPC-Machine | ${FLY_MACHINE_ID} | Fly.io |
| X-ERPC-Pod | ${HOSTNAME} | Kubernetes (pod name) |
| X-ERPC-Instance | ${INSTANCE_ID} | explicit / custom |
Combine with tracing resource attributes (tracing.resourceAttributes) so every trace span carries the same instance label as the HTTP response header.
includeErrorDetails
Controls whether upstream error messages and internal endpoint information appear in JSON-RPC error responses returned to callers.
- Default: true (errors are verbose, which is helpful for development).
- Production: set to false to prevent leaking upstream URLs, API key fragments, and internal error strings.
Errors are still logged internally at full verbosity regardless of this setting.
trustedIPForwarders + trustedIPHeaders
When eRPC sits behind a reverse proxy, load-balancer, or CDN, the TCP source IP is always the proxy's address. To recover the real client IP:
- trustedIPForwarders: list of CIDR blocks (or individual IPs) whose X-Forwarded-For (or the named headers) are trusted. Requests from outside these ranges have their forwarded headers ignored.
- trustedIPHeaders: ordered list of headers to read. eRPC picks the first header that is present on a request from a trusted forwarder.
This real IP is then used for:
- IP-based rate limiting (network auth strategy allowedIPs)
- Per-IP metric labels
- Any upstream selection that keys on client IP
Without this config, all requests appear to originate from your LB IP and IP-based policies are effectively global.
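The resolution rules above can be sketched with Python's standard ipaddress module. This models the described behavior and is not eRPC's code: `resolve_client_ip` is a hypothetical function, header lookup is case-sensitive for brevity, and taking the left-most X-Forwarded-For entry is an assumption about how a multi-hop chain is read.

```python
import ipaddress
from typing import Mapping, Sequence


def resolve_client_ip(
    peer_ip: str,
    headers: Mapping[str, str],
    trusted_forwarders: Sequence[str],
    trusted_headers: Sequence[str],
) -> str:
    """Return the client IP, honoring forwarded headers only from trusted peers."""
    peer = ipaddress.ip_address(peer_ip)
    trusted = any(
        peer in ipaddress.ip_network(cidr) for cidr in trusted_forwarders
    )
    if not trusted:
        return peer_ip  # forwarded headers from unknown peers are ignored
    for name in trusted_headers:  # ordered: first header present wins
        value = headers.get(name)
        if value:
            # Assumption: use the left-most entry of a comma-separated chain.
            return value.split(",")[0].strip()
    return peer_ip


forwarders = ["10.0.0.0/8", "172.16.0.0/12"]
header_order = ["X-Forwarded-For", "CF-Connecting-IP"]

via_lb = resolve_client_ip(
    "10.1.2.3", {"X-Forwarded-For": "203.0.113.7, 10.1.2.3"}, forwarders, header_order
)
spoofed = resolve_client_ip(
    "198.51.100.9", {"X-Forwarded-For": "1.2.3.4"}, forwarders, header_order
)
print(via_lb, spoofed)
```

A request arriving through the trusted load-balancer resolves to the forwarded client address, while a direct request that sets the header itself keeps its TCP source address.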
Metrics tuning
See Monitoring for metrics.histogramDropLabels — dropping high-cardinality label combinations (e.g. per-upstream request-size histograms) avoids cardinality explosion in Prometheus.
Tracing in production
See Tracing for OTLP exporter setup, sampling rate config, and adding custom resource attributes. Use tracing.resourceAttributes to attach region/instance labels that correlate with the responseHeaders you set above.
Rate-limit budgets per project / upstream
See Rate Limiters for rateLimiters.budgets — define per-project or per-upstream budgets and reference them from auth strategies (per-API-key limits) or directly from upstream config (cap upstream call rate to protect a paid plan).
Common pitfalls
- GOMEMLIMIT without tuning GOGC: GOGC stays at its default of 100, so the heap can swing widely just under the ceiling. Always pair them.
- GOGC=100 with a tight container limit: the heap can double in size before GC fires. A container with a 512 MiB limit can OOM before GC triggers.
- waitBeforeShutdown too short: load-balancers and service meshes can take several seconds to drain an endpoint after a readiness probe fails. Values below 10 s risk connection resets on rolling restarts with Envoy.
- waitAfterShutdown too short: if the process exits before the proxy finishes draining, in-flight requests to that pod are reset. 30 s is a safe default.
- terminationGracePeriodSeconds too short: Kubernetes sends SIGKILL when this expires. It must exceed waitBeforeShutdown + waitAfterShutdown + expected drain time.
- includeErrorDetails left at true in production: upstream error messages often contain full endpoint URLs with API keys embedded. Set it to false before exposing eRPC to external callers.
- Missing trustedIPForwarders: IP-based rate limits and auth policies all see the LB IP, effectively becoming global instead of per-client.
- Chain ID auto-detection in large deployments: every eRPC replica calls eth_chainId on every upstream at startup. With many replicas and many upstreams this creates a startup burst. Configuring evm.chainId explicitly eliminates it.