# Shared state

> Source: https://docs.erpc.cloud/config/database/shared-state
>
> Give every pod in your fleet the same real-time view of each upstream's block height: routing stays consistent as you scale out, with zero added latency on the request path.
>
> Format: machine-readable markdown export of the docs page above.
>
> All collapsible AI sections are inlined and fully expanded.

When you scale eRPC horizontally, every pod tracks upstream block heights independently. Without coordination, one pod may route a request to a lagging upstream that another pod would have avoided. Shared state fixes this: a background pipeline publishes every new block observation to a common store and fans updates out to every peer, so all pods route from the same global picture. The request path never waits on the shared store: reads stay in-process, and the network round-trip is always off the critical path.

## Quick taste

Illustrative, not a tuned production config. Connect all pods to one Redis instance:

**Config path:** `database.sharedState`

**YAML (`erpc.yaml`):**

```yaml
database:
  sharedState:
    # unique key per deployment; prevents cross-cluster contamination if two clusters share one Redis
    clusterKey: "prod-main"
    connector:
      # redis gives sub-millisecond cross-pod propagation via pub/sub
      driver: redis
      redis:
        uri: "redis://:secret@redis.internal:6379/?pool_size=10"
```

**TypeScript (`erpc.ts`):**

```typescript
database: {
  sharedState: {
    // unique key per deployment; prevents cross-cluster contamination if two clusters share one Redis
    clusterKey: "prod-main",
    connector: {
      // redis gives sub-millisecond cross-pod propagation via pub/sub
      driver: "redis",
      redis: { uri: "redis://:secret@redis.internal:6379/?pool_size=10" },
    },
  },
}
```

## Agent reference

Copy one of these prompts into your AI agent session (Claude Code, Cursor, …). Each one points the agent at this page's machine-readable reference so it can do the work correctly:

**Prompt Example #1: add shared state to a multi-pod deployment**

```text
I'm running eRPC on multiple pods and they each independently track upstream block heights, causing inconsistent routing. Set up database.sharedState in my eRPC config using Redis so all pods share the same block-number view. Set a unique clusterKey for my deployment and leave the foreground latency defaults unless there's a strong reason to change them.

Read the full reference first: https://docs.erpc.cloud/config/database/shared-state.llms.txt
```

**Prompt Example #2: tune foreground latency for co-located Redis**

```text
My eRPC pods use shared state with Redis on the same host (sub-millisecond RTT). Audit the sharedState settings in my eRPC config, specifically updateMaxWait, lockMaxWait, fallbackTimeout, and lockTtl, and tighten the foreground latency budget without breaking the lockTtl > fallbackTimeout constraint.

Reference: https://docs.erpc.cloud/config/database/shared-state.llms.txt
```

**Prompt Example #3: debug inconsistent block routing across pods**

```text
Some of my eRPC pods are routing requests to lagging upstreams even though other pods see a higher block. I have sharedState configured in my eRPC config. Walk me through which Prometheus metrics and log messages to check to confirm cross-pod propagation is working, and whether my clusterKey is set correctly so pods aren't cross-contaminating each other.

Reference: https://docs.erpc.cloud/config/database/shared-state.llms.txt
```

**Prompt Example #4: migrate from DynamoDB to Redis for shared state**

```text
Our eRPC shared state currently uses DynamoDB and we're seeing ~5s cross-pod convergence lag. Migrate the database.sharedState connector in my eRPC config to Redis for sub-millisecond pub/sub propagation. Keep the clusterKey unchanged; the new Redis store starts empty, so counter state is repopulated by the pods rather than migrated.

Reference: https://docs.erpc.cloud/config/database/shared-state.llms.txt
```

**Prompt Example #5: run two independent clusters on one Redis**

```text
I have a staging and a production eRPC deployment both pointing at the same Redis instance. Ensure their shared-state counters don't interfere with each other by setting distinct clusterKey values in each eRPC config, and explain what happens if I accidentally leave both at the default.

Reference: https://docs.erpc.cloud/config/database/shared-state.llms.txt
```

---

### Shared state: full agent reference

### How it works

**Registry and key space.** `NewSharedStateRegistry` wraps one `Connector` and a `clusterKey` prefix. Every counter is stored under `//`, for example `prod-main/latestBlock/projectX/evm/1/alchemy`. The registry holds a `sync.Map` of live `counterInt64` instances; calling `GetCounterInt64` with the same key always returns the same object (idempotent `LoadOrStore`). On first call, two async bootstrap tasks start: one seeds the local counter with a single remote `Get`, and the other calls `WatchCounterInt64` to open a long-lived change stream from the connector.

**Write path.** `TryUpdate` is synchronous and local-only: it atomically advances the in-memory counter via a CAS loop and then schedules a background push without touching the network. `scheduleBackgroundPushCurrent` coalesces concurrent pushes into a single goroutine per counter, which:

1. publishes the current state via `connector.PublishCounterInt64` for fast peer notification;
2. acquires the distributed lock within `lockMaxWait`;
3. reads remote state and adopts it if the remote value is newer, then writes local state if local is newer.

If the lock cannot be acquired within `lockMaxWait`, the goroutine falls back to publish-only propagation and reschedules if another update arrived while it was running.

**Foreground latency budget.** `TryUpdateIfStale` is used by the EVM state poller to debounce expensive RPC calls.
It launches the upstream RPC in a background goroutine and waits at most `updateMaxWait` for the result. If the RPC finishes within that window, the fresh value is returned synchronously; otherwise the stale local value is returned immediately and the RPC completes asynchronously. The foreground path never acquires the distributed lock.

**Rollback handling.** All counter updates enforce a rollback policy. `latestBlock`, `finalizedBlock`, and `servedLatestBlock`/`servedFinalizedBlock` use `ignoreRollbackOf = 1024`: decreases of up to 1024 blocks are silently discarded as noise from lagging upstreams, while larger gaps (genuine reorgs or provider resets) are accepted and trigger the `OnLargeRollback` callback. `earliestBlock` uses `ignoreRollbackOf = 0`: all decreases are accepted, because node pruning genuinely shifts the historical anchor.

**Timestamp clock and conflict resolution.** Each counter state is serialised as `{"v":,"t":,"b":""}`. The `t` field serves both as a last-writer-wins clock for conflict resolution and as a staleness indicator. `allocateUpdatedAtMs` ensures local timestamps are strictly monotonic even under concurrent goroutines (via a CAS spin-loop). Remote state is only adopted when `st.UpdatedAt > currentTs`, which makes adoption idempotent and protects against out-of-order network delivery. When a remote update has a newer timestamp but a value that would imply a small rollback (within `ignoreRollbackOf`), `advanceTimestampPast` advances the local timestamp past the remote's so that the local (higher) value wins the next reconciliation cycle.

**Transport differences by connector.** All five connectors implement the same `GetCounterInt64` / `PublishCounterInt64` / `WatchCounterInt64` interface, but propagation latency differs significantly:

- **Redis**: push via `PUBLISH counter:` (e.g.
  `counter:prod-main/latestBlock/...`) and subscribe via a single shared `PSubscribe("counter:*")` connection that matches all counters regardless of cluster; subscribers filter by exact key inside `runMessageLoop`. The subscription reconnects automatically with exponential backoff (1 s → 30 s with jitter) and has a 5-minute periodic poll fallback; `pollAllKeys()` runs after each reconnect to close the gap left by the reconnection window. Subscriber channels are buffered (size 1, drop-on-full), so slow consumers never block the message loop. Source: [`data/redis_pubsub_manager.go:L155-460`](https://github.com/erpc/erpc/blob/main/data/redis_pubsub_manager.go#L155-L460), [`data/redis.go:L591`](https://github.com/erpc/erpc/blob/main/data/redis.go#L591)
- **PostgreSQL**: push via `pg_notify`, using a channel name derived by `sanitizeChannelName(fmt.Sprintf("counter_%s", key))` to strip characters that are illegal in PostgreSQL NOTIFY channel names; LISTEN-based subscription with a 30 s polling fallback. Source: [`data/postgresql.go:L620-718`](https://github.com/erpc/erpc/blob/main/data/postgresql.go#L620-L718)
- **DynamoDB**: no push primitive. `PublishCounterInt64` is a documented no-op; propagation relies on polling at `statePollInterval` (default 5 s), so cross-pod convergence latency is bounded by the poll interval, not a push event. Source: [`data/dynamodb.go:L723-829`](https://github.com/erpc/erpc/blob/main/data/dynamodb.go#L723-L829)
- **Memory**: both `WatchCounterInt64` and `PublishCounterInt64` are no-ops; in-process only, with no cross-pod coordination. Source: [`data/memory.go:L195-207`](https://github.com/erpc/erpc/blob/main/data/memory.go#L195-L207)
- **gRPC**: both `WatchCounterInt64` and `PublishCounterInt64` return an error; the gRPC connector cannot participate in shared state.

**Deadlock-prevention invariant.** The foreground paths (`TryUpdate` and `TryUpdateIfStale`) never acquire the distributed lock.
Only the background `scheduleBackgroundPushCurrent` goroutine acquires it, and only after any local update is complete. This strict lock-ordering rule fixes a deadlock in the old design, where `scheduleBackgroundPushCurrent` could hold the distributed lock while waiting for `updateMu` at the same time as `TryUpdateIfStale` held `updateMu` and waited for the distributed lock. The test `TestLockOrderSimulation_OldBuggyBehavior` documents the old behavior.

**Counter families.** Four families of counters are materialised during normal operation, all stored under `//<...>` in the backing store:

1. `latestBlock////`: latest block per upstream (`ignoreRollbackOf=1024`).
2. `finalizedBlock////`: finalized block per upstream (`ignoreRollbackOf=1024`).
3. `earliestBlock/////`: earliest available block per upstream per probe type (`ignoreRollbackOf=0`; all decreases are accepted because node pruning genuinely shifts the historical anchor).
4. `servedLatestBlock//>` and `servedFinalizedBlock//>`: cross-pod monotonic served tip per tag group.

The per-upstream key component is derived via `common.UniqueUpstreamKey(up)`, which hashes the upstream's project/network/upstream triple to prevent collisions between upstreams with the same ID in different networks or projects. Source: [`common/upstream.go:L62-65`](https://github.com/erpc/erpc/blob/main/common/upstream.go#L62-L65)

**Instance ID.** Each counter state carries an `UpdatedBy` field (`"b"` in JSON) identifying which pod wrote it. It is resolved from the first non-empty value of: env `INSTANCE_ID` → env `POD_NAME` → env `HOSTNAME` → `os.Hostname()` → `"unknown"`. This is diagnostic-only and does not affect routing.
Source: [`data/shared_state_registry.go:L82-98`](https://github.com/erpc/erpc/blob/main/data/shared_state_registry.go#L82-L98)

**SuggestLatestBlock fast path.** When the eRPC proxy observes a block number in an upstream response, it immediately calls `SuggestLatestBlock(blockNumber)`, which calls `TryUpdate` if the new value is greater than the current local value. This avoids a full polling round-trip and triggers background propagation right away, advancing the shared counter without waiting for the next polling tick. Source: [`architecture/evm/evm_state_poller.go:L459-481`](https://github.com/erpc/erpc/blob/main/architecture/evm/evm_state_poller.go#L459-L481)

**IsStale semantics.** `IsStale(d)` returns `true` if `updatedAtUnixMs <= 0` (counter never written) or if `time.Since(time.UnixMilli(updatedAtUnixMs)) > d`. It is the gate inside `TryUpdateIfStale` that decides whether to attempt a refresh. Source: [`data/shared_state_variable.go:L37-43`](https://github.com/erpc/erpc/blob/main/data/shared_state_variable.go#L37-L43)

**OnValue / OnLargeRollback callbacks.** Registered at counter construction time and called synchronously from `triggerValueCallback` within both `processNewValue` (local write path) and `processNewState` (remote sync path) after a successful CAS. The EVM state poller registers `OnValue` to push the new block number into the health tracker for metrics, and `OnLargeRollback` to record large rollback events. Source: [`architecture/evm/evm_state_poller.go:L126-139`](https://github.com/erpc/erpc/blob/main/architecture/evm/evm_state_poller.go#L126-L139)

**Poll timeout in EVM state poller.** The state poller computes its per-tick context timeout as `lockTtl + 15s` (minimum 30 s), leaving time to acquire the distributed lock plus network operations.
Source: [`architecture/evm/evm_state_poller.go:L174-187`](https://github.com/erpc/erpc/blob/main/architecture/evm/evm_state_poller.go#L174-L187)

**Bootstrap task retry.** If `WatchCounterInt64` fails at startup (connector not ready, network error), `initCounterSync` marks the watch task as failed and the `Initializer` retries it in the background. During this window the counter operates in local-only mode. Source: [`data/shared_state_registry.go:L168-222`](https://github.com/erpc/erpc/blob/main/data/shared_state_registry.go#L168-L222)

**Served-tip partitions.** Beyond per-upstream counters, eRPC also maintains network-wide `servedLatestBlock` and `servedFinalizedBlock` counters, plus per-tag-group partitions for tag-aware routing. The partition key is `"grp:" + hex(sha256(sorted_upstream_ids)[:8])`, so equivalent selectors map to the same key. A cap of 16 partitions per network prevents cardinality explosion from pathological configs. Once the cap is reached, new tag-group selectors fall back to the stateless subnet-minimum.

**Fallback when absent.** `erpc/erpc.go:L49-L66` synthesises an in-memory registry if `sharedState == nil`; the same `GetCounterInt64` API works identically, but no cross-pod propagation occurs. Init failures are non-fatal: `erpc/init.go` logs a warning and continues with a synthesised in-memory registry. A broken Redis config therefore produces degraded per-pod behavior, not a hard startup failure.

### Config schema

All fields are under `database.sharedState` in `erpc.yaml`. The block is optional; its absence causes a silent fallback to a process-local in-memory registry. Struct at [`common/config.go:L269-286`](https://github.com/erpc/erpc/blob/main/common/config.go#L269-L286), defaults at [`common/defaults.go:L789-828`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L789-L828), validation at [`common/validation.go:L263-309`](https://github.com/erpc/erpc/blob/main/common/validation.go#L263-L309).
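
To make the defaults and validation rules concrete, here is a simplified Go sketch. The types `sharedStateConfig` and `connectorConfig` and the methods `setDefaults` and `validate` are hypothetical stand-ins, not eRPC's actual struct or functions; they encode only the defaults and constraints listed in the schema table for this section, and they treat a zero duration as "unset", which is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical, simplified mirror of database.sharedState; field names follow
// the documented YAML keys, not eRPC's actual Go struct.
type connectorConfig struct {
	ID     string
	Driver string
}

type sharedStateConfig struct {
	ClusterKey      string
	Connector       *connectorConfig
	FallbackTimeout time.Duration
	LockTtl         time.Duration
	LockMaxWait     time.Duration
	UpdateMaxWait   time.Duration
}

// setDefaults applies the documented defaults. topLevelClusterKey stands in
// for the top-level clusterKey, which itself defaults to "erpc-default".
func (c *sharedStateConfig) setDefaults(topLevelClusterKey string) {
	if c.ClusterKey == "" {
		c.ClusterKey = topLevelClusterKey
	}
	if c.ClusterKey == "" {
		c.ClusterKey = "erpc-default"
	}
	if c.Connector == nil {
		c.Connector = &connectorConfig{Driver: "memory"}
	}
	if c.Connector.ID == "" {
		// shared-state scope: the id is the bare driver name, e.g. "redis"
		c.Connector.ID = c.Connector.Driver
	}
	if c.FallbackTimeout == 0 {
		c.FallbackTimeout = 3 * time.Second
	}
	if c.LockTtl == 0 {
		c.LockTtl = 4 * time.Second
	}
	if c.LockMaxWait == 0 {
		c.LockMaxWait = 100 * time.Millisecond
	}
	if c.UpdateMaxWait == 0 {
		c.UpdateMaxWait = 50 * time.Millisecond
	}
}

// validate checks the documented timing constraints.
func (c *sharedStateConfig) validate() error {
	if c.LockTtl <= c.FallbackTimeout {
		return errors.New("lockTtl must exceed fallbackTimeout")
	}
	if c.LockMaxWait <= 0 || c.LockMaxWait >= c.FallbackTimeout {
		return errors.New("lockMaxWait must be > 0 and < fallbackTimeout")
	}
	if c.UpdateMaxWait <= 0 || c.UpdateMaxWait >= c.FallbackTimeout {
		return errors.New("updateMaxWait must be > 0 and < fallbackTimeout")
	}
	return nil
}

func main() {
	cfg := &sharedStateConfig{Connector: &connectorConfig{Driver: "redis"}}
	cfg.setDefaults("")
	fmt.Println(cfg.ClusterKey, cfg.Connector.ID, cfg.validate()) // erpc-default redis <nil>

	cfg.LockTtl = 2 * time.Second // below the 3s fallbackTimeout
	fmt.Println(cfg.validate())   // lockTtl must exceed fallbackTimeout
}
```

Running it shows the connector-id gotcha: a Redis connector without an explicit `id` ends up with the id `"redis"`.
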

| Field | Type | Default | Behavior / footguns |
|---|---|---|---|
| `database.sharedState` | `*SharedStateConfig` | `nil` | Root pointer. `nil` by default; when absent, `erpc/erpc.go:L49-L66` synthesises an equivalent in-memory config at runtime. The same `GetCounterInt64` API works in both cases; the only difference is that there is no cross-pod propagation. Source: [`common/config.go:L266`](https://github.com/erpc/erpc/blob/main/common/config.go#L266), [`erpc/erpc.go:L49-66`](https://github.com/erpc/erpc/blob/main/erpc/erpc.go#L49-L66) |
| `database.sharedState.clusterKey` | `string` | Inherits top-level `clusterKey` (default `"erpc-default"`) | Prefix prepended to every counter key as `/`. Multiple logical clusters can share the same Redis/PG instance without collision. **Footgun:** all instances default to `"erpc-default"`, so independent deployments pointing at the same store must set distinct keys or they cross-contaminate block-number state. Source: [`common/config.go:L271`](https://github.com/erpc/erpc/blob/main/common/config.go#L271), [`common/defaults.go:L803-804`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L803-L804) |
| `database.sharedState.connector` | `*ConnectorConfig` | Auto-defaulted to in-memory (`driver="memory"`) when the block is present but `connector` is omitted | Backing store for counter persistence and pub/sub. Drivers: `memory`, `redis`, `postgresql`, `dynamodb`. **Gotcha (id auto-assignment):** if `connector.id` is empty, the shared-state scope sets it to `string(connector.driver)` (e.g. `"redis"`) at `common/defaults.go:L799-L801`, NOT `"shared-state-redis"` (the `scope+"-"+driver` pattern used by cache/auth connectors). Copying a connector block from `database.evmJsonRpcCache` into `database.sharedState` and relying on auto-assigned ids therefore produces different ids. Always set `connector.id` explicitly. Source: [`common/defaults.go:L799-801`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L799-L801) |
| `database.sharedState.fallbackTimeout` | `Duration` | `3s` | Deadline for background remote I/O (Get, Set, Publish) and the hard cap on the async-refresh helper goroutine (`fallbackTimeout + 1s`). **Does NOT bound request-path latency**; that is `updateMaxWait`. Setting this very low degrades background propagation without improving client latency. Source: [`common/defaults.go:L809-812`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L809-L812), [`data/shared_state_variable.go:L426`](https://github.com/erpc/erpc/blob/main/data/shared_state_variable.go#L426), [`data/shared_state_variable.go:L511`](https://github.com/erpc/erpc/blob/main/data/shared_state_variable.go#L511) |
| `database.sharedState.lockTtl` | `Duration` | `4s` | Expiry of the distributed lock key in the backing store. Must exceed `fallbackTimeout` (enforced by validation) to allow one full Get+Set cycle under the lock. If the lock expires while the goroutine is still working, `Unlock` logs "lock was already expired" at DEBUG level; this is expected. Source: [`common/defaults.go:L814-817`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L814-L817), [`common/validation.go:L292-295`](https://github.com/erpc/erpc/blob/main/common/validation.go#L292-L295) |
| `database.sharedState.lockMaxWait` | `Duration` | `100ms` | Maximum time the background push goroutine waits to acquire the distributed lock before giving up and relying on publish-only propagation. Never blocks the request path. Must be `> 0` and `< fallbackTimeout`. Source: [`common/defaults.go:L820-823`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L820-L823), [`common/validation.go:L298-301`](https://github.com/erpc/erpc/blob/main/common/validation.go#L298-L301) |
| `database.sharedState.updateMaxWait` | `Duration` | `50ms` | **Primary foreground latency knob.** `TryUpdateIfStale` waits at most this long for the upstream RPC before returning the stale value and continuing asynchronously. Refreshes that finish inside the window complete synchronously; slower ones are cut off and finish in the background. Must be `> 0` and `< fallbackTimeout`. Source: [`common/defaults.go:L824-827`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L824-L827), [`data/shared_state_variable.go:L447`](https://github.com/erpc/erpc/blob/main/data/shared_state_variable.go#L447) |

**Top-level `clusterKey`** (`common/config.go:L40`): defaults to `"erpc-default"` and is propagated to `database.sharedState.clusterKey` if that field is empty.

**DynamoDB-specific `statePollInterval`** (`common/config.go:L441`): default `5s`. Controls the polling rate inside `WatchCounterInt64`, since DynamoDB has no pub/sub primitive.

### Worked examples

All patterns below are distilled from real production fleets; comments explain the non-obvious choices.

**1. Minimal Redis config with an explicit cluster key (recommended baseline).** Every production deployment sets a unique `clusterKey` so that staging and production instances sharing one Redis instance don't contaminate each other's block-number state. The explicit `connector.id` avoids the auto-assignment gotcha where shared-state connectors get `"redis"` instead of `"shared-state-redis"`:

**Config path:** `database.sharedState`

**YAML (`erpc.yaml`):**

```yaml
database:
  sharedState:
    # unique per deployment; prevents cross-pod block-height contamination
    # if two clusters share the same Redis instance (e.g. prod + staging)
    clusterKey: "prod-us-east-1"
    connector:
      # always set id explicitly: the auto-assigned id here is "redis", not "shared-state-redis"
      id: "shared-state-redis"
      driver: redis
      redis:
        uri: "${SHARED_STATE_REDIS_URL}"
```

**TypeScript (`erpc.ts`):**

```typescript
database: {
  sharedState: {
    // unique per deployment; prevents cross-pod block-height contamination
    // if two clusters share the same Redis instance (e.g. prod + staging)
    clusterKey: "prod-us-east-1",
    connector: {
      // always set id explicitly: the auto-assigned id here is "redis", not "shared-state-redis"
      id: "shared-state-redis",
      driver: "redis",
      redis: { uri: "${SHARED_STATE_REDIS_URL}" },
    },
  },
}
```

**2. Multi-region fleet with relaxed lock and wide foreground wait.** When pods span multiple regions, the Redis round-trip from any one pod may vary. This production multi-region config raises `lockTtl` to 30 s to accommodate the longer distributed-lock cycle, sets `lockMaxWait` to 200 ms so lock contention still fails fast into publish-only propagation rather than blocking, and raises `updateMaxWait` to 2 s to give the foreground refresh more time before falling through to a stale value:

**Config path:** `database.sharedState`

**YAML (`erpc.yaml`):**

```yaml
database:
  sharedState:
    clusterKey: "fly-prod-aggregator-shard0"
    # long TTL covers one full Get+Set cycle at cross-region Redis latency;
    # must stay > fallbackTimeout (default 3s), so 30s is generous
    lockTtl: "30s"
    # fail fast on lock contention and proceed with publish-only; never blocks requests
    lockMaxWait: "200ms"
    # give the foreground refresh (the upstream RPC) up to 2s to finish synchronously
    # before falling through to the stale value; must stay < fallbackTimeout
    updateMaxWait: "2s"
    connector:
      driver: redis
      redis:
        uri: "redis://erpc-redis.internal:6379"
```

**TypeScript (`erpc.ts`):**

```typescript
database: {
  sharedState: {
    clusterKey: "fly-prod-aggregator-shard0",
    // long TTL covers one full Get+Set cycle at cross-region Redis latency;
    // must stay > fallbackTimeout (default 3s), so 30s is generous
    lockTtl: "30s",
    // fail fast on lock contention and proceed with publish-only; never blocks requests
    lockMaxWait: "200ms",
    // give the foreground refresh (the upstream RPC) up to 2s to finish synchronously
    // before falling through to the stale value; must stay < fallbackTimeout
    updateMaxWait: "2s",
    connector: {
      driver: "redis",
      redis: { uri: "redis://erpc-redis.internal:6379" },
    },
  },
}
```

**3. Same Redis, two logical clusters (staging + prod).** When you can't provision separate Redis instances, use distinct `clusterKey` values. Counter keys are prefixed with the cluster key, so `staging/latestBlock/...` and `prod/latestBlock/...` never collide, even though both clusters publish into the same Redis PUBLISH channel namespace. The `fallbackTimeout` and `lockTtl` defaults are left in place, since in-cluster Redis latency is well under 3 s:

**Config path:** `database.sharedState`

**YAML (`erpc.yaml`):**

```yaml
# staging erpc.yaml
database:
  sharedState:
    # distinct from the production clusterKey; same Redis, no cross-contamination
    clusterKey: "staging"
    connector:
      driver: redis
      redis:
        uri: "${SHARED_STATE_REDIS_URL}"
---
# production erpc.yaml
database:
  sharedState:
    clusterKey: "prod"
    connector:
      driver: redis
      redis:
        uri: "${SHARED_STATE_REDIS_URL}"
```

**TypeScript (`erpc.ts`):**

```typescript
// staging config
database: {
  sharedState: {
    // distinct from the production clusterKey; same Redis, no cross-contamination
    clusterKey: "staging",
    connector: { driver: "redis", redis: { uri: "${SHARED_STATE_REDIS_URL}" } },
  },
},

// production config
database: {
  sharedState: {
    clusterKey: "prod",
    connector: { driver: "redis", redis: { uri: "${SHARED_STATE_REDIS_URL}" } },
  },
},
```

**4. DynamoDB for AWS-only deployments.** When Redis or PostgreSQL aren't available, DynamoDB is a valid fallback, but `PublishCounterInt64` is a no-op, so propagation relies entirely on polling.
Cross-pod convergence latency is bounded by `statePollInterval` (default 5 s) rather than a push event. This is acceptable for batch indexing workloads that tolerate slightly stale block-height views, but not recommended for latency-sensitive RPC fleets:

**Config path:** `database.sharedState`

**YAML (`erpc.yaml`):**

```yaml
database:
  sharedState:
    clusterKey: "prod-main"
    connector:
      driver: dynamodb
      dynamodb:
        region: "us-east-1"
        table: "erpc-shared-state"
        # DynamoDB has no pub/sub, so all propagation is via polling;
        # the default is 5s, and lower values cost more DynamoDB read capacity
        statePollInterval: 5s
```

**TypeScript (`erpc.ts`):**

```typescript
database: {
  sharedState: {
    clusterKey: "prod-main",
    connector: {
      driver: "dynamodb",
      dynamodb: {
        region: "us-east-1",
        table: "erpc-shared-state",
        // DynamoDB has no pub/sub, so all propagation is via polling;
        // the default is 5s, and lower values cost more DynamoDB read capacity
        statePollInterval: "5s",
      },
    },
  },
}
```

**5. No shared state (single-pod or dev).** Omitting the block entirely gives you a silent per-process in-memory registry (no config needed, same API, no cross-pod propagation):

```yaml
# No database.sharedState block; eRPC synthesises an in-memory registry automatically
database: {}
```

### Request/response behavior

The shared-state layer has no effect on JSON-RPC request or response shapes visible to clients. Its impact is routing-internal:

- **Block-number reads** (`latestBlock`, `finalizedBlock`) used by the upstream selector and tag-aware routing come from the in-process atomic copy, adding zero latency to the request path.
- **Block-number refreshes** are debounced via `TryUpdateIfStale`; the foreground caller waits at most `updateMaxWait` (default 50 ms). If the refresh completes within that window, the fresh value influences the current request; otherwise the stale value is used and the refresh completes asynchronously for the next request.
- **Served-tip counters** (`servedLatestBlock`, `servedFinalizedBlock`) are updated after a response is selected and do not add latency. They gate the "never serve older than seen" invariant that prevents block-number regressions across requests in a scaled fleet.
- A broken or unreachable shared-state connector degrades to per-pod local state: no error is returned to the client, and routing quality degrades silently.

### Best practices

- Always set a **unique `clusterKey`** per logical deployment (e.g. `"prod-v2"`, `"staging"`). The default `"erpc-default"` is shared by every unconfigured instance pointing at the same store, and cross-contamination breaks block-height monotonicity.
- Always set **`connector.id` explicitly** (`id: "shared-state-redis"`). The auto-assigned id for shared-state connectors is `string(driver)` (e.g. `"redis"`), not the `"shared-state-redis"` pattern used by cache connectors, so copying connector blocks between scopes silently changes the id.
- Prefer **Redis or PostgreSQL** over DynamoDB for shared state. DynamoDB has no pub/sub, so convergence latency is 5 s by default vs. milliseconds for Redis pub/sub.
- Do not set **`fallbackTimeout` low** to try to speed up request latency: `fallbackTimeout` governs background I/O only. The foreground latency knob is `updateMaxWait`.
- Keep **`lockTtl > fallbackTimeout`** (the default 4 s > 3 s relationship). Validation enforces this because `lockTtl` must be long enough for one full Get+Set cycle under the lock.
- In AWS deployments with Redis, set **`pool_size`** in the Redis URI (`?pool_size=10`) to avoid connection exhaustion under bursty background push traffic.
- Monitor `erpc_upstream_block_head_large_rollback` in production. A sustained non-zero value means an upstream is experiencing genuine reorgs or provider resets, which is worth routing around.

### Edge cases & gotchas

1. **`Value=0` is valid, `UpdatedAt=0` is not.** Genesis block (`earliestBlock=0`) is a legitimate counter value.
   Uninitialised state is signalled by `UpdatedAt <= 0`, never by `Value == 0`. Source: [`data/connector.go:L32-34`](https://github.com/erpc/erpc/blob/main/data/connector.go#L32-L34)
2. **Stale lock expiry is silent.** If a remote operation takes longer than `lockTtl`, the lock key expires and `Unlock` logs "lock was already expired" at DEBUG; this is expected and benign. Validation rejects any `lockTtl` that does not exceed `fallbackTimeout`, so expiry stays the exception rather than happening on every call. Source: [`common/validation.go:L292-295`](https://github.com/erpc/erpc/blob/main/common/validation.go#L292-L295)
3. **Remote-ahead adoption happens in the background.** A fresh pod can have its local counter bumped forward by a peer before the pod's first RPC poll completes. Source: [`data/shared_state_variable.go:L686-691`](https://github.com/erpc/erpc/blob/main/data/shared_state_variable.go#L686-L691)
4. **Small-rollback rejection advances the local timestamp.** When a remote update has a newer `UpdatedAt` but a value within the rollback threshold, `advanceTimestampPast` is called to ensure the local (higher) value wins the next reconciliation. Source: [`data/shared_state_variable.go:L211-218`](https://github.com/erpc/erpc/blob/main/data/shared_state_variable.go#L211-L218)
5. **`TryUpdate` must not acquire `updateMu`.** If `TryUpdate` took `updateMu`, it could block behind a concurrent `TryUpdateIfStale` caller that already holds the mutex and is itself waiting on background operations, which is a classic deadlock. The old design had this bug; the test `TestLockOrderSimulation_OldBuggyBehavior` documents it. Source: [`data/shared_state_variable_deadlock_test.go:L95-212`](https://github.com/erpc/erpc/blob/main/data/shared_state_variable_deadlock_test.go#L95-L212)
6. **`continueAsyncRefresh` goroutine leak protection.** If the downstream RPC ignores its context and never returns, a bounded timer of `fallbackTimeout + 1s` caps the helper's lifetime so it does not leak.
   Source: [`data/shared_state_variable.go:L511`](https://github.com/erpc/erpc/blob/main/data/shared_state_variable.go#L511)
7. **Double-checked staleness.** `TryUpdateIfStale` checks `IsStale` before and after acquiring `updateMu` to prevent a thundering herd, where N goroutines all see a stale value and all fire a refresh after the first one has already updated it. Source: [`data/shared_state_variable.go:L399-413`](https://github.com/erpc/erpc/blob/main/data/shared_state_variable.go#L399-L413)
8. **Redis pub/sub channel buffered=1, drop-on-full.** If the registry consumer falls behind, messages are silently dropped. The 5-minute periodic `pollingLoop` is the recovery path. Source: [`data/redis_pubsub_manager.go:L450-457`](https://github.com/erpc/erpc/blob/main/data/redis_pubsub_manager.go#L450-L457)
9. **DynamoDB `PublishCounterInt64` is a no-op.** Cross-pod latency is bounded by `statePollInterval` (default 5 s), so DynamoDB means significantly higher convergence latency than Redis or PostgreSQL. Source: [`data/dynamodb.go:L825-829`](https://github.com/erpc/erpc/blob/main/data/dynamodb.go#L825-L829)
10. **`WatchCounterInt64` channel closed on shutdown.** `RedisPubSubManager.stop()` closes all subscriber channels on app shutdown; the registry goroutine observing the channel receives `ok=false` and marks the watch task as failed. This is benign, since the app context is also cancelled at that point. Source: [`data/shared_state_registry.go:L200-207`](https://github.com/erpc/erpc/blob/main/data/shared_state_registry.go#L200-L207)
11. **`clusterKey` propagation order.** An explicit `database.sharedState.clusterKey` wins over the inherited top-level `clusterKey`. Source: [`common/defaults.go:L803-805`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L803-L805)
12. **Served-tip partition cap at 16 per network.** Once the cap is reached, new tag-group selectors fall back to the stateless subnet-minimum rather than getting a monotonic counter.
    Source: [`erpc/networks.go:L80-84`](https://github.com/erpc/erpc/blob/main/erpc/networks.go#L80-L84)
13. **`init.go` treats shared-state failure as a warning, not fatal.** A broken Redis config produces degraded per-pod behavior, not a hard startup failure. Source: [`erpc/init.go:L93-97`](https://github.com/erpc/erpc/blob/main/erpc/init.go#L93-L97)
14. **A nil `database.sharedState` means `SetDefaults` is never called on it.** The synthesised config at `erpc/erpc.go:L49-L66` exists only in local scope and is not written back to `cfg.Database.SharedState`. Source: [`common/defaults.go:L837-840`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L837-L840)
15. **Connector `id` auto-assignment differs between shared-state and other scopes.** The shared-state connector id defaults to `string(driver)` (e.g. `"redis"`), not `"shared-state-redis"`. Source: [`common/defaults.go:L799-801`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L799-L801)

### Observability

| Metric | Type | Labels | When it fires |
|---|---|---|---|
| `erpc_upstream_latest_block_number` | Gauge | `project`, `vendor`, `network`, `upstream` | On every `OnValue` callback from the `latestBlockShared` counter via `tracker.SetLatestBlockNumber` |
| `erpc_upstream_finalized_block_number` | Gauge | `project`, `vendor`, `network`, `upstream` | On every `OnValue` callback from the `finalizedBlockShared` counter via `tracker.SetFinalizedBlockNumber` |
| `erpc_upstream_latest_block_polled_total` | Counter | `project`, `vendor`, `network`, `upstream` | Each time `PollLatestBlockNumber` issues an actual RPC call (not debounced by `TryUpdateIfStale`) |
| `erpc_upstream_finalized_block_polled_total` | Counter | `project`, `vendor`, `network`, `upstream` | Each time `PollFinalizedBlockNumber` issues an actual RPC call |
| `erpc_upstream_block_head_large_rollback` | Gauge | `project`, `vendor`, `network`, `upstream` | On `OnLargeRollback` callback; value is the rollback size in blocks |
`erpc_unexpected_panic_total` | Counter | `component`, `context`, `fingerprint` | On panic recovery inside `initCounterSync` (`component="shared-state-counter-sync"`), `messageLoop` (`component="redis-pubsub-message-loop"`), or `pollingLoop` (`component="redis-pubsub-polling-loop"`) | **OTel trace spans:** - `CounterInt64.TryUpdate` — every foreground local-update call; attributes: `key`, `new_value` (when detailed tracing enabled). - `CounterInt64.TryUpdateIfStale` — attributes: `key`, `staleness_ms`, `skipped_not_stale`, `skipped_not_stale_after_mutex`, `role`, `result_path` (`got_result` vs `timeout`), `async_refresh`, `foreground_remote_io_disabled`. - `CounterInt64.TryUpdateIfStale.AcquireMutex` — measures mutex wait time inside `TryUpdateIfStale`. - `CounterInt64.TryUpdateIfStale.ExecuteRefresh` — spans the actual RPC call inside the refresh goroutine; attribute `timeout_ms`. - `RedisConnector.PublishCounterInt64` — spans each publish call; attributes `key`, `value`, `updated_at`, `updated_by` (when detailed tracing enabled). - `PostgreSQLConnector.PublishCounterInt64` — same pattern as Redis span above. **Key log messages** (component `sharedState`): - `"no remote initial value found for counter"` (DEBUG) — normal first-run; counter seeds from zero. - `"fetched initial value for counter"` (DEBUG) — successful seed from remote on first bootstrap. - `"received new value from shared state sync"` (DEBUG) — pubsub/watch update received and processed. - `"failed to setup counter sync"` (ERROR) — watch setup failed; will retry via initializer. - `"lock held by another instance, proceeding with local lock"` (DEBUG) — normal under load; background push proceeds without distributed lock. - `"lock acquisition timed out waiting for other instance"` (WARN) — lock contention above `lockMaxWait`; falling back to publish-only propagation. - `"lock expired during operations (expected behavior)"` (DEBUG) — expected when operations take near `lockTtl`. 
- `"published counter value to remote"` (DEBUG): successful remote publish.
- `"pubsub reconnected successfully"` (INFO): Redis pubsub recovered.
- `"counter value increased (remote)"` (TRACE): the value advanced because of a remote pubsub event.
- `"small rollback ignored (remote)"` (TRACE): the rollback fell within the threshold and was discarded.
- `"large rollback applied (remote)"` (TRACE): a real reorg or provider reset was accepted.

### Source code entry points

- [`data/shared_state_registry.go:L38-L80`](https://github.com/erpc/erpc/blob/main/data/shared_state_registry.go#L38-L80): `NewSharedStateRegistry` (registry construction, `clusterKey` prefix, `sync.Map` of live counters)
- [`data/shared_state_variable.go:L364-L707`](https://github.com/erpc/erpc/blob/main/data/shared_state_variable.go#L364-L707): `TryUpdate`, `TryUpdateIfStale`, `scheduleBackgroundPushCurrent`, lock ordering, rollback handling, timestamp CAS helpers
- [`data/shared_state_registry.go:L102-L222`](https://github.com/erpc/erpc/blob/main/data/shared_state_registry.go#L102-L222): `GetCounterInt64` (idempotent counter creation, bootstrap tasks, `WatchCounterInt64` goroutine)
- [`data/redis_pubsub_manager.go:L155-L460`](https://github.com/erpc/erpc/blob/main/data/redis_pubsub_manager.go#L155-L460): `RedisPubSubManager` (single `PSubscribe("counter:*")`, reconnect backoff, periodic poll fallback, copy-on-write subscriber slice)
- [`erpc/networks.go:L55-L96`](https://github.com/erpc/erpc/blob/main/erpc/networks.go#L55-L96): served-tip partition key derivation and the 16-partition cap
- [`erpc/erpc.go:L49-L66`](https://github.com/erpc/erpc/blob/main/erpc/erpc.go#L49-L66): in-memory fallback synthesis when `sharedState == nil`
- [`common/defaults.go:L789-L828`](https://github.com/erpc/erpc/blob/main/common/defaults.go#L789-L828): `SharedStateConfig.SetDefaults` (all default values, including the connector `id` override)
- [`data/shared_state_sources.go`](https://github.com/erpc/erpc/blob/main/data/shared_state_sources.go): named constants for the `source` field in logs and traces (`UpdateSourceRemoteSync`, `UpdateSourceTryUpdate`, etc.)
- [`data/shared_state_variable_deadlock_test.go:L95-L212`](https://github.com/erpc/erpc/blob/main/data/shared_state_variable_deadlock_test.go#L95-L212): lock-ordering correctness tests

### Related pages

- [Connectors](/config/database.llms.txt): the backing store options (Redis, PostgreSQL, DynamoDB, memory) that power shared state.
- [Tag-aware routing](/config/projects/selection-policies.llms.txt): uses the `servedLatestBlock`/`servedFinalizedBlock` partition counters that shared state keeps monotonic across pods.
- [EVM state poller](/reference/evm.llms.txt): the consumer that calls `TryUpdateIfStale` per upstream and registers the `OnValue`/`OnLargeRollback` callbacks.
- [Survive provider outages](/use-cases/survive-provider-outages.llms.txt): routing quality under failover depends on consistent block-height awareness across pods.
- [Deployment](/deployment.llms.txt): horizontal scaling guidance, including when shared state becomes necessary.

---

## Navigation (machine-readable surface)

- Up: [All pages index](https://docs.erpc.cloud/llms.txt)
- Root index of every page: [llms.txt](https://docs.erpc.cloud/llms.txt) · everything in one file: [llms-full.txt](https://docs.erpc.cloud/llms-full.txt)

### Sibling pages

- [Storage drivers](https://docs.erpc.cloud/config/database/drivers.llms.txt): Five interchangeable cache back-ends (memory, Redis, PostgreSQL, DynamoDB, and a read-only gRPC BDS connector), all behind one uniform interface, with optional per-operation failsafe policies that keep transient storage hiccups invisible to your upstreams.
- [Cache policies](https://docs.erpc.cloud/config/database/evm-json-rpc-cache.llms.txt): Stop paying for the same upstream call twice. eRPC caches every EVM JSON-RPC response by finality bucket, fans out reads in parallel, and rejects stale tip-of-chain data before it ever reaches your users.
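Gotcha 7 (double-checked staleness in `TryUpdateIfStale`) can be illustrated with a minimal TypeScript sketch. This is not eRPC's Go implementation. `Mutex`, `StaleCheckedCounter`, `maxStalenessMs`, and `refresh` are hypothetical stand-ins for `updateMu`, the shared counter, the staleness window, and the upstream poll. Rollback handling is omitted.

```typescript
// Minimal promise-based mutex: each lock() waits until the previous holder releases.
class Mutex {
  private tail: Promise<void> = Promise.resolve();

  async lock(): Promise<() => void> {
    let release: () => void = () => {};
    const held = new Promise<void>((resolve) => {
      release = () => resolve();
    });
    const previous = this.tail;
    this.tail = previous.then(() => held);
    await previous;
    return release;
  }
}

// Hypothetical stand-in for a shared counter (not eRPC's CounterInt64).
class StaleCheckedCounter {
  value = 0;
  refreshCalls = 0;
  private updatedAt = 0; // ms timestamp of the last successful refresh
  private readonly updateMu = new Mutex();
  private readonly maxStalenessMs: number;
  private readonly refresh: () => Promise<number>;

  constructor(maxStalenessMs: number, refresh: () => Promise<number>) {
    this.maxStalenessMs = maxStalenessMs;
    this.refresh = refresh;
  }

  isStale(): boolean {
    return Date.now() - this.updatedAt > this.maxStalenessMs;
  }

  async tryUpdateIfStale(): Promise<number> {
    // First check: a fresh value skips the mutex entirely.
    if (!this.isStale()) return this.value;
    const release = await this.updateMu.lock();
    try {
      // Second check: whoever held the mutex before us may already have refreshed.
      if (!this.isStale()) return this.value;
      this.refreshCalls++;
      this.value = await this.refresh(); // rollback handling omitted in this sketch
      this.updatedAt = Date.now();
      return this.value;
    } finally {
      release();
    }
  }
}
```

If ten callers hit a stale counter at once, all ten pass the first check and queue on the mutex. Only the first one refreshes. The other nine see a fresh value on the second check and return it, so the burst costs one upstream call instead of ten.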
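Gotcha 11 (`clusterKey` propagation order) can be shown as a config object shaped like the `erpc.ts` quick-taste example when both keys are set. The key values and Redis URI are placeholders. `ConfigSketch` and `effectiveSharedStateClusterKey` are hypothetical helpers that only illustrate the precedence; eRPC resolves it in Go in `common/defaults.go`. Treating an empty string as unset is an assumption of this sketch.

```typescript
// Hypothetical types mirroring the relevant slice of the erpc.ts config.
interface ConfigSketch {
  clusterKey?: string; // top-level key, inherited when the shared-state key is unset
  database: {
    sharedState: {
      clusterKey?: string; // explicit shared-state key, takes precedence
      connector: { driver: "redis"; redis: { uri: string } };
    };
  };
}

const config: ConfigSketch = {
  clusterKey: "prod",
  database: {
    sharedState: {
      clusterKey: "prod-main",
      connector: { driver: "redis", redis: { uri: "redis://redis.internal:6379" } },
    },
  },
};

// Illustrates the precedence rule only; not eRPC's API.
function effectiveSharedStateClusterKey(cfg: ConfigSketch): string | undefined {
  const explicit = cfg.database.sharedState.clusterKey;
  // Assumption: an empty explicit key counts as unset.
  return explicit !== undefined && explicit !== "" ? explicit : cfg.clusterKey;
}
```

With both keys set, shared-state counters are prefixed with `prod-main`. Removing `database.sharedState.clusterKey` makes them inherit `prod`.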
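Gotcha 12 (the 16-partition served-tip cap) can be sketched in-process as follows. The sketch assumes each admitted tag-group selector owns a monotonic counter fed with the minimum block across its upstream subset. `MonotonicCounter`, `ServedTipPartitions`, and `servedTip` are hypothetical names. The real partition-key derivation lives in `erpc/networks.go`, and its counters are shared across pods, which this sketch does not model.

```typescript
const MAX_PARTITIONS_PER_NETWORK = 16; // cap stated in gotcha 12

// Hypothetical in-process stand-in for a shared monotonic counter.
class MonotonicCounter {
  private current = 0;

  advance(candidate: number): number {
    if (candidate > this.current) this.current = candidate; // never moves backwards
    return this.current;
  }
}

class ServedTipPartitions {
  private readonly partitions = new Map<string, MonotonicCounter>();

  // Known selectors, or new ones admitted below the cap, get a monotonic counter.
  // Once the cap is reached, unseen selectors get the stateless subset minimum and nothing is stored.
  servedTip(selectorKey: string, upstreamHeights: number[]): number {
    const subsetMin = Math.min(...upstreamHeights);
    let counter = this.partitions.get(selectorKey);
    if (!counter) {
      if (this.partitions.size >= MAX_PARTITIONS_PER_NETWORK) {
        return subsetMin; // stateless fallback: may move backwards between calls
      }
      counter = new MonotonicCounter();
      this.partitions.set(selectorKey, counter);
    }
    return counter.advance(subsetMin);
  }

  get size(): number {
    return this.partitions.size;
  }
}
```

The practical consequence is visible in the fallback branch. An admitted selector keeps reporting its highest served tip even if its subset minimum dips. A selector beyond the cap reports whatever the minimum is at that moment, so it can appear to go backwards.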