# Cordoning

> Source: https://docs.erpc.cloud/operation/cordoning
> Pull any upstream out of routing instantly with one admin call — no metric window to wait for, no config redeploy required.
> Format: machine-readable markdown export of the docs page above.
> All collapsible AI sections are inlined and fully expanded.

When a vendor starts acting up — latency spikes, wrong responses, a quota incident — you
want it gone from routing *right now*, not after a 15-minute error-rate window closes.
One admin RPC call cordons the upstream instantly. Another call brings it back just as
fast. No config change, no redeploy, no restart.

## Agent reference

Copy one of these prompts into your AI agent session (Claude Code, Cursor, …) — each one
points the agent at this page's machine-readable reference so it can do the work correctly:

**Prompt Example #1: pull a degraded upstream out of rotation right now**

```text
One of my eRPC upstreams is having a vendor incident and I need to remove it from routing
immediately without redeploying. Show me the admin API calls to cordon it, verify the
change, and restore it when the incident is resolved. My admin endpoint is configured in
my eRPC config. Read the full reference first:
https://docs.erpc.cloud/operation/cordoning.llms.txt
```

**Prompt Example #2: cordon a single broken RPC method**

```text
A specific upstream in my eRPC setup is timing out only on eth_getLogs but is fine for
everything else. Show me how to cordon just that method without pulling the whole upstream
out of rotation, and explain why erpc_listCordoned won't show it. Work with my existing eRPC config.
Reference: https://docs.erpc.cloud/operation/cordoning.llms.txt
```

**Prompt Example #3: alert and dashboard on cordon state**

```text
Set up Prometheus alerts and dashboard panels so I can see which eRPC upstreams are
currently cordoned, how long they've been cordoned, and how many times they've been
cordoned in the past 24 hours. Explain the erpc_upstream_cordoned gauge vs the event
counter. Reference: https://docs.erpc.cloud/operation/cordoning.llms.txt
```

---

### Cordoning — full agent reference

### How it works

Cordon state lives on the health tracker as an `atomic.Bool` plus a reason string, keyed
to an `(upstream, method, finality=All)` cell where `method` defaults to `"*"` for
whole-upstream cordons. It is not a rolling-window metric: `Rotate()` (the window-tick
function) never touches cordoned cells, and the idle-sweep goroutine explicitly skips them
so the flag cannot silently vanish after 30 idle minutes.

The state feeds the selection policy through the `cordonedReason` field exposed on each
upstream's JS metrics object. The built-in default policy calls `.removeCordoned()` as its
first chain step, so a cordon takes effect within one eval interval (default 15 s) after
the admin RPC returns. A custom `evalFunc` must call `.removeCordoned()` explicitly or
the cordon has no routing effect.

`IsCordoned(up, method)` checks the `(up, "*", All)` wildcard scope first, then the
specific `(up, method, All)` scope — a wildcard cordon shadows any method-scoped check.
Uncordoning only flips the cell addressed by the call; it never automatically lifts a
wildcard cordon when a method-scoped cordon is removed.
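The wildcard-first lookup can be pictured with a small Python model. This is only a sketch of the semantics described above, not eRPC's Go tracker: the `CordonCells` class and its method names are hypothetical, and the `finality=All` dimension is left out for brevity.

```python
from dataclasses import dataclass, field

WILDCARD = "*"


@dataclass
class CordonCells:
    """Toy model of cordon cells keyed by (upstream, method). Hypothetical, not eRPC code."""

    cells: dict = field(default_factory=dict)  # (upstream, method) -> reason

    def cordon(self, upstream, method=WILDCARD, reason="admin: manual cordon"):
        # A missing method means a whole-upstream ("*") cordon.
        self.cells[(upstream, method)] = reason

    def uncordon(self, upstream, method=WILDCARD):
        # Only the addressed cell is cleared; other scopes are left untouched.
        self.cells.pop((upstream, method), None)

    def is_cordoned(self, upstream, method):
        # The wildcard scope is checked first, then the method-specific scope.
        if (upstream, WILDCARD) in self.cells:
            return True
        return (upstream, method) in self.cells
```

In this model, cordoning `"eth_getLogs"` and `"*"` on the same upstream and then uncordoning only `"eth_getLogs"` still leaves `is_cordoned(upstream, "eth_getLogs")` true, because the wildcard cell is consulted first.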

Uncordoning records the duration (`erpc_upstream_cordon_duration_seconds`) and emits the
`action="uncordon"` event counter. Repeated cordon calls on an already-cordoned upstream
overwrite the reason but do not reset the start timestamp, so duration accounting is
continuous even if the reason is updated mid-incident.
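The duration accounting above can be sketched as a tiny state machine. This is an illustrative Python model under stated assumptions, not the tracker's implementation: the `CordonCell` class is hypothetical, the `events` and `durations` lists stand in for the event counter and the duration histogram, and an uncordon on a cell that is not cordoned is assumed to record nothing.

```python
class CordonCell:
    """Toy model of one cordon cell. Hypothetical names, for illustration only."""

    def __init__(self):
        self.cordoned = False
        self.reason = ""
        self.cordoned_at = None  # start timestamp in seconds
        self.events = []         # stand-in for the cordon event counter
        self.durations = []      # stand-in for the duration histogram

    def cordon(self, reason, now):
        if not self.cordoned:
            # OFF -> ON edge: record the start time and one "cordon" event.
            self.cordoned = True
            self.cordoned_at = now
            self.events.append("cordon")
        # The reason is always overwritten, even when already cordoned.
        self.reason = reason

    def uncordon(self, now):
        if self.cordoned:
            # ON -> OFF edge: observe the full duration since the first cordon.
            self.cordoned = False
            self.durations.append(now - self.cordoned_at)
            self.events.append("uncordon")
            self.cordoned_at = None
```

Updating the reason mid-incident therefore changes what operators see, but the observed duration still spans the whole period from the first cordon to the uncordon.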

Cordon state is in-process only — it survives metric window rotations but is lost on
process restart. Use it for outages measured in minutes to hours. For permanent exclusion,
remove the upstream from config or set `ignoreMethods: ["*"]` and redeploy.

### Config schema

Cordoning has no config-file fields. It is controlled entirely at runtime via admin RPC.
The selection policy's `evalFunc` implicitly controls whether cordon state is respected —
it requires `.removeCordoned()` in the chain. For `evalFunc` and `evalInterval` config
fields, see [Selection & scoring](/config/projects/selection-policies.llms.txt).

### Worked examples

**1. Incident response: cordon a vendor immediately.**
You notice `erpc_upstream_error_rate` spiking on `alchemy-eth-1` before the error window
is wide enough to trigger automatic exclusion. Cordon it now:

```bash
curl -X POST http://localhost:4000/admin \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0", "id": 1,
    "method": "erpc_cordonUpstream",
    "params": [{"projectId": "main", "upstream": "alchemy-eth-1",
                "reason": "vendor incident #12345"}]
  }'
```

```json
{"jsonrpc":"2.0","id":1,"result":{
  "projectId": "main",
  "upstream": "alchemy-eth-1",
  "method": "*",
  "cordoned": true,
  "reason": "vendor incident #12345"
}}
```

Omitting `method` cordons the upstream for all methods (`"*"`).

**2. Method-scoped cordon: isolate a broken RPC method.**
A vendor is fine for most calls but `eth_getLogs` is timing out past 30 s. Cordon just
that method while the rest of the vendor's capacity keeps working:

```bash
curl -X POST http://localhost:4000/admin \
  -H 'Content-Type: application/json' \
  -d '{
    "jsonrpc": "2.0", "id": 2,
    "method": "erpc_cordonUpstream",
    "params": [{"projectId": "main", "upstream": "drpc-eth-1",
                "method": "eth_getLogs", "reason": "p95 > 30 s"}]
  }'
```

A wildcard cordon overrides method-scoped cordons: an upstream with both `"*"` and
`"eth_getLogs"` cordons is excluded for all methods. Uncordoning a specific method does
not lift a wildcard cordon.

**3. List and restore.**
During an incident, list all whole-upstream cordons, then restore when the vendor
confirms recovery:

```bash
# list all wildcard-scoped cordons in a project
curl -X POST http://localhost:4000/admin \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":3,"method":"erpc_listCordoned",
       "params":[{"projectId":"main"}]}'
```

```json
{
  "projectId": "main",
  "cordoned": [
    {"upstream": "alchemy-eth-1", "reason": "vendor incident #12345"}
  ]
}
```

`erpc_listCordoned` only returns upstreams with a `"*"`-scope cordon. Method-scoped
cordons are invisible to it; read the `erpc_upstream_cordoned` gauge labels to enumerate
them.

```bash
# restore the upstream
curl -X POST http://localhost:4000/admin \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":4,"method":"erpc_uncordonUpstream",
       "params":[{"projectId":"main","upstream":"alchemy-eth-1",
                  "reason":"vendor confirmed resolved"}]}'
```
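For scripted runbooks, the same three calls can be issued from Python. The sketch below is a minimal helper, assuming the request shapes shown in the curl examples above; the function names are hypothetical, and the `headers` argument is a placeholder for whatever credentials your `admin.auth` strategy expects (this page does not specify them).

```python
import json
import urllib.request


def build_admin_call(rpc_method, project_id, upstream=None, method=None,
                     reason=None, request_id=1):
    """Build a JSON-RPC 2.0 payload shaped like the curl examples on this page."""
    params = {"projectId": project_id}
    # Optional fields are omitted when unset so server-side defaults apply.
    for key, value in (("upstream", upstream), ("method", method), ("reason", reason)):
        if value is not None:
            params[key] = value
    return {"jsonrpc": "2.0", "id": request_id, "method": rpc_method, "params": [params]}


def send_admin_call(url, payload, headers=None, timeout=10.0):
    """POST the payload to the admin endpoint and return the decoded JSON body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **(headers or {})},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A restore would then look like `send_admin_call("http://localhost:4000/admin", build_admin_call("erpc_uncordonUpstream", "main", upstream="alchemy-eth-1", reason="vendor confirmed resolved"))`.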

**4. Custom evalFunc — explicit `.removeCordoned()` required.**
If you have replaced `selectionPolicy.evalFunc` with a custom function, you must call
`.removeCordoned()` explicitly near the start of the chain. Without it the
`cordonedReason` field on the upstream's metrics object is populated, but no step actually
drops the upstream from the ordered list — the cordon has no routing effect.

Source: [`internal/policy/default_policy.js:L1`](https://github.com/erpc/erpc/blob/main/internal/policy/default_policy.js#L1) — `.removeCordoned()` is the first call.
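To see why the missing step matters, here is a toy Python model of a policy chain. It is not eRPC's JavaScript policy API: `remove_cordoned` and `evaluate` are hypothetical stand-ins, upstreams are plain dicts carrying a `cordonedReason` field, and positions are 0-based purely for illustration.

```python
def remove_cordoned(upstreams):
    """Drop every upstream whose cordonedReason is non-empty."""
    return [u for u in upstreams if not u.get("cordonedReason")]


def evaluate(upstreams, steps):
    """Run the chain and return a position per upstream, -1 meaning excluded."""
    ordered = list(upstreams)
    for step in steps:
        ordered = step(ordered)
    kept = {u["id"]: index for index, u in enumerate(ordered)}
    return {u["id"]: kept.get(u["id"], -1) for u in upstreams}
```

With `steps=[remove_cordoned]` a cordoned upstream ends up at `-1`; with an empty chain it keeps its position even though `cordonedReason` is populated, which is the silent failure mode described above.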

### Request/response behavior

- `erpc_cordonUpstream` `method` param defaults to `"*"` when absent; `reason` defaults
  to `"admin: manual cordon"`. [[`erpc/admin.go:L630-671`](https://github.com/erpc/erpc/blob/main/erpc/admin.go#L630-L671)]
- `erpc_uncordonUpstream` `reason` defaults to `"admin: manual uncordon"` — a distinct
  string from the cordon default. [[`erpc/admin.go:L668-673`](https://github.com/erpc/erpc/blob/main/erpc/admin.go#L668-L673)]
- `erpc_listCordoned` only returns upstreams where `CordonedReason("*")` returns
  `cordoned=true` — method-scoped cordons are not listed.
  [[`erpc/admin.go:L721-722`](https://github.com/erpc/erpc/blob/main/erpc/admin.go#L721-L722)]
- Cordon state takes effect on the next policy eval tick (default 15 s after the RPC
  returns). Requests in-flight at the moment of cordon can still reach the upstream for up
  to one eval interval.
- A cordoned upstream is placed at position `-1` in `erpc_selection_position` once
  `.removeCordoned()` drops it on the next eval tick.
  [[`internal/policy/slot.go:L477`](https://github.com/erpc/erpc/blob/main/internal/policy/slot.go#L477)]
- All admin RPCs require `admin.auth` configured; missing `admin:` block returns
  `"admin is not enabled for this project"` (401); present `admin:` but absent
  `admin.auth:` returns `"admin auth not configured"` (401).
  [[`erpc/admin.go:L26-30`](https://github.com/erpc/erpc/blob/main/erpc/admin.go#L26-L30)]
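The parameter defaults listed above can be summarised in a short Python sketch. It only mirrors the documented defaults and is not the handler's code: `resolve_params` is a hypothetical name, and applying the `"*"` method default to `erpc_uncordonUpstream` as well is an assumption based on the restore example, which omits `method`.

```python
CORDON_DEFAULT_REASON = "admin: manual cordon"
UNCORDON_DEFAULT_REASON = "admin: manual uncordon"


def resolve_params(rpc_method, params):
    """Fill in the documented defaults for a cordon or uncordon request."""
    if rpc_method == "erpc_cordonUpstream":
        default_reason = CORDON_DEFAULT_REASON
    elif rpc_method == "erpc_uncordonUpstream":
        default_reason = UNCORDON_DEFAULT_REASON
    else:
        raise ValueError(f"unsupported admin method: {rpc_method}")
    return {
        "projectId": params["projectId"],
        "upstream": params["upstream"],
        # An absent method means the whole-upstream scope.
        "method": params.get("method", "*"),
        "reason": params.get("reason", default_reason),
    }
```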

### Best practices

- **Cordon early, not late.** The selection policy's metric-driven exclusion needs sample
  accumulation; a cordon needs no samples and takes effect on the next eval tick. When you
  see a vendor degrading in dashboards, cordon it immediately rather than waiting for
  `errorRateAbove(0.7)` to trigger.
- **Always supply a reason string.** The reason appears in `erpc_upstream_cordoned` gauge
  labels and in `erpc_listCordoned` output. A reason like `"incident-#12345 alchemy"` makes
  incident timelines clear and helps correlate with duration histograms post-incident.
- **Prefer method-scoped cordons when feasible.** If a vendor is broken only for
  `eth_getLogs` but healthy for `eth_call`, a method-scoped cordon keeps the vendor in
  rotation for the methods it can serve — reducing pressure on remaining upstreams.
- **Remember the 15 s propagation delay.** The tracker is updated immediately, but the
  ordered-list cache is rebuilt on the next eval tick. Build runbooks around this delay
  (e.g. wait 20 s before verifying traffic shifted).
- **Do not use cordon for permanent exclusions.** Cordon state is in-process only; it is
  lost on restart. For a vendor you want permanently removed, update the config.
- **Watch for stale gauge series after reason updates.** `erpc_upstream_cordoned` is
  labeled by `reason`; calling `erpc_cordonUpstream` with a new reason on an already-cordoned
  upstream leaves the old gauge series at `1`. Alert on label value, not just the metric
  name, or always uncordon before re-cordoning with a new reason.
- **Custom evalFunc operators: `.removeCordoned()` must be explicit.** The default policy
  includes it; a custom `evalFunc` that omits it will silently ignore all admin cordons.
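A runbook step built around the propagation delay might look like the following Python sketch. It assumes a caller-supplied `call(rpc_method, params)` function that performs the admin request and returns the JSON-RPC `result` object; that function, the `cordon_and_verify` name, and the 20-second default are illustrative choices, not part of eRPC.

```python
import time


def cordon_and_verify(call, project_id, upstream, reason,
                      settle_seconds=20.0, sleep=time.sleep):
    """Cordon an upstream, wait out one eval interval, then confirm it is listed."""
    call("erpc_cordonUpstream",
         {"projectId": project_id, "upstream": upstream, "reason": reason})
    # Give the next eval tick (default 15 s) time to rebuild the ordered list.
    sleep(settle_seconds)
    listed = call("erpc_listCordoned", {"projectId": project_id})
    # Only whole-upstream ("*") cordons appear in this listing.
    return any(entry.get("upstream") == upstream
               for entry in listed.get("cordoned", []))
```

This confirms that the cordon is registered; checking that traffic actually shifted still means looking at metrics such as `erpc_selection_position` for the upstream.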

### Edge cases & gotchas

1. **`erpc_listCordoned` hides method-scoped cordons.** Only `"*"`-scope cordons appear.
   To enumerate method-scoped cordons, query the `erpc_upstream_cordoned` Prometheus gauge
   and filter by `category != "*"`. Source: [`erpc/admin.go:L721-724`](https://github.com/erpc/erpc/blob/main/erpc/admin.go#L721-L724).
2. **Uncordoning a method does not lift a wildcard cordon.** `Uncordon("eth_getLogs")`
   flips only that cell. `IsCordoned` checks `"*"` first, so the upstream stays out of
   rotation for all methods as long as the wildcard cordon is active.
   Source: [`health/tracker.go:L853-863`](https://github.com/erpc/erpc/blob/main/health/tracker.go#L853-L863).
3. **Reason change creates a new gauge series.** `erpc_upstream_cordoned` is labeled by
   `reason`; calling `erpc_cordonUpstream` with a different reason on an already-cordoned
   upstream leaves the old gauge series at `1`. The old series is not cleared until an
   `erpc_uncordonUpstream` call with the exact matching reason label or a process restart.
   Source: [`health/tracker.go:L805-817`](https://github.com/erpc/erpc/blob/main/health/tracker.go#L805-L817).
4. **Cordon survives `Rotate()` but not restart.** In-process only; no persistence to any
   storage backend. For permanent exclusion use config changes.
5. **Custom evalFunc without `.removeCordoned()` ignores all admin cordons.** The cordon
   flag is set on the tracker and `cordonedReason` is populated on the JS upstream object,
   but no routing effect occurs unless `.removeCordoned()` (or equivalent logic) appears
   in the chain.
6. **Routing effect is delayed by up to `evalInterval`.** Cordon sets the tracker state
   immediately, but the ordered-list cache used by the request path is updated only on the
   next eval tick (default every 15 s). Requests in-flight at the moment of cordon can
   still reach the upstream for up to one eval interval.
7. **State-poller is unaffected by cordon.** The EVM state poller for a cordoned upstream
   continues running, keeping its latest/finalized block numbers fresh. When uncordoned,
   the upstream's metrics are current and re-admission scoring works immediately.
8. **Shadow mirroring is unaffected by cordon.** Shadow upstreams receive async-mirrored
   traffic regardless of cordon state; cordon only affects routing of real requests.
9. **`erpc_cordonUpstream` is idempotent on the same `(upstream, method)` pair.** Repeated
   calls update `reason` but do not reset `CordonedAtMs` or fire multiple `cordon` event
   counter increments. Source: [`health/tracker.go:L805-817`](https://github.com/erpc/erpc/blob/main/health/tracker.go#L805-L817).
10. **`erpc_listCordoned` requires `admin.auth` configured.** If the `admin:` section is
    present but `admin.auth:` is absent, the request returns `"admin auth not configured"`.
    If the `admin:` section is entirely missing, it returns `"admin is not enabled for this
    project"`. Both are 401 responses.
    Source: [`erpc/admin.go:L26-30`](https://github.com/erpc/erpc/blob/main/erpc/admin.go#L26-L30) and [`erpc/http_server.go:L597-610`](https://github.com/erpc/erpc/blob/main/erpc/http_server.go#L597-L610).
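Gotchas 1 and 3 can both be checked from gauge samples. The Python sketch below assumes you already have `erpc_upstream_cordoned` samples as `(labels, value)` pairs, where `labels` is a dict containing the `upstream`, `category`, and `reason` labels from the Observability table; the function names are hypothetical.

```python
from collections import defaultdict


def method_scoped_cordons(samples):
    """Active cordons that erpc_listCordoned does not show (category != "*")."""
    return sorted(
        (labels["upstream"], labels["category"], labels["reason"])
        for labels, value in samples
        if value == 1 and labels["category"] != "*"
    )


def stale_reason_candidates(samples):
    """Cells with more than one reason series at 1, a hint of a reason update."""
    reasons = defaultdict(set)
    for labels, value in samples:
        if value == 1:
            reasons[(labels["upstream"], labels["category"])].add(labels["reason"])
    return {cell: sorted(found) for cell, found in reasons.items() if len(found) > 1}
```

A cell reported by `stale_reason_candidates` is not necessarily wrong, but it is worth checking which reason is current before alerting on it.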

### Observability

| Metric | Type | Labels | When it fires |
|---|---|---|---|
| `erpc_upstream_cordoned` | gauge | project, vendor, network, upstream, category (=method), reason | Set to 1 on cordon, 0 on uncordon; persists until uncordon or process restart |
| `erpc_upstream_cordon_event_total` | counter | project, network, upstream, action | Edge transitions only: OFF→ON (`action="cordon"`) and ON→OFF (`action="uncordon"`); repeated cordons do not increment |
| `erpc_upstream_cordon_duration_seconds` | histogram | project, network, upstream | Observed once per uncordon; value = `now − CordonedAtMs`; buckets 1 s … 86400 s |
| `erpc_selection_position{upstream=…}` | gauge | project, network, method, upstream | Set to `-1` for every excluded (including cordoned) upstream after each eval tick |
| `erpc_selection_rejection_total{step="removeCordoned"}` | counter | project, network, method, upstream, step | Incremented once per eval tick for each upstream dropped by the `.removeCordoned()` step |

Log messages emitted on cordon state changes (DEBUG level, `health/tracker.go:L795-798` and `L822-824`):
- `"cordoning upstream to disable routing"` — emitted on every `Cordon` call (including repeated calls that only update the reason).
- `"uncordoning upstream to enable routing"` — emitted on every `Uncordon` call.
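To read the gauge programmatically, one option is the Prometheus HTTP API's instant-query endpoint (`/api/v1/query`). The Python sketch below builds the query URL and parses a vector response into `(labels, value)` pairs; the helper names and the Prometheus base URL are assumptions about your setup, and only the metric name comes from this page.

```python
import json
import urllib.parse


def instant_query_url(prometheus_base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return prometheus_base_url.rstrip("/") + "/api/v1/query?" + query


def parse_vector(response_text):
    """Turn an instant-vector response body into a list of (labels, value) pairs."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector result")
    # Each sample value is [timestamp, "<value as string>"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]
```

Fetching `instant_query_url(base, "erpc_upstream_cordoned == 1")` and passing the body to `parse_vector` yields every active cordon series, including method-scoped ones, with their `category` and `reason` labels.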

### Source code entry points

- [`health/tracker.go:L793-849`](https://github.com/erpc/erpc/blob/main/health/tracker.go#L793-L849) — `Cordon` and `Uncordon` methods: state storage, timestamp, gauge, event counter, duration histogram
- [`health/tracker.go:L853-890`](https://github.com/erpc/erpc/blob/main/health/tracker.go#L853-L890) — `IsCordoned` and `CordonedReason`: wildcard-first lookup logic
- [`erpc/admin.go:L590-729`](https://github.com/erpc/erpc/blob/main/erpc/admin.go#L590-L729) — `erpc_cordonUpstream`, `erpc_uncordonUpstream`, `erpc_listCordoned` handlers and param parsing
- [`erpc/admin.go:L641-654`](https://github.com/erpc/erpc/blob/main/erpc/admin.go#L641-L654) — upstream lookup by ID across the project's upstream registry (the step before calling `upstream.Cordon`)
- [`upstream/upstream.go:L1568-1580`](https://github.com/erpc/erpc/blob/main/upstream/upstream.go#L1568-L1580) — `Cordon`, `Uncordon`, `CordonedReason` pass-throughs from `Upstream` to the health tracker
- [`health/tracker.go:L207-224`](https://github.com/erpc/erpc/blob/main/health/tracker.go#L207-L224) — `Rotate()` skipping cordoned cells (window rotation cannot clear a cordon)
- [`health/tracker.go:L598-615`](https://github.com/erpc/erpc/blob/main/health/tracker.go#L598-L615) — idle-sweep exclusion for cordoned cells
- [`internal/policy/eval.go:L98-100`](https://github.com/erpc/erpc/blob/main/internal/policy/eval.go#L98-L100) — `cordonedReason` exposed to JS eval context
- [`internal/policy/default_policy.js:L1`](https://github.com/erpc/erpc/blob/main/internal/policy/default_policy.js#L1) — `.removeCordoned()` as first default policy step
- [`internal/policy/slot.go:L477`](https://github.com/erpc/erpc/blob/main/internal/policy/slot.go#L477) — position `-1` assigned for excluded (cordoned) upstreams after each tick

### Related pages

- [Admin API](/operation/admin.llms.txt) — authentication setup required before any cordon call will succeed.
- [Selection & scoring](/config/projects/selection-policies.llms.txt) — the `evalFunc` chain where `.removeCordoned()` must appear; also `evalInterval` that controls propagation delay.
- [Survive provider outages](/use-cases/survive-provider-outages.llms.txt) — the incident-response outcome cordoning helps achieve.
- [Upstreams](/config/projects/upstreams.llms.txt) — permanent exclusion via `ignoreMethods: ["*"]` or config removal, the alternative when cordon is not the right tool.
