Circuit Breakers
Circuit breakers protect push mode endpoints from cascading failures. When a function’s HTTP endpoint returns repeated errors, the circuit opens to stop sending requests. After a timeout, a single probe request tests recovery. If it succeeds, the circuit closes and normal traffic resumes.
How It Works
Each push mode function gets its own circuit breaker, keyed by function ID and endpoint URL. This means two functions sharing the same endpoint URL get independent breakers.
    ┌──────────┐  5 consecutive  ┌──────────┐  60s elapsed   ┌───────────┐
    │  CLOSED  │────failures────▶│   OPEN   │───────────────▶│ HALF-OPEN │
    │ (normal) │                 │ (reject) │                │  (probe)  │
    └──────────┘                 └──────────┘                └───────────┘
          ▲                           ▲                            │
          │                           │          failure           │
          │                           └────────────────────────────┤
          │                        success                         │
          └────────────────────────────────────────────────────────┘

States:
| State | Behavior |
|---|---|
| Closed | Normal operation. Requests pass through to the endpoint. |
| Open | Failing fast. All requests are rejected without calling the endpoint. Runs are deferred for retry. |
| Half-Open | A single probe request is allowed through. If it succeeds, the circuit closes. If it fails, the circuit re-opens. |
Default thresholds:
| Setting | Default | Description |
|---|---|---|
| Failure threshold | 5 | Consecutive failures before opening the circuit |
| Success threshold | 1 | Consecutive successes in half-open before closing |
| Timeout | 60 seconds | How long the circuit stays open before probing |
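The transitions and defaults above can be sketched as a small state machine. This is an illustrative Python model, not Ironflow's implementation: the class name, method names, and the injectable clock are assumptions made for the example.

```python
import time
from typing import Callable


class CircuitBreaker:
    """Illustrative model of the closed / open / half-open cycle."""

    def __init__(
        self,
        failure_threshold: int = 5,
        success_threshold: int = 1,
        timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.failure_threshold = failure_threshold
        self.success_threshold = success_threshold
        self.timeout = timeout
        self._clock = clock
        self.state = "closed"
        self._fails = 0
        self._successes = 0
        self._opened_at = 0.0
        self._probe_in_flight = False

    def allow_request(self) -> bool:
        # An open circuit becomes half-open once the timeout has elapsed.
        if self.state == "open" and self._clock() - self._opened_at >= self.timeout:
            self.state = "half-open"
            self._successes = 0
            self._probe_in_flight = False
        if self.state == "closed":
            return True
        if self.state == "half-open" and not self._probe_in_flight:
            self._probe_in_flight = True  # only one probe at a time
            return True
        return False  # open, or half-open with a probe already running

    def record_success(self) -> None:
        if self.state == "half-open":
            self._probe_in_flight = False
            self._successes += 1
            if self._successes >= self.success_threshold:
                self.state = "closed"
                self._fails = 0
        elif self.state == "closed":
            self._fails = 0  # failures must be consecutive

    def record_failure(self) -> None:
        if self.state == "half-open":
            self._open()  # a failed probe re-opens the circuit
        elif self.state == "closed":
            self._fails += 1
            if self._fails >= self.failure_threshold:
                self._open()

    def _open(self) -> None:
        self.state = "open"
        self._opened_at = self._clock()
        self._probe_in_flight = False
```

Passing a fake clock lets you step through the 60-second timeout without waiting; in this sketch the open-to-half-open transition happens lazily, on the first `allow_request()` call after the timeout.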
State Persistence
Circuit breaker state is persisted in a NATS KV bucket (SYS_circuit_breakers) with a 24-hour TTL. This means:
- Restart survival: A restarted node inherits open circuits from before the restart instead of allowing traffic to endpoints that were failing.
- Cross-node sharing: In a multi-node cluster, when one node opens a circuit, all other nodes learn about it within seconds via NATS KV watch.
- Rolling deploy safety: State carries across rolling deploys without protection gaps.
If NATS KV is unavailable (embedded NATS not started, test environments), circuit breakers gracefully fall back to in-memory mode with the same behavior as before persistence was added.
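The fallback can be pictured as choosing a store at startup. The sketch below is hypothetical: `BreakerStore`, `InMemoryStore`, `open_store`, and the use of `ConnectionError` to signal an unreachable bucket are all assumptions for illustration, not Ironflow's actual types.

```python
from typing import Callable, Dict, Optional, Protocol


class BreakerStore(Protocol):
    """Minimal interface a breaker needs from its backing store."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, state: str) -> None: ...


class InMemoryStore:
    """Process-local store: state is lost on restart and not shared."""

    def __init__(self) -> None:
        self._states: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._states.get(key)

    def put(self, key: str, state: str) -> None:
        self._states[key] = state


def open_store(connect_kv: Callable[[], BreakerStore]) -> BreakerStore:
    """Return the persistent store, or an in-memory one if KV is unreachable."""
    try:
        return connect_kv()
    except ConnectionError:
        return InMemoryStore()


def unavailable_kv() -> BreakerStore:
    # Stand-in for a failed connection to the SYS_circuit_breakers bucket.
    raise ConnectionError("NATS KV not available")


store = open_store(unavailable_kv)
store.put("fn-payments|http://payments:3000/api/ironflow", "open")
print(type(store).__name__, store.get("fn-payments|http://payments:3000/api/ironflow"))
```

Because both stores expose the same interface, the breaker logic stays the same in either mode; only restart survival and cross-node sharing are lost in the in-memory case.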
Viewing Circuit Breaker State
Dashboard

The Functions page shows a circuit breaker state badge next to each function. Open circuits show a red badge, half-open shows yellow. Closed circuits show no badge (normal state).
From the CLI:

    # List all circuit breakers
    ironflow circuit-breaker list

    # JSON output
    ironflow circuit-breaker list --json

Example output:

    FUNCTION_ID   ENDPOINT                           STATE   FAILS  LAST_FAILURE
    fn-payments   http://payments:3000/api/ironflow  open    5      2026-04-06T12:00:00Z
    fn-orders     http://orders:4000/api/ironflow    closed  0      -

The REST API returns the same data:

    # List all breaker states
    curl http://localhost:9123/api/v1/circuit-breakers
    # Response
    [
      {
        "key": "fn-payments|http://payments:3000/api/ironflow",
        "function_id": "fn-payments",
        "endpoint": "http://payments:3000/api/ironflow",
        "state": "open",
        "consecutive_fails": 5,
        "last_failure": "2026-04-06T12:00:00Z"
      }
    ]

Prometheus Metrics

The ironflow_circuit_breaker_state gauge tracks circuit breaker state per function. The state Prometheus label is one of closed, open, or half-open; the gauge value also encodes the state numerically (0=closed, 1=open, 2=half-open), so either form works for filtering.
    # Find all open circuits
    ironflow_circuit_breaker_state{state="open"} == 1

The NATSPublishCircuitOpen alert ships with the Helm chart (deploy/helm/ironflow/templates/ironflow-alerts.yaml) and fires after 2 minutes of continuous open state. Bare-binary and docker-compose deployments must configure this alert themselves.
Resetting a Circuit Breaker
If you’ve fixed the downstream issue and don’t want to wait for the 60-second timeout, you can manually reset a circuit breaker:
    # Reset by endpoint URL
    ironflow circuit-breaker reset http://payments:3000/api/ironflow

    # Reset by function ID
    ironflow circuit-breaker reset fn-payments

The argument is treated as an endpoint URL if it contains ://; otherwise it is treated as a function ID.
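The REST reset endpoint addresses a breaker by the base64url encoding of its composite key, function_id|endpoint_url, with the trailing padding removed. A minimal Python sketch of building and decoding such a key follows; the helper names `breaker_key` and `decode_breaker_key` are hypothetical and exist only for this example.

```python
import base64


def breaker_key(function_id: str, endpoint_url: str) -> str:
    """Encode function_id|endpoint_url as unpadded base64url."""
    composite = f"{function_id}|{endpoint_url}"
    encoded = base64.urlsafe_b64encode(composite.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=")  # drop the '=' padding


def decode_breaker_key(key: str) -> str:
    """Reverse breaker_key, restoring the padding before decoding."""
    padded = key + "=" * (-len(key) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


key = breaker_key("fn-payments", "http://payments:3000/api/ironflow")
print(key)
print(decode_breaker_key(key))
```

Decoding a key this way is a quick check that the function ID and endpoint URL match an entry in the breaker list exactly, including the URL scheme.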
To reset over the REST API:

    # The key is the base64url-encoded composite key (function_id|endpoint_url).
    # tr -d '=\n' strips the padding and any line wrapping that base64 adds to long values.
    KEY=$(printf '%s' "fn-payments|http://payments:3000/api/ironflow" | base64 | tr '+/' '-_' | tr -d '=\n')
    # Endpoints with shell-special chars: keep the value quoted as shown above.
    curl -X POST "http://localhost:9123/api/v1/circuit-breakers/$KEY/reset"

How Circuit Breakers Interact with Other Features
Retry scheduling: When a circuit is open, the scheduler defers retries by 60 seconds instead of attempting them immediately. This prevents wasting retry attempts against a known-failing endpoint.
Cron triggers: Cron-triggered runs are skipped entirely when the circuit for their function’s endpoint is open. The cron scheduler logs a debug message and moves on to the next scheduled time.
Multi-node clusters: Circuit breaker state is shared across all nodes via NATS KV. If Node A discovers an endpoint is failing, Node B learns about it within seconds and stops sending traffic too. In a cluster of N nodes, this reduces the total number of failed requests from about N × failure_threshold (each node tripping its own breaker) to about failure_threshold.
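The three interactions above can be summarized in a few lines of Python. This is a simplified sketch under stated assumptions: the `Decision` type, the `decide` and `worst_case_failures` functions, and the trigger names are hypothetical, and half-open probe gating is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

OPEN_DEFER_SECONDS = 60  # deferral applied while the circuit is open


@dataclass(frozen=True)
class Decision:
    action: str  # "dispatch", "defer", or "skip"
    delay_seconds: Optional[int] = None


def decide(circuit_state: str, trigger: str) -> Decision:
    """What happens to a run, given the breaker state and how it was triggered."""
    if circuit_state == "closed":
        return Decision("dispatch")
    if circuit_state == "open":
        if trigger == "cron":
            return Decision("skip")  # wait for the next scheduled time
        return Decision("defer", OPEN_DEFER_SECONDS)
    raise ValueError(f"state not modelled in this sketch: {circuit_state}")


def worst_case_failures(nodes: int, failure_threshold: int, shared: bool) -> int:
    """Approximate failed requests before every node stops calling the endpoint."""
    return failure_threshold if shared else nodes * failure_threshold


print(decide("open", "retry"))
print(decide("open", "cron"))
print(worst_case_failures(nodes=3, failure_threshold=5, shared=False))  # 15
print(worst_case_failures(nodes=3, failure_threshold=5, shared=True))   # 5
```

The shared-state figure is a lower bound: while an open state is still propagating, other nodes may send a few more requests.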