Circuit Breakers

Circuit breakers protect push mode endpoints from cascading failures. When a function’s HTTP endpoint returns repeated errors, the circuit opens to stop sending requests. After a timeout, a single probe request tests recovery. If it succeeds, the circuit closes and normal traffic resumes.

Each push mode function gets its own circuit breaker, keyed by function ID and endpoint URL. This means two functions sharing the same endpoint URL get independent breakers.

┌──────────┐  5 consecutive  ┌──────────┐   60s elapsed   ┌───────────┐
│  CLOSED  │────failures────▶│   OPEN   │────────────────▶│ HALF-OPEN │
│ (normal) │                 │ (reject) │                 │  (probe)  │
└──────────┘                 └──────────┘                 └───────────┘
     ▲                            ▲                          │     │
     │                            │         failure          │     │
     │                            └──────────────────────────┘     │
     │                           success                           │
     └─────────────────────────────────────────────────────────────┘

States:

| State | Behavior |
| --- | --- |
| Closed | Normal operation. Requests pass through to the endpoint. |
| Open | Failing fast. All requests are rejected without calling the endpoint. Runs are deferred for retry. |
| Half-Open | A single probe request is allowed through. If it succeeds, the circuit closes. If it fails, the circuit re-opens. |

Default thresholds:

| Setting | Default | Description |
| --- | --- | --- |
| Failure threshold | 5 | Consecutive failures before opening the circuit |
| Success threshold | 1 | Consecutive successes in half-open before closing |
| Timeout | 60 seconds | How long the circuit stays open before probing |
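
The states and default thresholds above can be sketched as a small state machine. This is an illustrative model only, not Ironflow's implementation: the `CircuitBreaker` class, its method names, and the injectable `clock` are assumptions made for the example, and only the three threshold values come from the table.

```python
import time
from dataclasses import dataclass
from typing import Callable

CLOSED, OPEN, HALF_OPEN = "closed", "open", "half-open"


@dataclass
class CircuitBreaker:
    """Hypothetical model of the documented state machine."""
    failure_threshold: int = 5     # consecutive failures before opening
    success_threshold: int = 1     # consecutive half-open successes before closing
    timeout_seconds: float = 60.0  # how long the circuit stays open before probing
    clock: Callable[[], float] = time.monotonic  # injectable so tests can fake time
    state: str = CLOSED
    consecutive_fails: int = 0
    consecutive_successes: int = 0
    opened_at: float = 0.0
    probe_in_flight: bool = False

    def allow_request(self) -> bool:
        """Return True if a request may be sent to the endpoint right now."""
        if self.state == OPEN and self.clock() - self.opened_at >= self.timeout_seconds:
            # Timeout elapsed: move to half-open and permit a probe.
            self.state = HALF_OPEN
            self.consecutive_successes = 0
            self.probe_in_flight = False
        if self.state == HALF_OPEN:
            if self.probe_in_flight:
                return False  # only one probe at a time
            self.probe_in_flight = True
            return True
        return self.state == CLOSED

    def record_success(self) -> None:
        self.consecutive_fails = 0
        if self.state == HALF_OPEN:
            self.probe_in_flight = False
            self.consecutive_successes += 1
            if self.consecutive_successes >= self.success_threshold:
                self.state = CLOSED

    def record_failure(self) -> None:
        if self.state == OPEN:
            return  # already open; ignore late results
        self.probe_in_flight = False
        self.consecutive_fails += 1
        # A failed probe re-opens immediately; otherwise wait for the threshold.
        if self.state == HALF_OPEN or self.consecutive_fails >= self.failure_threshold:
            self.state = OPEN
            self.opened_at = self.clock()
```

A caller would check `allow_request()` before each delivery and report the outcome with `record_success()` or `record_failure()`.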

Circuit breaker state is persisted in a NATS KV bucket (SYS_circuit_breakers) with a 24-hour TTL. This means:

  • Restart survival: A restarted node inherits open circuits from before the restart instead of allowing traffic to endpoints that were failing.
  • Cross-node sharing: In a multi-node cluster, when one node opens a circuit, all other nodes learn about it within seconds via NATS KV watch.
  • Rolling deploy safety: State carries across rolling deploys without protection gaps.

If NATS KV is unavailable (embedded NATS not started, test environments), circuit breakers gracefully fall back to in-memory mode with the same behavior as before persistence was added.

The Functions page shows a circuit breaker state badge next to each function: red for open, yellow for half-open. Closed circuits, the normal state, show no badge.

# List all circuit breakers
ironflow circuit-breaker list
# JSON output
ironflow circuit-breaker list --json

Example output:

FUNCTION_ID   ENDPOINT                            STATE    FAILS   LAST_FAILURE
fn-payments   http://payments:3000/api/ironflow   open     5       2026-04-06T12:00:00Z
fn-orders     http://orders:4000/api/ironflow     closed   0       -

The same data is available over the HTTP API:

# List all breaker states
curl http://localhost:9123/api/v1/circuit-breakers
# Response
[
  {
    "key": "fn-payments|http://payments:3000/api/ironflow",
    "function_id": "fn-payments",
    "endpoint": "http://payments:3000/api/ironflow",
    "state": "open",
    "consecutive_fails": 5,
    "last_failure": "2026-04-06T12:00:00Z"
  }
]
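
As a rough illustration of consuming this response, the sketch below filters the list for open circuits. It parses the sample body shown above rather than calling a live server; the `open_circuits` helper is a hypothetical name, and the field names are taken from the sample response only.

```python
import json

# Sample body copied from the response shown above.
SAMPLE_RESPONSE = """
[
  {
    "key": "fn-payments|http://payments:3000/api/ironflow",
    "function_id": "fn-payments",
    "endpoint": "http://payments:3000/api/ironflow",
    "state": "open",
    "consecutive_fails": 5,
    "last_failure": "2026-04-06T12:00:00Z"
  }
]
"""


def open_circuits(body: str) -> list[dict]:
    """Return the breaker entries whose state is "open"."""
    return [b for b in json.loads(body) if b.get("state") == "open"]


for breaker in open_circuits(SAMPLE_RESPONSE):
    print(f'{breaker["function_id"]} -> {breaker["endpoint"]} '
          f'({breaker["consecutive_fails"]} consecutive failures)')
```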

The ironflow_circuit_breaker_state gauge tracks circuit breaker state per function. The state Prometheus label is one of closed, open, or half-open; the gauge value also encodes the state numerically (0=closed, 1=open, 2=half-open), so either form works for filtering.

# Find all open circuits
ironflow_circuit_breaker_state{state="open"} == 1

The NATSPublishCircuitOpen alert ships with the Helm chart (deploy/helm/ironflow/templates/ironflow-alerts.yaml) and fires after 2 minutes of continuous open state. Bare-binary and docker-compose deploys must wire this alert themselves.

If you’ve fixed the downstream issue and don’t want to wait for the 60-second timeout, you can manually reset a circuit breaker:

# Reset by endpoint URL
ironflow circuit-breaker reset http://payments:3000/api/ironflow
# Reset by function ID
ironflow circuit-breaker reset fn-payments

The argument is treated as an endpoint URL if it contains ://; otherwise it is treated as a function ID.

# The key is the base64url-encoded composite key (function_id|endpoint_url), without padding.
# printf avoids echo portability issues; stripping \n guards against base64 line wrapping on long keys.
KEY=$(printf '%s' "fn-payments|http://payments:3000/api/ironflow" | base64 | tr '+/' '-_' | tr -d '=\n')
# Endpoints with shell-special chars: keep the value quoted as shown above.
curl -X POST "http://localhost:9123/api/v1/circuit-breakers/$KEY/reset"
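
For clients that build the key in code rather than in the shell, the same encoding can be sketched in Python. The helper names `breaker_key` and `decode_breaker_key` are hypothetical, and UTF-8 encoding of the composite string is an assumption; the unpadded base64url form matches the shell pipeline above.

```python
import base64


def breaker_key(function_id: str, endpoint_url: str) -> str:
    """Encode "function_id|endpoint_url" as unpadded base64url."""
    composite = f"{function_id}|{endpoint_url}".encode("utf-8")
    return base64.urlsafe_b64encode(composite).rstrip(b"=").decode("ascii")


def decode_breaker_key(key: str) -> tuple[str, str]:
    """Reverse breaker_key: restore the padding, decode, split on the first "|"."""
    padded = key + "=" * (-len(key) % 4)
    composite = base64.urlsafe_b64decode(padded).decode("utf-8")
    function_id, _, endpoint_url = composite.partition("|")
    return function_id, endpoint_url


key = breaker_key("fn-payments", "http://payments:3000/api/ironflow")
print(f"/api/v1/circuit-breakers/{key}/reset")
```

Splitting on the first `|` assumes function IDs never contain that character, which the source does not state.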

How Circuit Breakers Interact with Other Features

Retry scheduling: When a circuit is open, the scheduler defers retries by 60 seconds instead of attempting them immediately. This prevents wasting retry attempts against a known-failing endpoint.

Cron triggers: Cron-triggered runs are skipped entirely when the circuit for their function’s endpoint is open. The cron scheduler logs a debug message and moves on to the next scheduled time.

Multi-node clusters: Circuit breaker state is shared across all nodes via NATS KV. If Node A discovers an endpoint is failing, Node B learns about it within seconds and stops sending traffic too. This cuts the failed requests needed to trip the breaker cluster-wide from roughly N × failure_threshold (each of N nodes tripping independently) to about failure_threshold. With 3 nodes and the default threshold of 5, that is up to 15 failed requests reduced to about 5.
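
The retry and cron rules in this section can be sketched as two small decision functions. This is a simplified illustration, not the scheduler's actual code: the function names are hypothetical, and because the source does not say how a half-open circuit is treated, the sketch only special-cases the open state.

```python
from datetime import datetime, timedelta, timezone

# Deferral applied to retries while the circuit is open (60 seconds per the text above).
OPEN_CIRCUIT_DEFER = timedelta(seconds=60)


def next_retry_at(now: datetime, circuit_state: str) -> datetime:
    """Defer the retry by 60s when the circuit is open; otherwise attempt it now."""
    if circuit_state == "open":
        return now + OPEN_CIRCUIT_DEFER
    return now


def should_fire_cron(circuit_state: str) -> bool:
    """Cron-triggered runs are skipped entirely while the circuit is open."""
    return circuit_state != "open"


now = datetime(2026, 4, 6, 12, 0, 0, tzinfo=timezone.utc)
print(next_retry_at(now, "open").isoformat())
print(should_fire_cron("open"))
```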