Error Handling
Ironflow provides automatic retries for transient failures while allowing you to mark permanent failures that shouldn’t be retried.
Retry Behavior
By default, errors in step functions are retried according to your function’s retry configuration. The values below are the system defaults, which apply whenever retry is not specified explicitly:
```typescript
createFunction(
  {
    id: "my-function",
    triggers: [{ event: "my.event" }],
    retry: {
      maxAttempts: 3, // Maximum retry attempts
      initialDelayMs: 1000, // Initial delay (1 second)
      backoffFactor: 2.0, // Exponential backoff multiplier
      maxDelayMs: 300_000, // Maximum delay (5 minutes)
    },
  },
  async ({ event, step }) => {
    /* ... */
  },
);
```
Retry delays follow exponential backoff:
- First retry: 1 second
- Second retry: 2 seconds
- Third retry: 4 seconds

These delays are derived from the defaults above. Retry n waits initialDelayMs × backoffFactor^(n − 1), which is 1000 ms × 2.0^(n − 1) here, and no delay ever exceeds maxDelayMs (5 minutes).
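The schedule can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not Ironflow's implementation. The helper names retryDelayMs and retrySchedule are hypothetical. It assumes, as the config comment suggests, that maxAttempts counts retries and that the engine adds no jitter.

```typescript
// Mirrors the retry options shown above (hypothetical local type, not the SDK's).
interface RetryConfig {
  maxAttempts: number;
  initialDelayMs: number;
  backoffFactor: number;
  maxDelayMs: number;
}

// Delay before retry n (1-based): initialDelayMs * backoffFactor^(n - 1), capped at maxDelayMs.
function retryDelayMs(config: RetryConfig, retry: number): number {
  const raw = config.initialDelayMs * Math.pow(config.backoffFactor, retry - 1);
  return Math.min(raw, config.maxDelayMs);
}

// One delay per retry, assuming maxAttempts counts retries (no jitter modeled).
function retrySchedule(config: RetryConfig): number[] {
  const delays: number[] = [];
  for (let n = 1; n <= config.maxAttempts; n++) {
    delays.push(retryDelayMs(config, n));
  }
  return delays;
}

const defaults: RetryConfig = {
  maxAttempts: 3,
  initialDelayMs: 1000,
  backoffFactor: 2.0,
  maxDelayMs: 300_000,
};

console.log(retrySchedule(defaults)); // [ 1000, 2000, 4000 ]
```

With these defaults, the cap only matters from the tenth retry onward. At that point 1000 × 2^9 = 512,000 ms exceeds 300,000 ms.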
NonRetryableError
Use NonRetryableError to indicate permanent failures that shouldn’t be retried:
```typescript
import { NonRetryableError } from "@ironflow/node";

await step.run("validate", async () => {
  if (!isValid(data)) {
    // Won't retry - permanent failure
    throw new NonRetryableError("Invalid input");
  }
  // Regular errors retry automatically
  throw new Error("Temporary failure");
});
```

```go
import "errors"

ironflow.Run(ctx, "validate", func() (any, error) {
	if !isValid(data) {
		// Won't retry - permanent failure
		return nil, ironflow.WrapNonRetryable(errors.New("invalid input"))
	}
	return nil, errors.New("temporary failure") // will retry
})
```
When to use NonRetryableError:
- Invalid input data that won’t change on retry
- Business logic failures (e.g., insufficient funds)
- Authentication/authorization errors
- Resource not found errors
When NOT to use NonRetryableError:
- Network timeouts
- External service temporary failures
- Rate limiting (should back off and retry)
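The two lists above map naturally onto HTTP status codes when a step calls an external API. The following is a sketch of that mapping under stated assumptions. errorForStatus is a hypothetical helper. NonRetryableError is defined locally so the example runs on its own; in real code you would import it from @ironflow/node. Which statuses count as permanent is a judgment call for your API.

```typescript
// Local stand-in so this sketch runs without the SDK.
// In real code: import { NonRetryableError } from "@ironflow/node";
class NonRetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NonRetryableError";
    Object.setPrototypeOf(this, new.target.prototype); // keep instanceof working
  }
}

// Hypothetical helper: pick the error a step should throw for a failed HTTP response.
function errorForStatus(status: number, body: string): Error {
  // Request timeouts and rate limiting are transient: let the engine back off and retry.
  if (status === 408 || status === 429) {
    return new Error(`transient HTTP ${status}: ${body}`);
  }
  // 5xx usually means a temporary outage on the other side: retry.
  if (status >= 500) {
    return new Error(`server error HTTP ${status}: ${body}`);
  }
  // Remaining 4xx (bad input, 401/403 auth failures, 404 not found) won't change on retry.
  if (status >= 400) {
    return new NonRetryableError(`permanent HTTP ${status}: ${body}`);
  }
  // Anything else is unexpected; leave it retryable.
  return new Error(`unexpected HTTP ${status}: ${body}`);
}

console.log(errorForStatus(404, "not found").name); // NonRetryableError
console.log(errorForStatus(503, "unavailable").name); // Error
```

Inside a step, this would look like `if (!res.ok) throw errorForStatus(res.status, await res.text());`. Rate-limited and failing-server calls get backoff; bad requests fail fast.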
Error Types Summary
| Error Type | Behavior | Use Case |
|---|---|---|
| Regular Error | Retried with backoff | Transient failures |
| NonRetryableError | Not retried | Permanent failures, such as invalid input or business-rule violations |
| StepError | Thrown by Ironflow when a step’s underlying error propagates out of step.run | Catch around step calls to inspect the step name and attempt count |
| StepTimeoutError | Not retried beyond function policy | A step exceeded its configured timeout |
| TimeoutError | Subject to function retry policy | Function-level timeout (the whole run exceeded its deadline) |
| ValidationError / SchemaValidationError | Not retried | Event payload failed schema validation |
Use the isRetryable(err) helper (exported from @ironflow/node) to test whether a caught error will be retried by the engine.
Webhook Signature Verification
All requests from Ironflow are signed for security. The SDK verifies signatures automatically when you provide a signing key:
```typescript
import { serve } from "@ironflow/node";

export const POST = serve({
  functions: [myFunction],
  signingKey: process.env.IRONFLOW_SIGNING_KEY, // Automatic verification
});
```
Manual signature verification is not yet available in the JS SDK. Use the signingKey option in serve() for automatic verification.
```go
import "os"

handler := ironflow.Serve(ironflow.ServeConfig{
	Functions:  []ironflow.Function{MyFunction},
	SigningKey: os.Getenv("IRONFLOW_SIGNING_KEY"), // Automatic verification
})
```
For manual verification:
```go
err := ironflow.VerifySignature(
	payload,
	req.Header.Get("X-Ironflow-Signature"),
	signingKey,
	ironflow.DefaultSignatureTolerance,
)
```
Signature Header
Ironflow includes the signature in the X-Ironflow-Signature header using HMAC-SHA256.
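Because the JS SDK has no manual verification helper, it may help to see what an HMAC-SHA256 check involves. The following is a generic sketch using Node's built-in crypto module. It assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body. Ironflow's exact signed string and header encoding are not documented on this page. The Go VerifySignature call above also takes a tolerance, which suggests a timestamp is part of the real scheme. This sketch will therefore not validate real Ironflow requests as-is; keep using signingKey in serve(). signPayload and verifyPayload are hypothetical names.

```typescript
import { Buffer } from "node:buffer";
import { createHmac, timingSafeEqual } from "node:crypto";

// ASSUMPTION: signature = hex(HMAC-SHA256(signingKey, rawBody)); not Ironflow's documented format.
function signPayload(payload: string, signingKey: string): string {
  return createHmac("sha256", signingKey).update(payload).digest("hex");
}

function verifyPayload(payload: string, signatureHex: string, signingKey: string): boolean {
  const expected = Buffer.from(signPayload(payload, signingKey), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws if lengths differ, so reject mismatched lengths first.
  if (received.length !== expected.length) return false;
  // Constant-time comparison avoids leaking how many leading bytes matched.
  return timingSafeEqual(expected, received);
}

const key = "dev-signing-key";
const body = JSON.stringify({ name: "my.event" });
const sig = signPayload(body, key);
console.log(verifyPayload(body, sig, key)); // true
console.log(verifyPayload(body + " ", sig, key)); // false
```

Verify against the raw body bytes as received. Re-serializing parsed JSON can change whitespace or key order and break the comparison.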
Development Mode
During local development, you can skip verification:
```typescript
export const POST = serve({
  functions: [myFunction],
  skipVerification: true, // Only for local development!
});
```

```go
handler := ironflow.Serve(ironflow.ServeConfig{
	Functions:        []ironflow.Function{MyFunction},
	SkipVerification: true, // Only for local development!
})
```
Never disable signature verification in production. Verification protects your endpoints from unauthorized requests.
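Hard-coding skipVerification: true risks shipping it to production by accident. One defensive option is to derive the flag from the environment and require two conditions. The following is a sketch of that idea. shouldSkipVerification and the IRONFLOW_SKIP_VERIFICATION variable are hypothetical; only NODE_ENV is a common Node convention.

```typescript
// Hypothetical helper: skip verification only in development AND with an explicit opt-in.
// IRONFLOW_SKIP_VERIFICATION is an assumed variable name, not an Ironflow setting.
function shouldSkipVerification(env: Record<string, string | undefined>): boolean {
  return env.NODE_ENV === "development" && env.IRONFLOW_SKIP_VERIFICATION === "true";
}

console.log(shouldSkipVerification({ NODE_ENV: "production", IRONFLOW_SKIP_VERIFICATION: "true" })); // false
console.log(shouldSkipVerification({ NODE_ENV: "development", IRONFLOW_SKIP_VERIFICATION: "true" })); // true
```

You would then pass `skipVerification: shouldSkipVerification(process.env)` to serve(). A stray opt-in variable in production still leaves verification on.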
Global Error Observation (Client onError)
For client-side operations (emitting events, managing runs, KV store, etc.), you can register a global onError handler to observe all errors without wrapping every call in try/catch:
```typescript
import { createClient } from "@ironflow/node";

const client = createClient({
  onError: async (error, context) => {
    // Send to your logging/metrics system
    await logger.error("Ironflow client error", {
      method: context.method, // e.g. "emit", "kv.bucket.get"
      endpoint: context.endpoint, // e.g. "/ironflow.v1.IronflowService/Trigger"
      statusCode: context.statusCode, // HTTP status or undefined for network errors
      error: error.message,
    });
  },
});
```
Key behaviors:
- The handler fires before the error is re-thrown — it never suppresses errors
- Async handlers are fully awaited before the error propagates
- If the handler itself throws, its error is swallowed (logged to stderr)
- Propagates to sub-clients created via client.kv() and client.config()
This is useful for centralized logging, metrics collection, and alerting on client errors. See the @ironflow/node reference for the full API.
onError is for observing client errors. For controlling retry behavior inside functions, use NonRetryableError instead.
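The key behaviors listed above form a small contract. It is easiest to reason about when written out. The following sketch wraps a single call in a way that honors that contract: the handler is awaited, a throwing handler is swallowed and logged to stderr, and the original error is always re-thrown. This is an illustration, not the SDK's source. callWithOnError, ErrorContext, and OnError are hypothetical names; the context fields mirror the ones used in the example above.

```typescript
// Hypothetical types mirroring the context fields used in the example above.
type ErrorContext = { method: string; endpoint?: string; statusCode?: number };
type OnError = (error: Error, context: ErrorContext) => void | Promise<void>;

async function callWithOnError<T>(
  call: () => Promise<T>,
  context: ErrorContext,
  onError?: OnError,
): Promise<T> {
  try {
    return await call();
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    if (onError) {
      try {
        // Fully await async handlers before the error propagates.
        await onError(error, context);
      } catch (handlerErr) {
        // A failing handler is swallowed (logged to stderr) and never replaces the original error.
        console.error("onError handler threw:", handlerErr);
      }
    }
    // Always re-throw: the handler observes errors but never suppresses them.
    throw err;
  }
}
```

The design keeps observation separate from control flow. Callers still see exactly the error they would have seen without a handler.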
Handling Failed Runs
When a run fails after exhausting retries, you can:
- Hot Patch: Edit step outputs and resume from a specific step
- Investigate: Use the TUI debugger or dashboard to inspect the failure
- Fix and Retry: Fix the underlying issue and trigger a new event
See Debugging for more details on investigating and recovering from failures.
What’s Next?
- Debugging — Hot patching, TUI debugger, VS Code DAP
- API Reference — REST API endpoints