Error Handling

Ironflow provides automatic retries for transient failures while allowing you to mark permanent failures that shouldn’t be retried.

Retry Behavior

By default, errors thrown in step functions are retried according to your function’s retry configuration. The values below are the system defaults, applied when retry is not specified explicitly:

createFunction(
  {
    id: "my-function",
    triggers: [{ event: "my.event" }],
    retry: {
      maxAttempts: 3,        // Maximum retry attempts
      initialDelayMs: 1000,  // Initial delay (1 second)
      backoffFactor: 2.0,    // Exponential backoff multiplier
      maxDelayMs: 300_000,   // Maximum delay (5 minutes)
    },
  },
  async ({ event, step }) => { /* ... */ },
);

Retry delays follow exponential backoff:

First retry: 1 second
Second retry: 2 seconds
Third retry: 4 seconds
(capped at maxDelayMs)

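These values follow from the defaults above: the first retry waits initialDelayMs (1000 ms), and each later retry waits backoffFactor (2.0) times as long as the previous one.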
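The schedule can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming pure exponential growth with a hard cap and no jitter; retryDelayMs and BackoffConfig are hypothetical names used for illustration, not part of the SDK.

```typescript
// Hypothetical type: field names mirror the retry config shown above.
interface BackoffConfig {
  initialDelayMs: number;
  backoffFactor: number;
  maxDelayMs: number;
}

const defaults: BackoffConfig = {
  initialDelayMs: 1000,
  backoffFactor: 2.0,
  maxDelayMs: 300_000,
};

// Delay before the nth retry (1-based). Assumption: the engine applies
// initialDelayMs * backoffFactor^(n - 1), capped at maxDelayMs, without jitter.
function retryDelayMs(retry: number, cfg: BackoffConfig = defaults): number {
  const uncapped = cfg.initialDelayMs * Math.pow(cfg.backoffFactor, retry - 1);
  return Math.min(uncapped, cfg.maxDelayMs);
}

const schedule = [1, 2, 3].map((n) => retryDelayMs(n));
console.log(schedule.join(", ")); // 1000, 2000, 4000
```

With these defaults the cap only matters from the tenth retry onward, where the uncapped delay (512 seconds) would exceed maxDelayMs.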

NonRetryableError

Use NonRetryableError to indicate permanent failures that shouldn’t be retried:

TypeScript:

import { NonRetryableError } from "@ironflow/node";

await step.run("validate", async () => {
  if (!isValid(data)) {
    // Won't retry - permanent failure
    throw new NonRetryableError("Invalid input");
  }
  // Regular errors retry automatically
  throw new Error("Temporary failure");
});

Go:

import "errors"

ironflow.Run(ctx, "validate", func() (any, error) {
    if !isValid(data) {
        return nil, ironflow.WrapNonRetryable(errors.New("invalid input"))
    }
    return nil, errors.New("temporary failure") // will retry
})

When to use NonRetryableError:

Invalid input data that won’t change on retry
Business logic failures (e.g., insufficient funds)
Authentication/authorization errors
Resource not found errors

When NOT to use NonRetryableError:

Network timeouts
External service temporary failures
Rate limiting (should back off and retry)
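One way to apply these guidelines is to classify upstream HTTP failures in a single place. The sketch below is illustrative only: errorForFailedStatus is a hypothetical helper, the status-code mapping is an assumption about what a typical upstream API means by each code, and the NonRetryableError class is a local stand-in so the example runs on its own.

```typescript
// Local stand-in so the sketch is self-contained. In real code, import
// NonRetryableError from "@ironflow/node" instead of defining it.
class NonRetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NonRetryableError";
  }
}

// Hypothetical helper: pick the error to throw for a failed (>= 400) response.
function errorForFailedStatus(status: number): Error {
  // Timeouts (408), rate limiting (429), and server errors (5xx) are
  // treated as transient, so they surface as regular errors.
  if (status === 408 || status === 429 || status >= 500) {
    return new Error(`Upstream returned ${status}`);
  }
  // Remaining 4xx codes (bad input, auth failures, missing resources)
  // are treated as permanent.
  return new NonRetryableError(`Upstream returned ${status}`);
}

console.log(errorForFailedStatus(404).name); // NonRetryableError
console.log(errorForFailedStatus(429).name); // Error
```

Inside a step, throwing the returned error lets the engine's retry policy handle the transient cases while permanent ones fail immediately.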

Error Types Summary

Error Type	Behavior	Use Case
Regular Error	Retried with backoff	Transient failures
`NonRetryableError`	Not retried	Permanent failures: invalid input, business-rule violations
`StepError`	Thrown by Ironflow when a step’s underlying error propagates out of `step.run`	Catch around step calls to inspect the step name and attempt count
`StepTimeoutError`	Not retried beyond function policy	A step exceeded its configured timeout
`TimeoutError`	Subject to function retry policy	Function-level timeout (whole run exceeded its deadline)
`ValidationError` / `SchemaValidationError`	Not retried	Event payload failed schema validation

Use the isRetryable(err) helper (exported from @ironflow/node) to test whether a caught error will be retried by the engine.

Webhook Signature Verification

All requests from Ironflow are signed for security. The SDK verifies signatures automatically when you provide a signing key:

TypeScript:

import { serve } from "@ironflow/node";

export const POST = serve({
  functions: [myFunction],
  signingKey: process.env.IRONFLOW_SIGNING_KEY, // Automatic verification
});

Manual signature verification is not yet available in the JS SDK. Use the signingKey option in serve() for automatic verification.

Go:

handler := ironflow.Serve(ironflow.ServeConfig{
    Functions:  []ironflow.Function{MyFunction},
    SigningKey: os.Getenv("IRONFLOW_SIGNING_KEY"), // Automatic verification
})

For manual verification in Go:

err := ironflow.VerifySignature(
    payload,
    req.Header.Get("X-Ironflow-Signature"),
    signingKey,
    ironflow.DefaultSignatureTolerance,
)

Signature Header

Ironflow includes the signature in the X-Ironflow-Signature header using HMAC-SHA256.
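To make the HMAC-SHA256 idea concrete, here is a generic sketch using Node's built-in crypto module. It is not the Ironflow SDK and not a replacement for the signingKey option in serve(). It assumes the header carries a bare hex digest of the raw request body, which this page does not confirm; the tolerance argument in the Go example suggests the real scheme also involves a timestamp, which this sketch omits. The function names sign and signatureMatches are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: the signature is a hex-encoded HMAC-SHA256 of the raw body.
// The actual X-Ironflow-Signature format may differ.
function sign(payload: string, signingKey: string): string {
  return createHmac("sha256", signingKey).update(payload).digest("hex");
}

function signatureMatches(
  payload: string,
  header: string,
  signingKey: string,
): boolean {
  const expected = Buffer.from(sign(payload, signingKey), "hex");
  const received = Buffer.from(header, "hex");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return (
    expected.length === received.length && timingSafeEqual(expected, received)
  );
}

const body = '{"event":"my.event"}';
const header = sign(body, "test-key");
console.log(signatureMatches(body, header, "test-key")); // true
console.log(signatureMatches(body + " ", header, "test-key")); // false
```

The constant-time comparison avoids leaking how many leading bytes of a forged signature were correct.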

Development Mode

During local development, you can skip verification:

TypeScript:

export const POST = serve({
  functions: [myFunction],
  skipVerification: true, // Only for local development!
});

Go:

handler := ironflow.Serve(ironflow.ServeConfig{
    Functions:        []ironflow.Function{MyFunction},
    SkipVerification: true, // Only for local development!
})

Never disable signature verification in production: it is what protects your endpoints from unauthorized requests.
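A small guard can keep the development shortcut from leaking into production. This is a sketch under assumptions: verificationOptions is a hypothetical helper, and using NODE_ENV === "development" to detect local development is a common convention rather than something Ironflow prescribes. Only the signingKey and skipVerification option names come from the examples above.

```typescript
// Hypothetical helper: builds the verification part of the serve() options.
type VerificationOptions =
  | { signingKey: string }
  | { skipVerification: true };

function verificationOptions(
  env: Record<string, string | undefined>,
): VerificationOptions {
  const key = env.IRONFLOW_SIGNING_KEY;
  // Prefer real verification whenever a key is configured.
  if (key) return { signingKey: key };
  // Skip verification only when explicitly running in development.
  if (env.NODE_ENV === "development") return { skipVerification: true };
  // Anywhere else, refuse to start without a signing key.
  throw new Error("IRONFLOW_SIGNING_KEY is required outside development");
}

console.log(JSON.stringify(verificationOptions({ NODE_ENV: "development" })));
// {"skipVerification":true}
```

The result could then be spread into the handler configuration, for example serve({ functions: [myFunction], ...verificationOptions(process.env) }).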

Global Error Observation (Client onError)

For client-side operations (emitting events, managing runs, KV store, etc.), you can register a global onError handler to observe all errors without wrapping every call in try/catch:

import { createClient } from "@ironflow/node";

const client = createClient({
  onError: async (error, context) => {
    // Send to your logging/metrics system
    await logger.error("Ironflow client error", {
      method: context.method,        // e.g. "emit", "kv.bucket.get"
      endpoint: context.endpoint,    // e.g. "/ironflow.v1.IronflowService/Trigger"
      statusCode: context.statusCode, // HTTP status or undefined for network errors
      error: error.message,
    });
  },
});

Key behaviors:

The handler fires before the error is re-thrown; it never suppresses errors
Async handlers are fully awaited before the error propagates
If the handler itself throws, its error is swallowed (logged to stderr)
The handler also applies to sub-clients created via client.kv() and client.config()
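The first three behaviors above can be pictured as a wrapper around each client call. The sketch below only illustrates those semantics; it is not Ironflow's implementation, and withObserver and ErrorHandler are hypothetical names.

```typescript
type ErrorHandler = (error: Error) => void | Promise<void>;

// Illustrative wrapper: observe a failure, then let it propagate unchanged.
async function withObserver<T>(
  op: () => Promise<T>,
  onError: ErrorHandler,
): Promise<T> {
  try {
    return await op();
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    try {
      // The handler is fully awaited before the error propagates.
      await onError(error);
    } catch (handlerErr) {
      // A failing handler is logged and swallowed, never surfaced.
      console.error("onError handler failed:", handlerErr);
    }
    // The original error always reaches the caller.
    throw err;
  }
}
```

A caller still needs its own try/catch (or a rejected-promise handler) around the operation, because the observer only watches errors and never handles them.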

This is useful for centralized logging, metrics collection, and alerting on client errors. See the @ironflow/node reference for the full API.

onError is for observing client errors. For controlling retry behavior inside functions, use NonRetryableError instead.

Handling Failed Runs

When a run fails after exhausting retries, you can:

Hot Patch: Edit step outputs and resume from a specific step
Investigate: Use the TUI debugger or dashboard to inspect the failure
Fix and Retry: Fix the underlying issue and trigger a new event

See Debugging for more details on investigating and recovering from failures.

What’s Next?

Debugging — Hot patching, TUI debugger, VS Code DAP
API Reference — REST API endpoints