
Agent Stack Comparison

Last verified: April 2026. Vendor facts decay — please file an issue if anything below is out of date.

Most “AI agent” tools answer one of two questions: how should the agent think? or where should the agent live? Reasoning frameworks answer the first. Runtimes answer the second. Ironflow is a runtime — npm install @ironflow/langgraph and your LangGraph agent gets durable execution, event-sourced memory, scoped injection, and a built-in MCP server without rewriting a line of graph logic.

The 2x2 below maps the landscape on two axes that matter once your agent leaves your laptop.

              framework owns the loop (graphs, prompts, tools)
                             │
         LangGraph ●         │         ● CrewAI
  Claude Agent SDK ●         │         ● LlamaIndex
                             │         ● AWS Step Functions
in-memory ───────────────────┼─────────────────── event-sourced
(state lost on crash,        │   (state survives crash,
 no replay)                  │    replayable, auditable)
     Anthropic raw ●         │
        OpenAI raw ●         │         ● Temporal
                             │         ● Hatchet
                             │         ● Inngest
                             │         ●●● IRONFLOW
                             │   (host any reasoning layer
                             │    on a durable substrate)
                you own the loop (call LLM yourself)

Top half: the framework drives — its graph, its prompts, its tool-calling format. You write nodes; the framework decides what runs next.

Bottom half: you drive — explicit while loop, you call the model, you decide what to do with the output.

Left half: state lives in the process. Crash and the agent forgets everything — including which tool calls already ran.

Right half: every step is an immutable recorded fact in a durable store. Crash, redeploy, restart from cached step. Audit later. Replay for debugging.
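The right-half guarantee reduces to one mechanic: state is never stored directly, only derived by folding an append-only event log. A minimal sketch of that idea (illustrative types, not Ironflow's actual API):

```typescript
// Minimal event-sourcing sketch: illustrative only, not Ironflow's API.
type AgentEvent = { seq: number; type: "tool_called" | "tool_result"; data: string };

// The log is the source of truth: append-only, immutable facts.
const log: AgentEvent[] = [];
function append(type: AgentEvent["type"], data: string): void {
  log.push({ seq: log.length + 1, type, data });
}

// Current state is a pure fold over the log, so replaying the log after a
// crash rebuilds the exact state, and replaying a prefix gives time travel.
function project(events: AgentEvent[]): string[] {
  return events.filter((e) => e.type === "tool_result").map((e) => e.data);
}

append("tool_called", "fetch-diff");
append("tool_result", "diff contents");
append("tool_called", "post-comment");

const current = project(log);                 // state after all three events
const beforeCrash = project(log.slice(0, 2)); // state as of seq 2
```

Everything in the right half of the diagram is some version of this fold; the engines differ in what they let you do with the log afterward (audit, replay, projections).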

| | Ironflow | LangGraph | Claude Agent SDK | CrewAI | LlamaIndex | Anthropic raw | OpenAI raw | Temporal | Inngest | Hatchet | AWS Step Functions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Primary role | Runtime (substrate) | Reasoning framework | Reasoning framework | Reasoning framework | RAG-shaped framework | Provider SDK | Provider SDK | Workflow engine | Workflow engine | Workflow engine | Workflow engine |
| Hosts other frameworks | Yes — by design | No | No | No | No | N/A | N/A | Generic, not agent-shaped | Generic, not agent-shaped | Generic, not agent-shaped | Generic, not agent-shaped |
| Durable steps | Native | Via checkpointer add-on | No (compaction-based context) | No | No | No | No | Yes | Yes | Yes | Yes |
| Event-sourced memory | Native | No | No | No | No | No | No | Workflow history only | Function history only | Execution event log only | Execution history only |
| Scoped injection (pause/edit/resume) | Native | No | No | No | No | No | No | Signals only | No | No | Manual restart |
| MCP server, built-in | Yes | No | Hosts MCP clients | No | No | N/A | N/A | No | No | No | No |
| Sub-agents (parent ↔ child runs) | Native | Subgraphs (in-memory) | Spawns | Crew of agents | No | No | No | Child workflows | Function fan-out | Child task spawning | Step Functions nesting |
| Human-in-the-loop wait | step.waitForEvent, hours/days | Interrupt + manual resume | No | No | No | No | No | Signals + timers | step.waitForEvent | Events + durable wait | Wait-for-token |
| Single binary, zero infra | Yes | N/A (library) | N/A (library) | N/A (library) | N/A (library) | N/A (API client) | N/A (API client) | No (cluster) | Cloud-managed | Server binary + PG required | Cloud-managed |
| Self-hostable | Yes | N/A | N/A | N/A | N/A | No | No | Yes (cluster) | Limited | Yes (PG required) | No |
| License | FSL-1.1-ALv2 (Apache 2.0 after 2 years) | MIT | MIT | MIT | MIT | Proprietary | Proprietary | MIT | Proprietary | MIT | Proprietary |

Ironflow is the only row whose primary role is to host other rows. That’s the pitch.

LangGraph is the reasoning framework Ironflow recommends pairing with. Use both.

LangGraph’s checkpointer protocol is pluggable — by default it’s in-memory or Postgres. The Ironflow checkpointer (@ironflow/langgraph) writes graph state to an event-sourced entity stream keyed by irn:agent-ckpt:{thread_id}. Result: your LangGraph cycles, conditionals, and tool nodes survive crashes, redeploys, and human-approval gaps without changing graph code.

import { StateGraph } from "@langchain/langgraph";
import { createClient } from "@ironflow/node";
import { IronflowSaver } from "@ironflow/langgraph";

const client = createClient({ serverUrl: process.env.IRONFLOW_SERVER_URL });
const saver = new IronflowSaver({ client });

const graph = new StateGraph(...)
  .addNode("plan", planNode)
  .addNode("act", actNode)
  .addEdge("plan", "act")
  .compile({ checkpointer: saver });

What you get on top of LangGraph:

  • Crash-resume from last cached node
  • Time-travel through every state transition
  • Pause at any node, inspect or edit state, resume
  • Sub-agent spawning with parent run linkage
  • MCP server for free — your LangGraph tools become MCP tools

Not a LangGraph replacement. LangGraph still owns the graph, the prompts, the routing logic.

The Claude Agent SDK gives you a polished agent loop with first-class compaction, MCP client integration, and tool calling tuned for Anthropic models. It assumes your agent runs in a single process and that context compaction is the right answer to long-running state.

That assumption breaks when:

  • The process dies mid-tool-call
  • A human approval needs to hold for hours
  • You need an audit trail for compliance
  • A second user wants to inspect what the agent did

Ironflow runs underneath. The Claude SDK keeps owning prompts, tools, and the loop. Ironflow’s agent() wrapper turns each tool call into a memoized step.run and each loop turn into an event in the entity stream. Crash mid-flight; resume; tool calls don’t replay.

Choose Claude Agent SDK alone for short-lived agents in a single process where context compaction is enough. Choose Claude Agent SDK + Ironflow the moment you need crash-survival, human-approval gaps, audit, or multi-process inspection.
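That memoized-step mechanic is simple enough to sketch: every step result is persisted under a stable key before the loop advances, so a replayed run returns cached results instead of re-executing side effects. A self-contained sketch (the Map stands in for the durable event store; this is not the real agent() API):

```typescript
// Illustrative sketch of step memoization, not the real Ironflow API.
const store = new Map<string, string>(); // stands in for the durable event store
let sideEffects = 0;

function step(key: string, fn: () => string): string {
  const cached = store.get(key);
  if (cached !== undefined) return cached; // resume path: no re-execution
  const result = fn();
  store.set(key, result); // persisted before the loop advances
  sideEffects++;
  return result;
}

function runOnce(): string {
  const diff = step("fetch-diff", () => "diff contents");
  return step("llm-review", () => `review of ${diff}`);
}

const first = runOnce();   // executes both steps
const resumed = runOnce(); // replays both from cache; sideEffects unchanged
```

The resumed run returns the identical result while performing zero new side effects, which is exactly the "crash mid-flight; resume; tool calls don't replay" behavior described above.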

CrewAI gives you a multi-agent orchestration model — agents with roles, tasks, and a process that wires them together. It’s opinionated and quick to demo.

CrewAI assumes the crew runs in one process for one execution. Ironflow doesn’t replace the crew — the crew still owns roles and task assignment. Ironflow makes the crew durable: each agent’s tool call is a step, the conversation is an entity stream, and the crew can be paused, inspected, and resumed.

Pairing pattern: each Agent in the crew becomes an Ironflow function; the Crew itself becomes the parent run that spawns sub-agents.
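That pairing pattern can be sketched as plain data: the crew is a parent run, and each role is recorded as a child run holding a pointer back to its parent (names here are hypothetical, neither CrewAI's nor Ironflow's real API):

```typescript
// Illustrative sketch of parent/child run linkage; hypothetical names.
type Run = { id: string; parentId?: string; role: string };
const runs: Run[] = [];

function spawnChild(parent: Run, role: string): Run {
  // Child runs are recorded with a pointer to the parent run.
  const child = { id: `${parent.id}/${role}`, parentId: parent.id, role };
  runs.push(child);
  return child;
}

// The "crew" is the parent run; each agent role becomes a linked child run.
const crew: Run = { id: "run-1", role: "crew" };
runs.push(crew);
for (const role of ["researcher", "writer", "reviewer"]) spawnChild(crew, role);

const children = runs.filter((r) => r.parentId === crew.id);
```

Because the linkage is recorded as data, a dashboard can reconstruct the whole crew tree after the fact, which is what makes pausing and inspecting a multi-agent run possible.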

LlamaIndex is RAG-shaped — its strongest abstractions are around indexing documents, retrieval, and query engines. Its agent framework is built atop those primitives.

If your agent’s primary loop is “retrieve, augment, generate,” LlamaIndex’s tools are excellent. Ironflow underneath gives you durable retrieval steps (no re-embedding on crash), event-sourced query history (so you can debug “why did the agent pull that wrong doc?”), and human-approval gates on retrieval results.
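A sketch of what a durable retrieval step buys: the step key is derived from its arguments (the by-args idempotency pattern), so re-running the pipeline after a crash skips documents that were already embedded. Illustrative only; the Map stands in for the durable store and the embedding call:

```typescript
// Illustrative sketch of by-args idempotent retrieval steps; not a real API.
const embedded = new Map<string, number[]>();
let embedCalls = 0;

function embedStep(docId: string, text: string): number[] {
  const key = `embed:${docId}`; // step key derived from the arguments
  const hit = embedded.get(key);
  if (hit) return hit; // already durable: no re-embedding
  embedCalls++;
  const vector = [text.length]; // stand-in for a real embedding call
  embedded.set(key, vector);
  return vector;
}

embedStep("doc-1", "hello");
embedStep("doc-2", "world!");
// Crash, restart: the same pipeline re-runs, but only new docs embed.
embedStep("doc-1", "hello");
embedStep("doc-3", "new doc");
```

The same keyed log also answers the debugging question above: every retrieval is a recorded fact, so "why did the agent pull that wrong doc?" is a query, not a guess.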

The provider SDKs are minimal: send messages, get completions, parse tool calls, repeat. Most “I built an agent” demos are 80 lines of while (notDone) { messages.push(...); response = await client.messages.create(...); }.

That’s the loop you own. Ironflow doesn’t take it. agent() wraps your loop so each iteration becomes a durable step. You still write the while. You still call client.messages.create. The wrapper turns tool calls into memoized steps and the conversation into an entity stream.

const myAgent = agent(async ({ step, llm, tool }) => {
  const messages = [];
  while (true) {
    const response = await llm.complete({ messages, tools }); // memoized
    if (!response.toolCalls?.length) return response.content;
    messages.push({ role: "assistant", content: response.content }); // keep the assistant turn in history
    for (const call of response.toolCalls) {
      const result = await tool(call.name, call.input); // memoized
      messages.push({ role: "tool", content: result });
    }
  }
});

Crash on iteration 7? Resume on iteration 7. The first 6 LLM calls don’t replay.

Temporal is a workflow engine, not an agent framework. You can absolutely build agents on Temporal — many do. The cost is vocabulary mismatch (workflows, activities, signals — not turns, tools, memory) and the operational tax of running a Temporal cluster (server + DB + worker fleet + UI).

Ironflow is shaped for agents from the API down: agent(), tool(), llm(), approve(), memory(). The substrate is similar (durable execution, replay) but the surface is agent-native and the deployment is one binary.

Choose Temporal for massive distributed scale and existing Temporal investment. Choose Ironflow for agent-native API, single-binary deploy, and built-in MCP / scoped injection.

Hatchet is “Temporal, but Postgres-only and MIT.” Same shape — workflow engine, durable execution, pull-based workers, dashboard replay — with operationally simpler infrastructure. They market AI agents as use case #1 on the landing page, but the SDK has no agent-specific surface: tasks, workers, retries, durable sleeps. Agents are tasks that happen to call LLMs.

The agent friction is largely the same as Temporal’s — better in one respect, worse in another:

  • Better than Temporal: PostgreSQL is the only dependency — no server cluster, no separate broker. The self-hosting ops surface is one Postgres instance plus the Hatchet binary, under a plain MIT license.
  • Same as Temporal: Vocabulary mismatch (tasks/workers/durable-sleeps, not turns/tools/memory). No agent-native primitives. No MCP server. No scoped injection or hot-patching. The reasoning framework still has to map onto generic workflow primitives.
  • Worse than Temporal: Smaller community (~7K vs ~12K stars). Less battle-testing at Fortune 500 scale.

Ironflow’s agent-native API (agent() / tool() / llm() / approve() / memory()) and built-in MCP server are positioned exactly at this gap. Hatchet’s substrate is sound; the surface is generic. The same way Ironflow hosts LangGraph and Claude Agent SDK on top of its substrate, you could in principle build an agent() wrapper on top of Hatchet — but you’d be reinventing the surface that Ironflow ships natively.

Choose Hatchet if you want Temporal’s durability story with simpler ops (PG-only) and MIT licensing for plain task workloads. Choose Ironflow if you want agent-native primitives, built-in MCP, scoped injection, and history navigation — without rebuilding them on top of a generic workflow engine.

Inngest is event-driven serverless workflows — excellent for triggered functions on Vercel/Next.js. Their agent story is grafted on top.

Ironflow’s event-sourcing goes deeper: events are permanent recorded facts, not ephemeral triggers. Inngest’s events disappear after delivery; Ironflow’s events power projections, entity timelines, and time-travel debugging.

For agent use cases, the practical difference: Inngest treats each LLM call as a function invocation in a fan-out graph. Ironflow treats the agent as one durable run with a memoized loop and an event-sourced conversation.

AWS Step Functions is a state-machine workflow engine for AWS. You define states and transitions in JSON (the Amazon States Language), and Lambdas execute the work. It’s durable and mature for non-agent workflows.

For agents the friction is real: agent loops aren’t naturally state machines (the next step depends on LLM output, not a deterministic edge), JSON-defined transitions don’t fit dynamic tool selection, and there’s no agent-native vocabulary.

Ironflow is procedural in your language of choice (TS or Go), with the same durability guarantees and dramatically less ceremony.

Ironflow ships an agent() API with tool(), llm(), approve(), memory(), and spawn() in both Node (@ironflow/node/agent) and Go (sdk/go/ironflow/agent). If your agent loop is straightforward (call model, run tool, repeat, with human approval somewhere), you don’t need a reasoning framework on top.

import { agent } from "@ironflow/node/agent";

export const reviewAgent = agent(
  { id: "code-review" },
  async ({ step, tool, llm, approve }) => {
    const diff = await tool("fetch-diff", { pr: step.input.pr });
    const findings = await llm.complete({
      messages: [{ role: "user", content: `Review:\n${diff}` }],
    });
    const approved = await approve({ ttl: "24h", payload: findings });
    if (approved) {
      await tool("post-comment", { pr: step.input.pr, body: findings });
    }
    return { approved };
  },
);

Crash mid-LLM call: the earlier tool result is cached, so only the LLM call retries. Crash mid-approval-wait: the process dies, the approval stays pending, and the agent resumes the wait when restarted.

Same surface in Go:

import (
    "time"

    "github.com/sahina/ironflow/sdk/go/ironflow"
    "github.com/sahina/ironflow/sdk/go/ironflow/agent"
)

var ReviewAgent = agent.Agent(agent.AgentConfig{
    Function: ironflow.FunctionConfig{ID: "code-review"},
    Tools:    []agent.ToolEntry{fetchDiff.Entry()},
}, func(ctx agent.Context) (any, error) {
    diff, _ := agent.Tool(ctx, fetchDiff, fetchInput{PR: prNumber}) // error handling elided
    res, _ := agent.LLM(ctx, agent.LLMCompleteRequest{
        Messages: []agent.LLMMessage{{Role: "user", Content: diff.Diff}},
        Call:     func() (agent.LLMCompleteResult, error) { return providerComplete(diff.Diff) },
    })
    _ = res // the completion would feed the approval payload; elided here
    approval, _ := agent.Approve[any](ctx, "ship-it", agent.ApproveOptions[any]{TTL: 24 * time.Hour})
    return map[string]any{"approved": approval.Approved}, nil
})

When to use Ironflow with a reasoning framework

The moment your loop has nontrivial branching, multi-agent orchestration, or graph-shaped reasoning, drop in LangGraph, the Claude Agent SDK, or CrewAI on top. Ironflow’s job is to be invisible — durable execution, memory, MCP, audit. The framework’s job is to think.

┌──────────────────────────────────────────┐
│             REASONING LAYER              │
│ LangGraph / Claude SDK / CrewAI / yours  │
└──────────────────────────────────────────┘
┌──────────────────────────────────────────┐
│             IRONFLOW RUNTIME             │
│  durable steps · event log · MCP server  │
│    pause/resume · sub-agents · replay    │
└──────────────────────────────────────────┘
         SQLite or Postgres + NATS
          (one binary, zero infra)

Talking about durability is cheap. The Survive a Crash tutorial walks through a 3-step agent, a kill -9 mid-pipeline, and a recovery from cached steps — runnable in two terminals.

make demo-agent-crash-resume

That target runs the literal tutorial script: starts the server, runs the doc-processor agent, kills the worker mid-OCR, restarts it, and asserts the projection caught the recovered run.

Companion examples:

  • code-review-agent — adds llm() + approve() for human-in-the-loop gates. The approval can hold for hours across worker restarts.
  • doc-processor-agent — minimal agent() + memory() example used by the tutorial.
  • ai-agent — multi-step research with parallel search and byArgs idempotent tool calls.
  • doc-processor-agent/web — Next.js browser UI driving ironflow.agents.invoke() + agents.subscribe() + agents.readMemory() for live YC-style demos.

The @ironflow/browser SDK exposes ironflow.agents.invoke(), ironflow.agents.subscribe(), and ironflow.agents.readMemory() for browser-driven agent UIs. Same agent function, two surfaces: server-side workers and browser-driven UIs run against one runtime. readMemory is a typed read over the agent’s memory projection with optional read-your-writes catchup via minSeq. See the browser SDK Agents section and the agents spec.
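"Read-your-writes catchup via minSeq" is a sequence-gated read: the caller passes the sequence number its own write produced, and the read only counts as fresh once the projection has applied at least that many events. A self-contained sketch of the idea (a real client would wait or poll rather than return a marker; names are illustrative, not the @ironflow/browser API):

```typescript
// Illustrative sketch of minSeq read-your-writes; not the real browser SDK.
type Projection = { seq: number; notes: string[] };
const projection: Projection = { seq: 0, notes: [] };

function applyEvent(note: string): number {
  projection.notes.push(note);
  return ++projection.seq; // each applied event advances the sequence
}

function readMemory(minSeq: number): Projection | "behind" {
  // A real client would block or poll until caught up; the sketch just reports.
  return projection.seq >= minSeq ? projection : "behind";
}

const seq = applyEvent("agent finished OCR"); // the writer learns the new seq
const fresh = readMemory(seq);     // guaranteed to include the write
const stale = readMemory(seq + 1); // projection hasn't caught up yet
```

The guarantee is purely local: a reader that threads its own write's sequence number into the next read can never observe a projection older than that write.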

The agent landscape splits into three camps: reasoning frameworks (LangGraph, Claude SDK, CrewAI, LlamaIndex), provider SDKs (Anthropic, OpenAI), and generic workflow engines (Temporal, Inngest, Step Functions). Each treats agents as either “state machines you happen to fill with LLM calls” or “loops you scaffold yourself.”

Ironflow treats agents as a first-class workload — durable steps, event-sourced memory, MCP-native, scoped-injectable — and explicitly hosts the reasoning layer above it. Bring your framework. Or skip it. Either way, your agent gets a runtime that survives.