Projections

Projections build read models from the event stream. A projection subscribes to a set of events, and a handler either maintains derived state (managed mode) or performs side effects (external mode). This document covers the two modes, then focuses on the contract every managed reducer must honor.

A natural-but-incorrect intuition is “a projection reads from an entity stream, so entity snapshots should speed it up.” The actual model is different: projections subscribe to the global ordered event log (the Postgres events table in nats_seq order), usually filtered by event name, and fold across many entity streams into a cross-cutting view. Entity-stream snapshots are the wrong input artifact for projection rebuild — they answer a different question.

Events land on per-entity streams AND the global log


Each write goes to two places: its entity stream (for aggregate rehydration) and the global events table (ordered by nats_seq).

Entity streams (write side)
───────────────────────────
Order-123 stream:     Order-124 stream:     User-42 stream:
  v1 OrderPlaced        v1 OrderPlaced        v1 UserSignedUp
  v2 ItemAdded          v2 OrderShipped       v2 EmailChanged
  v3 OrderShipped

Global event log (events table)
───────────────────────────────
nats_seq │ entity    │ event
─────────┼───────────┼──────────────
       1 │ Order-123 │ OrderPlaced
       2 │ User-42   │ UserSignedUp
       3 │ Order-123 │ ItemAdded
       4 │ Order-124 │ OrderPlaced
       5 │ Order-123 │ OrderShipped
       6 │ Order-124 │ OrderShipped
       7 │ User-42   │ EmailChanged
       8 │ Order-125 │ OrderPlaced
     ...

Entity-stream view = vertical slice per entity. Global log view = horizontal timeline of everything.

Aggregate load uses the entity stream (+ its snapshot)

Load Order-123 state:

┌──────────────────────────────┐
│ entity_stream_snapshots      │   snapshot at v2 → { status:"open", items:[x] }
│ Order-123 @ v2               │
└──────────────────────────────┘
        + replay tail: v3 OrderShipped
        → { status:"shipped", items:[x] }

One entity. Its snapshot. Its events. Clean.

Projections read the global log, not one stream


A projection like OrdersByDay doesn’t know about Order-123 specifically. It subscribes to the global log, filtered by event name:

Global log in nats_seq order:
1 OrderPlaced  (Order-123) ──┐
2 UserSignedUp (User-42)     │ skipped (filter: OrderPlaced only)
3 ItemAdded    (Order-123)   │ skipped
4 OrderPlaced  (Order-124) ──┤
5 OrderShipped (Order-123)   │ skipped
6 OrderShipped (Order-124)   │ skipped
7 EmailChanged (User-42)     │ skipped
8 OrderPlaced  (Order-125) ──┤
                             ▼
             ┌─────────────────────────────┐
             │ Projection: OrdersByDay     │
             │ reducer(state, OrderPlaced) │
             │   → state[day]++            │
             └─────────────────────────────┘
                             ▼
             Map {
               "2026-04-20": 2,   ← Order-123, Order-124
               "2026-04-21": 1    ← Order-125
             }

The projection crossed THREE entity streams (Order-123, Order-124, Order-125) to produce one cross-cutting view. Its state is NOT any entity’s state.

Why entity snapshots don’t help projections

Order-123 snapshot:            Projection state:
{ status:"shipped",            Map<date, count>
  items:[x],                     "2026-04-20": 2
  total:100 }                    "2026-04-21": 1
        ▲                              ▲
        │                              │
        └──────── DIFFERENT SHAPE ─────┘
                  DIFFERENT REDUCER
                  DIFFERENT QUESTION ANSWERED

No arithmetic on the snapshot produces the projection’s map. The snapshot doesn’t remember “which day this order was placed on” — it only remembers current status. The projection needs the raw OrderPlaced events, in order, across all orders.

Single-entity projections still need the log


Even when a projection happens to scope to one entity (e.g. OrderHistoryView for Order-123):

Order-123 snapshot: { status:"shipped", items:[x], total:100 }

Projection "OrderHistoryView" wants:
[
  { at: "2026-04-20T10:00", what: "OrderPlaced"  },
  { at: "2026-04-20T10:05", what: "ItemAdded"    },
  { at: "2026-04-20T10:30", what: "OrderShipped" }
]
Snapshot has no history. It has one moment. Projection needs the timeline.
  • Entity streams answer “what is the state of THIS entity right now” → use entity snapshots.
  • Projections answer “what is the derived/aggregated view across MANY entities over TIME” → must read the event log. Entity snapshots are the wrong input artifact.
  • This matches the canonical CQRS/ES pattern (EventStoreDB $all, Axon TrackingEventProcessor, Marten, Kafka/Flink materialized views): commands → entity streams → global ordered log → projections subscribe + fold.

Managed mode — the handler is a pure reducer: (state, event) -> newState. Ironflow stores the state, routes reads (getProjection, subscribeToProjection), and owns the cursor. Use managed mode for anything a UI or API reads.

External mode — the handler performs side effects (send email, call webhook, write to an external DB). Ironflow tracks the cursor but stores no state. Return value is void. Use external mode whenever the effect of processing an event is not “update some state I want Ironflow to hold.”

Mode is auto-detected from whether an initialState factory is provided, or set explicitly via mode: "managed" | "external".
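As a sketch of the two modes: the configs below differ only in whether an initialState factory is present. The event and state shapes, and any option names other than mode, initialState, and handler, are assumptions for illustration, not the documented createProjection API.

```typescript
// Hypothetical event/state shapes for illustration.
type Ev = { id: string; name: string; timestamp: string; data: { orderId: string } };
type ProjState = Record<string, string>; // orderId -> latest event name

// Managed: pure reducer. Mode is auto-detected from the initialState factory.
// The keyed-map write is naturally replay-safe.
const managedConfig = {
  initialState: (): ProjState => ({}),
  handler: (state: ProjState, event: Ev): ProjState => ({
    ...state,
    [event.data.orderId]: event.name,
  }),
};

// External: side effect only, returns void; Ironflow tracks the cursor, stores no state.
const externalConfig = {
  mode: "external" as const,
  handler: async (event: Ev): Promise<void> => {
    // await sendShippedEmail(event); // hypothetical side-effecting call
  },
};
```

Both configs would be passed to createProjection; only the managed one gives Ironflow state to store and serve via getProjection / subscribeToProjection.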

Ironflow applies managed reducers under at-least-once delivery. Under #486, the rebuild replay (reading from the Postgres events table) and the live NATS tail may both invoke your reducer for the same event during the short overlap window at rebuild completion, and across node failover and retry paths the same event can hit your reducer any number of times. Correctness depends on the reducer being deterministic and idempotent. No runtime enforcement catches a violation; it surfaces as read models that disagree between nodes, or that diverge from the event stream after a rebuild.

A managed reducer must satisfy all four rules below.

Rule 1: determinism. The same (state, event) must produce the same newState on every invocation. Ironflow may call your reducer twice with identical arguments and expect identical output.

Banned inside a managed handler:

  • Clocks: time.Now(), Date.now(), new Date() with no args, performance.now(). The event carries event.timestamp; use that.
  • Randomness: Math.random(), crypto.randomUUID(), uuid() in Go. If you need an ID, derive it from the event (event.id, event.data.orderId, etc.).
  • Environment reads: os.Getenv, process.env.*, reading from a file, reading from a DB.
// WRONG — non-deterministic: rebuild produces different state than live
handler: (state, event) => ({ ...state, lastSeen: Date.now() })
// RIGHT — deterministic: derive from the event
handler: (state, event) => ({ ...state, lastSeen: event.timestamp })

Rule 2: no side effects. No network calls, no DB writes, no console.log side effects. If you need a side effect, split the projection: a managed projection for the state, an external projection for the effect.

// WRONG — side effect in a managed handler
handler: async (state, event) => {
  await sendSlackNotification(event);
  return { ...state, count: state.count + 1 };
}
// RIGHT — split
const stats = createProjection({ /* managed, pure */ });
const notifier = createProjection({ mode: "external", handler: async (e) => sendSlackNotification(e) });

Rule 3: return a fresh state object. Do not rely on the caller’s copy being isolated from anything you retain. The Go SDK performs a JSON deep-copy of state before each invocation (#486 I3); this neutralizes the sharpest aliasing hazards, but it does not make in-place mutation safe as a pattern, and it does not apply to the JS SDK.

// WRONG — mutates and returns the argument; leaks if runner ever retains the ref
handler: (state, event) => { state.count += 1; return state; }
// RIGHT — return a new object
handler: (state, event) => ({ ...state, count: state.count + 1 })

Rule 4: idempotency. The same event may be applied multiple times: during the PG-rebuild vs. live-tail overlap window (#486), after a NATS redelivery, or after a node failover. Handler output must be invariant under replay.

  • Counters: prefer state.ordersByID[id] = order (set-like) over state.count += 1 (accumulator). The set-like form survives replay; the accumulator over-counts.
  • If you must accumulate, key the accumulation to the event: if (!state.seen[event.id]) { state.count += 1; state.seen[event.id] = true }.
// WRONG — double-counts on replay
handler: (state, event) => ({ ...state, total: state.total + event.data.amount })
// RIGHT — idempotent via event ID
handler: (state, event) => {
  if (state.applied[event.id]) return state;
  return {
    ...state,
    applied: { ...state.applied, [event.id]: true },
    total: state.total + event.data.amount,
  };
}

For most business cases the cleaner fix is a keyed map (state.orders[event.data.orderId] = ...) — replay then overwrites with the same value and is naturally idempotent.
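A concrete sketch of that keyed-map shape, with event and state types assumed for illustration: replaying the same event overwrites the same key with the same value, so no seen-set is needed.

```typescript
// Keyed-map reducer: replay-safe without a seen-set (shapes assumed for illustration).
type Order = { status: string; amount: number };
type State = { orders: Record<string, Order> };
type PlacedEvent = { id: string; data: { orderId: string; amount: number } };

const handler = (state: State, event: PlacedEvent): State => ({
  ...state,
  orders: {
    ...state.orders,
    // The same event on replay writes the same key/value: naturally idempotent.
    [event.data.orderId]: { status: "placed", amount: event.data.amount },
  },
});

// Totals can then be derived at read time instead of accumulated in the reducer:
const total = (state: State): number =>
  Object.values(state.orders).reduce((sum, o) => sum + o.amount, 0);
```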

Multi-node deployments running the same projection on overlapping partitions can lose updates on projection_state. Ironflow does not currently guard those writes with optimistic concurrency (CAS on a per-row version counter); the UPSERT only enforces a monotonic last_event_seq. Two nodes reading the same cursor and writing different events race like this:

node-A reads state(cursor=9)
node-B reads state(cursor=9)
node-A applies E10, writes state(cursor=10)
node-B applies E11, writes state(cursor=11) ← E10's state contribution lost

Rule 4 (idempotency) does not prevent this race. Rule 4 keeps replay of the same event safe; the race above loses a different event (E10) by advancing last_event_seq past it, so E10 is never re-applied. Both reducer shapes are affected — projection_state is written as a full blob, so the final UPSERT overwrites everything the other node contributed (whichever event the final writer didn’t see is lost, keyed-map or accumulator alike).

Rule 4 is still required; it’s orthogonal to this race and covers replay and redelivery.

The only mitigation available today is to pin the projection to a single runner so concurrent reads of the same cursor cannot happen.

CAS-style row-level guards on projection_state (bump a per-row version counter and fail the UPSERT on stale version) are the intended long-term fix, tracked in #504; not implemented today.
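For readers unfamiliar with the CAS shape, a minimal sketch of a version-guarded write follows. The table, column names, and DB interface are hypothetical illustrations, not the #504 implementation.

```typescript
// Minimal DB interface assumed for illustration.
type Db = { query: (sql: string, params: unknown[]) => Promise<{ rowCount: number }> };

// CAS write: the UPDATE only succeeds if the version we read is still current,
// so a concurrent writer that bumped the version forces us to re-read and retry.
async function casWrite(
  db: Db,
  name: string,
  state: unknown,
  readVersion: number,
  lastEventSeq: number,
): Promise<boolean> {
  const res = await db.query(
    `UPDATE projection_state
        SET state = $1, last_event_seq = $2, version = version + 1
      WHERE name = $3 AND version = $4`,
    [JSON.stringify(state), lastEventSeq, name, readVersion],
  );
  return res.rowCount === 1; // false: lost the race; re-read, re-apply, retry
}
```

With this guard, node-B's write in the timeline above would fail (stale version) and node-B would re-read state that already includes E10's contribution.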

External handlers have side effects; they cannot be “pure” in the reducer sense. However, they still run under at-least-once delivery, so the side effect itself must be idempotent. Use idempotency keys on the external API (see anti-pattern 8), or look up the external resource before writing.
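One way to do that, sketched below with a hypothetical endpoint and helper (not an Ironflow API), is to derive the Idempotency-Key from the event ID, so redelivery of the same event produces the same key and the external API deduplicates the call.

```typescript
// Stable key derived from the event ID (prefix is an illustrative convention).
const idempotencyKey = (event: { id: string }): string => `order-email-${event.id}`;

// Hypothetical external-mode handler body; the endpoint is illustrative.
const sendOrderEmail = async (event: { id: string; data: { orderId: string } }): Promise<void> => {
  await fetch("https://api.example.com/emails", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Same event on redelivery => same key => the provider drops the duplicate.
      "Idempotency-Key": idempotencyKey(event),
    },
    body: JSON.stringify({ orderId: event.data.orderId }),
  });
};
```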

Rebuilds ship with metrics, trace spans, and environment-variable knobs so operators can watch progress and shed database load (issue #495).

Metrics (emitted only while a rebuild is in flight):

  • ironflow_projection_rebuild_active{name, env} — gauge, 1 per projection currently rebuilding. Sourced from a scrape-time collector against projection_registry, so the gauge stays consistent across process crashes and cluster-wide handoffs.
  • ironflow_projection_rebuild_events_applied_total{name, env, phase} — counter incremented after each successful PG-source batch. Use rate() to compute events/sec.
  • ironflow_projection_rebuild_duration_seconds{name, result} — histogram observed on terminal transition, labelled with result ∈ {completed, cancelled, failed}.

Tunables (read once at startup — restart to change):

Variable                             │ SQLite default │ Postgres default │ Meaning
─────────────────────────────────────┼────────────────┼──────────────────┼────────────────────────────────────────────────
IRONFLOW_REBUILD_BATCH_SIZE_SQLITE   │ 500            │ n/a              │ Events per PG pull during rebuild
IRONFLOW_REBUILD_BATCH_SIZE_POSTGRES │ n/a            │ 1000             │ Events per PG pull during rebuild
IRONFLOW_REBUILD_BATCH_PAUSE_MS      │ 0              │ 0                │ Inter-batch sleep; only applied to full batches

During rebuild, the server overrides the SDK-supplied batch_size with the tunable so throughput stays deterministic (batch_size / pause_ms) regardless of SDK configuration. Outside rebuild, the SDK value is used as before.

On an unrecoverable PG error during a rebuild (e.g. ListEventsByNatsSeq fails), the PG branch of pullForProjection calls FailRebuild — rebuild markers cleared, status flipped to error, error_message populated. No retry loop; the operator re-triggers manually once the underlying issue is fixed.

See runbook-projection-rebuild.md for PromQL examples, alert recipes, and a walk-through of “rebuild stuck” triage.

The SDK projection runner prefers the StreamProjectionEvents RPC (long-lived SSE/ConnectRPC server stream) over PollProjectionEvents. Streams can sit idle for minutes in quiet environments, which used to cause intermediaries (proxies, LBs, or Node’s default fetch socket idle timer) to close the connection — producing spurious TypeError: fetch failed reconnect logs every ~60–90s (issue #550).

When the client sets accept_heartbeats = true on the request (the @ironflow/node runner does so by default), the server emits periodic ProjectionEvent frames with kind = PROJECTION_EVENT_KIND_HEARTBEAT every ~15 seconds. The frames carry no payload and are silently dropped by the client; they exist only to keep the socket warm. Clients that don’t opt in see no heartbeats — back-compat for older SDK versions that don’t understand the kind field.
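Client-side, the drop amounts to a one-line filter. The sketch below assumes a frame shape with a kind field matching the wire description above; the runner's actual internals may differ.

```typescript
// Heartbeat frames carry no payload; drop them before they reach the reducer.
const HEARTBEAT = "PROJECTION_EVENT_KIND_HEARTBEAT";

const isHeartbeat = (frame: { kind?: string }): boolean => frame.kind === HEARTBEAT;

// Hypothetical consume loop over the server stream.
async function consume(
  frames: AsyncIterable<{ kind?: string; payload?: unknown }>,
  apply: (frame: { payload?: unknown }) => void,
): Promise<void> {
  for await (const frame of frames) {
    if (isHeartbeat(frame)) continue; // keepalive only: ignore silently
    apply(frame);
  }
}
```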

  • PR #486 — PG-backed projection rebuild (the concurrency model that makes determinism load-bearing).
  • Issue #495 — rebuild observability & tunables.
  • Issue #550 — SSE keepalive pings on projection streams.
  • Event sourcing — how entity streams relate to projections.
  • Anti-pattern 3 and 3a/3b in .agents/skills/ironflow-docs/anti-patterns.md.