Projections
Projections build read models from the event stream. A projection subscribes to a set of events, and a handler either maintains derived state (managed mode) or performs side effects (external mode). This document covers the two modes, then focuses on the contract every managed reducer must honor.
Projections read the global event log
A natural-but-incorrect intuition is “a projection reads from an entity stream, so entity snapshots should speed it up.” The actual model is different: projections subscribe to the global ordered event log (the Postgres events table in nats_seq order), usually filtered by event name, and fold across many entity streams into a cross-cutting view. Entity-stream snapshots are the wrong input artifact for projection rebuild — they answer a different question.
Events land on per-entity streams AND the global log
Each write goes to two places: its entity stream (for aggregate rehydration) and the global events table (ordered by nats_seq).
```
Entity streams (write side)        Global event log (events table)
───────────────────────────        ──────────────────────────────
Order-123 stream:                  nats_seq │ entity    │ event
  v1 OrderPlaced                   ─────────┼───────────┼──────────────
  v2 ItemAdded                            1 │ Order-123 │ OrderPlaced
  v3 OrderShipped                         2 │ User-42   │ UserSignedUp
                                          3 │ Order-123 │ ItemAdded
Order-124 stream:                         4 │ Order-124 │ OrderPlaced
  v1 OrderPlaced                          5 │ Order-123 │ OrderShipped
  v2 OrderShipped                         6 │ Order-124 │ OrderShipped
                                          7 │ User-42   │ EmailChanged
User-42 stream:                           8 │ Order-125 │ OrderPlaced
  v1 UserSignedUp                       ...
  v2 EmailChanged
```

Entity-stream view = vertical slice per entity. Global log view = horizontal timeline of everything.
Aggregate load uses the entity stream (+ its snapshot)
```
Load Order-123 state:

  ┌──────────────────────────────┐
  │ entity_stream_snapshots      │    snapshot at v2 → { status:"open", items:[x] }
  │ Order-123 @ v2               │ ──┐
  └──────────────────────────────┘   │
                                     ▼
                       replay tail: v3 OrderShipped
                                     │
                                     ▼
                       { status:"shipped", items:[x] }
```

One entity. Its snapshot. Its events. Clean.
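In code terms this load is just a fold of the stream tail over the snapshot state. A conceptual sketch only, not an Ironflow API; snapshot, tailEvents, and applyEvent are illustrative names:

```ts
// Conceptual sketch: start from the snapshot's state and fold the events
// recorded after the snapshot version on top of it.
const currentState = tailEvents.reduce(
  (state, event) => applyEvent(state, event), // the aggregate's own reducer
  snapshot.state,                             // e.g. Order-123 @ v2
);
```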
Projections read the global log, not one stream
A projection like OrdersByDay doesn’t know about Order-123 specifically. It subscribes to the global log, filtered by event name:
```
Global log in nats_seq order:

  1 OrderPlaced   (Order-123) ──┐
  2 UserSignedUp  (User-42)     │  skipped (filter: OrderPlaced only)
  3 ItemAdded     (Order-123)   │  skipped
  4 OrderPlaced   (Order-124) ──┤
  5 OrderShipped  (Order-123)   │  skipped
  6 OrderShipped  (Order-124)   │  skipped
  7 EmailChanged  (User-42)     │  skipped
  8 OrderPlaced   (Order-125) ──┤
                                │
                                ▼
                 ┌─────────────────────────────┐
                 │ Projection: OrdersByDay     │
                 │ reducer(state, OrderPlaced) │
                 │   → state[day]++            │
                 └─────────────────────────────┘
                                │
                                ▼
                 Map {
                   "2026-04-20": 2,   ← Order-123, Order-124
                   "2026-04-21": 1    ← Order-125
                 }
```

The projection crossed THREE entity streams (Order-123, Order-124, Order-125) to produce one cross-cutting view. Its state is NOT any entity’s state.
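A sketch of what OrdersByDay could look like with createProjection; the events filter option, the event payload fields, and the ISO-string timestamp are assumptions, not confirmed API. It is keyed by order ID rather than incremented, for the replay-safety reasons covered under rule 4 below:

```ts
// Hypothetical sketch: OrdersByDay as a managed projection.
// `events` (the subscription filter), `event.data.orderId`, and the ISO
// `event.timestamp` string are assumptions for illustration.
const ordersByDay = createProjection({
  name: "OrdersByDay",
  events: ["OrderPlaced"],
  initialState: () => ({ byDay: {} as Record<string, Record<string, true>> }),
  handler: (state, event) => {
    const day = event.timestamp.slice(0, 10); // "2026-04-20"
    return {
      ...state,
      byDay: {
        ...state.byDay,
        [day]: { ...(state.byDay[day] ?? {}), [event.data.orderId]: true },
      },
    };
  },
});
// The per-day count is then Object.keys(state.byDay[day] ?? {}).length.
```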
Why entity snapshots don’t help projections
```
Order-123 snapshot:              Projection state:
{ status:"shipped",              Map<date, count>
  items:[x],                       "2026-04-20": 2
  total:100 }                      "2026-04-21": 1
        ▲                                ▲
        │                                │
        └──────── DIFFERENT ─────────────┘
                  SHAPE
                  DIFFERENT REDUCER
                  DIFFERENT QUESTION ANSWERED
```

No arithmetic on the snapshot produces the projection’s map. The snapshot doesn’t remember “which day this order was placed on” — it only remembers current status. The projection needs the raw OrderPlaced events, in order, across all orders.
Single-entity projections still need the log
Even when a projection happens to scope to one entity (e.g. OrderHistoryView for Order-123):
```
Order-123 snapshot:
  { status:"shipped", items:[x], total:100 }

Projection "OrderHistoryView" wants:
  [
    { at: "2026-04-20T10:00", what: "OrderPlaced"  },
    { at: "2026-04-20T10:05", what: "ItemAdded"    },
    { at: "2026-04-20T10:30", what: "OrderShipped" }
  ]
```

Snapshot has no history. It has one moment. Projection needs the timeline.

Takeaway
- Entity streams answer “what is the state of THIS entity right now” → use entity snapshots.
- Projections answer “what is the derived/aggregated view across MANY entities over TIME” → must read the event log. Entity snapshots are the wrong input artifact.
- This matches the canonical CQRS/ES pattern (EventStoreDB $all, Axon TrackingEventProcessor, Marten, Kafka/Flink materialized views): commands → entity streams → global ordered log → projections subscribe + fold.
Managed vs external mode
Managed mode — the handler is a pure reducer: (state, event) -> newState. Ironflow stores the state, routes reads (getProjection, subscribeToProjection), and owns the cursor. Use managed mode for anything a UI or API reads.
External mode — the handler performs side effects (send email, call webhook, write to an external DB). Ironflow tracks the cursor but stores no state. Return value is void. Use external mode whenever the effect of processing an event is not “update some state I want Ironflow to hold.”
Mode is auto-detected from whether an initialState factory is provided, or set explicitly via mode: "managed" | "external".
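A minimal sketch of the two shapes; projection names, event payload fields, and the postToWebhook helper are illustrative, not part of the documented API:

```ts
// Managed: an initialState factory is provided, so Ironflow stores the
// reduced state and serves it via getProjection / subscribeToProjection.
const orderStats = createProjection({
  name: "OrderStats",
  initialState: () => ({ orders: {} as Record<string, unknown> }),
  handler: (state, event) => ({
    ...state,
    orders: { ...state.orders, [event.data.orderId]: event.data },
  }),
});

// External: no initialState (or explicit mode: "external"); the handler
// returns void and only performs the side effect.
const orderWebhook = createProjection({
  name: "OrderWebhook",
  mode: "external",
  handler: async (event) => {
    await postToWebhook(event); // illustrative side effect
  },
});
```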
Reducer Contract (managed mode)
Ironflow applies managed reducers under at-least-once delivery. Under #486, the rebuild replay (reading from the Postgres events table) and the live NATS tail may both invoke your reducer for the same event during the short overlap window at rebuild completion, and across node failover and retry paths the same event can hit your reducer any number of times. Correctness depends on the reducer being deterministic and idempotent. No runtime enforcement catches a violation; it surfaces as read models that disagree between nodes, or that diverge from the event stream after a rebuild.
A managed reducer must satisfy all four rules below.
1. Deterministic
Same (state, event) must produce the same newState, every invocation. Ironflow may call your reducer twice with identical arguments and expect identical output.
Banned inside a managed handler:
- Clocks — time.Now(), Date.now(), new Date() with no args, performance.now(). The event carries event.timestamp; use that.
- Randomness — Math.random(), crypto.randomUUID(), uuid() in Go. If you need an ID, derive it from the event (event.id, event.data.orderId, etc.).
- Environment reads — os.Getenv, process.env.*, reading from a file, reading from a DB.
```ts
// WRONG — non-deterministic: rebuild produces different state than live
handler: (state, event) => ({ ...state, lastSeen: Date.now() })

// RIGHT — deterministic: derive from the event
handler: (state, event) => ({ ...state, lastSeen: event.timestamp })
```

2. Pure
No network calls, no DB writes, no console.log side effects. If you need a side effect, split the projection: a managed projection for the state, an external projection for the effect.
```ts
// WRONG — side effect in a managed handler
handler: async (state, event) => {
  await sendSlackNotification(event);
  return { ...state, count: state.count + 1 };
}

// RIGHT — split
const stats = createProjection({ /* managed, pure */ });
const notifier = createProjection({
  mode: "external",
  handler: async (e) => sendSlackNotification(e),
});
```

3. Aliasing-safe
Return a fresh state object. Do not rely on the caller’s copy being isolated from anything you retain. The Go SDK performs a JSON deep-copy of state before each invocation (#486 I3) — this neutralizes the sharpest aliasing hazards, but it does not make in-place mutation safe as a pattern, and it does not apply to the JS SDK.
```ts
// WRONG — mutates and returns the argument; leaks if runner ever retains the ref
handler: (state, event) => { state.count += 1; return state; }

// RIGHT — return a new object
handler: (state, event) => ({ ...state, count: state.count + 1 })
```

4. Idempotent by construction
The same event may be applied multiple times: during the PG-rebuild vs. live-tail overlap window (#486), after a NATS re-delivery, after a node failover. Handler output must be invariant under replay.
- Counters: prefer state.ordersByID[id] = order (set-like) over state.count += 1 (accumulator). The set-like form survives replay; the accumulator over-counts.
- If you must accumulate, key the accumulation to the event: if (!state.seen[event.id]) { state.count += 1; state.seen[event.id] = true }.
```ts
// WRONG — double-counts on replay
handler: (state, event) => ({ ...state, total: state.total + event.data.amount })

// RIGHT — idempotent via event ID
handler: (state, event) => {
  if (state.applied[event.id]) return state;
  return {
    ...state,
    applied: { ...state.applied, [event.id]: true },
    total: state.total + event.data.amount,
  };
}
```

For most business cases the cleaner fix is a keyed map (state.orders[event.data.orderId] = ...) — replay then overwrites with the same value and is naturally idempotent.
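A sketch of that keyed-map form, with illustrative event fields:

```ts
// Keyed by orderId: replaying the same event rewrites the same entry with
// the same value, so no separate dedup bookkeeping is needed.
handler: (state, event) => ({
  ...state,
  orders: {
    ...state.orders,
    [event.data.orderId]: { placedAt: event.timestamp, total: event.data.amount },
  },
})
```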
Concurrent writers
Multi-node deployments running the same projection on overlapping partitions can lose updates on projection_state. Ironflow does not currently guard those writes with optimistic concurrency (CAS on a per-row version counter); the UPSERT only enforces a monotonic last_event_seq. Two nodes reading the same cursor and writing different events race like this:
```
node-A reads state (cursor=9)
node-B reads state (cursor=9)
node-A applies E10, writes state (cursor=10)
node-B applies E11, writes state (cursor=11)   ← E10's state contribution lost
```

Rule 4 (idempotency) does not prevent this race. Rule 4 keeps replay of the same event safe; the race above loses a different event (E10) by advancing last_event_seq past it, so E10 is never re-applied. Both reducer shapes are affected — projection_state is written as a full blob, so the final UPSERT overwrites everything the other node contributed (whichever event the final writer didn’t see is lost, keyed-map or accumulator alike).
Rule 4 is still required; it’s orthogonal to this race and covers replay and redelivery.
The only mitigation available today is to pin the projection to a single runner so concurrent reads of the same cursor cannot happen.
CAS-style row-level guards on projection_state (bump a per-row version counter and fail the UPSERT on stale version) are the intended long-term fix, tracked in #504; not implemented today.
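Conceptually, the intended guard looks like this; a TypeScript sketch of the #504 intent, not current Ironflow code, and readRow / casWrite are illustrative helpers:

```ts
// Sketch of a CAS-style write: re-read and retry when the row version moved
// underneath us, instead of blindly overwriting the state blob.
async function writeProjectionState(
  name: string,
  apply: (state: unknown) => unknown,
): Promise<void> {
  for (;;) {
    const row = await readRow(name);                     // { state, version }
    const next = apply(row.state);
    const ok = await casWrite(name, next, row.version);  // fails on a stale version
    if (ok) return;
  }
}
```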
External mode
External handlers have side effects; they cannot be “pure” in the reducer sense. However, they still run under at-least-once delivery, so the side effect itself must be idempotent. Use idempotency keys on the external API (see anti-pattern 8), or look up the external resource before writing.
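For example, a sketch; the email client and its idempotencyKey parameter are illustrative, not a specific provider API:

```ts
// External handler: the side effect is made idempotent by deriving the
// idempotency key from the event ID, so a redelivered event becomes a no-op
// on the provider side.
const welcomeEmails = createProjection({
  name: "WelcomeEmails",
  mode: "external",
  handler: async (event) => {
    await emailClient.send({
      to: event.data.email,
      template: "welcome",
      idempotencyKey: event.id, // same event => same key => provider dedupes
    });
  },
});
```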
Rebuild observability and tunables
Rebuilds ship with metrics, trace spans, and environment-variable knobs so operators can watch progress and shed database load (issue #495).
Metrics (emitted only while a rebuild is in flight):
- ironflow_projection_rebuild_active{name, env} — gauge, 1 per projection currently rebuilding. Sourced from a scrape-time collector against projection_registry, so the gauge stays consistent across process crashes and cluster-wide handoffs.
- ironflow_projection_rebuild_events_applied_total{name, env, phase} — counter incremented after each successful PG-source batch. Use rate() to compute events/sec.
- ironflow_projection_rebuild_duration_seconds{name, result} — histogram observed on terminal transition, labelled with result ∈ {completed, cancelled, failed}.
Tunables (read once at startup — restart to change):
| Variable | SQLite default | Postgres default | Meaning |
|---|---|---|---|
| IRONFLOW_REBUILD_BATCH_SIZE_SQLITE | 500 | — | Events per PG pull during rebuild |
| IRONFLOW_REBUILD_BATCH_SIZE_POSTGRES | — | 1000 | Events per PG pull during rebuild |
| IRONFLOW_REBUILD_BATCH_PAUSE_MS | 0 | 0 | Inter-batch sleep; only applied to full batches |
During rebuild, the server overrides the SDK-supplied batch_size with
the tunable so throughput stays deterministic (batch_size / pause_ms)
regardless of SDK configuration. Outside rebuild, the SDK value is used
as before.
On an unrecoverable PG error during a rebuild (e.g. ListEventsByNatsSeq
fails), the PG branch of pullForProjection calls FailRebuild —
rebuild markers cleared, status flipped to error, error_message
populated. No retry loop; the operator re-triggers manually once the
underlying issue is fixed.
See runbook-projection-rebuild.md
for PromQL examples, alert recipes, and a walk-through of “rebuild
stuck” triage.
Streaming mode and keepalives
The SDK projection runner prefers the StreamProjectionEvents RPC
(long-lived SSE/ConnectRPC server stream) over PollProjectionEvents.
Streams can sit idle for minutes in quiet environments, which used to
cause intermediaries (proxies, LBs, or Node’s default fetch socket
idle timer) to close the connection — producing spurious TypeError: fetch failed reconnect logs every ~60–90s (issue #550).
When the client sets accept_heartbeats = true on the request (the
@ironflow/node runner does so by default), the server emits periodic
ProjectionEvent frames with kind = PROJECTION_EVENT_KIND_HEARTBEAT
every ~15 seconds. The frames carry no payload and are silently
dropped by the client; they exist only to keep the socket warm.
Clients that don’t opt in see no heartbeats — back-compat for older
SDK versions that don’t understand the kind field.
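For hand-rolled consumers, dropping heartbeats is a one-line filter. A sketch; the client construction, the async-iterator shape of the stream, and the camelCase option name are assumptions (the @ironflow/node runner already does this internally):

```ts
// Sketch: consume the projection stream and ignore heartbeat frames.
for await (const frame of client.streamProjectionEvents({ acceptHeartbeats: true })) {
  // The exact enum representation of `kind` depends on the generated client.
  if (frame.kind === "PROJECTION_EVENT_KIND_HEARTBEAT") continue; // keepalive only, no payload
  applyToLocalState(frame); // illustrative handler for real projection events
}
```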
Related
- PR #486 — PG-backed projection rebuild (the concurrency model that makes determinism load-bearing).
- Issue #495 — rebuild observability & tunables.
- Issue #550 — SSE keepalive pings on projection streams.
- Event sourcing — how entity streams relate to projections.
- Anti-pattern 3 and 3a/3b in .agents/skills/ironflow-docs/anti-patterns.md.