Transactional outbox
Ironflow uses the transactional outbox pattern to decouple accepting an event from announcing it on NATS. This document explains the architecture, the guarantees it provides, and the operational surface an operator needs to know about.
Introduced in #487 (supersedes #477).
Why an outbox
Every event in Ironflow lives in two places: the `events` table (source of truth, queryable, durable) and the NATS JetStream PUBSUB stream (notification channel consumed by projections and real-time subscriptions). An event is only useful once it exists in both — the DB row without the NATS message means projections never see it; the NATS message without the DB row means consumers act on state that the application can’t prove exists.
Before the outbox, the publish path wrote the DB row, then called NATS, then patched `events.nats_seq` with the assigned sequence. Four paths left `events.nats_seq` permanently `NULL`:
- Legacy `Emit()` API — non-entity events never published to NATS.
- `publisher == nil` (test mode, misconfigured deployments) — no publish call at all.
- Publish failure — the write-back path was documented as non-fatal, so transient NATS outages left the row unpublished.
- The t1→t3 window — between the DB insert and the seq write-back, readers could observe `NULL`. Process crashes in that window made the `NULL` permanent.
These gaps broke `WaitForEvent(eventID)`, which has no way to resolve an event ID to a NATS sequence when the column is `NULL`.
The outbox pattern
```text
AppendEvent / Emit request:
┌───────────────────────────────────────────────────────┐
│ BEGIN txn                                             │
│ INSERT INTO events (..., nats_seq=NULL)               │
│ INSERT INTO outbox (event_id, topic, kind, ...)       │
│ ← optimistic concurrency check happens here           │
│ COMMIT                                                │
└───────────────────────────────────────────────────────┘
                          │
                          ▼
          return to caller (no NATS in the hot path)

outbox worker (one goroutine per node):
┌───────────────────────────────────────────────────────┐
│ SELECT ... FOR UPDATE SKIP LOCKED                     │
│ pub.PublishWithSeqAndMsgID(topic, payload, entry.id)  │
│ UPDATE events SET nats_seq = ack.Sequence             │
│ DELETE FROM outbox WHERE id = entry.id                │
└───────────────────────────────────────────────────────┘
```

The event row and the outbox row ride the same transaction, so either both are written or neither is. Optimistic concurrency conflicts roll back before anything lands on NATS — no orphan messages. NATS outages no longer block event acceptance; the outbox absorbs the backlog and drains when NATS recovers.
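To make the worker column concrete, here is a minimal sketch of one drain pass under the scheme above. The `Publisher` interface, the function names, and the exact SQL are illustrative assumptions, not the actual Ironflow code; the real worker also buckets rows per `entity_id`, tracks `attempts`/`next_retry_at`, and dead-letters poison rows.

```go
// Sketch only: names and signatures are illustrative, not the actual implementation.
package outbox

import (
	"context"
	"database/sql"
)

// Publisher abstracts the JetStream client; assumed to return the assigned
// sequence so it can be written back to events.nats_seq.
type Publisher interface {
	PublishWithSeqAndMsgID(topic string, payload []byte, msgID string) (seq uint64, err error)
}

type entry struct {
	id, eventID, topic string
	payload            []byte
}

// drainOnce claims one batch, publishes it, and marks it done.
// Per-entity bucketing, retry bookkeeping, and dead-lettering are omitted.
func drainOnce(ctx context.Context, db *sql.DB, pub Publisher) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	// SKIP LOCKED lets workers on other nodes claim disjoint rows.
	rows, err := tx.QueryContext(ctx, `
		SELECT id, event_id, topic, payload
		  FROM outbox
		 WHERE next_retry_at <= now()
		 ORDER BY created_at
		 LIMIT 32
		   FOR UPDATE SKIP LOCKED`)
	if err != nil {
		return err
	}
	var batch []entry
	for rows.Next() {
		var e entry
		if err := rows.Scan(&e.id, &e.eventID, &e.topic, &e.payload); err != nil {
			rows.Close()
			return err
		}
		batch = append(batch, e)
	}
	rows.Close()

	for _, e := range batch {
		// The outbox entry ID doubles as the JetStream msg-id (dedup key).
		seq, err := pub.PublishWithSeqAndMsgID(e.topic, e.payload, e.id)
		if err != nil {
			continue // leave the row for a later retry (attempts/next_retry_at bookkeeping omitted)
		}
		if _, err := tx.ExecContext(ctx,
			`UPDATE events SET nats_seq = $1 WHERE id = $2`, seq, e.eventID); err != nil {
			return err
		}
		if _, err := tx.ExecContext(ctx,
			`DELETE FROM outbox WHERE id = $1`, e.id); err != nil {
			return err
		}
	}
	return tx.Commit()
}
```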
Guarantees
- Atomicity. Accepting an event and scheduling its publish are one database transaction. Concurrency conflicts leave no trace on NATS.
- Eventual consistency. `events.nats_seq` is `NULL` transiently — typically 10-50ms under healthy conditions, bounded by outbox worker lag. It stops being `NULL` once the worker publishes successfully. Readers that need read-your-writes call `WaitForEvent(eventID)` to block until the row drains.
- No duplicate NATS messages on retry. The worker passes the outbox entry ID as the NATS msg-id; a worker that publishes successfully then crashes before marking the row published will re-publish with the same msg-id, and JetStream’s dedup window returns the original sequence (see the sketch after this list).
- Per-entity FIFO. Within a node, each batch is bucketed by `entity_id` and each bucket runs on a single goroutine that iterates serially. Non-entity rows (plain `Emit` calls) dispatch in parallel with no ordering constraint. Cross-node ordering relies on `SELECT FOR UPDATE SKIP LOCKED` plus a 30s claim lease on `next_retry_at`.
- Poison-row handling. Publishes that fail 10 times (default) move to `outbox_dead_letter`. The main outbox stays small and hot; dead-letter rows remain for manual inspection until explicitly requeued or discarded.
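The dedup guarantee rests on JetStream’s duplicate window. A sketch of how a wrapper along the lines of `PublishWithSeqAndMsgID` could be built on the nats.go client — the function shown here is an assumption, not the actual Ironflow code:

```go
// Sketch only, using the nats.go JetStream client; the actual wrapper may differ.
package outbox

import "github.com/nats-io/nats.go"

// publishWithMsgID publishes with an explicit msg-id. Re-publishing the same
// msg-id inside the stream's duplicate window does not append a new message;
// JetStream acks with the original sequence (ack.Duplicate == true).
func publishWithMsgID(js nats.JetStreamContext, subject string, payload []byte, msgID string) (uint64, error) {
	ack, err := js.Publish(subject, payload, nats.MsgId(msgID))
	if err != nil {
		return 0, err
	}
	return ack.Sequence, nil
}
```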
Non-guarantees
- Global FIFO across entities. Not provided. Projections dedupe by event ID, so cross-entity ordering is not required.
- Zero duplicate deliveries at the consumer. NATS is at-least-once; consumers still need idempotence by event ID (a minimal dedup sketch follows this list).
- `AppendEventResponse.Sequence` in the same round trip. Under the outbox pattern the sequence is not known at `AppendEvent` time. The field is always `0` on the response — callers that need the sequence call `WaitForEvent(eventID)` (now blocking) or `WaitProjectionCatchup` (streaming).
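Because delivery is at-least-once, every consumer needs its own dedup keyed by event ID. A minimal sketch with hypothetical names, using an in-memory set to stand in for whatever state a real projection persists:

```go
// Sketch of consumer-side idempotence keyed by event ID. In-memory only for
// illustration; a real projection would persist what it has already applied.
package consumer

type dedupingHandler struct {
	seen  map[string]struct{}                        // event IDs already applied
	apply func(eventID string, payload []byte) error // the projection's real work
}

func (h *dedupingHandler) Handle(eventID string, payload []byte) error {
	if _, dup := h.seen[eventID]; dup {
		return nil // at-least-once redelivery: already applied, drop it
	}
	if err := h.apply(eventID, payload); err != nil {
		return err
	}
	h.seen[eventID] = struct{}{}
	return nil
}
```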
Operational surface
Schema
Two tables, introduced in migration 011:
- `outbox` — the live queue. Columns: `id, event_id, entity_id, topic, kind, payload, metadata, environment_id, created_at, next_retry_at, attempts, last_error`.
- `outbox_dead_letter` — rows that exceeded the retry budget. Same shape minus `next_retry_at`, plus `dead_at`.
`kind` is `events` (consumed by projections, triggers `UPDATE events.nats_seq` on publish) or `entity` (consumed by entity stream subscriptions only).
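For tooling that reads the table directly, the columns map naturally onto a row struct. The Go types below are assumptions for illustration; migration 011 is the authoritative definition.

```go
// Hypothetical Go mapping of one outbox row; field types are assumptions, not
// the authoritative schema from migration 011.
package tooling

import "time"

type OutboxRow struct {
	ID            string    // primary key; also passed to NATS as the msg-id
	EventID       string    // the event this row will publish
	EntityID      *string   // nil for non-entity (plain Emit) rows
	Topic         string    // NATS subject to publish on
	Kind          string    // "events" or "entity"
	Payload       []byte
	Metadata      []byte
	EnvironmentID string
	CreatedAt     time.Time
	NextRetryAt   time.Time // backoff / claim-lease marker
	Attempts      int
	LastError     *string
}
```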
Tuning
Environment variables on the server binary (not yet wired — defaults only for now):
- `IRONFLOW_OUTBOX_WORKERS` (default 8) — concurrent publish goroutines per batch.
- `IRONFLOW_OUTBOX_BATCH_SIZE` (default 32) — rows claimed per poll.
- `IRONFLOW_OUTBOX_POLL_MS` (default 50) — idle poll interval, in milliseconds.
- `IRONFLOW_OUTBOX_MAX_ATTEMPTS` (default 10) — move a row to the dead-letter table after this many failures.
Inspecting the outbox
For ad-hoc inspection beyond the CLI and HTTP API (covered below), use direct SQL:
```sql
-- How many rows pending?
SELECT COUNT(*) FROM outbox;

-- Oldest pending
SELECT event_id, topic, attempts, last_error, created_at
FROM outbox
ORDER BY created_at ASC
LIMIT 20;

-- Dead-letter summary
SELECT event_id, topic, attempts, last_error, dead_at
FROM outbox_dead_letter
ORDER BY dead_at DESC
LIMIT 20;
```

Recovery
If the outbox backlogs (NATS down, sustained publish failures), the worker keeps retrying with exponential backoff (1s → 10min cap) until NATS recovers. No operator action is normally needed.
If rows land in `outbox_dead_letter`, use the CLI:
```sh
# Inspect what's in the DLQ
ironflow outbox dlq list --env prod

# Push a row back to the outbox for another publish attempt
ironflow outbox dlq requeue <event-id>

# Permanently delete a DLQ row (the underlying event row is kept)
ironflow outbox dlq discard <event-id> --yes
```

The CLI is a thin wrapper over three HTTP routes (handler in `internal/server/outbox_handler.go`):
| Method | Path | Action |
|---|---|---|
| `GET` | `/api/v1/outbox/dead-letter` | List DLQ rows (env-scoped via auth context). |
| `POST` | `/api/v1/outbox/dead-letter/{eventID}/requeue` | Move a DLQ row back to the outbox for retry. |
| `DELETE` | `/api/v1/outbox/dead-letter/{eventID}` | Permanently delete a DLQ row (event row kept). |
Use these directly when integrating with dashboards or external tooling.
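For example, an external tool can requeue a dead-lettered event with a single POST. Everything here besides the route itself (the function name, the bearer-token auth) is an assumption about the surrounding deployment:

```go
// Sketch: requeue a dead-lettered event over the HTTP API from external tooling.
// The bearer-token auth is an assumption about the deployment, not part of this doc.
package tooling

import (
	"context"
	"fmt"
	"net/http"
)

func requeueDLQRow(ctx context.Context, baseURL, token, eventID string) error {
	url := fmt.Sprintf("%s/api/v1/outbox/dead-letter/%s/requeue", baseURL, eventID)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 300 {
		return fmt.Errorf("requeue %s: unexpected status %s", eventID, resp.Status)
	}
	return nil
}
```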
See the Outbox DLQ runbook for the full triage flow and alert rules. For raw-SQL access when the CLI is unavailable, the `outbox_dead_letter` table follows the schema above and accepts the usual requeue pattern (`INSERT ... SELECT ...` into `outbox`, then `DELETE` from `outbox_dead_letter`).
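A sketch of that requeue pattern wrapped in one transaction, in Go for consistency with the other examples; the column handling (resetting `attempts`, setting `next_retry_at` to `now()`) is an assumption rather than a documented procedure:

```go
// Sketch of the raw-SQL requeue pattern as a single transaction. Column choices
// are assumptions based on the schema above, not a canonical recovery script.
package tooling

import (
	"context"
	"database/sql"
)

func requeueFromDLQ(ctx context.Context, db *sql.DB, eventID string) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	defer tx.Rollback()

	if _, err := tx.ExecContext(ctx, `
		INSERT INTO outbox (id, event_id, entity_id, topic, kind, payload,
		                    metadata, environment_id, created_at, next_retry_at,
		                    attempts, last_error)
		SELECT id, event_id, entity_id, topic, kind, payload,
		       metadata, environment_id, created_at, now(), 0, NULL
		  FROM outbox_dead_letter
		 WHERE event_id = $1`, eventID); err != nil {
		return err
	}
	if _, err := tx.ExecContext(ctx,
		`DELETE FROM outbox_dead_letter WHERE event_id = $1`, eventID); err != nil {
		return err
	}
	return tx.Commit()
}
```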
What the SDK change means for callers
The breaking change: `AppendEventResponse.Sequence` and `EntityEmitResult.Sequence` are always `0`. Callers that previously used them for read-your-writes patterns should switch to `WaitForEvent(eventID)`, which blocks until the outbox worker drains the row (or returns a terminal error if the row is dead-lettered):
```go
// before
resp, _ := client.AppendEvent(ctx, req)
_, _ = client.WaitProjectionCatchup(ctx, &WaitReq{MinSeq: resp.Msg.Sequence})

// after
resp, _ := client.AppendEvent(ctx, req)
wait, _ := client.WaitForEvent(ctx, &WaitForEventReq{
	EventId: resp.Msg.EventId,
	Timeout: durationpb.New(5 * time.Second),
})
// wait.Msg.TargetSeq is the resolved NATS seq.
```

Observability
- `ironflow_outbox_dead_letter_count{env}` — gauge, current DLQ depth per environment, computed at scrape time.
- `ironflow_dlq_writes_total{source}` — counter, incremented every time the worker moves an outbox row to the DLQ (`source="outbox"`).
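One way the scrape-time gauge and the DLQ-write counter could be wired with the Prometheus Go client — a sketch, not the actual implementation; the collector, query, and registration are assumptions:

```go
// Sketch of the metrics wiring using the Prometheus Go client; the real
// implementation in Ironflow may differ.
package metrics

import (
	"context"
	"database/sql"

	"github.com/prometheus/client_golang/prometheus"
)

// Counter bumped each time the worker dead-letters a row.
var dlqWrites = prometheus.NewCounterVec(prometheus.CounterOpts{
	Name: "ironflow_dlq_writes_total",
	Help: "Outbox rows moved to the dead-letter table.",
}, []string{"source"})

// dlqDepthCollector computes ironflow_outbox_dead_letter_count{env} at scrape
// time by querying the dead-letter table.
type dlqDepthCollector struct {
	db   *sql.DB
	desc *prometheus.Desc
}

func newDLQDepthCollector(db *sql.DB) *dlqDepthCollector {
	return &dlqDepthCollector{
		db: db,
		desc: prometheus.NewDesc("ironflow_outbox_dead_letter_count",
			"Current DLQ depth per environment.", []string{"env"}, nil),
	}
}

func (c *dlqDepthCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *dlqDepthCollector) Collect(ch chan<- prometheus.Metric) {
	rows, err := c.db.QueryContext(context.Background(),
		`SELECT environment_id, COUNT(*) FROM outbox_dead_letter GROUP BY environment_id`)
	if err != nil {
		return
	}
	defer rows.Close()
	for rows.Next() {
		var env string
		var n float64
		if rows.Scan(&env, &n) == nil {
			ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, n, env)
		}
	}
}
```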
Alert rules and triage steps live in the Outbox DLQ runbook.
Out of scope for the initial release
- Prometheus publish latency histogram and `ironflow_outbox_backlog_size` gauge (only the DLQ count is wired today).
- Cross-node per-entity advisory locking — the current implementation serializes per-entity only within one node. Multiple nodes can publish concurrently for the same entity; acceptable until cluster scale warrants `pg_try_advisory_xact_lock`.
These are tracked as follow-ups on #487.