# Benchmarks
Ironflow ships a benchmark suite that measures component throughput and end-to-end latency. Use it to establish baselines for your hardware and detect regressions.
## Quick Start

```sh
# Component benchmarks (in-process, no server needed)
make bench

# Load tests (starts server, runs k6 scripts, captures pprof)
make loadtest

# Both
make bench-all
```

## Prerequisites
| Tool | Required for | Install |
|---|---|---|
| Go 1.25+ | `make bench` | Already required |
| k6 | `make loadtest` | `brew install k6` |
`make bench` and `make loadtest` both run against in-memory SQLite with embedded NATS. PostgreSQL comparison is not currently wired into the bench pipeline.
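For context, "embedded NATS" means the broker runs inside the benchmark process rather than as a separate daemon. Below is a minimal sketch of that pattern using the `nats-server` library; it is illustrative only, and Ironflow's own test wiring may differ:

```go
package bench

import (
	"testing"
	"time"

	"github.com/nats-io/nats-server/v2/server"
	"github.com/nats-io/nats.go"
)

// startEmbeddedNATS runs a NATS server inside the test process on a
// random free port and returns a client connected to it.
// Illustrative sketch; not Ironflow's actual setup code.
func startEmbeddedNATS(t testing.TB) *nats.Conn {
	t.Helper()
	ns, err := server.NewServer(&server.Options{Port: -1}) // -1 = pick a random free port
	if err != nil {
		t.Fatal(err)
	}
	go ns.Start()
	if !ns.ReadyForConnections(5 * time.Second) {
		t.Fatal("embedded NATS did not start in time")
	}
	t.Cleanup(ns.Shutdown)

	nc, err := nats.Connect(ns.ClientURL())
	if err != nil {
		t.Fatal(err)
	}
	t.Cleanup(nc.Close)
	return nc
}
```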
## Component Benchmarks (`make bench`)

Runs Go `testing.B` benchmarks against in-memory SQLite with embedded NATS. No running server needed.
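Because these are standard `go test` benchmarks, you can also run a subset directly, e.g. `go test -bench CreateRun -benchmem ./...` from the repo root (the exact package paths depend on the repository layout).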
What’s measured:
| Category | Benchmarks |
|---|---|
| Store | CreateRun, CreateRun (parallel), GetRun, CreateStep, CreateEvent, ListRuns |
| Engine | StepMemoLookup, CreateAndCompleteStep |
| NATS | Publish, KV Put, KV Get, PublishSubscribe |
| Pattern | Parse, Match, MatchWildcard, PatternMatcher |
| API | EmitEvent, GetRun, ListEvents, HealthCheck |
Plus `TestGoroutineLeak` (asserts no goroutine leak after 500 step cycles) and `TestBootTime` (cold start to `/health` response: 10 iterations, reports median and p95; logs a warning above 500ms but does not fail).
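To make the table and the output columns concrete, here is a minimal sketch of benchmarks in this style. The `memStore` type and its method are toy stand-ins invented for illustration; the real Store benchmarks exercise the SQLite-backed store:

```go
package bench

import (
	"runtime"
	"strconv"
	"sync"
	"sync/atomic"
	"testing"
)

// memStore is a toy stand-in for the real store (illustration only).
type memStore struct {
	mu   sync.Mutex
	runs map[string]struct{}
}

func (s *memStore) CreateRun(id string) {
	s.mu.Lock()
	s.runs[id] = struct{}{}
	s.mu.Unlock()
}

func BenchmarkCreateRun(b *testing.B) {
	s := &memStore{runs: make(map[string]struct{})}
	b.ReportAllocs() // emit the B/op and allocs/op columns
	for i := 0; i < b.N; i++ {
		s.CreateRun("run-" + strconv.Itoa(i))
	}
}

// Parallel flavor, as in "CreateRun (parallel)": RunParallel spreads
// the b.N iterations across GOMAXPROCS goroutines.
func BenchmarkCreateRunParallel(b *testing.B) {
	s := &memStore{runs: make(map[string]struct{})}
	b.ReportAllocs()
	var n int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			s.CreateRun("run-" + strconv.FormatInt(atomic.AddInt64(&n, 1), 10))
		}
	})
}

// Leak check in the spirit of TestGoroutineLeak: the goroutine count
// must return to its baseline once the workload finishes.
func TestNoGoroutineLeak(t *testing.T) {
	before := runtime.NumGoroutine()
	s := &memStore{runs: make(map[string]struct{})}
	for i := 0; i < 500; i++ {
		s.CreateRun(strconv.Itoa(i)) // placeholder for a real step cycle
	}
	if after := runtime.NumGoroutine(); after > before {
		t.Fatalf("goroutine leak: %d before, %d after", before, after)
	}
}
```

Note that `b.ReportAllocs()` opts a benchmark into memory statistics, so the B/op and allocs/op columns appear without needing the `-benchmem` flag.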
## Reading the Output

```text
BenchmarkStore_CreateRun-10    85423    14025 ns/op    2048 B/op    42 allocs/op
│                       │      │        │              │            │
│                       │      │        │              │            └ heap allocations per op
│                       │      │        │              └ bytes allocated per op
│                       │      │        └ nanoseconds per operation
│                       │      └ iterations run
│                       └ GOMAXPROCS
└ benchmark name
```

Lower ns/op = faster. Lower B/op and allocs/op = less GC pressure.
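To detect regressions, save the output of two runs to files and compare them with `benchstat` (from golang.org/x/perf), which reports per-benchmark deltas with statistical significance.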
## Load Tests (`make loadtest`)

Starts an Ironflow server with `serve --dev --port 9199 --pprof` (with `IRONFLOW_METRICS_ENABLED=true` and a dedicated bench DB), registers the SDK benchmark worker from `tests/loadtest/functions/`, then runs k6 scripts over a 3m30s ramp.
What’s measured:
| Script | Peak VUs | Metric |
|---|---|---|
| `event-emission.js` | 100 | Event ingest throughput and latency |
| `mixed-workload.js` | 100 | Weighted mix (40% emit, 20% list runs, 20% list events, 10% functions, 10% health) |
| `function-invoke.js` | 50 | Function trigger-to-completion time |
| `event-to-projection.js` | 50 | End-to-end event → projection latency via WebSocket |
| `cancel-on-event.js` | varies | Event-driven run cancellation latency |
| `policy-eval.js` | varies | CEL policy evaluation throughput |
## Reading k6 Output

```text
http_req_duration...: avg=3.19ms min=245µs med=2.23ms max=27.3ms p(90)=7.67ms p(95)=9.28ms
http_req_failed.....: 0.00%   0 out of 134602
http_reqs...........: 134602  640.899/s
```

- Thresholds are per-script: `event-emission` enforces p95 < 500ms / failure < 1%, `mixed-workload` p95 < 1s / failure < 1%, `event-to-projection` p95 < 2s, `function-invoke` p95 < 10s / failure < 5%, `policy-eval` failure < 1%
- Results and pprof profiles are saved to `tests/loadtest/results/{timestamp}/`; captures include `heap-before.prof`, `heap-after.prof`, `goroutine-before.prof`, and `goroutine-after.prof`
## Profiling After Load Tests

```sh
# Compare heap before/after load
go tool pprof -diff_base results/heap-before.prof results/heap-after.prof

# Check goroutine state
go tool pprof results/goroutine-after.prof
```
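Both commands open pprof's interactive terminal; add `-http=:8080` to either one to get the web UI with flame graphs and a source view instead.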
## Grafana Dashboard

The Grafana performance dashboard is included in the Helm chart at `deploy/helm/ironflow/dashboards/ironflow-performance.json`. When deployed with `monitoring.dashboards.enabled=true`, Grafana auto-imports it via sidecar. For standalone Grafana, import the JSON file directly. Requires `IRONFLOW_METRICS_ENABLED=true`.