Authorization Policies (CEL)

Ironflow makes authorization decisions in two layers. The system layer (RBAC) is built into the binary and decides whether a caller has any business touching a resource type. The tenant layer (CEL) lets a tenant admin add deny rules on top — never new allows, only additional denies that the system layer did not already enforce.

If you have ever wanted to say “users in the developer role can invoke functions, except during a change freeze, except in the prod environment, except when the caller is not on call,” those “except” clauses are what tenant-authored CEL is for.

The two layers

                  Caller --> AuthzRequest
                                 |
                                 v
                +----------------+----------------+
                |  Layer 1 — system RBAC          |
                |  built-in roles + verbs         |
                |  decides ALLOW vs DENY          |
                |  edited via Ironflow migration  |
                +----------------+----------------+
                                 |
                          allow ? --no--> DENY (L1)
                                 |
                                yes
                                 |
                                 v
                +----------------+----------------+
                |  Layer 2 — tenant CEL           |
                |  custom policies, deny-only     |
                |  edited by tenant admins        |
                +----------------+----------------+
                                 |
            any matching deny.Condition true ?
                                 |
                  +--------------+--------------+
                  |                             |
                 yes                            no
                  |                             |
                  v                             v
                DENY (L2)                     ALLOW
                  |
                  v
        audit row written with
        SHA256 hash chain link

Layer 1 is authoritative for ALLOW. A request that L1 denies stops there; L2 never runs. This is the core security guarantee: tenant-authored CEL cannot grant a permission that L1 does not already grant.

Layer 2 is subtractive only. A CEL policy with effect = "deny" and a Condition that returns true flips the L1 ALLOW into a final DENY. A CEL policy with effect = "allow" and a non-empty Condition is inert: L1 already allowed, so the L2 allow adds nothing. The evaluator enforces this structurally; it is not a convention you have to remember.
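The ordering of the two layers can be sketched in a few lines. This is a minimal illustration, not the Ironflow evaluator: the names Policy and decide are hypothetical, the L1 result is passed in as a plain boolean, and each Condition is modeled as a Python callable standing in for a compiled CEL expression.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence

# Stand-in for a compiled CEL Condition (assumption: real policies hold CEL source).
Condition = Callable[[Mapping, Mapping], bool]


@dataclass(frozen=True)
class Policy:
    name: str
    effect: str  # "deny" or "allow"
    condition: Condition


def decide(l1_allows: bool, policies: Sequence[Policy],
           request: Mapping, subject: Mapping) -> str:
    # Layer 1 is authoritative for ALLOW: an L1 deny ends evaluation.
    if not l1_allows:
        return "DENY (L1)"
    for policy in policies:
        # Only deny policies can change the outcome; allow policies are never consulted.
        if policy.effect == "deny" and policy.condition(request, subject):
            return "DENY (L2)"
    return "ALLOW"


def always(request: Mapping, subject: Mapping) -> bool:
    return True


grant_everything = Policy("grant-everything", "allow", always)
block_everything = Policy("block-everything", "deny", always)

print(decide(False, [grant_everything], {}, {}))  # DENY (L1)
print(decide(True, [grant_everything], {}, {}))   # ALLOW
print(decide(True, [block_everything], {}, {}))   # DENY (L2)
```

The first call shows why an L2 allow cannot grant anything: the function returns before any tenant policy is examined.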

What CEL sees

Every CEL Condition is evaluated against two map variables. The canonical shape lives in internal/auth/rbac/cel/env.go::NewCELEnv():

request: the action and resource being authorized:
- request.action: verb like functions:invoke, policies:write, runs:read
- request.resource: the IRN being acted on (e.g., irn:ironflow:org_acme:proj_default:function:prod:fn_payments)
- request.environment: environment name (e.g., prod, staging)
- request.org_id: caller’s organization ID

subject: who is asking, normalized across API-key and JWT/dashboard paths:
- subject.id: the principal ID (apikey_* or user_*)
- subject.user_email: for JWT/dashboard callers, empty string for raw API keys without a bound user
- subject.roles: list of role names assigned to the principal
- subject.groups: list of group memberships (reserved for future use; empty in v1)
- subject.org, subject.project, subject.env: the principal’s home scope
- subject.api_key_id, subject.is_platform: provenance fields

The shape is identical regardless of who triggered the call. A policy written against subject.user_email reads the same field whether the caller hit a JWT-cookie dashboard route or a raw ifkey_* API path; the only difference is that the field is an empty string for an API key with no bound user.

If you need to filter on something inside the IRN (e.g., the resource ID suffix), use CEL string ops (.contains(), .startsWith(), .endsWith()) against request.resource directly. There is no IRN-parsed-fields shortcut.
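To make the two variables concrete, here is a sketch of what one evaluation might see, written as Python dictionaries. The field names come from the list above; every value is illustrative, and the types chosen for api_key_id (string) and is_platform (boolean) are assumptions. The last lines show the Python equivalent of filtering on the IRN with plain string operations.

```python
# Illustrative values only; field names follow the list above.
request = {
    "action": "functions:invoke",
    "resource": "irn:ironflow:org_acme:proj_default:function:prod:fn_payments",
    "environment": "prod",
    "org_id": "org_acme",
}

subject = {
    "id": "user_123",              # hypothetical principal ID
    "user_email": "dev@acme.com",  # "" for a raw API key without a bound user
    "roles": ["developer"],
    "groups": [],                  # reserved; empty in v1
    "org": "org_acme",
    "project": "proj_default",
    "env": "prod",
    "api_key_id": "",              # assumed to be a string
    "is_platform": False,          # assumed to be a boolean
}

# CEL:    request.resource.endsWith(":fn_payments")
# Python: the same check with str.endswith
targets_payments = request["resource"].endswith(":fn_payments")

# CEL:    request.resource.contains(":function:")
is_function = ":function:" in request["resource"]

print(targets_payments, is_function)  # True True
```

Including the leading colon in the suffix keeps a resource such as fn_old_fn_payments from matching by accident.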

A worked example

A tenant admin wants to deny functions:invoke on production functions for callers who are not on the on-call rotation.

request.environment == "prod" && !("oncall" in subject.roles)

…stored as a policy with:

| Field | Value |
| --- | --- |
| `name` | `deny-prod-invoke-non-oncall` |
| `effect` | `deny` |
| `actions` | `functions:invoke` |
| `resources` | `irn:ironflow:::function:prod:*` |
| `condition` | (the CEL above) |
| `valid_from` | (optional; when this policy becomes active) |
| `valid_until` | (optional; when it expires) |

A functions:invoke from an on-call developer: L1 RBAC says ALLOW, L2 evaluates the condition (false — oncall is in roles), no L2 deny matches, final ALLOW. Same call from a non-on-call developer: L1 ALLOW, L2 condition true, final DENY. Audit row written with the chain link.
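The two outcomes above can be checked by hand with a Python rendering of the same condition. This is only a translation for illustration: the function name is hypothetical, and the real policy stores the CEL text, not Python.

```python
def deny_prod_invoke_non_oncall(request: dict, subject: dict) -> bool:
    # Python rendering of:
    #   request.environment == "prod" && !("oncall" in subject.roles)
    return request["environment"] == "prod" and "oncall" not in subject["roles"]


request = {"action": "functions:invoke", "environment": "prod"}
oncall_dev = {"roles": ["developer", "oncall"]}
other_dev = {"roles": ["developer"]}

# L1 has already said ALLOW for both callers; only the L2 condition differs.
for label, subject in [("on-call developer", oncall_dev),
                       ("non-on-call developer", other_dev)]:
    final = "DENY" if deny_prod_invoke_non_oncall(request, subject) else "ALLOW"
    print(label, "->", final)
# on-call developer -> ALLOW
# non-on-call developer -> DENY
```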

Time-bounded policies (`valid_from` / `valid_until`)

Both fields are wall-clock UTC timestamps stored on every policy row. The design intent is a half-open [valid_from, valid_until) window: policies outside the window should be skipped before the CEL Condition runs, so an out-of-window policy never evaluates.

Recurring windows (“deny outside business hours every weekday”) are not expressible against the v1 CEL env, because CEL has no binding for the current time (no “now” variable). The eventual filter will be wall-clock only.
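A half-open window check of the kind described above might look like the following. It is a sketch of the stated design intent, not the shipped filter: the function name in_window is hypothetical, and treating a missing bound as "unbounded on that side" is an assumption based on both fields being optional.

```python
from datetime import datetime, timezone
from typing import Optional


def in_window(now: datetime,
              valid_from: Optional[datetime],
              valid_until: Optional[datetime]) -> bool:
    """Return True when now falls inside [valid_from, valid_until)."""
    if valid_from is not None and now < valid_from:
        return False  # not active yet
    if valid_until is not None and now >= valid_until:
        return False  # expired; the upper bound is exclusive
    return True


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
end = datetime(2025, 1, 15, tzinfo=timezone.utc)

print(in_window(start, start, end))  # True: lower bound is inclusive
print(in_window(end, start, end))    # False: upper bound is exclusive
print(in_window(end, start, None))   # True: no valid_until means no expiry
```

Running this check before the Condition means an out-of-window policy costs nothing to evaluate and cannot deny.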

Why empty `Condition` is rejected

A policy without a Condition is RBAC-redundant — its allow/deny effect is fully expressible as an L1 verb assignment. We reject empty Condition at write time (decision T1) so that the schema cannot drift back into the silent-ignore state that motivated this whole reintroduction. If you want a policy that always denies a verb, change the role assignment in L1; if you want a conditional deny, write the condition.

Audit hash chain

Every Layer 2 DENY (and every audit-write failure on the DENY path) writes a row to policy_decisions containing the request, subject, decision, and a SHA-256 chain link:

this_hash = SHA256( prev_hash || 0x00 || canonical_row_bytes(this_row) )

The 0x00 byte separates prev_hash from the canonical row bytes so a row whose canonical bytes happen to start with hex characters cannot be confused with a continuation of the prior hash. The implementation is internal/auth/audit/policy_decision.go::ChainHash; the PG-side equivalent runs inside the policy_decisions_chain_trigger BEFORE INSERT trigger (internal/store/migrations/postgres/028_policy_decisions.sql) so Go and PG compute byte-identical hashes from the same row.

prev_hash is the previous row’s this_hash for the same tenant. canonical_row_bytes is a deterministic ASCII-separator-delimited byte sequence (unit separator \x1f between key and value, record separator \x1e between fields, keys sorted alphabetically).
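The formula and the canonical encoding can be sketched together. Several details here are assumptions rather than facts about ChainHash: that prev_hash is carried as a lowercase hex string, that the first row in a tenant's chain uses an empty prev_hash, that all values are already strings encoded as UTF-8, and that the column names in the sample row exist. The sketch is therefore not guaranteed to reproduce the hashes Ironflow writes.

```python
import hashlib
from typing import Mapping

UNIT_SEP = b"\x1f"    # between a key and its value
RECORD_SEP = b"\x1e"  # between fields


def canonical_row_bytes(row: Mapping[str, str]) -> bytes:
    # Keys are sorted so the same row always yields the same bytes.
    fields = [
        key.encode("utf-8") + UNIT_SEP + row[key].encode("utf-8")
        for key in sorted(row)
    ]
    return RECORD_SEP.join(fields)


def chain_hash(prev_hash: str, row: Mapping[str, str]) -> str:
    # this_hash = SHA256( prev_hash || 0x00 || canonical_row_bytes(this_row) )
    data = prev_hash.encode("ascii") + b"\x00" + canonical_row_bytes(row)
    return hashlib.sha256(data).hexdigest()


# Hypothetical column names, for illustration only.
row = {"decision": "deny", "action": "functions:invoke", "subject_id": "user_123"}

first = chain_hash("", row)      # assumed genesis: empty prev_hash
second = chain_hash(first, row)  # same row, different position in the chain

print(canonical_row_bytes({"b": "2", "a": "1"}))  # b'a\x1f1\x1eb\x1f2'
print(first != second)                            # True
```

Because prev_hash feeds into every link, two identical rows at different positions still get different hashes.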

The insert trigger takes pg_advisory_xact_lock(hashtext(tenant_id)::bigint) — a single-key per-tenant advisory lock that auto-releases at COMMIT/ROLLBACK — so concurrent writers serialize on one chain per tenant. The seq column (monotonic per tenant, allocated inside the lock) is the canonical ordering; prev_hash and this_hash are integrity, not order.

This is tamper-evident, not tamper-proof. An attacker with PG write rights can rewrite the chain end-to-end and recompute hashes. We claim only what we can deliver. Customers needing true tamper-proof storage should ship audit rows to an external WORM target; that integration is not in v1 scope.
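The difference between tamper-evident and tamper-proof shows up clearly in a small verification sketch. The helper names are hypothetical, each row's canonical bytes are taken as given, and prev_hash is assumed to be a hex string that is empty for the first row. The verifier walks rows in seq order and reports the first row whose link does not recompute.

```python
import hashlib
from typing import List, Optional


def link(prev_hash: str, canonical: bytes) -> str:
    return hashlib.sha256(prev_hash.encode("ascii") + b"\x00" + canonical).hexdigest()


def build_chain(payloads: List[bytes]) -> List[dict]:
    rows, prev = [], ""
    for seq, canonical in enumerate(payloads, start=1):
        this_hash = link(prev, canonical)
        rows.append({"seq": seq, "prev_hash": prev,
                     "this_hash": this_hash, "canonical": canonical})
        prev = this_hash
    return rows


def first_broken_seq(rows: List[dict]) -> Optional[int]:
    """Return the seq of the first row that fails verification, or None."""
    prev = ""
    for row in sorted(rows, key=lambda r: r["seq"]):
        if row["prev_hash"] != prev or row["this_hash"] != link(prev, row["canonical"]):
            return row["seq"]
        prev = row["this_hash"]
    return None


chain = build_chain([b"deny-1", b"deny-2", b"deny-3"])

# An in-place edit that leaves the stored hashes alone is detected.
edited = [dict(r) for r in chain]
edited[1]["canonical"] = b"allow-2"
print(first_broken_seq(edited))     # 2

# A full rewrite with recomputed hashes verifies cleanly: evident, not proof.
rewritten = build_chain([b"deny-1", b"allow-2", b"deny-3"])
print(first_broken_seq(rewritten))  # None
```

Detecting the second case requires a copy of some this_hash held outside the database, which is what an external WORM target provides.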

What CEL does not see

By design:

- No engine global. CEL cannot reach into the engine to look up other resources, fan out queries, or call external APIs. Authz decisions are functions of the request, the subject, and time.
- No L1 internals. CEL cannot inspect which RBAC verb was matched at L1. If you need to express “deny when L1 matched a wildcard verb,” restructure your roles instead.
- No mutation. CEL is pure. A Condition cannot write to the database, emit events, or call out. Policies that need side effects belong in workflow code, not in authz.

Common mistakes

- Writing an allow with a Condition and expecting it to grant something. L1 already decided ALLOW or it would not have reached L2. The condition is inert. If you want to grant a permission that L1 does not, change the L1 role assignment.
- Forgetting valid_until on a temporary policy. A “for the next two weeks” deny without valid_until becomes a permanent deny. Use both bounds when the intent is bounded.
- Self-lockout via policies:write deny. If your policy denies policies:write for your own role, you cannot edit policies anymore. The save path runs three preflights (saver subject, each admin subject, synthetic role-only subject) and hard-blocks the save in this case. CLI bypass exists for break-glass; see emergency-bypass.md.
- Assuming subject fields are populated. subject.user_email is empty for raw API keys without a bound user. Test the field before using it: subject.user_email != "" && subject.user_email.endsWith("@acme.com").
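The self-lockout preflight can be pictured as evaluating the candidate deny against a handful of subjects before saving. The sketch below is one reading of the three preflights, not the save path itself: the function name, the shape of the synthetic subject (an ID plus a single role and nothing else), and the assumption that the candidate is a deny policy scoped to policies:write are all hypothetical.

```python
from typing import Callable, List, Mapping, Sequence

# Stand-in for a compiled CEL Condition.
Condition = Callable[[Mapping, Mapping], bool]


def locked_out_subjects(condition: Condition,
                        saver: Mapping,
                        admins: Sequence[Mapping],
                        admin_role: str) -> List[str]:
    """Return IDs of subjects the candidate deny would block from policies:write."""
    request = {"action": "policies:write"}
    # Assumed shape of the "synthetic role-only subject": just the role.
    synthetic = {"id": "synthetic:" + admin_role, "roles": [admin_role]}
    return [s["id"] for s in [saver, *admins, synthetic] if condition(request, s)]


def deny_admin_writes(request: Mapping, subject: Mapping) -> bool:
    # Python rendering of: "admin" in subject.roles
    return "admin" in subject["roles"]


saver = {"id": "user_1", "roles": ["admin"]}
admins = [{"id": "user_2", "roles": ["admin", "oncall"]}]

blocked = locked_out_subjects(deny_admin_writes, saver, admins, "admin")
print(blocked)  # ['user_1', 'user_2', 'synthetic:admin']
# A non-empty list would correspond to the hard-blocked save described above.
```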

Where to next

- Author your first policy: author-policy.md
- Debug a deny you didn’t expect: debug-deny.md
- Roll back a bad policy: manage-versions.md
- Architectural rationale: ADR 0016
- Cluster security model: security.md