Emergency bypass for policy self-lockout
A bad CEL policy can deny policies:write for the very role that needs to fix it. The save path’s three-stage preflight (saver subject + per-admin subject + synthetic role-only subject) catches the obvious cases at write time, but an already-live policy can still strand admins, for example through a valid_from that activated a deny earlier than intended.
This page documents the break-glass CLI bypass. Use it when you understand exactly what is happening and why; do not use it as a routine workaround for preflight failures.
Security note. The bypass requires direct shell access to a node running the Ironflow binary with platform credentials. It is not exposed via the dashboard or HTTP. There is no “remote bypass” — by design.
When to use the bypass
Bypass is the right tool only when:
- A live policy denies policies:write for every admin role under the current now.
- Rolling forward (a corrective edit) is itself blocked by the self-lockout preflight.
- Rolling back to a known-good version is also blocked (the rollback runs the same preflight).
- You can produce a written incident note explaining the change.
If any of those four conditions is unmet, do not use bypass. The preflight is doing its job.
Common false alarms:
- Preflight names one admin who is on PTO. Use a different admin’s credentials. The preflight blocks because at least one admin is locked out, not necessarily because all are.
- Preflight names a synthetic role-only subject. This is the third preflight stage (OV4) and means the policy denies the role itself, not just specific people. This is rare — usually the right fix is to scope the policy more narrowly. Consider whether the policy is correct and the role assignments are wrong.
- You can wait for valid_until. If the offending policy has an upcoming expiry, waiting it out is safer than bypassing.
Procedure
Step 1 — Document the incident
Write the incident note first, before bypassing. The note must include:
- The policy name and version that is blocking the corrective edit.
- The audit row(s) demonstrating the lockout.
- The corrective action you are about to take.
- Your identity and timestamp.
This note goes into the change record. It is the audit trail that explains why a bypass exists.
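A skeleton for the note can be generated so that none of the four fields is forgotten. This is a sketch only: write_incident_note, the file layout, and the placeholder text are hypothetical and not part of Ironflow; adapt them to whatever your change-record system expects.

```shell
# Hypothetical helper: write an incident-note skeleton containing the four
# fields this page requires. Names and layout are illustrative only.
write_incident_note() {
  # $1 = output file, $2 = policy name, $3 = policy version, $4 = author
  {
    printf 'Incident note (self-lockout bypass)\n'
    printf 'Blocking policy: %s (version %s)\n' "$2" "$3"
    printf 'Audit rows demonstrating the lockout: <fill in>\n'
    printf 'Corrective action: <fill in>\n'
    printf 'Author: %s\n' "$4"
    # UTC timestamp taken at the moment the note is created.
    printf 'Timestamp (UTC): %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
  } > "$1"
}
```

The two "<fill in>" lines are left as placeholders on purpose, since the audit rows and the corrective action have to be written by the person handling the incident.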
Step 2 — Locate a node and platform key
Bypass is gated on a platform-tier credential (ifplatform_*), not a tenant API key. The bypass flag rejects tenant credentials with an explicit error, so a leaked tenant key cannot be used to bypass the preflight.
    # On a node with the binary:
    which ironflow
    echo "$IRONFLOW_PLATFORM_KEY" | head -c 12   # confirm ifplatform_ prefix

Step 3 — Validate the corrective policy first
Run the corrective condition through policy test against the same subjects the preflight flagged. You want to be sure the bypassed save is actually correct: bypass disables the preflight’s safety check, not its intent.
    ironflow policy test \
      --condition '<corrective condition>' \
      --request '{"action":"policies:write","resource":"irn:ironflow:org_acme:proj_default:policy:default:*"}' \
      --subject '{"id":"user_alice","roles":["admin"]}'

Step 4 — Save with bypass
    ironflow policy update <policy-id> \
      --condition '<corrective condition>' \
      --bypass-self-lockout-preflight \
      --bypass-reason "incident <id>: original v4 denied admin policies:write at 14:00 UTC"

--bypass-reason is required when --bypass-self-lockout-preflight is set. The reason is forwarded as an HTTP header and logged to stderr by the CLI. Bypassed writes are still audited normally in the policy_decisions chain.
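Before running the update, the two preconditions this page states (a platform-tier key with the ifplatform_ prefix, and a non-empty reason) can be checked locally. The following is a minimal sketch, not part of the Ironflow CLI: the function name bypass_preconditions_ok and its messages are hypothetical, and the server-side check remains authoritative.

```shell
# Hypothetical local pre-check. Only the ifplatform_ prefix and the required
# reason come from this page; everything else is illustrative.
bypass_preconditions_ok() {
  # $1 = credential, $2 = bypass reason
  case $1 in
    ifplatform_?*) ;;   # prefix followed by at least one more character
    *) echo "refusing: credential is not a platform-tier key" >&2; return 1 ;;
  esac
  case $2 in
    *[![:space:]]*) ;;  # contains at least one non-whitespace character
    *) echo "refusing: bypass reason must not be empty" >&2; return 1 ;;
  esac
  return 0
}
```

Because the check uses pattern matching instead of echo, no part of the key is printed to the terminal or captured in scrollback.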
Step 5 — Verify
    # Confirm the updated policy
    ironflow policy get <policy-id>

    # Confirm the lockout is cleared: a benign update should now pass preflight
    ironflow policy update <policy-id> --name <policy-name>

Step 6 — Close the incident
Update the incident note with:
- Confirmation the bypass succeeded.
- Confirmation the lockout is cleared.
- Any follow-up policy edits needed to prevent recurrence.
- Whether the original policy was reverted, edited, or left in place.
What bypass does not disable
- L1 RBAC. Bypass only skips the L2 self-lockout preflight. If your platform key doesn’t have policies:write at L1, bypass changes nothing; you’ll get the L1 deny.
- CEL compilation. The bypassed save still runs Compile on the new condition. T1 (no empty conditions) still applies. Bypass is only about preflight, not about validity.
- Audit chain. Bypassed writes are audited. The bypass reason is forwarded as an HTTP header and logged to stderr. The chain links normally; verifying the chain after a bypass should succeed.
- Cache invalidation. Bypassed saves bump the tenant epoch the same way normal saves do. The cluster picks up the new policy on next lookup.
Postmortem expectations
Every bypass invocation is a small failure of the preflight model. Write a postmortem covering:
- Why the original policy created the lockout (was it a valid_from typo, an unintended scope, a missing role exclusion?).
- Why the per-admin preflight didn’t catch it at write time. The most common answer: the original save passed preflight against the saver but not against a future-time activation. Consider whether the policy’s valid_from should have been simulated in the preflight (currently it isn’t; that’s a known limitation).
- What changes prevent recurrence: tighter resource patterns, narrower role assignments, additional preflight subjects.
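The valid_from limitation in the second point can be illustrated with a small comparison: if a policy’s valid_from is later than the moment it was saved, a preflight evaluated at save time would not have seen the deny in force. The sketch below is illustrative only; activation_vs_save is a hypothetical helper, not an Ironflow command, and it assumes both timestamps are UTC in the fixed YYYY-MM-DDTHH:MM:SSZ form.

```shell
# Illustrative only: classify a valid_from relative to the save time.
# Assumes both arguments are UTC and use the same fixed-width format
# (YYYY-MM-DDTHH:MM:SSZ), so the digits alone preserve chronological order.
activation_vs_save() {
  # $1 = valid_from, $2 = save time
  from=$(printf '%s' "$1" | tr -cd '0-9')
  saved=$(printf '%s' "$2" | tr -cd '0-9')
  if [ "$from" -gt "$saved" ]; then
    echo "future"   # the deny activated after the save-time preflight ran
  else
    echo "active"   # the policy was already in force when preflight ran
  fi
}
```

A "future" result during a postmortem suggests the lockout falls under the known limitation and not under a gap in the preflight subjects.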
Postmortems for bypass invocations feed the C6 anomaly-detection follow-up (deferred per ADR 0016). When that ships, repeated bypass patterns become alertable.
What bypass is not for
- Production tenants whose admins forgot their credentials. That’s an account recovery problem, not a policy problem.
- Tenant admins who don’t like the deny. L2 policies exist for a reason; if a deny is wrong, edit it normally — preflight only blocks self-lockouts, not edits in general.
- Policy import or template install. Use the template bundle workflow (ironflow policy template install); templates are pre-vetted and the install path runs LintTemplate on the bundle.
Related
- Conceptual model: Authorization Policies, self-lockout section
- Architectural rationale: ADR 0016, decision S3
- Investigating the deny that triggered the lockout: debug-deny.md
- Rolling back instead of bypassing: manage-versions.md