kubectl Operations

This guide covers common kubectl operations for managing a running Ironflow deployment on Kubernetes. It assumes you have already deployed Ironflow using the Helm Chart or Hetzner Cloud guide.

Kubeconfig Setup

Before running any kubectl commands, you need to point your CLI at the right cluster. When you provision a cluster with ironflow provision create, the kubeconfig is saved to two locations:

  1. Workspace-local (gitignored): deploy/terraform/hetzner/kubeconfig
  2. Durable copy (survives across workspaces): ~/.kube/clusters/hetzner-<name>.yaml

The workspace-local copy is gitignored and won’t be present in new workspaces or git worktrees. Use the durable copy instead:

Terminal window
# Set for the current shell session
export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml
# Verify connectivity
kubectl get nodes

Persist across shell sessions

Add the export to your shell profile (~/.zshrc or ~/.bashrc) so kubectl always targets your cluster:

Terminal window
echo 'export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml' >> ~/.zshrc

Alternatively, merge it into your default kubeconfig so it’s available as a named context:

Terminal window
# Merge into ~/.kube/config (back up first)
cp ~/.kube/config ~/.kube/config.bak
KUBECONFIG=~/.kube/config:~/.kube/clusters/hetzner-ironflow.yaml kubectl config view --flatten > /tmp/merged-config
mv /tmp/merged-config ~/.kube/config
# Switch between contexts
kubectl config get-contexts
kubectl config use-context <context-name>

If the durable copy is missing

If you provisioned the cluster before the auto-save feature was added, or the save failed, copy the kubeconfig manually from a workspace that still has the original file:

Terminal window
mkdir -p ~/.kube/clusters
cp deploy/terraform/hetzner/kubeconfig ~/.kube/clusters/hetzner-ironflow.yaml

Namespace conventions

Ironflow server pods run in the ironflow namespace. Infrastructure (NATS, CloudNativePG PostgreSQL) lives in ironflow when using the bundled Helm chart configuration, or ironflow-system when deployed separately (e.g., via the Hetzner bootstrap script). The CNPG operator runs in cnpg-system. Adjust namespace flags accordingly.
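
To confirm which layout your cluster uses, list the namespaces mentioned above (a quick sanity check; names come from this guide):

Terminal window
kubectl get namespaces | grep -E 'ironflow|cnpg'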

Cluster Overview

Get a full picture of all Ironflow components at a glance. With the bundled chart, everything runs in the ironflow namespace:

Terminal window
kubectl get all -n ironflow
Terminal window
# Ironflow server pods only
kubectl get pods -n ironflow -l app.kubernetes.io/component=server
# Recent events (useful for diagnosing scheduling or startup issues)
kubectl get events -n ironflow --sort-by=.lastTimestamp

Health Checks

Ironflow Health

Terminal window
# Pod status
kubectl get pods -n ironflow -l app.kubernetes.io/component=server
# Port-forward and check the health endpoint
kubectl port-forward svc/ironflow -n ironflow 9123:9123
# In another terminal:
curl http://localhost:9123/health
# {"status":"healthy","timestamp":"...","version":"..."}
# Detailed overview with component status and stats
curl http://localhost:9123/api/v1/overview
Terminal window
# Check readiness (includes NATS connectivity)
curl http://localhost:9123/ready
# {"status":"ready"} or {"status":"not ready","issues":{"nats":"disconnected"}}

/health and /ready do not require authentication. /health is the liveness probe (checks PostgreSQL only). /ready is the readiness probe (checks PostgreSQL + NATS). The /api/v1/overview endpoint requires an API key (or dev mode).
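
If you prefer to skip the port-forward, the same endpoints can be hit from inside a server pod (a sketch using kubectl exec against the deployment; the troubleshooting section below shows that wget is available in the server image):

Terminal window
# Liveness and readiness checks from inside a server pod (no port-forward needed)
kubectl exec -n ironflow deploy/ironflow -- wget -qO- http://localhost:9123/health
kubectl exec -n ironflow deploy/ironflow -- wget -qO- http://localhost:9123/ready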

NATS Health

Terminal window
kubectl get pods -n ironflow -l app.kubernetes.io/name=nats
# NATS health check
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/healthz
# JetStream status (streams, consumers, storage)
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/jsz
# Cluster route connections (clustered NATS only)
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/routez

PostgreSQL Health

Terminal window
# CNPG cluster status
kubectl get cluster ironflow-postgresql -n ironflow
# Pod status
kubectl get pods -n ironflow -l cnpg.io/cluster=ironflow-postgresql
# Check the primary is accepting connections
kubectl exec -n ironflow \
$(kubectl get pod -l cnpg.io/cluster=ironflow-postgresql,role=primary -n ironflow -o name) \
-- pg_isready -U ironflow -d ironflow

Viewing Logs

Ironflow Server Logs

Terminal window
# Tail recent logs from all Ironflow pods
kubectl logs -l app.kubernetes.io/component=server -n ironflow --tail=50
# Stream logs in real time
kubectl logs -l app.kubernetes.io/component=server -n ironflow -f
# Logs from the last 15 minutes
kubectl logs -l app.kubernetes.io/component=server -n ironflow --since=15m
# All pods with pod-name prefix (useful for multi-replica)
kubectl logs -l app.kubernetes.io/component=server -n ironflow --all-containers --prefix

Set LOG_LEVEL to debug for detailed output. Update via Helm values (ironflow.logLevel) and run helm upgrade — pods restart automatically when the ConfigMap changes.

NATS Logs

Terminal window
kubectl logs ironflow-nats-0 -n ironflow -c nats --tail=30

PostgreSQL Logs

Terminal window
kubectl logs -l cnpg.io/cluster=ironflow-postgresql -n ironflow --tail=30

Accessing the Dashboard

Terminal window
kubectl port-forward svc/ironflow -n ironflow 9123:9123

Open http://localhost:9123 in your browser.

In production mode (devMode: false), you need the admin API key and password from the bootstrap logs. These are printed only once on first boot:

Terminal window
kubectl logs -n ironflow $(kubectl get pods -n ironflow -l app.kubernetes.io/component=server -o name | head -1) | grep -A8 "Admin API Key"

Scaling

Manual Scaling

Terminal window
kubectl scale deployment ironflow -n ironflow --replicas=3

Scaling beyond 1 replica requires cluster mode (cluster.enabled: true in Helm values) with external NATS and PostgreSQL. See the Docker Compose Deployment guide.
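A rough sketch of the Helm flags involved: externalNats.url is referenced later in this guide, but replicaCount and the external database keys are assumptions, so check the chart's values.yaml before applying:

Terminal window
# Hypothetical example: switch to cluster mode before scaling (verify key names in values.yaml)
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values \
--set cluster.enabled=true \
--set externalNats.url=nats://my-nats:4222 \
--set replicaCount=3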

Checking Autoscaler Status

If HPA is enabled (autoscaling.enabled: true):

Terminal window
# HPA status
kubectl get hpa ironflow -n ironflow
# Detailed metrics and scaling events
kubectl describe hpa ironflow -n ironflow

Pod Disruption Budget

Terminal window
kubectl get pdb ironflow -n ironflow

Configuration

Viewing the ConfigMap

The Helm chart generates an ironflow.yaml ConfigMap from Helm values. This is the server’s runtime configuration:

Terminal window
kubectl get configmap ironflow-config -n ironflow -o yaml
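
To see only the rendered configuration file rather than the full ConfigMap wrapper, extract the ironflow.yaml key (dots in the key name must be escaped in jsonpath):

Terminal window
kubectl get configmap ironflow-config -n ironflow \
-o jsonpath='{.data.ironflow\.yaml}'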

Viewing Secrets

Terminal window
# List secret keys
kubectl get secret ironflow-secret -n ironflow -o yaml
# Decode a specific key
kubectl get secret ironflow-secret -n ironflow \
-o jsonpath='{.data.database-url}' | base64 -d

Never share decoded secret values. In bundled deployments the database URL lives in the CNPG-managed <release>-postgresql-app secret (key uri); the chart-managed <release>-secret holds master-key, license-key, and (for external DB without existingSecret) database-url.
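
For example, in a bundled deployment you can read the connection URI from the CNPG-managed secret mentioned above (release name ironflow assumed, matching the rest of this guide):

Terminal window
kubectl get secret ironflow-postgresql-app -n ironflow \
-o jsonpath='{.data.uri}' | base64 -d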

Updating Configuration

Configuration changes go through Helm, not direct ConfigMap edits — the deployment template includes checksum annotations that trigger automatic pod restarts when content changes:

Terminal window
# Example: enable debug logging
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values --set ironflow.logLevel=debug
# Example: enable metrics
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values --set observability.metrics.enabled=true

Restarting Pods

Terminal window
# Rolling restart (zero downtime with 2+ replicas)
kubectl rollout restart deployment/ironflow -n ironflow
# Watch the rollout progress
kubectl rollout status deployment/ironflow -n ironflow
# Restart a single pod (the deployment controller recreates it)
kubectl delete pod <pod-name> -n ironflow

Pod restarts happen automatically after helm upgrade when ConfigMap or Secret content changes, thanks to the checksum/config and checksum/secret annotations in the deployment template.
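
You can confirm this by inspecting the annotations on the pod template; the checksum values change whenever the rendered ConfigMap or Secret content changes:

Terminal window
kubectl get deployment ironflow -n ironflow \
-o jsonpath='{.spec.template.metadata.annotations}'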

NATS Operations

Checking JetStream Streams

Terminal window
# JetStream summary
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/jsz
# Stream details (includes message counts and storage)
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- 'http://localhost:8222/jsz?streams=1'

Key streams and KV buckets that Ironflow creates:

Resource          | Type               | Purpose
STEPS             | Stream (WorkQueue) | Pull-mode job dispatch to workers
SYS_cron_triggers | KV bucket          | Cron fire deduplication across nodes
SYS_config_*      | KV bucket          | Environment-scoped configuration
SYS_secrets_*     | KV bucket          | Encrypted secrets storage
APP_*             | KV bucket          | User-facing KV store data

Using the nats CLI Inside the Cluster

For interactive debugging, run a temporary nats-box pod:

Terminal window
kubectl run nats-box --image=natsio/nats-box -n ironflow -it --rm -- sh
# Inside the pod (service name is <release>-nats):
nats stream ls -s nats://ironflow-nats:4222
nats kv ls -s nats://ironflow-nats:4222
nats stream info STEPS -s nats://ironflow-nats:4222

The nats-box image includes the nats CLI and is useful for inspecting JetStream state without installing tools locally. The pod is deleted automatically when you exit (--rm).
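
While the nats-box pod is running you can also inspect consumer backlog on the STEPS work queue (a sketch; consumer names depend on your deployment):

Terminal window
# Inside the nats-box pod:
nats consumer ls STEPS -s nats://ironflow-nats:4222
nats consumer info STEPS <consumer-name> -s nats://ironflow-nats:4222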

PostgreSQL Operations

Connecting to the Database

Terminal window
# Connect to the primary via a temporary pod (passes the auto-generated password)
kubectl run pg-client --rm -it --image=postgres:17-alpine -n ironflow \
--env="PGPASSWORD=$(kubectl get secret ironflow-postgresql-app -n ironflow -o jsonpath='{.data.password}' | base64 -d)" \
-- psql -h ironflow-postgresql-rw -U ironflow -d ironflow

Be careful running write queries against production. Use \x for expanded display and test queries with EXPLAIN first.

Useful Queries

-- Active runs
SELECT count(*) FROM runs WHERE status = 'running';
-- Run breakdown by function and status
SELECT function_id, status, count(*) FROM runs GROUP BY function_id, status;
-- Stale scheduler claims (multi-node clusters)
SELECT * FROM steps
WHERE status = 'waking'
AND claimed_at < NOW() - INTERVAL '5 minutes';
-- Recent failures
SELECT id, function_id, status, error, created_at
FROM runs WHERE status = 'failed'
ORDER BY created_at DESC LIMIT 10;
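
Before running anything heavier ad hoc, preview the plan first (a sketch; the function_id value is a placeholder):

-- Preview the plan for an ad-hoc query before running it against production
EXPLAIN SELECT id, status, created_at
FROM runs
WHERE function_id = 'my-function' AND status = 'failed';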

CNPG Cluster Status

For production deployments using CloudNativePG:

Terminal window
# Cluster overview (instances, ready count, phase)
kubectl get cluster ironflow-db -n ironflow-system
# Detailed status including failover history
kubectl describe cluster ironflow-db -n ironflow-system
# Check PVCs (persistent storage)
kubectl get pvc -n ironflow-system -l cnpg.io/cluster=ironflow-db
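
If you have the CloudNativePG kubectl plugin installed, it provides a richer status view (optional; not required for the commands above):

Terminal window
kubectl cnpg status ironflow-db -n ironflow-system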

Upgrading Ironflow

Rolling Update

Terminal window
# From local chart with a specific image tag
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values --set image.tag=0.16.0
# From OCI registry
helm upgrade ironflow oci://ghcr.io/sahina/charts/ironflow -n ironflow \
--reuse-values
# Watch the rollout
kubectl rollout status deployment/ironflow -n ironflow

Ironflow applies database migrations automatically on startup. With PDB enabled (podDisruptionBudget.minAvailable: 1), at least one pod stays available during the upgrade.

Rollback

Terminal window
# View release history
helm history ironflow -n ironflow
# Revert to previous release
helm rollback ironflow -n ironflow

Troubleshooting

CrashLoopBackOff

Terminal window
# Check events and exit codes
kubectl describe pod <pod-name> -n ironflow
# Logs from the previous (crashed) container
kubectl logs <pod-name> -n ironflow --previous

Common causes: database connection failure (check the database URL in ironflow-secret), NATS unreachable (check externalNats.url in Helm values), missing master key.
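
A quick sketch for checking each of those in turn (secret and ConfigMap names match the rest of this guide):

Terminal window
# Database URL the server is using
kubectl get secret ironflow-secret -n ironflow -o jsonpath='{.data.database-url}' | base64 -d
# NATS URL in the rendered config
kubectl get configmap ironflow-config -n ironflow -o yaml | grep -i nats
# Confirm the master key is present (prints its length, not the value)
kubectl get secret ironflow-secret -n ironflow -o jsonpath='{.data.master-key}' | base64 -d | wc -c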

OOMKilled

Terminal window
kubectl describe pod <pod-name> -n ironflow
# Look for "OOMKilled" in Last State

Increase memory limits:

Terminal window
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values --set resources.limits.memory=1Gi

Pending Pods

Terminal window
kubectl describe pod <pod-name> -n ironflow
# Check Conditions and Events sections

Common causes: insufficient node resources (add workers or increase node size), PVC not bound (storage class missing), node affinity constraints.
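
To narrow down which cause applies, check node capacity and PVC binding:

Terminal window
# Node capacity and current allocations
kubectl describe nodes | grep -A5 "Allocated resources"
# Unbound PVCs in the Ironflow namespaces
kubectl get pvc -n ironflow
kubectl get pvc -n ironflow-system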

ImagePullBackOff

The container image is likely in a private registry. Create an image pull secret:

Terminal window
kubectl create secret docker-registry ghcr-pull-secret \
--namespace ironflow \
--docker-server=ghcr.io \
--docker-username=YOUR_USERNAME \
--docker-password=YOUR_TOKEN
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values --set "imagePullSecrets[0].name=ghcr-pull-secret"

Database Connection Refused

Terminal window
# Verify PostgreSQL is running
kubectl get pods -n ironflow-system -l cnpg.io/cluster=ironflow-db
# Check the database URL in the secret
kubectl get secret ironflow-secret -n ironflow \
-o jsonpath='{.data.database-url}' | base64 -d
# Verify the service exists
kubectl get svc -n ironflow-system | grep ironflow-db
# Test from an Ironflow pod (/health checks the PostgreSQL connection)
kubectl exec -it -n ironflow $(kubectl get pods -n ironflow -l app.kubernetes.io/component=server -o name | head -1) \
-- wget -qO- http://localhost:9123/health
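
If the health endpoint reports a database issue, you can also probe the database service directly from a throwaway pod (the service hostname below is an assumption based on the CNPG naming pattern used elsewhere in this guide; adjust it to whatever the grep above shows):

Terminal window
kubectl run pg-check --rm -it --image=postgres:17-alpine -n ironflow \
-- pg_isready -h ironflow-db-rw.ironflow-system.svc -U ironflow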

NATS Connection Issues

Terminal window
# Verify NATS pods are running
kubectl get pods -n ironflow-system -l app.kubernetes.io/name=nats
# Check NATS health from inside the cluster
kubectl exec -n ironflow-system nats-0 -c nats -- wget -qO- http://localhost:8222/healthz
# Check the NATS URL in the ConfigMap
kubectl get configmap ironflow-config -n ironflow -o yaml | grep nats
# Check PVCs for JetStream storage
kubectl get pvc -n ironflow-system

Quick Reference

Task                             | Command
Set kubeconfig                   | export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml
Pod status                       | kubectl get pods -n ironflow -l app.kubernetes.io/component=server
Tail logs                        | kubectl logs -l app.kubernetes.io/component=server -n ironflow -f --tail=50
Port-forward dashboard           | kubectl port-forward svc/ironflow -n ironflow 9123:9123
Liveness check                   | curl http://localhost:9123/health
Readiness check                  | curl http://localhost:9123/ready
Restart pods                     | kubectl rollout restart deployment/ironflow -n ironflow
Scale replicas                   | kubectl scale deployment ironflow -n ironflow --replicas=N
View config                      | kubectl get configmap ironflow-config -n ironflow -o yaml
View events                      | kubectl get events -n ironflow --sort-by=.lastTimestamp
JetStream status (dev)           | kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/jsz
JetStream status (prod)          | kubectl exec -n ironflow-system nats-0 -c nats -- wget -qO- http://localhost:8222/jsz
Connect to PostgreSQL (CNPG dev) | kubectl run pg-client --rm -it --image=postgres:17-alpine -n ironflow --env="PGPASSWORD=$(kubectl get secret ironflow-postgresql-app -n ironflow -o jsonpath='{.data.password}' | base64 -d)" -- psql -h ironflow-postgresql-rw -U ironflow -d ironflow

Monitoring Operations

If the monitoring stack is deployed (kube-prometheus-stack + BlackBox Exporter):

Terminal window
# Access Grafana
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80
# Open http://localhost:3000 (admin credentials from grafana-admin secret)
# Access Prometheus
kubectl port-forward svc/prometheus-operated -n monitoring 9090:9090
# Check monitoring pods
kubectl get pods -n monitoring
# Check Prometheus targets
# Port-forward Prometheus, then visit http://localhost:9090/targets
# Validate alert rules (render the chart and pipe it to promtool; alerts live in the Ironflow Helm chart)
helm template ironflow deploy/helm/ironflow/ --show-only templates/ironflow-alerts.yaml \
| promtool check rules /dev/stdin
# Check Alertmanager status
kubectl port-forward svc/kube-prometheus-stack-alertmanager -n monitoring 9093:9093

See Also