kubectl Operations
This guide covers common kubectl operations for managing a running Ironflow deployment on Kubernetes. It assumes you have already deployed Ironflow by following the Helm Chart or Hetzner Cloud deployment guide.
Kubeconfig Setup
Before running any kubectl commands, you need to point your CLI at the right cluster. When you provision a cluster with ironflow provision create, the kubeconfig is saved to two locations:
- Workspace-local (gitignored): deploy/terraform/hetzner/kubeconfig
- Durable copy (survives across workspaces): ~/.kube/clusters/hetzner-<name>.yaml
The workspace-local copy is gitignored and won’t be present in new workspaces or git worktrees. Use the durable copy instead:
```bash
# Set for the current shell session
export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml

# Verify connectivity
kubectl get nodes
```
Persist across shell sessions
Add the export to your shell profile (~/.zshrc or ~/.bashrc) so kubectl always targets your cluster:
```bash
echo 'export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml' >> ~/.zshrc
```
Alternatively, merge it into your default kubeconfig so it’s available as a named context:
```bash
# Merge into ~/.kube/config (back up first)
cp ~/.kube/config ~/.kube/config.bak
KUBECONFIG=~/.kube/config:~/.kube/clusters/hetzner-ironflow.yaml kubectl config view --flatten > /tmp/merged-config
mv /tmp/merged-config ~/.kube/config

# Switch between contexts
kubectl config get-contexts
kubectl config use-context <context-name>
```
If the durable copy is missing
If you provisioned the cluster before the auto-save feature was added, or the save failed, copy it manually from wherever the original kubeconfig file exists:
```bash
mkdir -p ~/.kube/clusters
cp deploy/terraform/hetzner/kubeconfig ~/.kube/clusters/hetzner-ironflow.yaml
```
Namespace conventions
Ironflow server pods run in the ironflow namespace. Infrastructure (NATS, CloudNativePG PostgreSQL) lives in ironflow when using the bundled Helm chart configuration, or ironflow-system when deployed separately (e.g., via the Hetzner bootstrap script). The CNPG operator runs in cnpg-system. Adjust namespace flags accordingly.
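A quick way to confirm which layout you are working with is to list the namespaces and see where the infrastructure pods landed. This sketch just combines commands that appear later in this guide:

```bash
# Confirm which namespaces exist and where NATS runs
kubectl get namespaces | grep -E 'ironflow|cnpg'
kubectl get pods -n ironflow -l app.kubernetes.io/name=nats
kubectl get pods -n ironflow-system -l app.kubernetes.io/name=nats
```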
Cluster Overview
Get a full picture of all Ironflow components at a glance:
Everything runs in the ironflow namespace:
```bash
kubectl get all -n ironflow
```
Ironflow in ironflow, infrastructure in ironflow-system:
```bash
# Ironflow server
kubectl get all -n ironflow

# NATS and PostgreSQL
kubectl get all -n ironflow-system

# Ironflow server pods only
kubectl get pods -n ironflow -l app.kubernetes.io/component=server

# Recent events (useful for diagnosing scheduling or startup issues)
kubectl get events -n ironflow --sort-by=.lastTimestamp
```
Health Checks
Ironflow Health
```bash
# Pod status
kubectl get pods -n ironflow -l app.kubernetes.io/component=server

# Port-forward and check the health endpoint
kubectl port-forward svc/ironflow -n ironflow 9123:9123

# In another terminal:
curl http://localhost:9123/health
# {"status":"healthy","timestamp":"...","version":"..."}

# Detailed overview with component status and stats
curl http://localhost:9123/api/v1/overview

# Check readiness (includes NATS connectivity)
curl http://localhost:9123/ready
# {"status":"ready"} or {"status":"not ready","issues":{"nats":"disconnected"}}
```
/health and /ready do not require authentication. /health is the liveness probe (checks PostgreSQL only). /ready is the readiness probe (checks PostgreSQL + NATS). The /api/v1/overview endpoint requires an API key (or dev mode).
NATS Health
Bundled NATS (ironflow namespace):
```bash
kubectl get pods -n ironflow -l app.kubernetes.io/name=nats

# NATS health check
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/healthz

# JetStream status (streams, consumers, storage)
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/jsz

# Cluster route connections (clustered NATS only)
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/routez
```
Separately deployed NATS (ironflow-system namespace):
```bash
kubectl get pods -n ironflow-system -l app.kubernetes.io/name=nats

# NATS health check
kubectl exec -n ironflow-system nats-0 -c nats -- wget -qO- http://localhost:8222/healthz

# JetStream status (streams, consumers, storage)
kubectl exec -n ironflow-system nats-0 -c nats -- wget -qO- http://localhost:8222/jsz

# Cluster route connections
kubectl exec -n ironflow-system nats-0 -c nats -- wget -qO- http://localhost:8222/routez
```
PostgreSQL Health
Bundled PostgreSQL (ironflow-postgresql in the ironflow namespace):
```bash
# CNPG cluster status
kubectl get cluster ironflow-postgresql -n ironflow

# Pod status
kubectl get pods -n ironflow -l cnpg.io/cluster=ironflow-postgresql

# Check the primary is accepting connections
kubectl exec -n ironflow \
  $(kubectl get pod -l cnpg.io/cluster=ironflow-postgresql,role=primary -n ironflow -o name) \
  -- pg_isready -U ironflow -d ironflow
```
Separately deployed PostgreSQL (ironflow-db in ironflow-system):
```bash
# Pod status
kubectl get pods -n ironflow-system -l cnpg.io/cluster=ironflow-db

# CNPG cluster status (instances, ready count, phase)
kubectl get cluster ironflow-db -n ironflow-system

# Check the primary is accepting connections
kubectl exec -n ironflow-system \
  $(kubectl get pods -n ironflow-system -l cnpg.io/cluster=ironflow-db,role=primary -o name | head -1) \
  -- pg_isready -U ironflow -d ironflow
```
Viewing Logs
Ironflow Server Logs
```bash
# Tail recent logs from all Ironflow pods
kubectl logs -l app.kubernetes.io/component=server -n ironflow --tail=50

# Stream logs in real time
kubectl logs -l app.kubernetes.io/component=server -n ironflow -f

# Logs from the last 15 minutes
kubectl logs -l app.kubernetes.io/component=server -n ironflow --since=15m

# All pods with pod-name prefix (useful for multi-replica)
kubectl logs -l app.kubernetes.io/component=server -n ironflow --all-containers --prefix
```
Set LOG_LEVEL to debug for detailed output. Update via Helm values (ironflow.logLevel) and run helm upgrade — pods restart automatically when the ConfigMap changes.
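To confirm what log level the running pods picked up after an upgrade, you can grep the rendered ConfigMap; the exact key name may differ slightly between chart versions, so treat this as a sketch:

```bash
# Check the rendered configuration for the current log level
kubectl get configmap ironflow-config -n ironflow -o yaml | grep -i level
```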
NATS Logs
Bundled NATS:
```bash
kubectl logs ironflow-nats-0 -n ironflow -c nats --tail=30
```
Separately deployed NATS:
```bash
kubectl logs nats-0 -n ironflow-system -c nats --tail=30

# All NATS pods
kubectl logs -l app.kubernetes.io/name=nats -n ironflow-system --tail=20
```
PostgreSQL Logs
Bundled PostgreSQL:
```bash
kubectl logs -l cnpg.io/cluster=ironflow-postgresql -n ironflow --tail=30
```
Separately deployed PostgreSQL:
```bash
kubectl logs -l cnpg.io/cluster=ironflow-db -n ironflow-system --tail=30
```
Accessing the Dashboard
```bash
kubectl port-forward svc/ironflow -n ironflow 9123:9123
```
Open http://localhost:9123 in your browser.
In production mode (devMode: false), you need the admin API key and password from the bootstrap logs. These are printed only once on first boot:
```bash
kubectl logs -n ironflow \
  $(kubectl get pods -n ironflow -l app.kubernetes.io/component=server -o name | head -1) \
  | grep -A8 "Admin API Key"
```
Scaling
Manual Scaling
```bash
kubectl scale deployment ironflow -n ironflow --replicas=3
```
Scaling beyond 1 replica requires cluster mode (cluster.enabled: true in Helm values) with external NATS and PostgreSQL. See the Docker Compose Deployment guide.
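As a rough sketch, switching on cluster mode before scaling might look like the following. cluster.enabled is the value named above; replicaCount is an assumption, so verify the actual key names against the chart's values.yaml:

```bash
# Hypothetical values; confirm key names in the chart before running
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
  --reuse-values \
  --set cluster.enabled=true \
  --set replicaCount=3
kubectl rollout status deployment/ironflow -n ironflow
```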
Checking Autoscaler Status
If HPA is enabled (autoscaling.enabled: true):
```bash
# HPA status
kubectl get hpa ironflow -n ironflow

# Detailed metrics and scaling events
kubectl describe hpa ironflow -n ironflow
```
Pod Disruption Budget
```bash
kubectl get pdb ironflow -n ironflow
```
Configuration
Viewing the ConfigMap
The Helm chart generates an ironflow.yaml ConfigMap from Helm values. This is the server’s runtime configuration:
```bash
kubectl get configmap ironflow-config -n ironflow -o yaml
```
Viewing Secrets
```bash
# List secret keys
kubectl get secret ironflow-secret -n ironflow -o yaml

# Decode a specific key
kubectl get secret ironflow-secret -n ironflow \
  -o jsonpath='{.data.database-url}' | base64 -d
```
Never share decoded secret values. In bundled deployments the database URL lives in the CNPG-managed <release>-postgresql-app secret (key uri); the chart-managed <release>-secret holds master-key, license-key, and (for external DB without existingSecret) database-url.
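For a bundled deployment (release name ironflow assumed), the effective connection string can therefore be read from the CNPG-managed secret instead:

```bash
# Decode the CNPG-managed connection URI (bundled deployments)
kubectl get secret ironflow-postgresql-app -n ironflow \
  -o jsonpath='{.data.uri}' | base64 -d
```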
Updating Configuration
Configuration changes go through Helm, not direct ConfigMap edits — the deployment template includes checksum annotations that trigger automatic pod restarts when content changes:
```bash
# Example: enable debug logging
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
  --reuse-values --set ironflow.logLevel=debug

# Example: enable metrics
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
  --reuse-values --set observability.metrics.enabled=true
```
Restarting Pods
```bash
# Rolling restart (zero downtime with 2+ replicas)
kubectl rollout restart deployment/ironflow -n ironflow

# Watch the rollout progress
kubectl rollout status deployment/ironflow -n ironflow

# Restart a single pod (the deployment controller recreates it)
kubectl delete pod <pod-name> -n ironflow
```
Pod restarts happen automatically after helm upgrade when ConfigMap or Secret content changes, thanks to the checksum/config and checksum/secret annotations in the deployment template.
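If you want to verify that an upgrade actually rolled the checksums (and therefore the pods), you can inspect the pod template annotations; this is a small sketch using the deployment name from the commands above:

```bash
# Show the checksum/config and checksum/secret annotations on the pod template
kubectl get deployment ironflow -n ironflow \
  -o jsonpath='{.spec.template.metadata.annotations}'
```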
NATS Operations
Checking JetStream Streams
Bundled NATS:
```bash
# JetStream summary
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/jsz

# Stream details (includes message counts and storage)
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- 'http://localhost:8222/jsz?streams=1'
```
Separately deployed NATS:
```bash
# JetStream summary
kubectl exec -n ironflow-system nats-0 -c nats -- wget -qO- http://localhost:8222/jsz

# Stream details
kubectl exec -n ironflow-system nats-0 -c nats -- wget -qO- 'http://localhost:8222/jsz?streams=1'
```
Key streams and KV buckets that Ironflow creates (a quick existence check is sketched after the table):
| Resource | Type | Purpose |
|---|---|---|
| STEPS | Stream (WorkQueue) | Pull-mode job dispatch to workers |
| SYS_cron_triggers | KV bucket | Cron fire deduplication across nodes |
| SYS_config_* | KV bucket | Environment-scoped configuration |
| SYS_secrets_* | KV bucket | Encrypted secrets storage |
| APP_* | KV bucket | User-facing KV store data |
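For example, a quick existence check for the STEPS stream can be run against the same monitoring endpoint used above (bundled layout shown); this just greps the raw JSON locally rather than using the nats CLI:

```bash
# Confirm the STEPS stream exists without installing the nats CLI
kubectl exec -n ironflow ironflow-nats-0 -c nats -- \
  wget -qO- 'http://localhost:8222/jsz?streams=1' | grep -o '"name":"STEPS"'
```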
Using the nats CLI Inside the Cluster
For interactive debugging, run a temporary nats-box pod:
Bundled NATS:
```bash
kubectl run nats-box --image=natsio/nats-box -n ironflow -it --rm -- sh

# Inside the pod (service name is <release>-nats):
nats stream ls -s nats://ironflow-nats:4222
nats kv ls -s nats://ironflow-nats:4222
nats stream info STEPS -s nats://ironflow-nats:4222
```
Separately deployed NATS:
```bash
kubectl run nats-box --image=natsio/nats-box -n ironflow-system -it --rm -- sh

# Inside the pod (Hetzner bootstrap names the release "nats"):
nats stream ls -s nats://nats:4222
nats kv ls -s nats://nats:4222
nats stream info STEPS -s nats://nats:4222
```
The nats-box image includes the nats CLI and is useful for inspecting JetStream state without installing tools locally. The pod is deleted automatically when you exit (--rm).
PostgreSQL Operations
Connecting to the Database
Bundled PostgreSQL:
```bash
# Connect to the primary via a temporary pod (passes the auto-generated password)
kubectl run pg-client --rm -it --image=postgres:17-alpine -n ironflow \
  --env="PGPASSWORD=$(kubectl get secret ironflow-postgresql-app -n ironflow -o jsonpath='{.data.password}' | base64 -d)" \
  -- psql -h ironflow-postgresql-rw -U ironflow -d ironflow
```
Separately deployed PostgreSQL:
```bash
kubectl exec -it -n ironflow-system \
  $(kubectl get pods -n ironflow-system -l cnpg.io/cluster=ironflow-db,role=primary -o name | head -1) \
  -- psql -U ironflow -d ironflow
```
Be careful running write queries against production. Use \x for expanded display and test queries with EXPLAIN first.
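If you only need to inspect data, a read-only session is a safer variant of the bundled connection above. Setting PGOPTIONS this way is standard libpq behavior rather than anything Ironflow-specific:

```bash
# Same as the bundled psql connection, but the session rejects accidental writes
kubectl run pg-client --rm -it --image=postgres:17-alpine -n ironflow \
  --env="PGPASSWORD=$(kubectl get secret ironflow-postgresql-app -n ironflow -o jsonpath='{.data.password}' | base64 -d)" \
  --env="PGOPTIONS=-c default_transaction_read_only=on" \
  -- psql -h ironflow-postgresql-rw -U ironflow -d ironflow
```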
Useful Queries
```sql
-- Active runs
SELECT count(*) FROM runs WHERE status = 'running';

-- Run breakdown by function and status
SELECT function_id, status, count(*) FROM runs GROUP BY function_id, status;

-- Stale scheduler claims (multi-node clusters)
SELECT * FROM steps
WHERE status = 'waking' AND claimed_at < NOW() - INTERVAL '5 minutes';

-- Recent failures
SELECT id, function_id, status, error, created_at
FROM runs WHERE status = 'failed'
ORDER BY created_at DESC LIMIT 10;
```
CNPG Cluster Status
For production deployments using CloudNativePG:
```bash
# Cluster overview (instances, ready count, phase)
kubectl get cluster ironflow-db -n ironflow-system

# Detailed status including failover history
kubectl describe cluster ironflow-db -n ironflow-system

# Check PVCs (persistent storage)
kubectl get pvc -n ironflow-system -l cnpg.io/cluster=ironflow-db
```
Upgrading Ironflow
Rolling Update
```bash
# From local chart with a specific image tag
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
  --reuse-values --set image.tag=0.16.0

# From OCI registry
helm upgrade ironflow oci://ghcr.io/sahina/charts/ironflow -n ironflow \
  --reuse-values

# Watch the rollout
kubectl rollout status deployment/ironflow -n ironflow
```
Ironflow applies database migrations automatically on startup. With PDB enabled (podDisruptionBudget.minAvailable: 1), at least one pod stays available during the upgrade.
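To see that guarantee in action, you can watch the pods cycle while the upgrade runs:

```bash
# With podDisruptionBudget.minAvailable: 1, at least one pod should stay Ready throughout
kubectl get pods -n ironflow -l app.kubernetes.io/component=server -w
```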
Rollback
```bash
# View release history
helm history ironflow -n ironflow

# Revert to previous release
helm rollback ironflow -n ironflow
```
Troubleshooting
CrashLoopBackOff
```bash
# Check events and exit codes
kubectl describe pod <pod-name> -n ironflow

# Logs from the previous (crashed) container
kubectl logs <pod-name> -n ironflow --previous
```
Common causes: database connection failure (check the database URL in ironflow-secret), NATS unreachable (check externalNats.url in Helm values), missing master key.
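The exact exit code and reason can also be pulled directly, which helps when the describe output is noisy; container index 0 is assumed here:

```bash
# Last termination exit code and reason for the first container
kubectl get pod <pod-name> -n ironflow \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode} {.status.containerStatuses[0].lastState.terminated.reason}'
```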
OOMKilled
```bash
kubectl describe pod <pod-name> -n ironflow
# Look for "OOMKilled" in Last State
```
Increase memory limits:
```bash
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
  --reuse-values --set resources.limits.memory=1Gi
```
Pending Pods
```bash
kubectl describe pod <pod-name> -n ironflow
# Check Conditions and Events sections
```
Common causes: insufficient node resources (add workers or increase node size), PVC not bound (storage class missing), node affinity constraints.
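To narrow down which cause applies, check node capacity and PVC binding; kubectl top needs metrics-server, so treat that line as optional:

```bash
# Node capacity vs. requests (kubectl top requires metrics-server)
kubectl describe nodes | grep -A 6 'Allocated resources'
kubectl top nodes

# Unbound PVCs show up as Pending
kubectl get pvc -A
```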
ImagePullBackOff
The container image is likely in a private registry. Create an image pull secret:
```bash
kubectl create secret docker-registry ghcr-pull-secret \
  --namespace ironflow \
  --docker-server=ghcr.io \
  --docker-username=YOUR_USERNAME \
  --docker-password=YOUR_TOKEN

helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
  --reuse-values --set "imagePullSecrets[0].name=ghcr-pull-secret"
```
Database Connection Refused
```bash
# Verify PostgreSQL is running
kubectl get pods -n ironflow-system -l cnpg.io/cluster=ironflow-db

# Check the database URL in the secret
kubectl get secret ironflow-secret -n ironflow \
  -o jsonpath='{.data.database-url}' | base64 -d

# Verify the service exists
kubectl get svc -n ironflow-system | grep ironflow-db
```
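If the service is there, a raw reachability check from a throwaway pod separates networking problems from credential problems. The -rw service name below follows the usual CloudNativePG convention for the ironflow-db cluster; adjust it if your service list shows something different:

```bash
# pg_isready only checks that the server accepts connections; no password needed
kubectl run pg-check --rm -it --image=postgres:17-alpine -n ironflow -- \
  pg_isready -h ironflow-db-rw.ironflow-system.svc.cluster.local -p 5432
```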
```bash
# Test from an Ironflow pod (the /health endpoint checks PostgreSQL connectivity)
kubectl exec -it -n ironflow $(kubectl get pods -n ironflow -l app.kubernetes.io/component=server -o name | head -1) \
  -- wget -qO- http://localhost:9123/health
```
NATS Connection Issues
```bash
# Verify NATS pods are running
kubectl get pods -n ironflow-system -l app.kubernetes.io/name=nats

# Check NATS health from inside the cluster
kubectl exec -n ironflow-system nats-0 -c nats -- wget -qO- http://localhost:8222/healthz

# Check the NATS URL in the ConfigMap
kubectl get configmap ironflow-config -n ironflow -o yaml | grep nats
```
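If the URL looks right but connections still fail, a cross-namespace DNS check can rule out name-resolution problems. This is a sketch: the service name nats in ironflow-system matches the Hetzner bootstrap layout mentioned earlier and may differ in your deployment:

```bash
# Resolve the NATS service name from the Ironflow namespace
kubectl run dns-test --rm -it --image=busybox -n ironflow -- \
  nslookup nats.ironflow-system.svc.cluster.local
```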
```bash
# Check PVCs for JetStream storage
kubectl get pvc -n ironflow-system
```
Quick Reference
| Task | Command |
|---|---|
| Set kubeconfig | export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml |
| Pod status | kubectl get pods -n ironflow -l app.kubernetes.io/component=server |
| Tail logs | kubectl logs -l app.kubernetes.io/component=server -n ironflow -f --tail=50 |
| Port-forward dashboard | kubectl port-forward svc/ironflow -n ironflow 9123:9123 |
| Liveness check | curl http://localhost:9123/health |
| Readiness check | curl http://localhost:9123/ready |
| Restart pods | kubectl rollout restart deployment/ironflow -n ironflow |
| Scale replicas | kubectl scale deployment ironflow -n ironflow --replicas=N |
| View config | kubectl get configmap ironflow-config -n ironflow -o yaml |
| View events | kubectl get events -n ironflow --sort-by=.lastTimestamp |
| JetStream status (dev) | kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/jsz |
| JetStream status (prod) | kubectl exec -n ironflow-system nats-0 -c nats -- wget -qO- http://localhost:8222/jsz |
| Connect to PostgreSQL (CNPG dev) | kubectl run pg-client --rm -it --image=postgres:17-alpine -n ironflow --env="PGPASSWORD=$(kubectl get secret ironflow-postgresql-app -n ironflow -o jsonpath='{.data.password}' \| base64 -d)" -- psql -h ironflow-postgresql-rw -U ironflow -d ironflow |
Monitoring Operations
If the monitoring stack is deployed (kube-prometheus-stack + BlackBox Exporter):
```bash
# Access Grafana
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80
# Open http://localhost:3000 (admin credentials from grafana-admin secret)

# Access Prometheus
kubectl port-forward svc/prometheus-operated -n monitoring 9090:9090

# Check monitoring pods
kubectl get pods -n monitoring

# Check Prometheus targets
# Port-forward Prometheus, then visit http://localhost:9090/targets

# Validate alert rules (render the chart and pipe it to promtool; alerts live in the Ironflow Helm chart)
helm template ironflow deploy/helm/ironflow/ --show-only templates/ironflow-alerts.yaml \
  | promtool check rules /dev/stdin

# Check Alertmanager status
kubectl port-forward svc/kube-prometheus-stack-alertmanager -n monitoring 9093:9093
```
See Also
- Helm Chart Development — chart structure, templating, and local development
- Hetzner Cloud Deployment — automated production Kubernetes on Hetzner
- Docker Compose Deployment — multi-node with PostgreSQL and NATS clustering
- Observability — Prometheus metrics, tracing, and monitoring stack