kubectl Operations

This guide covers common kubectl operations for managing a running Ironflow deployment on Kubernetes. It assumes you have already deployed Ironflow using the Helm Chart or Hetzner Cloud guide.

Kubeconfig Setup

Before running any kubectl commands, you need to point your CLI at the right cluster. When you provision a cluster with ironflow provision create, the kubeconfig is saved to two locations:

  1. Workspace-local (gitignored): deploy/terraform/hetzner/kubeconfig
  2. Durable copy (survives across workspaces): ~/.kube/clusters/hetzner-<name>.yaml

The workspace-local copy is gitignored and won’t be present in new workspaces or git worktrees. Use the durable copy instead:

Terminal window
# Set for the current shell session
export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml
# Verify connectivity
kubectl get nodes

Persist across shell sessions

Add the export to your shell profile (~/.zshrc or ~/.bashrc) so kubectl always targets your cluster:

Terminal window
echo 'export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml' >> ~/.zshrc

Alternatively, merge it into your default kubeconfig so it’s available as a named context:

Terminal window
# Merge into ~/.kube/config (back up first)
cp ~/.kube/config ~/.kube/config.bak
KUBECONFIG=~/.kube/config:~/.kube/clusters/hetzner-ironflow.yaml kubectl config view --flatten > /tmp/merged-config
mv /tmp/merged-config ~/.kube/config
# Switch between contexts
kubectl config get-contexts
kubectl config use-context <context-name>

If the durable copy is missing

If you provisioned the cluster before the auto-save feature was added, or the save failed, copy the kubeconfig manually from a workspace that still has the original file:

Terminal window
mkdir -p ~/.kube/clusters
cp deploy/terraform/hetzner/kubeconfig ~/.kube/clusters/hetzner-ironflow.yaml

Namespace conventions

Ironflow server pods run in the ironflow namespace. Infrastructure (NATS, CloudNativePG PostgreSQL) lives in ironflow when using the bundled Helm chart configuration, or ironflow-system when deployed separately (e.g., via the Hetzner bootstrap script). The CNPG operator runs in cnpg-system. Adjust namespace flags accordingly.
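
To confirm which layout your cluster uses, list the namespaces mentioned above (a quick sanity check; names come from this guide):

Terminal window
kubectl get namespaces | grep -E 'ironflow|cnpg'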

Cluster Overview

Get a full picture of all Ironflow components at a glance. With the bundled chart, everything runs in the ironflow namespace:

Terminal window
kubectl get all -n ironflow
Terminal window
# Ironflow server pods only
kubectl get pods -n ironflow -l app.kubernetes.io/component=server
# Recent events (useful for diagnosing scheduling or startup issues)
kubectl get events -n ironflow --sort-by=.lastTimestamp

Health Checks

Ironflow Health

Terminal window
# Pod status
kubectl get pods -n ironflow -l app.kubernetes.io/component=server
# Port-forward and check the health endpoint
kubectl port-forward svc/ironflow -n ironflow 9123:9123
# In another terminal:
curl http://localhost:9123/health
# {"status":"healthy","timestamp":"...","version":"..."}
# Detailed overview with component status and stats
curl http://localhost:9123/api/v1/overview
Terminal window
# Check readiness (includes NATS connectivity)
curl http://localhost:9123/ready
# {"status":"ready"} or {"status":"not ready","issues":{"nats":"disconnected"}}

/health and /ready do not require authentication. /health is the liveness probe (checks PostgreSQL only). /ready is the readiness probe (checks PostgreSQL + NATS). The /api/v1/overview endpoint requires an API key (or dev mode).
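
If you prefer to skip the port-forward, the same endpoints can be hit from inside a server pod (a sketch using kubectl exec against the deployment; the troubleshooting section below shows that wget is available in the server image):

Terminal window
# Liveness and readiness checks from inside a server pod (no port-forward needed)
kubectl exec -n ironflow deploy/ironflow -- wget -qO- http://localhost:9123/health
kubectl exec -n ironflow deploy/ironflow -- wget -qO- http://localhost:9123/ready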

NATS Health

Terminal window
kubectl get pods -n ironflow -l app.kubernetes.io/name=nats
# NATS health check
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/healthz
# JetStream status (streams, consumers, storage)
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/jsz
# Cluster route connections (clustered NATS only)
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/routez

PostgreSQL Health

Terminal window
# CNPG cluster status
kubectl get cluster ironflow-postgresql -n ironflow
# Pod status
kubectl get pods -n ironflow -l cnpg.io/cluster=ironflow-postgresql
# Check the primary is accepting connections
kubectl exec -n ironflow \
$(kubectl get pod -l cnpg.io/cluster=ironflow-postgresql,role=primary -n ironflow -o name) \
-- pg_isready -U ironflow -d ironflow

Viewing Logs

Ironflow Server Logs

Terminal window
# Tail recent logs from all Ironflow pods
kubectl logs -l app.kubernetes.io/component=server -n ironflow --tail=50
# Stream logs in real time
kubectl logs -l app.kubernetes.io/component=server -n ironflow -f
# Logs from the last 15 minutes
kubectl logs -l app.kubernetes.io/component=server -n ironflow --since=15m
# All pods with pod-name prefix (useful for multi-replica)
kubectl logs -l app.kubernetes.io/component=server -n ironflow --all-containers --prefix

Set LOG_LEVEL to debug for detailed output. Update via Helm values (ironflow.logLevel) and run helm upgrade — pods restart automatically when the ConfigMap changes.

NATS Logs

Terminal window
kubectl logs ironflow-nats-0 -n ironflow -c nats --tail=30

PostgreSQL Logs

Terminal window
kubectl logs -l cnpg.io/cluster=ironflow-postgresql -n ironflow --tail=30

Accessing the Dashboard

Terminal window
kubectl port-forward svc/ironflow -n ironflow 9123:9123

Open http://localhost:9123 in your browser.

In production mode (devMode: false), you need the admin API key and password from the bootstrap logs. These are printed only once on first boot:

Terminal window
kubectl logs -n ironflow $(kubectl get pods -n ironflow -l app.kubernetes.io/component=server -o name | head -1) | grep -A8 "Admin API Key"

Scaling

Manual Scaling

Terminal window
kubectl scale deployment ironflow -n ironflow --replicas=3

Scaling beyond 1 replica requires cluster mode (cluster.enabled: true in Helm values) with external NATS and PostgreSQL. See the Docker Compose Deployment guide.
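A rough sketch of the Helm flags involved: externalNats.url is referenced later in this guide, but replicaCount and the external database keys are assumptions, so check the chart's values.yaml before applying:

Terminal window
# Hypothetical example: switch to cluster mode before scaling (verify key names in values.yaml)
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values \
--set cluster.enabled=true \
--set externalNats.url=nats://my-nats:4222 \
--set replicaCount=3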

Checking Autoscaler Status

If HPA is enabled (autoscaling.enabled: true):

Terminal window
# HPA status
kubectl get hpa ironflow -n ironflow
# Detailed metrics and scaling events
kubectl describe hpa ironflow -n ironflow

Pod Disruption Budget

Terminal window
kubectl get pdb ironflow -n ironflow

Configuration

Viewing the ConfigMap

The Helm chart generates an ironflow.yaml ConfigMap from Helm values. This is the server’s runtime configuration:

Terminal window
kubectl get configmap ironflow-config -n ironflow -o yaml
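
To see only the rendered configuration file rather than the full ConfigMap wrapper, extract the ironflow.yaml key (dots in the key name must be escaped in jsonpath):

Terminal window
kubectl get configmap ironflow-config -n ironflow \
-o jsonpath='{.data.ironflow\.yaml}'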

Viewing Secrets

Terminal window
# List secret keys
kubectl get secret ironflow-secret -n ironflow -o yaml
# Decode a specific key
kubectl get secret ironflow-secret -n ironflow \
-o jsonpath='{.data.database-url}' | base64 -d

Never share decoded secret values. In bundled deployments the database URL lives in the CNPG-managed <release>-postgresql-app secret (key uri); the chart-managed <release>-secret holds master-key, license-key, and (for external DB without existingSecret) database-url.
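
For example, in a bundled deployment you can read the connection URI from the CNPG-managed secret mentioned above (release name ironflow assumed, matching the rest of this guide):

Terminal window
kubectl get secret ironflow-postgresql-app -n ironflow \
-o jsonpath='{.data.uri}' | base64 -d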

Updating Configuration

Configuration changes go through Helm, not direct ConfigMap edits — the deployment template includes checksum annotations that trigger automatic pod restarts when content changes:

Terminal window
# Example: enable debug logging
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values --set ironflow.logLevel=debug
# Example: enable metrics
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values --set observability.metrics.enabled=true

Restarting Pods

Terminal window
# Rolling restart (zero downtime with 2+ replicas)
kubectl rollout restart deployment/ironflow -n ironflow
# Watch the rollout progress
kubectl rollout status deployment/ironflow -n ironflow
# Restart a single pod (the deployment controller recreates it)
kubectl delete pod <pod-name> -n ironflow

Pod restarts happen automatically after helm upgrade when ConfigMap or Secret content changes, thanks to the checksum/config and checksum/secret annotations in the deployment template.
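
You can confirm this by inspecting the annotations on the pod template; the checksum values change whenever the rendered ConfigMap or Secret content changes:

Terminal window
kubectl get deployment ironflow -n ironflow \
-o jsonpath='{.spec.template.metadata.annotations}'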

NATS Operations

Checking JetStream Streams

Terminal window
# JetStream summary
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/jsz
# Stream details (includes message counts and storage)
kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- 'http://localhost:8222/jsz?streams=1'

Key streams and KV buckets that Ironflow creates:

Resource          | Type               | Purpose
STEPS             | Stream (WorkQueue) | Pull-mode job dispatch to workers
SYS_cron_triggers | KV bucket          | Cron fire deduplication across nodes
SYS_config_*      | KV bucket          | Environment-scoped configuration
SYS_secrets_*     | KV bucket          | Encrypted secrets storage
APP_*             | KV bucket          | User-facing KV store data

Using the nats CLI Inside the Cluster

For interactive debugging, run a temporary nats-box pod:

Terminal window
kubectl run nats-box --image=natsio/nats-box -n ironflow -it --rm -- sh
# Inside the pod (service name is <release>-nats):
nats stream ls -s nats://ironflow-nats:4222
nats kv ls -s nats://ironflow-nats:4222
nats stream info STEPS -s nats://ironflow-nats:4222

The nats-box image includes the nats CLI and is useful for inspecting JetStream state without installing tools locally. The pod is deleted automatically when you exit (--rm).
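
While the nats-box pod is running you can also inspect consumer backlog on the STEPS work queue (a sketch; consumer names depend on your deployment):

Terminal window
# Inside the nats-box pod:
nats consumer ls STEPS -s nats://ironflow-nats:4222
nats consumer info STEPS <consumer-name> -s nats://ironflow-nats:4222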

PostgreSQL Operations

Connecting to the Database

Terminal window
# Connect to the primary via a temporary pod (passes the auto-generated password)
kubectl run pg-client --rm -it --image=postgres:17-alpine -n ironflow \
--env="PGPASSWORD=$(kubectl get secret ironflow-postgresql-app -n ironflow -o jsonpath='{.data.password}' | base64 -d)" \
-- psql -h ironflow-postgresql-rw -U ironflow -d ironflow

Be careful running write queries against production. Use \x for expanded display and test queries with EXPLAIN first.

Useful Queries

-- Active runs
SELECT count(*) FROM runs WHERE status = 'running';
-- Run breakdown by function and status
SELECT function_id, status, count(*) FROM runs GROUP BY function_id, status;
-- Stale scheduler claims (multi-node clusters)
SELECT * FROM steps
WHERE status = 'waking'
AND claimed_at < NOW() - INTERVAL '5 minutes';
-- Recent failures
SELECT id, function_id, status, error, created_at
FROM runs WHERE status = 'failed'
ORDER BY created_at DESC LIMIT 10;
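
Before running anything heavier ad hoc, preview the plan first (a sketch; the function_id value is a placeholder):

-- Preview the plan for an ad-hoc query before running it against production
EXPLAIN SELECT id, status, created_at
FROM runs
WHERE function_id = 'my-function' AND status = 'failed';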

CNPG Cluster Status

For production deployments using CloudNativePG:

Terminal window
# Cluster overview (instances, ready count, phase)
kubectl get cluster ironflow-db -n ironflow-system
# Detailed status including failover history
kubectl describe cluster ironflow-db -n ironflow-system
# Check PVCs (persistent storage)
kubectl get pvc -n ironflow-system -l cnpg.io/cluster=ironflow-db
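
If you have the CloudNativePG kubectl plugin installed, it provides a richer status view (optional; not required for the commands above):

Terminal window
kubectl cnpg status ironflow-db -n ironflow-system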

Upgrading Ironflow

Rolling Update

Terminal window
# From local chart with a specific image tag
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values --set image.tag=0.16.0
# From OCI registry
helm upgrade ironflow oci://ghcr.io/sahina/charts/ironflow -n ironflow \
--reuse-values
# Watch the rollout
kubectl rollout status deployment/ironflow -n ironflow

Ironflow applies database migrations automatically on startup. With PDB enabled (podDisruptionBudget.minAvailable: 1), at least one pod stays available during the upgrade.

Rollback

Terminal window
# View release history
helm history ironflow -n ironflow
# Revert to previous release
helm rollback ironflow -n ironflow

Troubleshooting

CrashLoopBackOff

Terminal window
# Check events and exit codes
kubectl describe pod <pod-name> -n ironflow
# Logs from the previous (crashed) container
kubectl logs <pod-name> -n ironflow --previous

Common causes: database connection failure (check the database URL in ironflow-secret), NATS unreachable (check externalNats.url in Helm values), missing master key.
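
A quick sketch for checking each of those in turn (secret and ConfigMap names match the rest of this guide):

Terminal window
# Database URL the server is using
kubectl get secret ironflow-secret -n ironflow -o jsonpath='{.data.database-url}' | base64 -d
# NATS URL in the rendered config
kubectl get configmap ironflow-config -n ironflow -o yaml | grep -i nats
# Confirm the master key is present (prints its length, not the value)
kubectl get secret ironflow-secret -n ironflow -o jsonpath='{.data.master-key}' | base64 -d | wc -c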

OOMKilled

Terminal window
kubectl describe pod <pod-name> -n ironflow
# Look for "OOMKilled" in Last State

Increase memory limits:

Terminal window
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values --set resources.limits.memory=1Gi

Pending Pods

Terminal window
kubectl describe pod <pod-name> -n ironflow
# Check Conditions and Events sections

Common causes: insufficient node resources (add workers or increase node size), PVC not bound (storage class missing), node affinity constraints.
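
To narrow down which cause applies, check node capacity and PVC binding:

Terminal window
# Node capacity and current allocations
kubectl describe nodes | grep -A5 "Allocated resources"
# Unbound PVCs in the Ironflow namespaces
kubectl get pvc -n ironflow
kubectl get pvc -n ironflow-system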

ImagePullBackOff

The container image is likely in a private registry. Create an image pull secret:

Terminal window
kubectl create secret docker-registry ghcr-pull-secret \
--namespace ironflow \
--docker-server=ghcr.io \
--docker-username=YOUR_USERNAME \
--docker-password=YOUR_TOKEN
helm upgrade ironflow deploy/helm/ironflow/ -n ironflow \
--reuse-values --set "imagePullSecrets[0].name=ghcr-pull-secret"

Database Connection Refused

Terminal window
# Verify PostgreSQL is running
kubectl get pods -n ironflow-system -l cnpg.io/cluster=ironflow-db
# Check the database URL in the secret
kubectl get secret ironflow-secret -n ironflow \
-o jsonpath='{.data.database-url}' | base64 -d
# Verify the service exists
kubectl get svc -n ironflow-system | grep ironflow-db
# Test from an Ironflow pod (/health checks the PostgreSQL connection)
kubectl exec -it -n ironflow $(kubectl get pods -n ironflow -l app.kubernetes.io/component=server -o name | head -1) \
-- wget -qO- http://localhost:9123/health
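
If the health endpoint reports a database issue, you can also probe the database service directly from a throwaway pod (the service hostname below is an assumption based on the CNPG naming pattern used elsewhere in this guide; adjust it to whatever the grep above shows):

Terminal window
kubectl run pg-check --rm -it --image=postgres:17-alpine -n ironflow \
-- pg_isready -h ironflow-db-rw.ironflow-system.svc -U ironflow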

NATS Connection Issues

Terminal window
# Verify NATS pods are running
kubectl get pods -n ironflow-system -l app.kubernetes.io/name=nats
# Check NATS health from inside the cluster
kubectl exec -n ironflow-system nats-0 -c nats -- wget -qO- http://localhost:8222/healthz
# Check the NATS URL in the ConfigMap
kubectl get configmap ironflow-config -n ironflow -o yaml | grep nats
# Check PVCs for JetStream storage
kubectl get pvc -n ironflow-system

Quick Reference

Task                             | Command
Set kubeconfig                   | export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml
Pod status                       | kubectl get pods -n ironflow -l app.kubernetes.io/component=server
Tail logs                        | kubectl logs -l app.kubernetes.io/component=server -n ironflow -f --tail=50
Port-forward dashboard           | kubectl port-forward svc/ironflow -n ironflow 9123:9123
Liveness check                   | curl http://localhost:9123/health
Readiness check                  | curl http://localhost:9123/ready
Restart pods                     | kubectl rollout restart deployment/ironflow -n ironflow
Scale replicas                   | kubectl scale deployment ironflow -n ironflow --replicas=N
View config                      | kubectl get configmap ironflow-config -n ironflow -o yaml
View events                      | kubectl get events -n ironflow --sort-by=.lastTimestamp
JetStream status (dev)           | kubectl exec -n ironflow ironflow-nats-0 -c nats -- wget -qO- http://localhost:8222/jsz
JetStream status (prod)          | kubectl exec -n ironflow-system nats-0 -c nats -- wget -qO- http://localhost:8222/jsz
Connect to PostgreSQL (CNPG dev) | kubectl run pg-client --rm -it --image=postgres:17-alpine -n ironflow --env="PGPASSWORD=$(kubectl get secret ironflow-postgresql-app -n ironflow -o jsonpath='{.data.password}' | base64 -d)" -- psql -h ironflow-postgresql-rw -U ironflow -d ironflow

Monitoring Operations

If the monitoring stack is deployed (kube-prometheus-stack + BlackBox Exporter):

Terminal window
# Access Grafana
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80
# Open http://localhost:3000 (admin credentials from grafana-admin secret)
# Access Prometheus
kubectl port-forward svc/prometheus-operated -n monitoring 9090:9090
# Check monitoring pods
kubectl get pods -n monitoring
# Check Prometheus targets
# Port-forward Prometheus, then visit http://localhost:9090/targets
# Validate alert rules (render the chart and pipe it to promtool; alerts live in the Ironflow Helm chart)
helm template ironflow deploy/helm/ironflow/ --show-only templates/ironflow-alerts.yaml \
| promtool check rules /dev/stdin
# Check Alertmanager status
kubectl port-forward svc/kube-prometheus-stack-alertmanager -n monitoring 9093:9093

See Also