Hetzner Cloud
Set up Hetzner Cloud infrastructure for Ironflow, then deploy using deployment templates. This guide walks through every step from a blank Hetzner account to a running Ironflow instance.
For deploying Ironflow on an existing Kubernetes cluster (any provider), see Kubernetes Deployment.
Prerequisites
Install these tools before starting:
- Terraform 1.9+
- kubectl
- hcloud CLI (brew install hcloud)
- Hetzner Cloud account
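A quick sanity check that the CLI tools are installed and on your PATH:

```bash
# Confirm the required tools are available
terraform version
kubectl version --client
hcloud version
```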
Environment Variables
Set these environment variables as you work through the guide. The required variables are set during Step 1 and Step 2; this table serves as a complete reference.
Required
| Variable | Purpose | Where to get it | Used in |
|---|---|---|---|
| HCLOUD_TOKEN | Hetzner Cloud API authentication | Hetzner Console → Security → API Tokens | Step 1, Step 4 (provisioning) |
| KUBECONFIG | Path to the cluster’s kubeconfig file | Generated by ironflow provision create | Step 4-7 (all kubectl commands) |
| GITHUB_USERNAME | GitHub username for container registry | Your GitHub account | Step 5 (image pull secret) |
| GITHUB_PAT | GitHub Personal Access Token (read:packages scope) | GitHub Token Settings | Step 5 (image pull secret) |
| HETZNER_S3_ACCESS_KEY | Object storage authentication (access key) | Hetzner Console → Object Storage → Manage credentials | Step 2, Step 5 (S3 backup secret) |
| HETZNER_S3_SECRET_KEY | Object storage authentication (secret key) | Hetzner Console → Object Storage → Manage credentials | Step 2, Step 5 (S3 backup secret) |
| HETZNER_S3_ENDPOINT | Object storage endpoint URL | Hetzner Console → Object Storage → bucket details | Step 5 (S3 backup secret), Step 6 (deploy) |
| HETZNER_S3_BUCKET | Object storage bucket name | Hetzner Console → Object Storage | Step 6 (deploy, backup destination path) |
Optional (Terraform overrides)
These override values in terraform.tfvars. You generally don’t need them since the per-template tfvars files are provided, but they’re available for CI/CD or scripted provisioning:
| Variable | Purpose | Default |
|---|---|---|
| TF_VAR_cluster_name | Kubernetes cluster name | "ironflow" |
| TF_VAR_location | Hetzner datacenter (fsn1, nbg1, hel1) | "fsn1" |
| TF_VAR_control_plane_type | Control plane server type | "cpx22" |
| TF_VAR_control_plane_count | Control plane node count (must be odd) | 3 |
| TF_VAR_worker_type | Worker node server type | "cpx32" |
| TF_VAR_worker_count | Worker node count | 2 |
HCLOUD_TOKEN is automatically passed to Terraform as TF_VAR_hcloud_token by the ironflow provision command. You don’t need to set both.
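For scripted or CI/CD provisioning, the overrides are ordinary environment variables exported before the provision command. A minimal sketch using variables from the table above (values are placeholders):

```bash
# Any variable from the optional table can be set this way
export HCLOUD_TOKEN=your-token-here   # forwarded to Terraform as TF_VAR_hcloud_token
export TF_VAR_location=nbg1
export TF_VAR_worker_type=cpx32
export TF_VAR_worker_count=3

ironflow provision create --provider hetzner --template medium --name ironflow
```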
Step 1: Set Up Your Hetzner Project
Create a project in the Hetzner Cloud Console if you don’t have one, then generate an API token.
- Go to your project → Security → API Tokens
- Click Generate API Token with Read & Write permissions
- Save the token
```bash
export HCLOUD_TOKEN=your-token-here
hcloud context create ironflow   # saves the token for hcloud CLI
```

Step 2: Create Backup Storage
Ironflow backs up PostgreSQL to S3-compatible object storage daily. Set this up before provisioning the cluster so everything is ready when you deploy.
Create a bucket
In the Hetzner Cloud Console:
- Go to Object Storage in the left sidebar
- Click Create Bucket
- Name: ironflow-backups
- Visibility: Private
- Click Create & Buy now
Hetzner Object Storage is not yet supported by the Terraform provider, so bucket creation is a manual step.
Generate S3 credentials
- Go to Object Storage → your bucket
- Click Manage credentials under S3 Credentials
- Click Generate credentials
- Note the endpoint URL from your bucket details page
- Export the credentials as environment variables:
```bash
export HETZNER_S3_ACCESS_KEY=your-access-key
export HETZNER_S3_SECRET_KEY=your-secret-key
export HETZNER_S3_ENDPOINT=https://fsn1.your-objectstorage.com   # from bucket details
export HETZNER_S3_BUCKET=ironflow-backups
```
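If you happen to have the AWS CLI installed, you can optionally sanity-check the credentials and endpoint before moving on (the AWS CLI is not otherwise required by this guide):

```bash
# Optional: list the (empty) bucket to confirm the credentials and endpoint work
AWS_ACCESS_KEY_ID=$HETZNER_S3_ACCESS_KEY \
AWS_SECRET_ACCESS_KEY=$HETZNER_S3_SECRET_KEY \
aws s3 ls "s3://$HETZNER_S3_BUCKET" --endpoint-url "$HETZNER_S3_ENDPOINT"
```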
Step 3: Choose Your Template and Node Sizing
There are two independent choices: your Ironflow deployment template (what Ironflow runs) and your Kubernetes cluster size (what hardware it runs on).
Deployment templates
Templates control the Ironflow application: replica count, PostgreSQL HA, NATS topology, and connection pooling. You select a template when you run ironflow deploy --template <name>.
| Template | Ironflow replicas | PostgreSQL | NATS | PgBouncer | Use case |
|---|---|---|---|---|---|
| Small | 1 | Bundled, 1 instance | Bundled, 1 node | No | Dev, staging, small teams |
| Medium | 3 (HA) | Bundled, 2 instances (HA) | Bundled, 3-node cluster | Yes (2 pods) | Production |
| Large | 2-10 (HPA) | External | External | No (BYO) | Enterprise with managed deps |
Kubernetes cluster sizing
Cluster sizing controls the Hetzner servers: how many nodes and how powerful. You configure this in deploy/terraform/hetzner/terraform.tfvars before provisioning. The cluster is provisioned once and you can deploy any template onto it (as long as the hardware has enough resources).
Minimum recommended cluster per template:
| Template | Min worker RAM | Min worker CPU | Recommended cluster | Est. server cost |
|---|---|---|---|---|
| Small | 2 GB | 2 vCPU | 1 control + 1 worker (cpx22 + cpx32) | ~€15/month |
| Medium | 4 GB | 3 vCPU | 3 control + 2 workers (cpx22) | ~€38/month |
| Large | 8 GB+ | 4+ vCPU | 3 control + 2 workers (cpx32) | ~€52/month |
Server costs only. Additional costs apply for load balancer (~€6/month), volumes, Object Storage, and network traffic.
You can deploy the Small template on a Large cluster (safe, just overprovisioned), and moving from Small to Medium does not require reprovisioning the cluster, as long as it has enough resources. However, switching templates from Small to Medium does require deleting and redeploying the Ironflow release, because the NATS topology change (1 node to a 3-node cluster) can’t be upgraded in place.
Configure node sizes
Pre-built Terraform variable files are provided for each template:
```
deploy/terraform/hetzner/
├── terraform.small.tfvars    # 1 control + 1 worker
├── terraform.medium.tfvars   # 3 control + 2 workers
├── terraform.large.tfvars    # 3 control + 2 workers
└── terraform.tfvars.example  # Reference with all options
```

The ironflow provision create command uses these files automatically via the --template flag. If using Terraform directly, copy the one that matches your template:
```bash
cd deploy/terraform/hetzner
cp terraform.small.tfvars terraform.tfvars
# Edit terraform.tfvars to customize cluster_name, location, etc.
```

Available locations: fsn1 (Falkenstein), nbg1 (Nuremberg), hel1 (Helsinki). Control plane count must be odd (1, 3, or 5) for etcd quorum. For higher throughput, edit the worker type or count in your terraform.tfvars.
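As an illustration, a terraform.tfvars using the default values from the table above might look like the following sketch; terraform.tfvars.example remains the authoritative reference for available options:

```hcl
# Illustrative values only; adjust to your template and sizing
cluster_name        = "ironflow"
location            = "fsn1"     # fsn1, nbg1, or hel1
control_plane_type  = "cpx22"
control_plane_count = 3          # must be odd for etcd quorum
worker_type         = "cpx32"
worker_count        = 2
```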
Step 4: Provision the Kubernetes Cluster
```bash
ironflow provision create --provider hetzner --template small --name ironflow
```

This runs Terraform to create the cluster (~5-8 minutes), then writes kubeconfig and talosconfig to deploy/terraform/hetzner/. A durable copy is saved to ~/.kube/clusters/hetzner-<name>.yaml.
```bash
export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml
kubectl get nodes
```

Provision --name vs Deploy --name
The --name you use with ironflow provision is the cluster name (Hetzner servers, networks, firewalls are named after it). The --name you use with ironflow deploy is the Helm release name (an application install within a cluster). They serve different purposes and don’t need to match.
Deploy commands default to your current kubectl context. If you manage multiple clusters, always pass --kubeconfig to deploy commands to ensure you target the correct cluster:
```bash
ironflow deploy --template small --name dev \
  --kubeconfig ~/.kube/clusters/hetzner-ironflow.yaml
```

If you prefer to run Terraform directly instead of ironflow provision create:

```bash
cd deploy/terraform/hetzner
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your node sizing from Step 3
```
```bash
terraform init
terraform apply
```

This provisions a Talos Linux cluster with Cilium CNI, Hetzner CCM+CSI, cert-manager, and metrics-server (~5-8 minutes).
```bash
export KUBECONFIG=$(pwd)/kubeconfig
kubectl get nodes
```

Step 5: Create Kubernetes Secrets
With the cluster running, create the namespace and secrets that Ironflow needs.
```bash
export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml

# Create the namespace
kubectl create namespace ironflow
```

Image pull secret
Required if the Ironflow container image is in a private registry (e.g., private GHCR):
```bash
kubectl create secret docker-registry ghcr-pull-secret \
  --namespace ironflow \
  --docker-server=ghcr.io \
  --docker-username=$GITHUB_USERNAME \
  --docker-password=$GITHUB_PAT
```

GITHUB_PAT must be a GitHub Personal Access Token with the read:packages scope.
S3 backup credentials
Uses the environment variables from Step 2:
```bash
kubectl create secret generic ironflow-s3-creds -n ironflow \
  --from-literal=ACCESS_KEY_ID="$HETZNER_S3_ACCESS_KEY" \
  --from-literal=SECRET_ACCESS_KEY="$HETZNER_S3_SECRET_KEY"
```

The default Small and Medium values files reference this secret name (ironflow-s3-creds) and are pre-configured for Hetzner Object Storage. The S3 destination path is auto-derived from the Helm release name (s3://ironflow-backups/<release-name>), so each deployment gets an isolated backup path. The S3 endpoint URL is passed during deploy via --set (see Step 6).
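Before moving on, you can confirm both secrets exist in the namespace:

```bash
# Omit ghcr-pull-secret if your image is public
kubectl get secret ghcr-pull-secret ironflow-s3-creds -n ironflow
```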
Step 6: Deploy Ironflow
```bash
ironflow deploy --template small --name dev
```

The ironflow deploy command automatically:
- Reads HETZNER_S3_ENDPOINT and HETZNER_S3_BUCKET from environment variables and configures the S3 backup destination
- Installs these prerequisites on first deploy:
- CloudNativePG operator — manages PostgreSQL clusters (Small and Medium only)
- Barman Cloud Plugin — S3-compatible backups (Small and Medium only)
- cert-manager — TLS certificate management (all templates)
- kube-prometheus-stack — Prometheus, Grafana, and alerting (all templates)
If these are already installed, the command detects them and skips installation.
Or for Medium/Large:
```bash
# Medium — 3 replicas, NATS cluster, HA PostgreSQL
ironflow deploy --template medium --name staging

# Medium with Hetzner load balancer — adds Traefik ingress + LB optimizations
ironflow deploy --template medium --name prod --hetzner-location fsn1

# Large — HPA, external PostgreSQL + NATS
ironflow deploy --template large --name prod \
  --set externalDatabase.url=postgres://user:pass@host:5432/ironflow \
  --set externalNats.url=nats://nats-1:4222,nats://nats-2:4222
```

The --hetzner-location flag installs Traefik as the ingress controller with Hetzner-optimized load balancer settings (proxy protocol, private network routing, health checks). Match the location to your cluster’s datacenter (fsn1, nbg1, or hel1). See Step 8 for enabling Ingress after deploy.
If you are deploying with helm install directly (instead of the ironflow deploy CLI), you must install the prerequisites manually. See Kubernetes Deployment for manual installation commands.
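If you are unsure whether a cluster already has these prerequisites, a rough check is to look for their CRD API groups (a sketch; exact CRD names vary by version):

```bash
# CNPG (including the Barman Cloud Plugin), cert-manager, and the Prometheus
# operator all register CRDs under these API groups
kubectl get crds | grep -E 'cnpg.io|cert-manager.io|monitoring.coreos.com'
```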
For detailed deploy options (CLI vs Helm, customization, upgrades), see Kubernetes Deployment.
Step 7: Verify
```bash
# Check Ironflow pods
ironflow deploy status --name dev

# Check all pods are running
kubectl get pods -n ironflow

# Check PostgreSQL cluster health
kubectl get cluster -n ironflow

# Check backups are scheduled
kubectl get scheduledbackups -n ironflow

# Verify health endpoints
kubectl port-forward svc/dev-ironflow -n ironflow 9123:9123 &
curl -s http://localhost:9123/health
curl -s http://localhost:9123/ready

# Open the dashboard at http://localhost:9123
```

Retrieve the admin API key from the first-boot logs:
```bash
kubectl logs -n ironflow $(kubectl get pods -n ironflow \
  -l app.kubernetes.io/component=server -o name | head -1) | grep -A8 "Admin API Key"
```

Verify monitoring
```bash
# Check CNPG PodMonitor
kubectl get podmonitors -n ironflow

# Check Ironflow ServiceMonitor
kubectl get servicemonitors -n ironflow

# Check PostgreSQL alert rules
kubectl get prometheusrules -n ironflow

# Verify Ironflow exposes metrics
curl -s http://localhost:9123/metrics | head -5
```

Step 8: External Access via Load Balancer
By default, Ironflow is only accessible inside the cluster (ClusterIP). If you need external access for push-mode webhooks, the dashboard, or API clients, set up a load balancer. Skip this step if port-forward is sufficient (dev/staging) or if your cluster is only accessed via VPN.
When do you need a load balancer?
- Yes: Push-mode functions (external services POST to Ironflow), dashboard access for teams outside the cluster, HA failover across nodes.
- No: Dev/staging accessed via kubectl port-forward, pull-mode only (workers connect outbound), single-team with VPN.
Option A: Ingress Controller (recommended)
If you deployed with --hetzner-location in Step 6, Traefik and a Hetzner Load Balancer are already installed. If not, re-run deploy with the flag:
```bash
ironflow deploy upgrade --template medium --name prod --hetzner-location fsn1
```

Enable Ingress
Once the load balancer has an external IP (shown during deploy), enable Ingress with your domain:
```bash
ironflow deploy upgrade --template medium --name prod \
  --set ingress.enabled=true \
  --set ingress.host=ironflow.example.com
```

Point your DNS A record to the load balancer IP (shown during deploy).
TLS certificates are automatically issued by cert-manager via Let’s Encrypt.
Verify
```bash
# Check load balancer IP
kubectl get svc -n traefik traefik

# Check Ingress
kubectl get ingress -n ironflow

# Test access
curl -k https://ironflow.example.com/health
```

Option B: Direct LoadBalancer Service (simple alternative)
For simple deployments without Ingress routing, you can expose the Ironflow service directly:
```bash
ironflow deploy upgrade --template medium --name prod \
  --set service.type=LoadBalancer \
  --set service.annotations."load-balancer\.hetzner\.cloud/location"=fsn1
```

This creates a dedicated Hetzner LB for the Ironflow service. No hostname routing, no TLS termination at the LB level.
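After a minute or two the service should report an external IP. The service name below assumes the <release>-ironflow naming used elsewhere in this guide:

```bash
# Watch for EXTERNAL-IP to change from <pending> to the Hetzner LB address
# (prod-ironflow is an assumed name based on the release name above)
kubectl get svc prod-ironflow -n ironflow -w
```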
Load Balancer Costs
| Resource | Cost |
|---|---|
| Hetzner LB11 | ~€6/month |
| Additional bandwidth | Included (30 TB/month) |
Load Balancer Troubleshooting
Load balancer stuck in <pending>:
- Check Hetzner CCM is running: kubectl get pods -n kube-system -l app.kubernetes.io/name=hcloud-cloud-controller-manager
- Check HCLOUD_TOKEN is set in the CCM deployment
- Check Hetzner API status: hcloud load-balancer list
All requests return 400 Bad Request:
- Proxy protocol mismatch. Both sides must be enabled or both disabled.
- Check Traefik args: kubectl get deploy -n traefik traefik -o yaml | grep proxyProtocol
- Check LB annotation: kubectl get svc -n traefik traefik -o yaml | grep proxyprotocol
TLS certificate not issuing:
- Check cert-manager: kubectl get certificate -n ironflow
- Check ClusterIssuer: kubectl get clusterissuer
- DNS must point to the LB IP for ACME HTTP-01 challenge to work
Multi-Tenant Load Balancing
With Option A (Traefik Ingress), a single Hetzner Load Balancer serves all tenants on the cluster. Traefik reads Ingress resources across all namespaces and routes traffic by hostname.
```
Internet → Hetzner LB (one, ~€6/mo) → Traefik pods (NodePort, private network)
  → Ingress: acme.ironflow.example.com   → tenant-acme/acme-ironflow
  → Ingress: globex.ironflow.example.com → tenant-globex/globex-ironflow
  → Ingress: ironflow.example.com        → ironflow/prod-ironflow
```

Install Traefik once per cluster (see Option A above), then deploy each tenant with Ingress enabled:
```bash
# Install Traefik with Hetzner LB (once per cluster)
# See Option A in the section above for Traefik installation

# First tenant
helm install acme ./deploy/helm/ironflow \
  -n tenant-acme --create-namespace \
  -f deploy/helm/ironflow/values-multi-tenant.yaml \
  --set ingress.enabled=true \
  --set ingress.host=acme.ironflow.example.com \
  --set ironflow.masterKey=$(openssl rand -hex 32)

# Additional tenants — reuse the existing LB
helm install globex ./deploy/helm/ironflow \
  -n tenant-globex --create-namespace \
  -f deploy/helm/ironflow/values-multi-tenant.yaml \
  --set ingress.enabled=true \
  --set ingress.host=globex.ironflow.example.com \
  --set ironflow.masterKey=$(openssl rand -hex 32)
```

Each tenant gets its own TLS certificate (auto-issued by cert-manager) and is network-isolated via NetworkPolicy (defaultDeny: true in values-multi-tenant.yaml). The Traefik namespace is in allowNamespaces so ingress traffic can reach tenant pods.
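To confirm the per-tenant routing and isolation after installing a tenant or two:

```bash
# One Ingress per tenant, all routed through the same Traefik/LB
kubectl get ingress -A

# NetworkPolicies created by the chart in a tenant namespace
kubectl get networkpolicy -n tenant-acme
```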
Avoid Option B for multi-tenant
With Option B (direct LoadBalancer service), each tenant with service.type=LoadBalancer creates a separate Hetzner LB (~€6/mo each). At 10 tenants that’s ~€60/mo in LBs alone, with no hostname routing or shared TLS. Use Option A for multi-tenant deployments.
DNS Configuration
Point your domain to the load balancer IP so Traefik can route traffic and cert-manager can issue TLS certificates.
Find the load balancer IP
```bash
# From kubectl
kubectl get svc -n traefik traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

# Or from hcloud CLI
hcloud load-balancer list -o columns=name,ipv4
```

Option 1: Wildcard DNS (simplest for multi-tenant)
Create a single wildcard A record and all tenant subdomains resolve automatically:
```
*.ironflow.example.com → A <LB_IP>
```

New tenants work immediately with --set ingress.host=<tenant>.ironflow.example.com — no DNS changes needed per tenant.
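Once the record propagates, any subdomain should resolve to the LB IP. A quick check, assuming dig is available:

```bash
# Any name under the wildcard should return the load balancer address
dig +short anything.ironflow.example.com
```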
Option 2: Per-tenant DNS records
Create individual A records for each tenant:
```
acme.ironflow.example.com   → A <LB_IP>
globex.ironflow.example.com → A <LB_IP>
ironflow.example.com        → A <LB_IP>
```

This gives you explicit control but requires a DNS change for each new tenant.
Hetzner DNS
If your domain uses Hetzner DNS, create records in the Hetzner DNS Console or via the API:
```
# Wildcard for all tenants
# Hetzner DNS Console → your zone → Add Record → Type: A, Name: *, Value: <LB_IP>

# Or per-tenant
# Type: A, Name: acme.ironflow, Value: <LB_IP>
```

External DNS providers
For Cloudflare, Route53, Google Cloud DNS, or other providers, create A records pointing to the LB IP using your provider’s dashboard or CLI.
Cloudflare
If using Cloudflare, disable the proxy (orange cloud → grey cloud) for the initial setup so cert-manager’s HTTP-01 ACME challenge can reach the LB directly. You can re-enable the proxy after certificates are issued if you switch to DNS-01 challenges.
Automatic DNS with external-dns (optional)
external-dns can auto-create DNS records from Ingress resources. When a new tenant is deployed with ingress.host=acme.ironflow.example.com, external-dns automatically creates the A record at your DNS provider.
```bash
# Install external-dns (example for Hetzner DNS)
helm repo add external-dns https://kubernetes-sigs.github.io/external-dns
helm install external-dns external-dns/external-dns \
  -n external-dns --create-namespace \
  --set provider.name=hetzner \
  --set env[0].name=HETZNER_DNS_API_TOKEN \
  --set env[0].value=$HETZNER_DNS_TOKEN
```

external-dns supports Hetzner DNS, Cloudflare, Route53, Google Cloud DNS, and many others.
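You can watch external-dns pick up new Ingress resources and create records; the deployment name below is assumed from the release name used in the install command above:

```bash
# Stream external-dns logs while deploying a new tenant
kubectl logs -n external-dns deploy/external-dns -f
```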
TLS certificates
TLS is handled automatically. The Helm chart sets cert-manager.io/cluster-issuer: letsencrypt-prod on every Ingress resource when tls: true (the default). Once DNS points to the LB IP:
- cert-manager detects the new Ingress with TLS enabled
- Requests a Let’s Encrypt certificate via HTTP-01 challenge
- Stores the certificate as a Secret (<release>-ironflow-tls) in the tenant’s namespace
- Traefik serves HTTPS automatically
Each tenant gets its own TLS certificate. Check certificate status with:
```bash
kubectl get certificate -n tenant-acme
kubectl describe certificate -n tenant-acme
```

Cluster Management
Check status
```bash
ironflow provision status --provider hetzner --name ironflow
```

Upgrade Ironflow
```bash
ironflow deploy upgrade --template small --name dev
```

Tear down
```bash
ironflow provision destroy --provider hetzner --name ironflow
```

File Structure
```
deploy/terraform/hetzner/
├── main.tf                   # Cluster module + providers
├── variables.tf              # Input variables (token, cluster name, node sizes)
├── outputs.tf                # kubeconfig path, talosconfig path, cluster info
├── terraform.tfvars.example  # Reference with all options
├── terraform.small.tfvars    # Small cluster: 1 control + 1 worker
├── terraform.medium.tfvars   # Medium cluster: 3 control + 2 workers
├── terraform.large.tfvars    # Large cluster: 3 control + 2 workers
├── .terraform.lock.hcl       # Provider lock file (committed for reproducibility)
├── teardown.sh               # Clean destroy with hcloud CLI fallback
└── .gitignore                # Ignores state files, kubeconfig, talosconfig
```

Troubleshooting
Placement Groups Already Exist
If terraform apply fails with placement_group not unique, leftover resources from a previous run exist:
```bash
hcloud placement-group list
hcloud placement-group delete <id>
```

Terraform State Issues
If Terraform state gets out of sync, run ironflow provision destroy --provider hetzner --name ironflow (or ./teardown.sh directly) to force-clean all resources, then start fresh.
Node Not Ready
Talos Linux nodes take 1-2 minutes after provisioning to register with the Kubernetes API. If kubectl get nodes shows NotReady, wait and retry.
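You can watch the nodes come up instead of polling manually:

```bash
# Ctrl-C once all nodes report Ready
kubectl get nodes -w
```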
ImagePullBackOff
Container image is in a private registry. Create the pull secret as described in Step 5.
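To confirm the cause, inspect the pod events (the pod name is a placeholder):

```bash
# The Events section at the end of the output shows the pull error details
kubectl describe pod <pod-name> -n ironflow | tail -n 20
```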
Firewall Blocks API Access
If kubectl times out connecting to the cluster, your IP may have changed since provisioning. The Hetzner firewall restricts port 6443 to the IP that ran Terraform. Update it:
```bash
# Find your current IP
curl -s ifconfig.me

# Update the firewall rules
hcloud firewall describe ironflow
# Update Source IPs for the "Allow Incoming Requests to Kube API" rule
# with your current IP via the Hetzner Cloud Console or hcloud CLI
```