Hetzner Cloud
Set up Hetzner Cloud infrastructure for Ironflow, then deploy using deployment templates. This guide walks through every step from a blank Hetzner account to a running Ironflow instance.
For deploying Ironflow on an existing Kubernetes cluster (any provider), see Kubernetes Deployment.
Prerequisites
Install these tools before starting:
- Terraform 1.9+
- kubectl
- hcloud CLI (brew install hcloud)
- Hetzner Cloud account
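A quick sanity check that the CLI tools are installed and on your PATH:

```bash
# Confirm the required tools are available
terraform version
kubectl version --client
hcloud version
```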
Environment Variables
Set these environment variables as you work through the guide. The required variables are set during Step 1 and Step 2; this table serves as a complete reference.
Required
| Variable | Purpose | Where to get it | Used in |
|---|---|---|---|
| HCLOUD_TOKEN | Hetzner Cloud API authentication | Hetzner Console → Security → API Tokens | Step 1, Step 4 (provisioning) |
| KUBECONFIG | Path to the cluster’s kubeconfig file | Generated by ironflow provision create | Step 4-7 (all kubectl commands) |
| GITHUB_USERNAME | GitHub username for container registry | Your GitHub account | Step 5 (image pull secret) |
| GITHUB_PAT | GitHub Personal Access Token (read:packages scope) | GitHub Token Settings | Step 5 (image pull secret) |
| HETZNER_S3_ACCESS_KEY | Object storage authentication (access key) | Hetzner Console → Object Storage → Manage credentials | Step 2, Step 5 (S3 backup secret) |
| HETZNER_S3_SECRET_KEY | Object storage authentication (secret key) | Hetzner Console → Object Storage → Manage credentials | Step 2, Step 5 (S3 backup secret) |
| HETZNER_S3_ENDPOINT | Object storage endpoint URL | Hetzner Console → Object Storage → bucket details | Step 5 (S3 backup secret), Step 6 (deploy) |
| HETZNER_S3_BUCKET | Object storage bucket name | Hetzner Console → Object Storage | Step 6 (deploy, backup destination path) |
Optional (Terraform overrides)
These override values in terraform.tfvars. You generally don’t need them since the per-template tfvars files are provided, but they’re available for CI/CD or scripted provisioning:
| Variable | Purpose | Default |
|---|---|---|
| TF_VAR_cluster_name | Kubernetes cluster name | "ironflow" |
| TF_VAR_location | Hetzner datacenter (fsn1, nbg1, hel1) | "fsn1" |
| TF_VAR_control_plane_type | Control plane server type | "cpx22" |
| TF_VAR_control_plane_count | Control plane node count (must be odd) | 3 |
| TF_VAR_worker_type | Worker node server type | "cpx32" |
| TF_VAR_worker_count | Worker node count | 2 |
HCLOUD_TOKEN is automatically passed to Terraform as TF_VAR_hcloud_token by the ironflow provision command. You don’t need to set both.
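For scripted or CI/CD provisioning, the overrides are ordinary environment variables exported before the provision command. A minimal sketch using variables from the table above (values are placeholders):

```bash
# Any variable from the optional table can be set this way
export HCLOUD_TOKEN=your-token-here   # forwarded to Terraform as TF_VAR_hcloud_token
export TF_VAR_location=nbg1
export TF_VAR_worker_type=cpx32
export TF_VAR_worker_count=3

ironflow provision create --provider hetzner --template medium --name ironflow
```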
Step 1: Set Up Your Hetzner Project
Create a project in the Hetzner Cloud Console if you don’t have one, then generate an API token.
- Go to your project → Security → API Tokens
- Click Generate API Token with Read & Write permissions
- Save the token
```bash
export HCLOUD_TOKEN=your-token-here
hcloud context create ironflow   # saves the token for hcloud CLI
```

Step 2: Create Backup Storage
Ironflow backs up PostgreSQL to S3-compatible object storage daily. Set this up before provisioning the cluster so everything is ready when you deploy.
Create a bucket
In the Hetzner Cloud Console:
- Go to Object Storage in the left sidebar
- Click Create Bucket
- Name: ironflow-backups
- Visibility: Private
- Click Create & Buy now
Hetzner Object Storage is not yet supported by the Terraform provider, so bucket creation is a manual step.
Generate S3 credentials
- Go to Object Storage → your bucket
- Click Manage credentials under S3 Credentials
- Click Generate credentials
- Note the endpoint URL from your bucket details page
- Export the credentials as environment variables:
```bash
export HETZNER_S3_ACCESS_KEY=your-access-key
export HETZNER_S3_SECRET_KEY=your-secret-key
export HETZNER_S3_ENDPOINT=https://fsn1.your-objectstorage.com   # from bucket details
export HETZNER_S3_BUCKET=ironflow-backups
```
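If you happen to have the AWS CLI installed, you can optionally sanity-check the credentials and endpoint before moving on (the AWS CLI is not otherwise required by this guide):

```bash
# Optional: list the (empty) bucket to confirm the credentials and endpoint work
AWS_ACCESS_KEY_ID=$HETZNER_S3_ACCESS_KEY \
AWS_SECRET_ACCESS_KEY=$HETZNER_S3_SECRET_KEY \
aws s3 ls "s3://$HETZNER_S3_BUCKET" --endpoint-url "$HETZNER_S3_ENDPOINT"
```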
Step 3: Choose Your Template and Node Sizing
There are two independent choices: your Ironflow deployment template (what Ironflow runs) and your Kubernetes cluster size (what hardware it runs on).
Deployment templates
Templates control the Ironflow application: replica count, PostgreSQL HA, NATS topology, and connection pooling. You select a template when you run ironflow deploy --template <name>.
| Template | Ironflow replicas | PostgreSQL | NATS | PgBouncer | Use case |
|---|---|---|---|---|---|
| Small | 1 | Bundled, 1 instance | Bundled, 1 node | No | Dev, staging, small teams |
| Medium | 3 (HA) | Bundled, 2 instances (HA) | Bundled, 3-node cluster | Yes (2 pods) | Production |
| Large | 2-10 (HPA) | External | External | No (BYO) | Enterprise with managed deps |
Kubernetes cluster sizing
Cluster sizing controls the Hetzner servers: how many nodes and how powerful. You configure this in deploy/terraform/hetzner/terraform.tfvars before provisioning. The cluster is provisioned once and you can deploy any template onto it (as long as the hardware has enough resources).
Minimum recommended cluster per template:
| Template | Min worker RAM | Min worker CPU | Recommended cluster | Est. server cost |
|---|---|---|---|---|
| Small | 2 GB | 2 vCPU | 1 control + 1 worker (cpx22 + cpx32) | ~€15/month |
| Medium | 4 GB | 3 vCPU | 3 control + 2 workers (cpx22) | ~€38/month |
| Large | 8 GB+ | 4+ vCPU | 3 control + 2 workers (cpx32) | ~€52/month |
Server costs only. Additional costs apply for load balancer (~€6/month), volumes, Object Storage, and network traffic.
You can deploy the Small template on a Large cluster (safe, just overprovisioned), and moving from Small to Medium does not require reprovisioning the cluster, as long as it has enough resources. However, switching templates from Small to Medium does require deleting and redeploying the Ironflow release, because the NATS topology change (1 node to a 3-node cluster) can’t be upgraded in place.
Configure node sizes
Pre-built Terraform variable files are provided for each template:
```
deploy/terraform/hetzner/
├── terraform.small.tfvars    # 1 control + 1 worker
├── terraform.medium.tfvars   # 3 control + 2 workers
├── terraform.large.tfvars    # 3 control + 2 workers
└── terraform.tfvars.example  # Reference with all options
```

The ironflow provision create command uses these files automatically via the --template flag. If using Terraform directly, copy the one that matches your template:
```bash
cd deploy/terraform/hetzner
cp terraform.small.tfvars terraform.tfvars
# Edit terraform.tfvars to customize cluster_name, location, etc.
```

Available locations: fsn1 (Falkenstein), nbg1 (Nuremberg), hel1 (Helsinki). Control plane count must be odd (1, 3, or 5) for etcd quorum. For higher throughput, edit the worker type or count in your terraform.tfvars.
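As an illustration, a terraform.tfvars using the default values from the table above might look like the following sketch; terraform.tfvars.example remains the authoritative reference for available options:

```hcl
# Illustrative values only; adjust to your template and sizing
cluster_name        = "ironflow"
location            = "fsn1"     # fsn1, nbg1, or hel1
control_plane_type  = "cpx22"
control_plane_count = 3          # must be odd for etcd quorum
worker_type         = "cpx32"
worker_count        = 2
```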
Step 4: Provision the Kubernetes Cluster
```bash
ironflow provision create --provider hetzner --template small --name ironflow
```

This runs Terraform to create the cluster (~5-8 minutes), then writes kubeconfig and talosconfig to deploy/terraform/hetzner/. A durable copy is saved to ~/.kube/clusters/hetzner-<name>.yaml.
```bash
export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml
kubectl get nodes
```

Provision --name vs Deploy --name
The --name you use with ironflow provision is the cluster name (Hetzner servers, networks, firewalls are named after it). The --name you use with ironflow deploy is the Helm release name (an application install within a cluster). They serve different purposes and don’t need to match.
Deploy commands default to your current kubectl context. If you manage multiple clusters, always pass --kubeconfig to deploy commands to ensure you target the correct cluster:
```bash
ironflow deploy --template small --name dev \
  --kubeconfig ~/.kube/clusters/hetzner-ironflow.yaml
```

If you prefer to run Terraform directly instead of ironflow provision create:

```bash
cd deploy/terraform/hetzner
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your node sizing from Step 3
```
```bash
terraform init
terraform apply
```

This provisions a Talos Linux cluster with Cilium CNI, Hetzner CCM+CSI, cert-manager, and metrics-server (~5-8 minutes).
```bash
export KUBECONFIG=$(pwd)/kubeconfig
kubectl get nodes
```

Step 5: Create Kubernetes Secrets
With the cluster running, create the namespace and secrets that Ironflow needs.
```bash
export KUBECONFIG=~/.kube/clusters/hetzner-ironflow.yaml

# Create the namespace
kubectl create namespace ironflow
```

Image pull secret
Required if the Ironflow container image is in a private registry (e.g., private GHCR):
```bash
kubectl create secret docker-registry ghcr-pull-secret \
  --namespace ironflow \
  --docker-server=ghcr.io \
  --docker-username=$GITHUB_USERNAME \
  --docker-password=$GITHUB_PAT
```

GITHUB_PAT must be a GitHub Personal Access Token with the read:packages scope.
S3 backup credentials
Uses the environment variables from Step 2:
```bash
kubectl create secret generic ironflow-s3-creds -n ironflow \
  --from-literal=ACCESS_KEY_ID="$HETZNER_S3_ACCESS_KEY" \
  --from-literal=SECRET_ACCESS_KEY="$HETZNER_S3_SECRET_KEY"
```

The default Small and Medium values files reference this secret name (ironflow-s3-creds) and are pre-configured for Hetzner Object Storage. The S3 destination path is auto-derived from the Helm release name (s3://ironflow-backups/<release-name>), so each deployment gets an isolated backup path. The S3 endpoint URL is passed during deploy via --set (see Step 6).
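Before moving on, you can confirm both secrets exist in the namespace:

```bash
# Omit ghcr-pull-secret if your image is public
kubectl get secret ghcr-pull-secret ironflow-s3-creds -n ironflow
```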
Step 6: Deploy Ironflow
```bash
ironflow deploy --template small --name dev
```

The ironflow deploy command automatically:
- Reads HETZNER_S3_ENDPOINT and HETZNER_S3_BUCKET from environment variables and configures the S3 backup destination
- Installs these prerequisites on first deploy:
- CloudNativePG operator — manages PostgreSQL clusters (Small and Medium only)
- Barman Cloud Plugin — S3-compatible backups (Small and Medium only)
- cert-manager — TLS certificate management (all templates)
- kube-prometheus-stack — Prometheus, Grafana, and alerting (all templates)
If these are already installed, the command detects them and skips installation.
Or for Medium/Large:
```bash
# Medium — 3 replicas, NATS cluster, HA PostgreSQL
ironflow deploy --template medium --name staging

# Medium with Hetzner load balancer — adds Traefik ingress + LB optimizations
ironflow deploy --template medium --name prod --hetzner-location fsn1

# Large — HPA, external PostgreSQL + NATS
ironflow deploy --template large --name prod \
  --set externalDatabase.url=postgres://user:pass@host:5432/ironflow \
  --set externalNats.url=nats://nats-1:4222,nats://nats-2:4222
```

The --hetzner-location flag installs Traefik as the ingress controller with Hetzner-optimized load balancer settings (proxy protocol, private network routing, health checks). Match the location to your cluster’s datacenter (fsn1, nbg1, or hel1). See Step 8 for enabling Ingress after deploy.
If you are deploying with helm install directly (instead of the ironflow deploy CLI), you must install the prerequisites manually. See Kubernetes Deployment for manual installation commands.
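If you are unsure whether a cluster already has these prerequisites, a rough check is to look for their CRD API groups (a sketch; exact CRD names vary by version):

```bash
# CNPG (including the Barman Cloud Plugin), cert-manager, and the Prometheus
# operator all register CRDs under these API groups
kubectl get crds | grep -E 'cnpg.io|cert-manager.io|monitoring.coreos.com'
```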
For detailed deploy options (CLI vs Helm, customization, upgrades), see Kubernetes Deployment.
Step 7: Verify
```bash
# Check Ironflow pods
ironflow deploy status --name dev

# Check all pods are running
kubectl get pods -n ironflow

# Check PostgreSQL cluster health
kubectl get cluster -n ironflow

# Check backups are scheduled
kubectl get scheduledbackups -n ironflow

# Verify health endpoints
kubectl port-forward svc/dev-ironflow -n ironflow 9123:9123 &
curl -s http://localhost:9123/health
curl -s http://localhost:9123/ready

# Open the dashboard at http://localhost:9123
```

Retrieve the admin API key from the first-boot logs:
```bash
kubectl logs -n ironflow $(kubectl get pods -n ironflow \
  -l app.kubernetes.io/component=server -o name | head -1) | grep -A8 "Admin API Key"
```

Verify monitoring
```bash
# Check CNPG PodMonitor
kubectl get podmonitors -n ironflow

# Check Ironflow ServiceMonitor
kubectl get servicemonitors -n ironflow

# Check PostgreSQL alert rules
kubectl get prometheusrules -n ironflow

# Verify Ironflow exposes metrics
curl -s http://localhost:9123/metrics | head -5
```

Step 8: External Access via Load Balancer
By default, Ironflow is only accessible inside the cluster (ClusterIP). If you need external access for push-mode webhooks, the dashboard, or API clients, set up a load balancer. Skip this step if port-forward is sufficient (dev/staging) or if your cluster is only accessed via VPN.
When do you need a load balancer?
- Yes: Push-mode functions (external services POST to Ironflow), dashboard access for teams outside the cluster, HA failover across nodes.
- No: Dev/staging accessed via kubectl port-forward, pull-mode only (workers connect outbound), single-team with VPN.
Option A: Ingress Controller (recommended)
If you deployed with --hetzner-location in Step 6, Traefik and a Hetzner Load Balancer are already installed. If not, re-run deploy with the flag:
```bash
ironflow deploy upgrade --template medium --name prod --hetzner-location fsn1
```

Enable Ingress
Once the load balancer has an external IP (shown during deploy), enable Ingress with your domain:
```bash
ironflow deploy upgrade --template medium --name prod \
  --set ingress.enabled=true \
  --set ingress.host=ironflow.example.com
```

Point your DNS A record to the load balancer IP (shown during deploy).
TLS certificates are automatically issued by cert-manager via Let’s Encrypt.
Verify
```bash
# Check load balancer IP
kubectl get svc -n traefik traefik

# Check Ingress
kubectl get ingress -n ironflow

# Test access
curl -k https://ironflow.example.com/health
```

Option B: Direct LoadBalancer Service (simple alternative)
For simple deployments without Ingress routing, you can expose the Ironflow service directly:
```bash
ironflow deploy upgrade --template medium --name prod \
  --set service.type=LoadBalancer \
  --set service.annotations."load-balancer\.hetzner\.cloud/location"=fsn1
```

This creates a dedicated Hetzner LB for the Ironflow service. No hostname routing, no TLS termination at the LB level.
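After a minute or two the service should report an external IP. The service name below assumes the <release>-ironflow naming used elsewhere in this guide:

```bash
# Watch for EXTERNAL-IP to change from <pending> to the Hetzner LB address
# (prod-ironflow is an assumed name based on the release name above)
kubectl get svc prod-ironflow -n ironflow -w
```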
Load Balancer Costs
| Resource | Cost |
|---|---|
| Hetzner LB11 | ~€6/month |
| Additional bandwidth | Included (30 TB/month) |
Load Balancer Troubleshooting
Load balancer stuck in <pending>:
- Check Hetzner CCM is running: kubectl get pods -n kube-system -l app.kubernetes.io/name=hcloud-cloud-controller-manager
- Check HCLOUD_TOKEN is set in the CCM deployment
- Check Hetzner API status: hcloud load-balancer list
All requests return 400 Bad Request:
- Proxy protocol mismatch. Both sides must be enabled or both disabled.
- Check Traefik args: kubectl get deploy -n traefik traefik -o yaml | grep proxyProtocol
- Check LB annotation: kubectl get svc -n traefik traefik -o yaml | grep proxyprotocol
TLS certificate not issuing:
- Check cert-manager: kubectl get certificate -n ironflow
- Check ClusterIssuer: kubectl get clusterissuer
- DNS must point to the LB IP for ACME HTTP-01 challenge to work
Multi-Tenant Load Balancing
With Option A (Traefik Ingress), a single Hetzner Load Balancer serves all tenants on the cluster. Traefik reads Ingress resources across all namespaces and routes traffic by hostname.
```
Internet → Hetzner LB (one, ~€6/mo) → Traefik pods (NodePort, private network)
  → Ingress: acme.ironflow.example.com   → tenant-acme/acme-ironflow
  → Ingress: globex.ironflow.example.com → tenant-globex/globex-ironflow
  → Ingress: ironflow.example.com        → ironflow/prod-ironflow
```

Install Traefik once per cluster (see Option A above), then deploy each tenant with Ingress enabled:
```bash
# Install Traefik with Hetzner LB (once per cluster)
# See Option A in the section above for Traefik installation

# First tenant
helm install acme ./deploy/helm/ironflow \
  -n tenant-acme --create-namespace \
  -f deploy/helm/ironflow/values-multi-tenant.yaml \
  --set ingress.enabled=true \
  --set ingress.host=acme.ironflow.example.com \
  --set ironflow.masterKey=$(openssl rand -hex 32)

# Additional tenants — reuse the existing LB
helm install globex ./deploy/helm/ironflow \
  -n tenant-globex --create-namespace \
  -f deploy/helm/ironflow/values-multi-tenant.yaml \
  --set ingress.enabled=true \
  --set ingress.host=globex.ironflow.example.com \
  --set ironflow.masterKey=$(openssl rand -hex 32)
```

Each tenant gets its own TLS certificate (auto-issued by cert-manager) and is network-isolated via NetworkPolicy (defaultDeny: true in values-multi-tenant.yaml). The Traefik namespace is in allowNamespaces so ingress traffic can reach tenant pods.
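To confirm the per-tenant routing and isolation after installing a tenant or two:

```bash
# One Ingress per tenant, all routed through the same Traefik/LB
kubectl get ingress -A

# NetworkPolicies created by the chart in a tenant namespace
kubectl get networkpolicy -n tenant-acme
```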
Avoid Option B for multi-tenant
With Option B (direct LoadBalancer service), each tenant with service.type=LoadBalancer creates a separate Hetzner LB (~€6/mo each). At 10 tenants that’s ~€60/mo in LBs alone, with no hostname routing or shared TLS. Use Option A for multi-tenant deployments.
DNS Configuration
Point your domain to the load balancer IP so Traefik can route traffic and cert-manager can issue TLS certificates.
Find the load balancer IP
```bash
# From kubectl
kubectl get svc -n traefik traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}'

# Or from hcloud CLI
hcloud load-balancer list -o columns=name,ipv4
```

Option 1: Wildcard DNS (simplest for multi-tenant)
Create a single wildcard A record and all tenant subdomains resolve automatically:
```
*.ironflow.example.com → A <LB_IP>
```

New tenants work immediately with --set ingress.host=<tenant>.ironflow.example.com — no DNS changes needed per tenant.
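Once the record propagates, any subdomain should resolve to the LB IP. A quick check, assuming dig is available:

```bash
# Any name under the wildcard should return the load balancer address
dig +short anything.ironflow.example.com
```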
Option 2: Per-tenant DNS records
Create individual A records for each tenant:
```
acme.ironflow.example.com   → A <LB_IP>
globex.ironflow.example.com → A <LB_IP>
ironflow.example.com        → A <LB_IP>
```

This gives you explicit control but requires a DNS change for each new tenant.
Hetzner DNS
If your domain uses Hetzner DNS, create records in the Hetzner DNS Console or via the API:
```
# Wildcard for all tenants
# Hetzner DNS Console → your zone → Add Record → Type: A, Name: *, Value: <LB_IP>

# Or per-tenant
# Type: A, Name: acme.ironflow, Value: <LB_IP>
```

External DNS providers
For Cloudflare, Route53, Google Cloud DNS, or other providers, create A records pointing to the LB IP using your provider’s dashboard or CLI.
Cloudflare
If using Cloudflare, disable the proxy (orange cloud → grey cloud) for the initial setup so cert-manager’s HTTP-01 ACME challenge can reach the LB directly. You can re-enable the proxy after certificates are issued if you switch to DNS-01 challenges.
Automatic DNS with external-dns (optional)
external-dns can auto-create DNS records from Ingress resources. When a new tenant is deployed with ingress.host=acme.ironflow.example.com, external-dns automatically creates the A record at your DNS provider.
```bash
# Install external-dns (example for Hetzner DNS)
helm repo add external-dns https://kubernetes-sigs.github.io/external-dns
helm install external-dns external-dns/external-dns \
  -n external-dns --create-namespace \
  --set provider.name=hetzner \
  --set env[0].name=HETZNER_DNS_API_TOKEN \
  --set env[0].value=$HETZNER_DNS_TOKEN
```

external-dns supports Hetzner DNS, Cloudflare, Route53, Google Cloud DNS, and many others.
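You can watch external-dns pick up new Ingress resources and create records; the deployment name below is assumed from the release name used in the install command above:

```bash
# Stream external-dns logs while deploying a new tenant
kubectl logs -n external-dns deploy/external-dns -f
```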
TLS certificates
TLS is handled automatically. The Helm chart sets cert-manager.io/cluster-issuer: letsencrypt-prod on every Ingress resource when tls: true (the default). Once DNS points to the LB IP:
- cert-manager detects the new Ingress with TLS enabled
- Requests a Let’s Encrypt certificate via HTTP-01 challenge
- Stores the certificate as a Secret (<release>-ironflow-tls) in the tenant’s namespace
- Traefik serves HTTPS automatically
Each tenant gets its own TLS certificate. Check certificate status with:
```bash
kubectl get certificate -n tenant-acme
kubectl describe certificate -n tenant-acme
```

Cluster Management
Check status
```bash
ironflow provision status --provider hetzner --name ironflow
```

Upgrade Ironflow
```bash
ironflow deploy upgrade --template small --name dev
```

Tear down
```bash
ironflow provision destroy --provider hetzner --name ironflow
```

File Structure
```
deploy/terraform/hetzner/
├── main.tf                   # Cluster module + providers
├── variables.tf              # Input variables (token, cluster name, node sizes)
├── outputs.tf                # kubeconfig path, talosconfig path, cluster info
├── terraform.tfvars.example  # Reference with all options
├── terraform.small.tfvars    # Small cluster: 1 control + 1 worker
├── terraform.medium.tfvars   # Medium cluster: 3 control + 2 workers
├── terraform.large.tfvars    # Large cluster: 3 control + 2 workers
├── .terraform.lock.hcl       # Provider lock file (committed for reproducibility)
├── teardown.sh               # Clean destroy with hcloud CLI fallback
└── .gitignore                # Ignores state files, kubeconfig, talosconfig
```

Troubleshooting
Placement Groups Already Exist
If terraform apply fails with placement_group not unique, leftover resources from a previous run exist:
```bash
hcloud placement-group list
hcloud placement-group delete <id>
```

Terraform State Issues
If Terraform state gets out of sync, run ironflow provision destroy --provider hetzner --name ironflow (or ./teardown.sh directly) to force-clean all resources, then start fresh.
Node Not Ready
Talos Linux nodes take 1-2 minutes after provisioning to register with the Kubernetes API. If kubectl get nodes shows NotReady, wait and retry.
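You can watch the nodes come up instead of polling manually:

```bash
# Ctrl-C once all nodes report Ready
kubectl get nodes -w
```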
ImagePullBackOff
Container image is in a private registry. Create the pull secret as described in Step 5.
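To confirm the cause, inspect the pod events (the pod name is a placeholder):

```bash
# The Events section at the end of the output shows the pull error details
kubectl describe pod <pod-name> -n ironflow | tail -n 20
```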
Firewall Blocks API Access
If kubectl times out connecting to the cluster, your IP may have changed since provisioning. The Hetzner firewall restricts port 6443 to the IP that ran Terraform. Update it:
```bash
# Find your current IP
curl -s ifconfig.me

# Update the firewall rules
hcloud firewall describe ironflow
# Update Source IPs for the "Allow Incoming Requests to Kube API" rule
# with your current IP via the Hetzner Cloud Console or hcloud CLI
```