Bootstrapping trust in a Kubernetes cluster: cert-manager, an internal CA, and kubelet TLS

With Talos under the cluster, Cilium handling the network, and Linstor providing storage, pods can talk to each other and persist data. They’re still doing most of it over self-signed TLS that nothing on either side trusts. That’s what this post fixes.

The reason it stays broken for so long is that Kubernetes doesn’t make it loudly broken. A pod hitting another pod’s https:// endpoint with InsecureSkipVerify gets a green response. kubectl logs works because it talks to the apiserver, not directly to the kubelet, and the apiserver’s connection to the kubelet defaults to no verification at all. There’s no error message telling you that half your control-plane traffic is unverified. You have to know.

The fix is three pieces:

  • cert-manager with a self-signed internal CA, so anything in the cluster can ask for a TLS cert and get one.
  • trust-manager to ship that CA to every namespace as a mountable ConfigMap, so workloads can actually verify what cert-manager hands out.
  • A CSR approver so the kubelet itself gets a serving cert signed by a CA the cluster knows about, instead of a self-signed one that nothing trusts.

Plus a Let’s Encrypt issuer for anything that faces the public internet. None of these are Talos-specific; they’re the same shape on any distro. The only Talos-shaped bit is one line in the machine config that turns on kubelet serving cert rotation.

What “TLS” means in a default Kubernetes cluster

Before adding anything, it’s worth being precise about what’s already encrypted and what isn’t.

The cluster ships with one PKI built around the kubeadm/Talos cluster CA. It signs the apiserver’s serving cert, the apiserver’s client cert for talking to other components, the controller-manager and scheduler client certs, and the kubelet’s client cert. That’s the part that’s wired up correctly out of the box: control-plane components verify each other.

What’s missing from that picture:

  • The kubelet’s serving endpoint. Every kubelet listens on port 10250 with a self-signed cert it generates at startup. The apiserver connects to it for logs, exec, port-forward, and metric scraping, and by default does no verification: unless --kubelet-certificate-authority is set on the apiserver, which most distros leave unset, the kubelet’s serving cert is never checked. The connection is encrypted but unauthenticated, which means anyone on path can impersonate the kubelet and intercept it.
  • Workload-to-workload traffic. A Service in front of a pod is an IP at the CNI layer. There’s no TLS unless the application or a sidecar puts one in. Cilium can encrypt the underlay with WireGuard or IPsec, but that’s transport encryption between nodes, not end-to-end TLS between workloads.
  • Ingress. A pod can serve HTTPS, but it needs a cert. Without an issuer, that cert is either self-signed, manually rotated, or pasted in from somewhere out of band.

The cluster CA is a closed system: cluster components hold its private key (or rather, the controller manager does, via the --cluster-signing-* flags) and it only signs requests submitted through the Kubernetes CSR API. It is deliberately not exposed for general TLS issuance. So the trust layer needs a CA of its own — separate from the cluster CA, available to workloads on demand.

That’s what cert-manager gives you.

cert-manager: a CA the cluster can ask things of

cert-manager is a controller that watches Certificate CRs and reconciles them into Kubernetes Secrets with TLS keypairs inside. The interesting part is the issuer model: a Certificate references an Issuer or ClusterIssuer, and the issuer decides how the cert is actually produced. Different issuers do different things — ACME (Let’s Encrypt), Vault, an internal CA stored in a Secret, or a self-signed pass-through. The workload manifest doesn’t change when you swap one out; only the issuer reference does.

Installing it is one upstream Helm chart:

helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager --create-namespace \
  --set crds.enabled=true

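That gets you the controller, the webhook, the cainjector, and the CRDs: Certificate, Issuer, and ClusterIssuer, which you write yourself, plus CertificateRequest, Order, and Challenge, which cert-manager creates as it works. It does not give you any issuers. Out of the box, a Certificate just sits unissued because there’s nothing to issue it.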

Bootstrapping the internal CA

The cluster needs one self-signed CA — call it internal-ca — that signs every cert workloads use to talk to each other. It’s the “anything in the cluster trusts this” CA. Public-facing certs come from Let’s Encrypt; that’s a separate issuer, covered further down.

The CA has to be created from scratch on a fresh cluster, and that creation has a chicken-and-egg problem cert-manager solves with a deliberate bootstrap trick.

A CA ClusterIssuer in cert-manager points at a Secret containing the CA’s private key and cert. On a fresh cluster, that Secret doesn’t exist. So you can’t write a working ClusterIssuer for the CA yet, which means you can’t issue the CA’s own cert, which means the Secret never gets created. To break the loop, cert-manager ships a selfSigned issuer type: it doesn’t have a Secret behind it; it signs each certificate with that certificate’s own private key, the one cert-manager generates for the Certificate. You use that to issue exactly one Certificate (the CA’s own keypair), and then you point the real ca: ClusterIssuer at the Secret that Certificate just produced.

That’s three manifests:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned-bootstrap
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: internal-ca
  namespace: cert-manager
spec:
  isCA: true
  commonName: internal-ca
  secretName: internal-ca
  privateKey:
    algorithm: Ed25519
  issuerRef:
    name: selfsigned-bootstrap
    kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: internal-ca
spec:
  ca:
    secretName: internal-ca

The selfsigned-bootstrap ClusterIssuer is one-time scaffolding (it’s also useful for any future ad-hoc self-signed cert). The Certificate named internal-ca is what cert-manager reconciles into the internal-ca Secret in the cert-manager namespace. The second ClusterIssuer, also named internal-ca, is the one workload Certificate resources reference.
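The bootstrap is easier to picture offline. The following is a rough openssl sketch of what the three manifests amount to, not what cert-manager literally runs: a key signs its own CA certificate (the selfSigned step), and that CA then signs a leaf (the ca: step). The file names, the 90- and 30-day lifetimes, and the leaf’s CN are illustrative assumptions, and Ed25519 needs OpenSSL 1.1.1 or newer.

```shell
workdir=$(mktemp -d)

# Minimal configs so the result does not depend on the system openssl.cnf.
cat > "$workdir/ca.cnf" <<'EOF'
[req]
distinguished_name = dn
x509_extensions = v3_ca
prompt = no
[dn]
CN = internal-ca
[v3_ca]
basicConstraints = critical,CA:TRUE
keyUsage = critical,keyCertSign,cRLSign
EOF

cat > "$workdir/leaf.cnf" <<'EOF'
[req]
distinguished_name = dn
prompt = no
[dn]
CN = my-app
EOF

# Step 1 (selfSigned issuer): the CA key signs its own certificate.
openssl genpkey -algorithm ED25519 -out "$workdir/ca.key"
openssl req -x509 -new -key "$workdir/ca.key" -config "$workdir/ca.cnf" \
  -days 90 -out "$workdir/ca.crt"

# Step 2 (ca issuer): the CA signs a leaf certificate for a workload.
openssl genpkey -algorithm ED25519 -out "$workdir/leaf.key"
openssl req -new -key "$workdir/leaf.key" -config "$workdir/leaf.cnf" \
  -out "$workdir/leaf.csr"
printf 'subjectAltName=DNS:my-app.my-app.svc.cluster.local\n' > "$workdir/leaf.ext"
openssl x509 -req -in "$workdir/leaf.csr" \
  -CA "$workdir/ca.crt" -CAkey "$workdir/ca.key" -CAcreateserial \
  -days 30 -extfile "$workdir/leaf.ext" -out "$workdir/leaf.crt"

# The CA names itself as issuer; the leaf chains back to the CA.
ca_subject=$(openssl x509 -in "$workdir/ca.crt" -noout -subject)
ca_issuer=$(openssl x509 -in "$workdir/ca.crt" -noout -issuer)
verify_result=$(openssl verify -CAfile "$workdir/ca.crt" "$workdir/leaf.crt")
echo "$ca_subject"
echo "$ca_issuer"
echo "$verify_result"
```

The CA’s subject and issuer lines carry the same name, which is all “self-signed” means, and the verify line reports OK for the leaf only because the CA cert is supplied as the trust anchor.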

A note on the key algorithm. Most cert-manager examples use ECDSA P-256, and it works. Ed25519 is the stronger choice where the consumers support it: it has a cleaner construction (the per-signature nonce is derived deterministically from the key and the message, so there is no random-nonce-reuse failure mode the way classic ECDSA has), compact keys, and fast signing at the same ~128-bit security level. The one place it gets awkward is anything that has to talk to TLS clients without Ed25519 support, and an internal CA that only serves in-cluster workloads doesn’t. Go services, recent OpenSSL-linked binaries, and modern sidecars all handle Ed25519 fine, which is the entire population of consumers an internal CA cares about.

Once those land, kubectl get clusterissuers shows internal-ca as Ready: True and the internal-ca Secret in the cert-manager namespace contains the CA keypair. From there, any namespace can request a workload cert:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: my-app
  namespace: my-app
spec:
  secretName: my-app-tls
  dnsNames:
    - my-app.my-app.svc.cluster.local
    - my-app.example.internal
  issuerRef:
    name: internal-ca
    kind: ClusterIssuer

cert-manager generates the keypair, signs the cert with internal-ca, drops the result into the my-app-tls Secret as tls.crt and tls.key, and rotates it before expiry. The workload mounts the Secret as a volume and forgets about TLS lifecycle entirely.
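To see what actually landed in the Secret, decode tls.crt and ask openssl about it. The helper below is a small sketch (cert_summary is a name invented here, not a cert-manager tool); it reads a PEM certificate on stdin and prints subject, issuer, and expiry.

```shell
# Hypothetical helper: summarize a PEM certificate read from stdin.
cert_summary() {
  openssl x509 -noout -subject -issuer -enddate
}

# Assumed usage against the Certificate above (needs cluster access):
#   kubectl get secret my-app-tls -n my-app -o jsonpath='{.data.tls\.crt}' \
#     | base64 -d | cert_summary
```

The issuer line should name internal-ca, and rerunning the command after a renewal should show a later notAfter date.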

That gets you one half of trust: any workload can ask for a server cert. The other half is making sure peers can verify it.

trust-manager: shipping the CA bundle everywhere

A signed cert is only useful if the other side has the CA’s public cert to validate it against. With one CA, that’s just a copy of internal-ca’s tls.crt mounted into every workload that needs to verify a peer.

The naive way is to copy the Secret into every namespace. That works, sort of, but it scales badly: every new namespace needs a copy, and rotation means re-copying everywhere. Worse, the source-of-truth Secret in the cert-manager namespace also contains the CA’s private key. Mounting that Secret to distribute trust would hand the private key to every workload that just wanted the public cert.

trust-manager is the answer to that. It’s a small controller — a separate install from cert-manager, by the same project — that watches a Bundle CR, reads the requested cert (just the public part) from one or more source Secrets or ConfigMaps, and projects it as a ConfigMap into every namespace you select. The ConfigMap has a fixed key (ca-bundle.crt) so workloads can mount it without caring which CA is inside.

Install it from the same jetstack Helm repo:

helm install trust-manager jetstack/trust-manager \
  --namespace cert-manager

Then one Bundle for the internal CA:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: internal-ca
spec:
  sources:
    - secret:
        name: internal-ca
        key: tls.crt
  target:
    configMap:
      key: ca-bundle.crt
    namespaceSelector:
      matchLabels: {}

namespaceSelector.matchLabels: {} is an empty matcher: it propagates the bundle to every namespace. Starting wide is fine because the public cert isn’t sensitive — only the private key is, and that never leaves the cert-manager namespace.
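It is worth confirming that what trust-manager projected really is public material only. A minimal sketch, assuming the ConfigMap key has been saved to a local file; bundle_report is a hypothetical helper name, and the crl2pkcs7/pkcs7 pipeline is a common openssl idiom for listing every certificate in a multi-cert PEM file.

```shell
# Hypothetical helper: refuse bundles that contain a private key, then list
# the subject and issuer of every certificate in the PEM file.
bundle_report() {
  bundle_file=$1
  if grep -q 'PRIVATE KEY' "$bundle_file"; then
    echo "private key material found in $bundle_file" >&2
    return 1
  fi
  openssl crl2pkcs7 -nocrl -certfile "$bundle_file" \
    | openssl pkcs7 -print_certs -noout
}

# Assumed usage (needs cluster access; any namespace the Bundle targets works):
#   kubectl get configmap internal-ca -n my-app \
#     -o jsonpath='{.data.ca-bundle\.crt}' > /tmp/ca-bundle.crt
#   bundle_report /tmp/ca-bundle.crt
```

With a single-CA Bundle the report should show one subject/issuer pair, both naming internal-ca.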

Consuming the bundle in a workload is a one-volume mount:

volumes:
  - name: internal-ca
    configMap:
      name: internal-ca
volumeMounts:
  - name: internal-ca
    mountPath: /etc/ssl/certs/internal-ca.crt
    subPath: ca-bundle.crt
    readOnly: true

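Set SSL_CERT_FILE=/etc/ssl/certs/internal-ca.crt for Go binaries, or the runtime’s equivalent, and the workload now verifies any internal-ca-signed peer without InsecureSkipVerify. When the CA rotates, trust-manager updates the ConfigMap, but a file mounted with subPath, as above, is not refreshed in a running pod; the new bundle is picked up on the next pod restart. Mount the ConfigMap as a directory without subPath if you want the kubelet to refresh it in place on its next sync.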

That covers workload-to-workload TLS. The next two pieces handle the edges: traffic in from the public internet, and traffic between the apiserver and the kubelets.

Let’s Encrypt for the public edge

For any service that’s reachable from the public internet, an internal CA isn’t useful. Browsers won’t trust it, and you’d have to ship the trust bundle to every device that hits the site. Let’s Encrypt is the standard answer: a publicly-trusted CA that issues free 90-day certificates over the ACME protocol.

cert-manager’s ACME issuer handles the protocol. The only thing you need to configure is how the ACME server’s HTTP-01 challenge gets answered. On a cluster with Gateway API installed, and with Gateway API support switched on in cert-manager (it is off by default), cert-manager has a gatewayHTTPRoute solver that creates a temporary HTTPRoute pointing at cert-manager’s challenge-response server, attached to whatever Gateway you tell it to:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com
    privateKeySecretRef:
      name: letsencrypt
    solvers:
      - http01:
          gatewayHTTPRoute:
            parentRefs:
              - group: gateway.networking.k8s.io
                kind: Gateway
                name: shared-gateway
                namespace: gateway-api-system

When a Certificate references the letsencrypt issuer, cert-manager creates a short-lived HTTPRoute on shared-gateway that serves the ACME challenge token at /.well-known/acme-challenge/.... Let’s Encrypt fetches it over HTTP, verifies the response matches the token it issued, signs the cert, and cert-manager tears the HTTPRoute back down. Total elapsed time is usually under thirty seconds.

The Gateway-API solver only needs to touch one resource — the parent Gateway it pins the HTTPRoute under — so the RBAC stays tight and nothing else in the cluster has to know about cert-manager.

While testing, point the issuer at the staging directory (https://acme-staging-v02.api.letsencrypt.org/directory) instead. Staging certificates aren’t publicly trusted, but the rate limits are loose, so a misconfigured Certificate that retries forever can’t burn through your quota. Production enforces 50 certificates per registered domain per week and 5 duplicate certificates (the same set of names) per week, plus a separate limit on failed validations. A Certificate stuck in a reissue or retry loop can run into these within an hour.

DNS-01 is an alternative for wildcards or domains that aren’t HTTP-reachable. It needs API credentials for your DNS provider, which is more setup than HTTP-01 deserves on a single cluster. Skip it until you have a wildcard requirement.

Let’s Encrypt covers the public edge. The other edge — apiserver to kubelet — has a different shape entirely, and a different fix.

The kubelet’s serving cert problem

Every kubelet listens on port 10250 with a TLS endpoint. The apiserver talks to it for kubectl logs, kubectl exec, kubectl port-forward, and any metrics endpoint that proxies through kubelet/metrics. By default, that TLS endpoint serves a cert the kubelet generated itself at startup with no signer involvement.

That cert isn’t in any trust store. On most distros, the apiserver is configured to accept it anyway: --kubelet-certificate-authority is unset, so verification is skipped. Setting the flag while kubelets still self-sign doesn’t help, because the self-signed cert can’t pass verification against the cluster CA, and logs, exec, and port-forward simply fail with an x509 error. So in practice the flag stays unset, and cluster-internal control-plane traffic between the apiserver and every kubelet is encrypted but unauthenticated. Anyone with a man-in-the-middle position between them can intercept kubectl exec sessions.

The fix is to make the kubelet request a serving cert from a real signer instead of self-signing. Kubernetes has a built-in signer for exactly this — kubernetes.io/kubelet-serving — backed by the cluster CA. When the kubelet starts with serving cert bootstrap enabled, it submits a CertificateSigningRequest to the apiserver asking for a serving cert with the node’s hostname and IP as SANs. If approved, the request is signed by the cluster CA, the kubelet picks up the cert, and the apiserver (with --kubelet-certificate-authority set to the cluster CA) now has something to verify against.

Two things have to be true for that to work:

  1. The kubelet has to be told to do CSR-based bootstrap instead of self-signing.
  2. Something has to approve the CSRs as they come in.

Turning on bootstrap (the only Talos-shaped step)

On Talos, the kubelet config sits inside the machine config. The relevant knob is in machine.kubelet.extraConfig, which is passed straight through to the kubelet’s own config file:

machine:
  kubelet:
    extraConfig:
      serverTLSBootstrap: true

That single line tells the kubelet to submit a CSR for its serving cert at startup, and to rotate it before expiry by submitting another. After the next reboot (or talosctl service kubelet restart), each node has a pending CSR sitting in the apiserver, waiting for approval.

The same setting exists on every distro, just configured differently: kubeadm clusters set serverTLSBootstrap: true in the KubeletConfiguration passed to kubeadm init/kubeadm join, k3s passes the equivalent kubelet flag via --kubelet-arg=rotate-server-certificates=true, and a hand-managed kubelet sets it in its config file directly. The mechanism is identical. Only the surface is different.

Why Kubernetes won’t approve them for you

The natural next question is: why doesn’t Kubernetes just auto-approve these CSRs? It does, in fact, auto-approve a different signer — kubernetes.io/kube-apiserver-client-kubelet, the kubelet’s client cert used to talk to the apiserver. That one is signed automatically because the kubelet’s bootstrap token authenticates the request unambiguously.

The serving signer is different. The CSR claims a set of SANs (the node’s hostnames and IPs) and Kubernetes has no way to know whether those claims are honest. A compromised kubelet could submit a CSR claiming the SANs of another node and, if auto-approved, get a serving cert valid for that node — which it could then use to impersonate the other kubelet to the apiserver. The Kubernetes maintainers decided the safe default was “leave these CSRs pending, let the cluster admin (or an explicit approver) decide”. So they sit there. Forever. Until something approves them.
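Until an approver is in place, that decision is yours. A small sketch for finding the CSRs in question: pending_kubelet_csrs is a hypothetical filter over the text output of kubectl get csr --no-headers, and it assumes the signer name is the third column and the condition is the last, as in the kubectl output shown later in this post.

```shell
# Hypothetical filter: print the names of kubelet-serving CSRs that are
# still Pending, reading `kubectl get csr --no-headers` output on stdin.
pending_kubelet_csrs() {
  awk '$3 == "kubernetes.io/kubelet-serving" && $NF == "Pending" { print $1 }'
}

# Assumed usage (needs cluster access). Inspect each request before approving;
# approving blindly is exactly the risk described above.
#   kubectl get csr --no-headers | pending_kubelet_csrs
#   kubectl describe csr <name>
#   kubectl certificate approve <name>
```

Manual approval works for a handful of nodes, but every rotation produces a fresh CSR, so it does not hold up as a long-term process.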

Auto-approval, scoped to the actual node

The pattern that works is a small controller that watches for kubernetes.io/kubelet-serving CSRs and approves them after cross-checking that the claimed SANs match the real node. The standard choice is alex1989hu/kubelet-serving-cert-approver. It’s a single Deployment with a tiny RBAC footprint, and it does exactly one thing: when a CSR comes in for the kubelet-serving signer, it looks up the requesting Node, compares the CSR’s claimed SANs against the Node’s status.addresses and DNS hostname, and approves only if they match.
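The core of that check is small. Below is a shell sketch of the idea only; the approver itself is a separate controller and its real logic is more involved. sans_allowed is a hypothetical function that takes a PEM CSR file and the addresses a Node reports, and succeeds only if every SAN in the CSR is one of them. It compares plain strings and assumes the text layout that openssl req -text prints for SANs.

```shell
# Hypothetical check: succeed only if every SAN in the CSR ($1, a PEM file)
# appears in the list of node addresses given as the remaining arguments.
sans_allowed() {
  csr_file=$1; shift
  sans=$(openssl req -in "$csr_file" -noout -text \
    | grep -A1 'Subject Alternative Name' | tail -n 1 \
    | tr ',' '\n' | sed -e 's/^ *//' -e 's/^DNS://' -e 's/^IP Address://')
  [ -n "$sans" ] || return 1          # no SANs at all: nothing to approve
  for san in $sans; do
    ok=1
    for addr in "$@"; do
      [ "$san" = "$addr" ] && ok=0
    done
    [ "$ok" -eq 0 ] || return 1       # a SAN the node does not own
  done
  return 0
}

# Assumed usage, with addresses taken from the Node's status.addresses:
#   sans_allowed node.csr talos-node-0 10.0.0.10
```

The important property is the direction of the check: every name in the request must belong to the node, so a CSR that adds even one foreign hostname or IP is rejected.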

The project doesn’t ship a Helm chart; it publishes plain manifests under deploy/. The single-replica install is one kubectl apply:

kubectl apply -f https://raw.githubusercontent.com/alex1989hu/kubelet-serving-cert-approver/main/deploy/standalone-install.yaml

(There’s an HA variant under deploy/ha-install.yaml if you want two replicas with leader election.)

The interesting part of what those manifests install is the RBAC. The approver needs four capabilities:

- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests/approval
  verbs:
  - update
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - certificates.k8s.io
  resourceNames:
  - kubernetes.io/kubelet-serving
  resources:
  - signers
  verbs:
  - approve

The first block lets it see CSRs as they’re submitted. The second updates the approval subresource on a CSR. The third, subjectaccessreviews, is how the approver asks the apiserver “is the submitting user the kubelet of the node it’s claiming to be?” — which is the actual security check. The last block, with resourceNames: ["kubernetes.io/kubelet-serving"], is the narrow scoping that matters: the approver can only approve CSRs against the kubelet-serving signer. It can’t, for example, approve a client cert that would let someone impersonate the apiserver, even if it could see the request. The RBAC bounds what a compromised approver can do.

The Deployment itself runs at one replica with tiny resource requests, drops every capability, runs as non-root with a read-only root filesystem, and pins the seccomp profile to RuntimeDefault. Standard hardening, worth mentioning only because it’s the minimum bar for anything with cluster-wide approval authority over a signer the apiserver trusts.

Verifying it actually works

After the kubelet config rolls out and the approver is running, the CSRs flow through quickly. kubectl get csr shows them with Approved,Issued status:

$ kubectl get csr
NAME        AGE   SIGNERNAME                              REQUESTOR                       CONDITION
csr-7gpld   12s   kubernetes.io/kubelet-serving           system:node:talos-node-0        Approved,Issued
csr-ll3rp   12s   kubernetes.io/kubelet-serving           system:node:talos-node-1        Approved,Issued
csr-wqx4n   12s   kubernetes.io/kubelet-serving           system:node:talos-node-2        Approved,Issued

If a CSR sits in Pending for more than a few seconds, the approver isn’t doing its job — check its logs. The two common causes are an RBAC mismatch (the SA isn’t bound to the cluster role, usually a typo in the namespace) and SAN mismatch (the node’s actual addresses don’t match what the CSR claims, which happens if you renamed a node without rebooting it or if the kubelet picked up a stale hostname).
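For the SAN case, it helps to see exactly what the kubelet asked for. A CSR object stores the PKCS#10 request base64-encoded in .spec.request; the sketch below decodes it. csr_request_text is a hypothetical helper name, and base64 -d assumes GNU coreutils or a compatible base64.

```shell
# Hypothetical helper: decode a base64-encoded PEM CSR from stdin and print
# its subject followed by the SANs it requests.
csr_request_text() {
  pem=$(base64 -d)
  printf '%s\n' "$pem" | openssl req -noout -subject
  printf '%s\n' "$pem" | openssl req -noout -text \
    | grep -A1 'Subject Alternative Name' | tail -n 1 | sed 's/^ *//'
}

# Assumed usage (needs cluster access); compare the result with the Node:
#   kubectl get csr <name> -o jsonpath='{.spec.request}' | csr_request_text
#   kubectl get node <node> -o jsonpath='{.status.addresses}'
```

If the two disagree, fix the node’s identity first; the kubelet submits a fresh CSR when it restarts.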

Once the CSRs are issued, you can confirm end-to-end by hitting the kubelet’s port directly and asking openssl who signed the cert:

echo | openssl s_client -connect <node-ip>:10250 -showcerts 2>/dev/null \
  | openssl x509 -noout -issuer -subject

The issuer should now be the cluster CA, not the kubelet’s own self-signed CN. The subject’s CN is system:node:<node-name>, and the SANs (add -ext subjectAltName to the second openssl command to print them, on OpenSSL 1.1.1 or newer) include the node’s hostname and addresses. The apiserver, configured with --kubelet-certificate-authority pointing at the cluster CA, can finally verify this cert. kubectl logs and kubectl exec now go over an authenticated TLS connection end-to-end.

This also unblocks metric scrapers that hit the kubelet directly. VictoriaMetrics’s kubelet scrape job, Prometheus’s equivalent, and anything else talking to :10250/metrics/cadvisor can drop its skip-verify setting (insecure_skip_verify: true in a Prometheus-style tls_config) and use the cluster CA bundle for verification. It’s a small thing per workload, but it’s the difference between an observability stack that’s actually verifying what it scrapes and one that’s just trusting whatever responds.

What this unblocks

Verified TLS at every layer is the floor the rest of the stack sits on. Service-to-service mTLS becomes a Certificate manifest, not a project. External-facing services get publicly-trusted certs with one issuer annotation. Observability tooling stops carrying a long tail of insecure-skip-tls-verify flags. And the next post — on secrets, identity, ingress, and the rest of the cluster layer — can take all of this as given.