
Two GitOps controllers, two layers: FluxCD for the foundation, ArgoCD for applications

With Talos under the cluster, Cilium handling networking, Linstor providing storage, and cert-manager issuing TLS everywhere, the cluster can route traffic, persist data, and trust its own certificates. What it doesn’t have yet is a principled way to stay in that state, absorb changes, and reproduce itself.

The answer is GitOps: a Git repository as the single source of truth for the cluster’s desired state, a controller that watches it, and a contract that says the only way to change the cluster is to change the repo. No imperative changes to nodes, no helm upgrade from a laptop, no wondering what version is actually running. When something is wrong, the answer is in the repo. When you want to make a change, you open a pull request.

But there’s a decision to make upfront: which GitOps controller. Two tools dominate the space, and they have different shapes. I use both, on purpose, for different layers of the platform.

Two controllers, two layers #

FluxCD manages the foundation. ArgoCD manages the applications. That split isn’t arbitrary; each tool fits its layer better than the other would.

The foundation layer is infrastructure-only: Cilium, cert-manager, Linstor, FluxCD itself. It has to exist before any application does. It has hard ordering dependencies: Cilium needs to be running before anything else can start, and Linstor must be deployed before a PVC can be bound to actual storage. The people operating it are cluster admins comfortable with kubectl and Helm. It doesn’t need a UI.

The application layer is the set of workloads deployed on top of the foundation. The audience expands here: teams, or in a homelab yourself in a different mode, need to see deployment status without digging through Kubernetes CRs. A UI helps. Per-application scoping helps. ArgoCD’s opinions, which are liabilities at the foundation layer, become assets here.

There’s also a practical resilience argument: if ArgoCD has a problem, the foundation keeps running. The controller for applications doesn’t sit in the critical path of the infrastructure that supports it.

Why FluxCD for the foundation #

FluxCD is opinionated about one thing: Git is the source of truth. Everything else it leaves to you. It gives you HelmRelease, Kustomization, GitRepository, and HelmRepository CRDs, and reconciles them. That’s most of what it does.

What makes that useful for the foundation is that it doesn’t add opinions on top. No built-in access model, no application dashboard, no concept of projects or scoped visibility. Those are handled elsewhere by purpose-built tools.

The HelmRelease CR is the right shape for infrastructure components. You point it at a chart, give it a values block, and express ordering with dependsOn:

apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: trust-manager
  namespace: flux-system
spec:
  interval: 15m
  dependsOn:
    - name: cert-manager
  chart:
    spec:
      chart: trust-manager
      version: "0.x"
      sourceRef:
        kind: HelmRepository
        name: jetstack
        namespace: flux-system
  targetNamespace: cert-manager
  install:
    createNamespace: false
    remediation:
      retries: 5
  upgrade:
    remediation:
      retries: 5

trust-manager needs cert-manager running before it can do anything useful. The dependsOn field encodes that: FluxCD waits for cert-manager to report Ready before attempting to reconcile trust-manager. The same pattern applies across the whole foundation: Cilium before everything, cert-manager before issuers, issuers before workloads that need certificates. The entire foundation comes up in dependency order, automatically.
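The HelmRelease above points its sourceRef at a HelmRepository named jetstack, which isn’t shown. A minimal sketch of what that source could look like, assuming the public Jetstack chart repository and an arbitrary one-hour refresh interval (both are assumptions, not taken from the actual repository):

```yaml
# Sketch only: the URL and interval are assumptions.
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: jetstack            # must match sourceRef.name in the HelmRelease
  namespace: flux-system    # must match sourceRef.namespace
spec:
  interval: 1h              # how often the chart index is re-fetched
  url: https://charts.jetstack.io
```

The source-controller fetches the index on that interval; the helm-controller then resolves the "0.x" version range against it.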

ArgoCD also has an ordering mechanism called sync waves, but it works differently from dependsOn. Sync waves are integer annotations on individual Kubernetes resources within an Application. Resources in wave 0 are applied first, ArgoCD waits for them to become healthy, then wave 1 starts. This works well within a single Application, but it doesn’t reach across separate Applications. You can’t say “wait for this specific Application to be Ready before starting that one.” For infrastructure with precise cross-component ordering requirements, dependsOn across HelmReleases is the more expressive tool.

Helm and Kustomize are both first-class in FluxCD. A HelmRelease manages a Helm chart. A Kustomization runs a kustomize build. No wrapper abstraction to work through.
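For comparison with the HelmRelease shown earlier, a Flux Kustomization might look roughly like this. The name and path are hypothetical; only the GitRepository name flux-system comes from the bootstrap steps later in this post:

```yaml
# Sketch only: name and path are hypothetical.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: cluster-issuers               # hypothetical
  namespace: flux-system
spec:
  interval: 15m
  sourceRef:
    kind: GitRepository
    name: flux-system                 # the repository FluxCD pulls from
  path: ./manifests/cluster-issuers   # hypothetical directory in the repo
  prune: true                         # delete resources removed from Git
  wait: true                          # Ready only once applied resources are healthy
  timeout: 5m
```

The kustomize-controller builds whatever is at that path and applies the result, with no chart in between.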

Why ArgoCD for applications #

ArgoCD comes with opinions, and for applications those opinions are mostly useful. There are three worth understanding.

The first is the Application model. Every deployment in ArgoCD is an Application CR that declares a source (a Git repository path, a Helm chart, or a kustomize directory), a destination namespace, and a sync policy. You’re not composing HelmReleases and Kustomizations freely the way FluxCD lets you. Everything is an Application. That constraint is also clarity: each Application has a defined scope, an owner, and a status page in the UI. For a deeper look at how ArgoCD itself is set up and managed, there’s a post from last year on building a self-managing ArgoCD cluster.
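As an illustration of that shape, a minimal Application pointing at a directory in Git could look like the sketch below. The repository URL, path, names, and the argocd namespace are placeholders, not taken from a real setup:

```yaml
# Sketch only: repoURL, path, names, and namespaces are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app
  namespace: argocd               # assumes ArgoCD runs in the argocd namespace
spec:
  project: default
  source:
    repoURL: https://github.com/your-org/apps-repo
    targetRevision: main
    path: apps/example-app        # plain manifests or a kustomize directory
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD runs in
    namespace: example-app
  syncPolicy:
    automated:
      prune: true                 # remove resources deleted from Git
      selfHeal: true              # revert manual changes in the cluster
    syncOptions:
      - CreateNamespace=true
```

Source, destination, and sync policy are all in one object, which is what gives each Application its own status page.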

The second is sync waves. As mentioned above, sync waves control ordering within a single Application’s sync. For resources that have ordering requirements within one deployment (a database migration job that must complete before the application pods start, for example), this works well. For ordering between separate Applications, you’d rely on Kubernetes to retry until prerequisites exist, which is a looser guarantee than dependsOn. In practice, application deployments rarely have hard cross-Application ordering requirements, so the trade-off is acceptable.
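The migration case could be sketched like this: two resources in the same Application, ordered by the argocd.argoproj.io/sync-wave annotation. Names, image, and arguments are hypothetical:

```yaml
# Sketch only: names, image, and args are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    argocd.argoproj.io/sync-wave: "0"   # applied first; must complete before wave 1
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: registry.example.com/example-app:1.0.0
          args: ["migrate"]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  annotations:
    argocd.argoproj.io/sync-wave: "1"   # starts only after wave 0 is healthy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: registry.example.com/example-app:1.0.0
```

The annotation value has to be a quoted string. Because a Job’s pod template can’t be changed once created, a setup like this is often paired with a hook annotation so the Job is recreated on each sync; that part is left out of the sketch.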

The third is how ArgoCD handles Helm. ArgoCD uses Helm as a template engine: it runs helm template to produce manifests, then applies and tracks those resources through its own sync cycle. This means Helm pre-install hooks are processed as part of ArgoCD’s PreSync phase, but through ArgoCD’s own hook lifecycle rather than Helm’s. Most charts work without modification. Charts that rely on Helm’s rollback mechanism, or that have complex hook delete policies, can behave unexpectedly, because ArgoCD is managing the deployment lifecycle, not Helm. A useful mental model: ArgoCD does the deploying, Helm does the templating. Running helm list on the cluster won’t show these deployments, because ArgoCD applies the rendered manifests itself and never creates a Helm release record. Once that distinction is clear, the behavior is consistent and predictable.
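To make the “Helm does the templating” part concrete, here is a sketch of an Application whose source is a Helm chart rather than a Git path. The chart repository, chart name, version, and values are placeholders:

```yaml
# Sketch only: chart repository, chart name, version, and values are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-chart
  namespace: argocd               # assumes ArgoCD runs in the argocd namespace
spec:
  project: default
  source:
    repoURL: https://charts.example.com
    chart: example                # chart name inside that repository
    targetRevision: 1.2.3         # chart version
    helm:
      releaseName: example        # only used as .Release.Name while rendering
      values: |
        replicaCount: 2
  destination:
    server: https://kubernetes.default.svc
    namespace: example
```

ArgoCD renders the chart with these values and syncs the resulting manifests; the releaseName feeds the templates but does not create a release that Helm itself would know about.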

What makes ArgoCD the right fit for applications despite those opinions is what they produce: a UI where you can see every resource a deployment created, its health status, and what’s wrong when something fails. When a deploy breaks, you find the unhealthy resource in the browser rather than chasing it through several kubectl describe calls. For teams diagnosing their own deployments, that’s where the time is saved.

Getting both running: the bootstrap sequence #

Before FluxCD can manage anything, FluxCD itself has to exist. You can’t use GitOps to install the GitOps controller. That’s the bootstrapping problem, and it needs a one-time manual sequence.

Step 1: Cilium #

I run Talos Linux as the OS for my cluster nodes, an immutable, API-driven OS built specifically for Kubernetes with no SSH and no shell (more on why here). That said, nothing in the bootstrap sequence below is Talos-specific; the same steps apply to any Kubernetes cluster.

Talos ships with Flannel as its default CNI, but Cilium gives you far more: identity-based policy, observability through Hubble, and Gateway API for HTTP routing. Install it before anything else, since no pods can schedule without a CNI:

helm repo add cilium https://helm.cilium.io
helm repo update

helm install cilium cilium/cilium \
  --namespace kube-system \
  -f cilium-values.yaml \
  --wait

kubectl -n kube-system rollout status daemonset/cilium --timeout=5m

One thing worth knowing: if the values used here differ from what FluxCD will later apply from Git, FluxCD corrects the difference on its first reconciliation. That restarts the Cilium DaemonSet and briefly interrupts cluster networking. On a fresh cluster with no running workloads yet, it’s harmless. To avoid it entirely, use the same values file for the bootstrap install and for what FluxCD reads from the repository.

Step 2: The Flux Operator #

Rather than flux bootstrap, which generates manifests and commits them into the repository, I use the Flux Operator: a Kubernetes operator, itself installed as a Helm release, that installs and manages FluxCD from a declarative resource. This keeps the FluxCD installation under GitOps management without extra generated commits landing in the repository.

helm upgrade --install fluxcd-operator \
  oci://ghcr.io/controlplaneio-fluxcd/charts/flux-operator \
  --version 0.47.0 \
  --namespace flux-system \
  --create-namespace \
  --wait

kubectl wait crd/fluxinstances.fluxcd.controlplane.io \
  --for=condition=Established --timeout=2m

Step 3: The FluxInstance #

With the operator running, a FluxInstance CR describes the FluxCD distribution to install:

# fluxinstance.yaml
apiVersion: fluxcd.controlplane.io/v1
kind: FluxInstance
metadata:
  name: flux
  namespace: flux-system
spec:
  distribution:
    version: "2.x"
    registry: "ghcr.io/fluxcd"
  cluster:
    type: kubernetes
    size: small
    multitenant: false
    networkPolicy: true
    domain: cluster.local

kubectl apply -f fluxinstance.yaml

kubectl -n flux-system wait fluxinstance/flux \
  --for=condition=Ready --timeout=5m

kubectl -n flux-system wait deployment/source-controller \
  --for=condition=Available --timeout=2m

The operator reconciles this into a running FluxCD distribution: source-controller, helm-controller, kustomize-controller, and notification-controller.

Step 4: Git credentials #

FluxCD pulls the repository over HTTPS. A Secret in flux-system carries the credentials:

kubectl create secret generic flux-system \
  --namespace flux-system \
  --from-literal=username=<your-username> \
  --from-literal=password=<your-token>

Step 5: GitRepository and the app-of-apps #

The GitRepository CR tells FluxCD where to pull from:

# gitrepository.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/your-org/platform-repo
  secretRef:
    name: flux-system
  ref:
    branch: main

kubectl apply -f gitrepository.yaml

The final step applies the app-of-apps: a Helm chart whose only content is HelmRelease templates, one per foundation component. No workloads, just instructions for FluxCD about what to reconcile and in what order. (bootstrap-infrastructure is the name I use to group all foundation-layer HelmReleases in one place; the name itself is not significant.)

helm upgrade --install bootstrap-infrastructure \
  ./helm/charts/bootstrap-infrastructure \
  --namespace flux-system \
  --values platform-values.yaml \
  --values cluster-values.yaml

FluxCD immediately picks up all the HelmRelease CRs this creates and begins reconciling them.
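To give an idea of what “a chart whose only content is HelmRelease templates” can look like, here is a hypothetical template file from such a chart. This is not the actual chart: the file name, the values keys (certManager.version, certManager.values), and the dependency on cilium are assumptions made for illustration. It is a Helm template, so it only becomes valid YAML after rendering.

```yaml
# templates/cert-manager.yaml (hypothetical file in the app-of-apps chart)
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: {{ .Release.Namespace }}   # flux-system with the install command above
spec:
  interval: 15m
  dependsOn:
    - name: cilium                      # assumed ordering: the CNI comes first
  chart:
    spec:
      chart: cert-manager
      version: {{ .Values.certManager.version | quote }}   # hypothetical values key
      sourceRef:
        kind: HelmRepository
        name: jetstack
        namespace: flux-system
  targetNamespace: cert-manager
  install:
    createNamespace: true
  # Pass the component's values through from the app-of-apps values files.
  values:
    {{- toYaml .Values.certManager.values | nindent 4 }}
```

With this layout, the platform and cluster values files passed to helm upgrade --install decide the version and configuration of every component, while the templates only encode structure and ordering.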

Self-managed from then on #

The interesting thing about that last step is what’s inside the app-of-apps. Among its HelmRelease templates are ones named fluxcd-operator and fluxcd-instance, matching the components installed manually in steps 2 and 3. There’s also a cilium HelmRelease and a self-referential one for bootstrap-infrastructure itself. The self-referential one sets wait: false: if it waited for its own Helm release to be ready, it would block forever, since the chart only becomes ready once all the HelmReleases it created are ready. Because those names and namespaces match what’s already running in the cluster, FluxCD takes ownership of all of them. What started as a handful of manual Helm installs becomes a fully GitOps-managed foundation:

$ flux get helmreleases -n flux-system
NAME                       READY  STATUS
bootstrap-infrastructure   True   Helm upgrade succeeded
cilium                     True   Helm upgrade succeeded
cert-manager               True   Helm upgrade succeeded
trust-manager              True   Helm upgrade succeeded
fluxcd-operator            True   Helm upgrade succeeded
fluxcd-instance            True   Helm upgrade succeeded
argocd-upstream            True   Helm upgrade succeeded
...

Any upgrade to FluxCD or any other foundation component now goes through a pull request to the repository.

The advantages are real #

One source of truth means that when something is wrong, the answer is always in the repository. Configuration isn’t split across cluster state, runbooks, and memory.

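FluxCD provides drift detection for the resources it manages. If a resource owned by a HelmRelease or Kustomization drifts because a manual change was applied directly, FluxCD detects and corrects it on the next reconciliation interval. One caveat: for HelmReleases, correcting in-cluster drift depends on drift detection being enabled on the release (spec.driftDetection in recent Flux versions). With that in place, this covers everything in the foundation layer that’s owned by a HelmRelease, which is all of it.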
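In recent Flux versions, the helm-controller’s correction of in-cluster drift is configured per HelmRelease through spec.driftDetection and is off unless set. A minimal sketch, assuming a cert-manager release like the one in the listing above; the chart version range is a placeholder:

```yaml
# Sketch only: the version range is a placeholder.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: flux-system
spec:
  interval: 15m
  driftDetection:
    mode: enabled        # "warn" only reports drift; "enabled" also corrects it
  chart:
    spec:
      chart: cert-manager
      version: "1.x"
      sourceRef:
        kind: HelmRepository
        name: jetstack
        namespace: flux-system
  targetNamespace: cert-manager
```

Starting with warn on a release and switching to enabled once the reported differences look expected is a cautious way to roll this out.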

The deployment pipeline is as simple as it gets: push to main, FluxCD reconciles. No build system, no manual steps after the merge. Helm chart version bumped in the repository? FluxCD upgrades the release. Values changed? FluxCD applies them. The CD pipeline is the controller loop.

The rough edges #

The advantages come at a cost.

Development means committing. When iterating on a HelmRelease values block, every change requires a commit, a push, and a wait. The flux reconcile command at least removes the wait for the interval by triggering an immediate pull and reconciliation:

flux reconcile source git flux-system
flux reconcile helmrelease cert-manager -n flux-system

Manual helm upgrade --install works for quick testing, but FluxCD will overwrite it on the next reconciliation interval. The fastest inner loop for infrastructure development is a local kind cluster where you can shorten or manually trigger intervals; on the homelab cluster, you’re pushing commits.

Troubleshooting requires the CLI. When a HelmRelease fails, the error is in its status:

flux get helmreleases --all-namespaces
flux logs --kind=HelmRelease --name=cert-manager --namespace=flux-system
kubectl describe helmrelease cert-manager -n flux-system

Those commands surface the Helm error. Finding which specific Kubernetes resource inside the release caused the failure still requires drilling with kubectl. FluxCD gives you the Helm error and leaves the rest to you.

The flip side of dependsOn. The dependsOn graph that keeps the foundation ordered also means failures propagate through it. A cert-manager outage pauses every HelmRelease that lists cert-manager in its dependsOn. Any release depending on one of those paused releases becomes unready in turn, and so on down the chain. Those cascading “not ready” states show up across several releases until the root cause is fixed. The first few times this happens, it’s easy to mistake downstream symptoms for the actual problem. The right approach is to follow the graph to the first failing release rather than debugging each blocked one individually.

Tools that help #

k9s is the CLI answer for watching the cluster without writing kubectl commands: a terminal UI with keyboard navigation across pods, HelmReleases, and events. Watching a reconciliation unfold is comfortable in k9s.

Headlamp fills the graphical UI gap. It’s a desktop application that runs on the platform engineer’s laptop and connects directly to the Kubernetes API server. The FluxCD plugin for Headlamp surfaces reconciliation status: which HelmReleases are ready, which are failing, and what the error is, without going to the CLI for every check. It doesn’t replace kubectl describe, but it’s a fast overview for routine status checks.

Both cover the FluxCD layer well: finding which release is failing and surfacing the Helm error. Drilling into the individual Kubernetes resources inside a failing release is still a kubectl job.

Wrapping up #

FluxCD and ArgoCD aren’t competing answers to the same question. FluxCD’s simplicity and dependsOn ordering fit the foundation layer, where infrastructure has to come up in the right sequence and the operators live in the terminal anyway. ArgoCD’s UI fits the application layer, where the better UX matters more than ordering graphs.

After the bootstrap sequence, the Git repository is the only source of truth. Configuration isn’t scattered across imperative history. Changes are reviewable, rollbacks are a revert, and standing up a new environment is a values file and one bootstrap run away.

That’s the operating model the rest of the platform sits on.