GitOps

What Is GitOps?

GitOps is an operational framework that applies DevOps best practices — version control, code review, collaboration, and CI/CD — to infrastructure and application deployment. In a GitOps model, Git repositories serve as the single source of truth for the desired state of both application code and infrastructure configuration. Changes to the system are made exclusively through Git commits and pull requests, and automated agents continuously reconcile the actual state of the system with the desired state defined in Git.

The term was coined by Alexis Richardson, CEO of Weaveworks, in 2017. While the underlying ideas — infrastructure as code, declarative configuration, automated deployment — existed before GitOps, the GitOps model codified them into a coherent operational philosophy with a clear set of principles.

GitOps is most commonly associated with Kubernetes environments, where tools like Argo CD and Flux watch Git repositories and automatically apply changes to clusters. However, the principles are broadly applicable to any infrastructure that can be described declaratively, including cloud resources managed by Terraform, serverless configurations, and even network policies.

How It Works

GitOps relies on four core principles:

Declarative configuration — The entire system is described declaratively in files stored in Git.
Version-controlled source of truth — Git is the single source of truth for the desired state.
Automated application — Approved changes are automatically applied to the system.
Continuous reconciliation — Software agents continuously compare actual state to desired state and correct any drift.

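A typical GitOps setup with Kubernetes and Argo CD is built from manifests like these: a Deployment stored in the GitOps repository, and an Argo CD Application that tells Argo CD which repository path to sync into which cluster namespace.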

# deployment.yaml in the GitOps repository
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: registry.example.com/web-app:v2.4.1
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
---
# Argo CD Application definition
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: web-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/gitops-config.git
    targetRevision: main
    path: apps/web-app
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

The workflow proceeds as follows:

A developer makes a change — updating an image tag, scaling replicas, or modifying configuration — and opens a pull request against the GitOps repository.
The team reviews the change. Automated checks validate the YAML syntax, run policy checks, and optionally diff the change against the live cluster.
Once approved and merged, the GitOps agent (such as Argo CD or Flux) detects the new commit, either by polling the repository or through a webhook.
The agent compares the desired state in Git with the actual state in the cluster.
The agent applies the necessary changes to bring the cluster in line with the Git repository.
If someone manually changes the cluster (configuration drift), the agent detects the discrepancy and, when self-healing is enabled (selfHeal: true in the Application above), automatically reverts the cluster to match Git.

This last point, continuous reconciliation, is what distinguishes GitOps from traditional CI/CD-driven deployment. In a push-based CI/CD model, the pipeline applies a change once and nothing verifies afterward that it persists. In GitOps, the agent continuously monitors for drift and self-heals.
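
The reconciliation idea can be sketched in a few lines of Python. This is a minimal illustration, not how Argo CD or Flux are implemented: the `plan` and `reconcile` functions, the `Action` type, and the dictionary-based state are all hypothetical, and the `prune` flag only mirrors the intent of the `prune: true` setting shown earlier.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str  # "create", "update", or "delete"
    name: str


def plan(desired: dict, actual: dict, prune: bool = True) -> list[Action]:
    """Compute the actions that would make `actual` match `desired`."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(Action("create", name))
        elif actual[name] != spec:
            actions.append(Action("update", name))
    if prune:
        # Resources that exist in the cluster but not in Git are removed.
        for name in sorted(actual.keys() - desired.keys()):
            actions.append(Action("delete", name))
    return actions


def reconcile(desired: dict, actual: dict, prune: bool = True) -> list[Action]:
    """Run one reconciliation pass, mutating `actual`; return what changed."""
    actions = plan(desired, actual, prune)
    for action in actions:
        if action.kind == "delete":
            del actual[action.name]
        else:
            actual[action.name] = dict(desired[action.name])
    return actions


# Desired state, as it would be read from Git.
desired = {"web-app": {"replicas": 3, "image": "registry.example.com/web-app:v2.4.1"}}

# Live state after a manual scale-up and a stray, undeclared resource.
actual = {
    "web-app": {"replicas": 5, "image": "registry.example.com/web-app:v2.4.1"},
    "debug-pod": {"replicas": 1, "image": "busybox"},
}

first_pass = reconcile(desired, actual)   # reverts the drift
second_pass = reconcile(desired, actual)  # nothing left to do
print(first_pass)
print(second_pass)
```

A real agent runs this comparison repeatedly rather than once, which is why the second pass matters: a converged system produces an empty plan, and any later manual change produces a non-empty one.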

Why It Matters

GitOps provides a complete audit trail of every change to production systems. Because all changes flow through Git, you can answer questions like “who changed the replica count last Tuesday?” or “what did the production configuration look like two months ago?” by examining the Git history. This auditability is invaluable for incident response, compliance, and post-mortems.
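
As a rough sketch of how such a question might be answered, the Python helper below builds a `git log` invocation that lists commits whose diff touched lines matching a pattern. The helper names and the file path `apps/web-app/deployment.yaml` are assumptions made for illustration; the flags themselves (`-G`, `--since`, `--format`) are standard `git log` options.

```python
from __future__ import annotations

import subprocess


def history_command(path: str, pattern: str, since: str) -> list[str]:
    """Build a git log command listing commits whose diff matches `pattern`."""
    return [
        "git", "log",
        f"-G{pattern}",            # commits that added or removed matching lines
        f"--since={since}",
        "--format=%h %an %ad %s",  # short hash, author, date, subject
        "--", path,
    ]


def who_changed(repo_dir: str, path: str, pattern: str, since: str) -> str:
    """Run the command inside a local clone of the GitOps repository."""
    result = subprocess.run(
        history_command(path, pattern, since),
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical path: assumes the Deployment lives under apps/web-app/.
cmd = history_command("apps/web-app/deployment.yaml", "replicas:", "2 weeks ago")
print(" ".join(cmd))
```

`-G` is used rather than `-S` because `-S` only reports commits that change how many times the string occurs, so it would miss an edit from `replicas: 3` to `replicas: 5`.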

The pull request workflow creates a natural approval process for operational changes. Instead of granting operators direct write access to production clusters, with all the risk that entails, GitOps routes production changes through pull requests. With a pull-based agent, deployment credentials stay inside the cluster rather than in an external CI system, which reduces the attack surface, and every change is reviewed and recorded.

GitOps also simplifies disaster recovery. If a cluster is lost, you can rebuild it by bootstrapping a new GitOps agent and pointing it at the same repository. The agent reads the desired state from Git and recreates every resource declared there. State that lives outside Git, such as persistent volume data or secrets held in an external manager, still has to be restored separately. Even so, this stands in stark contrast to traditional operations, where rebuilding a cluster means reconstructing manual changes from runbooks and memory.

For multi-environment and multi-cluster deployments, GitOps provides a consistent mechanism. The same Git repository can define configurations for development, staging, and production environments using branches, directories, or overlays, ensuring that promotion between environments follows a structured path.
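
The overlay approach can be illustrated with a small Python sketch: one shared base configuration plus a per-environment patch that overrides only what differs. This is a simplified recursive merge written for illustration, not the merge algorithm of Kustomize or any other tool, and the overlay values (replica counts, the `1Gi` memory limit) are hypothetical.

```python
import copy


def apply_overlay(base: dict, overlay: dict) -> dict:
    """Return a copy of `base` with `overlay` merged on top, key by key."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested mappings are merged rather than replaced wholesale.
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Shared base, as it might live in a base/ directory.
base = {
    "replicas": 1,
    "image": "registry.example.com/web-app:v2.4.1",
    "resources": {"limits": {"memory": "512Mi", "cpu": "500m"}},
}

# Per-environment patches, as they might live in overlays/<env>/.
overlays = {
    "development": {},
    "staging": {"replicas": 2},
    "production": {"replicas": 3, "resources": {"limits": {"memory": "1Gi"}}},
}

rendered = {env: apply_overlay(base, patch) for env, patch in overlays.items()}
for env, config in rendered.items():
    print(env, config)
```

Because every environment renders from the same base, promoting a change means editing the base (or moving a value from one overlay to the next) in a reviewed commit, rather than hand-editing each environment.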

Best Practices

Separate application code from configuration repositories. Keep application source code and GitOps configuration in separate repositories. Application repos trigger CI pipelines that build images; the GitOps repo defines which image versions are deployed where. This separation provides clear boundaries of responsibility.
Use sealed secrets or external secret managers. Never store plaintext secrets in Git, even in private repositories. Use tools like Sealed Secrets or SOPS, which commit only encrypted values, or an operator such as External Secrets Operator, which syncs secrets into the cluster from Vault or a cloud provider's secret manager.
Implement progressive delivery. Combine GitOps with canary or blue-green deployment strategies. Argo Rollouts, a separate controller that is commonly run alongside Argo CD, provides these strategies while the rollout definitions themselves are still changed through Git.
Enforce policy with admission controllers. Use Open Policy Agent (OPA) Gatekeeper or Kyverno to enforce policies on what can be deployed. This prevents misconfigured or non-compliant resources from reaching the cluster, even if they pass code review.
Monitor sync status actively. Track whether each application managed by your GitOps agent is synced, out of sync, degraded, or failing, and alert on it. An application that has been out of sync for hours indicates a problem: either the configuration is invalid or the cluster cannot satisfy the desired state.
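
To make the policy-enforcement practice above concrete, the sketch below shows the kind of rule an admission controller might enforce, written as plain Python over an already-parsed manifest. Real policies for Gatekeeper are written in Rego and Kyverno policies in YAML; the `check_deployment` function and its two rules (pinned image tag, CPU and memory limits) are hypothetical examples, not rules taken from either tool.

```python
from __future__ import annotations


def check_deployment(manifest: dict) -> list[str]:
    """Return a list of policy violations for a parsed Deployment manifest."""
    violations = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        # Rule 1: the image must carry an explicit tag other than "latest".
        # Only the last path segment is inspected, so a registry port such as
        # registry.example.com:5000 is not mistaken for a tag.
        last_segment = container.get("image", "").rsplit("/", 1)[-1]
        tag = last_segment.partition(":")[2]
        if tag in ("", "latest"):
            violations.append(f"{name}: image must be pinned to a specific tag")

        # Rule 2: CPU and memory limits must both be set.
        limits = container.get("resources", {}).get("limits", {})
        for resource in ("cpu", "memory"):
            if resource not in limits:
                violations.append(f"{name}: missing {resource} limit")
    return violations


def deployment(image: str, limits: dict) -> dict:
    """Build a minimal Deployment-shaped dictionary for the examples."""
    container = {"name": "web-app", "image": image, "resources": {"limits": limits}}
    return {"kind": "Deployment", "spec": {"template": {"spec": {"containers": [container]}}}}


compliant = deployment("registry.example.com/web-app:v2.4.1", {"memory": "512Mi", "cpu": "500m"})
non_compliant = deployment("registry.example.com/web-app:latest", {"memory": "512Mi"})

print(check_deployment(compliant))
print(check_deployment(non_compliant))
```

The same check can run twice: once in the pull request pipeline for fast feedback, and again at admission time as the backstop for anything that slips through review.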

Common Mistakes

Making manual changes to production. The single most important GitOps rule is that Git is the only way to change production. Any manual change — a quick kubectl edit to fix a problem — will be reverted by the reconciliation agent, or worse, will create drift that goes undetected if self-healing is disabled. Train the team to always go through Git, even for urgent fixes.
Storing too much in a single GitOps repository. A monorepo that contains configuration for every service across every environment becomes difficult to manage, review, and troubleshoot. Organize configurations by team or domain, with clear ownership boundaries.
Ignoring reconciliation failures. When the GitOps agent reports a sync failure, it could not bring the actual state in line with the desired state, so the cluster no longer matches Git. Investigate and resolve these failures promptly. Persistent sync failures often indicate resource constraints, invalid configurations, or permission issues that will worsen over time.
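
One way to avoid overlooking such failures is to alert on applications that stay out of sync for too long. The sketch below assumes status records have already been collected from the agent; the `AppStatus` shape, the `stale_apps` function, the application names, and the one-hour threshold are all hypothetical. The "Synced" and "OutOfSync" labels follow Argo CD's sync status values.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AppStatus:
    name: str
    sync: str                              # "Synced" or "OutOfSync"
    out_of_sync_since: Optional[datetime]  # None while the app is synced


def stale_apps(
    statuses: list[AppStatus],
    now: datetime,
    max_age: timedelta = timedelta(hours=1),
) -> list[str]:
    """Return the names of apps that have been out of sync longer than `max_age`."""
    stale = []
    for status in statuses:
        if status.sync == "Synced" or status.out_of_sync_since is None:
            continue
        if now - status.out_of_sync_since > max_age:
            stale.append(status.name)
    return stale


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
statuses = [
    AppStatus("web-app", "Synced", None),
    AppStatus("billing", "OutOfSync", now - timedelta(hours=3)),
    AppStatus("search", "OutOfSync", now - timedelta(minutes=10)),
]

alerts = stale_apps(statuses, now)
print(alerts)
```

The threshold exists so that a sync that is merely in progress does not page anyone, while one that has been stuck for hours does.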