Rolling Deployment
A deployment strategy that gradually replaces instances of the old version with the new version, one at a time or in batches, to minimize downtime.
What Is Rolling Deployment?
A rolling deployment is a release strategy that incrementally replaces instances running the old version of an application with instances running the new version. Rather than updating all instances simultaneously (which causes downtime) or maintaining two complete environments (which doubles infrastructure costs), a rolling deployment updates one instance or a small batch at a time. At any point during the rollout, some instances serve the old version while others serve the new version, and the load balancer distributes traffic across all healthy instances.
Rolling deployments are the default deployment strategy in most container orchestration platforms, including Kubernetes, Amazon ECS, and Docker Swarm. Their popularity comes from their balance of simplicity, resource efficiency, and availability — they achieve zero-downtime deployments without requiring duplicate infrastructure.
The tradeoff is that during a rolling deployment, two different versions of the application coexist in production. This means the application and its API contracts must be backward-compatible — the new version must work alongside the old version, and both must be able to serve traffic correctly. This requirement for backward compatibility is the primary constraint that teams must design around when using rolling deployments.
How It Works
A rolling deployment proceeds through a controlled sequence of instance updates. The orchestration platform manages the process by tracking how many instances are available and healthy at each step.
In Kubernetes, a rolling deployment is configured through the Deployment resource:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2         # Allow 2 extra pods during update
      maxUnavailable: 1   # At most 1 pod unavailable at a time
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web-app
          image: registry.example.com/web-app:v2.1.0
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
```
The update proceeds as follows:
- The orchestrator creates new instances (pods) running the new version, up to the maxSurge limit.
- Once a new instance passes its readiness probe, the load balancer starts routing traffic to it.
- The orchestrator terminates old instances, respecting the maxUnavailable limit to ensure minimum availability.
- The process repeats until all instances run the new version.
- At any point, if new instances fail health checks, the rollout pauses or rolls back.
The maxSurge and maxUnavailable parameters give teams fine-grained control over the speed and safety of the rollout. A maxSurge of 1 and maxUnavailable of 0 produces the most conservative rollout — only one extra instance is created at a time, and no old instances are removed until the new one is healthy. This is the safest but slowest option. Higher values speed up the rollout but increase resource usage and risk.
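The availability guarantees these two parameters imply can be computed directly. A minimal sketch (the function name is our own, not part of any Kubernetes API):

```python
def rollout_bounds(replicas: int, max_surge: int, max_unavailable: int):
    """Return (min_available, max_total) pod counts during a rolling update.

    Kubernetes guarantees at least replicas - maxUnavailable pods are
    available, and at most replicas + maxSurge pods exist at any moment.
    """
    min_available = replicas - max_unavailable
    max_total = replicas + max_surge
    return min_available, max_total

# The manifest above: 6 replicas, maxSurge=2, maxUnavailable=1
print(rollout_bounds(6, 2, 1))  # (5, 8): at least 5 pods serving, at most 8 pods total

# The most conservative setting: maxSurge=1, maxUnavailable=0
print(rollout_bounds(6, 1, 0))  # (6, 7): full capacity maintained throughout
```

Note that both parameters can also be given as percentages of the replica count, which scales the same bounds to larger deployments.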
Health checks are critical to the rolling deployment process. The readiness probe tells the load balancer when a new instance is ready to receive traffic, preventing requests from being routed to instances that are still starting up. The liveness probe detects instances that have become unhealthy after startup, triggering automatic restarts.
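A readiness endpoint typically aggregates several dependency checks into a single pass/fail answer for the load balancer. A minimal sketch in Python, framework-agnostic; the check functions are illustrative placeholders, not a specific library's API:

```python
def check_database() -> bool:
    """Illustrative placeholder: e.g. run a trivial query against the primary."""
    return True

def check_cache() -> bool:
    """Illustrative placeholder: e.g. ping the cache and verify it is warm."""
    return True

def readiness() -> tuple[int, dict]:
    """Compute the HTTP status and body for the /health readiness endpoint.

    200 tells the load balancer to route traffic to this instance; 503
    keeps it out of rotation until every dependency check passes.
    """
    checks = {"database": check_database(), "cache": check_cache()}
    status = 200 if all(checks.values()) else 503
    return status, checks

status, body = readiness()
print(status, body)
```

Wiring this function to an HTTP route gives the probe target referenced in the manifest's readinessProbe section.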
Why It Matters
Rolling deployments provide zero-downtime releases without the infrastructure overhead of blue-green deployments. Because the deployment reuses the same pool of resources — gradually replacing old instances with new ones — there is no need to maintain a second complete environment. For teams with tight infrastructure budgets or large-scale deployments where doubling resources is impractical, rolling deployments offer the best cost-to-availability ratio.
The gradual nature of rolling deployments also provides a degree of risk mitigation. If the new version has a problem, it initially affects only a fraction of users — those whose requests are routed to the updated instances. This gives the team time to detect the problem and halt the rollout before all instances are affected. While this is not as controlled as a canary deployment (where the traffic percentage is explicitly managed), it provides meaningful protection compared to a big-bang deployment.
Rolling deployments are also operationally simple. Because they are the default strategy in Kubernetes and most orchestration platforms, teams do not need to set up additional infrastructure or tooling. A well-configured rolling deployment with proper health checks provides reliable, zero-downtime releases out of the box.
Best Practices
- Implement robust health checks. Health checks are the foundation of safe rolling deployments. A readiness probe should verify that the application is fully initialized and ready to serve traffic — not just that the process is running. Check database connectivity, cache warmth, and dependency availability in the readiness probe.
- Ensure backward compatibility. During a rolling deployment, both the old and new versions serve traffic simultaneously. API changes, database schema modifications, and message format changes must be backward-compatible. Use versioned APIs and expand-and-contract migration patterns to maintain compatibility.
- Handle graceful shutdown. When an old instance is terminated, it should stop accepting new requests, finish processing in-flight requests, and then exit cleanly. Configure appropriate termination grace periods and ensure the application handles SIGTERM signals correctly.
- Set appropriate update parameters. Choose maxSurge and maxUnavailable values based on your application's characteristics. Stateless web services can tolerate aggressive rollouts. Stateful services or those with long startup times need more conservative settings.
- Monitor during rollout. Watch error rates, latency, and resource utilization during the rollout window. Set up alerts that trigger if metrics degrade, and be prepared to pause or roll back the deployment. Use kubectl rollout pause and kubectl rollout undo to control the process.
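The graceful-shutdown practice above boils down to intercepting SIGTERM, which is the signal Kubernetes sends before terminating a pod. A minimal sketch, assuming a Unix-like environment (a real server would also drain its listener rather than just set a flag):

```python
import os
import signal

shutting_down = False

def handle_sigterm(signum, frame):
    """On SIGTERM, flip a flag so the server stops accepting new requests."""
    global shutting_down
    shutting_down = True
    # A real server would also close its listening socket here and wait
    # for in-flight requests to drain before exiting.

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the orchestrator sending SIGTERM to this process:
os.kill(os.getpid(), signal.SIGTERM)
print(shutting_down)  # True: the handler ran instead of the default termination
```

The time available for this drain is bounded by the pod's terminationGracePeriodSeconds setting (30 seconds by default), after which Kubernetes sends SIGKILL.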
Common Mistakes
- Skipping readiness probes. Without readiness probes, the load balancer may route traffic to instances that are still starting up, causing errors for users. Every application in a rolling deployment must have a readiness probe that accurately reflects when the instance is ready to serve traffic. A simple TCP check is rarely sufficient — use HTTP health endpoints that verify application-level readiness.
- Breaking backward compatibility. Renaming a database column, changing an API response format, or modifying a message schema without maintaining backward compatibility will cause errors when the old and new versions coexist. Plan schema and API changes as multi-step processes: add the new format, deploy code that handles both, remove the old format.
- Setting termination grace periods too short. If the grace period is shorter than the time needed to complete in-flight requests, those requests will be terminated mid-processing, causing errors and data inconsistencies. Measure your application's longest typical request duration and set the grace period accordingly, with a comfortable margin.
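The multi-step migration pattern described above can be sketched with a dual-write "expand" phase: while old and new versions coexist, the new code writes both the legacy and the new field so either version can read the record (the field names here are illustrative):

```python
def save_user(record: dict, full_name: str) -> dict:
    """Expand phase of an expand-and-contract migration.

    Old instances read `name`; new instances read `full_name`. Only after
    every instance runs the new version (and existing data is backfilled)
    does a later release drop the `name` write — the contract phase.
    """
    record["name"] = full_name       # legacy field, kept for old instances
    record["full_name"] = full_name  # new field, used by new instances
    return record

user = save_user({}, "Ada Lovelace")
print(user)  # {'name': 'Ada Lovelace', 'full_name': 'Ada Lovelace'}
```

The same staging applies to database columns and message schemas: add the new shape, deploy code that handles both, backfill, then remove the old shape in a separate release.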