When I joined the team, a full production deployment took around 45 minutes. By the time it was done, the engineer who triggered it had usually moved on to something else, lost context, and was half-hoping nothing went wrong.

Today the same deployment completes in under 4 minutes, with automatic rollback if anything looks wrong. Here is exactly how we got there.

The old pipeline

We were running Jenkins on a VM, triggering shell scripts that SSH'd into servers to pull new Docker images. No Helm, no ArgoCD, no real observability into what was happening.

The problem wasn't Jenkins itself. The pipeline had no model of desired state: it just ran commands and hoped for the best.

Moving to GitOps

The core idea behind GitOps is that your Git repository is the single source of truth for what should be running in production. ArgoCD watches your repo and reconciles the cluster to match.

To make this work, we needed to:

  • Package every service with Helm charts
  • Store Helm values in a separate gitops-config repo
  • Set up ArgoCD with auto-sync enabled
  • Update the CI pipeline to bump image tags in the gitops-config repo instead of deploying directly
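
As a rough illustration of the auto-sync step, an ArgoCD Application for one service might look like the sketch below. The application name, repo URL, path, values file name, and namespaces are placeholders rather than our actual setup, and the sketch assumes the chart sits next to its values file in the gitops-config repo.

```yaml
# Hypothetical ArgoCD Application; every name, URL, and path here is a placeholder.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-service
  namespace: argocd                # namespace where ArgoCD itself runs (assumed)
spec:
  project: default
  source:
    repoURL: https://git.example.com/example-org/gitops-config.git
    targetRevision: main           # branch ArgoCD watches
    path: services/example-service # chart directory inside the repo (assumed layout)
    helm:
      valueFiles:
        - values-production.yaml   # the file CI edits to bump the image tag
  destination:
    server: https://kubernetes.default.svc  # the cluster ArgoCD is running in
    namespace: example-service
  syncPolicy:
    automated:
      prune: true                  # delete resources that were removed from Git
      selfHeal: true               # revert manual changes made in the cluster
```

With something like this in place, a deploy is just a commit: CI changes the image tag in the values file, and ArgoCD notices the new revision and reconciles the cluster to match.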

Blue-green on EKS

The 45-minute problem was partly architectural. We were doing rolling updates, which meant waiting for each pod to come up healthy before the next one was replaced. With blue-green, we spin up the entire new environment alongside the live one and switch traffic only when all health checks pass.

# In your Helm values: point the Service at the color you just deployed
service:
  selector:
    slot: green   # switch to 'blue' on next deploy
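
To show what that selector ends up controlling, here is a minimal sketch of the rendered manifests for the green side. The service name, image, port, replica count, and /healthz path are assumptions for illustration; a matching blue Deployment would differ only in its name, slot label, and image tag.

```yaml
# Hypothetical rendered manifests; names, image, ports, and probe path are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-service-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-service
      slot: green
  template:
    metadata:
      labels:
        app: example-service
        slot: green              # the label the Service selector matches on
    spec:
      containers:
        - name: app
          image: registry.example.com/example-service:1.2.3
          ports:
            - containerPort: 8080
          readinessProbe:        # a pod receives traffic only once this passes
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: example-service
spec:
  selector:
    app: example-service
    slot: green                  # changing this to 'blue' moves traffic to the blue pods
  ports:
    - port: 80
      targetPort: 8080
```

Because both Deployments stay running, rolling back amounts to pointing the selector at the previous color again, with no pods to rebuild.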

Key numbers

  • Deploy time: ~45 min → under 4 min
  • Rollback time: ~25 min → 30 sec
  • Failed deploys reaching production: down 80%
  • Deployment frequency: 3x weekly → daily