From Minikube to AWS EKS: How I Built a Zero-Downtime Blue-Green Deployment Pipeline for ShopSwift
DEV Community Grade 9 10d ago



I built ShopSwift, a Node.js/Express e-commerce API, and wrapped it in a production-grade blue-green deployment pipeline: Docker, Kubernetes, Minikube local validation, NGINX Ingress, GitHub Actions CI, AWS EKS, Amazon ECR, and Prometheus + Grafana monitoring. Zero failed requests across every switch and rollback. Here is exactly how I did it - including the architecture mistake that caused a 503, and the fix that made it truly zero-downtime.

The Real Problem With Shipping Software

Releasing code is where theory meets reality. A feature can pass every local test, build cleanly in CI, and still fail the moment real traffic touches it. When it does, the question is not what broke - it is how quickly can you recover without taking users down with you.

Traditional rolling deployments reduce this risk but do not eliminate it. During a rollout, old and new code can run simultaneously, creating version skew. If the new version is bad, rollback means redeploying the old one - which takes time users will feel.

Blue-green deployment takes a different approach. Two environments run in parallel. One is live. The other is where the new release lands. Traffic switches only after validation. Rollback is a routing change, not a redeployment.

The question I wanted to answer with this project was practical: Can I build a blue-green pipeline that delivers genuinely zero failed requests through a traffic switch, a rollback, and a simulated broken release - locally and in the cloud?

The answer is yes. But the path had a real failure in it. That failure made the project better.
Repository: github.com/gbadedata/shopswift-blue-green

What This Project Covers

| Phase | What was built |
|---|---|
| 1 | Node.js + Express API with Jest tests |
| 2 | Docker image with environment-based versioning |
| 3 | Git history and GitHub baseline |
| 4 | Minikube blue baseline with NGINX Ingress |
| 5 | Green deployment + traffic switch (with a 503 failure and fix) |
| 6 | Zero-downtime rollback from Green back to Blue |
| 7 | Broken release simulation with readiness probe protection |
| 8 | GitHub Actions CI (tests, Docker build, Trivy, kubeconform) |
| 9 | AWS EKS cloud deployment with Amazon ECR |
| 10 | Prometheus + Grafana monitoring stack |
| - | AWS teardown and cost sweep |

The Application: ShopSwift

ShopSwift is a small Node.js and Express e-commerce API. The application itself is intentionally simple. The deployment system around it is not.

| Endpoint | Purpose |
|---|---|
| / | Landing |
| /health | Liveness probe |
| /ready | Readiness probe |
| /version | Active version and environment |
| /products | Simulated catalogue |
| /products/:id | Product detail |
| /cart | Simulated cart |
| /checkout | Simulated checkout |
| /metrics | Prometheus metrics |

The `/version` endpoint was the most important one for deployment validation. It returned the running version and environment label, making it trivial to confirm exactly which environment was serving traffic at any moment:

    // Blue environment
    {
      "app": "ShopSwift",
      "version": "v1.0.0",
      "environment": "blue",
      "commit": "aws-blue",
      "port": 3000,
      "status": "running"
    }

    // Green environment
    {
      "app": "ShopSwift",
      "version": "v2.0.0",
      "environment": "green",
      "commit": "aws-green",
      "port": 3000,
      "status": "running"
    }

Technology Stack

| Layer | Tool |
|---|---|
| Application | Node.js, Express |
| Testing | Jest, Supertest |
| Containerization | Docker |
| Local Kubernetes | Minikube |
| Cloud Kubernetes | AWS EKS |
| Container Registry | Amazon ECR |
| Ingress | NGINX Ingress Controller |
| CI/CD | GitHub Actions |
| Security Scanning | Trivy |
| Manifest Validation | kubeconform |
| Monitoring | Prometheus, Grafana |
| Cloud Tooling | AWS CLI, eksctl |
| Package Management | Helm |

Architecture: The Final Routing Model

After testing and refinement (including one important failure), the traffic routing model settled into this:

    User request
         |
    AWS Load Balancer (cloud) or kubectl port-forward (local)
         |
    NGINX Ingress Controller
         |
    shopswift-ingress
         |
    shopswift-active-service
         |
    selector: environment=blue OR environment=green
         |
    Blue pods    Green pods

The key design principle: NGINX Ingress never changes. The only thing that changes during a switch is the label selector on `shopswift-active-service`. This distinction matters - and it came from a real failure. More on that below.

Phase 1: The Application

I started with the Express API and immediately wrote tests using Jest and Supertest before touching Docker or Kubernetes.

    Test Suites: 1 passed
    Tests:       7 passed

This was deliberate. Kubernetes deployment should not begin with an untested application. The health, readiness, and version endpoints needed to be correct before any of the deployment logic could trust them.
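To make the role of those endpoints concrete, here is a minimal sketch of the logic that could sit behind `/version` and `/ready`, written as plain functions so it runs without Express installed. This is not the repository's actual code. The function names, the `APP_COMMIT` and `PORT` variables, the default values, and the `state.ready` flag are all assumptions made for illustration; `APP_VERSION` and `APP_ENV` are the variable names the Docker section uses.

```javascript
// Hypothetical helper: build the /version payload from environment variables.
// APP_VERSION and APP_ENV come from the article; APP_COMMIT, PORT, and the
// fallback values are assumptions made for this sketch.
function buildVersionPayload(env = process.env) {
  return {
    app: 'ShopSwift',
    version: env.APP_VERSION || 'v1.0.0',
    environment: env.APP_ENV || 'blue',
    commit: env.APP_COMMIT || 'unknown',
    port: Number(env.PORT) || 3000,
    status: 'running',
  };
}

// Hypothetical readiness check: answer 200 only once startup work is done,
// otherwise 503 so a readiness probe keeps the pod out of rotation.
function readinessStatus(state) {
  return state.ready
    ? { code: 200, body: { status: 'ready' } }
    : { code: 503, body: { status: 'not ready' } };
}
```

In an Express app, a route handler would simply return these values, for example `res.json(buildVersionPayload())`. Keeping the logic in pure functions like this also makes it easy to unit test without starting a server.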
Phase 2: Dockerizing ShopSwift

The Docker image was designed to support both Blue and Green from the same codebase using environment variables:

    # Blue container
    docker run -e APP_VERSION=v1.0.0 -e APP_ENV=blue shopswift:v1.0.0

    # Green container
    docker run -e APP_VERSION=v2.0.0 -e APP_ENV=green shopswift:v2.0.0

No separate codebases. No duplicated Dockerfiles. One image, configured at runtime.

I also wrote smoke test scripts to validate all endpoints quickly after each build - a habit that paid dividends throughout the project.

Challenge: npm ci Caught a Lockfile Mismatch

The Docker build failed at:

    RUN npm ci --omit=dev

The cause: `package-lock.json` was out of sync with `package.json`. This is precisely why `npm ci` exists. Unlike `npm install`, it treats a lockfile mismatch as a hard failure rather than silently correcting it. I regenerated the lockfile and rebuilt - and the build became reproducible.

Lesson: A reproducible build that fails loudly is better than a lenient one that silently diverges.

Phase 3: Git Baseline

After the local application and Docker image were validated, I pushed to GitHub. The commit history became part of the project evidence:

    feat: build ShopSwift app and Docker baseline
    feat: deploy ShopSwift blue on Minikube with NGINX Ingress
    feat: implement zero-downtime switch with active service selector
    feat: simulate broken green release and validate readiness protection
    feat: add GitHub Actions CI pipeline
    feat: deploy and validate blue-green on AWS EKS
    feat: add Prometheus and Grafana monitoring
    docs: rewrite README with complete deployment documentation

For a project like this, traceability is part of the deliverable.

Phase 4: Minikube Blue Baseline

Before spending time or money on AWS, I validated the full deployment architecture locally with Minikube.
Kubernetes resources deployed:

    Namespace:  ecommerce-bluegreen
    Deployment: shopswift-blue
    Service:    shopswift-blue-service
    Ingress:    shopswift-ingress

Challenge: WSL + Docker Driver = Unreliable Ingress Access

Running Minikube with the Docker driver inside WSL meant that accessing `shopswift.local` directly was unreliable - a known networking limitation of this environment. The app and Service were fine; the issue was local DNS and networking.

The solution was to port-forward the NGINX Ingress Controller and pass the correct Host header:

    kubectl port-forward -n ingress-nginx service/ingress-nginx-controller 8080:80 &
    curl -H "Host: shopswift.local" http://localhost:8080/version

This still exercised the full NGINX Ingress routing path - just without relying on local DNS resolution. It was the right tradeoff for a local validation environment.

Phase 5: Deploying Green and Switching Traffic

With Blue stable, I deployed Green:

    Deployment: shopswift-green
    Service:    shopswift-green-service

Before switching traffic, I tested Green internally through its own Service. This is non-negotiable in a proper blue-green workflow - Green pods running does not mean Green is ready to serve users.

Green internal check confirmed:

    {
      "app": "ShopSwift",
      "version": "v2.0.0",
      "environment": "green",
      "commit": "minikube-green",
      "port": 3000,
      "status": "running"
    }

Both environments were now running. Time to switch traffic.

The Failure That Made This Project Better

My first switching approach was to patch the Ingress backend directly:

    # Before
    backend:
      service:
        name: shopswift-blue-service

    # After patching
    backend:
      service:
        name: shopswift-green-service

Logical. Clean-looking. But during a continuous zero-downtime test:

    FAILED request: status=503
    Failed requests: 1 of 26

A single 503 during a traffic switch means the design cannot honestly be called zero-downtime. I did not hide this result. I used it to understand what was happening.
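The article shows only the output of the continuous test, not the script. A probe of that kind could look roughly like the sketch below, which uses Node's built-in `http` module. The function names, the 500 ms interval, and the rule that anything outside 2xx/3xx counts as a failure are assumptions made for illustration; the request count of 26, the port-forwarded URL, and the Host header come from the text above.

```javascript
const http = require('node:http');

// Resolve with the HTTP status code, or 0 if the request fails at the network level.
function requestStatus(url, host) {
  return new Promise((resolve) => {
    const req = http.get(url, { headers: { Host: host } }, (res) => {
      res.resume(); // discard the body so the socket is released
      resolve(res.statusCode);
    });
    req.on('error', () => resolve(0));
  });
}

// Assumption: any status outside 2xx/3xx (including 0) counts as a failed request.
function summarize(statuses) {
  const failed = statuses.filter((s) => s < 200 || s >= 400).length;
  return { total: statuses.length, failed, passed: failed === 0 };
}

// Send `total` requests one after another, pausing `intervalMs` between them,
// so the traffic switch can be performed while the loop is running.
async function runAvailabilityTest(url, host, total = 26, intervalMs = 500) {
  const statuses = [];
  for (let i = 0; i < total; i += 1) {
    statuses.push(await requestStatus(url, host));
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return summarize(statuses);
}
```

With the port-forward from Phase 4 active, calling `runAvailabilityTest('http://localhost:8080/version', 'shopswift.local').then(console.log)` in one terminal while switching traffic in another would give a summary comparable to the one above. The requests are sent sequentially, so a short gap that falls between two requests can go unnoticed; a smaller interval narrows that blind spot.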
The likely cause: when the Ingress backend is patched, NGINX reloads its configuration. During that reload - even briefly - upstream connections can fail. One request landed in that gap.

The Fix: The Stable Active Service Pattern

Instead of touching the Ingress, I introduced a stable intermediary:

    # shopswift-active-service - this never changes in Ingress
    apiVersion: v1
    kind: Service
    metadata:
      name: shopswift-active-service
    spec:
      selector:
        app: shopswift
        environment: blue   # <-- only this changes during a switch

The Ingress always points to `shopswift-active-service`. To switch traffic, I only patch the selector:

    # Switch to Green
    kubectl patch service shopswift-active-service \
      -n ecommerce-bluegreen \
      --type='merge' \
      -p '{"spec":{"selector":{"app":"shopswift","environment":"green"}}}'

    # Roll back to Blue
    kubectl patch service shopswift-active-service \
      -n ecommerce-bluegreen \
      --type='merge' \
      -p '{"spec":{"selector":{"app":"shopswift","environment":"blue"}}}'

Kubernetes Service selector updates are atomic. The control plane propagates the change with no NGINX reload, no connection gap. This is the architecture improvement that made zero-downtime achievable.

Result After the Fix

    Blue to Green switch:
    Total requests: 26
    Failed requests: 0
    Zero-downtime availability test: PASSED

Phase 6: Zero-Downtime Rollback

Rollback in a blue-green system should not require rebuilding or redeploying the previous version. Blue never stopped running. It was just not receiving traffic. To roll back, I patched the selec
