Operational Criteria to Weigh When Choosing a Deployment Strategy

Where the Problem Begins

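A deployment strategy is not a matter of picking a nice-sounding name. It is an operational design decision: how cautiously to expose a new version, how quickly you can revert when something goes wrong, and what impact users feel in the meantime. This post is a record of comparing several deployment methods against those operational criteria.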

In software development, the choice of deployment method matters.

The main factors in that choice are uptime, risk management, resource management, and user experience (UX).

Below, three representative deployment methods are introduced, each with a short example.

First up is Blue-Green deployment.

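You create two identical deployment environments, and only one of them serves live traffic at any given time. For example, if the current live environment (the one running the current version) is Blue, the new version is deployed to Green. Once Green has passed its checks (the CI/CD pipeline has built it and unit tests and similar checks have passed) and is up and running, traffic is switched from Blue to Green.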
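The switch described above can be modeled in a few lines. This is a minimal in-memory sketch, not a real router: the `BlueGreenRouter` class and its method names are hypothetical, and in practice the "swap" would be a load balancer or DNS change. It shows the two properties the text relies on: deploying to the idle side does not touch live traffic, and rollback is just the same swap in reverse.

```python
from dataclasses import dataclass, field


@dataclass
class BlueGreenRouter:
    """Hypothetical in-memory model of a router in front of two environments."""
    live: str = "blue"   # environment currently receiving all traffic
    idle: str = "green"  # environment where the next version is deployed
    versions: dict = field(default_factory=lambda: {"blue": "v1", "green": None})

    def deploy_to_idle(self, version: str) -> None:
        # The new version goes only to the idle environment; live traffic is untouched.
        self.versions[self.idle] = version

    def switch(self) -> None:
        # One swap moves 100% of traffic; the previous environment keeps running.
        self.live, self.idle = self.idle, self.live

    def rollback(self) -> None:
        # Rolling back is the same swap in reverse, because the old env was never torn down.
        self.switch()

    def serving_version(self) -> str:
        return self.versions[self.live]


router = BlueGreenRouter()
router.deploy_to_idle("v2")
print(router.serving_version())  # v1 (deploying to green does not change live traffic)
router.switch()
print(router.serving_version())  # v2
router.rollback()
print(router.serving_version())  # v1
```

Keeping the old environment alive is what makes `rollback()` trivial, and it is also where the extra resource cost comes from.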

The advantage is that if the new environment has a problem, you can switch back to the old one immediately. And because traffic moves only once Green is already running, uptime is continuous and users experience no interruption.

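The disadvantage is resource usage. In a typical Blue-Green deployment, Blue is not shut down when Green goes live, because doing so would give up the main benefit: instant rollback. As a result, running two full environments makes it heavy in terms of server resources and maintenance.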

Below is a simple YAML file for a Blue-Green workflow that can be run with GitHub Actions.

Implementation Flow

name: Blue-Green Deployment

on:
  push:
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # Add steps to build/test your application here
      # - name: Build and Test
      #   run: ...

      - name: Deploy to Green Environment
        run: |
          # Add your deployment scripts here
          echo "Deploying to Green Environment"
          # Example: ssh user@server 'deploy-script-green.sh'

      - name: Health Check for Green Environment
        run: |
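          # Perform a health check; the step must exit non-zero on failure
          # so that "Switch Traffic" is skipped and the rollback step runs
          echo "Checking Green Environment Health"
          # Example: curl --fail http://green.yourdomain.com/health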

      - name: Switch Traffic to Green
        if: success()
        run: |
          # Switch traffic from Blue to Green
          echo "Switching traffic to Green Environment"
          # Example: ssh user@server 'switch-traffic-to-green.sh'

      - name: Monitor Green Environment
        run: |
          # Optional: Monitor the Green environment
          echo "Monitoring Green Environment"
          # Example: Some monitoring script/logic

      - name: Rollback to Blue if Needed
        if: failure()
        run: |
          # Rollback to Blue environment in case of failure
          echo "Rolling back to Blue Environment"
          # Example: ssh user@server 'switch-traffic-to-blue.sh'
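The Health Check step above is where the workflow decides between switching and rolling back, so it only works if the check actually fails on an unhealthy Green. A minimal sketch of such a check with retries, using only the standard library; the function names are hypothetical, and the URL in the usage note is the placeholder from the YAML comment.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """True if GET url returns HTTP 2xx; HTTP errors and network errors count as failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False


def wait_until_healthy(check: Callable[[], bool], attempts: int = 5, delay: float = 2.0) -> bool:
    """Call check() up to `attempts` times; succeed as soon as one call passes."""
    for i in range(attempts):
        if check():
            return True
        if i < attempts - 1:
            time.sleep(delay)  # give a freshly started environment time to warm up
    return False


def main(url: str) -> int:
    # Exit code 0 = healthy. A non-zero exit fails the GitHub Actions step, so the
    # `if: success()` switch is skipped and the `if: failure()` rollback step runs.
    return 0 if wait_until_healthy(lambda: http_ok(url)) else 1
```

In the workflow, the step could call something like `sys.exit(main("http://green.yourdomain.com/health"))`; retrying a few times avoids flipping back to Blue just because Green was still starting up.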

Next is Canary deployment.

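It is a "gradual" deployment: traffic is split between the already-deployed version and the new one, so a subset of users gets the new version before the application is fully rolled out.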
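One common way to pick "a subset of users" is to hash a stable user ID into a bucket, so each user consistently sees the same version instead of bouncing between versions on every request. A minimal sketch, assuming a string user ID is available at the routing layer; `routes_to_canary` is a hypothetical name, and real service meshes usually do weighted routing per request or by header.

```python
import hashlib


def routes_to_canary(user_id: str, canary_percent: int) -> bool:
    """Deterministically decide whether this user is in the canary group.

    Hashing the ID (instead of choosing randomly per request) keeps each user on
    the same version, and raising canary_percent only adds users to the canary.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < canary_percent


users = [f"user-{i}" for i in range(10_000)]
share = sum(routes_to_canary(u, 10) for u in users) / len(users)
print(f"canary share at 10%: {share:.3f}")  # should be close to 0.100
```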

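Its biggest advantage is the risk management that comes from being gradual. Releasing to a small group of users first lets you catch bugs and errors before everyone gets the new version. Rolling back to the previous version is also simple.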

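The notable drawback is that, because the new version first reaches only a small pool of users, it is hard to anticipate issues that appear only when the full user base and its traffic hit it. A limited rollout can hide problems that a full one would expose.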
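The small-pool problem is one reason canary checks often refuse to judge until the canary has seen a minimum amount of traffic. A minimal sketch of such a gate, assuming request and error counts for canary and baseline can be pulled from monitoring; `Stats`, `canary_verdict`, and both thresholds are hypothetical and illustrative, not recommendations. It only addresses sample size; problems that appear only under full load can still slip through.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(canary: Stats, baseline: Stats,
                   min_requests: int = 1000, max_ratio: float = 1.5) -> str:
    """Return 'wait', 'promote' or 'rollback' for one analysis window."""
    if canary.requests < min_requests:
        return "wait"  # too little traffic: a clean result would prove little
    # A small absolute floor keeps a near-zero baseline from making the limit zero.
    allowed = max(baseline.error_rate * max_ratio, 0.001)
    return "rollback" if canary.error_rate > allowed else "promote"


baseline = Stats(requests=20_000, errors=80)          # 0.4% errors
print(canary_verdict(Stats(50, 0), baseline))         # wait
print(canary_verdict(Stats(2_000, 10), baseline))     # promote (0.5% <= 0.6%)
print(canary_verdict(Stats(2_000, 40), baseline))     # rollback (2.0% > 0.6%)
```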

Here is a simple GitHub Actions YAML file.

name: Canary Deployment

on:
  push:
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # Add steps to build/test your application here
      # - name: Build and Test
      #   run: ...

      - name: Deploy to Canary
        run: |
          # Add your deployment script here
          echo "Deploying to Canary Environment"
          # Example: deploy-script-canary.sh

      - name: Monitor Canary Deployment
        run: |
          # Implement monitoring. This could be a script that checks the health of the application.
          echo "Monitoring Canary Deployment"
          # Example: curl http://canary.example.com/health

      - name: Gradual Rollout
        run: |
          # Gradually route more traffic to the canary
          echo "Increasing traffic to Canary"
          # Example: increase-traffic-to-canary.sh

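      - name: Full Rollout
        run: |
          # If everything is fine, route all traffic to the canary (making it the new production)
          echo "Routing all traffic to Canary"
          # Example: full-traffic-to-canary.sh

      - name: Rollback Canary if Needed
        if: failure()
        run: |
          # If any earlier step failed, send all traffic back to the stable version
          echo "Rolling back: routing all traffic to the stable version"
          # Example: rollback-canary.sh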

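In practice, the important parts, namely the shell scripts and the gradual traffic shifting itself, are implemented with a load balancer, a service mesh (such as Istio), or an orchestration tool (such as Kubernetes).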
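The Gradual Rollout step in the YAML only echoes a message; the loop it stands for can be sketched as below. This assumes two hypothetical hooks: `set_weight(p)`, which tells the load balancer or service mesh to send p% of traffic to the canary, and `is_healthy(p)`, which queries monitoring. The step percentages are illustrative.

```python
from typing import Callable, Sequence


def progressive_rollout(set_weight: Callable[[int], None],
                        is_healthy: Callable[[int], bool],
                        steps: Sequence[int] = (5, 25, 50, 100)) -> bool:
    """Shift traffic to the canary step by step; revert to 0% on the first failed check."""
    for percent in steps:
        set_weight(percent)
        if not is_healthy(percent):
            set_weight(0)  # rollback: all traffic back to the stable version
            return False
    return True


applied = []
ok = progressive_rollout(applied.append, lambda p: p < 50)
print(ok, applied)  # False [5, 25, 50, 0]
```

Reverting to 0% on any failed step is what keeps the blast radius limited to the current canary share.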

Finally, there is Rolling deployment.

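The new version replaces the existing one a portion of the instances at a time. I'm not sure where the name comes from, but you can picture it as changing a car's wheels while it keeps rolling.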
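Replacing "a portion at a time" can be modeled as a batch loop over a fleet of instances. A minimal simulation, assuming each instance is represented only by the version it runs and `healthy(i)` is a hypothetical per-instance health check; orchestrators like Kubernetes implement the real thing.

```python
from typing import Callable, List


def rolling_update(instances: List[str], new_version: str,
                   batch_size: int, healthy: Callable[[int], bool]) -> List[str]:
    """Replace instances batch by batch; stop at the first unhealthy batch.

    Only one batch is swapped per step, so the other instances keep serving traffic.
    """
    fleet = list(instances)  # don't mutate the caller's list
    for start in range(0, len(fleet), batch_size):
        batch = range(start, min(start + batch_size, len(fleet)))
        for i in batch:
            fleet[i] = new_version
        if not all(healthy(i) for i in batch):
            # Halt here: the fleet is now mixed (old + new versions serving together).
            break
    return fleet


print(rolling_update(["v1"] * 5, "v2", 2, lambda i: i < 3))
# ['v2', 'v2', 'v2', 'v2', 'v1']
```

Note the halted state: old and new versions are serving at the same time, which is the situation in which version-compatibility issues surface.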

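Its advantages, again thanks to the piecewise rollout, are that you can see which part is causing an issue when one appears, and that downtime is minimized. Resource usage should also be noticeably lower than with Blue-Green, since no full second environment is needed.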
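The resource difference is easy to put into numbers by comparing peak instance counts during a deploy. A rough sketch under stated assumptions: Blue-Green keeps a full second copy, while a rolling update adds at most `ceil(n * surge)` extra instances at once. The 25% default mirrors the Kubernetes Deployment default for `maxSurge`; `peak_instances` itself is a hypothetical helper.

```python
import math


def peak_instances(strategy: str, n: int, surge: float = 0.25) -> int:
    """Peak number of instances running while deploying an n-instance service."""
    if strategy == "blue-green":
        return 2 * n  # the old environment stays up next to the new one
    if strategy == "rolling":
        return n + math.ceil(n * surge)  # only a surge batch runs on top of n
    raise ValueError(f"unknown strategy: {strategy}")


for s in ("blue-green", "rolling"):
    print(s, peak_instances(s, 10))
# blue-green 20
# rolling 13
```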

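The drawback is that when something goes wrong, it is hard to tell right away whether the cause is a compatibility problem between the old and new versions running side by side, or a change in the new version itself. Deployment can also take somewhat longer than with the other methods.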

Here is a simple YAML file.

name: Rolling Deployment

on:
  push:
    branches:
      - main

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # Build and test steps would go here
      # - name: Build
      #   run: ...

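      - name: Deploy
        run: |
          # Replace with your deployment script/command
          echo "Starting Rolling Deployment"
          # For example, if using Kubernetes, point the Deployment at the new image,
          # which triggers a rolling update (container name and image are placeholders):
          # kubectl set image deployment/myapp myapp=registry.example.com/myapp:${GITHUB_SHA}
          # (kubectl rollout restart restarts pods with their current spec; it does not change the image)

          # Monitoring the rollout status
          echo "Monitoring deployment status"
          # Example: kubectl rollout status deployment/myapp --timeout=300s
          # On failure: kubectl rollout undo deployment/myapp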

Criteria for Choosing a Deployment Strategy

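A good deployment strategy has to fit the organization's traffic volume, tolerance for failure, level of monitoring, and degree of rollback automation. Canary is not the answer for every service, and not every small service needs Blue-Green. What matters is not the deployment method itself but whether the next action after a failure is clearly defined.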

These examples stop at pseudocode and do not implement the actual behavior. I plan to cover deploying with Kubernetes in a separate post.