Trying Out Amazon EKS Auto Mode

On December 1, 2024, AWS announced Auto Mode for Amazon Elastic Kubernetes Service (Amazon EKS).

This new capability fully automates the management of compute, storage, and networking for Kubernetes clusters, and it comes with several notable features.

In this post, I checked how the following features actually behave:

* Efficiency
* Application availability
* Upgrades

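EKS Auto Mode features

Streamlined Kubernetes cluster management:

EKS Auto Mode streamlines EKS management by providing production-ready clusters with minimal operational overhead.
With EKS Auto Mode, you can confidently run demanding, dynamic workloads without deep EKS expertise.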

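Efficiency:

EKS Auto Mode is designed to optimize compute costs while adhering to the flexibility defined by your NodePool and workload requirements.
It also terminates unused instances and consolidates workloads onto other worker nodes to improve cost efficiency.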

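Application availability:

EKS Auto Mode dynamically adds or removes worker nodes in your EKS cluster based on the demands of your Kubernetes applications.
This minimizes the need for manual capacity planning and helps ensure application availability.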

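Security:

EKS Auto Mode uses AMIs for worker nodes that are treated as immutable.
These AMIs enforce locked-down software, enable SELinux mandatory access controls, and provide a read-only root file system.
In addition, worker nodes launched by EKS Auto Mode have a maximum lifetime of 21 days (which can be shortened), after which they are automatically replaced with new worker nodes.
Regularly cycling worker nodes in this way strengthens your security posture and follows a best practice that many customers already use.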
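As a worked example of the 21-day maximum lifetime, the sketch below computes the latest time by which a node would be replaced, given the `CreationTimestamp` string that `kubectl describe node` prints (the sample value is the one shown for a node later in this post). It assumes the default 21-day limit and ignores any shorter lifetime you configure, as well as earlier replacement by consolidation or upgrades; the helper name is hypothetical.

```python
from datetime import datetime, timedelta
from email.utils import parsedate_to_datetime

# Default maximum node lifetime in EKS Auto Mode (it can be shortened).
MAX_NODE_LIFETIME = timedelta(days=21)


def replace_deadline(creation_timestamp: str,
                     lifetime: timedelta = MAX_NODE_LIFETIME) -> datetime:
    """Latest time a node created at `creation_timestamp` would be replaced,
    assuming it lives at most `lifetime` (hypothetical helper)."""
    # kubectl prints RFC 2822-style dates such as "Thu, 12 Dec 2024 06:51:22 +0000".
    created = parsedate_to_datetime(creation_timestamp)
    return created + lifetime


if __name__ == "__main__":
    print(replace_deadline("Thu, 12 Dec 2024 06:51:22 +0000").isoformat())
    # 2025-01-02T06:51:22+00:00
```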

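Automated upgrades:

EKS Auto Mode keeps your Kubernetes cluster, worker nodes, and related components up to date with the latest patches, while respecting your configured Pod Disruption Budgets (PDBs) and NodePool Disruption Budgets (NDBs).
Within the 21-day maximum lifetime, intervention may be required if blocking PDBs or other configurations prevent updates.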

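Managed components:

EKS Auto Mode includes, as core components, Kubernetes and AWS cloud features that would otherwise have to be managed as add-ons.
This includes built-in support for Pod IP address assignment, Pod network policies, local DNS services, GPU plug-ins, health checkers, and EBS CSI storage.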

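Customizable NodePools and NodeClasses:

If your workload needs changes to storage, compute, or networking configuration, you can create custom NodePools and NodeClasses with EKS Auto Mode.
The default NodePools and NodeClasses can't be edited, but you can add new custom NodePools or NodeClasses alongside the default configuration to meet specific requirements.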
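A custom NodePool is added next to the default one rather than by editing it. The sketch below builds such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f -` accepts. It assumes the Karpenter v1 NodePool schema (`karpenter.sh/v1`) with a `nodeClassRef` pointing at the built-in `default` NodeClass in the `eks.amazonaws.com` group, consistent with the `karpenter.sh/nodepool` and `eks.amazonaws.com/nodeclass` labels on the nodes later in this post. The pool name, zones, CPU limit, and the 336h (14-day) `expireAfter` (chosen to show a lifetime shorter than 21 days) are hypothetical values; check the field names against the EKS Auto Mode documentation before applying.

```python
import json


def custom_nodepool(name: str, zones: list[str], expire_after: str = "336h") -> dict:
    """Build a hypothetical custom NodePool manifest for EKS Auto Mode."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Reuse the built-in NodeClass instead of defining a new one.
                    "nodeClassRef": {
                        "group": "eks.amazonaws.com",
                        "kind": "NodeClass",
                        "name": "default",
                    },
                    # Label keys as they appear on the Auto Mode nodes in this post.
                    "requirements": [
                        {"key": "eks.amazonaws.com/instance-category",
                         "operator": "In", "values": ["c", "m"]},
                        {"key": "karpenter.sh/capacity-type",
                         "operator": "In", "values": ["on-demand"]},
                        {"key": "topology.kubernetes.io/zone",
                         "operator": "In", "values": list(zones)},
                    ],
                    # Shorter than the 21-day maximum node lifetime.
                    "expireAfter": expire_after,
                }
            },
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": "30s",
            },
            # Cap the total CPU this pool may provision.
            "limits": {"cpu": "64"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(custom_nodepool("custom-c-m", ["ap-northeast-1a", "ap-northeast-1c"]), indent=2))
```

Usage: `python3 nodepool.py | kubectl apply -f -`.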

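Results

Efficiency

Right after the cluster is created, no worker node exists; a worker node is provisioned only when the first application (Pod) starts.
After I deleted the Pod, I confirmed that the "unused instance was terminated."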

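Application availability

As load on a horizontally scaled Pod application increased, I confirmed that worker nodes were added as needed.
Together with the efficiency results above, this confirms that the required resources are provisioned according to application demand.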

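Upgrades

I confirmed that clicking Update in the console automatically upgrades everything from the control plane to the worker nodes.
During the upgrade, Pods were restarted on the new, upgraded worker node while the application availability specified in the PDB was maintained, and once the restarts completed, the pre-upgrade worker node was deleted.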

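[Addendum] Streamlined Kubernetes cluster management

Taken together, these results suggest real progress in streamlining cluster management.

(Repeated from above)

EKS Auto Mode streamlines EKS management by providing production-ready clusters with minimal operational overhead.
With EKS Auto Mode, you can confidently run demanding, dynamic workloads without deep EKS expertise.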

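Test details

Creating the cluster

I created the cluster from the AWS Management Console. I'll skip the detailed steps here; please refer to the official documentation if you'd like the details.
Next, configure kubectl to access the cluster you created.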

[cloudshell-user@ip-10-134-24-146 ~]$ aws eks update-kubeconfig --name "${CLUSTER_NAME}"
Added new context arn:aws:eks:ap-northeast-1:account-id:cluster/interesting-funk-otter to /home/cloudshell-user/.kube/config

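Checking efficiency

First, let's check the state of resources right after the cluster is created.
Neither the resources you would normally see in the kube-system namespace nor any worker nodes are visible.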

[cloudshell-user@ip-10-134-24-146 ~]$ kubectl get po -A
No resources found

[cloudshell-user@ip-10-134-24-146 ~]$ kubectl get po -n kube-system
No resources found in kube-system namespace.

[cloudshell-user@ip-10-134-24-146 ~]$ kubectl get node
No resources found

In this state, let's create an NGINX Pod.

[cloudshell-user@ip-10-132-90-84 ~]$ kubectl run nginx --image=nginx
pod/nginx created
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get po
NAME    READY   STATUS    RESTARTS   AGE
nginx   0/1     Pending   0          10s
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get po
NAME    READY   STATUS    RESTARTS   AGE
nginx   0/1     Pending   0          14s
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get po
NAME    READY   STATUS    RESTARTS   AGE
nginx   0/1     Pending   0          17s
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get po
NAME    READY   STATUS    RESTARTS   AGE
nginx   0/1     Pending   0          20s
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
NAME                  STATUS   ROLES    AGE   VERSION
i-11111111111111111   Ready    <none>   5s    v1.31.1-eks-1b3e656
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get po
NAME    READY   STATUS              RESTARTS   AGE
nginx   0/1     ContainerCreating   0          34s
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get po
NAME    READY   STATUS              RESTARTS   AGE
nginx   0/1     ContainerCreating   0          38s
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get po
NAME    READY   STATUS              RESTARTS   AGE
nginx   0/1     ContainerCreating   0          45s
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get po
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          52s

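Right after the Pod is created it is Pending; once a worker node has been provisioned it moves to ContainerCreating, and after 52 seconds it is Running.
The worker node i-11111111111111111 was created only once the Pod was started: resources are provisioned on demand.
When the Pod is deleted, the worker node is also deleted after a while.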

[cloudshell-user@ip-10-132-90-84 ~]$ kubectl delete po nginx
pod "nginx" deleted
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get po
No resources found in default namespace.
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
NAME                  STATUS   ROLES    AGE   VERSION
i-11111111111111111   Ready    <none>   8m    v1.31.1-eks-1b3e656
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
NAME                  STATUS   ROLES    AGE    VERSION
i-11111111111111111   Ready    <none>   8m4s   v1.31.1-eks-1b3e656
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
NAME                  STATUS   ROLES    AGE    VERSION
i-11111111111111111   Ready    <none>   8m8s   v1.31.1-eks-1b3e656
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
NAME                  STATUS   ROLES    AGE     VERSION
i-11111111111111111   Ready    <none>   8m11s   v1.31.1-eks-1b3e656
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
NAME                  STATUS   ROLES    AGE     VERSION
i-11111111111111111   Ready    <none>   8m18s   v1.31.1-eks-1b3e656
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
NAME                  STATUS   ROLES    AGE     VERSION
i-11111111111111111   Ready    <none>   8m30s   v1.31.1-eks-1b3e656
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
NAME                  STATUS   ROLES    AGE     VERSION
i-11111111111111111   Ready    <none>   8m46s   v1.31.1-eks-1b3e656
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
NAME                  STATUS   ROLES    AGE     VERSION
i-11111111111111111   Ready    <none>   8m59s   v1.31.1-eks-1b3e656
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
NAME                  STATUS     ROLES    AGE     VERSION
i-11111111111111111   NotReady   <none>   9m10s   v1.31.1-eks-1b3e656
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
NAME                  STATUS     ROLES    AGE     VERSION
i-11111111111111111   NotReady   <none>   9m21s   v1.31.1-eks-1b3e656
[cloudshell-user@ip-10-132-90-84 ~]$ kubectl get node
No resources found

This confirms that "unused instances are terminated."
So this, I suppose, is what it means to improve cost efficiency by provisioning resources to match demand.
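Rather than re-running `kubectl get po` and `kubectl get node` by hand as above, the waiting can be scripted. The sketch below is a generic poll loop plus two hypothetical checks that shell out to kubectl: one reads the Pod phase with `-o jsonpath={.status.phase}`, the other treats empty `-o name` output as "no nodes left". From the shell, `kubectl wait --for=condition=Ready pod/nginx` and `kubectl wait --for=delete node/<name>` cover the same cases; the Python version just makes the timing logic explicit. The timeout and interval defaults are arbitrary.

```python
import subprocess
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout_s: float = 300.0,
               interval_s: float = 5.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Call `check` every `interval_s` seconds; return True as soon as it
    succeeds, or False once `timeout_s` has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def kubectl_stdout(*args: str) -> str:
    """Run kubectl with `args` and return stripped stdout (raises on failure)."""
    result = subprocess.run(["kubectl", *args], capture_output=True,
                            text=True, check=True)
    return result.stdout.strip()


def pod_is_running(name: str) -> bool:
    # Prints only the phase, e.g. "Pending" or "Running".
    return kubectl_stdout("get", "pod", name, "-o", "jsonpath={.status.phase}") == "Running"


def no_nodes_left() -> bool:
    # "No resources found" goes to stderr, so stdout is empty when no node exists.
    return kubectl_stdout("get", "node", "-o", "name") == ""
```

For example, `wait_until(lambda: pod_is_running("nginx"), timeout_s=180)` after `kubectl run nginx --image=nginx`, then `wait_until(no_nodes_left, timeout_s=900)` after deleting the Pod. `sleep` and `clock` are injectable so the loop can be tested without a cluster.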

Checking application availability

Next, let's look at how autoscaling behaves.
For testing, create a php-apache Deployment and a Service with the same name.

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example  # k8s.gcr.io is frozen; the image is served from registry.k8s.io
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
EOF

Create a HorizontalPodAutoscaler (HPA) for the php-apache Deployment. (It scales out automatically when average CPU utilization exceeds 50%.)

[cloudshell-user@ip-10-134-24-146 sample]$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
horizontalpodautoscaler.autoscaling/php-apache autoscaled
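The HPA sizes the Deployment from average CPU utilization relative to the Pods' requests (200m here). The Kubernetes documentation describes the core rule as desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)], with no change when the ratio is within a tolerance (0.1 by default). The sketch below implements only that rule plus the `--min`/`--max` bounds; it ignores the stabilization window, missing metrics, and not-yet-ready Pods, which the real controller also handles. The HPA also needs a Metrics API source such as metrics-server to read CPU usage, and the steps above don't show whether one is installed.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Simplified HPA replica calculation for a CPU utilization target.

    Utilizations are percentages of the Pods' CPU requests.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Inside the tolerance band the replica count is left alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # --min / --max from `kubectl autoscale` bound the result.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # One Pod averaging 250% of its 200m request (500m, i.e. its CPU limit).
    print(desired_replicas(1, 250, 50, 1, 10))  # 5
```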

Create a load-generator Deployment to generate load. The load test starts as soon as it is created. The initial replica count is 5.

cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: load-generator
spec:
  replicas: 5
  selector:
    matchLabels:
      app: load-generator
  template:
    metadata:
      labels:
        app: load-generator
    spec:
      containers:
      - name: load-generator
        image: alpine
        command: ["/bin/sh", "-c"]
        args:
        - "apk add --no-cache curl && while true; do curl -s http://php-apache > /dev/null; done"
EOF

To increase the load further, scale load-generator up to 50 replicas.

[cloudshell-user@ip-10-134-24-146 ~]$ kubectl scale deploy load-generator --replicas=50
deployment.apps/load-generator scaled

Check the state of the worker nodes.

[cloudshell-user@ip-10-134-24-146 ~]$ kubectl get node

NAME                  STATUS   ROLES    AGE   VERSION
i-22222222222222222   Ready    <none>   10m   v1.30.5-eks-baa6d11

[cloudshell-user@ip-10-134-24-146 ~]$ kubectl get node

NAME                  STATUS   ROLES    AGE   VERSION
i-22222222222222222   Ready    <none>   10m   v1.30.5-eks-baa6d11

[cloudshell-user@ip-10-134-24-146 ~]$ kubectl get node

NAME                  STATUS   ROLES    AGE   VERSION
i-33333333333333333   Ready    <none>   30s   v1.30.5-eks-baa6d11
i-22222222222222222   Ready    <none>   11m   v1.30.5-eks-baa6d11

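At the start of the test there was a single worker node, i-22222222222222222, but as the replica count and therefore the load increased, a second worker node, i-33333333333333333, was created.
This confirms that worker nodes in the EKS cluster are added dynamically based on application demand.
This is another way cost efficiency is improved: resources are provisioned to match demand.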
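To spot an added (or removed) worker node without eyeballing successive outputs, two `kubectl get node` snapshots can be compared. This minimal sketch assumes the default five-column table (NAME STATUS ROLES AGE VERSION) and skips the header and the "No resources found" message; the function names are hypothetical, and `kubectl get node -o name` would be a sturdier input in practice.

```python
def node_names(get_node_output: str) -> set[str]:
    """Node names from default `kubectl get node` table output."""
    names = set()
    for line in get_node_output.splitlines():
        fields = line.split()
        # Data rows have exactly 5 columns: NAME STATUS ROLES AGE VERSION.
        if len(fields) == 5 and fields[0] != "NAME":
            names.add(fields[0])
    return names


def node_changes(before: str, after: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) node names between two snapshots."""
    old, new = node_names(before), node_names(after)
    return sorted(new - old), sorted(old - new)
```

With the first and third snapshots above, `node_changes` reports `i-33333333333333333` as added and nothing as removed.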

Checking upgrades

Finally, let's check how upgrades behave.
First, check the upgrade insights.
There appears to be one insight, and its status is Passing.

[cloudshell-user@ip-10-134-1-45 ~]$ aws eks list-insights --region ap-northeast-1 --cluster-name $CLUSTER

{
    "insights": [
        {
            "id": "e93f18ce-bb38-43c5-8128-1fb4d9eff704",
            "name": "Deprecated APIs removed in Kubernetes v1.32",
            "category": "UPGRADE_READINESS",
            "kubernetesVersion": "1.32",
            "lastRefreshTime": "2024-12-11T07:54:58+00:00",
            "lastTransitionTime": "2024-12-11T07:54:57+00:00",
            "description": "Checks for usage of deprecated APIs that are scheduled for removal in Kubernetes v1.32. Upgrading your cluster before migrating to the updated APIs supported by v1.32 could cause application impact.",
            "insightStatus": {
                "status": "PASSING",
                "reason": "No deprecated API usage detected within the last 30 days."
            }
        }
    ]
}

[cloudshell-user@ip-10-134-1-45 ~]$ aws eks describe-insight --region ap-northeast-1 --id e93f18ce-bb38-43c5-8128-1fb4d9eff704 --cluster-name $CLUSTER

{
    "insight": {
        "id": "e93f18ce-bb38-43c5-8128-1fb4d9eff704",
        "name": "Deprecated APIs removed in Kubernetes v1.32",
        "category": "UPGRADE_READINESS",
        "kubernetesVersion": "1.32",
        "lastRefreshTime": "2024-12-11T07:54:58+00:00",
        "lastTransitionTime": "2024-12-11T07:54:57+00:00",
        "description": "Checks for usage of deprecated APIs that are scheduled for removal in Kubernetes v1.32. Upgrading your cluster before migrating to the updated APIs supported by v1.32 could cause application impact.",
        "insightStatus": {
            "status": "PASSING",
            "reason": "No deprecated API usage detected within the last 30 days."
        },
        "recommendation": "Update manifests and API clients to use newer Kubernetes APIs if applicable before upgrading to Kubernetes v1.32.",
        "additionalInfo": {
            "EKS update cluster documentation": "https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html",
            "Kubernetes v1.32 deprecation guide": "https://kubernetes.io/docs/reference/using-api/deprecation-guide/#v1-32"
        },
        "resources": [],
        "categorySpecificSummary": {
            "deprecationDetails": [
                {
                    "usage": "/apis/flowcontrol.apiserver.k8s.io/v1beta3/flowschemas",
                    "replacedWith": "/apis/flowcontrol.apiserver.k8s.io/v1/flowschemas",
                    "stopServingVersion": "1.32",
                    "startServingReplacementVersion": "1.29",
                    "clientStats": []
                },
                {
                    "usage": "/apis/flowcontrol.apiserver.k8s.io/v1beta3/prioritylevelconfigurations",
                    "replacedWith": "/apis/flowcontrol.apiserver.k8s.io/v1/prioritylevelconfigurations",
                    "stopServingVersion": "1.32",
                    "startServingReplacementVersion": "1.29",
                    "clientStats": []
                }
            ]
        }
    }
}
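The insight check can also be scripted before each upgrade. The sketch below reads the JSON printed by `aws eks list-insights` and lists any UPGRADE_READINESS insight whose status is anything other than PASSING. It relies only on the `name`, `category`, and `insightStatus.status` fields visible in the output above; the script and file names are hypothetical.

```python
import json
import sys


def non_passing_insights(list_insights_json: str) -> list[str]:
    """Names of UPGRADE_READINESS insights whose status is not PASSING."""
    data = json.loads(list_insights_json)
    return [
        insight["name"]
        for insight in data.get("insights", [])
        if insight.get("category") == "UPGRADE_READINESS"
        and insight.get("insightStatus", {}).get("status") != "PASSING"
    ]


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        blockers = non_passing_insights(f.read())
    print("OK to proceed" if not blockers else f"Check first: {blockers}")
```

Usage: `aws eks list-insights --region ap-northeast-1 --cluster-name "$CLUSTER" --output json > insights.json`, then `python3 check_insights.py insights.json`.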

Since the status is Passing, I'll proceed as is.
Next, check the number of free IP addresses.

[cloudshell-user@ip-10-134-1-45 ~]$ aws ec2 describe-subnets --subnet-ids \
> $(aws eks describe-cluster --name ${CLUSTER} \
> --query 'cluster.resourcesVpcConfig.subnetIds' \
> --output text) \
> --query 'Subnets[*].[SubnetId,AvailabilityZone,AvailableIpAddressCount]' \
> --output table
---------------------------------------------------------
|                    DescribeSubnets                    |
+---------------------------+-------------------+-------+
|  subnet-xxxxxxxxxxxxxxxxx |  ap-northeast-1c  |  4090 |
|  subnet-yyyyyyyyyyyyyyyyy |  ap-northeast-1a  |  4090 |
+---------------------------+-------------------+-------+
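This check can be done programmatically too. As far as I know, the EKS cluster update documentation says an update may need up to five free IP addresses in the subnets specified at cluster creation; the sketch treats that threshold as an assumption and flags subnets below it. It reads the JSON form of the same `aws ec2 describe-subnets` call (`--output json` instead of `--output table`); the function name is hypothetical.

```python
import json
import sys

# Assumed minimum number of free IPs per subnet for a cluster update.
MIN_FREE_IPS = 5


def subnets_short_of_ips(describe_subnets_json: str,
                         minimum: int = MIN_FREE_IPS) -> list[str]:
    """SubnetIds whose AvailableIpAddressCount is below `minimum`."""
    data = json.loads(describe_subnets_json)
    return [
        subnet["SubnetId"]
        for subnet in data.get("Subnets", [])
        if subnet["AvailableIpAddressCount"] < minimum
    ]


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        print(subnets_short_of_ips(f.read()) or "enough free IPs")
```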

There seem to be plenty of free IP addresses, so next check the IAM role and add-ons.

[cloudshell-user@ip-10-134-1-45 ~]$ aws iam get-role --role-name ${ROLE_ARN##*/} \
>   --query 'Role.AssumeRolePolicyDocument'
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "eks.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:TagSession"
            ]
        }
    ]
}
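My understanding is that the cluster IAM role used with EKS Auto Mode must allow `eks.amazonaws.com` both `sts:AssumeRole` and `sts:TagSession`, which is what the trust policy above grants. A hedged sketch of a check for that: it walks the statements and reports required actions that no Allow statement grants to the EKS service principal. It does not expand wildcards such as `sts:*` or evaluate `Condition` blocks; the function names are hypothetical.

```python
import json
import sys

REQUIRED_ACTIONS = ("sts:AssumeRole", "sts:TagSession")


def _as_list(value):
    """IAM policy fields may hold a single value or a list of values."""
    return value if isinstance(value, list) else [value]


def missing_trust_actions(policy: dict, service: str = "eks.amazonaws.com",
                          required=REQUIRED_ACTIONS) -> list[str]:
    """Required actions that no Allow statement grants to `service`."""
    granted: set[str] = set()
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "*", not an explicit service principal
        if service not in _as_list(principal.get("Service", [])):
            continue
        granted.update(_as_list(statement.get("Action", [])))
    return [action for action in required if action not in granted]


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        print(missing_trust_actions(json.load(f)) or "trust policy OK")
```

Usage: add `--output json > trust.json` to the `aws iam get-role` command above, then run `python3 check_trust.py trust.json`.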

[cloudshell-user@ip-10-134-1-45 ~]$ aws eks list-addons --cluster-name $CLUSTER

{
    "addons": []
}

To see how the update affects applications, create a test Deployment named myapp.
myapp has a PDB (PodDisruptionBudget) that keeps at least 80% of its Pods available.
To observe availability, myapp runs 10 replicas.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp
spec:
  minAvailable: "80%"
  selector:
    matchLabels:
      app: myapp
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 10
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
        name: myapp
        resources:
          requests:
            cpu: "1"
            memory: 256M
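      # Note: these labelSelectors match app: host-zone-spread, not this Pod's
      # app: myapp label, so the spread constraints are effectively no-ops here.
      topologySpreadConstraints: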
      - labelSelector:
          matchLabels:
            app: host-zone-spread
        maxSkew: 2
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
      - labelSelector:
          matchLabels:
            app: host-zone-spread
        maxSkew: 2
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
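With `minAvailable: "80%"` and 10 replicas, Kubernetes rounds the percentage up to whole Pods, so at least 8 must stay available: at most 2 Pods can be evicted at a time while all 10 are healthy, and once only 8 are healthy the eviction API refuses further evictions until replacements become ready. The arithmetic as a small sketch (function names are hypothetical):

```python
def min_available_from_percent(percent: int, expected_pods: int) -> int:
    """A percentage minAvailable is rounded up to whole Pods."""
    return -(-expected_pods * percent // 100)  # integer ceiling division


def allowed_disruptions(percent: int, expected_pods: int, healthy_pods: int) -> int:
    """How many Pods a voluntary eviction (e.g. a node drain) may remove now."""
    return max(0, healthy_pods - min_available_from_percent(percent, expected_pods))


if __name__ == "__main__":
    print(min_available_from_percent(80, 10))  # 8
    print(allowed_disruptions(80, 10, 10))     # 2
```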

Deploy the PDB and Deployment above, and wait for the Pods to reach the Running state.

[cloudshell-user@ip-10-132-66-135 sample]$ kubectl apply -f pdb.yaml
poddisruptionbudget.policy/myapp created

[cloudshell-user@ip-10-132-66-135 sample]$ kubectl apply -f myapp.yaml
deployment.apps/myapp created

[cloudshell-user@ip-10-132-66-135 sample]$ kubectl get node
No resources found

[cloudshell-user@ip-10-132-66-135 sample]$ kubectl get node
No resources found

[cloudshell-user@ip-10-132-66-135 sample]$ kubectl get pod

NAME                    READY   STATUS    RESTARTS   AGE
myapp-5865fb57f-7rsqm   0/1     Pending   0          22s
myapp-5865fb57f-9d6k6   0/1     Pending   0          22s
myapp-5865fb57f-9g2js   0/1     Pending   0          22s
myapp-5865fb57f-bdsvj   0/1     Pending   0          22s
myapp-5865fb57f-h9rj9   0/1     Pending   0          22s
myapp-5865fb57f-hp9g5   0/1     Pending   0          22s
myapp-5865fb57f-r8dzc   0/1     Pending   0          22s
myapp-5865fb57f-rp7kx   0/1     Pending   0          22s
myapp-5865fb57f-sjwrm   0/1     Pending   0          22s
myapp-5865fb57f-x2sdq   0/1     Pending   0          22s

[cloudshell-user@ip-10-132-66-135 sample]$ kubectl get node
No resources found

[cloudshell-user@ip-10-132-66-135 sample]$ kubectl get node

NAME                  STATUS   ROLES    AGE   VERSION
i-44444444444444444   Ready    <none>   8s    v1.30.5-eks-baa6d11

[cloudshell-user@ip-10-132-66-135 sample]$ kubectl get pod

NAME                    READY   STATUS              RESTARTS   AGE
myapp-5865fb57f-7rsqm   0/1     ContainerCreating   0          38s
myapp-5865fb57f-9d6k6   0/1     ContainerCreating   0          38s
myapp-5865fb57f-9g2js   0/1     ContainerCreating   0          38s
myapp-5865fb57f-bdsvj   0/1     ContainerCreating   0          38s
myapp-5865fb57f-h9rj9   0/1     ContainerCreating   0          38s
myapp-5865fb57f-hp9g5   0/1     ContainerCreating   0          38s
myapp-5865fb57f-r8dzc   0/1     ContainerCreating   0          38s
myapp-5865fb57f-rp7kx   0/1     ContainerCreating   0          38s
myapp-5865fb57f-sjwrm   0/1     ContainerCreating   0          38s
myapp-5865fb57f-x2sdq   0/1     ContainerCreating   0          38s

[cloudshell-user@ip-10-132-66-135 sample]$ kubectl get node

NAME                  STATUS   ROLES    AGE   VERSION
i-44444444444444444   Ready    <none>   19s   v1.30.5-eks-baa6d11

[cloudshell-user@ip-10-132-66-135 sample]$ kubectl get pod

NAME                    READY   STATUS              RESTARTS   AGE
myapp-5865fb57f-7rsqm   1/1     Running             0          50s
myapp-5865fb57f-9d6k6   1/1     Running             0          50s
myapp-5865fb57f-9g2js   1/1     Running             0          50s
myapp-5865fb57f-bdsvj   1/1     Running             0          50s
myapp-5865fb57f-h9rj9   1/1     Running             0          50s
myapp-5865fb57f-hp9g5   0/1     ContainerCreating   0          50s
myapp-5865fb57f-r8dzc   1/1     Running             0          50s
myapp-5865fb57f-rp7kx   0/1     ContainerCreating   0          50s
myapp-5865fb57f-sjwrm   1/1     Running             0          50s
myapp-5865fb57f-x2sdq   1/1     Running             0          50s

Check the details of the worker node.

[cloudshell-user@ip-10-132-66-135 sample]$ kubectl describe node i-44444444444444444   

Name:               i-44444444444444444   
Roles:              <none>
Labels:             app.kubernetes.io/managed-by=eks
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=c5a.4xlarge
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/compute-type=auto
                    eks.amazonaws.com/instance-category=c
                    eks.amazonaws.com/instance-cpu=16
                    eks.amazonaws.com/instance-cpu-manufacturer=amd
                    eks.amazonaws.com/instance-cpu-sustained-clock-speed-mhz=3300
                    eks.amazonaws.com/instance-ebs-bandwidth=3170
                    eks.amazonaws.com/instance-encryption-in-transit-supported=true
                    eks.amazonaws.com/instance-family=c5a
                    eks.amazonaws.com/instance-generation=5
                    eks.amazonaws.com/instance-hypervisor=nitro
                    eks.amazonaws.com/instance-memory=32768
                    eks.amazonaws.com/instance-network-bandwidth=5000
                    eks.amazonaws.com/instance-size=4xlarge
                    eks.amazonaws.com/nodeclass=default
                    failure-domain.beta.kubernetes.io/region=ap-northeast-1
                    failure-domain.beta.kubernetes.io/zone=ap-northeast-1a
                    k8s.io/cloud-provider-aws=8929092e801005c613d919fa59bb4a6a
                    karpenter.sh/capacity-type=on-demand
                    karpenter.sh/initialized=true
                    karpenter.sh/nodepool=general-purpose
                    karpenter.sh/registered=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=i-44444444444444444   
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=c5a.4xlarge
                    topology.ebs.csi.eks.amazonaws.com/zone=ap-northeast-1a
                    topology.k8s.aws/zone-id=apne1-az4
                    topology.kubernetes.io/region=ap-northeast-1
                    topology.kubernetes.io/zone=ap-northeast-1a
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.0.142.77
                    csi.volume.kubernetes.io/nodeid: {"ebs.csi.eks.amazonaws.com":"i-44444444444444444"}
                    eks.amazonaws.com/nodeclass-hash: 3809868827326149578
                    eks.amazonaws.com/nodeclass-hash-version: v1
                    karpenter.sh/nodepool-hash: 4012513481623584108
                    karpenter.sh/nodepool-hash-version: v3
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 12 Dec 2024 06:51:22 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  i-44444444444444444   
  AcquireTime:     <unset>
  RenewTime:       Thu, 12 Dec 2024 07:01:02 +0000
Conditions:
  Type                    Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                    ------  -----------------                 ------------------                ------                       -------
  MemoryPressure          False   Thu, 12 Dec 2024 06:56:57 +0000   Thu, 12 Dec 2024 06:51:21 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure            False   Thu, 12 Dec 2024 06:56:57 +0000   Thu, 12 Dec 2024 06:51:21 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure             False   Thu, 12 Dec 2024 06:56:57 +0000   Thu, 12 Dec 2024 06:51:21 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                   True    Thu, 12 Dec 2024 06:56:57 +0000   Thu, 12 Dec 2024 06:51:21 +0000   KubeletReady                 kubelet is posting ready status
  ContainerRuntimeReady   True    Thu, 12 Dec 2024 06:56:27 +0000   Thu, 12 Dec 2024 06:51:27 +0000   ContainerRuntimeIsReady      Monitoring for the ContainerRuntime system is active
  StorageReady            True    Thu, 12 Dec 2024 06:56:27 +0000   Thu, 12 Dec 2024 06:51:27 +0000   DiskIsReady                  Monitoring for the Disk system is active
  NetworkingReady         True    Thu, 12 Dec 2024 06:56:27 +0000   Thu, 12 Dec 2024 06:51:27 +0000   NetworkingIsReady            Monitoring for the Networking system is active
  KernelReady             True    Thu, 12 Dec 2024 06:56:27 +0000   Thu, 12 Dec 2024 06:51:27 +0000   KernelIsReady                Monitoring for the Kernel system is active
Addresses:
  InternalIP:   10.0.142.77
  InternalDNS:  ip-10-0-142-77.ap-northeast-1.compute.internal
  Hostname:     ip-10-0-142-77.ap-northeast-1.compute.internal
Capacity:
  cpu:                16
  ephemeral-storage:  83781432Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32455708Ki
  pods:               110
Allocatable:
  cpu:                15890m
  ephemeral-storage:  76139225780
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             30853148Ki
  pods:               110
System Info:
  Machine ID:                 ec2b1408e63c0d220f96abacf8631237
  System UUID:                ec2b1408-e63c-0d22-0f96-abacf8631237
  Boot ID:                    6c78faf7-2a3d-44f7-b54b-4f0ba5fa7d47
  Kernel Version:             6.1.115
  OS Image:                   Bottlerocket (EKS Auto) 2024.12.8 (aws-k8s-1.30)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.22+bottlerocket
  Kubelet Version:            v1.30.5-eks-baa6d11
  Kube-Proxy Version:         v1.30.5-eks-baa6d11
ProviderID:                   aws:///ap-northeast-1a/i-44444444444444444   
Non-terminated Pods:          (10 in total)
  Namespace                   Name                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                     ------------  ----------  ---------------  -------------  ---
  default                     myapp-5865fb57f-7rsqm    1 (6%)        0 (0%)      256M (0%)        0 (0%)         10m
  default                     myapp-5865fb57f-9d6k6    1 (6%)        0 (0%)      256M (0%)        0 (0%)         10m
  default                     myapp-5865fb57f-9g2js    1 (6%)        0 (0%)      256M (0%)        0 (0%)         10m
  default                     myapp-5865fb57f-bdsvj    1 (6%)        0 (0%)      256M (0%)        0 (0%)         10m
  default                     myapp-5865fb57f-h9rj9    1 (6%)        0 (0%)      256M (0%)        0 (0%)         10m
  default                     myapp-5865fb57f-hp9g5    1 (6%)        0 (0%)      256M (0%)        0 (0%)         10m
  default                     myapp-5865fb57f-r8dzc    1 (6%)        0 (0%)      256M (0%)        0 (0%)         10m
  default                     myapp-5865fb57f-rp7kx    1 (6%)        0 (0%)      256M (0%)        0 (0%)         10m
  default                     myapp-5865fb57f-sjwrm    1 (6%)        0 (0%)      256M (0%)        0 (0%)         10m
  default                     myapp-5865fb57f-x2sdq    1 (6%)        0 (0%)      256M (0%)        0 (0%)         10m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                10 (62%)    0 (0%)
  memory             2560M (8%)  0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                   Age                    From                   Message
  ----     ------                   ----                   ----                   -------
  Normal   Starting                 9m46s                  kube-proxy             
  Normal   NodeHasSufficientPID     9m48s (x2 over 9m48s)  kubelet                Node i-44444444444444444 status is now: NodeHasSufficientPID
  Normal   Starting                 9m48s                  kubelet                Starting kubelet.
  Warning  InvalidDiskCapacity      9m48s                  kubelet                invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  9m48s (x2 over 9m48s)  kubelet                Node i-44444444444444444 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    9m48s (x2 over 9m48s)  kubelet                Node i-44444444444444444 status is now: NodeHasNoDiskPressure
  Normal   NodeAllocatableEnforced  9m48s                  kubelet                Updated Node Allocatable limit across pods
  Normal   NodeReady                9m48s                  kubelet                Node i-44444444444444444 status is now: NodeReady
  Normal   Ready                    9m47s                  karpenter              Status condition transitioned, Type: Ready, Status: False -> True, Reason: KubeletReady, Message: kubelet is posting ready status
  Normal   RegisteredNode           9m46s                  node-controller        Node i-44444444444444444 event: Registered Node i-44444444444444444 in Controller
  Normal   Synced                   9m46s                  cloud-node-controller  Node synced successfully
  Normal   DisruptionBlocked        9m43s                  karpenter              Cannot disrupt Node: state node is nominated for a pending pod
  Normal   Unconsolidatable         9m3s                   karpenter              Can't replace with a cheaper node
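The Allocated resources figures follow directly from the Pod requests and the Allocatable section. A quick check, assuming kubectl truncates rather than rounds these percentages (which matches the numbers shown) and that `256M` means the decimal 256 × 10^6 bytes:

```python
def percent_of(requested: int, allocatable: int) -> int:
    """Integer percentage, truncated like the figures in the output above."""
    return requested * 100 // allocatable


# CPU in millicores: 10 Pods x 1 CPU requested vs. 15890m allocatable.
cpu_pct = percent_of(10 * 1000, 15890)
# Memory in bytes: 10 x 256M (decimal) requested vs. 30853148Ki allocatable.
mem_pct = percent_of(10 * 256 * 10**6, 30853148 * 1024)
# Additional 1-CPU Pods that would still fit on this node by CPU request alone.
spare_cpu_pods = (15890 - 10 * 1000) // 1000

print(cpu_pct, mem_pct, spare_cpu_pods)  # 62 8 5
```

The same function reproduces the per-Pod column: one 1-CPU request is `percent_of(1000, 15890)`, i.e. 6%.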

Now, let's finally run the upgrade from the console.
Upgrade the Kubernetes version from 1.30 to 1.31.
As soon as the control plane upgrade completes, the worker node upgrade begins.

Let's watch the upgrade from the worker node side.

[cloudshell-user@ip-10-132-78-22 ~]$ kubectl get nodes --watch

NAME                  STATUS   ROLES    AGE   VERSION
i-44444444444444444   Ready    <none>   96m   v1.30.5-eks-baa6d11
i-55555555555555555   NotReady   <none>   0s    v1.31.1-eks-1b3e656
i-55555555555555555   NotReady   <none>   0s    v1.31.1-eks-1b3e656
i-55555555555555555   NotReady   <none>   0s    v1.31.1-eks-1b3e656
i-55555555555555555   Ready      <none>   0s    v1.31.1-eks-1b3e656
i-55555555555555555   Ready      <none>   0s    v1.31.1-eks-1b3e656
i-55555555555555555   Ready      <none>   0s    v1.31.1-eks-1b3e656
i-55555555555555555   Ready      <none>   0s    v1.31.1-eks-1b3e656
i-55555555555555555   Ready      <none>   0s    v1.31.1-eks-1b3e656
i-55555555555555555   Ready      <none>   1s    v1.31.1-eks-1b3e656
i-55555555555555555   Ready      <none>   1s    v1.31.1-eks-1b3e656
i-55555555555555555   Ready      <none>   4s    v1.31.1-eks-1b3e656
i-44444444444444444   Ready      <none>   96m   v1.30.5-eks-baa6d11
i-44444444444444444   Ready      <none>   96m   v1.30.5-eks-baa6d11

In addition to the original worker node i-44444444444444444, a second worker node, i-55555555555555555, is created, and in the end only i-55555555555555555 remains.

[cloudshell-user@ip-10-132-78-22 ~]$ kubectl get nodes --watch

NAME                  STATUS   ROLES    AGE     VERSION
i-55555555555555555   Ready    <none>   9m40s   v1.31.1-eks-1b3e656
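After the upgrade, one way to confirm that no node is left on the old minor version is to read `status.nodeInfo.kubeletVersion` from `kubectl get nodes -o json`. This sketch assumes the kubelet version string starts with `vMAJOR.MINOR`, as in `v1.31.1-eks-1b3e656`; the function name and file handling are hypothetical.

```python
import json
import sys


def nodes_not_on(minor: str, nodes_json: str) -> dict[str, str]:
    """Map node name -> kubelet version for nodes not on `minor` (e.g. "v1.31")."""
    stale = {}
    for node in json.loads(nodes_json).get("items", []):
        name = node["metadata"]["name"]
        version = node["status"]["nodeInfo"]["kubeletVersion"]
        # "v1.31.1-eks-1b3e656" -> "v1.31"
        if ".".join(version.split(".")[:2]) != minor:
            stale[name] = version
    return stale


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[2]) as f:
        print(nodes_not_on(sys.argv[1], f.read()) or "all nodes upgraded")
```

Usage: `kubectl get nodes -o json > nodes.json`, then `python3 check_versions.py v1.31 nodes.json`.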

Similarly, let's watch the upgrade from the Pod side.

[cloudshell-user@ip-10-132-78-22 ~]$ kubectl get pod --watch

NAME                    READY   STATUS    RESTARTS   AGE
myapp-5865fb57f-7jfln   1/1     Running   0          11s
myapp-5865fb57f-7rsqm   1/1     Running   0          97m
myapp-5865fb57f-9g2js   1/1     Running   0          97m
myapp-5865fb57f-btt2j   1/1     Running   0          30s
myapp-5865fb57f-d92s5   1/1     Running   0          1s
myapp-5865fb57f-gt4l7   1/1     Running   0          13s
myapp-5865fb57f-pnn8l   1/1     Running   0          12s
myapp-5865fb57f-pvd6r   1/1     Running   0          29s
myapp-5865fb57f-vpwzq   1/1     Running   0          1s
myapp-5865fb57f-x2sdq   1/1     Running   0          97m
myapp-5865fb57f-x2sdq   1/1     Running   0          97m
myapp-5865fb57f-x2sdq   1/1     Terminating   0          97m
myapp-5865fb57f-x2sdq   1/1     Terminating   0          97m
myapp-5865fb57f-s68bm   0/1     Pending       0          0s
myapp-5865fb57f-s68bm   0/1     Pending       0          0s
myapp-5865fb57f-s68bm   0/1     ContainerCreating   0          0s
myapp-5865fb57f-9g2js   1/1     Running             0          97m
myapp-5865fb57f-9g2js   1/1     Terminating         0          97m
myapp-5865fb57f-9g2js   1/1     Terminating         0          97m
myapp-5865fb57f-ljmnx   0/1     Pending             0          0s
myapp-5865fb57f-ljmnx   0/1     Pending             0          0s
myapp-5865fb57f-x2sdq   0/1     Completed           0          97m
myapp-5865fb57f-9g2js   0/1     Completed           0          97m
myapp-5865fb57f-ljmnx   0/1     ContainerCreating   0          1s
myapp-5865fb57f-s68bm   1/1     Running             0          1s
myapp-5865fb57f-x2sdq   0/1     Completed           0          97m
myapp-5865fb57f-x2sdq   0/1     Completed           0          97m
myapp-5865fb57f-9g2js   0/1     Completed           0          97m
myapp-5865fb57f-9g2js   0/1     Completed           0          97m
myapp-5865fb57f-ljmnx   1/1     Running             0          2s
myapp-5865fb57f-7rsqm   1/1     Running             0          97m
myapp-5865fb57f-7rsqm   1/1     Terminating         0          97m
myapp-5865fb57f-2rg5j   0/1     Pending             0          0s
myapp-5865fb57f-7rsqm   1/1     Terminating         0          97m
myapp-5865fb57f-2rg5j   0/1     Pending             0          0s
myapp-5865fb57f-2rg5j   0/1     ContainerCreating   0          0s
myapp-5865fb57f-7rsqm   0/1     Completed           0          97m
myapp-5865fb57f-7rsqm   0/1     Completed           0          97m
myapp-5865fb57f-7rsqm   0/1     Completed           0          97m
myapp-5865fb57f-2rg5j   1/1     Running             0          1s
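The Pod watch output can also be replayed to check availability mechanically. The sketch below counts, after each line, how many Pods are Running with all containers ready (Terminating and Completed Pods count as unavailable), and reports the lowest count once the initial listing has been read. It assumes the default `kubectl get pod --watch` columns; the function names are hypothetical. Tracing the lines above by hand this way, the lowest count is 8 of 10, which matches the PDB's `minAvailable` of 80%.

```python
import sys


def ready_counts(watch_output: str) -> list[int]:
    """After each watch line, how many Pods are Running and fully ready."""
    ready: set[str] = set()
    counts: list[int] = []
    for line in watch_output.splitlines():
        fields = line.split()
        # Skip the header, blank lines, and anything without a READY column.
        if len(fields) < 3 or fields[0] == "NAME" or "/" not in fields[1]:
            continue
        name, ready_col, status = fields[0], fields[1], fields[2]
        up, total = ready_col.split("/")
        if status == "Running" and up == total:
            ready.add(name)
        else:
            # Pending, ContainerCreating, Terminating, Completed: not available.
            ready.discard(name)
        counts.append(len(ready))
    return counts


def min_ready_after_snapshot(watch_output: str, snapshot_size: int) -> int:
    """Lowest ready count once the first `snapshot_size` rows have been read."""
    return min(ready_counts(watch_output)[snapshot_size - 1:])


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        print(min_ready_after_snapshot(f.read(), int(sys.argv[2])))
```

Usage: `kubectl get pod --watch | tee watch.log` during the upgrade, then `python3 min_ready.py watch.log 10`.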

Summarizing the watch history above, the following ten recently started replicas are running, so availability is maintained.

myapp-5865fb57f-7jfln   1/1     Running   0          11s
myapp-5865fb57f-btt2j   1/1     Running   0          30s
myapp-5865fb57f-d92s5   1/1     Running   0          1s
myapp-5865fb57f-gt4l7   1/1     Running   0          13s
myapp-5865fb57f-pnn8l   1/1     Running   0          12s
myapp-5865fb57f-pvd6r   1/1     Running   0          29s
myapp-5865fb57f-vpwzq   1/1     Running   0          1s
myapp-5865fb57f-s68bm   1/1     Running   0          1s
myapp-5865fb57f-ljmnx   1/1     Running   0          2s
myapp-5865fb57f-2rg5j   1/1     Running   0          1s
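A summary like this can be double-checked by replaying the watch output and keeping only the last event reported for each Pod. The following is a minimal sketch, assuming the lines follow the `kubectl get pods -w` columns (NAME, READY, STATUS, RESTARTS, AGE); `final_status` and `running_pods` are hypothetical helper names, not part of kubectl.

```python
# Excerpt of the `kubectl get pods -w` output above (repeated lines omitted).
HISTORY = """\
myapp-5865fb57f-9g2js   1/1     Running             0          97m
myapp-5865fb57f-9g2js   1/1     Terminating         0          97m
myapp-5865fb57f-ljmnx   0/1     Pending             0          0s
myapp-5865fb57f-x2sdq   0/1     Completed           0          97m
myapp-5865fb57f-9g2js   0/1     Completed           0          97m
myapp-5865fb57f-ljmnx   0/1     ContainerCreating   0          1s
myapp-5865fb57f-s68bm   1/1     Running             0          1s
myapp-5865fb57f-ljmnx   1/1     Running             0          2s
myapp-5865fb57f-7rsqm   1/1     Running             0          97m
myapp-5865fb57f-7rsqm   1/1     Terminating         0          97m
myapp-5865fb57f-2rg5j   0/1     Pending             0          0s
myapp-5865fb57f-2rg5j   0/1     ContainerCreating   0          0s
myapp-5865fb57f-7rsqm   0/1     Completed           0          97m
myapp-5865fb57f-2rg5j   1/1     Running             0          1s
""".splitlines()


def final_status(watch_lines):
    """Map each Pod name to its last reported (READY, STATUS) pair."""
    latest = {}
    for line in watch_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        name, ready, status = fields[0], fields[1], fields[2]
        latest[name] = (ready, status)  # later events overwrite earlier ones
    return latest


def running_pods(watch_lines):
    """Sorted names of Pods whose last event is Running with all containers ready."""
    names = []
    for name, (ready, status) in final_status(watch_lines).items():
        ready_now, total = ready.split("/")
        if status == "Running" and ready_now == total:
            names.append(name)
    return sorted(names)


print(running_pods(HISTORY))
# ['myapp-5865fb57f-2rg5j', 'myapp-5865fb57f-ljmnx', 'myapp-5865fb57f-s68bm']
```

The three names match the last three entries in the list above, while the Pods whose final state is Completed (9g2js, x2sdq, 7rsqm) are the 97-minute-old Pods that were terminated.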

As the output below shows, these newly created Pods are running on the second worker node.
This node's Kubernetes version is also 1.31.

[cloudshell-user@ip-10-132-78-22 ~]$ kubectl describe node i-55555555555555555 

Name:               i-55555555555555555 
Roles:              <none>
Labels:             app.kubernetes.io/managed-by=eks
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=c5a.4xlarge
                    beta.kubernetes.io/os=linux
                    eks.amazonaws.com/compute-type=auto
                    eks.amazonaws.com/instance-category=c
                    eks.amazonaws.com/instance-cpu=16
                    eks.amazonaws.com/instance-cpu-manufacturer=amd
                    eks.amazonaws.com/instance-cpu-sustained-clock-speed-mhz=3300
                    eks.amazonaws.com/instance-ebs-bandwidth=3170
                    eks.amazonaws.com/instance-encryption-in-transit-supported=true
                    eks.amazonaws.com/instance-family=c5a
                    eks.amazonaws.com/instance-generation=5
                    eks.amazonaws.com/instance-hypervisor=nitro
                    eks.amazonaws.com/instance-memory=32768
                    eks.amazonaws.com/instance-network-bandwidth=5000
                    eks.amazonaws.com/instance-size=4xlarge
                    eks.amazonaws.com/nodeclass=default
                    failure-domain.beta.kubernetes.io/region=ap-northeast-1
                    failure-domain.beta.kubernetes.io/zone=ap-northeast-1a
                    k8s.io/cloud-provider-aws=8929092e801005c613d919fa59bb4a6a
                    karpenter.sh/capacity-type=on-demand
                    karpenter.sh/initialized=true
                    karpenter.sh/nodepool=general-purpose
                    karpenter.sh/registered=true
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=i-55555555555555555 
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=c5a.4xlarge
                    topology.ebs.csi.eks.amazonaws.com/zone=ap-northeast-1a
                    topology.k8s.aws/zone-id=apne1-az4
                    topology.kubernetes.io/region=ap-northeast-1
                    topology.kubernetes.io/zone=ap-northeast-1a
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.0.140.203
                    csi.volume.kubernetes.io/nodeid: {"ebs.csi.eks.amazonaws.com":"i-55555555555555555"}
                    eks.amazonaws.com/nodeclass-hash: 3809868827326149578
                    eks.amazonaws.com/nodeclass-hash-version: v1
                    karpenter.sh/nodepool-hash: 4012513481623584108
                    karpenter.sh/nodepool-hash-version: v3
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 12 Dec 2024 08:27:34 +0000
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  i-55555555555555555 
  AcquireTime:     <unset>
  RenewTime:       Thu, 12 Dec 2024 08:44:03 +0000
Conditions:
  Type                    Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                    ------  -----------------                 ------------------                ------                       -------
  MemoryPressure          False   Thu, 12 Dec 2024 08:43:23 +0000   Thu, 12 Dec 2024 08:27:32 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure            False   Thu, 12 Dec 2024 08:43:23 +0000   Thu, 12 Dec 2024 08:27:32 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure             False   Thu, 12 Dec 2024 08:43:23 +0000   Thu, 12 Dec 2024 08:27:32 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                   True    Thu, 12 Dec 2024 08:43:23 +0000   Thu, 12 Dec 2024 08:27:33 +0000   KubeletReady                 kubelet is posting ready status
  NetworkingReady         True    Thu, 12 Dec 2024 08:42:38 +0000   Thu, 12 Dec 2024 08:27:38 +0000   NetworkingIsReady            Monitoring for the Networking system is active
  KernelReady             True    Thu, 12 Dec 2024 08:42:38 +0000   Thu, 12 Dec 2024 08:27:38 +0000   KernelIsReady                Monitoring for the Kernel system is active
  ContainerRuntimeReady   True    Thu, 12 Dec 2024 08:42:38 +0000   Thu, 12 Dec 2024 08:27:38 +0000   ContainerRuntimeIsReady      Monitoring for the ContainerRuntime system is active
  StorageReady            True    Thu, 12 Dec 2024 08:42:38 +0000   Thu, 12 Dec 2024 08:27:38 +0000   DiskIsReady                  Monitoring for the Disk system is active
Addresses:
  InternalIP:   10.0.140.203
  InternalDNS:  ip-10-0-140-203.ap-northeast-1.compute.internal
  Hostname:     ip-10-0-140-203.ap-northeast-1.compute.internal
Capacity:
  cpu:                16
  ephemeral-storage:  83781432Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32455708Ki
  pods:               110
Allocatable:
  cpu:                15890m
  ephemeral-storage:  76139225780
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             30853148Ki
  pods:               110
System Info:
  Machine ID:                 ec24ab668e09345f5f4b1c62388eff05
  System UUID:                ec24ab66-8e09-345f-5f4b-1c62388eff05
  Boot ID:                    97d92ddb-bd09-4938-802f-0eb5f57a3042
  Kernel Version:             6.1.115
  OS Image:                   Bottlerocket (EKS Auto) 2024.12.8 (aws-k8s-1.31)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.22+bottlerocket
  Kubelet Version:            v1.31.1-eks-1b3e656
  Kube-Proxy Version:         v1.31.1-eks-1b3e656
ProviderID:                   aws:///ap-northeast-1a/i-55555555555555555 
Non-terminated Pods:          (10 in total)
  Namespace                   Name                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                     ------------  ----------  ---------------  -------------  ---
  default                     myapp-5865fb57f-2rg5j    1 (6%)        0 (0%)      256M (0%)        0 (0%)         15m
  default                     myapp-5865fb57f-7jfln    1 (6%)        0 (0%)      256M (0%)        0 (0%)         16m
  default                     myapp-5865fb57f-btt2j    1 (6%)        0 (0%)      256M (0%)        0 (0%)         16m
  default                     myapp-5865fb57f-d92s5    1 (6%)        0 (0%)      256M (0%)        0 (0%)         15m
  default                     myapp-5865fb57f-gt4l7    1 (6%)        0 (0%)      256M (0%)        0 (0%)         16m
  default                     myapp-5865fb57f-ljmnx    1 (6%)        0 (0%)      256M (0%)        0 (0%)         15m
  default                     myapp-5865fb57f-pnn8l    1 (6%)        0 (0%)      256M (0%)        0 (0%)         16m
  default                     myapp-5865fb57f-pvd6r    1 (6%)        0 (0%)      256M (0%)        0 (0%)         16m
  default                     myapp-5865fb57f-s68bm    1 (6%)        0 (0%)      256M (0%)        0 (0%)         15m
  default                     myapp-5865fb57f-vpwzq    1 (6%)        0 (0%)      256M (0%)        0 (0%)         15m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                10 (62%)    0 (0%)
  memory             2560M (8%)  0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-1Gi      0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                   Age                From                   Message
  ----     ------                   ----               ----                   -------
  Normal   Starting                 16m                kube-proxy             
  Normal   NodeHasSufficientPID     16m (x2 over 16m)  kubelet                Node i-55555555555555555 status is now: NodeHasSufficientPID
  Normal   Starting                 16m                kubelet                Starting kubelet.
  Warning  InvalidDiskCapacity      16m                kubelet                invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  16m (x2 over 16m)  kubelet                Node i-55555555555555555 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    16m (x2 over 16m)  kubelet                Node i-55555555555555555 status is now: NodeHasNoDiskPressure
  Normal   NodeAllocatableEnforced  16m                kubelet                Updated Node Allocatable limit across pods
  Normal   NodeReady                16m                kubelet                Node i-55555555555555555 status is now: NodeReady
  Normal   Ready                    16m                karpenter              Status condition transitioned, Type: Ready, Status: False -> True, Reason: KubeletReady, Message: kubelet is posting ready status
  Normal   Synced                   16m                cloud-node-controller  Node synced successfully
  Normal   RegisteredNode           16m                node-controller        Node i-55555555555555555 event: Registered Node i-55555555555555555 in Controller
  Normal   DisruptionBlocked        16m                karpenter              Cannot disrupt Node: state node is nominated for a pending pod
  Normal   Unconsolidatable         14m                karpenter              Can't replace with a cheaper node
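The percentages in the Pod list and in `Allocated resources` can be reproduced from the `Allocatable` section: each request is divided by the node's allocatable amount, and the result appears to be truncated to a whole percent (which is why a 256M request is shown as 0%). The following is a minimal sketch, assuming that truncation; note that `M` is a decimal unit (10^6 bytes) while `Ki` is binary (1024 bytes). The helper name is hypothetical.

```python
def percent_of_allocatable(request, allocatable):
    """Whole percent of allocatable, truncated (integer floor for non-negative values)."""
    return request * 100 // allocatable


ALLOC_CPU_MILLI = 15890            # Allocatable cpu: 15890m
ALLOC_MEM_BYTES = 30853148 * 1024  # Allocatable memory: 30853148Ki
POD_CPU_MILLI = 1000               # CPU request per Pod: 1
POD_MEM_BYTES = 256 * 10**6        # Memory request per Pod: 256M
PODS = 10                          # Non-terminated Pods: 10

print(percent_of_allocatable(POD_CPU_MILLI, ALLOC_CPU_MILLI))         # 6
print(percent_of_allocatable(POD_MEM_BYTES, ALLOC_MEM_BYTES))         # 0
print(percent_of_allocatable(PODS * POD_CPU_MILLI, ALLOC_CPU_MILLI))  # 62
print(percent_of_allocatable(PODS * POD_MEM_BYTES, ALLOC_MEM_BYTES))  # 8

# Additional 1-CPU Pods that would still fit, judging by CPU requests alone
print((ALLOC_CPU_MILLI - PODS * POD_CPU_MILLI) // POD_CPU_MILLI)      # 5
```

CPU is the tighter resource here: about 5.9 of the 15.89 allocatable CPUs remain, whereas memory is only about 8% requested.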

The worker node has been upgraded to version 1.31 (Kubelet Version v1.31.1, OS image aws-k8s-1.31).
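When checking the result of an upgrade in a script, it can help to compare the `Kubelet Version` field numerically rather than as a string (`kubectl get nodes` also shows it in the VERSION column). The following is a minimal sketch, assuming the `kubectl describe node` output has already been captured as text; `kubelet_version` is a hypothetical helper name.

```python
import re


def kubelet_version(describe_output):
    """Return (major, minor, patch) from the 'Kubelet Version:' line, or None if absent."""
    m = re.search(r"Kubelet Version:\s+v(\d+)\.(\d+)\.(\d+)", describe_output)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())


sample = "  Kubelet Version:            v1.31.1-eks-1b3e656\n"
version = kubelet_version(sample)
print(version)                 # (1, 31, 1)
print(version[:2] >= (1, 31))  # True
```

Comparing tuples avoids the pitfall of string comparison, where "1.9" would sort after "1.31".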

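The worker node upgrade behavior can be summarized as follows:
- A worker node running the target version is added
- Running Pods are terminated and started again on the newly added worker node
- Throughout this process, the availability specified by the PDB is maintained
- After all Pods have been restarted, the original (old-version) worker node is deleted
This confirms that worker nodes can be upgraded automatically while application availability is maintained.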
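The sequence above can be modeled as a small simulation: the replacement node is added first, Pods on the old node are evicted only while the number of healthy Pods would stay at or above the PDB's `minAvailable`, and the old node is removed once every Pod is running on the new node. This is a simplified sketch, not how EKS Auto Mode or Karpenter is implemented internally; the `min_available` values and the one-tick startup time are hypothetical assumptions, since the actual PDB settings are not shown in this section.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    node: str
    ready_in: int = 0  # ticks until the Pod is Ready again (0 = Ready now)


def simulate_upgrade(replicas, min_available, startup_ticks=1):
    """Return (lowest healthy count, ticks, remaining nodes), or None if the PDB blocks the drain."""
    pods = [Pod(f"pod-{i}", "old") for i in range(replicas)]
    nodes = {"old", "new"}  # step 1: a node running the new version is added first
    min_healthy, tick = replicas, 0
    while any(p.node == "old" or p.ready_in > 0 for p in pods):
        tick += 1
        for p in pods:  # replacement Pods finish starting up
            if p.ready_in > 0:
                p.ready_in -= 1
        healthy = sum(1 for p in pods if p.ready_in == 0)
        evicted = 0
        for p in pods:  # steps 2-3: evict only while the PDB still allows it
            if p.node == "old" and healthy - 1 >= min_available:
                p.node, p.ready_in = "new", startup_ticks  # recreated on the new node
                healthy -= 1
                evicted += 1
        min_healthy = min(min_healthy, healthy)
        starting = any(p.ready_in > 0 for p in pods)
        if evicted == 0 and not starting and any(p.node == "old" for p in pods):
            return None  # blocked: the PDB never permits an eviction
    nodes.discard("old")  # step 4: the drained old node is removed
    return min_healthy, tick, nodes


print(simulate_upgrade(10, 8))   # (8, 6, {'new'})
print(simulate_upgrade(10, 10))  # None
```

With 10 replicas and a hypothetical `minAvailable` of 8, at most two Pods are restarting at any moment, so healthy Pods never drop below 8. A budget of 10 allows no evictions at all, which corresponds to the blocking-PDB case where manual intervention is needed.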