[AEWS] Week 5: Karpenter Hands-on (3)

These are my notes from the 2025 AWS EKS Hands-on Study group.

 

Karpenter Hands-on

How Karpenter works

Karpenter is a dynamic node autoscaling solution for Kubernetes.
It automatically adds and removes EC2 instances in a Kubernetes cluster to match workload requirements, and it manages nodes faster and more efficiently than the Kubernetes Cluster Autoscaler.

 

Karpenter's core components are as follows.

  1. Provisioner (NodePool)
    • Defines the policy that determines which nodes Karpenter should create.
    • Selects and launches the most suitable EC2 instance based on the cluster's workload requirements.
    • Can restrict provisioning, for example to on-demand nodes in a specific AZ or to specific instance types.
  2. Controller
    • Watches the Kubernetes API server and detects when new nodes are needed or existing nodes should be removed.
    • Creates a NodeClaim to request an EC2 instance, and cleans the node up once it is no longer needed.
    • Detects pod scheduling requests in real time and provisions new nodes dynamically.
  3. Webhook
    • Communicates with the API server to validate and adjust the resources that Karpenter creates.
    • Uses a MutatingWebhook and a ValidatingWebhook to manage NodePool and NodeClaim objects.
    • Filters out invalid requests before they are admitted.

 

Scale-up test

cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: "KarpenterNodeRole-${CLUSTER_NAME}" # replace with your cluster name
  amiSelectorTerms:
    - alias: "al2023@${ALIAS_VERSION}" # ex) al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
EOF

 

The Karpenter lab environment is set up with on-demand instances, using the NodePool and EC2NodeClass above.
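To make the `requirements` block of the NodePool more concrete, here is a minimal Python sketch of how the `In` and `Gt` operators filter candidate instance types. It is an illustration only, not Karpenter's implementation: the `matches` function and the example label sets are hypothetical, and only the two operators used in the manifest are handled.

```python
# Requirements copied from the NodePool manifest above.
REQUIREMENTS = [
    {"key": "kubernetes.io/arch", "operator": "In", "values": ["amd64"]},
    {"key": "kubernetes.io/os", "operator": "In", "values": ["linux"]},
    {"key": "karpenter.sh/capacity-type", "operator": "In", "values": ["on-demand"]},
    {"key": "karpenter.k8s.aws/instance-category", "operator": "In", "values": ["c", "m", "r"]},
    {"key": "karpenter.k8s.aws/instance-generation", "operator": "Gt", "values": ["2"]},
]


def matches(requirements, labels):
    """Return True if the labels satisfy every requirement (In and Gt only)."""
    for req in requirements:
        value = labels.get(req["key"])
        if value is None:
            # For In and Gt, a missing label never matches.
            return False
        if req["operator"] == "In":
            ok = value in req["values"]
        elif req["operator"] == "Gt":
            ok = int(value) > int(req["values"][0])
        else:
            raise ValueError(f"operator not handled in this sketch: {req['operator']}")
        if not ok:
            return False
    return True


def labels_for(arch, category, generation, capacity="on-demand"):
    """Build a hypothetical label set for one instance offering."""
    return {
        "kubernetes.io/arch": arch,
        "kubernetes.io/os": "linux",
        "karpenter.sh/capacity-type": capacity,
        "karpenter.k8s.aws/instance-category": category,
        "karpenter.k8s.aws/instance-generation": generation,
    }


# Example offerings (label values written by hand for illustration).
CANDIDATES = {
    "c5a.2xlarge": labels_for("amd64", "c", "5"),
    "t3.medium": labels_for("amd64", "t", "3"),    # category "t" is not allowed
    "m1.small": labels_for("amd64", "m", "1"),     # generation 1 is not > 2
    "c6g.2xlarge": labels_for("arm64", "c", "6"),  # arm64 is not allowed
}

for name, labels in CANDIDATES.items():
    print(name, matches(REQUIREMENTS, labels))
```

With this NodePool, only instance types that pass every requirement remain candidates, which is consistent with the c5a.2xlarge node that appears later in this test.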

 

kubectl get nodepool,ec2nodeclass,nodeclaims
NAME                            NODECLASS   NODES   READY   AGE
nodepool.karpenter.sh/default   default     0       True    4m3s

NAME                                     READY   AGE
ec2nodeclass.karpenter.k8s.aws/default   True    4m3s

 

 

At first the 5 pods are in the Pending state, but as time passes a node is provisioned and the pods are scheduled onto it.

 

Created worker node

kubectl get nodeclaims
NAME            TYPE          CAPACITY    ZONE              NODE                                                 READY   AGE
default-hw4wj   c5a.2xlarge   on-demand   ap-northeast-2b   ip-192-168-105-117.ap-northeast-2.compute.internal   True    8m44s

 

 

Karpenter log analysis

 

Looking at the part where Karpenter makes its API request, it selects lowest-price when creating the worker node.

{
  "level": "INFO",
  "time": "2025-03-07T14:43:55.112Z",
  "logger": "controller.controller-runtime.metrics",
  "message": "Starting metrics server",
  "commit": "058c665"
}
{
  "level": "INFO",
  "time": "2025-03-07T14:43:55.112Z",
  "logger": "controller.controller-runtime.metrics",
  "message": "Serving metrics server",
  "commit": "058c665",
  "bindAddress": ":8080",
  "secure": false
}

 

 

When the Karpenter controller starts, its metrics server starts as well.

{
  "level": "INFO",
  "time": "2025-03-07T15:40:15.226Z",
  "logger": "controller",
  "message": "launched nodeclaim",
  "commit": "058c665",
  "controller": "nodeclaim.lifecycle",
  "controllerGroup": "karpenter.sh",
  "controllerKind": "NodeClaim",
  "NodeClaim": {
    "name": "default-hw4wj"
  },
  "namespace": "",
  "name": "default-hw4wj",
  "reconcileID": "bab16b84-9695-4e51-8e0c-481e7f9d804d",
  "provider-id": "aws:///ap-northeast-2b/i-0335961f7c4185177",
  "instance-type": "c5a.2xlarge",
  "zone": "ap-northeast-2b",
  "capacity-type": "on-demand",
  "allocatable": {
    "cpu": "7910m",
    "ephemeral-storage": "17Gi",
    "memory": "14162Mi",
    "pods": "58",
    "vpc.amazonaws.com/pod-eni": "38"
  }
}

 

Karpenter launches a new NodeClaim (default-hw4wj), backed by a c5a.2xlarge on-demand instance in ap-northeast-2b.

{
  "level": "INFO",
  "time": "2025-03-07T15:40:33.750Z",
  "logger": "controller",
  "message": "registered nodeclaim",
  "commit": "058c665",
  "controller": "nodeclaim.lifecycle",
  "controllerGroup": "karpenter.sh",
  "controllerKind": "NodeClaim",
  "NodeClaim": {
    "name": "default-hw4wj"
  },
  "namespace": "",
  "name": "default-hw4wj",
  "reconcileID": "6ff1af57-3e4f-4e67-b22c-957d23f60776",
  "provider-id": "aws:///ap-northeast-2b/i-0335961f7c4185177",
  "Node": {
    "name": "ip-192-168-105-117.ap-northeast-2.compute.internal"
  }
}

 

The node for default-hw4wj (ip-192-168-105-117) registers with the Kubernetes cluster.

 

{
  "level": "INFO",
  "time": "2025-03-07T15:40:43.686Z",
  "logger": "controller",
  "message": "initialized nodeclaim",
  "commit": "058c665",
  "controller": "nodeclaim.lifecycle",
  "controllerGroup": "karpenter.sh",
  "controllerKind": "NodeClaim",
  "NodeClaim": {
    "name": "default-hw4wj"
  },
  "namespace": "",
  "name": "default-hw4wj",
  "reconcileID": "6affb0d5-822c-460d-a82d-bc7c01b21be5",
  "provider-id": "aws:///ap-northeast-2b/i-0335961f7c4185177",
  "Node": {
    "name": "ip-192-168-105-117.ap-northeast-2.compute.internal"
  },
  "allocatable": {
    "cpu": "7910m",
    "ephemeral-storage": "18181869946",
    "hugepages-1Gi": "0",
    "hugepages-2Mi": "0",
    "memory": "15140112Ki",
    "pods": "58"
  }
}

 

The node is initialized successfully (initialized nodeclaim), and the allocatable resources are reported again, which indicates that the node's resources are ready for use.
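The allocatable values in the "launched nodeclaim" and "initialized nodeclaim" entries use different units (Mi, Gi, Ki, and plain bytes), which makes them hard to compare by eye. The following Python sketch normalizes them to GiB. The helper names `to_bytes` and `to_gib` are hypothetical, and the parser only covers the suffixes that appear in these two log entries, not the full Kubernetes quantity syntax.

```python
# Binary suffixes that appear in the two log entries above.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def to_bytes(quantity):
    """Convert a quantity such as '14162Mi' or '18181869946' to bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: already plain bytes


def to_gib(quantity):
    """Convert a quantity string to GiB as a float."""
    return to_bytes(quantity) / 1024**3


# Values copied from the "launched" and "initialized" log entries.
launched = {"memory": "14162Mi", "ephemeral-storage": "17Gi"}
initialized = {"memory": "15140112Ki", "ephemeral-storage": "18181869946"}

for key in launched:
    print(
        f"{key}: launched {to_gib(launched[key]):.2f} GiB, "
        f"initialized {to_gib(initialized[key]):.2f} GiB"
    )
```

Once normalized, the memory reported at initialization is somewhat higher than the value logged at launch, while ephemeral storage is slightly lower. CPU (7910m) and pods (58) are identical in both entries.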

 

Scale down

kubectl scale deployment/inflate --replicas 1

 

Scale down by setting the replica count to 1.

On scale-down, the worker nodes do not all disappear at once; they are reduced gradually.

{
  "level": "INFO",
  "time": "2025-03-07T15:57:32.564Z",
  "logger": "controller",
  "message": "disrupting nodeclaim(s) via replace, terminating 1 nodes (1 pods) ip-192-168-105-117.ap-northeast-2.compute.internal/c5a.2xlarge/on-demand and replacing with on-demand node from types c5a.large, c7i-flex.large, c5.large, c6i.large, c7i.large and 52 other(s)",
  "reason": "underutilized"
}

 

 

Karpenter determines that the existing c5a.2xlarge node is underutilized, so its consolidation feature terminates that node and replaces it with a smaller one.
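The disruption message packs several facts into one string: the node being terminated, its instance type, and the candidate replacement types. As a rough aid for reading such entries, the Python sketch below pulls those fields out with a regular expression. The pattern is written against this single log line, so treating the message format as stable is an assumption, and `parse_disruption` is a hypothetical helper name.

```python
import re

# Message copied from the log entry above.
MESSAGE = (
    "disrupting nodeclaim(s) via replace, terminating 1 nodes (1 pods) "
    "ip-192-168-105-117.ap-northeast-2.compute.internal/c5a.2xlarge/on-demand "
    "and replacing with on-demand node from types "
    "c5a.large, c7i-flex.large, c5.large, c6i.large, c7i.large and 52 other(s)"
)

# Pattern derived from this one message; other Karpenter versions may differ.
PATTERN = re.compile(
    r"terminating (?P<nodes>\d+) nodes \((?P<pods>\d+) pods\) "
    r"(?P<node>[^/\s]+)/(?P<type>[^/\s]+)/(?P<capacity>[^/\s]+) "
    r"and replacing with (?P<new_capacity>\S+) node from types "
    r"(?P<types>.+?) and (?P<others>\d+) other\(s\)"
)


def parse_disruption(message):
    """Extract the terminated node and replacement candidates, or None."""
    match = PATTERN.search(message)
    if match is None:
        return None
    info = match.groupdict()
    info["types"] = info["types"].split(", ")
    for key in ("nodes", "pods", "others"):
        info[key] = int(info[key])
    return info


info = parse_disruption(MESSAGE)
print("terminating:", info["node"], info["type"], info["capacity"])
print("pods to move:", info["pods"])
print("candidate types:", len(info["types"]) + info["others"])
```

For this entry the parser reports one pod to move and 57 candidate instance types in total (5 listed plus 52 others).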

 

 

{
  "level": "INFO",
  "time": "2025-03-07T15:57:54.489Z",
  "logger": "controller",
  "message": "registered nodeclaim",
  "NodeClaim": {
    "name": "default-rzhmm"
  },
  "provider-id": "aws:///ap-northeast-2b/i-06669f2b43bb3a7ad",
  "Node": {
    "name": "ip-192-168-2-126.ap-northeast-2.compute.internal"
  }
}

 

A new, smaller node (default-rzhmm) registers with the cluster; the new instance is placed in the ap-northeast-2b availability zone.

The cluster is gradually adjusted to use smaller instances.

 

{
  "level": "INFO",
  "time": "2025-03-07T15:58:12.143Z",
  "logger": "controller",
  "message": "initialized nodeclaim",
  "NodeClaim": {
    "name": "default-rzhmm"
  },
  "allocatable": {
    "cpu": "1930m",
    "memory": "3229360Ki",
    "pods": "29"
  }
}

 

 

The smaller node is used in place of the larger one, which optimizes resource usage.

 

{
  "level": "INFO",
  "time": "2025-03-07T15:58:18.998Z",
  "logger": "controller",
  "message": "tainted node",
  "Node": {
    "name": "ip-192-168-105-117.ap-northeast-2.compute.internal"
  },
  "taint.Key": "karpenter.sh/disrupted",
  "taint.Effect": "NoSchedule"
}

 

 

 

A NoSchedule taint (karpenter.sh/disrupted) is applied to the existing node ip-192-168-105-117 so that no new pods can be scheduled onto it.

 

{
  "level": "INFO",
  "time": "2025-03-07T15:59:02.087Z",
  "logger": "controller",
  "message": "deleted node",
  "Node": {
    "name": "ip-192-168-105-117.ap-northeast-2.compute.internal"
  }
}

 

 

The node that is no longer needed is removed from the cluster.

{
  "level": "INFO",
  "time": "2025-03-07T15:59:02.338Z",
  "logger": "controller",
  "message": "deleted nodeclaim",
  "NodeClaim": {
    "name": "default-hw4wj"
  }
}

 

The old NodeClaim (default-hw4wj) is deleted.
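To see how "gradual" the replacement is, the timestamps of the scale-down log entries above can be lined up against the first disruption event. The Python sketch below does this with the timestamps copied from those entries; the `elapsed_seconds` helper is a hypothetical name used only for illustration.

```python
from datetime import datetime

# (timestamp, message) pairs copied from the scale-down log entries above.
EVENTS = [
    ("2025-03-07T15:57:32.564Z", "disrupting nodeclaim(s) via replace"),
    ("2025-03-07T15:57:54.489Z", "registered nodeclaim"),
    ("2025-03-07T15:58:12.143Z", "initialized nodeclaim"),
    ("2025-03-07T15:58:18.998Z", "tainted node"),
    ("2025-03-07T15:59:02.087Z", "deleted node"),
    ("2025-03-07T15:59:02.338Z", "deleted nodeclaim"),
]


def parse_time(timestamp):
    """Parse the UTC timestamp format used in the Karpenter JSON logs."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")


def elapsed_seconds(events):
    """Return (message, seconds since the first event) for each event."""
    start = parse_time(events[0][0])
    return [
        (message, round((parse_time(ts) - start).total_seconds(), 3))
        for ts, message in events
    ]


for message, seconds in elapsed_seconds(EVENTS):
    print(f"+{seconds:7.3f}s  {message}")
```

The ordering matters more than the exact numbers: the replacement node registers and initializes before the old node is tainted and deleted, and the whole replacement takes roughly 90 seconds in this run.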

 

Checking the worker nodes

kubectl get nodeclaims
NAME            TYPE        CAPACITY    ZONE              NODE                                               READY   AGE
default-rzhmm   c5a.large   on-demand   ap-northeast-2b   ip-192-168-2-126.ap-northeast-2.compute.internal   True    8m1s

 

The node has been downsized from c5a.2xlarge to c5a.large.

 

Spot-to-Spot Consolidation Hands-on

This part tests how Karpenter can automatically manage AWS EC2 Spot instances.

 

Create the Karpenter NodePool and EC2NodeClass

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]

 

Create the EC2NodeClass

apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  amiSelectorTerms:
    - alias: "bottlerocket@latest"

 

Deploy the test workload

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 5
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
      - name: inflate
        image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
        resources:
          requests:
            cpu: 1
            memory: 1.5Gi

 

 

"error": "launching nodeclaim, creating instance, AuthFailure.ServiceLinkedRoleCreationNotPermitted: The provided credentials do not have permission to create the service-linked role for EC2 Spot Instances."

 

During the lab, however, the error above occurred, and I resolved it with the following command.

aws iam create-service-linked-role --aws-service-name spot.amazonaws.com

 

Creating the EC2 Spot service-linked role in advance with this command resolved the error, since the credentials Karpenter was using were not permitted to create that role themselves.

 

Scale up

kubectl get nodeclaims
NAME            TYPE          CAPACITY   ZONE              NODE                                                 READY   AGE
default-qwvbv   c6g.2xlarge   spot       ap-northeast-2d   ip-192-168-132-105.ap-northeast-2.compute.internal   True    4m55s
default-qxbz6   c6g.2xlarge   spot       ap-northeast-2d   ip-192-168-46-137.ap-northeast-2.compute.internal    True    59s

 

Scale down

kubectl get nodeclaims
NAME            TYPE          CAPACITY   ZONE              NODE                                                READY   AGE
default-qxbz6   c6g.2xlarge   spot       ap-northeast-2d   ip-192-168-46-137.ap-northeast-2.compute.internal   True    4m43s
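Comparing the two `kubectl get nodeclaims` outputs by eye works for two rows, but a small script makes the difference explicit. The Python sketch below parses the scale-up and scale-down tables shown above and reports which NodeClaims were removed; `nodeclaim_names` is a hypothetical helper, and it assumes the NAME column is the first whitespace-separated field of each row.

```python
def nodeclaim_names(table):
    """Return the set of NodeClaim names from `kubectl get nodeclaims` output."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    # Skip the header row; the first column of each remaining row is NAME.
    return {line.split()[0] for line in lines[1:]}


# Output copied from the scale-up step above.
SCALE_UP = """
NAME            TYPE          CAPACITY   ZONE              NODE                                                 READY   AGE
default-qwvbv   c6g.2xlarge   spot       ap-northeast-2d   ip-192-168-132-105.ap-northeast-2.compute.internal   True    4m55s
default-qxbz6   c6g.2xlarge   spot       ap-northeast-2d   ip-192-168-46-137.ap-northeast-2.compute.internal    True    59s
"""

# Output copied from the scale-down step above.
SCALE_DOWN = """
NAME            TYPE          CAPACITY   ZONE              NODE                                                READY   AGE
default-qxbz6   c6g.2xlarge   spot       ap-northeast-2d   ip-192-168-46-137.ap-northeast-2.compute.internal   True    4m43s
"""

before = nodeclaim_names(SCALE_UP)
after = nodeclaim_names(SCALE_DOWN)

print("removed:", sorted(before - after))
print("remaining:", sorted(before & after))
```

In this run, scaling down removed one of the two Spot NodeClaims (default-qwvbv) and kept the other (default-qxbz6) on the same c6g.2xlarge instance type.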