These are my notes from the 2025 AWS EKS Hands-on Study group.
Karpenter hands-on
How Karpenter works
Karpenter is a dynamic node autoscaling solution for Kubernetes.
It automatically adds and removes EC2 instances in a Kubernetes cluster to match the user's requirements, and it manages nodes faster and more efficiently than the Kubernetes Cluster Autoscaler.
Karpenter's core components are as follows.
- NodePool (formerly Provisioner)
  - Defines the policy that determines which nodes Karpenter should create.
  - Selects and launches the most suitable EC2 instance based on the cluster's workload requirements.
  - Can be restricted, for example, to on-demand nodes in a specific AZ or to specific instance types.
- Controller
  - Watches the Kubernetes API server and detects when a new node is needed or an existing node should be cleaned up.
  - Creates a NodeClaim to request an EC2 instance, and cleans the node up once it is no longer needed.
  - Detects pod scheduling requests in real time and creates new nodes dynamically.
- Webhook
  - Communicates with the API server to validate and adjust the resources Karpenter creates.
  - Uses a MutatingWebhook and a ValidatingWebhook to manage NodePool and NodeClaim resources.
  - Filters out invalid requests before they are admitted.
Scale-up test
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: kubernetes.io/os
          operator: In
          values: ["linux"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["2"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      expireAfter: 720h # 30 * 24h = 720h
  limits:
    cpu: 1000
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: "KarpenterNodeRole-${CLUSTER_NAME}" # replace with your cluster name
  amiSelectorTerms:
    - alias: "al2023@${ALIAS_VERSION}" # ex) al2023@latest
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}" # replace with your cluster name
EOF
The Karpenter lab environment is built on on-demand instances.
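To make the `requirements` block of the NodePool concrete, here is a minimal Python sketch of how the `In` and `Gt` operators narrow down candidate instance types. It is a simplified model and not Karpenter's actual implementation: the `Requirement` class, the `matches` and `allowed` helpers, and the two sample label sets are hypothetical names introduced for illustration, and Karpenter supports further operators such as `NotIn`, `Exists`, `DoesNotExist`, and `Lt`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    key: str
    operator: str  # only "In" and "Gt" are modeled here
    values: tuple


def matches(labels: dict, req: Requirement) -> bool:
    """Return True if a node's labels satisfy a single requirement."""
    value = labels.get(req.key)
    if value is None:
        return False
    if req.operator == "In":
        return value in req.values
    if req.operator == "Gt":
        # Gt compares the label as an integer against the single listed value.
        return int(value) > int(req.values[0])
    raise ValueError(f"unsupported operator: {req.operator}")


# The same requirements as the NodePool manifest.
REQUIREMENTS = [
    Requirement("kubernetes.io/arch", "In", ("amd64",)),
    Requirement("kubernetes.io/os", "In", ("linux",)),
    Requirement("karpenter.sh/capacity-type", "In", ("on-demand",)),
    Requirement("karpenter.k8s.aws/instance-category", "In", ("c", "m", "r")),
    Requirement("karpenter.k8s.aws/instance-generation", "Gt", ("2",)),
]


def allowed(labels: dict) -> bool:
    """A candidate is allowed only if every requirement matches."""
    return all(matches(labels, req) for req in REQUIREMENTS)


# Hypothetical label sets for two instance types (illustration only).
C5A_2XLARGE = {
    "kubernetes.io/arch": "amd64",
    "kubernetes.io/os": "linux",
    "karpenter.sh/capacity-type": "on-demand",
    "karpenter.k8s.aws/instance-category": "c",
    "karpenter.k8s.aws/instance-generation": "5",
}
T3_LARGE = dict(C5A_2XLARGE, **{
    "karpenter.k8s.aws/instance-category": "t",
    "karpenter.k8s.aws/instance-generation": "3",
})

print(allowed(C5A_2XLARGE))  # category "c", generation 5 > 2
print(allowed(T3_LARGE))     # category "t" is not in ["c", "m", "r"]
```

In this model the c5a.2xlarge candidate passes every requirement, while the t3.large candidate is rejected by the instance-category requirement alone.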
kubectl get nodepool,ec2nodeclass,nodeclaims
NAME NODECLASS NODES READY AGE
nodepool.karpenter.sh/default default 0 True 4m3s
NAME READY AGE
ec2nodeclass.karpenter.k8s.aws/default True 4m3s
At first the five pods were stuck in the Pending state,
but over time a node was provisioned and the pods were scheduled onto it.
Created worker node
kubectl get nodeclaims
NAME TYPE CAPACITY ZONE NODE READY AGE
default-hw4wj c5a.2xlarge on-demand ap-northeast-2b ip-192-168-105-117.ap-northeast-2.compute.internal True 8m44s
Karpenter log analysis
Looking at the API request Karpenter makes, it creates the worker node using the lowest-price allocation strategy.
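As a rough illustration of what a lowest-price strategy means for on-demand capacity, the sketch below picks the cheapest pool that still has capacity. It is only a toy model: the `Pool` class, the `pick_lowest_price` helper, the `has_capacity` flag, and every price in it are hypothetical values chosen for the example, not real EC2 prices or Karpenter code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pool:
    instance_type: str
    zone: str
    hourly_price: float  # hypothetical price, for illustration only
    has_capacity: bool   # simplification of EC2 capacity availability


def pick_lowest_price(pools: List[Pool]) -> Optional[Pool]:
    """Return the cheapest pool that has capacity, or None if none does."""
    available = [pool for pool in pools if pool.has_capacity]
    if not available:
        return None
    return min(available, key=lambda pool: pool.hourly_price)


# Hypothetical candidate pools passed along with the launch request.
CANDIDATES = [
    Pool("c5a.2xlarge", "ap-northeast-2b", 0.35, True),
    Pool("c5.2xlarge", "ap-northeast-2b", 0.39, True),
    Pool("c6i.2xlarge", "ap-northeast-2a", 0.33, False),
]

chosen = pick_lowest_price(CANDIDATES)
print(chosen.instance_type, chosen.zone)
```

With these made-up numbers the c6i.2xlarge pool is cheaper but has no capacity, so the c5a.2xlarge pool is selected.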
{
"level": "INFO",
"time": "2025-03-07T14:43:55.112Z",
"logger": "controller.controller-runtime.metrics",
"message": "Starting metrics server",
"commit": "058c665"
}
{
"level": "INFO",
"time": "2025-03-07T14:43:55.112Z",
"logger": "controller.controller-runtime.metrics",
"message": "Serving metrics server",
"commit": "058c665",
"bindAddress": ":8080",
"secure": false
}
When the Karpenter controller starts, it brings up its metrics server.
{
"level": "INFO",
"time": "2025-03-07T15:40:15.226Z",
"logger": "controller",
"message": "launched nodeclaim",
"commit": "058c665",
"controller": "nodeclaim.lifecycle",
"controllerGroup": "karpenter.sh",
"controllerKind": "NodeClaim",
"NodeClaim": {
"name": "default-hw4wj"
},
"namespace": "",
"name": "default-hw4wj",
"reconcileID": "bab16b84-9695-4e51-8e0c-481e7f9d804d",
"provider-id": "aws:///ap-northeast-2b/i-0335961f7c4185177",
"instance-type": "c5a.2xlarge",
"zone": "ap-northeast-2b",
"capacity-type": "on-demand",
"allocatable": {
"cpu": "7910m",
"ephemeral-storage": "17Gi",
"memory": "14162Mi",
"pods": "58",
"vpc.amazonaws.com/pod-eni": "38"
}
}
Karpenter launches a new node (NodeClaim default-hw4wj).
{
"level": "INFO",
"time": "2025-03-07T15:40:33.750Z",
"logger": "controller",
"message": "registered nodeclaim",
"commit": "058c665",
"controller": "nodeclaim.lifecycle",
"controllerGroup": "karpenter.sh",
"controllerKind": "NodeClaim",
"NodeClaim": {
"name": "default-hw4wj"
},
"namespace": "",
"name": "default-hw4wj",
"reconcileID": "6ff1af57-3e4f-4e67-b22c-957d23f60776",
"provider-id": "aws:///ap-northeast-2b/i-0335961f7c4185177",
"Node": {
"name": "ip-192-168-105-117.ap-northeast-2.compute.internal"
}
}
The node for NodeClaim default-hw4wj registers with the Kubernetes cluster.
{
"level": "INFO",
"time": "2025-03-07T15:40:43.686Z",
"logger": "controller",
"message": "initialized nodeclaim",
"commit": "058c665",
"controller": "nodeclaim.lifecycle",
"controllerGroup": "karpenter.sh",
"controllerKind": "NodeClaim",
"NodeClaim": {
"name": "default-hw4wj"
},
"namespace": "",
"name": "default-hw4wj",
"reconcileID": "6affb0d5-822c-460d-a82d-bc7c01b21be5",
"provider-id": "aws:///ap-northeast-2b/i-0335961f7c4185177",
"Node": {
"name": "ip-192-168-105-117.ap-northeast-2.compute.internal"
},
"allocatable": {
"cpu": "7910m",
"ephemeral-storage": "18181869946",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "15140112Ki",
"pods": "58"
}
}
The node initializes successfully (initialized nodeclaim), and its allocatable resources are reported again, which means the node is ready to run pods.
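The three lifecycle log entries above (launched, registered, initialized) can be turned into a small timeline. The following Python sketch computes how long each phase took using only the timestamps copied from those logs; the `EVENTS` list and the `parse_time` and `phase_durations` helpers are names introduced here for illustration.

```python
from datetime import datetime

# (message, time) pairs copied from the Karpenter log entries above.
EVENTS = [
    ("launched nodeclaim", "2025-03-07T15:40:15.226Z"),
    ("registered nodeclaim", "2025-03-07T15:40:33.750Z"),
    ("initialized nodeclaim", "2025-03-07T15:40:43.686Z"),
]


def parse_time(value: str) -> datetime:
    # %z accepts the trailing "Z" (UTC) on Python 3.7 and later.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def phase_durations(events):
    """Return (label, seconds) for each pair of consecutive events."""
    parsed = [(message, parse_time(ts)) for message, ts in events]
    return [
        (f"{a_msg} -> {b_msg}", (b_time - a_time).total_seconds())
        for (a_msg, a_time), (b_msg, b_time) in zip(parsed, parsed[1:])
    ]


for label, seconds in phase_durations(EVENTS):
    print(f"{label}: {seconds:.3f}s")
```

For this run, the node took about 18.5 seconds to register after launch and about 9.9 more seconds to initialize, so roughly 28 seconds passed between the launch and the node being ready.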
Scale-down
kubectl scale deployment/inflate --replicas 1
Set the replica count to 1 to trigger a scale-down.
During scale-down, the number of worker nodes does not drop all at once; it shrinks gradually.
{
"level": "INFO",
"time": "2025-03-07T15:57:32.564Z",
"logger": "controller",
"message": "disrupting nodeclaim(s) via replace, terminating 1 nodes (1 pods) ip-192-168-105-117.ap-northeast-2.compute.internal/c5a.2xlarge/on-demand and replacing with on-demand node from types c5a.large, c7i-flex.large, c5.large, c6i.large, c7i.large and 52 other(s)",
"reason": "underutilized"
}
Karpenter judges the existing c5a.2xlarge node to be underutilized, so its consolidation feature terminates that node and replaces it with a smaller one.
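The replace decision can be sketched as a simple check: is there a cheaper instance type that still fits the pods that remain? The Python below is a simplified model under stated assumptions, not Karpenter's real consolidation algorithm. The allocatable CPU values come from the logs in this post, while the `InstanceType` class, the `cheaper_replacement` helper, the hourly prices, and the 1000m CPU request per pod are assumptions made for illustration. DaemonSet pods and memory are ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    allocatable_cpu_m: int  # millicores, taken from the logs in this post
    hourly_price: float     # hypothetical price, for illustration only


def cheaper_replacement(
    current: InstanceType,
    candidates: List[InstanceType],
    requested_cpu_m: int,
) -> Optional[InstanceType]:
    """Return the cheapest candidate that fits the requested CPU and costs
    less than the current node, or None if no such candidate exists."""
    fitting = [
        c for c in candidates
        if c.allocatable_cpu_m >= requested_cpu_m
        and c.hourly_price < current.hourly_price
    ]
    return min(fitting, key=lambda c: c.hourly_price, default=None)


C5A_2XLARGE = InstanceType("c5a.2xlarge", 7910, 0.35)
C5A_LARGE = InstanceType("c5a.large", 1930, 0.09)

POD_CPU_M = 1000  # assumed CPU request of one inflate pod

# Five replicas need 5000m, which does not fit on a c5a.large.
print(cheaper_replacement(C5A_2XLARGE, [C5A_LARGE], 5 * POD_CPU_M))
# One replica needs 1000m, so the smaller type becomes a valid replacement.
print(cheaper_replacement(C5A_2XLARGE, [C5A_LARGE], 1 * POD_CPU_M))
```

Under these assumptions no replacement exists while five replicas are running, and the c5a.large only becomes a candidate after the Deployment is scaled down to one replica.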
{
"level": "INFO",
"time": "2025-03-07T15:57:54.489Z",
"logger": "controller",
"message": "registered nodeclaim",
"NodeClaim": {
"name": "default-rzhmm"
},
"provider-id": "aws:///ap-northeast-2b/i-06669f2b43bb3a7ad",
"Node": {
"name": "ip-192-168-2-126.ap-northeast-2.compute.internal"
}
}
A smaller new node (default-rzhmm) registers with the cluster; the new instance is placed in the ap-northeast-2b availability zone.
The cluster is gradually adjusted to use smaller instances.
{
"level": "INFO",
"time": "2025-03-07T15:58:12.143Z",
"logger": "controller",
"message": "initialized nodeclaim",
"NodeClaim": {
"name": "default-rzhmm"
},
"allocatable": {
"cpu": "1930m",
"memory": "3229360Ki",
"pods": "29"
}
}
The smaller node is now used in place of the large one, which optimizes resource usage.
{
"level": "INFO",
"time": "2025-03-07T15:58:18.998Z",
"logger": "controller",
"message": "tainted node",
"Node": {
"name": "ip-192-168-105-117.ap-northeast-2.compute.internal"
},
"taint.Key": "karpenter.sh/disrupted",
"taint.Effect": "NoSchedule"
}
The NoSchedule taint (karpenter.sh/disrupted) is applied to the old node ip-192-168-105-117 so that no new pods can be scheduled onto it.
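To show why the taint keeps new pods away, here is a minimal Python sketch of Kubernetes taint and toleration matching. It models only the basic rules (an empty toleration effect matches every effect, and `Exists` with an empty key matches every taint) and is not the scheduler's actual code. The `Taint` and `Toleration` classes and the `tolerates` and `schedulable` helpers are hypothetical names for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty means "any effect"


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration matches a single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # An empty key with Exists tolerates every taint key.
        return tol.key == "" or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def schedulable(tolerations: List[Toleration], taints: List[Taint]) -> bool:
    """A pod can be placed only if every blocking taint is tolerated."""
    return all(
        any(tolerates(tol, taint) for tol in tolerations)
        for taint in taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


# The taint reported in the log entry above.
DISRUPTED = Taint(key="karpenter.sh/disrupted", effect="NoSchedule")

# A pod without tolerations cannot land on the disrupted node.
print(schedulable([], [DISRUPTED]))
# A pod that tolerates everything could still be placed.
print(schedulable([Toleration(operator="Exists")], [DISRUPTED]))
```

An ordinary pod such as inflate carries no matching toleration, so once the taint is applied it can only be scheduled onto the replacement node.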
{
"level": "INFO",
"time": "2025-03-07T15:59:02.087Z",
"logger": "controller",
"message": "deleted node",
"Node": {
"name": "ip-192-168-105-117.ap-northeast-2.compute.internal"
}
}
The node that is no longer needed is removed from the cluster.
{
"level": "INFO",
"time": "2025-03-07T15:59:02.338Z",
"logger": "controller",
"message": "deleted nodeclaim",
"NodeClaim": {
"name": "default-hw4wj"
}
}
The old NodeClaim (default-hw4wj) is deleted.
Checking the worker node
kubectl get nodeclaims
NAME TYPE CAPACITY ZONE NODE READY AGE
default-rzhmm c5a.large on-demand ap-northeast-2b ip-192-168-2-126.ap-northeast-2.compute.internal True 8m1s
The node has been downsized from c5a.2xlarge to c5a.large.
Spot-to-Spot Consolidation hands-on
This section tests how Karpenter automatically manages AWS EC2 Spot instances.
Create the Karpenter NodePool and EC2NodeClass
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
Create the EC2NodeClass
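apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  role: "KarpenterNodeRole-${CLUSTER_NAME}"
  amiSelectorTerms:
    - alias: "bottlerocket@latest"
  # Assumption: selectors reuse the discovery tag from the on-demand EC2NodeClass earlier in this post.
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: "${CLUSTER_NAME}"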
Deploy the test workload
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 5
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7
          resources:
            requests:
              cpu: 1
              memory: 1.5Gi
"error": "launching nodeclaim, creating instance, AuthFailure.ServiceLinkedRoleCreationNotPermitted: The provided credentials do not have permission to create the service-linked role for EC2 Spot Instances."
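During the lab, however, the error above occurred. I resolved it by creating the EC2 Spot service-linked role ahead of time:
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com
Once the service-linked role (AWSServiceRoleForEC2Spot) exists in the account, Karpenter no longer needs permission to create it when it launches Spot instances.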
Scale-up
kubectl get nodeclaims
NAME TYPE CAPACITY ZONE NODE READY AGE
default-qwvbv c6g.2xlarge spot ap-northeast-2d ip-192-168-132-105.ap-northeast-2.compute.internal True 4m55s
default-qxbz6 c6g.2xlarge spot ap-northeast-2d ip-192-168-46-137.ap-northeast-2.compute.internal True 59s
Scale-down
kubectl get nodeclaims
NAME TYPE CAPACITY ZONE NODE READY AGE
default-qxbz6 c6g.2xlarge spot ap-northeast-2d ip-192-168-46-137.ap-northeast-2.compute.internal True 4m43s