- Pod creation process
- Resource limits (cpu, memory)
- nodeSelector
- nodeAffinity
- Taints and tolerations
- Pinning a Pod to a specific node (nodeName)
@Pod creation process
1. kubectl run (creates a Pod; the request goes to) -> apiserver (which stores the data in) -> etcd
2. scheduler (selects a suitable node for the new Pod according to its scheduling algorithm, records the binding, and returns the result to) -> apiserver -> etcd
3. kubelet (notices that a new Pod has been assigned to its node, calls the docker API to create the containers, and reports the container status back to) -> apiserver -> etcd
Kubernetes uses a controller architecture built on the list-watch mechanism.
Each component watches the resources it is responsible for; when those resources change, kube-apiserver notifies the component.
etcd stores the cluster's data; the apiserver is the single entry point, and every operation on that data must go through the apiserver.
Clients (kubelet/scheduler/controller-manager) use list-watch to listen on the apiserver for create, update and delete events on resources (pod/rs/rc, etc.) and invoke the corresponding handler for each event type.
More on list-watch -> 理解 K8S 的设计精髓之 List-Watch机制和Informer模块 (Understanding the essence of K8s design: the List-Watch mechanism and the Informer module) https://zhuanlan.zhihu.com/p/59660536
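The watch behaviour is easy to observe from the command line. A minimal sketch, assuming a running cluster with the default kubeconfig (the namespace and flags are only illustrative):
# Keep an open watch on Pods in the default namespace; events are streamed
# as they happen instead of being polled.
kubectl get pods --watch -o wide
# Roughly the same thing against the raw REST API: list once, then watch.
kubectl get --raw "/api/v1/namespaces/default/pods"
kubectl get --raw "/api/v1/namespaces/default/pods?watch=true"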
@Resource limits (cpu, memory)
limitpod.yaml is as follows:
apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  containers:
  - image: nginx
    name: testpod
    resources:
      requests:
        memory: 100Mi
        cpu: 400m
      limits:
        memory: 128Mi
        cpu: 500m
Notes:
containers.resources.limits is the hard upper bound on the resources the container may use.
containers.resources.requests is the amount reserved for the container, not what it actually uses -> the scheduler uses it to decide whether a node can accommodate the Pod.
If no node can satisfy the requested values, the Pod stays Pending.
For CPU, 1 core = 1000m (millicores).
kubectl describe pod <pod name> shows the configured limits and requests.
Check which node the Pod landed on; here it is on k8s-node1:
[root@k8s-master ~]# kubectl get pod testpod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
testpod 1/1 Running 0 19m 10.244.36.66 k8s-node1 <none> <none>
[root@k8s-master ~]#
Then use kubectl describe node <node name> to see the node's resources.
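For example, a rough way to see how much of the node's allocatable capacity has already been requested (the "Allocated resources" block is part of the describe output; -A 8 is just a wide-enough window):
kubectl describe node k8s-node1 | grep -A 8 "Allocated resources"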
Create a few more Pods to use up the node's resources; once no node can satisfy resources.requests, the Pod stays Pending.
kubectl describe pod then shows that the reason is insufficient resources.
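To reproduce the Pending case quickly, a minimal sketch (the pod name bigpod and the 64-core request are made up; any request larger than every node's allocatable CPU works):
apiVersion: v1
kind: Pod
metadata:
  name: bigpod
spec:
  containers:
  - image: nginx
    name: bigpod
    resources:
      requests:
        cpu: "64"        # deliberately more than any node can offer
        memory: 100Mi
After kubectl apply -f bigpod.yaml the Pod should stay Pending, and kubectl describe pod bigpod should report an Insufficient cpu event.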
@nodeSelector
nodeSelector: schedules a Pod onto nodes that carry the matching labels; if no node matches, scheduling fails (the Pod stays Pending).
Typical use cases:
Dedicated nodes: group and manage nodes per business line.
Special hardware: some nodes have SSDs (much better read/write performance than spinning disks).
Commands to add, view and remove node labels:
Add a label to a node: kubectl label node <node-name> <label-key>=<label-value> (the value may be empty)
View node labels: kubectl get node --show-labels
Remove a node label: kubectl label node <node-name> <label-key>-
Nodes come with some default labels:
[root@k8s-master ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-master Ready control-plane,master 23d v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=
k8s-node1 Ready <none> 23d v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node1,kubernetes.io/os=linux
k8s-node2 Ready <none> 23d v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node2,kubernetes.io/os=linux
[root@k8s-master ~]#
Add a disktype=ssd label to k8s-node1:
[root@k8s-master ~]# kubectl label node k8s-node1 disktype=ssd
node/k8s-node1 labeled
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s-master Ready control-plane,master 23d v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=
k8s-node1 Ready <none> 23d v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node1,kubernetes.io/os=linux
k8s-node2 Ready <none> 23d v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node2,kubernetes.io/os=linux
[root@k8s-master ~]#
Generate a Pod yaml with kubectl run testpod --image=nginx --dry-run=client -o yaml > testpod.yaml, then edit testpod.yaml so it looks like this:
apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - image: nginx
    name: testpod
Create the Pod with kubectl apply -f testpod.yaml and check whether it was scheduled onto k8s-node1, the node labeled ssd:
[root@k8s-master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
testpod 1/1 Running 0 24s 10.244.36.112 k8s-node1 <none> <none>
[root@k8s-master ~]#
Remove the label: kubectl label node k8s-node1 disktype-
[root@k8s-master ~]# kubectl get node --show-labels | grep ssd
k8s-node1 Ready <none> 23d v1.20.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,disktype=ssd,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-node1,kubernetes.io/os=linux
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl label node k8s-node1 disktype-
node/k8s-node1 labeled
[root@k8s-master ~]# kubectl get node --show-labels | grep ssd
[root@k8s-master ~]#
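Before relying on a selector, it can help to preview which nodes would match. A small sketch using ordinary label selectors (disktype=ssd is the label from the example above):
# List only the nodes that carry the label the nodeSelector expects.
kubectl get nodes -l disktype=ssd
# Or show the label as an extra column for every node.
kubectl get nodes -L disktype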
@nodeAffinity
Node affinity (nodeAffinity) constrains which nodes a Pod can be scheduled onto, based on node labels.
Compared with nodeSelector:
1) The supported operators are In, NotIn, Exists, DoesNotExist, Gt and Lt.
NotIn and DoesNotExist provide node anti-affinity behaviour; alternatively, node taints can be used to repel Pods from specific nodes.
2) Rules can be soft or hard instead of being a strict requirement:
- hard rule (required): must be satisfied
- soft rule (preferred): satisfied on a best-effort basis
Example: create a Pod whose node affinity has one hard rule and one soft rule.
No extra labels have been added to the nodes yet, so the Pod is expected to stay unscheduled. node-affinity.yaml is as follows:
apiVersion: v1
kind: Pod
metadata:
  name: node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: project
            operator: In
            values:
            - makePeopleHappy
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: group
            operator: In
            values:
            - test1
  containers:
  - name: web
    image: nginx
Notes:
Hard rule: spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution
Soft rule: spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution; the weight field ranges from 1 to 100.
Caveats:
- If both nodeSelector and nodeAffinity are specified, both must be satisfied for the Pod to be scheduled onto a candidate node.
- If multiple nodeSelectorTerms are specified under nodeAffinity, the Pod can be scheduled onto a node as long as one of the nodeSelectorTerms is satisfied.
- If multiple matchExpressions are specified within one nodeSelectorTerm, the Pod can be scheduled only onto nodes that satisfy all of the matchExpressions (see the sketch after this list).
- Changing or removing the labels of the node a Pod runs on does not delete the Pod; affinity is only evaluated at scheduling time.
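A small sketch of the OR/AND difference described above (the keys zone and disktype and their values are made up for illustration; this is only a fragment of a Pod spec):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      # The terms below are ORed: a node matching either term is acceptable.
      - matchExpressions:
        # Expressions within one term are ANDed: both must hold.
        - key: zone
          operator: In
          values: ["zone-a"]
        - key: disktype
          operator: Exists
      - matchExpressions:
        - key: zone
          operator: In
          values: ["zone-b"]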
Hard rule walkthrough:
After being created, the Pod stays stuck in Pending:
[root@k8s-master ~]# kubectl apply -f node-affinity.yaml
pod/node-affinity created
[root@k8s-master ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
node-affinity 0/1 Pending 0 6s
[root@k8s-master ~]#
kubectl describe pod <pod name> shows why: of the three nodes, one has a taint and the other two do not match the node affinity:
[root@k8s-master ~]# kubectl describe pod node-affinity
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 80s default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
Warning FailedScheduling 80s default-scheduler 0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 2 node(s) didn't match Pod's node affinity.
[root@k8s-master ~]#
Now add a label to k8s-node2 that satisfies the Pod's hard rule (the Pod is expected to be assigned to k8s-node2),
then watch the Pod's status change from Pending to ContainerCreating:
[root@k8s-master ~]# kubectl label node k8s-node2 project=makePeopleHappy
node/k8s-node2 labeled
[root@k8s-master ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
node-affinity 0/1 ContainerCreating 0 5m3s
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
node-affinity 1/1 Running 0 6m
[root@k8s-master ~]#
No node carries the label from the soft rule, but that rule does not have to be satisfied, only satisfied if possible.
Soft rule walkthrough:
Delete the previous Pod, label both k8s-node1 and k8s-node2 with the label the hard rule requires, and give only k8s-node1 the label the soft rule prefers.
Then create the Pod again (it is expected to land on k8s-node1):
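The labeling commands for this scenario are not shown in the capture below; they would look roughly like this (same keys and values as in node-affinity.yaml):
kubectl delete pod node-affinity
# k8s-node2 already carries project=makePeopleHappy from the previous step;
# add the hard-rule label plus the soft-rule label to k8s-node1.
kubectl label node k8s-node1 project=makePeopleHappy
kubectl label node k8s-node1 group=test1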
[root@k8s-master ~]# kubectl apply -f node-affinity.yaml
pod/node-affinity created
[root@k8s-master ~]# kubectl get pod node-affinity -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
node-affinity 0/1 ContainerCreating 0 16s <none> k8s-node1 <none> <none>
[root@k8s-master ~]#
Remove the labels to restore the environment:
[root@k8s-master ~]# kubectl label node k8s-node1 project-
node/k8s-node1 labeled
[root@k8s-master ~]# kubectl label node k8s-node2 project-
node/k8s-node2 labeled
[root@k8s-master ~]# kubectl label node k8s-node1 group-
node/k8s-node1 labeled
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get nodes --show-labels | egrep 'project|group'
[root@k8s-master ~]#
@Taints and tolerations
Taints: keep Pods from being scheduled onto particular nodes.
Tolerations: allow a Pod to be scheduled onto nodes that carry taints.
Use cases:
- Dedicated nodes
If certain nodes should be reserved for a specific group of users, add a taint to those nodes and a matching toleration to that group's Pods.
- Nodes with special hardware
In a cluster where some nodes have special hardware (GPUs, for example), Pods that do not need the hardware should stay off those nodes, keeping the resources free for Pods that do.
- Taint-based eviction
Per-Pod configuration of how the Pod is evicted when its node runs into problems (a sketch follows this list).
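A minimal sketch of taint-based eviction, using the built-in node.kubernetes.io/unreachable taint that the node controller adds with the NoExecute effect (the 300-second grace period is only an example value; this is a fragment of a Pod spec):
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300   # stay on the unreachable node for up to 5 minutes, then get evicted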
1. Taints
Add a taint to a node: kubectl taint node <node name> <key>=<value>:<effect>
For example: kubectl taint node k8s-node1 gpu=yes:NoSchedule
View a node's taints: kubectl describe node <node name> | grep Taint
Possible values of effect:
NoSchedule: Pods will not be scheduled onto the node.
PreferNoSchedule: the scheduler tries to avoid the node, but it is not a hard guarantee.
NoExecute: new Pods are not scheduled onto the node, and Pods already running on it are evicted.
Remove a taint from a node: kubectl taint node <node name> <key>:<effect>-
[root@k8s-master ~]# kubectl taint node k8s-node1 gpu=yes:NoSchedule
node/k8s-node1 tainted
[root@k8s-master ~]# kubectl describe node k8s-node1 | grep Taint
Taints: gpu=yes:NoSchedule
[root@k8s-master ~]#
Then try creating Pods through a Deployment. test-taint-deploy.yaml is as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
  namespace: default
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - image: nginx
        name: nginx
All 4 replicas avoid the tainted k8s-node1:
[root@k8s-master ~]# kubectl apply -f test-taint-deploy.yaml
deployment.apps/web created
[root@k8s-master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
web-96d5df5c8-gmspn 0/1 ContainerCreating 0 10s <none> k8s-node2 <none> <none>
web-96d5df5c8-hh86q 0/1 ContainerCreating 0 10s <none> k8s-node2 <none> <none>
web-96d5df5c8-n4587 0/1 ContainerCreating 0 10s <none> k8s-node2 <none> <none>
web-96d5df5c8-wrmxv 0/1 ContainerCreating 0 10s <none> k8s-node2 <none> <none>
[root@k8s-master ~]#
2. Tolerations
If a Pod should be allowed onto a node that carries a taint, add a tolerations field to the Pod spec.
A toleration "matches" a taint when the key and the effect are the same, and:
if operator is Exists, the toleration must not specify a value;
if operator is Equal, the values must be equal.
If the key is empty and operator is Exists, the toleration matches any taint.
If effect is empty, the toleration matches every effect for the given key.
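For example, a catch-all toleration matching every taint looks like this; it is only a sketch of the rules above, not something the Deployment below uses:
# Fragment of a Pod spec: an empty key with operator Exists tolerates all taints.
tolerations:
- operator: "Exists"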
Add tolerations to test-taint-deploy.yaml; the updated file is:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
  namespace: default
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - image: nginx
        name: nginx
      tolerations:
      - key: gpu
        operator: Equal
        value: "yes"
        effect: NoSchedule
Note: tolerations.value here is yes and must be quoted ("yes"); otherwise YAML parses it as a boolean and the apply is rejected. The earlier yaml files were not careful about this; it is best to follow the official docs and quote every value they quote.
Recreate the Deployment; this time the Pods no longer avoid the tainted k8s-node1:
[root@k8s-master ~]# kubectl delete -f test-taint-deploy.yaml
deployment.apps "web" deleted
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl apply -f test-taint-deploy.yaml
deployment.apps/web created
[root@k8s-master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
web-6854887cb-68sgr 1/1 Running 0 41s 10.244.36.74 k8s-node1 <none> <none>
web-6854887cb-8627z 1/1 Running 0 41s 10.244.36.75 k8s-node1 <none> <none>
web-6854887cb-c74f4 1/1 Running 0 41s 10.244.36.80 k8s-node1 <none> <none>
web-6854887cb-hkbm2 1/1 Running 0 41s 10.244.169.145 k8s-node2 <none> <none>
[root@k8s-master ~]#
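To double-check which tolerations a running Pod actually ended up with (Kubernetes also injects a couple of default not-ready/unreachable tolerations on top of the one from the yaml), something like this works; the pod name is taken from the output above:
kubectl get pod web-6854887cb-68sgr -o jsonpath='{.spec.tolerations}'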
@Pinning a Pod to a specific node (nodeName)
nodeName bypasses the scheduler entirely, so none of the constraints above apply.
test-fixednode-deploy.yaml is as follows; it pins the Pods to nodeName: k8s-node1, even though k8s-node1 still carries the taint and the yaml has no tolerations field:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
  namespace: default
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      nodeName: k8s-node1
      containers:
      - image: nginx
        name: nginx
All Pod replicas land on the specified node, even though the node is tainted and the Pods have no toleration:
[root@k8s-master ~]# kubectl apply -f test-fixednode-deploy.yaml
deployment.apps/web created
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
web-57495dbb4b-fsn2j 1/1 Running 0 65s 10.244.36.86 k8s-node1 <none> <none>
web-57495dbb4b-kjkx6 1/1 Running 0 65s 10.244.36.83 k8s-node1 <none> <none>
web-57495dbb4b-lmntn 1/1 Running 0 65s 10.244.36.85 k8s-node1 <none> <none>
web-57495dbb4b-tx96l 1/1 Running 0 65s 10.244.36.87 k8s-node1 <none> <none>
[root@k8s-master ~]#
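To restore the environment after these experiments, the taint added earlier can be removed with the removal syntax shown above:
kubectl taint node k8s-node1 gpu=yes:NoSchedule-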
Official documentation references:
Assigning Pods to Nodes -> nodeSelector, affinity https://kubernetes.io/zh/docs/concepts/scheduling-eviction/assign-pod-node/
Taints and Tolerations https://kubernetes.io/zh/docs/concepts/scheduling-eviction/taint-and-toleration/