1. tolerationSeconds experiment
### --- Add a NoExecute toleration that only lets the Pods stay on the node for 60s; verify whether they are evicted after the 60s
~~~ Add a NoExecute toleration (tolerationSeconds: 60) to the demo-nginx Pods: they may stay on the tainted node for only 60s, and after those 60s they are still evicted
[root@k8s-master01 ~]# kubectl edit deploy demo-nginx
tolerations:
- effect: NoSchedule
key: master-test
operator: Equal
value: test
- effect: NoExecute // add this NoExecute toleration entry
key: master-test
operator: Equal
value: test
tolerationSeconds: 60 // once the matching taint is applied, the Pod may stay on the node for only 60s
[root@k8s-master01 ~]# kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
demo-nginx-544b7d8754-6lzg5 2/2 Running 0 14s 172.25.244.233 k8s-master01 <none> <none>
demo-nginx-544b7d8754-mswqx 2/2 Running 0 14s 172.25.244.232 k8s-master01 <none> <none>
demo-nginx-6fddc76f8d-6n88v 0/2 Terminating 0 23m <none> k8s-master01 <none> <none>
demo-nginx-6fddc76f8d-jd65q 0/2 Terminating 0 23m <none> k8s-master01 <none> <none>
[root@k8s-master01 ~]# kubectl taint node k8s-master01 master-test=test:NoExecute
node/k8s-master01 tainted
~~~ # The old Pods have been evicted, but because the Pods still tolerate the taint and the nodeSelector still pins them to k8s-master01, the replacements are deployed onto the same node again
[root@k8s-master01 ~]# kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
demo-nginx-544b7d8754-4sk8n 0/2 ContainerCreating 0 6s <none> k8s-master01 <none> <none>
demo-nginx-544b7d8754-6lzg5 0/2 Terminating 0 114s 172.25.244.233 k8s-master01 <none> <none>
demo-nginx-544b7d8754-ftldc 0/2 ContainerCreating 0 7s <none> k8s-master01 <none> <none>
demo-nginx-544b7d8754-mswqx 0/2 Terminating 0 114s <none> k8s-master01 <none> <none>
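For reference, a minimal sketch of the relevant part of the demo-nginx Pod template at this point, combining the nodeSelector that pins the Pods to k8s-master01 with the two tolerations edited above (other fields omitted; values are the ones used in this experiment):
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: k8s-master01   # pins the Pods to k8s-master01
      tolerations:
      - effect: NoSchedule                     # lets new Pods be scheduled onto the tainted node
        key: master-test
        operator: Equal
        value: test
      - effect: NoExecute                      # lets Pods keep running on the tainted node...
        key: master-test
        operator: Equal
        value: test
        tolerationSeconds: 60                  # ...but only for 60s after the taint is applied
This is why the replacement Pods keep landing back on k8s-master01: they tolerate the taint, and the nodeSelector leaves them nowhere else to go.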
### --- Remove the nodeSelector so the Pods can drift to other nodes
~~~ Note: all the Pods on k8s-master01 are evicted
~~~ and drift to the k8s-master02 and k8s-node02 nodes.
[root@k8s-master01 ~]# kubectl edit deploy demo-nginx
nodeSelector: // delete these 2 lines
kubernetes.io/hostname: k8s-master01
[root@k8s-master01 ~]# kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
demo-nginx-544b7d8754-kjj24 0/2 Terminating 0 57s <none> k8s-master01 <none> <none>
demo-nginx-544b7d8754-qmpmw 0/2 Terminating 0 56s <none> k8s-master01 <none> <none>
demo-nginx-85fb8dfb75-2wvdz 2/2 Running 0 30s 172.25.92.84 k8s-master02 <none> <none>
demo-nginx-85fb8dfb75-wr9gd 2/2 Running 0 29s 172.27.14.205 k8s-node02 <none> <none>
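The same nodeSelector removal can also be done non-interactively instead of via kubectl edit; a sketch using a JSON patch (assuming nodeSelector holds only that single hostname entry):
kubectl patch deploy demo-nginx --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/nodeSelector"}]'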
2. Taint k8s-node02 so the Pods drift to other nodes
### --- First remove all the master-test taints from k8s-master01
[root@k8s-master01 ~]# kubectl taint node k8s-master01 master-test-
node/k8s-master01 untainted
[root@k8s-master01 ~]# kubectl describe node k8s-master01
Taints: <none>
### --- Add the NoExecute taint to k8s-node02
[root@k8s-master01 ~]# kubectl taint node k8s-node02 master-test=test:NoExecute
node/k8s-node02 tainted
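Before waiting out the 60s, the new taint can be confirmed on k8s-node02 the same way it was checked on k8s-master01 above; for example:
kubectl describe node k8s-node02 | grep Taints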
### --- Verify whether the containers on k8s-node02 are evicted
~~~ Note: they are not evicted yet; wait 60s and then check whether they have been evicted
[root@k8s-master01 ~]# kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox 1/1 Running 0 28m 172.17.125.20 k8s-node01 <none> <none>
demo-nginx-85fb8dfb75-2wvdz 2/2 Running 0 33m 172.25.92.84 k8s-master02 <none> <none>
demo-nginx-85fb8dfb75-wr9gd 2/2 Running 0 33m 172.27.14.205 k8s-node02 <none> <none>
~~~ # The containers are not evicted immediately; after waiting 60s, check again:
~~~ The container has been evicted from k8s-node02 and drifted to the k8s-master01 node
[root@k8s-master01 ~]# kubectl get po -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox 1/1 Running 0 29m 172.17.125.20 k8s-node01 <none> <none>
demo-nginx-85fb8dfb75-22pnm 0/2 ContainerCreating 0 2s <none> k8s-master01 <none> <none>
demo-nginx-85fb8dfb75-2wvdz 2/2 Running 0 34m 172.25.92.84 k8s-master02 <none> <none>
demo-nginx-85fb8dfb75-wr9gd 2/2 Terminating 0 34m 172.27.14.205 k8s-node02 <none> <none>
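To see the 60s delay as it happens, rather than polling, the Pods can also be watched continuously; for example:
kubectl get po -owide -w      # --watch streams updates as the Pods are evicted and recreated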
3. Admission control
### --- View the admission-added tolerations
~~~ Every Pod automatically receives 2 extra tolerations after it is deployed.
~~~ If a node goes into an abnormal state, the k8s-master taints that node,
~~~ for example with an unreachable taint.
[root@k8s-master01 ~]# kubectl describe po demo-nginx-85fb8dfb75-22pnm
Tolerations: master-test=test:NoSchedule
master-test=test:NoExecute for 60s
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
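These two 300s entries are injected into every Pod that does not declare them itself by the DefaultTolerationSeconds admission plugin; the cluster-wide defaults can be tuned with the corresponding kube-apiserver flags (300 is the default value):
--default-not-ready-toleration-seconds=300
--default-unreachable-toleration-seconds=300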
### --- View a node in an abnormal state
~~~ Once the k8s-node01 node becomes abnormal,
~~~ the k8s-master taints it with unreachable, with the effects NoExecute and NoSchedule.
~~~ NoSchedule: stop scheduling new Pods onto the node; NoExecute: evict the Pods that are already running on it;
[root@k8s-master01 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master01 Ready <none> 19d v1.20.0
k8s-master02 Ready <none> 19d v1.20.0
k8s-master03 Ready <none> 19d v1.20.0
k8s-node01 NotReady <none> 19d v1.20.0
k8s-node02 Ready <none> 19d v1.20.0
[root@k8s-master01 ~]# kubectl describe node k8s-node01
Taints:             node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unreachable:NoSchedule
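How k8s-node01 was made NotReady is not shown in this run; a common way to reproduce the state in a lab (an assumption here, not part of the original steps) is simply stopping the kubelet on that node:
systemctl stop kubelet     # run on k8s-node01; the node stops reporting status and is marked NotReady
systemctl start kubelet    # bring it back to Ready afterwards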
### --- Tolerations on the Pod after it has been deployed
[root@k8s-master01 ~]# kubectl describe po demo-nginx-85fb8dfb75-22pnm
Tolerations: master-test=test:NoSchedule
master-test=test:NoExecute for 60s
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
### --- Normally, after a Pod has been deployed, the k8s-master adds 2 tolerations to it,
~~~ i.e. the two not-ready/unreachable entries shown on the Pod above,
~~~ which let the Pod tolerate an abnormal node state for 300s.
~~~ This is to prevent node-state flapping caused by network jitter from getting the Pods killed by mistake.
~~~ # That is, the network and the services are actually fine,
~~~ but because of a network blip the node fails to report its status to the master node in time,
~~~ and the master node marks it not-ready.
~~~ # Since the node was marked not-ready only because the status report was lost to network jitter,
~~~ marking it not-ready immediately and immediately applying a NoExecute taint
~~~ would evict the Pods right away, and the network might recover just after they were evicted.
~~~ So the controller waits: only if the node has not recovered its status within 300s are the Pods evicted, after those 300s;
~~~ the eviction is performed by the k8s-master (node controller); worker nodes do not have this function.
### --- If your availability requirements are very high, you can set this toleration time much smaller,
~~~ # e.g. allow 30s for a network blip: if the node has not recovered within 30s, the Pods drift away immediately; define it according to your own needs
~~~ Do not set it too low or too high: too low risks false evictions on brief blips; too high leaves Pods stranded on a failed node and reduces the number of available replicas.
~~~ So set it as needed; it is usually set somewhere within 10~60s, not much longer.
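A minimal sketch of overriding those defaults per workload, tolerating an abnormal node for only 30s instead of 300s (goes under the Pod template's spec; the 30s value is just an example):
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 30      # evicted 30s after the node is marked not-ready
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 30      # evicted 30s after the node becomes unreachable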
4. The node.kubernetes.io/not-ready taint
### --- When a node is newly added and its Ready condition is not yet true,
~~~ it carries the node.kubernetes.io/not-ready taint
### --- In addition, Kubernetes 1.6 introduced representing node problems as taints: when specific conditions are met,
~~~ the node controller automatically adds the matching taint to the node. The built-in taints include:
~~~ node.kubernetes.io/not-ready // the node is not ready; corresponds to the node condition Ready=False
~~~ node.kubernetes.io/unreachable // the node controller cannot reach the node; corresponds to Ready=Unknown
~~~ node.kubernetes.io/out-of-disk // the node is out of disk space
~~~ node.kubernetes.io/memory-pressure // the node is under memory pressure
~~~ node.kubernetes.io/disk-pressure // the node is under disk pressure
~~~ node.kubernetes.io/network-unavailable // the node's network is unavailable
~~~ node.kubernetes.io/unschedulable // the node cannot be scheduled to (cordoned)
~~~ node.cloudprovider.kubernetes.io/uninitialized // the node was created by an external cloud provider and has not yet been initialized by the cloud-controller-manager
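A quick way to check which of these built-in taints are currently present across the cluster (custom-columns output; the column names are arbitrary):
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'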