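In the previous section's example we used an Init container to check whether a service name could be resolved. That is not a great approach: a check that succeeds in the Init container says nothing about whether the application container will succeed later. It is better to have a dedicated mechanism that keeps checking for as long as the container runs, and that mechanism is the probe, the subject of this section.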



Contents

  • Probes
  • Probe examples
  • readinessProbe in practice
  • livenessProbe in practice
  • Combining both probes
  • Summary


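Probes

A probe is a periodic diagnostic that kubelet performs on a container. Note that a probe targets a container, not a pod.

A probe can perform one of the following three actions:

  • ExecAction
    Runs the specified command inside the container. The probe succeeds if the command exits with status 0.
  • TCPSocketAction
    Performs a TCP check against the specified port on the container's IP address, much like a telnet connection attempt. The probe succeeds if the port is open.
  • HTTPGetAction
    Sends an HTTP GET request to the container's IP address on the specified port and path. The probe succeeds if the response status code is at least 200 and below 400.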
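
As a rough illustration, the three actions map onto the exec, tcpSocket, and httpGet fields of a probe. The fragments below are separate snippets of a container spec, not complete manifests. The command, path, and port values are placeholders chosen for illustration and do not come from the examples in this article.

```yaml
# Fragment 1: ExecAction. Succeeds when the command exits with status 0.
livenessProbe:
  exec:
    command: ['cat', '/tmp/healthy']   # placeholder command
---
# Fragment 2: TCPSocketAction. Succeeds when the TCP connection opens.
livenessProbe:
  tcpSocket:
    port: 80                           # placeholder port
---
# Fragment 3: HTTPGetAction. Succeeds on a status code from 200 to 399.
livenessProbe:
  httpGet:
    path: /healthz                     # placeholder path
    port: 80                           # placeholder port
```

A probe carries exactly one of these handlers, and the same three fields are available under readinessProbe as well.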

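Each probe run yields one of three results:

  • Success: the container passed the diagnostic
  • Failure: the container failed the diagnostic
  • Unknown: the diagnostic itself could not be completed

Probes also fall into two types, according to what the result is used for: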

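  • readinessProbe

The readiness probe indicates whether the container is ready to serve requests. If it fails, the pod is removed from the endpoints of every service that matches it. If a container defines no readiness probe, the result defaults to Success. This probe determines the container's ready state (the READY column).

  • livenessProbe

The liveness probe indicates whether the container is still running properly. If it fails, kubelet kills the container, and the pod's restartPolicy decides whether the container is restarted. If a container defines no liveness probe, the result defaults to Success. Its effect shows up in the pod's STATUS and RESTARTS columns.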
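
Because the outcome of a failed liveness probe depends on restartPolicy, it may help to see where that field lives: it is set once at the pod level, not per container. The manifest below is only a sketch. The pod name is made up, and the probe values are placeholders rather than settings taken from this article.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restart-policy-demo      # hypothetical name
spec:
  restartPolicy: OnFailure       # one of Always (the default), OnFailure, Never
  containers:
    - name: mynginx
      image: nginx
      livenessProbe:
        tcpSocket:
          port: 80               # nginx listens here, so this probe should pass
        periodSeconds: 3
```

With Never, a container killed after a failed liveness probe is not started again.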

Probe examples

The following walks through a hands-on example for each of the two probe types.

All files used below are hosted on GitHub:
https://github.com/Victor2Code/centos-k8s-init/tree/master/test%20yaml/Pod%20Lifecycle%20-%20Probe

readinessProbe in practice

Create a pod with a readiness probe from the file test-readiness-httpget.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: test-readiness-httpget
  namespace: default
  labels:
    app: myapp
    version: v1
spec:
  containers:
    - name: mynginx
      image: nginx
      imagePullPolicy: IfNotPresent
      readinessProbe:
        httpGet:
          port: 80
          path: /fake_index.html
        initialDelaySeconds: 1
        periodSeconds: 3

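The main container uses the nginx image. If a request to pod_ip:80/fake_index.html returns a status code outside the 200 to 399 range, the container is considered not ready.

Create the pod and check its status: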

[root@k8s-master k8s-test]# kubectl apply -f test-readiness-httpget.yaml
pod/test-readiness-httpget created
[root@k8s-master k8s-test]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE    IP            NODE        NOMINATED NODE   READINESS GATES
curl-6bf6db5c4f-kljp4    1/1     Running   1          2d2h   10.244.1.2    k8s-node1   <none>           <none>
hellok8s                 2/2     Running   0          31h    10.244.1.6    k8s-node1   <none>           <none>
test-init-main           1/1     Running   6          9h     10.244.1.8    k8s-node1   <none>           <none>
test-readiness-httpget   0/1     Running   0          6s     10.244.1.10   k8s-node1   <none>           <none>

The pod's status is Running, but its container is not yet ready (READY shows 0/1). Look at the pod's details:

[root@k8s-master k8s-test]# kubectl describe pod test-readiness-httpget
Name:         test-readiness-httpget
Namespace:    default
Priority:     0
Node:         k8s-node1/172.29.56.176
Start Time:   Thu, 30 Apr 2020 18:48:41 +0800
Labels:       app=myapp
              version=v1
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"labels":{"app":"myapp","version":"v1"},"name":"test-readiness-httpget","name...
Status:       Running
IP:           10.244.1.10
Containers:
  mynginx:
    Container ID:   docker://272b07aca3c7b9900f6fe91d7418c03315d5f079553c4108f20cd36aadc48f65
    Image:          nginx
    Image ID:       docker-pullable://nginx@sha256:86ae264c3f4acb99b2dee4d0098c40cb8c46dcf9e1148f05d3a51c4df6758c12
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Thu, 30 Apr 2020 18:48:42 +0800
    Ready:          False
    Restart Count:  0
    Readiness:      http-get http://:80/fake_index.html delay=1s timeout=1s period=3s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-hln8x (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-hln8x:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-hln8x
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                             From                Message
  ----     ------     ----                            ----                -------
  Normal   Scheduled  2m33s                           default-scheduler   Successfully assigned default/test-readiness-httpget to k8s-node1
  Normal   Pulled     <invalid>                       kubelet, k8s-node1  Container image "nginx" already present on machine
  Normal   Created    <invalid>                       kubelet, k8s-node1  Created container mynginx
  Normal   Started    <invalid>                       kubelet, k8s-node1  Started container mynginx
  Warning  Unhealthy  <invalid> (x22 over <invalid>)  kubelet, k8s-node1  Readiness probe failed: HTTP probe failed with statuscode: 404

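The Events section at the bottom shows that the readiness probe received a 404 response and therefore failed, which keeps the container out of the ready state.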
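
The Readiness line in the describe output also lists values that the manifest never set: timeout=1s, #success=1, and #failure=3. These are the defaults that apply when the fields are omitted. As a sketch, the same probe with those defaults written out explicitly would look like this (a fragment of the container spec, not a complete manifest):

```yaml
readinessProbe:
  httpGet:
    port: 80
    path: /fake_index.html
  initialDelaySeconds: 1   # wait 1s after the container starts
  periodSeconds: 3         # probe every 3s
  timeoutSeconds: 1        # default: each attempt times out after 1s
  successThreshold: 1      # default: 1 success marks the container ready
  failureThreshold: 3      # default: 3 consecutive failures mark it not ready
```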

Now enter the container in the pod and create the target file by hand:

[root@k8s-master k8s-test]# kubectl exec test-readiness-httpget -it -- /bin/bash
root@test-readiness-httpget:/# echo "hello fake index" > /usr/share/nginx/html/fake_index.html
root@test-readiness-httpget:/# exit
exit

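The general form for running a command in a container is kubectl exec <pod_name> -c <container_name> -it -- <command>. The container name can be omitted when the pod has only one container. Here the command starts bash inside the container, and the target file is then created from that shell.

Checking again shows that the container is now ready, and the content of the file just added can be fetched: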

[root@k8s-master k8s-test]# kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE     IP            NODE        NOMINATED NODE   READINESS GATES
curl-6bf6db5c4f-kljp4    1/1     Running   1          2d2h    10.244.1.2    k8s-node1   <none>           <none>
hellok8s                 2/2     Running   0          31h     10.244.1.6    k8s-node1   <none>           <none>
test-init-main           1/1     Running   6          9h      10.244.1.8    k8s-node1   <none>           <none>
test-readiness-httpget   1/1     Running   0          6m18s   10.244.1.10   k8s-node1   <none>           <none>
[root@k8s-master k8s-test]# curl 10.244.1.10/fake_index.html
hello fake index

livenessProbe in practice

Create a pod with a liveness probe from the file test-liveness-exec.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: test-liveness-exec
  namespace: default
  labels:
    app: myapp
    version: v1
spec:
  containers:
    - name: mynginx
      image: nginx
      imagePullPolicy: IfNotPresent
      command: ['sh','-c','touch /tmp/test; sleep 20; rm -rf /tmp/test']
      livenessProbe:
        exec:
          command: ['test','-e','/tmp/test']
        initialDelaySeconds: 1
        periodSeconds: 3

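This again uses the nginx image, but the probe now uses the exec method, which runs a command inside the container. The command field overrides the image's default entrypoint, so instead of starting nginx the container creates /tmp/test, sleeps for 20 seconds, and then deletes the file. The liveness probe succeeds only while that file exists.

Start the pod and watch its status continuously with the -w flag: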

[root@k8s-master k8s-test]# kubectl apply -f test-liveness-exec.yaml
pod/test-liveness-exec created
[root@k8s-master k8s-test]# kubectl get pod -o wide -w
NAME                    READY   STATUS    RESTARTS   AGE    IP            NODE        NOMINATED NODE   READINESS GATES
curl-6bf6db5c4f-kljp4   1/1     Running   1          2d3h   10.244.1.2    k8s-node1   <none>           <none>
hellok8s                2/2     Running   0          32h    10.244.1.6    k8s-node1   <none>           <none>
test-init-main          1/1     Running   7          10h    10.244.1.8    k8s-node1   <none>           <none>
test-liveness-exec      1/1     Running   0          7s     10.244.1.11   k8s-node1   <none>           <none>
test-liveness-exec      0/1     Completed   0          22s    10.244.1.11   k8s-node1   <none>           <none>
test-liveness-exec      1/1     Running     1          23s    10.244.1.11   k8s-node1   <none>           <none>

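At around 22 seconds the container shows Completed and is then restarted. Note what Completed means here: the sh -c command ends right after rm, so the container's main process exits on its own with status 0 as soon as the file is deleted. The restart in this output therefore comes from the pod's restartPolicy (Always by default) rather than from the liveness probe, which needs three consecutive failures at a 3-second period before kubelet kills the container. After the restart a new /tmp/test is created, and the same cycle repeats.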
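
In the manifest above the sh -c command ends right after rm, so the container exits by itself at about the moment the file disappears, which makes it hard to tell the probe's effect apart from a normal exit. One way to isolate the probe, offered as a sketch only, is to keep the process alive after the file is removed. The pod name and the trailing sleep 600 are assumptions added for illustration and are not part of the original repository.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-liveness-exec-keepalive   # hypothetical name, not in the repo
  namespace: default
spec:
  containers:
    - name: mynginx
      image: nginx
      imagePullPolicy: IfNotPresent
      # Same as before, plus a trailing sleep so the process outlives the file.
      command: ['sh','-c','touch /tmp/test; sleep 20; rm -rf /tmp/test; sleep 600']
      livenessProbe:
        exec:
          command: ['test','-e','/tmp/test']
        initialDelaySeconds: 1
        periodSeconds: 3
```

With this variant the container should keep running after the file is removed, so a restart would have to come from the liveness probe, and kubectl describe pod should then report the failed liveness checks in its Events section.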

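The httpGet method was already shown in the readiness example above, so it is not repeated here. Next comes the remaining method, tcpSocket.

Create a pod with a liveness probe from the file test-liveness-tcpsocket.yaml: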

apiVersion: v1
kind: Pod
metadata:
  name: test-liveness-tcpsocket
  namespace: default
  labels:
    app: myapp
    version: v1
spec:
  containers:
    - name: mynginx
      image: nginx
      imagePullPolicy: IfNotPresent
      livenessProbe:
        tcpSocket:
          port: 8080
        initialDelaySeconds: 1
        periodSeconds: 3
        timeoutSeconds: 5

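This checks whether port 8080 of the container accepts TCP connections. The probe runs every 3 seconds (periodSeconds), and an attempt that gets no response within 5 seconds (timeoutSeconds) counts as a failure. Since the nginx image listens on port 80 by default and not on 8080, the probe is expected to fail.

Start the pod and again watch it with -w: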

[root@k8s-master k8s-test]# kubectl apply -f test-liveness-tcpsocket.yaml
pod/test-liveness-tcpsocket created
[root@k8s-master k8s-test]# kubectl get pod -o wide -w
NAME                      READY   STATUS    RESTARTS   AGE    IP            NODE        NOMINATED NODE   READINESS GATES
curl-6bf6db5c4f-kljp4     1/1     Running   1          2d3h   10.244.1.2    k8s-node1   <none>           <none>
hellok8s                  2/2     Running   0          32h    10.244.1.6    k8s-node1   <none>           <none>
test-init-main            1/1     Running   8          10h    10.244.1.8    k8s-node1   <none>           <none>
test-liveness-tcpsocket   1/1     Running   1          13s    10.244.1.12   k8s-node1   <none>           <none>
test-liveness-tcpsocket   1/1     Running   2          21s    10.244.1.12   k8s-node1   <none>           <none>
test-liveness-tcpsocket   0/1     CrashLoopBackOff   2          30s    10.244.1.12   k8s-node1   <none>           <none>
test-liveness-tcpsocket   1/1     Running            3          42s    10.244.1.12   k8s-node1   <none>           <none>

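The container is restarted roughly every 8 seconds, which is consistent with three consecutive failed probes (the default failureThreshold) at a 3-second period after the 1-second initial delay. When the container is killed for the third time, the pod shows CrashLoopBackOff, meaning kubelet waits for a back-off delay before restarting it again.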

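Combining both probes

A readinessProbe and a livenessProbe can of course both be configured on the same container. They work independently of each other but take effect together. This is not demonstrated separately here; an example yaml file looks like this: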

apiVersion: v1
kind: Pod
metadata:
  name: test-readiness-liveness
  namespace: default
  labels:
    app: myapp
    version: v1
spec:
  containers:
    - name: mynginx
      image: nginx
      imagePullPolicy: IfNotPresent
      readinessProbe:
        httpGet:
          port: 80
          path: /fake_index.html
        initialDelaySeconds: 1
        periodSeconds: 3
      livenessProbe:
        tcpSocket:
          port: 8080
        initialDelaySeconds: 1
        periodSeconds: 3
        timeoutSeconds: 5

Here the httpGet check decides whether the container is in the ready state, while the check on port 8080 decides whether the container gets restarted.
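
One caveat: the stock nginx image listens on port 80, not 8080, so with the manifest above the liveness probe would be expected to fail and restart the container repeatedly, as in the tcpSocket example. If the goal is a pod that stays up and only waits for fake_index.html before becoming ready, a variant along the following lines may be more useful. The pod name is hypothetical, and pointing the liveness check at port 80 is an assumption about what you want to monitor.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-readiness-liveness-80   # hypothetical name, not in the repo
  namespace: default
  labels:
    app: myapp
    version: v1
spec:
  containers:
    - name: mynginx
      image: nginx
      imagePullPolicy: IfNotPresent
      readinessProbe:                # ready only once the file is served
        httpGet:
          port: 80
          path: /fake_index.html
        initialDelaySeconds: 1
        periodSeconds: 3
      livenessProbe:                 # restart only if nginx stops accepting connections
        tcpSocket:
          port: 80                   # assumption: monitor the port nginx actually uses
        initialDelaySeconds: 1
        periodSeconds: 3
        timeoutSeconds: 5
```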

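Summary

With Init containers and probes covered, the only parts of the container lifecycle left are the start and exit actions, which we will study together in the next section.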