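Contents

1. Network plugin issues

2. Pods stuck in the Terminating state on deletion

3. Working around an unreachable k8s.gcr.io

4. Cannot reach the apiServer, and fixing mismatched Kubernetes client credentials

5. K8s troubleshooting guide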


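1. Network plugin issues

        Without a network plugin a k8s cluster will not come up properly: pod-to-pod, container-to-container, and pod-to-service traffic all depend on it, and the plugin acts as the bridge between them. A commonly used choice is the flannel network plugin. Deploying it is simple; against a running k8s cluster, run: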

kubectl apply -f kube-flannel.yaml
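
After applying the manifest, it helps to confirm that flannel actually came up. The following is a minimal sketch, not an official procedure: check_flannel is a hypothetical helper name, and the DaemonSet name, namespace, and label are the ones used in the manifest listed in this section, so adjust them if your copy differs.

```shell
#!/bin/sh
# Sketch: confirm that the flannel DaemonSet from kube-flannel.yaml came up.
# Assumed names (from the manifest in this article): DaemonSet kube-flannel-ds,
# namespace kube-system, pod label app=flannel.
check_flannel() {
  # Wait until the DaemonSet reports its pods ready (gives up after 120s).
  kubectl -n kube-system rollout status daemonset/kube-flannel-ds --timeout=120s || return 1
  # One flannel pod should be Running on every Linux node.
  kubectl -n kube-system get pods -l app=flannel -o wide || return 1
  # Nodes normally move from NotReady to Ready once a CNI config is installed.
  kubectl get nodes
}

# Usage: check_flannel
```

If the rollout never completes, kubectl describe on one of the flannel pods is usually the next step.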

 Note: upstream recommends specifying the CNI version in kube-flannel.yaml:

"cniVersion": "0.3.1"

---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
    - configMap
    - secret
    - emptyDir
    - hostPath
  allowedHostPaths:
    - pathPrefix: "/etc/cni/net.d"
    - pathPrefix: "/etc/kube-flannel"
    - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: [ 'NET_ADMIN', 'NET_RAW' ]
  defaultAddCapabilities: [ ]
  requiredDropCapabilities: [ ]
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
    - min: 0
      max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
  - apiGroups: [ 'extensions' ]
    resources: [ 'podsecuritypolicies' ]
    verbs: [ 'use' ]
    resourceNames: [ 'psp.flannel.unprivileged' ]
  - apiGroups:
      - ""
    resources:
      - pods
    verbs:
      - get
  - apiGroups:
      - ""
    resources:
      - nodes
    verbs:
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - nodes/status
    verbs:
      - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
  - kind: ServiceAccount
    name: flannel
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/os
                    operator: In
                    values:
                      - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
        - operator: Exists
          effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
        - name: install-cni-plugin
          #image: flannelcni/flannel-cni-plugin:v1.0.1 for ppc64le and mips64le (dockerhub limitations may apply)
          image: rancher/mirrored-flannelcni-flannel-cni-plugin:v1.0.1
          command:
            - cp
          args:
            - -f
            - /flannel
            - /opt/cni/bin/flannel
          volumeMounts:
            - name: cni-plugin
              mountPath: /opt/cni/bin
        - name: install-cni
          #image: flannelcni/flannel:v0.16.3 for ppc64le and mips64le (dockerhub limitations may apply)
          image: rancher/mirrored-flannelcni-flannel:v0.16.3
          command:
            - cp
          args:
            - -f
            - /etc/kube-flannel/cni-conf.json
            - /etc/cni/net.d/10-flannel.conflist
          volumeMounts:
            - name: cni
              mountPath: /etc/cni/net.d
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
      containers:
        - name: kube-flannel
          #image: flannelcni/flannel:v0.16.3 for ppc64le and mips64le (dockerhub limitations may apply)
          image: rancher/mirrored-flannelcni-flannel:v0.16.3
          command:
            - /opt/bin/flanneld
          args:
            - --ip-masq
            - --kube-subnet-mgr
          resources:
            requests:
              cpu: "100m"
              memory: "50Mi"
            limits:
              cpu: "100m"
              memory: "50Mi"
          securityContext:
            privileged: false
            capabilities:
              add: [ "NET_ADMIN", "NET_RAW" ]
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          volumeMounts:
            - name: run
              mountPath: /run/flannel
            - name: flannel-cfg
              mountPath: /etc/kube-flannel/
            - name: xtables-lock
              mountPath: /run/xtables.lock
      volumes:
        - name: run
          hostPath:
            path: /run/flannel
        - name: cni-plugin
          hostPath:
            path: /opt/cni/bin
        - name: cni
          hostPath:
            path: /etc/cni/net.d
        - name: flannel-cfg
          configMap:
            name: kube-flannel-cfg
        - name: xtables-lock
          hostPath:
            path: /run/xtables.lock
            type: FileOrCreate
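
One detail in the manifest above is worth checking: the Network value in net-conf.json (10.244.0.0/16 here) should agree with the --pod-network-cidr passed to kubeadm init. The sketch below extracts that value from a manifest file and compares it with an expected CIDR. flannel_network and cidr_matches are hypothetical helper names, and the sed pattern assumes the "Network" key sits on a single line as it does above.

```shell
#!/bin/sh
# Sketch: extract flannel's "Network" CIDR from a manifest and compare it with
# the value intended for kubeadm's --pod-network-cidr.
# flannel_network and cidr_matches are hypothetical helper names.

# Print the value of the first "Network" key found in the given file.
flannel_network() {
  sed -n 's/.*"Network"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed only when the manifest's Network equals the expected pod CIDR.
cidr_matches() {
  manifest=$1
  expected=$2
  actual=$(flannel_network "$manifest")
  if [ "$actual" = "$expected" ]; then
    echo "OK: flannel Network is $actual"
  else
    echo "MISMATCH: flannel Network is '$actual', expected '$expected'" >&2
    return 1
  fi
}

# Usage: cidr_matches kube-flannel.yaml 10.244.0.0/16
```

If you use a different pod CIDR, edit net-conf.json in the ConfigMap before applying the manifest.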

2. Pods stuck in the Terminating state on deletion

Resources are deleted with kubectl delete -f <file>.yaml, for example:

kubectl delete -f redis-master.yaml

If the command hangs and the pods stay in Terminating, there is no need to reboot the machine. Open another terminal and restart the kubelet:

systemctl restart kubelet

Once that finishes, wait a moment and the pod and service resources will be gone.
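
To see which pods are still stuck, the STATUS column of kubectl get pods can be filtered. This is a small sketch under the assumption that the table has the usual --all-namespaces column layout; filter_terminating is a hypothetical helper name. The force-delete command in the comments is a different technique from the kubelet restart described above and is mentioned only as a fallback.

```shell
#!/bin/sh
# Sketch: pick out pods whose STATUS column is "Terminating" from the output of
# `kubectl get pods --all-namespaces`. filter_terminating is a hypothetical
# helper; it reads the table on stdin and prints namespace/name pairs.
filter_terminating() {
  # Columns with --all-namespaces: NAMESPACE NAME READY STATUS RESTARTS AGE
  awk 'NR > 1 && $4 == "Terminating" { print $1 "/" $2 }'
}

# Usage:
#   kubectl get pods --all-namespaces | filter_terminating
#
# If a pod is still stuck after restarting the kubelet, kubectl can also remove
# it from the API without waiting for the node to confirm (use with care, the
# container may keep running on the node):
#   kubectl delete pod <pod_name> --grace-period=0 --force
```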

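3. Working around an unreachable k8s.gcr.io

        k8s.gcr.io is the registry Google provides for Kubernetes images, which is painful if your network cannot reach it. In that case we can point kubeadm at the Alibaba Cloud mirror registry.aliyuncs.com/google_containers through --image-repository: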

kubeadm init \
--apiserver-advertise-address=192.168.232.129 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.18.6 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16
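
Before running kubeadm init it can be useful to list and pre-pull the images from the mirror, so that pull failures surface early. A minimal sketch, reusing the repository and version from the command above; prepull_images is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list and pre-pull the control-plane images from the mirror before
# running kubeadm init. REPO and VERSION are the values used in this article.
REPO=registry.aliyuncs.com/google_containers
VERSION=v1.18.6

prepull_images() {
  # Show which images kubeadm would use with this repository and version.
  kubeadm config images list --image-repository "$REPO" --kubernetes-version "$VERSION" || return 1
  # Pull them now so that registry problems show up before kubeadm init runs.
  kubeadm config images pull --image-repository "$REPO" --kubernetes-version "$VERSION"
}

# Usage: prepull_images
```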

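4. Cannot reach the apiServer, and fixing mismatched Kubernetes client credentials

        When initializing the cluster with kubeadm I entered the wrong apiServer address and wanted to change it, so I went looking for where the apiServer configuration lives on the system.

        It turned out to be the kubeconfig file; cat $HOME/.kube/config shows the IP address in use.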
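
Rather than reading the whole file, the server address can be pulled out directly. This is a rough sketch that assumes the usual kubeconfig layout with a "server:" line per cluster entry; kubeconfig_server is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the API server URL(s) recorded in a kubeconfig file.
# kubeconfig_server is a hypothetical helper name.
kubeconfig_server() {
  # Each cluster entry carries a line of the form "server: https://<ip>:<port>".
  awk '$1 == "server:" { print $2 }' "${1:-$HOME/.kube/config}"
}

# Usage:
#   kubeconfig_server                 # reads $HOME/.kube/config
#   kubeconfig_server /etc/kubernetes/admin.conf
#
# kubectl can report the same value for the current context:
#   kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
```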

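        After editing it, I killed all kubectl processes, wiped the etcd database, and re-ran kubeadm init, only to get certificate errors. Editing the file alone does not fix the problem.

        What does work is a full cleanup: remove everything under /etc/kubernetes, the etcd data in /var/lib/etcd, and the contents of $HOME/.kube, then run kubeadm init again. The cleanup session looked like this: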

[root@localhost ~]# rm -rf /etc/kubernetes
[root@localhost ~]# rm -rf /var/lib/etcd
[root@localhost ~]# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0316 17:20:01.413042   36498 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
W0316 17:20:01.619021   36498 cleanupnode.go:81] [reset] Failed to remove containers: exit status 1
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
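
The reset output above lists several things that kubeadm reset leaves behind. They can be collected into one script; the following is only a sketch, with post_reset_cleanup and run as hypothetical helper names. It defaults to a dry run that prints the commands, because flushing iptables also removes rules that have nothing to do with Kubernetes.

```shell
#!/bin/sh
# Sketch: the manual cleanup steps that the kubeadm reset output asks for.
# With DRY_RUN=1 (the default here) the commands are only printed; set
# DRY_RUN=0 and run as root to execute them.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

post_reset_cleanup() {
  # CNI configuration is not removed by kubeadm reset.
  run rm -rf /etc/cni/net.d
  # Flush iptables rules and delete user-defined chains (affects ALL rules).
  run iptables -F
  run iptables -t nat -F
  run iptables -t mangle -F
  run iptables -X
  # Only relevant when the cluster was set up to use IPVS.
  run ipvsadm --clear
  # Stale kubeconfig of the invoking user.
  run rm -rf "$HOME/.kube"
}

# Usage: post_reset_cleanup
```

Review the printed commands first, then rerun with DRY_RUN=0 once they look right for your host.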

As the output says, remove the .kube directory:

[root@localhost ~]# rm -rf .kube/
[root@localhost ~]#

Once kubeadm init has completed, recreate the .kube directory:

mkdir -p $HOME/.kube

sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

sudo chown $(id -u):$(id -g) $HOME/.kube/config

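        That resolves the problem. When the cluster is re-created, a leftover $HOME/.kube/config still carries the previous cluster's certificates, which no longer match the ones kubeadm has just generated.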
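
A quick way to tell whether a kubeconfig is stale is to compare its CA data with the admin.conf that kubeadm generated. A minimal sketch, assuming both files embed the CA as certificate-authority-data (which is what kubeadm writes); kubeconfig_ca and same_cluster_ca are hypothetical helper names, and reading /etc/kubernetes/admin.conf normally requires root.

```shell
#!/bin/sh
# Sketch: check whether two kubeconfig files trust the same cluster CA.
# kubeconfig_ca and same_cluster_ca are hypothetical helper names.

# Print the first certificate-authority-data value found in a kubeconfig.
kubeconfig_ca() {
  awk '$1 == "certificate-authority-data:" { print $2; exit }' "$1"
}

# Succeed when both files carry the same, non-empty CA data.
same_cluster_ca() {
  a=$(kubeconfig_ca "$1")
  b=$(kubeconfig_ca "$2")
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage:
#   if same_cluster_ca "$HOME/.kube/config" /etc/kubernetes/admin.conf; then
#     echo "kubeconfig matches the current cluster"
#   else
#     echo "kubeconfig is stale, copy admin.conf again"
#   fi
```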

5. K8s troubleshooting guide

        If a pod fails to start, the following commands show what state it is in.

kubectl describe pod  <pod_name>

kubectl logs <pod_name>

        Based on the messages reported, check whether the YAML is valid and whether the related configuration is correct.
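
A few more standard kubectl checks can be bundled together for this. The sketch below is one possible arrangement rather than a fixed recipe: debug_pod and validate_manifest are hypothetical helper names, and --dry-run=client assumes kubectl 1.18 or newer.

```shell
#!/bin/sh
# Sketch: a few more standard kubectl checks for a pod that will not start.
# debug_pod and validate_manifest are hypothetical helper names.

debug_pod() {
  pod=$1
  ns=${2:-default}
  # Scheduling, image pull and probe failures show up in the Events section.
  kubectl -n "$ns" describe pod "$pod"
  # Logs of the previous container instance, useful after a crash or restart.
  kubectl -n "$ns" logs "$pod" --previous
  # Events in the namespace, sorted by creation time.
  kubectl -n "$ns" get events --sort-by=.metadata.creationTimestamp
}

validate_manifest() {
  # Client-side dry run: parses and validates the file without creating anything.
  kubectl apply --dry-run=client -f "$1"
}

# Usage:
#   debug_pod <pod_name> [namespace]
#   validate_manifest redis-master.yaml
```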