目录
1. 网络插件问题
2. 删除容器时一直处于Terminting状态
3.无法访问k8s.gcr.io问题解决
4.无法访问apiServer以及解决kubernetes客户端秘钥不匹配的问题
5. K8s 排错指南
1. 网络插件问题
没有网络插件,k8s的集群就搭不起来,比如pod间的通信、k8s容器间的通信、pod与service间的通信,网络插件就是他们通信的网桥,我们可以使用官方提供的fannel网络插件, 部署网络fannel插件很简单,只需要使用正在run 的k8s 执行以下命令即可:
kubectl apply -f kube-fannel.yaml
注: 官方建议需要提供cni的version在kube-fannel.yaml里:
"cniVersion": "0.3.1"
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: psp.flannel.unprivileged
annotations:
seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
privileged: false
volumes:
- configMap
- secret
- emptyDir
- hostPath
allowedHostPaths:
- pathPrefix: "/etc/cni/net.d"
- pathPrefix: "/etc/kube-flannel"
- pathPrefix: "/run/flannel"
readOnlyRootFilesystem: false
# Users and groups
runAsUser:
rule: RunAsAny
supplementalGroups:
rule: RunAsAny
fsGroup:
rule: RunAsAny
# Privilege Escalation
allowPrivilegeEscalation: false
defaultAllowPrivilegeEscalation: false
# Capabilities
allowedCapabilities: [ 'NET_ADMIN', 'NET_RAW' ]
defaultAddCapabilities: [ ]
requiredDropCapabilities: [ ]
# Host namespaces
hostPID: false
hostIPC: false
hostNetwork: true
hostPorts:
- min: 0
max: 65535
# SELinux
seLinux:
# SELinux is unused in CaaSP
rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: flannel
rules:
- apiGroups: [ 'extensions' ]
resources: [ 'podsecuritypolicies' ]
verbs: [ 'use' ]
resourceNames: [ 'psp.flannel.unprivileged' ]
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- ""
resources:
- nodes
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes/status
verbs:
- patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: flannel
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: flannel
subjects:
- kind: ServiceAccount
name: flannel
namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: flannel
namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
name: kube-flannel-cfg
namespace: kube-system
labels:
tier: node
app: flannel
data:
cni-conf.json: |
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
net-conf.json: |
{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds
namespace: kube-system
labels:
tier: node
app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
hostNetwork: true
priorityClassName: system-node-critical
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni-plugin
#image: flannelcni/flannel-cni-plugin:v1.0.1 for ppc64le and mips64le (dockerhub limitations may apply)
image: rancher/mirrored-flannelcni-flannel-cni-plugin:v1.0.1
command:
- cp
args:
- -f
- /flannel
- /opt/cni/bin/flannel
volumeMounts:
- name: cni-plugin
mountPath: /opt/cni/bin
- name: install-cni
#image: flannelcni/flannel:v0.16.3 for ppc64le and mips64le (dockerhub limitations may apply)
image: rancher/mirrored-flannelcni-flannel:v0.16.3
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
#image: flannelcni/flannel:v0.16.3 for ppc64le and mips64le (dockerhub limitations may apply)
image: rancher/mirrored-flannelcni-flannel:v0.16.3
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: [ "NET_ADMIN", "NET_RAW" ]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
- name: xtables-lock
mountPath: /run/xtables.lock
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni-plugin
hostPath:
path: /opt/cni/bin
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
name: kube-flannel-cfg
- name: xtables-lock
hostPath:
path: /run/xtables.lock
type: FileOrCreate
2. 删除容器时一直处于Terminting状态
我们可以使用kubectl delete -f .yaml执行命令
kubectl delete -f redis-master.yaml
卡住后,这个时候不要去重启电脑,再启一个终端执行命令systemctl restart kubelet 即可。
systemctl restart kubelet
执行完毕后,等待一会就会发现pod和service相关的服务都删掉了。
3.无法访问k8s.gcr.io问题解决
k8s.gcr.io是谷歌提供的k8s镜像云,不能访问外网的小伙伴就痛苦了~, 我们可以使用国内阿里云镜像registry.aliyuncs.com/google_containers
kubeadm init \
--apiserver-advertise-address=192.168.232.129 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.18.6 \
--service-cidr=10.96.0.0/12 \
--pod-network-cidr=10.244.0.0/16
4.无法访问apiServer以及解决kubernetes客户端秘钥不匹配的问题
当我在用kubeadm初始化集群的时候, 填的apiServer地址填错了,想修改,然后我就想法设法的找到apiServer这个config配置在系统的哪个位置。
最终找到了位置, 可使用 cat $HOME/.kube/config 查看ip地址:
修改后,清楚掉所有kubectl 所有进程后以及etcd数据库后,重新使用Kubeadm初始化集群,报错证书错误的相关问题,此方法不能彻底解决。
我们只需要将 $HOME/.kube 目录里的内容删掉,包含etcd以及/etc/kubernetes目录下的所有内容清掉后,重新执行kubeadmin init 命令即可,执行成功后,执行如下命令:
[root@localhost ~]# rm -rf /etc/kubernetes
[root@localhost ~]# rm -rf /var/lib/etcd
[root@localhost ~]# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0316 17:20:01.413042 36498 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
W0316 17:20:01.619021 36498 cleanupnode.go:81] [reset] Failed to remove containers: exit status 1
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
提示删除 .kube目录:
[root@localhost ~]# rm -rf .kube/
[root@localhost ~]#
等使用kubeadm初始化完毕后,重新创建.kube目录:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
上述问题即可解决,因为重新创建集群时,.kube的存在会让kubernetes客户端的证书生成有误。
5. K8s 排错指南
如果pod没有启动成功,可以使用如下命令查看pod的状况。
kubectl describe pod <pod_name>
kubectl logs <pod_name>
根据提示信息,检查yaml是否编译通过,相关配置是否正确。