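KubeSphere Service Mesh
Official description:
KubeSphere Service Mesh is built on Istio and visualizes microservice governance and traffic management. It offers a powerful toolkit that includes circuit breaking, blue-green deployment, canary release, traffic mirroring, tracing, observability, and traffic control. KubeSphere Service Mesh supports microservice governance without code changes, which helps developers get started quickly and greatly flattens the Istio learning curve. All features of KubeSphere Service Mesh are designed to meet users' business needs.
As of version 3.3.0, KubeSphere does not yet support creating grayscale release strategies for multi-cluster apps. A single cluster, or the simplest All-in-one virtual machine, works fine. If you need multi-cluster grayscale release, KubeSphere is not an option for now: you will have to build the K8S cluster, Istio, and the related components yourself and work out your own solution.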
Istio
Chinese official site: https://istio.io/latest/zh/
Istio is a widely used service mesh component. Its main features include load balancing and traffic management.
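Grayscale release
KubeSphere official overview: https://kubesphere.com.cn/docs/v3.3/project-user-guide/grayscale-release/overview/
Grayscale release means replacing an old version of a microservice with a new one through one of several release strategies, so that problems met during the upgrade carry less risk and the impact on the prod environment is kept to a minimum.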
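Blue-green deployment
KubeSphere official introduction: https://kubesphere.com.cn/docs/v3.3/project-user-guide/grayscale-release/blue-green-deployment/
Blue-green deployment creates an identical standby environment and runs the new app version in it, which provides an efficient way to release new versions without downtime or service interruption. With this approach, KubeSphere routes all traffic to one of the versions, so only one environment receives traffic at any given time. If anything goes wrong with the new build, you can roll back to the previous version immediately.
This strategy is easy to understand: the old version is kept as a backup, and if the new version proves unstable or falls short on functionality or performance, traffic is switched back to the old version right away. Switching traffic routing in Istio is far quicker than manually redeploying and bringing the service back online.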
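The idea can be sketched in a few lines of Python. This is only a toy model of the switching logic, not how KubeSphere or Istio implement it; the class name BlueGreenRouter and the version names v1 and v2 are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    """Toy router: 100% of requests go to exactly one deployed version."""

    versions: Dict[str, Callable[[str], str]]  # version name -> request handler
    live: str  # name of the version currently receiving all traffic

    def switch_to(self, name: str) -> str:
        """Make `name` the live version; return the previous one for rollback."""
        if name not in self.versions:
            raise KeyError(f"unknown version: {name}")
        previous, self.live = self.live, name
        return previous

    def handle(self, request: str) -> str:
        """Dispatch the request to whichever version is live right now."""
        return self.versions[self.live](request)


router = BlueGreenRouter(
    versions={
        "v1": lambda req: f"v1 handled {req}",
        "v2": lambda req: f"v2 handled {req}",
    },
    live="v1",
)
previous = router.switch_to("v2")  # cut over: all traffic now goes to v2
print(router.handle("GET /"))
router.switch_to(previous)         # rollback: all traffic goes back to v1
print(router.handle("GET /"))
```

Both versions stay deployed the whole time; the cut-over and the rollback are just a change of the routing target, which is why they are fast.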
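Canary release
KubeSphere official introduction: https://kubesphere.com.cn/docs/v3.3/project-user-guide/grayscale-release/canary-release/
Canary deployment slowly rolls changes out to a small subset of users, which keeps the risk of a version upgrade to a minimum. Specifically, you can define on a highly responsive dashboard that the new app version is exposed to a portion of production traffic. After the canary deployment is executed, KubeSphere also monitors requests and provides a visualized view of real-time traffic. Throughout the process, you can analyze the behavior of the new app version and gradually increase the share of traffic sent to it. Once you are confident in the build, you can route all traffic to it.
This strategy resembles the closed alpha, closed beta, open beta, and official launch stages of an online game: a select group of users tries the new version first, then the test audience is widened step by step until the release runs stably.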
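A weighted traffic split of this kind can be simulated as follows. This is a minimal sketch with made-up weights and a seeded random generator. The function names pick_version and simulate are hypothetical, and Istio's real proxy-level load balancing is more involved than a single weighted draw.

```python
import random
from collections import Counter
from typing import Dict


def pick_version(weights: Dict[str, int], rng: random.Random) -> str:
    """Pick one version name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights: Dict[str, int], requests: int, seed: int = 0) -> Counter:
    """Count how many of `requests` simulated requests land on each version."""
    rng = random.Random(seed)
    return Counter(pick_version(weights, rng) for _ in range(requests))


# Gradually shift traffic from the old version (v1) to the canary (v2).
for stage in ({"v1": 95, "v2": 5}, {"v1": 50, "v2": 50}, {"v1": 0, "v2": 100}):
    print(stage, dict(simulate(stage, 10_000)))
```

Each stage corresponds to one adjustment of the traffic ratio; the final stage, with all weight on v2, is the point where the canary has fully replaced the old version.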
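Traffic mirroring
KubeSphere official introduction: https://kubesphere.com.cn/docs/v3.3/project-user-guide/grayscale-release/traffic-mirroring/
Traffic mirroring copies live production traffic and sends it to a mirror service. By default, KubeSphere mirrors all traffic; you can also specify a value to define the percentage of traffic to mirror manually. Common use cases include:
- Testing a new app version. You can compare the real-time output of mirrored traffic and production traffic.
- Testing a cluster. You can use the production traffic of instances for cluster testing.
- Testing a database. You can use an empty database to store and load data.
This strategy sends a copy of the same prod traffic to a mirror service, similar to one-to-many message fan-out in an MQ. It does consume network bandwidth, CPU time, memory, disk, and other resources, but as long as resources are sufficient and no performance bottleneck is hit, the prod environment is not affected. Once the same requests are forwarded to the mirror service, prod data can be used for functional testing and performance stress testing.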
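The key property, that the mirror can never change what the caller sees, can be illustrated with a small Python sketch. The function handle_with_mirror and its parameters are hypothetical. A real mesh sends the mirrored copy asynchronously and discards its response, whereas this toy version calls the mirror inline for simplicity.

```python
import random
from typing import Callable


def handle_with_mirror(
    request: str,
    primary: Callable[[str], str],
    mirror: Callable[[str], object],
    mirror_percent: float,
    rng: random.Random,
) -> str:
    """Serve the request from `primary`; copy a share of requests to `mirror`."""
    if rng.random() * 100 < mirror_percent:
        try:
            mirror(request)  # the mirror's response is ignored
        except Exception:
            pass  # a failing mirror must not affect the production response
    return primary(request)


mirrored = []
rng = random.Random(0)
for i in range(5):
    reply = handle_with_mirror(
        f"req-{i}", lambda r: f"prod:{r}", mirrored.append, 100, rng
    )
    print(reply)
print("mirror received", len(mirrored), "requests")
```

Setting mirror_percent below 100 corresponds to mirroring only a percentage of the traffic, as described above.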
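Enabling the Istio service mesh
KubeKey README (Chinese) on GitHub: https://github.com/kubesphere/kubekey/blob/master/README_zh-CN.md
KubeSphere official docs: https://kubesphere.com.cn/docs/v3.3/installing-on-linux/introduction/multioverview/
The configuration can be changed before KubeSphere is installed, so that Istio is normally started automatically once the KubeSphere installation finishes. This is done through KubeKey.
Enabling before installing KubeSphere
When installing KubeSphere on Linux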
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall# ./kk create config
Generate KubeKey config file successfully
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall# ll
总用量 70344
drwxrwxr-x 3 zhiyong zhiyong 4096 8月 16 23:14 ./
drwxr-xr-x 16 zhiyong zhiyong 4096 8月 16 23:12 ../
-rw-r--r-- 1 root root 1065 8月 16 23:14 config-sample.yaml
-rwxr-xr-x 1 zhiyong zhiyong 54910976 7月 26 14:17 kk*
drwxr-xr-x 12 root root 4096 8月 8 10:04 kubekey/
-rw-rw-r-- 1 zhiyong zhiyong 17102249 8月 8 01:03 kubekey-v2.2.2-linux-amd64.tar.gz
root@zhiyong-ksp1:/home/zhiyong/kubesphereinstall# cat config-sample.yaml
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: node1, address: 172.16.0.2, internalAddress: 172.16.0.2, user: ubuntu, password: "Qcloud@123"}
  - {name: node2, address: 172.16.0.3, internalAddress: 172.16.0.3, user: ubuntu, password: "Qcloud@123"}
  roleGroups:
    etcd:
    - node1
    control-plane:
    - node1
    worker:
    - node1
    - node2
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    # internalLoadbalancer: haproxy
    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.23.8
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: docker
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []
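The author generated the default yaml config file, which only describes the K8S cluster itself.
The following content is also needed. It belongs to the KubeSphere ClusterConfiguration section, which KubeKey appends to the file when the config is generated with the --with-kubesphere option:
servicemesh:
  enabled: true # Change "false" to "true".
  istio: # Customizing the istio installation configuration, refer to https://istio.io/latest/docs/setup/additional-setup/customize-installation/
    components:
      ingressGateways:
      - name: istio-ingressgateway # Exposes services outside the service mesh. Disabled by default.
        enabled: false
      cni:
        enabled: false # When enabled, the pod traffic forwarding setup for the Istio mesh is done during the network setup phase of the Kubernetes pod lifecycle.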
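Then run:
./kk create cluster -f config-sample.yaml
KubeKey will automatically create a K8S cluster that includes the Istio components on the Linux servers.
Istio also ships an optional CNI plugin, which can be installed manually if needed.
Istio CNI plugin installation docs (Chinese site): https://istio.io/latest/zh/docs/setup/additional-setup/cni/
When installing KubeSphere on a K8S cluster
KubeSphere can be installed on Linux servers directly or run in the pods of an existing K8S cluster, so when a K8S cluster is already available, the Istio service mesh component can also be enabled while installing KubeSphere on it. The procedure is much the same.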
vim cluster-configuration.yaml
---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
name: ks-installer
namespace: kubesphere-system
labels:
version: v3.3.0
spec:
persistence:
storageClass: "" # If there is no default StorageClass in your cluster, you need to specify an existing StorageClass here.
authentication:
jwtSecret: "" # Keep the jwtSecret consistent with the Host Cluster. Retrieve the jwtSecret by executing "kubectl -n kubesphere-system get cm kubesphere-config -o yaml | grep -v "apiVersion" | grep jwtSecret" on the Host Cluster.
local_registry: "" # Add your private registry address if it is needed.
# dev_tag: "" # Add your kubesphere image tag you want to install, by default it's same as ks-installer release version.
etcd:
monitoring: false # Enable or disable etcd monitoring dashboard installation. You have to create a Secret for etcd before you enable it.
endpointIps: localhost # etcd cluster EndpointIps. It can be a bunch of IPs here.
port: 2379 # etcd port.
tlsEnable: true
common:
core:
console:
enableMultiLogin: true # Enable or disable simultaneous logins. It allows different users to log in with the same account at the same time.
port: 30880
type: NodePort
# apiserver: # Enlarge the apiserver and controller manager's resource requests and limits for the large cluster
# resources: {}
# controllerManager:
# resources: {}
redis:
enabled: false
enableHA: false
volumeSize: 2Gi # Redis PVC size.
openldap:
enabled: false
volumeSize: 2Gi # openldap PVC size.
minio:
volumeSize: 20Gi # Minio PVC size.
monitoring:
# type: external # Whether to specify the external prometheus stack, and need to modify the endpoint at the next line.
endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090 # Prometheus endpoint to get metrics data.
GPUMonitoring: # Enable or disable the GPU-related metrics. If you enable this switch but have no GPU resources, Kubesphere will set it to zero.
enabled: false
gpu: # Install GPUKinds. The default GPU kind is nvidia.com/gpu. Other GPU kinds can be added here according to your needs.
kinds:
- resourceName: "nvidia.com/gpu"
resourceType: "GPU"
default: true
es: # Storage backend for logging, events and auditing.
# master:
# volumeSize: 4Gi # The volume size of Elasticsearch master nodes.
# replicas: 1 # The total number of master nodes. Even numbers are not allowed.
# resources: {}
# data:
# volumeSize: 20Gi # The volume size of Elasticsearch data nodes.
# replicas: 1 # The total number of data nodes.
# resources: {}
logMaxAge: 7 # Log retention time in built-in Elasticsearch. It is 7 days by default.
elkPrefix: logstash # The string making up index names. The index name will be formatted as ks-<elk_prefix>-log.
basicAuth:
enabled: false
username: ""
password: ""
externalElasticsearchHost: ""
externalElasticsearchPort: ""
alerting: # (CPU: 0.1 Core, Memory: 100 MiB) It enables users to customize alerting policies to send messages to receivers in time with different time intervals and alerting levels to choose from.
enabled: false # Enable or disable the KubeSphere Alerting System.
# thanosruler:
# replicas: 1
# resources: {}
auditing: # Provide a security-relevant chronological set of records,recording the sequence of activities happening on the platform, initiated by different tenants.
enabled: false # Enable or disable the KubeSphere Auditing Log System.
# operator:
# resources: {}
# webhook:
# resources: {}
devops: # (CPU: 0.47 Core, Memory: 8.6 G) Provide an out-of-the-box CI/CD system based on Jenkins, and automated workflow tools including Source-to-Image & Binary-to-Image.
enabled: false # Enable or disable the KubeSphere DevOps System.
# resources: {}
jenkinsMemoryLim: 2Gi # Jenkins memory limit.
jenkinsMemoryReq: 1500Mi # Jenkins memory request.
jenkinsVolumeSize: 8Gi # Jenkins volume size.
jenkinsJavaOpts_Xms: 1200m # The following three fields are JVM parameters.
jenkinsJavaOpts_Xmx: 1600m
jenkinsJavaOpts_MaxRAM: 2g
events: # Provide a graphical web console for Kubernetes Events exporting, filtering and alerting in multi-tenant Kubernetes clusters.
enabled: false # Enable or disable the KubeSphere Events System.
# operator:
# resources: {}
# exporter:
# resources: {}
# ruler:
# enabled: true
# replicas: 2
# resources: {}
logging: # (CPU: 57 m, Memory: 2.76 G) Flexible logging functions are provided for log query, collection and management in a unified console. Additional log collectors can be added, such as Elasticsearch, Kafka and Fluentd.
enabled: false # Enable or disable the KubeSphere Logging System.
logsidecar:
enabled: true
replicas: 2
# resources: {}
metrics_server: # (CPU: 56 m, Memory: 44.35 MiB) It enables HPA (Horizontal Pod Autoscaler).
enabled: false # Enable or disable metrics-server.
monitoring:
storageClass: "" # If there is an independent StorageClass you need for Prometheus, you can specify it here. The default StorageClass is used by default.
node_exporter:
port: 9100
# resources: {}
# kube_rbac_proxy:
# resources: {}
# kube_state_metrics:
# resources: {}
# prometheus:
# replicas: 1 # Prometheus replicas are responsible for monitoring different segments of data source and providing high availability.
# volumeSize: 20Gi # Prometheus PVC size.
# resources: {}
# operator:
# resources: {}
# alertmanager:
# replicas: 1 # AlertManager Replicas.
# resources: {}
# notification_manager:
# resources: {}
# operator:
# resources: {}
# proxy:
# resources: {}
gpu: # GPU monitoring-related plug-in installation.
nvidia_dcgm_exporter: # Ensure that gpu resources on your hosts can be used normally, otherwise this plug-in will not work properly.
enabled: false # Check whether the labels on the GPU hosts contain "nvidia.com/gpu.present=true" to ensure that the DCGM pod is scheduled to these nodes.
# resources: {}
multicluster:
clusterRole: none # host | member | none # You can install a solo cluster, or specify it as the Host or Member Cluster.
network:
networkpolicy: # Network policies allow network isolation within the same cluster, which means firewalls can be set up between certain instances (Pods).
# Make sure that the CNI network plugin used by the cluster supports NetworkPolicy. There are a number of CNI network plugins that support NetworkPolicy, including Calico, Cilium, Kube-router, Romana and Weave Net.
enabled: false # Enable or disable network policies.
ippool: # Use Pod IP Pools to manage the Pod network address space. Pods to be created can be assigned IP addresses from a Pod IP Pool.
type: none # Specify "calico" for this field if Calico is used as your CNI plugin. "none" means that Pod IP Pools are disabled.
topology: # Use Service Topology to view Service-to-Service communication based on Weave Scope.
type: none # Specify "weave-scope" for this field to enable Service Topology. "none" means that Service Topology is disabled.
openpitrix: # An App Store that is accessible to all platform tenants. You can use it to manage apps across their entire lifecycle.
store:
enabled: false # Enable or disable the KubeSphere App Store.
servicemesh: # (0.3 Core, 300 MiB) Provide fine-grained traffic management, observability and tracing, and visualized traffic topology.
enabled: false # Base component (pilot). Enable or disable KubeSphere Service Mesh (Istio-based).
istio: # Customizing the istio installation configuration, refer to https://istio.io/latest/docs/setup/additional-setup/customize-installation/
components:
ingressGateways:
- name: istio-ingressgateway
enabled: false
cni:
enabled: false
edgeruntime: # Add edge nodes to your cluster and deploy workloads on edge nodes.
enabled: false
kubeedge: # kubeedge configurations
enabled: false
cloudCore:
cloudHub:
advertiseAddress: # At least a public IP address or an IP address which can be accessed by edge nodes must be provided.
- "" # Note that once KubeEdge is enabled, CloudCore will malfunction if the address is not provided.
service:
cloudhubNodePort: "30000"
cloudhubQuicNodePort: "30001"
cloudhubHttpsNodePort: "30002"
cloudstreamNodePort: "30003"
tunnelNodePort: "30004"
# resources: {}
# hostNetWork: false
iptables-manager:
enabled: true
mode: "external"
# resources: {}
# edgeService:
# resources: {}
gatekeeper: # Provide admission policy and rule management, A validating (mutating TBA) webhook that enforces CRD-based policies executed by Open Policy Agent.
enabled: false # Enable or disable Gatekeeper.
# controller_manager:
# resources: {}
# audit:
# resources: {}
terminal:
# image: 'alpine:3.15' # There must be an nsenter program in the image
timeout: 600 # Container timeout, if set to 0, no timeout will be used. The unit is seconds
Again, this is the section to modify:
servicemesh:
  enabled: true # Change "false" to "true".
  istio: # Customizing the istio installation configuration, refer to https://istio.io/latest/docs/setup/additional-setup/customize-installation/
    components:
      ingressGateways:
      - name: istio-ingressgateway # Exposes services outside the service mesh. Disabled by default.
        enabled: false
      cni:
        enabled: false # When enabled, the pod traffic forwarding setup for the Istio mesh is done during the network setup phase of the Kubernetes pod lifecycle.
Then:
kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.3.0/kubesphere-installer.yaml
kubectl apply -f cluster-configuration.yaml
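K8S will install KubeSphere automatically according to the settings described in the yaml files, and will install and configure Istio along the way.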
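Enabling after installing KubeSphere
Because the Istio components run in K8S pods, Istio can be started wherever a K8S environment exists. Once KubeSphere is installed, enabling Istio from within KubeSphere is straightforward, whether KubeSphere itself runs on Linux or in K8S pods and whether the K8S cluster is All-in-one or multi-node.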
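For readers who prefer the command line to the console, the same switch can in principle be flipped with a JSON merge patch. The sketch below only builds the patch and prints a kubectl command; it does not talk to a cluster. The helper names are hypothetical. The resource name ks-installer and the namespace kubesphere-system come from the configuration shown in this article, and addressing the resource as clusterconfiguration is an assumption about how the CRD is registered on your cluster.

```python
import json
import shlex


def servicemesh_patch(enabled: bool = True) -> dict:
    """Build a JSON merge patch that only touches spec.servicemesh.enabled."""
    return {"spec": {"servicemesh": {"enabled": enabled}}}


def kubectl_patch_command(
    patch: dict,
    name: str = "ks-installer",
    namespace: str = "kubesphere-system",
) -> str:
    """Render a shell-safe kubectl command line that applies the merge patch."""
    payload = json.dumps(patch, separators=(",", ":"))
    parts = [
        "kubectl", "-n", namespace,
        "patch", "clusterconfiguration", name,  # resource name is an assumption
        "--type", "merge",
        "-p", shlex.quote(payload),
    ]
    return " ".join(parts)


print(kubectl_patch_command(servicemesh_patch(True)))
```

A merge patch leaves every other field of the ClusterConfiguration untouched, which avoids hand-editing a long YAML document just to flip one flag.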
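Log in as the administrator first:
http://192.168.88.20:30880
admin
Aa123456
Platform → Cluster Management:
Under CRDs (custom resource definitions), search for clusterconf:
After opening this ClusterConfiguration:
its yaml can be edited.
The yaml currently contains: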
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
annotations:
kubectl.kubernetes.io/last-applied-configuration: >
{"apiVersion":"installer.kubesphere.io/v1alpha1","kind":"ClusterConfiguration","metadata":{"annotations":{},"labels":{"version":"v3.3.0"},"name":"ks-installer","namespace":"kubesphere-system"},"spec":{"alerting":{"enabled":false},"auditing":{"enabled":false},"authentication":{"jwtSecret":""},"common":{"core":{"console":{"enableMultiLogin":true,"port":30880,"type":"NodePort"}},"es":{"basicAuth":{"enabled":false,"password":"","username":""},"elkPrefix":"logstash","externalElasticsearchHost":"","externalElasticsearchPort":"","logMaxAge":7},"gpu":{"kinds":[{"default":true,"resourceName":"nvidia.com/gpu","resourceType":"GPU"}]},"minio":{"volumeSize":"20Gi"},"monitoring":{"GPUMonitoring":{"enabled":false},"endpoint":"http://prometheus-operated.kubesphere-monitoring-system.svc:9090"},"openldap":{"enabled":false,"volumeSize":"2Gi"},"redis":{"enabled":false,"volumeSize":"2Gi"}},"devops":{"enabled":false,"jenkinsJavaOpts_MaxRAM":"2g","jenkinsJavaOpts_Xms":"1200m","jenkinsJavaOpts_Xmx":"1600m","jenkinsMemoryLim":"2Gi","jenkinsMemoryReq":"1500Mi","jenkinsVolumeSize":"8Gi"},"edgeruntime":{"enabled":false,"kubeedge":{"cloudCore":{"cloudHub":{"advertiseAddress":[""]},"service":{"cloudhubHttpsNodePort":"30002","cloudhubNodePort":"30000","cloudhubQuicNodePort":"30001","cloudstreamNodePort":"30003","tunnelNodePort":"30004"}},"enabled":false,"iptables-manager":{"enabled":true,"mode":"external"}}},"etcd":{"endpointIps":"192.168.88.20","monitoring":false,"port":2379,"tlsEnable":true},"events":{"enabled":false},"logging":{"enabled":false,"logsidecar":{"enabled":true,"replicas":2}},"metrics_server":{"enabled":false},"monitoring":{"gpu":{"nvidia_dcgm_exporter":{"enabled":false}},"node_exporter":{"port":9100},"storageClass":""},"multicluster":{"clusterRole":"none"},"network":{"ippool":{"type":"none"},"networkpolicy":{"enabled":false},"topology":{"type":"none"}},"openpitrix":{"store":{"enabled":false}},"persistence":{"storageClass":""},"servicemesh":{"enabled":false,"istio":{"components":{"c
ni":{"enabled":false},"ingressGateways":[{"enabled":false,"name":"istio-ingressgateway"}]}}},"terminal":{"timeout":600},"zone":"cn"}}
labels:
version: v3.3.0
name: ks-installer
namespace: kubesphere-system
spec:
alerting:
enabled: false
auditing:
enabled: false
authentication:
jwtSecret: ''
common:
core:
console:
enableMultiLogin: true
port: 30880
type: NodePort
es:
basicAuth:
enabled: false
password: ''
username: ''
elkPrefix: logstash
externalElasticsearchHost: ''
externalElasticsearchPort: ''
logMaxAge: 7
gpu:
kinds:
- default: true
resourceName: nvidia.com/gpu
resourceType: GPU
minio:
volumeSize: 20Gi
monitoring:
GPUMonitoring:
enabled: false
endpoint: 'http://prometheus-operated.kubesphere-monitoring-system.svc:9090'
openldap:
enabled: false
volumeSize: 2Gi
redis:
enabled: false
volumeSize: 2Gi
devops:
enabled: false
jenkinsJavaOpts_MaxRAM: 2g
jenkinsJavaOpts_Xms: 1200m
jenkinsJavaOpts_Xmx: 1600m
jenkinsMemoryLim: 2Gi
jenkinsMemoryReq: 1500Mi
jenkinsVolumeSize: 8Gi
edgeruntime:
enabled: false
kubeedge:
cloudCore:
cloudHub:
advertiseAddress:
- ''
service:
cloudhubHttpsNodePort: '30002'
cloudhubNodePort: '30000'
cloudhubQuicNodePort: '30001'
cloudstreamNodePort: '30003'
tunnelNodePort: '30004'
enabled: false
iptables-manager:
enabled: true
mode: external
etcd:
endpointIps: 192.168.88.20
monitoring: false
port: 2379
tlsEnable: true
events:
enabled: false
logging:
enabled: false
logsidecar:
enabled: true
replicas: 2
metrics_server:
enabled: false
monitoring:
gpu:
nvidia_dcgm_exporter:
enabled: false
node_exporter:
port: 9100
storageClass: ''
multicluster:
clusterRole: none
network:
ippool:
type: none
networkpolicy:
enabled: false
topology:
type: none
openpitrix:
store:
enabled: false
persistence:
storageClass: ''
servicemesh:
enabled: false
istio:
components:
cni:
enabled: false
ingressGateways:
- enabled: false
name: istio-ingressgateway
terminal:
timeout: 600
zone: cn
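According to the official docs, the servicemesh section near the end should be changed to:
servicemesh:
  enabled: true # Change "false" to "true".
  istio: # Customizing the istio installation configuration, refer to https://istio.io/latest/docs/setup/additional-setup/customize-installation/
    components:
      ingressGateways:
      - name: istio-ingressgateway # Exposes services outside the service mesh. Disabled by default.
        enabled: false
      cni:
        enabled: false # When enabled, the pod traffic forwarding setup for the Istio mesh is done during the network setup phase of the Kubernetes pod lifecycle.
Per the YAML syntax, the space between the colon and true must not be omitted: without it, the colon is not treated as a key/value separator.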
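A quick way to catch this mistake before saving is a small line-based check. The sketch below is a deliberately naive heuristic written for this article (the function name lines_missing_space is hypothetical); it is not a YAML parser and does not understand quoted strings or multi-line scalars.

```python
import re
from typing import List

# A key at the start of a line whose colon is directly followed by a
# non-space character, e.g. "enabled:true". URL schemes ("://") are skipped.
_MISSING_SPACE = re.compile(r"^\s*(?:-\s+)?[A-Za-z_][\w-]*:(?!//)\S")


def lines_missing_space(yaml_text: str) -> List[int]:
    """Return 1-based line numbers that look like 'key:value' without a space."""
    suspicious = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        content = line.split("#", 1)[0]  # naive comment stripping
        if _MISSING_SPACE.match(content):
            suspicious.append(number)
    return suspicious


good = "servicemesh:\n  enabled: true\n"
bad = "servicemesh:\n  enabled:true\n"
print(lines_missing_space(good))
print(lines_missing_space(bad))
```

The check reports line 2 of the second sample, where the missing space turns the intended key and value into a single plain string.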
After confirming and saving, you can follow the installation of the Istio components (on Ubuntu 20.04 the command has to be run as the root user):
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f
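Then:
The familiar sight of one core struggling while the other fifteen look on. If you can afford it, pick a CPU with a high clock speed, although the core count matters too.
top shows that python3 is using 99.7% of the CPU.
After waiting for a while: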
Waiting for all tasks to be completed ...
task network status is successful (1/5)
task openpitrix status is successful (2/5)
task multicluster status is successful (3/5)
task monitoring status is successful (4/5)
task servicemesh status is successful (5/5)
**************************************************
Collecting installation results ...
#####################################################
### Welcome to KubeSphere! ###
#####################################################
Console: http://192.168.88.20:30880
Account: admin
Password: P@88w0rd
NOTES:
1. After you log into the console, please check the
monitoring status of service components in
"Cluster Management". If any service is not
ready, please wait patiently until all components
are up and running.
2. Please change the default password after login.
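This output means that the installation and initialization of Istio have finished. The default password printed here can be ignored: to log in again, use the password that was already changed earlier.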
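If you capture the installer log to a file or a string, the task lines shown above are easy to check programmatically. The sketch below relies only on the line format visible in this article's output ("task NAME status is STATUS (n/m)"); the function names are hypothetical, and any status word other than "successful" is an assumption about what the installer might print.

```python
import re
from typing import Dict

# Matches lines such as "task servicemesh status is successful (5/5)".
TASK_LINE = re.compile(r"task (\S+) status is (\w+)\s+\((\d+)/(\d+)\)")


def task_statuses(log_text: str) -> Dict[str, str]:
    """Map each task name found in the installer log to its last reported status."""
    statuses = {}
    for name, status, _done, _total in TASK_LINE.findall(log_text):
        statuses[name] = status
    return statuses


def installation_finished(log_text: str) -> bool:
    """True once the welcome banner has been printed."""
    return "Welcome to KubeSphere!" in log_text


sample_log = """\
task network status is successful (1/5)
task openpitrix status is successful (2/5)
task multicluster status is successful (3/5)
task monitoring status is successful (4/5)
task servicemesh status is successful (5/5)
### Welcome to KubeSphere! ###
"""
print(task_statuses(sample_log))
print(installation_finished(sample_log))
```

Note that a successful servicemesh task only means the installer finished its part; it does not guarantee that the Istio pods are actually running.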
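Verifying the Istio installation
In the web UI you can see:
The Istio component now appears under System Components. Opening it, however, shows:
Not only is Istio unhealthy at this point, but Prometheus, which was healthy before, has become unhealthy as well:
Run: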
root@zhiyong-ksp1:/home/zhiyong# kubectl get pod -n istio-system
NAME READY STATUS RESTARTS AGE
istiod-1-11-2-54dd699c87-99krn 0/1 ContainerCreating 0 27m
jaeger-operator-fccc48b86-vtcr8 0/1 ContainerCreating 0 7m10s
kiali-operator-c459985f7-sttfs 0/1 ContainerCreating 0 7m5s
root@zhiyong-ksp1:/home/zhiyong# kubectl get pod --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
istio-system istiod-1-11-2-54dd699c87-99krn 0/1 ContainerCreating 0 30m
istio-system jaeger-operator-fccc48b86-vtcr8 0/1 ContainerCreating 0 9m53s
istio-system kiali-operator-c459985f7-sttfs 0/1 ContainerCreating 0 9m48s
kube-system calico-kube-controllers-f9f9bbcc9-2v7lm 1/1 Running 1 (8d ago) 8d
kube-system calico-node-4mgc7 1/1 Running 1 (8d ago) 8d
kube-system coredns-f657fccfd-2gw7h 1/1 Running 1 (8d ago) 8d
kube-system coredns-f657fccfd-pflwf 1/1 Running 1 (8d ago) 8d
kube-system kube-apiserver-zhiyong-ksp1 1/1 Running 1 (8d ago) 8d
kube-system kube-controller-manager-zhiyong-ksp1 1/1 Running 1 (8d ago) 8d
kube-system kube-proxy-cn68l 1/1 Running 1 (8d ago) 8d
kube-system kube-scheduler-zhiyong-ksp1 1/1 Running 1 (8d ago) 8d
kube-system nodelocaldns-96gtw 1/1 Running 1 (8d ago) 8d
kube-system openebs-localpv-provisioner-68db4d895d-p9527 1/1 Running 0 8d
kube-system snapshot-controller-0 1/1 Running 1 (8d ago) 8d
kubesphere-controls-system default-http-backend-587748d6b4-ccg59 1/1 Running 1 (8d ago) 8d
kubesphere-controls-system kubectl-admin-5d588c455b-82cnk 1/1 Running 1 (8d ago) 8d
kubesphere-logging-system elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk 0/1 ContainerCreating 0 15m
kubesphere-logging-system elasticsearch-logging-data-0 0/1 Pending 0 32m
kubesphere-logging-system elasticsearch-logging-discovery-0 0/1 Pending 0 32m
kubesphere-monitoring-system alertmanager-main-0 2/2 Running 2 (8d ago) 8d
kubesphere-monitoring-system kube-state-metrics-6d6786b44-bbb4f 3/3 Running 3 (8d ago) 8d
kubesphere-monitoring-system node-exporter-8sz74 2/2 Running 2 (8d ago) 8d
kubesphere-monitoring-system notification-manager-deployment-6f8c66ff88-pt4l8 2/2 Running 2 (8d ago) 8d
kubesphere-monitoring-system notification-manager-operator-6455b45546-nkmx8 2/2 Running 2 (8d ago) 8d
kubesphere-monitoring-system prometheus-k8s-0 0/2 Terminating 0 8d
kubesphere-monitoring-system prometheus-operator-66d997dccf-c968c 2/2 Running 2 (8d ago) 8d
kubesphere-system ks-apiserver-6b9bcb86f4-hsdzs 1/1 Running 1 (8d ago) 8d
kubesphere-system ks-console-599c49d8f6-ngb6b 1/1 Running 1 (8d ago) 8d
kubesphere-system ks-controller-manager-66747fcddc-r7cpt 1/1 Running 1 (8d ago) 8d
kubesphere-system ks-installer-5fd8bd46b8-dzhbb 1/1 Running 1 (8d ago) 8d
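Wait patiently for a while.
The monitoring page of the KubeSphere web UI shows that the containers are still being created. They cannot be left in that state indefinitely, though.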
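With this many pods, it helps to filter the listing down to the ones that are not ready. The sketch below parses the plain-text table format shown above; the function name unready_pods is hypothetical. Pods that have legitimately finished (for example, a status of Completed) would also be reported by this simple rule.

```python
from typing import List, Tuple


def unready_pods(table_text: str) -> List[Tuple[str, str, str, str]]:
    """Return (namespace, name, ready, status) for pods that are not fully ready.

    Expects the output of a pod listing across all namespaces, header included.
    Only the first four columns are used, so a RESTARTS value such as
    "1 (8d ago)" does not disturb the parsing.
    """
    rows = []
    for line in table_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        namespace, name, ready, status = fields[:4]
        have, _, want = ready.partition("/")
        if have != want or status != "Running":
            rows.append((namespace, name, ready, status))
    return rows


sample = """\
NAMESPACE NAME READY STATUS RESTARTS AGE
istio-system istiod-1-11-2-54dd699c87-99krn 0/1 ContainerCreating 0 30m
kube-system calico-node-4mgc7 1/1 Running 1 (8d ago) 8d
kubesphere-monitoring-system prometheus-k8s-0 0/2 Terminating 0 8d
"""
for row in unready_pods(sample):
    print(row)
```

Applied to the listing above, this narrows the view to the Istio, logging, and Prometheus pods that need attention.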
Fixing ContainerCreating
Inspecting the pod events
root@zhiyong-ksp1:/home/zhiyong# kubectl describe pod istiod-1-11-2-54dd699c87-99krn -n istio-system
Name: istiod-1-11-2-54dd699c87-99krn
Namespace: istio-system
Priority: 0
Node: zhiyong-ksp1/192.168.88.20
Start Time: Wed, 17 Aug 2022 00:44:55 +0800
Labels: app=istiod
install.operator.istio.io/owning-resource=unknown
istio=istiod
istio.io/rev=1-11-2
operator.istio.io/component=Pilot
pod-template-hash=54dd699c87
sidecar.istio.io/inject=false
Annotations: prometheus.io/port: 15014
prometheus.io/scrape: true
sidecar.istio.io/inject: false
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/istiod-1-11-2-54dd699c87
Containers:
discovery:
Container ID:
Image: registry.cn-beijing.aliyuncs.com/kubesphereio/pilot:1.11.1
Image ID:
Ports: 8080/TCP, 15010/TCP, 15017/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
discovery
--monitoringAddr=:15014
--log_output_level=default:info
--domain
cluster.local
--keepaliveMaxServerConnectionAge
30m
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Requests:
cpu: 500m
memory: 2Gi
Readiness: http-get http://:8080/ready delay=1s timeout=5s period=3s #success=1 #failure=3
Environment:
REVISION: 1-11-2
JWT_POLICY: first-party-jwt
PILOT_CERT_PROVIDER: istiod
POD_NAME: istiod-1-11-2-54dd699c87-99krn (v1:metadata.name)
POD_NAMESPACE: istio-system (v1:metadata.namespace)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
KUBECONFIG: /var/run/secrets/remote/config
ENABLE_LEGACY_FSGROUP_INJECTION: false
PILOT_TRACE_SAMPLING: 1
PILOT_ENABLE_PROTOCOL_SNIFFING_FOR_OUTBOUND: true
PILOT_ENABLE_PROTOCOL_SNIFFING_FOR_INBOUND: true
ISTIOD_ADDR: istiod-1-11-2.istio-system.svc:15012
PILOT_ENABLE_ANALYSIS: false
CLUSTER_ID: Kubernetes
Mounts:
/etc/cacerts from cacerts (ro)
/var/run/secrets/istio-dns from local-certs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-l54jm (ro)
/var/run/secrets/remote from istio-kubeconfig (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
local-certs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
cacerts:
Type: Secret (a volume populated by a Secret)
SecretName: cacerts
Optional: true
istio-kubeconfig:
Type: Secret (a volume populated by a Secret)
SecretName: istio-kubeconfig
Optional: true
kube-api-access-l54jm:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 43m default-scheduler Successfully assigned istio-system/istiod-1-11-2-54dd699c87-99krn to zhiyong-ksp1
Warning FailedCreatePodSandBox 43m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5d0a3bdb6dea937aa5b118bbd00305a1542111c97af84a3cbdd8f188b1681687": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 43m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ff84de82acfd944be7f3804c96f39ab976ae4d6810b7e0364c90560a4b4070e7": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 42m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "6337bea6f7c16cd9adcff0d2b75238beb4365dc4b880d4c8e4f4535885d59d30": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 42m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "42e08603d4d7e7d1713eecbb21af258022e3fb50c6f5611808b3e2755d50d980": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 42m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "51a6b5b8ea5a63f4be828a0c855802e42640324c440fcc3487c535123d7b3372": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 42m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "dada948b2a416a0ec925b7f67a101b8fd48fdad9fb20d6c41eaf1bbad0a18e57": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 41m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "df3487e020c1e7eb527cc0fce1fe990873bd20f46cbf04de99005e0da5896abe": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 41m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "92e739549a96aa03ea864188abc1b91c9a45394dae28ad97234fa1caf4d52240": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 41m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "bc5d1999a2d5ad4d7cf5c1e1c3c7c1a80dee02b806d0be2e15c326e2d82f4af5": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 3m2s (x176 over 41m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "be41a317c2e14b4096f2f8f0d4bfaa8a80572f7365ab3d92c20be75fe97304f4": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
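The events show that pod sandbox creation keeps failing in the calico network plugin: its request for ClusterInformation is rejected with "connection is unauthorized: Unauthorized", so the problem is an authentication failure in the network layer.
Next, look at the details of the Prometheus pod: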
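Because every event carries a different sandbox ID, the repeated lines look distinct even though the cause is identical. A small sketch like the following collapses them; the function name summarize_sandbox_failures is hypothetical, and it works on the text of the describe output shown above.

```python
import re
from collections import Counter

SANDBOX_ID = re.compile(r'sandbox "[0-9a-f]+"')
MARKER = "Failed to create pod sandbox"


def summarize_sandbox_failures(describe_output: str) -> Counter:
    """Count sandbox-creation failures, ignoring the per-attempt sandbox ID."""
    causes = Counter()
    for line in describe_output.splitlines():
        start = line.find(MARKER)
        if start == -1:
            continue  # not a sandbox failure event
        message = SANDBOX_ID.sub('sandbox "<id>"', line[start:])
        causes[message] += 1
    return causes


events = """\
Warning FailedCreatePodSandBox 43m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5d0a3bdb6dea937aa5b118bbd00305a1542111c97af84a3cbdd8f188b1681687": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 43m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "ff84de82acfd944be7f3804c96f39ab976ae4d6810b7e0364c90560a4b4070e7": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
"""
for message, count in summarize_sandbox_failures(events).items():
    print(count, "x", message)
```

All of the events reduce to a single message, which confirms that there is one underlying cause rather than several unrelated failures.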
root@zhiyong-ksp1:/home/zhiyong# kubectl describe pod prometheus-k8s-0 -n kubesphere-monitoring-system
Name: prometheus-k8s-0
Namespace: kubesphere-monitoring-system
Priority: 0
Node: zhiyong-ksp1/192.168.88.20
Start Time: Mon, 08 Aug 2022 20:42:21 +0800
Labels: app.kubernetes.io/component=prometheus
app.kubernetes.io/instance=k8s
app.kubernetes.io/managed-by=prometheus-operator
app.kubernetes.io/name=prometheus
app.kubernetes.io/part-of=kube-prometheus
app.kubernetes.io/version=2.34.0
controller-revision-hash=prometheus-k8s-557cc865c4
operator.prometheus.io/name=k8s
operator.prometheus.io/shard=0
prometheus=k8s
statefulset.kubernetes.io/pod-name=prometheus-k8s-0
Annotations: cni.projectcalico.org/containerID: 1d4064f425cad8043d3b38e60155e778e9a1390bc2486b76ac29ad14fb589b40
cni.projectcalico.org/podIP: 10.233.107.36/32
cni.projectcalico.org/podIPs: 10.233.107.36/32
kubectl.kubernetes.io/default-container: prometheus
Status: Terminating (lasts 41m)
Termination Grace Period: 600s
IP: 10.233.107.36
IPs:
IP: 10.233.107.36
Controlled By: StatefulSet/prometheus-k8s
Init Containers:
init-config-reloader:
Container ID: containerd://f29630d87dccf60dc8bd065f53ad5187d2f7600a35500a4fa4bfd71a2118daa6
Image: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus-config-reloader:v0.55.1
Image ID: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus-config-reloader@sha256:7743c7ef48f9c0ae6f5c0de4b26e7ff6ae9ece4917a4e139acb21a0d8e77aa3c
Port: 8080/TCP
Host Port: 0/TCP
Command:
/bin/prometheus-config-reloader
Args:
--watch-interval=0
--listen-address=:8080
--config-file=/etc/prometheus/config/prometheus.yaml.gz
--config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
--watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 08 Aug 2022 20:42:21 +0800
Finished: Mon, 08 Aug 2022 20:42:22 +0800
Ready: True
Restart Count: 0
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_NAME: prometheus-k8s-0 (v1:metadata.name)
SHARD: 0
Mounts:
/etc/prometheus/config from config (rw)
/etc/prometheus/config_out from config-out (rw)
/etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vcb4c (ro)
Containers:
prometheus:
Container ID: containerd://2b913fb7dadcc7342759437d2068d0a9cbdcd96fadbb567c0ca5212ca72fb372
Image: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus:v2.34.0
Image ID: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus@sha256:b37103e03399e90c9b7b1b2940894d3634915cf9df4aa2e5402bd85b4377808c
Port: 9090/TCP
Host Port: 0/TCP
Args:
--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--storage.tsdb.retention.time=7d
--config.file=/etc/prometheus/config_out/prometheus.env.yaml
--storage.tsdb.path=/prometheus
--web.enable-lifecycle
--query.max-concurrency=1000
--web.route-prefix=/
--web.config.file=/etc/prometheus/web_config/web-config.yaml
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 08 Aug 2022 20:42:51 +0800
Finished: Wed, 17 Aug 2022 00:45:37 +0800
Ready: False
Restart Count: 0
Limits:
cpu: 4
memory: 16Gi
Requests:
cpu: 200m
memory: 400Mi
Liveness: http-get http://:web/-/healthy delay=0s timeout=3s period=5s #success=1 #failure=6
Readiness: http-get http://:web/-/ready delay=0s timeout=3s period=5s #success=1 #failure=3
Startup: http-get http://:web/-/ready delay=0s timeout=3s period=15s #success=1 #failure=60
Environment: <none>
Mounts:
/etc/prometheus/certs from tls-assets (ro)
/etc/prometheus/config_out from config-out (ro)
/etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
/etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
/prometheus from prometheus-k8s-db (rw,path="prometheus-db")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vcb4c (ro)
config-reloader:
Container ID: containerd://215303f25ece01ad28e56a8d94c19b00cbd9429d10cddc1b1db9981802e74011
Image: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus-config-reloader:v0.55.1
Image ID: registry.cn-beijing.aliyuncs.com/kubesphereio/prometheus-config-reloader@sha256:7743c7ef48f9c0ae6f5c0de4b26e7ff6ae9ece4917a4e139acb21a0d8e77aa3c
Port: 8080/TCP
Host Port: 0/TCP
Command:
/bin/prometheus-config-reloader
Args:
--listen-address=:8080
--reload-url=http://localhost:9090/-/reload
--config-file=/etc/prometheus/config/prometheus.yaml.gz
--config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
--watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
State: Terminated
Reason: Error
Message: level=info ts=2022-08-08T12:42:51.99954274Z caller=main.go:111 msg="Starting prometheus-config-reloader" version="(version=0.55.1, branch=refs/tags/v0.55.1, revision=08c846115c67195bc821018168040db6f3e236e3)"
level=info ts=2022-08-08T12:42:51.999646088Z caller=main.go:112 build_context="(go=go1.17.7, user=Action-Run-ID-2045821452, date=20220326-21:47:32)"
level=info ts=2022-08-08T12:42:52.093230589Z caller=main.go:149 msg="Starting web server for metrics" listen=:8080
level=info ts=2022-08-08T12:42:52.195172719Z caller=reloader.go:373 msg="Reload triggered" cfg_in=/etc/prometheus/config/prometheus.yaml.gz cfg_out=/etc/prometheus/config_out/prometheus.env.yaml watched_dirs=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
level=info ts=2022-08-08T12:42:52.195306486Z caller=reloader.go:235 msg="started watching config file and directories for changes" cfg=/etc/prometheus/config/prometheus.yaml.gz out=/etc/prometheus/config_out/prometheus.env.yaml dirs=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
Exit Code: 2
Started: Mon, 08 Aug 2022 20:42:51 +0800
Finished: Wed, 17 Aug 2022 00:45:36 +0800
Ready: False
Restart Count: 0
Limits:
cpu: 100m
memory: 50Mi
Requests:
cpu: 100m
memory: 50Mi
Environment:
POD_NAME: prometheus-k8s-0 (v1:metadata.name)
SHARD: 0
Mounts:
/etc/prometheus/config from config (rw)
/etc/prometheus/config_out from config-out (rw)
/etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vcb4c (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
prometheus-k8s-db:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: prometheus-k8s-db-prometheus-k8s-0
ReadOnly: false
config:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s
Optional: false
tls-assets:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: prometheus-k8s-tls-assets-0
SecretOptionalName: <nil>
config-out:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
prometheus-k8s-rulefiles-0:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-k8s-rulefiles-0
Optional: false
web-config:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-web-config
Optional: false
kube-api-access-vcb4c:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: dedicated=monitoring:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Killing 51m kubelet Stopping container prometheus
Normal Killing 51m kubelet Stopping container config-reloader
Warning FailedKillPod 63s (x231 over 51m) kubelet error killing pod: failed to "KillPodSandbox" for "35e28d63-59c1-4860-a9bc-924123478928" with KillPodSandboxError: "rpc error: code = Unknown desc = failed to destroy network for sandbox \"1d4064f425cad8043d3b38e60155e778e9a1390bc2486b76ac29ad14fb589b40\": plugin type=\"calico\" failed (delete): error getting ClusterInformation: connection is unauthorized: Unauthorized"
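Locating the problem
The error log makes it fairly clear that the problem lies with Calico: when kubelet tries to tear down the pod sandbox, the CNI plugin of type calico fails with "connection is unauthorized: Unauthorized".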
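When many events of this kind pile up, it can help to pull the plugin name and the failing operation out of the message text. The following is only a minimal sketch: the function name `parse_cni_failure` and the regular expression are illustrative assumptions, written against the message format seen in the events above rather than any documented kubelet format.

```python
import re

# Matches the tail of a kubelet sandbox event such as
#   plugin type="calico" failed (delete): error getting ClusterInformation: ...
# The quotes may be backslash-escaped when the message is nested in another string.
CNI_FAILURE = re.compile(
    r'plugin type=\\?"(?P<plugin>[^"\\]+)\\?" failed \((?P<op>\w+)\): (?P<detail>.+)'
)


def parse_cni_failure(message):
    """Return plugin, operation and detail from an event message, or None."""
    match = CNI_FAILURE.search(message)
    if match is None:
        return None
    return {
        "plugin": match.group("plugin"),
        "op": match.group("op"),
        # Drop a trailing quote left over from the enclosing quoted string.
        "detail": match.group("detail").rstrip('"'),
    }
```

For the events shown here, the operation is `delete` for the stuck Prometheus pod, which tells us the failure happens while tearing the sandbox down rather than while creating it.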
root@zhiyong-ksp1:/etc/cni/net.d# pwd
/etc/cni/net.d
root@zhiyong-ksp1:/etc/cni/net.d# ll
总用量 16
drwxr-xr-x 2 kube root 4096 8月 8 10:05 ./
drwxr-xr-x 3 kube root 4096 8月 8 10:02 ../
-rw-r--r-- 1 root root 663 8月 8 19:23 10-calico.conflist
-rw------- 1 root root 2713 8月 8 20:34 calico-kubeconfig
root@zhiyong-ksp1:/etc/cni/net.d# cat 10-calico.conflist
{
"name": "k8s-pod-network",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "calico",
"log_level": "info",
"log_file_path": "/var/log/calico/cni/cni.log",
"datastore_type": "kubernetes",
"nodename": "zhiyong-ksp1",
"mtu": 0,
"ipam": {
"type": "calico-ipam"
},
"policy": {
"type": "k8s"
},
"kubernetes": {
"kubeconfig": "/etc/cni/net.d/calico-kubeconfig"
}
},
{
"type": "portmap",
"snat": true,
"capabilities": {"portMappings": true}
},
{
"type": "bandwidth",
"capabilities": {"bandwidth": true}
}
]
}
root@zhiyong-ksp1:/etc/cni/net.d# cat calico-kubeconfig
# Kubeconfig file for Calico CNI plugin. Installed by calico/node.
apiVersion: v1
kind: Config
clusters:
- name: local
cluster:
server: https://10.233.0.1:443
certificate-authority-data: "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUMvakNDQWVhZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRJeU1EZ3dPREF5TURRek5Wb1hEVE15TURnd05UQXlNRFF6TlZvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBTHM5ClcxTkxMWGNHNlhIdzZ0VEVyV1pWTXdlUUdTV2IzU3UrMTN0V2REcUlhcm16YW1BWGNNbnlValRoNWhQdFZVVjcKNVdjYldXcFh3VTNOaWhpSXRmOXhoZ2tsMy9KVElycFBSdlRBc3VUVUo1RW9yb3BNLzNpRWpBZUc0d0RNQURtYwpKNHArSjlJSzZWekV4UUI3VTA2L1F6eWhRT3RQQS83dFlhbjM2dFE3eFRJYmJvQ3AvQXRSNHdqOXBBRHVSV1M2CnQ0ZlFZMUh4NHpaS1pmeEpBaXF5MXl5Ylg0ckxSektYMzJ0MXlsYk9ET21kWjZXVjJLZEgzYjV2V3ZrZThzQy8KcHhMT0JvRmRVdU0ra3hkUHgxMitHaVVtbUM0NDFEdU02MVZiQ0o0NlJ4QVlDenY4bmxoQUhrTDMrL3JQZ0U1dgpaYTZuSVoxdWVabFBRRXRqL3FFQ0F3RUFBYU5aTUZjd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0hRWURWUjBPQkJZRUZDMkk4MldLNEJjSWpieEQvVjl4U0VnblNhc1pNQlVHQTFVZEVRUU8KTUF5Q0NtdDFZbVZ5Ym1WMFpYTXdEUVlKS29aSWh2Y05BUUVMQlFBRGdnRUJBRWlLbklrendTaXpKL0ZhRmd4SQpPRlNoaTNTQ0NaNHNLVXliZVhkZkIwV3FLRHpialBteEZ3LzQ0SFMwUUhaNU5TVGp6WGtHQ1kyTlpDRTE3dldWCmtDYjFVM1czQmdaM05CSmZtV29sTEJQTCtnSkovYlRuRVJUTVY4MDYrTWN6d1RBeEhWcllXcU5BT2o5R3pEdFMKc3FwVWxQZDc1MDdhZmluRmZMVFpORnF4SDV4Y0VTUDNETVF1L21GUXNxMnYyeW9XTXY4dHluVGs2V3VSa0xVQgoxd1JXdUNSeXF1OCs3dEVzMHlCNklTODF0cDBGMHZPekpoakw4bTBxQWhLbUNKUFlGTUFZRFMvNXJuZDBCb3NLClhabHlyUUxtV0ZLRDRWL2Z3T0Vua1hMS3R3VnkrdlFJYXVEWjZTaVM1ODcxMURmdlhTTWFCU1lkL0hwZW1OYmQKVFBFPQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg=="
users:
- name: calico
user:
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IkNRb0VCZDRGY21PQjBSYktnYzVuSkV6UVVVY0VvOE1Jd0NCOFRYbEQ5XzQifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNjYwMDQ4NDYxLCJpYXQiOjE2NTk5NjIwNjEsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJjYWxpY28tbm9kZSIsInVpZCI6IjFhNDk4MWY1LWVmMWQtNDk5OC05YTA1LTk4OGU0MmMyN2Q4OCJ9fSwibmJmIjoxNjU5OTYyMDYxLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6a3ViZS1zeXN0ZW06Y2FsaWNvLW5vZGUifQ.Qa0KSAJGgNSA9lvND2Ivf9qxZsieI2r1FwCGvwzvXw_d4Nrw5WSygK-9t6tJKCnsXgCQSXijRBFPqiamJYZUx1dhgbPQp8KZF1seqtafCLRNnPS1TUrYJO_SRrp37UizmQzdOQOh7m_SGktcqdViZAyIGapjeMc7P8gU3v1HA93SflnR1keUo5rbXJjpaj2b6F0SBUCVyQnuORopD9cdCH-jIunyp4y_GhOtutV71ZmxcZeCdDqaBAE5OTnIwGYwz5yZqCOJZGqRxI74EX1B06iFgOQs8yksFiEpp5JdFUaCWNnxAeYo5cpH72l2XzF7rb7A2Ob0Rk96wJSSEMJq8g
contexts:
- name: calico-context
context:
cluster: local
user: calico
current-context: calico-context
Moving the configuration files away
Following this StackOverflow post: https://stackoverflow.com/questions/61672804/after-uninstalling-calico-new-pods-are-stuck-in-container-creating-state
and this page of the Kubernetes documentation: https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/
both configuration files need to be removed. I simply used mv to move them into a backup directory.
root@zhiyong-ksp1:/home/zhiyong# mkdir -p /fileback/20220817
root@zhiyong-ksp1:/etc/cni/net.d# ll
总用量 16
drwxr-xr-x 2 kube root 4096 8月 8 10:05 ./
drwxr-xr-x 3 kube root 4096 8月 8 10:02 ../
-rw-r--r-- 1 root root 663 8月 8 19:23 10-calico.conflist
-rw------- 1 root root 2713 8月 8 20:34 calico-kubeconfig
root@zhiyong-ksp1:/etc/cni/net.d# mv ./10-calico.conflist /fileback/20220817
root@zhiyong-ksp1:/etc/cni/net.d# ll
总用量 12
drwxr-xr-x 2 kube root 4096 8月 17 01:43 ./
drwxr-xr-x 3 kube root 4096 8月 8 10:02 ../
-rw------- 1 root root 2713 8月 8 20:34 calico-kubeconfig
root@zhiyong-ksp1:/etc/cni/net.d# mv ./calico-kubeconfig /fileback/20220817
root@zhiyong-ksp1:/etc/cni/net.d# ll
总用量 8
drwxr-xr-x 2 kube root 4096 8月 17 01:43 ./
drwxr-xr-x 3 kube root 4096 8月 8 10:02 ../
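The same backup step could be scripted so that it is repeatable. This is a sketch under assumptions, not the procedure used above: the function name `back_up_cni_files` is hypothetical, and the default file names are simply the two files listed in /etc/cni/net.d in this walkthrough.

```python
import shutil
from pathlib import Path


def back_up_cni_files(cni_dir, backup_dir,
                      names=("10-calico.conflist", "calico-kubeconfig")):
    """Move the named files from cni_dir into backup_dir; return the names moved."""
    source = Path(cni_dir)
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    moved = []
    for name in names:
        path = source / name
        if path.is_file():  # skip files that are already gone
            shutil.move(str(path), str(target / name))
            moved.append(name)
    return moved
```

Calling `back_up_cni_files("/etc/cni/net.d", "/fileback/20220817")` as root would mirror the two mv commands. Moving rather than deleting keeps the originals available if they need to be restored.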
Rebooting the machine
The purpose of the reboot is to refresh Calico's configuration.
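One way to check whether stale credentials played a part is to look at the claims inside the service account token stored in calico-kubeconfig. Decoding the payload of the token shown earlier gives iat=1659962061 and exp=1660048461: a 24-hour lifetime that ended on 2022-08-09, about a week before the errors of 2022-08-17. An expired token would be consistent with the "connection is unauthorized" message, although this write-up does not confirm it as the root cause. The sketch below uses only the Python standard library. The helper names are hypothetical, and `make_demo_token` builds an unsigned stand-in token purely for illustration.

```python
import base64
import json
from datetime import datetime, timezone


def jwt_claims(token):
    """Decode the payload of a JWT without verifying its signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the padding that JWTs strip
    return json.loads(base64.urlsafe_b64decode(payload))


def token_expired(token, now=None):
    """Return True when the token's exp claim is at or before `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromtimestamp(jwt_claims(token)["exp"], tz=timezone.utc)
    return expires <= now


def make_demo_token(claims):
    """Build an unsigned, illustrative token; a real one carries a signature."""
    def encode(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return ".".join([encode({"alg": "none"}), encode(claims), "signature"])
```

In practice the token string would be read from the `token:` field of the kubeconfig file. The check only inspects the claims and says nothing about whether the API server would otherwise accept the token.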
After the reboot
root@zhiyong-ksp1:/home/zhiyong# kubectl get pod --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
istio-system istiod-1-11-2-54dd699c87-99krn 1/1 Running 0 65m
istio-system jaeger-operator-fccc48b86-vtcr8 0/1 ContainerCreating 0 44m
istio-system kiali-75c777bdf6-xhbq7 0/1 ContainerCreating 0 12s
istio-system kiali-operator-c459985f7-sttfs 1/1 Running 0 44m
kube-system calico-kube-controllers-f9f9bbcc9-2v7lm 1/1 Running 2 (2m54s ago) 8d
kube-system calico-node-4mgc7 1/1 Running 2 (2m54s ago) 8d
kube-system coredns-f657fccfd-2gw7h 1/1 Running 2 (2m54s ago) 8d
kube-system coredns-f657fccfd-pflwf 1/1 Running 2 (2m54s ago) 8d
kube-system kube-apiserver-zhiyong-ksp1 1/1 Running 2 (2m54s ago) 8d
kube-system kube-controller-manager-zhiyong-ksp1 1/1 Running 2 (2m54s ago) 8d
kube-system kube-proxy-cn68l 1/1 Running 2 (2m54s ago) 8d
kube-system kube-scheduler-zhiyong-ksp1 1/1 Running 2 (2m54s ago) 8d
kube-system nodelocaldns-96gtw 1/1 Running 2 (2m54s ago) 8d
kube-system openebs-localpv-provisioner-68db4d895d-p9527 1/1 Running 1 (2m54s ago) 8d
kube-system snapshot-controller-0 1/1 Running 2 (2m54s ago) 8d
kubesphere-controls-system default-http-backend-587748d6b4-ccg59 1/1 Running 2 (2m54s ago) 8d
kubesphere-controls-system kubectl-admin-5d588c455b-82cnk 1/1 Running 2 (2m54s ago) 8d
kubesphere-logging-system elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk 0/1 ContainerCreating 0 50m
kubesphere-logging-system elasticsearch-logging-data-0 0/1 Pending 0 67m
kubesphere-logging-system elasticsearch-logging-discovery-0 0/1 Pending 0 67m
kubesphere-monitoring-system alertmanager-main-0 2/2 Running 4 (2m54s ago) 8d
kubesphere-monitoring-system kube-state-metrics-6d6786b44-bbb4f 3/3 Running 6 (2m54s ago) 8d
kubesphere-monitoring-system node-exporter-8sz74 2/2 Running 4 (2m54s ago) 8d
kubesphere-monitoring-system notification-manager-deployment-6f8c66ff88-pt4l8 2/2 Running 4 (2m54s ago) 8d
kubesphere-monitoring-system notification-manager-operator-6455b45546-nkmx8 2/2 Running 4 (2m54s ago) 8d
kubesphere-monitoring-system prometheus-k8s-0 2/2 Running 0 2m5s
kubesphere-monitoring-system prometheus-operator-66d997dccf-c968c 2/2 Running 4 (2m54s ago) 8d
kubesphere-system ks-apiserver-6b9bcb86f4-hsdzs 0/1 Unknown 1 8d
kubesphere-system ks-console-599c49d8f6-ngb6b 1/1 Running 2 (2m54s ago) 8d
kubesphere-system ks-controller-manager-66747fcddc-r7cpt 0/1 Unknown 1 8d
kubesphere-system ks-installer-5fd8bd46b8-dzhbb 1/1 Running 2 (2m54s ago) 8d
After the reboot, with Calico's network configuration refreshed, the Pods that were failing before now look mostly healthy.
kubesphere-logging-system elasticsearch-logging-data-0 0/1 Init:1/2 0 69m
kubesphere-logging-system elasticsearch-logging-discovery-0 0/1 Init:1/2 0 69m
These two pods are still initializing.
At this point some Java processes are still consuming CPU.
After waiting a little longer:
root@zhiyong-ksp1:/home/zhiyong# kubectl get pod --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
istio-system istiod-1-11-2-54dd699c87-99krn 1/1 Running 0 72m
istio-system jaeger-collector-67cfc55477-7757f 1/1 Running 5 (3m41s ago) 6m58s
istio-system jaeger-operator-fccc48b86-vtcr8 1/1 Running 0 52m
istio-system jaeger-query-8497bdbfd7-csbts 2/2 Running 0 102s
istio-system kiali-75c777bdf6-xhbq7 1/1 Running 0 7m37s
istio-system kiali-operator-c459985f7-sttfs 1/1 Running 0 52m
kube-system calico-kube-controllers-f9f9bbcc9-2v7lm 1/1 Running 2 (10m ago) 8d
kube-system calico-node-4mgc7 1/1 Running 2 (10m ago) 8d
kube-system coredns-f657fccfd-2gw7h 1/1 Running 2 (10m ago) 8d
kube-system coredns-f657fccfd-pflwf 1/1 Running 2 (10m ago) 8d
kube-system kube-apiserver-zhiyong-ksp1 1/1 Running 2 (10m ago) 8d
kube-system kube-controller-manager-zhiyong-ksp1 1/1 Running 2 (10m ago) 8d
kube-system kube-proxy-cn68l 1/1 Running 2 (10m ago) 8d
kube-system kube-scheduler-zhiyong-ksp1 1/1 Running 2 (10m ago) 8d
kube-system nodelocaldns-96gtw 1/1 Running 2 (10m ago) 8d
kube-system openebs-localpv-provisioner-68db4d895d-p9527 1/1 Running 1 (10m ago) 8d
kube-system snapshot-controller-0 1/1 Running 2 (10m ago) 8d
kubesphere-controls-system default-http-backend-587748d6b4-ccg59 1/1 Running 2 (10m ago) 8d
kubesphere-controls-system kubectl-admin-5d588c455b-82cnk 1/1 Running 2 (10m ago) 8d
kubesphere-logging-system elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk 0/1 Completed 0 57m
kubesphere-logging-system elasticsearch-logging-data-0 1/1 Running 0 74m
kubesphere-logging-system elasticsearch-logging-discovery-0 1/1 Running 0 74m
kubesphere-monitoring-system alertmanager-main-0 2/2 Running 4 (10m ago) 8d
kubesphere-monitoring-system kube-state-metrics-6d6786b44-bbb4f 3/3 Running 6 (10m ago) 8d
kubesphere-monitoring-system node-exporter-8sz74 2/2 Running 4 (10m ago) 8d
kubesphere-monitoring-system notification-manager-deployment-6f8c66ff88-pt4l8 2/2 Running 4 (10m ago) 8d
kubesphere-monitoring-system notification-manager-operator-6455b45546-nkmx8 2/2 Running 4 (10m ago) 8d
kubesphere-monitoring-system prometheus-k8s-0 2/2 Running 0 9m30s
kubesphere-monitoring-system prometheus-operator-66d997dccf-c968c 2/2 Running 4 (10m ago) 8d
kubesphere-system ks-apiserver-6b9bcb86f4-hsdzs 1/1 Running 2 (10m ago) 8d
kubesphere-system ks-console-599c49d8f6-ngb6b 1/1 Running 2 (10m ago) 8d
kubesphere-system ks-controller-manager-66747fcddc-r7cpt 1/1 Running 2 (10m ago) 8d
kubesphere-system ks-installer-5fd8bd46b8-dzhbb 1/1 Running 2 (10m ago) 8d
All pods are now Running except elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk, which is Completed.
The web UI is also entirely green, with no errors reported. Evidently the Calico, Istio, and Prometheus pods have all recovered.
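Scanning a long pod listing by eye is error-prone, so the check can be sketched in code. The function name `unhealthy_pods` is hypothetical, and the parsing assumes the default column layout of `kubectl get pod --all-namespaces` as printed above (NAMESPACE, NAME, READY, STATUS first).

```python
def unhealthy_pods(table_text, ok_statuses=("Running", "Completed")):
    """Return (namespace, name, status) for pods whose STATUS is not acceptable."""
    problems = []
    for line in table_text.strip().splitlines():
        fields = line.split()
        # Skip blank or short lines and the header row.
        if len(fields) < 4 or fields[0] == "NAMESPACE":
            continue
        # Only the first four columns are used; RESTARTS may contain spaces.
        namespace, name, _ready, status = fields[:4]
        if status not in ok_statuses:
            problems.append((namespace, name, status))
    return problems
```

Note that this looks only at the STATUS column. A pod that is Running but not fully ready (for example 0/1) would need an extra check on READY.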
Checking the Completed pod
root@zhiyong-ksp1:/home/zhiyong# kubectl describe pod elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk -n kubesphere-logging-system
Name: elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk
Namespace: kubesphere-logging-system
Priority: 0
Node: zhiyong-ksp1/192.168.88.20
Start Time: Wed, 17 Aug 2022 01:00:00 +0800
Labels: app=elasticsearch-curator
controller-uid=d95b480d-abb9-42ed-9c1e-873127f96dc1
job-name=elasticsearch-logging-curator-elasticsearch-curator-27677820
release=elasticsearch-logging-curator
Annotations: cni.projectcalico.org/containerID: 584387ef1390db6f2d17ee0e2bc92951178cdb373c34544ecf150151253f4766
cni.projectcalico.org/podIP:
cni.projectcalico.org/podIPs:
Status: Succeeded
IP: 10.233.107.51
IPs:
IP: 10.233.107.51
Controlled By: Job/elasticsearch-logging-curator-elasticsearch-curator-27677820
Containers:
elasticsearch-curator:
Container ID: containerd://a2b7da0a34df9601acc062b10691dbbfad5bc22a838e18d9b95f3bd57633479e
Image: registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6
Image ID: registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator@sha256:0fdc68b2a211f753238f9d54734b331141a9ade5bf31eef801ea0d056c9ab1c1
Port: <none>
Host Port: <none>
Command:
curator/curator
Args:
--config
/etc/es-curator/config.yml
/etc/es-curator/action_file.yml
State: Terminated
Reason: Completed
Exit Code: 0
Started: Wed, 17 Aug 2022 01:51:12 +0800
Finished: Wed, 17 Aug 2022 01:51:12 +0800
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/etc/es-curator from config-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kvk6g (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: elasticsearch-logging-curator-elasticsearch-curator-config
Optional: false
kube-api-access-kvk6g:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 64m default-scheduler Successfully assigned kubesphere-logging-system/elasticsearch-logging-curator-elasticsearch-curator-2767784rhhk to zhiyong-ksp1
Warning FailedCreatePodSandBox 64m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "01c36acd52449dcec6b1bcac2a1f3c57577195fd915aef6ca8d1ff53ed9b5a35": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 64m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c0754d78516e0b4a99993dd31a5608da1b424e558560ea2c66f98856928604a9": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 64m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "0dc2bab36922b4a73c35f3b35ffd4ef46f825fd5b053454c47665d028cd89d61": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 63m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "cc132b632133dbc2ef32eed74bbfb9e64923530467ccd085d67907542a4cfea8": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 63m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8b2f3f1f0d0ebac8a0b43025d22de1c0e1b55edbc72fec6930477061f0b46bbd": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 63m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d6ea17333ad9c2d549f439a25b83fdb8b7338f8e4a00e5fd7adbbab1bc7c78e2": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 63m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d6cafd828a3fa61977ca2423bf953b7aab8f114af042fb272e7172d7f55078a6": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 62m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a555a64631dab504aeacecd828e512b84a5396f0c779b42a1398518740c858d0": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 62m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a204a8cd54ce4aa97875269c3475c48266de84a214633ee9eaca8b505df52735": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning FailedCreatePodSandBox 24m (x175 over 62m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "269a0272273b83edeb22c573c3bceeeb40d48bef4cafd0b91da1aa6617b1f3d4": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
Warning NetworkNotReady 19m (x55 over 21m) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Warning NetworkNotReady 16m (x5 over 16m) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Warning FailedMount 16m (x4 over 16m) kubelet MountVolume.SetUp failed for volume "kube-api-access-kvk6g" : object "kubesphere-logging-system"/"kube-root-ca.crt" not registered
Warning FailedMount 16m (x5 over 16m) kubelet MountVolume.SetUp failed for volume "config-volume" : object "kubesphere-logging-system"/"elasticsearch-logging-curator-elasticsearch-curator-config" not registered
Normal Pulling 16m kubelet Pulling image "registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6"
Normal Pulled 13m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6" in 3m3.099253003s
Normal Created 13m kubelet Created container elasticsearch-curator
Normal Started 13m kubelet Started container elasticsearch-curator
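The events show that this pod failed for a long time, first with the Calico "Unauthorized" sandbox errors and then with NetworkNotReady and FailedMount warnings, before it finally pulled the image registry.cn-beijing.aliyuncs.com/kubesphereio/elasticsearch-curator:v5.7.6 and created and started the container elasticsearch-curator. The pod is controlled by a Job, so once the container had done its work it exited normally with exit code 0, which is why its status is Completed rather than Running.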
At this point, KubeSphere has successfully started the Istio service mesh.