Software versions involved:
- Server CPU: Kunpeng-920
- OS: Kylin V10 SP2 aarch64
- KubeSphere:v3.4.1
- Kubernetes:v1.23.15
- Docker: v20.10.8
- KubeKey: v3.0.13
1. Installation Plan
This deployment only verifies whether Kubernetes is feasible on the ARM architecture, so a single machine is used and no further capacity planning was done.
2. Confirm the System Configuration
- OS type
[root@baode104 ~]# cat /etc/os-release
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"
- OS kernel
[root@baode104 ~]# uname -a
Linux baode104 4.19.90-24.4.v2101.ky10.aarch64 #1 SMP Mon May 24 14:45:37 CST 2021 aarch64 aarch64 aarch64 GNU/Linux
- OS version
Note the SP2 release: it has a bug that shows up during installation (see Section 5.1).
[root@baode104 ~]# cat /etc/.productinfo
Kylin Linux Advanced Server
release V10 (SP2) /(Sword)-aarch64-Build09/20210524
- Server CPU
[root@baode104 ~]# lscpu
Architecture: aarch64
CPU op-mode(s): 64-bit
Byte Order: Little Endian
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 1
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 2
Vendor ID: HiSilicon
Model: 0
Model name: Kunpeng-920
Stepping: 0x1
BogoMIPS: 200.00
L1d cache: 4 MiB
L1i cache: 4 MiB
L2 cache: 32 MiB
L3 cache: 64 MiB
NUMA node0 CPU(s): 0-31
NUMA node1 CPU(s): 32-63
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
3. Environment Preparation
3.1 Set the server time zone
timedatectl set-timezone Asia/Shanghai
3.2 Disable SELinux
# Use sed to edit the config file so SELinux stays disabled permanently
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
# Temporarily disable it with this command; strictly speaking this step is optional, since KubeKey will configure it automatically
setenforce 0
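As a quick check, getenforce should now report Permissive (and Disabled after the config change takes effect on reboot):
getenforce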
3.3 Disable swap:
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
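A quick sanity check that swap is really off (the Swap line should show 0):
free -h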
4. Install Kubernetes and KubeSphere
4.1 Download kk
cd /opt/module/kubekey
export KKZONE=cn
curl -sfL https://get-kk.kubesphere.io | sh -
Note: if the download stalls, fetch the package yourself (through a proxy if necessary): https://kubernetes.pek3b.qingstor.com/kubekey/releases/download/v3.0.13/kubekey-v3.0.13-linux-arm64.tar.gz — handing this link to a download manager such as Thunder (Xunlei) finishes it quickly. Also note that it must be the ARM build of kk: kubekey-v3.0.13-linux-arm64.tar.gz.
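If you downloaded the tarball manually, a minimal sketch for unpacking it, assuming it was saved into the working directory /opt/module/kubekey used above:
cd /opt/module/kubekey
tar -zxvf kubekey-v3.0.13-linux-arm64.tar.gz
chmod +x kk
./kk version    # should report v3.0.13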
4.2 Create the deployment configuration file
Create the cluster configuration file, selecting KubeSphere v3.4.1 and Kubernetes v1.23.15, and name the file kubesphere-v341-v12315.yaml accordingly. If no name is specified, the default file name is config-sample.yaml.
./kk create config -f kubesphere-v341-v12315.yaml --with-kubernetes v1.23.15 --with-kubesphere v3.4.1
If you do not want to install KubeSphere, simply drop --with-kubesphere v3.4.1, as in the sketch below.
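For example, a Kubernetes-only configuration (the file name here is just illustrative):
./kk create config -f k8s-only-v12315.yaml --with-kubernetes v1.23.15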
Edit the configuration file:
The main changes are in the kind: Cluster and kind: ClusterConfiguration sections:
- hosts: for the ARM build, be sure to add arch: arm64
- storage.openebs.basePath: new setting; set the default storage path to /DATA1/openebs/local, or whichever directory sits on the disk you want to use for storage
- ClusterConfiguration: this installation is a test, so keep it as minimal as possible; apart from etcd monitoring, none of the pluggable components are enabled. Enable etcd monitoring:
etcd:
monitoring: true # change "false" to "true"
endpointIps: localhost
port: 2379
tlsEnable: true
Enable the network policy and the pod IP pool:
network:
networkpolicy:
enabled: true
ippool:
type: calico
topology:
type: none
The full example:
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
name: sample
spec:
hosts:
- {name: baode104, address: 10.28.20.104, internalAddress: 10.28.20.104, user: root, password: "Xugu@2023", arch: arm64}
roleGroups:
etcd:
- baode104
control-plane:
- baode104
worker:
- baode104
controlPlaneEndpoint:
## Internal loadbalancer for apiservers
# internalLoadbalancer: haproxy
domain: xugu.kubesphere3.local
address: ""
port: 6443
kubernetes:
version: v1.23.15
clusterName: cluster3.local
autoRenewCerts: true
containerManager: docker
etcd:
type: kubekey
network:
plugin: calico
kubePodsCIDR: 10.233.64.0/18
kubeServiceCIDR: 10.233.0.0/18
## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
multusCNI:
enabled: false
storage:
openebs:
basePath: /DATA1/openebs/local # base path of the local PV provisioner
registry:
privateRegistry: ""
namespaceOverride: ""
registryMirrors: []
insecureRegistries: []
addons: []
---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
name: ks-installer
namespace: kubesphere-system
labels:
version: v3.4.1
spec:
persistence:
storageClass: ""
authentication:
jwtSecret: ""
local_registry: ""
# dev_tag: ""
etcd:
monitoring: true
endpointIps: localhost
port: 2379
tlsEnable: true
common:
core:
console:
enableMultiLogin: true
port: 30880
type: NodePort
# apiserver:
# resources: {}
# controllerManager:
# resources: {}
redis:
enabled: false
enableHA: false
volumeSize: 2Gi
openldap:
enabled: false
volumeSize: 2Gi
minio:
volumeSize: 20Gi
monitoring:
# type: external
endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
GPUMonitoring:
enabled: false
gpu:
kinds:
- resourceName: "nvidia.com/gpu"
resourceType: "GPU"
default: true
es:
# master:
# volumeSize: 4Gi
# replicas: 1
# resources: {}
# data:
# volumeSize: 20Gi
# replicas: 1
# resources: {}
enabled: false
logMaxAge: 7
elkPrefix: logstash
basicAuth:
enabled: false
username: ""
password: ""
externalElasticsearchHost: ""
externalElasticsearchPort: ""
opensearch:
# master:
# volumeSize: 4Gi
# replicas: 1
# resources: {}
# data:
# volumeSize: 20Gi
# replicas: 1
# resources: {}
enabled: true
logMaxAge: 7
opensearchPrefix: whizard
basicAuth:
enabled: true
username: "admin"
password: "admin"
externalOpensearchHost: ""
externalOpensearchPort: ""
dashboard:
enabled: false
alerting:
enabled: false
# thanosruler:
# replicas: 1
# resources: {}
auditing:
enabled: false
# operator:
# resources: {}
# webhook:
# resources: {}
devops:
enabled: false
jenkinsCpuReq: 0.5
jenkinsCpuLim: 1
jenkinsMemoryReq: 4Gi
jenkinsMemoryLim: 4Gi
jenkinsVolumeSize: 16Gi
events:
enabled: false
# operator:
# resources: {}
# exporter:
# resources: {}
ruler:
enabled: true
replicas: 2
# resources: {}
logging:
enabled: false
logsidecar:
enabled: true
replicas: 2
# resources: {}
metrics_server:
enabled: false
monitoring:
storageClass: ""
node_exporter:
port: 9100
# resources: {}
# kube_rbac_proxy:
# resources: {}
# kube_state_metrics:
# resources: {}
# prometheus:
# replicas: 1
# volumeSize: 20Gi
# resources: {}
# operator:
# resources: {}
# alertmanager:
# replicas: 1
# resources: {}
# notification_manager:
# resources: {}
# operator:
# resources: {}
# proxy:
# resources: {}
gpu:
nvidia_dcgm_exporter:
enabled: false
# resources: {}
multicluster:
clusterRole: none
network:
networkpolicy:
enabled: true
ippool:
type: calico
topology:
type: none
openpitrix:
store:
enabled: false
servicemesh:
enabled: false
istio:
components:
ingressGateways:
- name: istio-ingressgateway
enabled: false
cni:
enabled: false
edgeruntime:
enabled: false
kubeedge:
enabled: false
cloudCore:
cloudHub:
advertiseAddress:
- ""
service:
cloudhubNodePort: "30000"
cloudhubQuicNodePort: "30001"
cloudhubHttpsNodePort: "30002"
cloudstreamNodePort: "30003"
tunnelNodePort: "30004"
# resources: {}
# hostNetWork: false
iptables-manager:
enabled: true
mode: "external"
# resources: {}
# edgeService:
# resources: {}
gatekeeper:
enabled: false
# controller_manager:
# resources: {}
# audit:
# resources: {}
terminal:
timeout: 600
4.3 Install and deploy
./kk create cluster -f kubesphere-v341-v12315.yaml
The installation log is long, so only part of it is shown here; the key thing to watch is whether the downloaded binaries are the arm64 versions.
This is a simple check of your environment.
Before installation, ensure that your machines meet all requirements specified at
https://github.com/kubesphere/kubekey#requirements-and-recommendations
Continue this installation? [yes/no]: y
10:14:27 CST success: [LocalHost]
10:14:27 CST [NodeBinariesModule] Download installation binaries
10:14:27 CST message: [localhost]
downloading arm64 kubeadm v1.23.15 ...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 42.7M 100 42.7M 0 0 1022k 0 0:00:42 0:00:42 --:--:-- 1181k
10:15:10 CST message: [localhost]
downloading arm64 kubelet v1.23.15 ...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 116M 100 116M 0 0 1016k 0 0:01:57 0:01:57 --:--:-- 1031k
When the console shows
Please wait for the installation to complete: <---<<
it means the Kubernetes installation is complete and the KubeSphere installation has started. At this point, open a new SSH session and run kubectl get pod -A to watch the actual progress, resolving errors as they come up; if anything fails, the troubleshooting approach in Section 5 below may help.
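For example, in the second session (watch is optional; repeatedly running kubectl get pod -A works just as well):
watch -n 5 kubectl get pod -A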
The screen when deployment has finished:
10:27:37 CST success: [baode104]
Please wait for the installation to complete: >>--->
Please wait for the installation to complete: <---<<
#####################################################
### Welcome to KubeSphere! ###
#####################################################
Console: http://10.28.20.104:30880
Account: admin
Password: P@88w0rd
NOTES:
1. After you log into the console, please check the
monitoring status of service components in
"Cluster Management". If any service is not
ready, please wait patiently until all components
are up and running.
2. Please change the default password after login.
#####################################################
https://kubesphere.io 2023-12-06 11:35:08
#####################################################
11:35:10 CST success: [baode104]
11:35:10 CST Pipeline[CreateClusterPipeline] execute successfully
Installation is complete.
Please check the result using the command:
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f
4.4 Verify that Kubernetes is usable:
kubectl create deployment nginx --image=nginx:alpine --replicas=2
Wait about a minute for the image to be pulled:
[root@baode104 DATA1]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-65778599-dxjn2 1/1 Running 0 80s 10.233.83.65 baode104 <none> <none>
nginx-65778599-ls6cb 1/1 Running 0 80s 10.233.83.64 baode104 <none> <none>
Check the image architecture:
[root@baode104 DATA1]# docker inspect nginx:alpine |grep Architecture
"Architecture": "arm64",
Verify the Nginx Service:
kubectl create service nodeport nginx --tcp=80:80
[root@baode104 DATA1]# kubectl get svc -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 5h18m <none>
nginx NodePort 10.233.37.30 <none> 80:31023/TCP 8s app=nginx
Open 10.28.20.104:31023 in a browser; if the page loads, the service works.
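The same check from the command line (31023 is simply the NodePort assigned in this run):
# should return the Nginx welcome page HTML
curl http://10.28.20.104:31023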
5. Troubleshooting:
5.1 The coredns pods keep going into CrashLoopBackOff
Symptom:
[root@baode104 ~]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-67fbf89557-8wf29 1/1 Running 0 52m
kube-system calico-node-hfnm5 1/1 Running 0 52m
kube-system coredns-757cd945b-mf6db 0/1 CrashLoopBackOff 10 (18m ago) 44m
kube-system coredns-757cd945b-shgxt 0/1 CrashLoopBackOff 8 (14m ago) 30m
Attempted diagnosis:
1. Check the logs
kubectl logs coredns-757cd945b-mf6db -n kube-system
Nothing came back; the container apparently never started properly, so there are no logs.
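If the container did at least attempt to start, the previous instance's output can sometimes still be pulled (pod name taken from the listing above):
kubectl logs --previous coredns-757cd945b-mf6db -n kube-system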
2. Check the pod events
kubectl describe pod coredns-757cd945b-mf6db -n kube-system
One error stands out:
Last State: Terminated
Reason: ContainerCannotRun
Message: failed to create task for container: failed to create shim task: OCI runtime create failed: container_linux.go:318: starting container process caused "process_linux.go:281: applying cgroup configuration for process caused \"No such device or address\"": unknown
Some research showed that this is caused by a bug in Kylin V10 SP2.
Actual fix:
1. Change the cgroup driver in /etc/docker/daemon.json from systemd to cgroupfs
vim /etc/docker/daemon.json
{
"log-opts": {
"max-size": "5m",
"max-file":"3"
},
"exec-opts": ["native.cgroupdriver=cgroupfs"]
}
2. Restart Docker
systemctl daemon-reload
systemctl restart docker.service
3. If the pod still keeps going into CrashLoopBackOff, delete it and let Kubernetes recreate it
kubectl delete pod coredns-757cd945b-mf6db -n kube-system
4. In theory there should be no need to restart kubelet, but (perhaps because an earlier step was wrong) I restarted it anyway and it failed:
Check the startup log with journalctl -xefu kubelet
Dec 06 11:21:33 baode104 kubelet[37119]: E1206 11:21:33.030651 37119 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
Dec 06 11:21:33 baode104 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited
In that case, change cgroupDriver in /var/lib/kubelet/config.yaml to cgroupfs, for example as shown below.
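A minimal sketch of that change (a one-line sed; editing the file by hand works just as well):
# make the kubelet cgroup driver match Docker's
sed -i 's/cgroupDriver: systemd/cgroupDriver: cgroupfs/' /var/lib/kubelet/config.yaml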
Then restart kubelet
systemctl restart kubelet.service
5.2 The KubeSphere prometheus pod stays in Pending
Symptom: the pod events show an error saying that init-pvc already exists
Running kubectl get pods -A confirms that such a pod does exist, probably left over from my repeated Docker/Kubernetes restarts:
kube-system init-pvc-708f0f33-467e-4630-8ca8-35fce0a50f8d 0/1 Completed 0 17m
Fix: delete that pod and let it be recreated, for example with the command below.
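A sketch of the delete, using the leftover pod name from the listing above:
# remove the completed init-pvc pod so the provisioner can create it again
kubectl delete pod init-pvc-708f0f33-467e-4630-8ca8-35fce0a50f8d -n kube-system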
5.3 default-http-backend failure
- Pull a compatible ARM image (a third-party ARM build of the same version)
crictl pull mirrorgooglecontainers/defaultbackend-arm64:1.4
- Re-tag the image (to keep the image name consistent with the rest of the cluster)
docker tag docker.io/mirrorgooglecontainers/defaultbackend-arm64:1.4 registry.cn-beijing.aliyuncs.com/kubesphereio/defaultbackend-arm64:1.4
- Redeploy the component
# Update the image used by the Deployment, then restart it
kubectl set image deployment/default-http-backend default-http-backend=registry.cn-beijing.aliyuncs.com/kubesphereio/defaultbackend-arm64:1.4 -n kubesphere-controls-system
kubectl rollout restart deployment/default-http-backend -n kubesphere-controls-system
- Verify that the new Pod is created and starts successfully
kubectl get pods -o wide -n kubesphere-controls-system | grep default-http-backend