Software versions involved:

  • Server CPU: Kunpeng-920
  • Operating system: Kylin V10 SP2 aarch64
  • KubeSphere: v3.4.1
  • Kubernetes: v1.23.15
  • Docker: v20.10.8
  • KubeKey: v3.0.13

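1. Installation plan

This deployment only tests whether Kubernetes is viable on the ARM architecture, so everything runs on a single machine and no further planning was done.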

2. Confirm the system configuration

  • Operating system type
[root@baode104 ~]#  cat /etc/os-release
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"
  • Operating system kernel
[root@baode104 ~]# uname -a
Linux baode104 4.19.90-24.4.v2101.ky10.aarch64 #1 SMP Mon May 24 14:45:37 CST 2021 aarch64 aarch64 aarch64 GNU/Linux
  • Operating system version

Pay attention to the SP2 release: it has a bug that shows up during installation (see Section 5.1).

[root@baode104 ~]# cat /etc/.productinfo
Kylin Linux Advanced Server
release V10 (SP2) /(Sword)-aarch64-Build09/20210524
  • Server CPU
[root@baode104 ~]# lscpu 
Architecture:                    aarch64
CPU op-mode(s):                  64-bit
Byte Order:                      Little Endian
CPU(s):                          64
On-line CPU(s) list:             0-63
Thread(s) per core:              1
Core(s) per socket:              32
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       HiSilicon
Model:                           0
Model name:                      Kunpeng-920
Stepping:                        0x1
BogoMIPS:                        200.00
L1d cache:                       4 MiB
L1i cache:                       4 MiB
L2 cache:                        32 MiB
L3 cache:                        64 MiB
NUMA node0 CPU(s):               0-31
NUMA node1 CPU(s):               32-63
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm
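
Since the SP2 release is the one that triggers the problem described later, it can help to detect it up front. The following is a minimal sketch, assuming the product-info file keeps the "release V10 (SP2) ..." layout shown above; the function names kylin_service_pack and warn_if_sp2 are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the service pack level (for example SP2) recorded in a Kylin
# product-info file. The default path is the file shown above; a different
# path can be passed as the first argument.
kylin_service_pack() {
  local info_file="${1:-/etc/.productinfo}"
  # Capture the "SPn" token inside the first pair of parentheses on the release line.
  sed -n 's/^release .*(\(SP[0-9][0-9]*\)).*/\1/p' "$info_file"
}

# Print a reminder when the machine runs SP2 (hypothetical helper).
warn_if_sp2() {
  if [ "$(kylin_service_pack "$@")" = "SP2" ]; then
    echo "Kylin V10 SP2 detected: watch for the cgroup error described in Section 5.1"
  fi
}
```

Calling warn_if_sp2 without arguments reads /etc/.productinfo; on other releases it prints nothing.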

3. Environment preparation

3.1 Set the server time zone

timedatectl set-timezone Asia/Shanghai

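3.2 Disable SELinux

# Edit the config file with sed so that SELinux stays disabled after a reboot
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# Switch to permissive mode for the current boot; this step is optional because KubeKey configures it automatically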
setenforce 0
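
The sed command above only matches SELINUX=enforcing, so a file that already says SELINUX=permissive is left unchanged. A minimal sketch that handles any current value and lets you confirm the result; the function names are hypothetical, and a .bak backup of the file is kept as a precaution.

```shell
#!/usr/bin/env bash
# Print the mode configured for the next boot (enforcing, permissive or disabled).
selinux_configured_mode() {
  local config="${1:-/etc/selinux/config}"
  # "^SELINUX=" does not match the separate SELINUXTYPE= line.
  sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "$config"
}

# Set SELINUX=disabled whatever the current value is, keeping a .bak backup.
disable_selinux_in_config() {
  local config="${1:-/etc/selinux/config}"
  sed -i.bak 's/^SELINUX=.*/SELINUX=disabled/' "$config"
}
```

Running disable_selinux_in_config followed by selinux_configured_mode should print disabled; the change only takes full effect after a reboot.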

3.3 Disable swap

swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
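
A slightly more defensive variant of the same step, as a sketch: it only comments out lines that are not already commented, keeps a backup of the file, and offers a check that no swap device is active. The function names are hypothetical; the check assumes the usual /proc/swaps layout of one header line followed by one line per active device.

```shell
#!/usr/bin/env bash
# Comment out every active swap entry in an fstab-style file.
# A backup copy is written next to the file with a .bak suffix.
comment_swap_entries() {
  local fstab="${1:-/etc/fstab}"
  # Only touch lines that are not comments and contain "swap" as a separate field.
  sed -i.bak '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# Succeed (exit status 0) when no swap device is active.
swap_is_off() {
  local swaps_file="${1:-/proc/swaps}"
  # Any line after the header means a swap device is still in use.
  awk 'NR > 1 { active = 1 } END { exit active + 0 }' "$swaps_file"
}
```

After swapoff -a, swap_is_off should succeed, and comment_swap_entries keeps swap from coming back on the next boot.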

4. Install Kubernetes and KubeSphere

4.1 Download KubeKey (kk)

cd /opt/module/kubekey
export KKZONE=cn
curl -sfL https://get-kk.kubesphere.io | sh -

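Note: if the download stalls, fetch the package yourself through a proxy: https://kubernetes.pek3b.qingstor.com/kubekey/releases/download/v3.0.13/kubekey-v3.0.13-linux-arm64.tar.gz (handing this link to the Xunlei download manager finished the download quickly). Also make sure you get the ARM build of kk: kubekey-v3.0.13-linux-arm64.tar.gz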
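
For a manual download it is easy to pick the wrong architecture. The sketch below derives the tarball name and URL from the machine architecture. It assumes that other versions and architectures follow the same naming pattern as the link above, which was only confirmed here for v3.0.13 on arm64; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Map a machine name as printed by `uname -m` to the suffix used in the tarball name.
kk_arch() {
  local machine="${1:-$(uname -m)}"
  case "$machine" in
    aarch64|arm64) echo arm64 ;;
    x86_64|amd64)  echo amd64 ;;
    *) echo "unsupported architecture: $machine" >&2; return 1 ;;
  esac
}

# Build the KubeKey tarball name for a version and architecture.
kk_tarball_name() {
  local version="$1" arch="$2"
  printf 'kubekey-%s-linux-%s.tar.gz\n' "$version" "$arch"
}

# Build the download URL on the mirror mentioned above (pattern assumed).
kk_download_url() {
  local version="$1" arch="$2"
  printf 'https://kubernetes.pek3b.qingstor.com/kubekey/releases/download/%s/%s\n' \
    "$version" "$(kk_tarball_name "$version" "$arch")"
}

# Example (not executed here):
#   curl -fLO "$(kk_download_url v3.0.13 "$(kk_arch)")"
#   tar -xzf "$(kk_tarball_name v3.0.13 "$(kk_arch)")"
```

On the Kunpeng-920 machine used here, uname -m prints aarch64, so the helpers resolve to the arm64 tarball.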

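4.2 Create the deployment configuration file

Create the cluster configuration file for KubeSphere v3.4.1 and Kubernetes v1.23.15. The file is therefore named kubesphere-v341-v12315.yaml; if no name is given, the default file name is config-sample.yaml.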

./kk create config -f kubesphere-v341-v12315.yaml --with-kubernetes v1.23.15 --with-kubesphere v3.4.1

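If you do not want to install KubeSphere, simply drop --with-kubesphere v3.4.1 from the command.

Edit the configuration file:

The main changes are in the kind: Cluster and kind: ClusterConfiguration sections:

  • hosts: on ARM machines, be sure to add arch: arm64
  • storage.openebs.basePath: a new setting that sets the default storage path to /DATA1/openebs/local; point it at a directory on whichever disk you want to use for storage
  • ClusterConfiguration: this is a test installation kept as minimal as possible, so every pluggable component except etcd monitoring stays disabled

Enable etcd monitoring:

etcd:
    monitoring: true # change "false" to "true"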
    endpointIps: localhost
    port: 2379
    tlsEnable: true

Enable network policies and the Pod IP pool:

network:
    networkpolicy:
      enabled: true
    ippool:
      type: calico
    topology:
      type: none

A complete example:

apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: baode104, address: 10.28.20.104, internalAddress: 10.28.20.104, user: root, password: "Xugu@2023", arch: arm64}
  roleGroups:
    etcd:
    - baode104
    control-plane: 
    - baode104
    worker:
    - baode104
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers 
    # internalLoadbalancer: haproxy

    domain: xugu.kubesphere3.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.23.15
    clusterName: cluster3.local
    autoRenewCerts: true
    containerManager: docker
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  storage:
    openebs:
      basePath: /DATA1/openebs/local # base path of the local PV provisioner
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []



---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.4.1
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""
  local_registry: ""
  # dev_tag: ""
  etcd:
    monitoring: true
    endpointIps: localhost
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: NodePort
    # apiserver:
    #  resources: {}
    # controllerManager:
    #  resources: {}
    redis:
      enabled: false
      enableHA: false
      volumeSize: 2Gi
    openldap:
      enabled: false
      volumeSize: 2Gi
    minio:
      volumeSize: 20Gi
    monitoring:
      # type: external
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
      GPUMonitoring:
        enabled: false
    gpu:
      kinds:
      - resourceName: "nvidia.com/gpu"
        resourceType: "GPU"
        default: true
    es:
      # master:
      #   volumeSize: 4Gi
      #   replicas: 1
      #   resources: {}
      # data:
      #   volumeSize: 20Gi
      #   replicas: 1
      #   resources: {}
      enabled: false
      logMaxAge: 7
      elkPrefix: logstash
      basicAuth:
        enabled: false
        username: ""
        password: ""
      externalElasticsearchHost: ""
      externalElasticsearchPort: ""
    opensearch:
      # master:
      #   volumeSize: 4Gi
      #   replicas: 1
      #   resources: {}
      # data:
      #   volumeSize: 20Gi
      #   replicas: 1
      #   resources: {}
      enabled: true
      logMaxAge: 7
      opensearchPrefix: whizard
      basicAuth:
        enabled: true
        username: "admin"
        password: "admin"
      externalOpensearchHost: ""
      externalOpensearchPort: ""
      dashboard:
        enabled: false
  alerting:
    enabled: false
    # thanosruler:
    #   replicas: 1
    #   resources: {}
  auditing:
    enabled: false
    # operator:
    #   resources: {}
    # webhook:
    #   resources: {}
  devops:
    enabled: false
    jenkinsCpuReq: 0.5
    jenkinsCpuLim: 1
    jenkinsMemoryReq: 4Gi
    jenkinsMemoryLim: 4Gi
    jenkinsVolumeSize: 16Gi
  events:
    enabled: false
    # operator:
    #   resources: {}
    # exporter:
    #   resources: {}
    ruler:
      enabled: true
      replicas: 2
    #   resources: {}
  logging:
    enabled: false
    logsidecar:
      enabled: true
      replicas: 2
      # resources: {}
  metrics_server:
    enabled: false
  monitoring:
    storageClass: ""
    node_exporter:
      port: 9100
      # resources: {}
    # kube_rbac_proxy:
    #   resources: {}
    # kube_state_metrics:
    #   resources: {}
    # prometheus:
    #   replicas: 1
    #   volumeSize: 20Gi
    #   resources: {}
    #   operator:
    #     resources: {}
    # alertmanager:
    #   replicas: 1
    #   resources: {}
    # notification_manager:
    #   resources: {}
    #   operator:
    #     resources: {}
    #   proxy:
    #     resources: {}
    gpu:
      nvidia_dcgm_exporter:
        enabled: false
        # resources: {}
  multicluster:
    clusterRole: none
  network:
    networkpolicy:
      enabled: true
    ippool:
      type: calico
    topology:
      type: none
  openpitrix:
    store:
      enabled: false
  servicemesh:
    enabled: false
    istio:
      components:
        ingressGateways:
        - name: istio-ingressgateway
          enabled: false
        cni:
          enabled: false
  edgeruntime:
    enabled: false
    kubeedge:
      enabled: false
      cloudCore:
        cloudHub:
          advertiseAddress:
            - ""
        service:
          cloudhubNodePort: "30000"
          cloudhubQuicNodePort: "30001"
          cloudhubHttpsNodePort: "30002"
          cloudstreamNodePort: "30003"
          tunnelNodePort: "30004"
        # resources: {}
        # hostNetWork: false
      iptables-manager:
        enabled: true
        mode: "external"
        # resources: {}
      # edgeService:
      #   resources: {}
  gatekeeper:
    enabled: false
    # controller_manager:
    #   resources: {}
    # audit:
    #   resources: {}
  terminal:
    timeout: 600

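4.3 Run the installation

./kk create cluster -f kubesphere-v341-v12315.yaml

The installation log is long, so only an excerpt is shown here. The thing to check is that the downloaded packages are the arm64 builds.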

This is a simple check of your environment.
Before installation, ensure that your machines meet all requirements specified at
https://github.com/kubesphere/kubekey#requirements-and-recommendations

Continue this installation? [yes/no]: y
10:14:27 CST success: [LocalHost]
10:14:27 CST [NodeBinariesModule] Download installation binaries
10:14:27 CST message: [localhost]
downloading arm64 kubeadm v1.23.15 ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 42.7M  100 42.7M    0     0  1022k      0  0:00:42  0:00:42 --:--:-- 1181k
10:15:10 CST message: [localhost]
downloading arm64 kubelet v1.23.15 ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  116M  100  116M    0     0  1016k      0  0:01:57  0:01:57 --:--:-- 1031k

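When the following appears:

Please wait for the installation to complete:  <---<<

Kubernetes itself is installed and the KubeSphere installation has started. Open a second SSH session and run kubectl get pod -A to watch the actual progress, fixing errors as they come up. If something fails, the troubleshooting notes in Section 5 below may help.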
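
With many Pods starting at once, the full listing gets noisy. A small sketch that filters the output of kubectl get pods -A down to Pods that are neither Running nor Completed; it assumes the default column layout (NAMESPACE, NAME, READY, STATUS, ...), and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read `kubectl get pods -A` output on stdin and print every Pod whose STATUS
# is neither Running nor Completed, as "namespace/name status".
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Example (not executed here):
#   kubectl get pods -A | unhealthy_pods
```

An empty result means everything listed is Running or Completed. Pods that are Running but not yet ready are not reported by this filter.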

The screen when the deployment finishes:

10:27:37 CST success: [baode104]
Please wait for the installation to complete:    >>---> 
Please wait for the installation to complete:  <---<< 
#####################################################
###              Welcome to KubeSphere!           ###
#####################################################

Console: http://10.28.20.104:30880
Account: admin
Password: P@88w0rd
NOTES:
  1. After you log into the console, please check the
     monitoring status of service components in
     "Cluster Management". If any service is not
     ready, please wait patiently until all components 
     are up and running.
  2. Please change the default password after login.

#####################################################
https://kubesphere.io             2023-12-06 11:35:08
#####################################################
11:35:10 CST success: [baode104]
11:35:10 CST Pipeline[CreateClusterPipeline] execute successfully
Installation is complete.

Please check the result using the command:

        kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f

4.4 Verify that Kubernetes works

kubectl create deployment nginx --image=nginx:alpine --replicas=2

Wait about a minute while the image is pulled:

[root@baode104 DATA1]# kubectl get pods -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP             NODE       NOMINATED NODE   READINESS GATES
nginx-65778599-dxjn2   1/1     Running   0          80s   10.233.83.65   baode104   <none>           <none>
nginx-65778599-ls6cb   1/1     Running   0          80s   10.233.83.64   baode104   <none>           <none>

Check the architecture of the image:

[root@baode104 DATA1]# docker inspect nginx:alpine |grep Architecture
        "Architecture": "arm64",

Verify the Nginx Service:

kubectl create service nodeport nginx --tcp=80:80



[root@baode104 DATA1]# kubectl get svc -o wide
NAME         TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE     SELECTOR
kubernetes   ClusterIP   10.233.0.1     <none>        443/TCP        5h18m   <none>
nginx        NodePort    10.233.37.30   <none>        80:31023/TCP   8s      app=nginx

Open 10.28.20.104:31023 in a browser; if the page loads, the Service works.
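
The NodePort is assigned at random, so 31023 will differ on another cluster. A minimal sketch that reads the port from the kubectl get svc listing instead of copying it by hand; it assumes the PORT(S) column has the port:nodePort/protocol form shown above, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read `kubectl get svc` output on stdin and print the NodePort of the service
# named in the first argument (PORT(S) column looks like 80:31023/TCP).
node_port_of() {
  awk -v svc="$1" '$1 == svc { split($5, parts, /[:\/]/); print parts[2] }'
}

# Example (not executed here):
#   port=$(kubectl get svc | node_port_of nginx)
#   curl -sI "http://10.28.20.104:${port}"
```

If curl is available on the node, this check does the same job as the browser test.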

5. Troubleshooting

5.1 coredns containers stuck in CrashLoopBackOff

Symptom:

[root@baode104 ~]# kubectl get pods -A
NAMESPACE                      NAME                                               READY   STATUS             RESTARTS       AGE
kube-system                    calico-kube-controllers-67fbf89557-8wf29           1/1     Running            0              52m
kube-system                    calico-node-hfnm5                                  1/1     Running            0              52m
kube-system                    coredns-757cd945b-mf6db                            0/1     CrashLoopBackOff   10 (18m ago)   44m
kube-system                    coredns-757cd945b-shgxt                            0/1     CrashLoopBackOff   8 (14m ago)    30m

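Attempted fix:

1. Check the logs

kubectl logs coredns-757cd945b-mf6db  -n kube-system

Nothing is printed. The container presumably never started properly, so there are no logs either.

2. Check the Pod's events

kubectl describe pod coredns-757cd945b-mf6db -n kube-system

One error stands out: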

Last State:     Terminated
      Reason:       ContainerCannotRun
      Message:      failed to create task for container: failed to create shim task: OCI runtime create failed: container_linux.go:318: starting container process caused "process_linux.go:281: applying cgroup configuration for process caused \"No such device or address\"": unknown

Searching around shows that this is caused by a bug in Kylin V10 SP2.

Actual fix:

1. In /etc/docker/daemon.json, change the cgroup driver from systemd to cgroupfs

vim /etc/docker/daemon.json
{
  "log-opts": {
    "max-size": "5m",
    "max-file":"3"
  },
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}

2. Restart Docker

systemctl daemon-reload
systemctl restart docker.service

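3. If the Pod is still in CrashLoopBackOff, delete it so that Kubernetes recreates it

kubectl delete pod coredns-757cd945b-mf6db -n kube-system

4. In theory kubelet does not need a restart. Possibly because I got an earlier step wrong, I restarted it anyway, and it then failed to start:

Check the startup log with journalctl -xefu kubelet
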

Dec 06 11:21:33 baode104 kubelet[37119]: E1206 11:21:33.030651   37119 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
Dec 06 11:21:33 baode104 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
-- Subject: Unit process exited

In that case, change cgroupDriver to cgroupfs in /var/lib/kubelet/config.yaml as well.

Then restart kubelet:

systemctl restart kubelet.service
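
The kubelet error above comes from the two components disagreeing on the cgroup driver, so it is worth comparing both files before restarting anything. A minimal sketch, assuming the driver is set through native.cgroupdriver in daemon.json and through a top-level cgroupDriver key in the kubelet config, as in this setup; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the cgroup driver configured in a Docker daemon.json-style file.
docker_cgroup_driver() {
  sed -n 's/.*native\.cgroupdriver=\([a-z]*\).*/\1/p' "${1:-/etc/docker/daemon.json}"
}

# Print the cgroupDriver value from a kubelet config.yaml-style file.
kubelet_cgroup_driver() {
  awk '$1 == "cgroupDriver:" { print $2 }' "${1:-/var/lib/kubelet/config.yaml}"
}

# Succeed only when both files name the same, non-empty driver.
cgroup_drivers_match() {
  local docker_driver kubelet_driver
  docker_driver=$(docker_cgroup_driver "$1")
  kubelet_driver=$(kubelet_cgroup_driver "$2")
  [ -n "$docker_driver" ] && [ "$docker_driver" = "$kubelet_driver" ]
}
```

Calling cgroup_drivers_match with no arguments uses the default paths. If either file leaves the driver unset, the check reports a mismatch even though the built-in defaults might agree, so treat a failure as a prompt to look at both files.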

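5.2 KubeSphere's prometheus container stays in Pending

Symptom: the events show an error saying that init-pvc already exists.

kubectl get pods -A confirms that such a Pod exists, possibly left over from my repeated restarts of Docker and Kubernetes:

kube-system init-pvc-708f0f33-467e-4630-8ca8-35fce0a50f8d 0/1 Completed 0 17m

Fix: delete that Pod so that it can be created again.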
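
The Pod name contains a generated ID, so it has to be looked up each time. A minimal sketch that picks leftover init-pvc Pods out of the kubectl get pods -A listing; it assumes the leftovers show up with the init-pvc- prefix and the Completed status as above, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read `kubectl get pods -A` output on stdin and print "namespace name" for
# every init-pvc-* Pod whose STATUS is Completed.
stale_init_pvc_pods() {
  awk '$2 ~ /^init-pvc-/ && $4 == "Completed" { print $1, $2 }'
}

# Example (not executed here): delete each leftover Pod.
#   kubectl get pods -A | stale_init_pvc_pods | while read -r ns name; do
#     kubectl delete pod "$name" -n "$ns"
#   done
```

Review the printed list before piping it into the delete loop.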

5.3 default-http-backend fails

  • Pull a matching ARM image (a third-party ARM image of the same version)
docker pull mirrorgooglecontainers/defaultbackend-arm64:1.4
  • Re-tag the image (to keep the image naming consistent)
docker tag docker.io/mirrorgooglecontainers/defaultbackend-arm64:1.4 registry.cn-beijing.aliyuncs.com/kubesphereio/defaultbackend-arm64:1.4
  • Redeploy the component
# Point the Deployment at the new image, then restart it
kubectl set image deployment/default-http-backend default-http-backend=registry.cn-beijing.aliyuncs.com/kubesphereio/defaultbackend-arm64:1.4 -n kubesphere-controls-system
kubectl rollout restart deployment/default-http-backend -n kubesphere-controls-system
  • Verify that the new Pod is created and starts successfully
kubectl get pods -o wide -n kubesphere-controls-system | grep default-http-backend