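Abstract: etcd is the most critical component of a k8s cluster. It stores all of the cluster's service information, so if etcd goes down, the cluster goes down with it. Here etcd is deployed on the three master nodes for high availability. An etcd cluster elects its leader with the Raft algorithm, and because Raft needs votes from a majority of members to make a decision, a cluster is normally built from an odd number of members; the recommended sizes are 3, 5, or 7.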
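The majority rule above can be made concrete with a little arithmetic. The sketch below is only an illustration (the function names `quorum` and `tolerance` are made up for this example): it prints, for a few cluster sizes, how many votes a majority needs and how many members can fail before that majority is lost. It shows why an even-sized cluster buys nothing over the next smaller odd size.

```shell
#!/bin/sh
# Majority (quorum) required for a cluster of n members: floor(n/2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

# Number of members that can fail while a quorum is still available.
tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 3 4 5 6 7; do
  echo "members=$n quorum=$(quorum "$n") tolerated_failures=$(tolerance "$n")"
done
```

Going from 3 to 4 members raises the quorum from 2 to 3 but still tolerates only one failure, which is why odd sizes are preferred.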

Official releases: https://github.com/coreos/etcd/releases

1) Download the etcd binaries


The etcd commands are prebuilt binaries: extract the archive and copy them to the target directory.


[root@k8s-master01 ~]# cd k8s/
[root@k8s-master01 k8s]# wget https://github.com/etcd-io/etcd/releases/download/v3.3.12/etcd-v3.3.12-linux-amd64.tar.gz
[root@k8s-master01 k8s]# tar -xf etcd-v3.3.12-linux-amd64.tar.gz  
[root@k8s-master01 k8s]# cd etcd-v3.3.12-linux-amd64  ## the two binaries needed are etcd and etcdctl; etcdctl is the command-line client for etcd
## Copy the etcd binaries to the three master nodes
[root@k8s-master01 ~]# ansible k8s-master -m copy -a 'src=/root/k8s/etcd-v3.3.12-linux-amd64/etcd dest=/usr/local/bin/ mode=0755'
[root@k8s-master01 ~]# ansible k8s-master -m copy -a 'src=/root/k8s/etcd-v3.3.12-linux-amd64/etcdctl dest=/usr/local/bin/ mode=0755'
Note: if you are not using ansible, you can simply scp the two files into /usr/local/bin/ on each of the three master nodes.
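For readers taking the scp route, a minimal sketch might look like the following. It assumes the three masters are reachable as root at the node IPs used later in this article (10.10.0.18 to 10.10.0.20); the function name `print_scp_cmds` is made up for this example. It only prints the commands, so nothing is copied until you review the output and run it.

```shell
#!/bin/sh
# Assumption: the three master nodes are the etcd node IPs used in this article.
MASTERS="10.10.0.18 10.10.0.19 10.10.0.20"
SRC_DIR="/root/k8s/etcd-v3.3.12-linux-amd64"

# Print one scp command per master (dry run: nothing is executed here).
print_scp_cmds() {
  for host in $MASTERS; do
    echo "scp ${SRC_DIR}/etcd ${SRC_DIR}/etcdctl root@${host}:/usr/local/bin/"
  done
}

print_scp_cmds
```

Once the printed commands look right, pipe them into a shell (`print_scp_cmds | sh`) to perform the copy.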

2) Create the etcd certificate signing request (CSR) template

[root@k8s-master01 ~]# vim /opt/k8s/certs/etcd-csr.json  ## certificate request file
{
  "CN": "etcd",
  "hosts": [
    "127.0.0.1",
    "10.10.0.18",
    "10.10.0.19",
    "10.10.0.20"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "ST": "ShangHai",
      "L": "ShangHai",
      "O": "k8s",
      "OU": "System"
    }
  ]
}
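Note: the IPs in hosts are the IPs of the etcd nodes plus the local 127.0.0.1 address. The etcd certificate must include the IP of every node. In production it is best to reserve a few extra IPs in the hosts list, so that adding nodes later, or migrating after a failure, does not require regenerating the certificate. (My production environment runs in an Alibaba Cloud VPC network, so I reserve the IPs of a specific range.)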
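Reserving spare IPs is easier if the hosts array is generated rather than typed by hand. The sketch below is one possible way to do that, not the method used in this article: `gen_etcd_csr` is a made-up helper that always includes 127.0.0.1 and then appends every IP passed as an argument. The two extra addresses in the example call (10.10.0.21 and 10.10.0.22) are hypothetical spares.

```shell
#!/bin/sh
# Print an etcd CSR document; "hosts" holds 127.0.0.1 plus each IP argument.
gen_etcd_csr() {
  hosts='    "127.0.0.1"'
  for ip in "$@"; do
    hosts="${hosts},
    \"${ip}\""
  done
  cat <<EOF
{
  "CN": "etcd",
  "hosts": [
${hosts}
  ],
  "key": { "algo": "rsa", "size": 2048 },
  "names": [
    { "C": "CN", "ST": "ShangHai", "L": "ShangHai", "O": "k8s", "OU": "System" }
  ]
}
EOF
}

# Current nodes plus two hypothetical spare addresses.
gen_etcd_csr 10.10.0.18 10.10.0.19 10.10.0.20 10.10.0.21 10.10.0.22
```

Redirect the output to /opt/k8s/certs/etcd-csr.json to use it in the next step.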

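3) Generate the certificate and private key



Pay attention to the exact locations of the certificates referenced in the command.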



[root@k8s-master01 ~]# cd /opt/k8s/certs/
[root@k8s-master01 certs]# cfssl gencert -ca=/opt/k8s/certs/ca.pem \
     -ca-key=/opt/k8s/certs/ca-key.pem \
     -config=/opt/k8s/certs/ca-config.json \
     -profile=kubernetes etcd-csr.json | cfssljson -bare etcd
2019/04/22 17:17:51 [INFO] generate received request
2019/04/22 17:17:51 [INFO] received CSR
2019/04/22 17:17:51 [INFO] generating key: rsa-2048
2019/04/22 17:17:51 [INFO] encoded CSR
2019/04/22 17:17:51 [INFO] signed certificate with serial number 335217685822754469090490767964903486042452749906
2019/04/22 17:17:51 [WARNING] This certificate lacks a "hosts" field. This makes it unsuitable for
websites. For more information see the Baseline Requirements for the Issuance and Management
of Publicly-Trusted Certificates, v.1.1.6, from the CA/Browser Forum (https://cabforum.org);
specifically, section 10.2.3 ("Information Requirements").

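4) Inspect the certificate


etcd.csr is an intermediate file used during signing. If you would rather have a third-party CA sign the certificate than sign it yourself, you only need to submit etcd.csr to that CA.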


[root@k8s-master01 certs]# ll etcd*
-rw-r--r--. 1 root root 1066 Apr 22 17:17 etcd.csr
-rw-r--r--. 1 root root  293 Apr 22 17:10 etcd-csr.json
-rw-------. 1 root root 1679 Apr 22 17:17 etcd-key.pem
-rw-r--r--. 1 root root 1444 Apr 22 17:17 etcd.pem

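5) Distribute the certificate


Copy the generated etcd certificate into the certificate directory on each etcd node, including the other two.



Normally only three files need to be copied: ca.pem (already in place), etcd-key.pem, and etcd.pem.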


[root@k8s-master01 certs]# ansible k8s-master -m copy -a 'src=/opt/k8s/certs/etcd.pem dest=/etc/kubernetes/ssl/'
[root@k8s-master01 certs]# ansible k8s-master -m copy -a 'src=/opt/k8s/certs/etcd-key.pem dest=/etc/kubernetes/ssl/'

6) Modify the etcd configuration parameters


For security, etcd is started under a dedicated user here.


## Create the etcd user and group
[root@k8s-master01 ~]# ansible k8s-master -m group -a 'name=etcd'
[root@k8s-master01 ~]# ansible k8s-master -m user -a 'name=etcd group=etcd comment="etcd user" shell=/sbin/nologin home=/var/lib/etcd createhome=no'
## Create the etcd data directory and set its ownership
[root@k8s-master01 ~]# ansible k8s-master -m file -a 'path=/var/lib/etcd state=directory owner=etcd group=etcd'

Note:

If the steps above feel cumbersome, you can run the following commands directly on each of the three master hosts instead.

mkdir -p /etc/kubernetes/config
groupadd -r etcd
useradd -r -g etcd -d /var/lib/etcd -s /sbin/nologin -c "etcd user" etcd
mkdir -p /var/lib/etcd/
chown -R etcd:etcd /var/lib/etcd/




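7) Write the etcd configuration file



The contents of etcd.conf follow. The file references the certificates, so the etcd user must be able to read them; otherwise etcd reports that it cannot load the certificate. Mode 644 is sufficient.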



[root@k8s-master01 ~]# vim /opt/k8s/cfg/etcd.conf
#[member]
ETCD_NAME="etcd01"
ETCD_DATA_DIR="/var/lib/etcd"
#ETCD_SNAPSHOT_COUNTER="10000"
#ETCD_HEARTBEAT_INTERVAL="100"
#ETCD_ELECTION_TIMEOUT="1000"
ETCD_LISTEN_PEER_URLS="https://10.10.0.18:2380"
ETCD_LISTEN_CLIENT_URLS="https://10.10.0.18:2379,https://127.0.0.1:2379"
#ETCD_MAX_SNAPSHOTS="5"
#ETCD_MAX_WALS="5"
#ETCD_CORS=""
ETCD_AUTO_COMPACTION_RETENTION="1"
ETCD_QUOTA_BACKEND_BYTES="8589934592"
ETCD_MAX_REQUEST_BYTES="5242880"
#[cluster]
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.10.0.18:2380"
# if you use different ETCD_NAME (e.g. test),
# set ETCD_INITIAL_CLUSTER value for this name, i.e. "test=http://..."
ETCD_INITIAL_CLUSTER="etcd01=https://10.10.0.18:2380,etcd02=https://10.10.0.19:2380,etcd03=https://10.10.0.20:2380"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_INITIAL_CLUSTER_TOKEN="k8s-etcd-cluster"
ETCD_ADVERTISE_CLIENT_URLS="https://10.10.0.18:2379"
#[security]
ETCD_CLIENT_CERT_AUTH="true"
ETCD_CA_FILE="/etc/kubernetes/ssl/ca.pem"
ETCD_CERT_FILE="/etc/kubernetes/ssl/etcd.pem"
ETCD_KEY_FILE="/etc/kubernetes/ssl/etcd-key.pem"
ETCD_PEER_CLIENT_CERT_AUTH="true"
ETCD_PEER_CA_FILE="/etc/kubernetes/ssl/ca.pem"
ETCD_PEER_CERT_FILE="/etc/kubernetes/ssl/etcd.pem"
ETCD_PEER_KEY_FILE="/etc/kubernetes/ssl/etcd-key.pem"

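Parameter reference:

  1. ETCD_NAME: the member name of this etcd node. It must be unique within an etcd cluster; the hostname or machine-id can be used.
  2. ETCD_LISTEN_PEER_URLS: the address used for traffic with the other members. It differs per node and must be an IP; a domain name does not work.
  3. ETCD_LISTEN_CLIENT_URLS: the address on which the node serves clients, normally the local node. A domain name does not work.
  4. ETCD_INITIAL_ADVERTISE_PEER_URLS: the peer address of this member that is advertised to the other members of the cluster.
  5. ETCD_INITIAL_CLUSTER: all members of the cluster, as comma-separated name=peer-URL pairs, i.e. ETCD_NAME=https://ETCD_INITIAL_ADVERTISE_PEER_URLS for each member.
  6. ETCD_ADVERTISE_CLIENT_URLS: the list of client URLs of this member, advertised so that clients know where the node is listening. A domain name may be used.
  7. ETCD_AUTO_COMPACTION_RETENTION: the auto-compaction retention for the mvcc key-value store, in hours. 0 disables auto compaction.
  8. ETCD_QUOTA_BACKEND_BYTES: the size limit of the etcd backend database. The default is 2G; 8G is recommended.
  9. ETCD_MAX_REQUEST_BYTES: the maximum size, in bytes, of a client request that the server accepts. The default is 1.5M and the official recommendation is 10M; I set 5M here, so adjust it to your own workload.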
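Both size parameters are given in plain bytes, which makes the values in etcd.conf hard to read at a glance. As a small sanity check, the sketch below (the helper name `to_bytes` is made up for this example) converts binary units to bytes and reproduces the two numbers used in the configuration above.

```shell
#!/bin/sh
# to_bytes <count> <GiB|MiB>: convert a whole number of binary units to bytes.
to_bytes() {
  case $2 in
    GiB) echo $(( $1 * 1024 * 1024 * 1024 )) ;;
    MiB) echo $(( $1 * 1024 * 1024 )) ;;
    *)   echo "unknown unit: $2" >&2; return 1 ;;
  esac
}

to_bytes 8 GiB   # value used for ETCD_QUOTA_BACKEND_BYTES
to_bytes 5 MiB   # value used for ETCD_MAX_REQUEST_BYTES
```

The two lines printed are 8589934592 and 5242880, matching the configuration file.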

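Because this is a three-node etcd cluster, etcd.conf has to be copied to the other two nodes, and the per-node parameters described above (ETCD_NAME and the listen and advertise URLs) have to be changed to match each host's own name and IP.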

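Distribute the etcd.conf file. If you are not using ansible, you can scp the file to the same location on all three machines and then change the IPs, ETCD_NAME, and the other per-node parameters on each machine.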

[root@k8s-master01 config]# ansible k8s-master -m copy -a 'src=/opt/k8s/cfg/etcd.conf dest=/etc/kubernetes/config/etcd.conf'
## Log in to each host and edit the file, replacing the IPs with that host's own IP
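Editing the file by hand on each host is easy to get wrong, so the per-node changes can also be generated from the etcd01 copy. The following is only a sketch under the assumptions of this article: the template is the etcd01 file (so its node IP is 10.10.0.18), and `render_etcd_conf` is a made-up helper name. It rewrites ETCD_NAME and replaces the template IP on every line except ETCD_INITIAL_CLUSTER, which must stay identical on all members.

```shell
#!/bin/sh
# render_etcd_conf <member-name> <member-ip> <template-file>
# Prints a per-node etcd.conf derived from the etcd01 template.
render_etcd_conf() {
  name=$1
  ip=$2
  template=$3
  sed -e 's|^ETCD_NAME=.*|ETCD_NAME="'"${name}"'"|' \
      -e '/^ETCD_INITIAL_CLUSTER=/!s|10\.10\.0\.18|'"${ip}"'|g' \
      "$template"
}

# Example (not executed here):
#   render_etcd_conf etcd02 10.10.0.19 /opt/k8s/cfg/etcd.conf > /tmp/etcd02.conf
#   render_etcd_conf etcd03 10.10.0.20 /opt/k8s/cfg/etcd.conf > /tmp/etcd03.conf
```

The 127.0.0.1 entry in ETCD_LISTEN_CLIENT_URLS is untouched because only the template IP is matched.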

Edit the etcd.service unit file

[root@k8s-master01 ~]# vim /opt/k8s/unit/etcd.service

[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target

[Service]
Type=notify
WorkingDirectory=/var/lib/etcd/
EnvironmentFile=-/etc/kubernetes/config/etcd.conf
User=etcd
# set GOMAXPROCS to number of processors
ExecStart=/bin/bash -c "GOMAXPROCS=$(nproc) /usr/local/bin/etcd --name=\"${ETCD_NAME}\" --data-dir=\"${ETCD_DATA_DIR}\" --listen-client-urls=\"${ETCD_LISTEN_CLIENT_URLS}\""
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
[root@k8s-master01 ~]# ansible k8s-master -m copy -a 'src=/opt/k8s/unit/etcd.service dest=/usr/lib/systemd/system/etcd.service'
[root@k8s-master01 ~]# ansible k8s-master -m shell -a 'systemctl daemon-reload'
[root@k8s-master01 ~]# ansible k8s-master -m shell -a 'systemctl enable etcd'
[root@k8s-master01 ~]# ansible k8s-master -m shell -a 'systemctl start etcd'

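Note:


The etcd services on the three machines need to be started at about the same time, so run the start command on all three together. If you start only one, the command hangs there until the other etcd members of the cluster have started as well. I did not run into this here because ansible runs the command on the remote hosts together.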
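When the members are started by hand, it can help to poll until the cluster answers instead of guessing when it is ready. The sketch below is a generic retry helper, not something from the original setup; the name `retry` and the attempt and delay values in the usage comment are arbitrary choices for illustration.

```shell
#!/bin/sh
# retry <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while :; do
    "$@" && return 0                    # success: stop polling
    [ "$i" -ge "$attempts" ] && return 1  # out of attempts: give up
    i=$(( i + 1 ))
    sleep "$delay"
  done
}

# Example (not executed here): poll the health check from step 8
# up to 30 times, 2 seconds apart.
#   retry 30 2 etcdctl --endpoints=https://10.10.0.18:2379 \
#     --cert-file=/etc/kubernetes/ssl/etcd.pem \
#     --ca-file=/etc/kubernetes/ssl/ca.pem \
#     --key-file=/etc/kubernetes/ssl/etcd-key.pem cluster-health
```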


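8) Verify the cluster



With etcd 3, the certificate locations must be specified when checking the cluster status. The commands below use the v2 API flags, which is the default for the etcdctl shipped with etcd 3.3.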



[root@k8s-master01 ~]# etcdctl --endpoints=https://10.10.0.18:2379,https://10.10.0.19:2379,https://10.10.0.20:2379 \
           --cert-file=/etc/kubernetes/ssl/etcd.pem \
           --ca-file=/etc/kubernetes/ssl/ca.pem \
           --key-file=/etc/kubernetes/ssl/etcd-key.pem \
           cluster-health
member 804ed05b4beec304 is healthy: got healthy result from https://10.10.0.20:2379
member 8a5b84381bee52dd is healthy: got healthy result from https://10.10.0.19:2379
member caba783185460428 is healthy: got healthy result from https://10.10.0.18:2379
cluster is healthy
[root@k8s-master01 ~]# etcdctl --endpoints=https://10.10.0.18:2379,https://10.10.0.19:2379,https://10.10.0.20:2379 \
           --cert-file=/etc/kubernetes/ssl/etcd.pem \
           --ca-file=/etc/kubernetes/ssl/ca.pem \
           --key-file=/etc/kubernetes/ssl/etcd-key.pem \
           member list
804ed05b4beec304: name=etcd03 peerURLs=https://10.10.0.20:2380 clientURLs=https://10.10.0.20:2379 isLeader=false
8a5b84381bee52dd: name=etcd02 peerURLs=https://10.10.0.19:2380 clientURLs=https://10.10.0.19:2379 isLeader=true
caba783185460428: name=etcd01 peerURLs=https://10.10.0.18:2380 clientURLs=https://10.10.0.18:2379 isLeader=false
## The cluster reports healthy, and the member with isLeader=true is the current leader
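If you want the leader's name in a script rather than reading it off the listing, the member list output can be filtered. The sketch below assumes the v2 `member list` output format shown above; `find_leader` is a made-up helper, and the here-document simply replays the sample output from this article so the example runs without a live cluster.

```shell
#!/bin/sh
# Read `etcdctl member list` (v2 API format) on stdin and print the leader's name.
find_leader() {
  awk '/isLeader=true/ {
    for (i = 1; i <= NF; i++)
      if ($i ~ /^name=/) { sub(/^name=/, "", $i); print $i }
  }'
}

# Sample output copied from the listing above.
find_leader <<'EOF'
804ed05b4beec304: name=etcd03 peerURLs=https://10.10.0.20:2380 clientURLs=https://10.10.0.20:2379 isLeader=false
8a5b84381bee52dd: name=etcd02 peerURLs=https://10.10.0.19:2380 clientURLs=https://10.10.0.19:2379 isLeader=true
caba783185460428: name=etcd01 peerURLs=https://10.10.0.18:2380 clientURLs=https://10.10.0.18:2379 isLeader=false
EOF
```

Against a live cluster, pipe the real command into it: `etcdctl ... member list | find_leader`.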