By default, Docker exposes containers to the outside world through port mapping. For example, when you start a MySQL container, the container gets a virtual IP on a private network, typically in the 172.17.x.x range. The host can ping this IP, but apart from the host itself, no other machine (Docker host or not) can reach it. That is easy to understand: the addresses in this range are purely virtual. So access to a Docker container goes through port mapping: a port on the container's virtual network is mapped to a port on the host, and you simply connect to the host's port.

For example:

e91703882bd0        registry            "/entrypoint.sh /etc   11 days ago         Up 7 minutes        0.0.0.0:5000->5000/tcp   hopeful_gates

To reach this container's port 5000, you actually connect to port 5000 on the Docker host, because the container port is bound to the host port. This is basic Docker, so I won't go into more detail.

So here is the problem: when we use Kubernetes to manage Docker containers, how painful would it be if everything underneath relied on port mapping? Each node may run hundreds of containers, and every one of them would need a mapping. If we could instead reach those virtual IPs directly, just like real IPs, then internal calls could simply use IP and port, which would be far more convenient.

flannel is one component that solves this problem. Let's first look at how the official documentation describes it:

Platforms like Kubernetes assume that each container (pod) has a unique, routable IP inside the cluster. The advantage of this model is that it removes the port mapping complexities that come from sharing a single host IP.

Flannel is responsible for providing a layer 3 IPv4 network between multiple nodes in a cluster. Flannel does not control how containers are networked to the host, only how the traffic is transported between hosts. However, flannel does provide a CNI plugin for Kubernetes and a guidance on integrating with Docker.

The first sentence says it plainly: a unique, routable per-pod IP replaces the complexity of port mapping. That is the point of flannel.

1. Install flannel. It only needs to be installed on the Kubernetes nodes. Download the latest release from the official site and unpack it.

flannel assigns a subnet for all the Docker containers on a node, and the allocation is automatic. flannel uses etcd to store the network configuration; the default key is /coreos.com/network/config. In short, flannel reads this key from etcd to obtain the network range you have defined.

Write the network configuration into etcd:

/data/etcd/etcdctl --ca-file=/data/etcd/cert/ca.pem --cert-file=/data/etcd/cert/peer.pem --key-file=/data/etcd/cert/peer-key.pem --endpoints=https://10.203.3.96:2379,https://10.203.0.46:2379,https://10.203.0.43:2379 mk /coreos.com/network/config '{ "Network": "172.17.0.0/16", "Backend": {"Type": "vxlan"}}'
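The JSON payload is easy to get wrong on the command line; a quick local sanity check before writing it into etcd can save a debugging round-trip (a sketch using Python's built-in json.tool; the temp path is arbitrary):

```shell
# Write the flannel network config to a temp file first.
# "Network" is the overall pool; flannel carves a /24 per node out of it,
# and "Backend" selects the transport (vxlan here).
cat > /tmp/flannel-config.json <<'EOF'
{ "Network": "172.17.0.0/16", "Backend": {"Type": "vxlan"}}
EOF
# Confirm it is well-formed JSON before storing it under /coreos.com/network/config:
python3 -m json.tool /tmp/flannel-config.json
```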

Note: there is a catch here. flannel talks to etcd through the v2 API, which it requires, but many environments today run recent etcd releases, so etcd 3.x must be started with --enable-v2, like this:

/data/etcd/etcd --name=etcd-node2 --data-dir=/data/etcd/data --listen-client-urls=https://10.203.3.96:2379 --listen-peer-urls=https://10.203.3.96:2380 --advertise-client-urls=https://10.203.3.96:2379 --initial-advertise-peer-urls=https://10.203.3.96:2380  --initial-cluster=etcd-node1=https://10.203.0.46:2380,etcd-node2=https://10.203.3.96:2380,etcd-node3=https://10.203.0.43:2380 --initial-cluster-state=new --peer-key-file=/data/etcd/cert/peer-key.pem --peer-cert-file=/data/etcd/cert/peer.pem --key-file=/data/etcd/cert/peer-key.pem --cert-file=/data/etcd/cert/peer.pem --client-cert-auth --trusted-ca-file=/data/etcd/cert/ca.pem --peer-client-cert-auth --peer-trusted-ca-file=/data/etcd/cert/ca.pem --enable-v2

If your etcd is version 3 or above, stop etcd, add --enable-v2, and restart it; otherwise flannel will fail with an error along the lines of "can't get config".

2. Start flannel

./flanneld --ip-masq --etcd-endpoints=https://10.203.0.43:2379,https://10.203.3.96:2379,https://10.203.0.46:2379 -etcd-cafile=/data/etcd/cert/ca.pem -etcd-certfile=/data/etcd/cert/peer.pem -etcd-keyfile=/data/etcd/cert/peer-key.pem &

3. Generate subnet.env; Docker will be started with the options in this file.

/data/flannel/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/subnet.env

Let's look at what subnet.env contains:

[root@oyoshbddwatlasyprd1 flannel]# cat /run/flannel/subnet.env
DOCKER_OPT_BIP="--bip=172.17.83.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=false"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=172.17.83.1/24 --ip-masq=false --mtu=1450"

As you can see, flannel has allocated this node the subnet 172.17.83.1/24, a /24 carved out of the 172.17.0.0/16 network we configured. Docker on this node will use this virtual subnet, and it must be started with these options.
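Because subnet.env is a plain KEY=VALUE file, you can source it in a shell to inspect the options Docker will receive, which is roughly what systemd's EnvironmentFile= does. A sketch (the sample values below mirror this node's allocation; yours will differ, and the demo path is arbitrary):

```shell
# Reproduce a sample subnet.env (normally written by mk-docker-opts.sh):
mkdir -p /tmp/flannel-demo
cat > /tmp/flannel-demo/subnet.env <<'EOF'
DOCKER_OPT_BIP="--bip=172.17.83.1/24"
DOCKER_OPT_IPMASQ="--ip-masq=false"
DOCKER_OPT_MTU="--mtu=1450"
DOCKER_NETWORK_OPTIONS=" --bip=172.17.83.1/24 --ip-masq=false --mtu=1450"
EOF
# Source it and inspect the flags dockerd would be started with:
. /tmp/flannel-demo/subnet.env
echo "dockerd will be started with:$DOCKER_NETWORK_OPTIONS"
```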


4. Edit docker.service so that Docker starts with the options from subnet.env

#ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
EnvironmentFile=/run/flannel/subnet.env
ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS

The first, commented-out line is the default start command; it is replaced so that Docker starts with the DOCKER_NETWORK_OPTIONS variable loaded from subnet.env.
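Instead of editing the unit file in place, the same change can be made with a systemd drop-in, which survives package upgrades. A sketch (the drop-in path and file name are my choice; any *.conf under that directory works):

```ini
# /etc/systemd/system/docker.service.d/flannel.conf (hypothetical path)
[Service]
EnvironmentFile=/run/flannel/subnet.env
# An empty ExecStart= clears the ExecStart inherited from the main unit:
ExecStart=
ExecStart=/usr/bin/dockerd $DOCKER_NETWORK_OPTIONS
```

Either way, `systemctl daemon-reload` is required afterwards for systemd to pick up the change.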

5. Restart Docker

systemctl daemon-reload
systemctl restart docker

6. Confirm the Docker subnets

On the first node:

docker0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 172.17.52.1  netmask 255.255.255.0  broadcast 172.17.52.255
        ether 02:42:02:ff:28:6a  txqueuelen 0  (Ethernet)
        RX packets 1312218  bytes 91088448 (86.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1403712  bytes 3494588370 (3.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.203.0.46  netmask 255.255.254.0  broadcast 10.203.1.255
        ether 00:16:3e:06:f3:4d  txqueuelen 1000  (Ethernet)
        RX packets 37252931  bytes 11557211415 (10.7 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 33898339  bytes 6693779547 (6.2 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 172.17.52.0  netmask 255.255.255.255  broadcast 0.0.0.0
        ether 3a:40:68:15:5c:45  txqueuelen 0  (Ethernet)
        RX packets 5  bytes 420 (420.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5  bytes 420 (420.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

On the second node:

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.83.1  netmask 255.255.255.0  broadcast 172.17.83.255
        ether 02:42:53:e1:69:7c  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.203.3.96  netmask 255.255.254.0  broadcast 10.203.3.255
        ether 00:16:3e:12:21:46  txqueuelen 1000  (Ethernet)
        RX packets 560672439  bytes 440894465247 (410.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 526563533  bytes 202796272242 (188.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 172.17.83.0  netmask 255.255.255.255  broadcast 0.0.0.0
        ether 2a:2e:0d:a8:5a:3a  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Look closely at the subnets: the two nodes differ (172.17.52.0/24 vs 172.17.83.0/24), each allocated by flannel out of the overall 172.17.0.0/16 network.

7. Verify that other machines can reach a container's virtual IP

Take any running container:

[root@oyoshbddwatlasprd0 flannel]# docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
e91703882bd0        registry            "/entrypoint.sh /etc   11 days ago         Up 34 minutes       0.0.0.0:5000->5000/tcp   hopeful_gates
[root@oyoshbddwatlasprd0 flannel]# docker exec -it e91703882bd0 sh
/ # ifconfig -a
eth0      Link encap:Ethernet  HWaddr 02:42:AC:11:34:02  
          inet addr:172.17.52.2  Bcast:172.17.52.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:10 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:868 (868.0 B)  TX bytes:868 (868.0 B)

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

/ #

The container above has the IP 172.17.52.2. Now pick any other Docker node and try to reach this IP:

[root@oyoshbddwatlasyprd1 flannel]# ping 172.17.52.2
PING 172.17.52.2 (172.17.52.2) 56(84) bytes of data.
64 bytes from 172.17.52.2: icmp_seq=1 ttl=63 time=1.06 ms
64 bytes from 172.17.52.2: icmp_seq=2 ttl=63 time=0.990 ms
64 bytes from 172.17.52.2: icmp_seq=3 ttl=63 time=1.00 ms
64 bytes from 172.17.52.2: icmp_seq=4 ttl=63 time=1.00 ms
64 bytes from 172.17.52.2: icmp_seq=5 ttl=63 time=1.00 ms

Clearly, pinging prd0's virtual container IP from prd1 works. This means no more port mapping is needed: with flannel, the containers' virtual IPs are reachable directly across nodes, so internal services can just connect to them.

In Kubernetes you will often come across the term CNI (Container Network Interface); that is the networking model being referred to here, and flannel plugs into Kubernetes as a CNI plugin.
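For reference, when flannel is deployed as a CNI plugin, each node carries a small CNI config under /etc/cni/net.d/. A typical one, modeled on the stock kube-flannel deployment (the file name and exact fields may differ by version), looks roughly like this:

```json
{
  "name": "cbr0",
  "cniVersion": "0.3.1",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
```

The flannel plugin delegates the actual interface setup to the bridge plugin using the per-node subnet from subnet.env, the same mechanism we wired into Docker by hand above.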