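If two namespaces sit on different subnets, a bridge can no longer connect them; their traffic has to be forwarded at layer 3 by a router. Linux, however, does not offer a virtual router device the way it offers a virtual bridge, because the Linux kernel itself can act as a router.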

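A router works as follows: it has two or more network interfaces, each attached to a different layer 3 subnet. Based on its internal routing table, the router forwards packets received on one interface out through another, which lets the subnets reach each other. The Linux kernel provides the IP Forwarding feature; once it is enabled, the kernel forwards IP packets between its network interfaces, which effectively makes the machine a router.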
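The forwarding decision described above can be sketched in plain Bash. This is a toy illustration, not how the kernel implements routing: the helper names `ip_to_int`, `in_cidr`, and `lookup_route` and the two-entry table are hypothetical, chosen to mirror the subnets used later in this article.

```bash
#!/usr/bin/env bash
# Toy model of a router's forwarding decision (illustration only).

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if address $1 falls inside the CIDR block $2 (e.g. 192.168.0.0/24).
in_cidr() {
    local addr=$1 net=${2%/*} len=${2#*/}
    local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    (( ($(ip_to_int "$addr") & mask) == ($(ip_to_int "$net") & mask) ))
}

# Hypothetical forwarding table: "prefix interface", one entry per line.
ROUTES="192.168.0.0/24 veth1
192.168.1.0/24 veth3"

# Print the outgoing interface for destination $1, preferring the longest
# matching prefix. Returns non-zero when no entry matches.
lookup_route() {
    local dst=$1 prefix iface best_len=-1 best_if=""
    while read -r prefix iface; do
        if in_cidr "$dst" "$prefix" && (( ${prefix#*/} > best_len )); then
            best_len=${prefix#*/}
            best_if=$iface
        fi
    done <<< "$ROUTES"
    [[ -n $best_if ]] && echo "$best_if"
}

lookup_route 192.168.1.2   # prints veth3
```

A destination that matches no entry, such as 10.0.0.1, makes `lookup_route` return a non-zero status, which is the toy equivalent of the "Network is unreachable" error seen later.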

Enabling IP forwarding

IP Forwarding is not enabled by default on Linux. It can be turned on as follows:

Add the following to /etc/sysctl.conf:

net.ipv4.ip_forward=1
net.ipv6.conf.default.forwarding=1
net.ipv6.conf.all.forwarding=1

Then reload the configuration file with sysctl -p:

$ sysctl -p /etc/sysctl.conf
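For a quick experiment the same switch can also be flipped at runtime with `sysctl -w net.ipv4.ip_forward=1`; that change is lost on reboot. The sketch below reads the current state from `/proc/sys/net/ipv4/ip_forward`. The function name `forwarding_state` and its optional path argument are assumptions added for illustration, so that the logic can be exercised against any file.

```bash
#!/usr/bin/env bash
# Report whether IPv4 forwarding is enabled, based on the kernel's proc file.
# $1 (optional) overrides the file to read; this parameter is hypothetical.
forwarding_state() {
    local file=${1:-/proc/sys/net/ipv4/ip_forward}
    local value
    value=$(cat "$file") || return 1
    if [[ $value == 1 ]]; then
        echo enabled
    else
        echo disabled
    fi
}

# Usage on a Linux host: forwarding_state
```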

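Connecting two namespaces with routing

Next we connect two namespaces on different layer 3 subnets using Linux's own routing capability. The network topology for this experiment is shown in the figure below.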

[Figure: network topology of the experiment, with ns0 and ns1 attached to ns-router]

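Note that the router at the bottom of the figure does not correspond to a physical or virtual router device; it is implemented by a namespace with two virtual NICs. Because IP forwarding is enabled in the Linux kernel, the ns-router namespace can forward IP packets between its two NICs, which sit on different subnets, and thus acts as a router. The forwarding switch is a per-namespace setting, so if pings across the router fail later on, check the value in effect there with ip netns exec ns-router sysctl net.ipv4.ip_forward.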

Creating the namespaces

Create three namespaces named ns0, ns1, and ns-router. ns0 and ns1 are the two namespaces on different subnets, and ns-router acts as the router.

$ ip netns add ns0
$ ip netns add ns1
$ ip netns add ns-router

$ ip netns list
ns-router
ns1
ns0

Creating the veth pairs

Create two veth pairs to connect the two namespaces to the router.

$ ip link add type veth
$ ip link add type veth

$ ip link
56: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 2e:40:31:14:9e:5d brd ff:ff:ff:ff:ff:ff
57: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 86:a3:bf:bc:2c:82 brd ff:ff:ff:ff:ff:ff
58: veth2@veth3: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether f2:c5:84:06:e6:76 brd ff:ff:ff:ff:ff:ff
59: veth3@veth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 42:be:88:01:8c:c0 brd ff:ff:ff:ff:ff:ff
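
`ip link add type veth` lets the kernel pick the interface names, so the names veth0 to veth3 shown above depend on no other veth devices already existing on the host. A more deterministic variant names both ends explicitly with `peer name`. The sketch below is one way to wrap this; the `run` helper and the `DRY_RUN` variable are hypothetical conveniences that print the commands instead of executing them.

```bash
#!/usr/bin/env bash
# Execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Create the two veth pairs used in this experiment with fixed names.
create_veth_pairs() {
    run ip link add veth0 type veth peer name veth1
    run ip link add veth2 type veth peer name veth3
}

# Preview the commands without touching the system (no root needed):
DRY_RUN=1 create_veth_pairs
```

Calling `create_veth_pairs` as root without `DRY_RUN=1` runs the two `ip link add` commands for real.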

Moving the veth interfaces into the namespaces

Use the veth pairs to connect ns0 and ns1 to the router implemented by ns-router.

$ ip link set veth0 netns ns0
$ ip link set veth1 netns ns-router
$ ip link set veth2 netns ns1
$ ip link set veth3 netns ns-router

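Assigning IP addresses to the veth interfaces

Assign IP addresses to the virtual NICs. ns0 and ns1 are on the subnets 192.168.0.0/24 and 192.168.1.0/24 respectively, and the two NICs of ns-router are attached to these two subnets.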

$ ip netns exec ns0 ip addr add 192.168.0.2/24 dev veth0
$ ip netns exec ns-router ip addr add 192.168.0.1/24 dev veth1
$ ip netns exec ns1 ip addr add 192.168.1.2/24 dev veth2
$ ip netns exec ns-router ip addr add 192.168.1.1/24 dev veth3

Bringing up the veth interfaces

Set the state of each NIC to up.

$ ip netns exec ns0 ip link set veth0 up
$ ip netns exec ns-router ip link set veth1 up
$ ip netns exec ns-router ip link set veth3 up
$ ip netns exec ns1 ip link set veth2 up

Checking the IP addresses in each namespace

IP addresses in namespace ns0:

$ ip netns exec ns0 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
56: veth0@if57: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 2e:40:31:14:9e:5d brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 192.168.0.2/24 scope global veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2c40:31ff:fe14:9e5d/64 scope link
       valid_lft forever preferred_lft forever

IP addresses in namespace ns-router:

$ ip netns exec ns-router ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
57: veth1@if56: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 86:a3:bf:bc:2c:82 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.0.1/24 scope global veth1
       valid_lft forever preferred_lft forever
    inet6 fe80::84a3:bfff:febc:2c82/64 scope link
       valid_lft forever preferred_lft forever
59: veth3@if58: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 42:be:88:01:8c:c0 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 192.168.1.1/24 scope global veth3
       valid_lft forever preferred_lft forever
    inet6 fe80::40be:88ff:fe01:8cc0/64 scope link
       valid_lft forever preferred_lft forever

IP addresses in namespace ns1:

$ ip netns exec ns1 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
58: veth2@if59: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f2:c5:84:06:e6:76 brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet 192.168.1.2/24 scope global veth2
       valid_lft forever preferred_lft forever
    inet6 fe80::f0c5:84ff:fe06:e676/64 scope link
       valid_lft forever preferred_lft forever

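Testing

At this point a ping from ns0 to ns1 fails. Although ns-router is able to forward packets, the IP address of ns1 is outside ns0's subnet, so ns0 finds no matching route when it tries to send the packet and reports Network is unreachable. The packet never reaches ns-router. The same happens for 192.168.1.1, the router's interface on the other subnet.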

$ ip netns exec ns0 ping 192.168.1.1 -c 3
connect: Network is unreachable

$ ip netns exec ns0 ping 192.168.1.2 -c 3
connect: Network is unreachable

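Adding routes

In ns0 and ns1 we each add a route to the other subnet: packets destined for the other subnet are first sent to the router interface on the local subnet and are then forwarded by ns-router.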

$ ip netns exec ns0 ip route add 192.168.1.0/24 via 192.168.0.1
$ ip netns exec ns1 ip route add 192.168.0.0/24 via 192.168.1.1

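Testing again

Now the two namespaces can ping each other. The ttl=63 in the replies, one less than the usual Linux default of 64, shows that each packet crossed one router hop.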

$ ip netns exec ns0 ping 192.168.1.2 -c 3
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=63 time=0.045 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=63 time=0.040 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=63 time=0.031 ms

--- 192.168.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.031/0.038/0.045/0.009 ms

$ ip netns exec ns1 ping 192.168.0.2 -c 3
PING 192.168.0.2 (192.168.0.2) 56(84) bytes of data.
64 bytes from 192.168.0.2: icmp_seq=1 ttl=63 time=0.034 ms
64 bytes from 192.168.0.2: icmp_seq=2 ttl=63 time=0.042 ms
64 bytes from 192.168.0.2: icmp_seq=3 ttl=63 time=0.034 ms

--- 192.168.0.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.034/0.036/0.042/0.007 ms

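To make the setup easier to follow, this experiment uses a separate namespace, ns-router, as the router. In practice the router-side ends of the veth pairs can simply be left in the default network namespace, which then takes on the router role.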
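
The next experiment reuses the names ns0 and ns1, so the namespaces from this one need to be removed first; otherwise `ip netns add` fails because the names already exist. Deleting a namespace also destroys the veth interfaces inside it, and destroying one end of a veth pair removes its peer. A minimal cleanup sketch follows, where the `run` helper and `DRY_RUN` variable are hypothetical additions that allow a preview without root.

```bash
#!/usr/bin/env bash
# Execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove the three namespaces created for the routing experiment.
cleanup_router_lab() {
    local ns
    for ns in ns0 ns1 ns-router; do
        run ip netns delete "$ns"
    done
}

# Preview the commands without touching the system:
DRY_RUN=1 cleanup_router_lab
```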

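Connecting a namespace to the host with routing

As mentioned earlier when introducing the Linux bridge, a bridge is a layer 2 device from a networking point of view and therefore does not need an IP address. The Linux bridge virtual device is special, though: we can think of the bridge as having a built-in NIC that appears on the host under the bridge's name. This NIC is attached to the bridge, so it can communicate at layer 2 with the other NICs and namespaces connected to the bridge. At the same time, from the host's point of view, the virtual bridge device is also a NIC in the host's default network namespace, and once it has been given an IP address it can take part in the host's routing and forwarding.

By giving the bridge an IP address and using that address as the default gateway of the namespaces, the namespaces can communicate with the host. If a suitable route is then added on the host, the namespaces can also communicate with external networks.

The figure below shows the logical network view after an IP address has been assigned to the Linux bridge device bridge0. Note that the NIC bridge0 appears both on the Linux bridge (bridge0) and on the router (the default network namespace): it works at layer 2 inside the Linux bridge and at layer 3 inside the default network namespace.

[Figure: logical network view after assigning an IP address to bridge0]

Once bridge0 is set as the default gateway, ns0 and ns1 can reach the host network 172.16.0.157/16. The data flows as follows: ns0 --(bridge)--> bridge0 --(IP Forwarding)--> 172.16.0.157/16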

Creating the namespaces

Create the namespaces ns0 and ns1:

$ ip netns add ns0
$ ip netns add ns1

$ ip netns list
ns1
ns0

Creating the veth pairs

Create two veth pairs:

$ ip link add type veth
$ ip link add type veth

$ ip link
60: veth0@veth1: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 22:08:b1:3d:44:a3 brd ff:ff:ff:ff:ff:ff
61: veth1@veth0: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d2:db:62:51:7d:75 brd ff:ff:ff:ff:ff:ff
62: veth2@veth3: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 62:da:16:fa:50:a0 brd ff:ff:ff:ff:ff:ff
63: veth3@veth2: <BROADCAST,MULTICAST,M-DOWN> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d6:59:1b:fb:e6:a6 brd ff:ff:ff:ff:ff:ff

Creating and bringing up the bridge

$ ip link add bridge0 type bridge

$ ip link
64: bridge0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d2:3b:75:2a:23:50 brd ff:ff:ff:ff:ff:ff

$ ip link set bridge0 up

Attaching the veth interfaces

Use the veth pairs to connect ns0 and ns1 to bridge0.

$ ip link set veth0 netns ns0
$ ip link set veth2 netns ns1
$ ip link set veth1 master bridge0
$ ip link set veth3 master bridge0

Assigning IP addresses to the veth interfaces

$ ip netns exec ns0 ip addr add 192.168.1.2/24 dev veth0
$ ip netns exec ns1 ip addr add 192.168.1.3/24 dev veth2

Bringing up the veth interfaces

$ ip netns exec ns0 ip link set veth0 up
$ ip netns exec ns1 ip link set veth2 up
$ ip link set veth1 up
$ ip link set veth3 up

$ ip link
61: veth1@if60: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UP mode DEFAULT group default qlen 1000
    link/ether d2:db:62:51:7d:75 brd ff:ff:ff:ff:ff:ff link-netnsid 0
63: veth3@if62: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master bridge0 state UP mode DEFAULT group default qlen 1000
    link/ether d6:59:1b:fb:e6:a6 brd ff:ff:ff:ff:ff:ff link-netnsid 1
64: bridge0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether d2:db:62:51:7d:75 brd ff:ff:ff:ff:ff:ff

Checking the IP addresses in the namespaces

IP addresses in namespace ns0:

$ ip netns exec ns0 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
60: veth0@if61: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 22:08:b1:3d:44:a3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.1.2/24 scope global veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::2008:b1ff:fe3d:44a3/64 scope link
       valid_lft forever preferred_lft forever

IP addresses in namespace ns1:

$ ip netns exec ns1 ip addr
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
62: veth2@if63: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 62:da:16:fa:50:a0 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 192.168.1.3/24 scope global veth2
       valid_lft forever preferred_lft forever
    inet6 fe80::60da:16ff:fefa:50a0/64 scope link
       valid_lft forever preferred_lft forever

Testing

Ping ns1 from namespace ns0: the ping succeeds.

$ ip netns exec ns0 ping 192.168.1.3 -c 3
PING 192.168.1.3 (192.168.1.3) 56(84) bytes of data.
64 bytes from 192.168.1.3: icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from 192.168.1.3: icmp_seq=2 ttl=64 time=0.034 ms
64 bytes from 192.168.1.3: icmp_seq=3 ttl=64 time=0.031 ms

--- 192.168.1.3 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.026/0.030/0.034/0.005 ms

Ping ns0 from namespace ns1: the ping succeeds.

$ ip netns exec ns1 ping 192.168.1.2 -c 3
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.049 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.030 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.037 ms

--- 192.168.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.030/0.038/0.049/0.010 ms

Ping the host from namespace ns0: the ping fails.

$ ip netns exec ns0 ping 172.16.0.157 -c 3
connect: Network is unreachable

Ping the host from namespace ns1: the ping fails.

$ ip netns exec ns1 ping 172.16.0.157 -c 3
connect: Network is unreachable

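At this point ns0 and ns1 can reach each other, but pinging the host's IP address from either of them reports that the network is unreachable, because the address is on a different subnet and there is no route to it.

Assigning an IP address to bridge0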

$ ip addr add 192.168.1.1/24 dev bridge0

$ ip addr
default qlen 1000
    link/ether d2:db:62:51:7d:75 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.1/24 scope global bridge0
       valid_lft forever preferred_lft forever

Adding a default route to the namespaces

Add a default route to namespace ns0:

$ ip netns exec ns0 ip route add default via 192.168.1.1

$ ip netns exec ns0 route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    0      0        0 veth0
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 veth0

Add a default route to namespace ns1:

$ ip netns exec ns1 ip route add default via 192.168.1.1

$ ip netns exec ns1 route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         gateway         0.0.0.0         UG    0      0        0 veth2
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 veth2

This sets the IP address of bridge0 as the default gateway in ns0 and ns1.
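
The `route` command used above comes from the older net-tools package; `ip route show` prints the same table in iproute2 syntax. As a rough sketch, the default gateway can be pulled out of that output with awk. The function name `default_gateway` is made up for illustration, and the sample text is hand-written to follow the usual `ip route show` layout rather than captured from the system above.

```bash
#!/usr/bin/env bash
# Print the gateway of the default route from `ip route show` output on stdin.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Hand-written sample in the usual `ip route show` layout (not captured output).
sample='default via 192.168.1.1 dev veth0
192.168.1.0/24 dev veth0 proto kernel scope link src 192.168.1.2'

printf '%s\n' "$sample" | default_gateway   # prints 192.168.1.1
```

Against the live namespace the equivalent call would be `ip netns exec ns0 ip route show | default_gateway`, run as root.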

Testing again

Ping the host from namespace ns0: the ping succeeds.

$ ip netns exec ns0 ping 172.16.0.157 -c 3
PING 172.16.0.157 (172.16.0.157) 56(84) bytes of data.
64 bytes from 172.16.0.157: icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from 172.16.0.157: icmp_seq=2 ttl=64 time=0.037 ms
64 bytes from 172.16.0.157: icmp_seq=3 ttl=64 time=0.033 ms

--- 172.16.0.157 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.026/0.032/0.037/0.004 ms

Ping the host from namespace ns1: the ping succeeds.

$ ip netns exec ns1 ping 172.16.0.157 -c 3
PING 172.16.0.157 (172.16.0.157) 56(84) bytes of data.
64 bytes from 172.16.0.157: icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from 172.16.0.157: icmp_seq=2 ttl=64 time=0.038 ms
64 bytes from 172.16.0.157: icmp_seq=3 ttl=64 time=0.038 ms

--- 172.16.0.157 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.022/0.032/0.038/0.010 ms

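Pinging the host IP from ns0 and ns1 now works. By setting bridge0 as the default gateway, we have connected the namespaces to the host network.

Connecting a namespace to external networks with iptables

In the example above, routing connects the namespaces to the host's network, but the namespaces still cannot reach external networks.

Try to reach Baidu from namespaces ns0 and ns1: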

$ ip netns exec ns1 ping www.baidu.com -c 3
ping: www.baidu.com: Name or service not known

$ ip netns exec ns0 ping www.baidu.com -c 3
ping: www.baidu.com: Name or service not known

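Next we use iptables to perform source NAT (SNAT, here through the MASQUERADE target) so that the namespaces can reach external networks. Without it, packets leave the host with a 192.168.1.0/24 source address that external hosts have no route back to: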

$ iptables -t nat -A POSTROUTING -s 192.168.1.1/24 -o eth0 -j MASQUERADE

$ iptables --list -t nat
Chain PREROUTING (policy ACCEPT)
target     prot opt source               destination

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Chain POSTROUTING (policy ACCEPT)
target     prot opt source               destination
MASQUERADE  all  --  192.168.1.0/24       anywhere
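
The rule above was entered with the host address 192.168.1.1/24 but is listed as 192.168.1.0/24, because the source match is stored as a network address with the host bits masked off. The arithmetic behind that normalization can be sketched as follows; `network_of` is a hypothetical helper written for this illustration, not part of iptables.

```bash
#!/usr/bin/env bash
# Compute the network address of a CIDR string such as 192.168.1.1/24.
network_of() {
    local addr=${1%/*} len=${1#*/} a b c d
    IFS=. read -r a b c d <<< "$addr"
    # Pack the four octets into one 32-bit integer.
    local ip=$(( (a << 24) | (b << 16) | (c << 8) | d ))
    # Build the netmask for the prefix length and clear the host bits.
    local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    local net=$(( ip & mask ))
    echo "$(( net >> 24 & 255 )).$(( net >> 16 & 255 )).$(( net >> 8 & 255 )).$(( net & 255 ))/$len"
}

network_of 192.168.1.1/24   # prints 192.168.1.0/24
```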

Try again to reach Baidu from namespaces ns0 and ns1:

$ ip netns exec ns0 ping www.baidu.com -c 3
PING www.a.shifen.com (14.119.104.254) 56(84) bytes of data.
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=1 ttl=51 time=9.83 ms
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=2 ttl=51 time=9.37 ms
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=3 ttl=51 time=9.42 ms

--- www.a.shifen.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 9.378/9.545/9.832/0.232 ms

$ ip netns exec ns1 ping www.baidu.com -c 3
PING www.a.shifen.com (14.119.104.254) 56(84) bytes of data.
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=1 ttl=51 time=9.31 ms
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=2 ttl=51 time=9.35 ms
64 bytes from 14.119.104.254 (14.119.104.254): icmp_seq=3 ttl=51 time=9.39 ms

--- www.a.shifen.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 9.319/9.355/9.396/0.031 ms

The namespaces ns0 and ns1 can now reach external networks.
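
To return the host to its previous state after the experiment, the objects created above can be removed in roughly the reverse order. The sketch below assumes the names used in this article (ns0, ns1, bridge0, eth0); the `run` helper and `DRY_RUN` variable are hypothetical and only serve to preview the commands. The MASQUERADE rule is removed by repeating it with `-D` in place of `-A`.

```bash
#!/usr/bin/env bash
# Execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        echo "$*"
    else
        "$@"
    fi
}

# Tear down the bridge experiment: namespaces (and their veth pairs),
# the bridge itself, and the NAT rule.
cleanup_bridge_lab() {
    run ip netns delete ns0
    run ip netns delete ns1
    run ip link delete bridge0
    run iptables -t nat -D POSTROUTING -s 192.168.1.0/24 -o eth0 -j MASQUERADE
}

# Preview the commands without touching the system:
DRY_RUN=1 cleanup_bridge_lab
```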