Network Namespace
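A Linux namespace isolates kernel resources. A network namespace (introduced in the Linux 2.6 kernel series, starting with 2.6.24) isolates network devices together with network resources such as IP addresses, ports, routing tables, and firewall rules. Each network namespace therefore has its own network devices and its own network state: IP addresses, routing table, port range, /proc/net directory, and so on.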
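One way to see this isolation is to look at the namespace identifier the kernel exposes under /proc. The sketch below assumes a Linux system with /proc mounted; `netns_id` is a hypothetical helper name, not something from the original text. Two processes that print the same value share one network namespace.

```shell
#!/bin/sh
# Print the network namespace identifier of a process (default: the caller).
# The value looks like net:[<inode>]; equal values mean a shared namespace.
netns_id() {
    readlink "/proc/${1:-self}/ns/net"
}

echo "current shell: $(netns_id $$)"

# With root privileges, a new namespace would report a different identifier
# (not executed here):
#   ip netns add demo
#   ip netns exec demo readlink /proc/self/ns/net
```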
Linux virtual network components
veth pair
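- veth is short for Virtual Ethernet, a virtual Ethernet NIC. veth devices always come in pairs, hence the name veth pair.
- A veth pair is commonly used for communication across network namespaces: the two ends of the pair are placed in different namespaces, as shown in the figure below.
- This setup only suits communication between two namespaces. More commonly, a veth pair connects a namespace to the outside world, as shown in the figure below.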
Linux bridge
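Linux bridge is the bridge implementation in Linux, but it behaves more like a virtual network switch: any physical device (for example eth0) or virtual device (for example a veth pair or a tap device) can be attached to it.
- A Linux bridge cannot connect network devices across hosts.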
tun/tap
- tun emulates a point-to-point device
- tap emulates an Ethernet device
- The two device types encapsulate packets differently
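How it works
In the normal case, an application reaches the Linux network stack through the socket API.
With tun/tap, a user-space program sends and receives packets through a device file (on Linux, a file descriptor opened on /dev/net/tun and bound to an interface such as tunX). Every write to that file is turned by the tun device into a packet and handed to the kernel network stack.
- A tun device exchanges IP packets through its file descriptor, so it works only at L3. It cannot be bridged with a physical NIC, but it can reach one through L3 forwarding (for example ip_forward).
- A tap device exchanges link-layer frames through its file descriptor, so it can be bridged with a physical NIC.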
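Before creating tun or tap devices it can help to confirm that the clone device exists. The following is a minimal sketch, assuming the usual Linux location /dev/net/tun; `tun_clone_available` is a hypothetical helper name, and the `ip tuntap` commands in the comments need root and are not executed.

```shell
#!/bin/sh
# Succeed if the given path is a character device, as the tun/tap clone
# device should be. The default path is an assumption (stock Linux layout).
tun_clone_available() {
    dev="${1:-/dev/net/tun}"
    [ -c "$dev" ]
}

if tun_clone_available; then
    echo "tun/tap clone device present"
else
    echo "tun/tap clone device missing (as root, try: modprobe tun)"
fi

# With root privileges, devices could then be created like this:
#   ip tuntap add dev tun0 mode tun   # L3 device, carries IP packets
#   ip tuntap add dev tap0 mode tap   # L2 device, carries Ethernet frames
```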
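Linux L3 tunnels
The underlying principle of every Linux L3 tunnel is the same as that of the tun device, so L3 tunnels can be seen as an advanced application of tun devices.
Linux natively supports the following five L3 tunnels:
- ipip: IPv4 in IPv4, which encapsulates an IPv4 packet inside another IPv4 packet;
- GRE: Generic Routing Encapsulation, which defines a mechanism for encapsulating any network-layer protocol inside any other network-layer protocol, and works with both IPv4 and IPv6;
- sit: similar to ipip, except that sit encapsulates IPv6 packets in IPv4 packets, i.e. IPv6 over IPv4;
- ISATAP: Intra-Site Automatic Tunnel Addressing Protocol, similar to sit and also used to tunnel IPv6;
- VTI: Virtual Tunnel Interface, an IPsec tunneling technique proposed by Cisco.
This article uses ipip as the example to learn the basics of Linux tunnel communication.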
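Each of the five tunnel types corresponds to a mode accepted by `ip tunnel add ... mode <mode>` and to a kernel driver. The sketch below maps mode to module name; `tunnel_module` is a hypothetical helper, and the mapping assumes a modular kernel (on some kernels these drivers are built in and no module shows up).

```shell
#!/bin/sh
# Map an `ip tunnel` mode to the kernel module that usually provides it.
tunnel_module() {
    case "$1" in
        ipip)        echo ipip ;;
        gre)         echo ip_gre ;;
        sit|isatap)  echo sit ;;      # ISATAP is handled by the sit driver
        vti)         echo ip_vti ;;
        *)           echo "unknown tunnel mode: $1" >&2; return 1 ;;
    esac
}

for mode in ipip gre sit isatap vti; do
    printf '%-7s -> %s\n' "$mode" "$(tunnel_module "$mode")"
done

# As root, the module for this article's example could be loaded with:
#   modprobe ipip
```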
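ipip experiment
The experiment has two parts:
- an ipip tunnel between two namespaces on the same host
- an ipip tunnel between namespaces on different hosts
Environment topology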
Name | Host | IP Addr | Peer | Note |
Host 1 ens33 | Host 01 | 192.168.21.11/24 | | physical NIC |
Host 2 ens33 | Host 02 | 192.168.21.12/24 | | physical NIC |
v1 | Host 01:ns1 | 10.10.10.2/24 | v1_p | veth peer |
v2 | Host 01:ns2 | 10.10.20.2/24 | v2_p | veth peer |
v3 | Host 02:ns3 | 10.10.30.2/24 | v3_p | veth peer |
v1_p | Host 01 (root namespace) | 10.10.10.1/24 | v1 | veth peer |
v2_p | Host 01 (root namespace) | 10.10.20.1/24 | v2 | veth peer |
v3_p | Host 02 (root namespace) | 10.10.30.1/24 | v3 | veth peer |
tun1@ns1 | Host 01 | 172.16.10.10/32 | tun1@ns2 | tunnel |
tun2@ns1 | Host 01 | 172.16.100.10/32 | tun1@ns3 | tunnel |
tun1@ns2 | Host 01 | 172.16.20.20/32 | tun1@ns1 | tunnel |
tun1@ns3 | Host 02 | 172.16.30.30/32 | tun2@ns1 | tunnel |
Experiment steps
ipip tunnel between namespaces on the same host
- Create the namespaces on Host 01
ip netns add ns1
ip netns add ns2
- Create the veth pairs and move one end of each into its namespace
ip link add v1 type veth peer name v1_p
ip link add v2 type veth peer name v2_p
ip link set v1 netns ns1
ip link set v2 netns ns2
- Configure the host-side end of each veth pair
ip addr add 10.10.10.1/24 dev v1_p
ip link set v1_p up
ip addr add 10.10.20.1/24 dev v2_p
ip link set v2_p up
- Configure the other end of each veth pair inside its namespace
ip netns exec ns1 ip addr add 10.10.10.2/24 dev v1
ip netns exec ns1 ip link set v1 up
ip netns exec ns2 ip addr add 10.10.20.2/24 dev v2
ip netns exec ns2 ip link set v2 up
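The namespace and veth steps so far follow the same pattern for every namespace, so they can be generated from a few parameters. The following is only a sketch: `veth_plan` is a hypothetical helper that prints the commands rather than running them, which keeps it safe to try without root. Piping its output into `sh` as root would apply it.

```shell
#!/bin/sh
# Print the commands that attach one namespace to the host through a veth pair.
# Arguments: namespace, inner device, host-side device,
#            inner address/prefix, host-side address/prefix.
veth_plan() {
    ns="$1"; inner="$2"; outer="$3"; inner_addr="$4"; outer_addr="$5"
    cat <<EOF
ip netns add $ns
ip link add $inner type veth peer name $outer
ip link set $inner netns $ns
ip addr add $outer_addr dev $outer
ip link set $outer up
ip netns exec $ns ip addr add $inner_addr dev $inner
ip netns exec $ns ip link set $inner up
EOF
}

# The two namespaces used in this experiment.
veth_plan ns1 v1 v1_p 10.10.10.2/24 10.10.10.1/24
veth_plan ns2 v2 v2_p 10.10.20.2/24 10.10.20.1/24
```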
- Check the ip_forward setting of the host network stack
[root@worker-01 ~]# cat /proc/sys/net/ipv4/ip_forward
1
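# 1 means IP forwarding is enabled. If the value is 0, enable it with one of the following:
- echo 1 > /proc/sys/net/ipv4/ip_forward   # temporary, lost on reboot
- set net.ipv4.ip_forward = 1 in /etc/sysctl.conf (add the line if it is missing), then run sysctl -p   # permanent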
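The forwarding check can be scripted so that later steps fail early when forwarding is off. This is a minimal sketch; `forwarding_enabled` is a hypothetical helper, and its optional path argument exists only so the function can be tried against an ordinary file.

```shell
#!/bin/sh
# Succeed if the procfs switch for IPv4 forwarding contains 1.
forwarding_enabled() {
    file="${1:-/proc/sys/net/ipv4/ip_forward}"
    [ "$(cat "$file" 2>/dev/null)" = "1" ]
}

if forwarding_enabled; then
    echo "IPv4 forwarding is on"
else
    echo "IPv4 forwarding is off; as root: sysctl -w net.ipv4.ip_forward=1"
fi
```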
- Add a route to the peer subnet inside each namespace
ip netns exec ns1 route add -net 10.10.20.0 netmask 255.255.255.0 gw 10.10.10.1
ip netns exec ns2 route add -net 10.10.10.0 netmask 255.255.255.0 gw 10.10.20.1
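These routes are needed because the peer address lies outside the namespace's own /24, so the kernel has no on-link route for it. The sketch below works through that subnet arithmetic; `ip_to_int` and `in_subnet` are hypothetical helper names, and the code assumes plain dotted-quad IPv4 addresses without leading zeros.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address into four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 belongs to network $2 with prefix length $3.
in_subnet() {
    addr=$(ip_to_int "$1"); net=$(ip_to_int "$2"); bits="$3"
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

if in_subnet 10.10.20.2 10.10.10.0 24; then
    echo "10.10.20.2 is on-link for ns1"
else
    echo "10.10.20.2 is outside 10.10.10.0/24, so ns1 needs a route via 10.10.10.1"
fi
```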
- Check connectivity from inside each namespace to confirm that the two tunnel endpoints can reach each other
[root@worker-01 ~]# ip netns exec ns1 ping 10.10.20.2
PING 10.10.20.2 (10.10.20.2) 56(84) bytes of data.
64 bytes from 10.10.20.2: icmp_seq=1 ttl=63 time=0.053 ms
64 bytes from 10.10.20.2: icmp_seq=2 ttl=63 time=0.125 ms
64 bytes from 10.10.20.2: icmp_seq=3 ttl=63 time=0.104 ms
^C
--- 10.10.20.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.053/0.094/0.125/0.030 ms
[root@worker-01 ~]# ip netns exec ns2 ping 10.10.10.2
PING 10.10.10.2 (10.10.10.2) 56(84) bytes of data.
64 bytes from 10.10.10.2: icmp_seq=1 ttl=63 time=0.071 ms
64 bytes from 10.10.10.2: icmp_seq=2 ttl=63 time=0.086 ms
64 bytes from 10.10.10.2: icmp_seq=3 ttl=63 time=0.054 ms
64 bytes from 10.10.10.2: icmp_seq=4 ttl=63 time=0.098 ms
^C
--- 10.10.10.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.054/0.077/0.098/0.017 ms
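When this check is repeated often, the summary line that ping prints can be reduced to a single number. The sketch below is one way to do that; `packet_loss` is a hypothetical helper, and the pattern assumes the iputils summary format shown in the output above.

```shell
#!/bin/sh
# Extract the packet-loss percentage from ping's summary line (read on stdin).
packet_loss() {
    sed -n 's/.* \([0-9][0-9.]*\)% packet loss.*/\1/p'
}

# Sample summary line copied from the ping run above.
summary='3 packets transmitted, 3 received, 0% packet loss, time 2000ms'
loss=$(printf '%s\n' "$summary" | packet_loss)
if [ "$loss" = "0" ]; then
    echo "endpoint reachable"
else
    echo "packet loss: ${loss}%"
fi

# Against a live namespace (root required), the same filter could be used as:
#   ip netns exec ns1 ping -c 3 10.10.20.2 | packet_loss
```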
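- Create the tunnel in each namespace and assign its address
- ns1
ip netns exec ns1 ip tunnel add tun1 mode ipip remote 10.10.20.2 local 10.10.10.2
ip netns exec ns1 ip link set tun1 up
ip netns exec ns1 ip addr add 172.16.10.10 peer 172.16.20.20 dev tun1
- ns2
ip netns exec ns2 ip tunnel add tun1 mode ipip remote 10.10.10.2 local 10.10.20.2
ip netns exec ns2 ip link set tun1 up
ip netns exec ns2 ip addr add 172.16.20.20 peer 172.16.10.10 dev tun1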
- Interface status and tunnel reachability
Interfaces on host 01:
[root@worker-01 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.21.11 netmask 255.255.255.0 broadcast 192.168.21.255
inet6 fe80::f2a:f203:693b:c087 prefixlen 64 scopeid 0x20<link>
inet6 fe80::2c94:b9fd:ac8:6411 prefixlen 64 scopeid 0x20<link>
inet6 fe80::d95:b4d8:be84:cafa prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:3b:c1:97 txqueuelen 1000 (Ethernet)
RX packets 6123689 bytes 4238387536 (3.9 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 5925660 bytes 4772785528 (4.4 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
...
v1_p: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.10.10.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::bc35:a6ff:fe8c:b03 prefixlen 64 scopeid 0x20<link>
ether be:35:a6:8c:0b:03 txqueuelen 1000 (Ethernet)
RX packets 138 bytes 12040 (11.7 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 32090 bytes 5790391 (5.5 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
v2_p: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.10.20.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::5808:d2ff:fee9:83ab prefixlen 64 scopeid 0x20<link>
ether 5a:08:d2:e9:83:ab txqueuelen 1000 (Ethernet)
RX packets 46 bytes 3656 (3.5 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 31218 bytes 5637199 (5.3 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Interfaces inside the namespaces:
[root@worker-01 ~]# ip netns exec ns1 ifconfig
tun1: flags=209<UP,POINTOPOINT,RUNNING,NOARP> mtu 1480
inet 172.16.10.10 netmask 255.255.255.255 destination 172.16.20.20
inet6 fe80::5efe:a0a:a02 prefixlen 64 scopeid 0x20<link>
tunnel txqueuelen 1000 (IPIP Tunnel)
RX packets 3 bytes 252 (252.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3 bytes 252 (252.0 B)
TX errors 3 dropped 0 overruns 0 carrier 0 collisions 0
v1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.10.10.2 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::18f7:71ff:fecb:e113 prefixlen 64 scopeid 0x20<link>
ether 1a:f7:71:cb:e1:13 txqueuelen 1000 (Ethernet)
RX packets 32088 bytes 5790055 (5.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 138 bytes 12040 (11.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@worker-01 ~]# ip netns exec ns2 ifconfig
tun2: flags=209<UP,POINTOPOINT,RUNNING,NOARP> mtu 1480
inet 172.16.20.20 netmask 255.255.255.255 destination 172.16.10.10
inet6 fe80::5efe:a0a:1402 prefixlen 64 scopeid 0x20<link>
tunnel txqueuelen 1000 (IPIP Tunnel)
RX packets 3 bytes 252 (252.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3 bytes 252 (252.0 B)
TX errors 3 dropped 0 overruns 0 carrier 0 collisions 0
v2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.10.20.2 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::f483:3eff:feb2:8b3f prefixlen 64 scopeid 0x20<link>
ether f6:83:3e:b2:8b:3f txqueuelen 1000 (Ethernet)
RX packets 31369 bytes 5664578 (5.4 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 46 bytes 3656 (3.5 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Both tunnel endpoints are now up.
Reachability through the tunnel:
[root@worker-01 ~]# ip netns exec ns1 ping 172.16.20.20
PING 172.16.20.20 (172.16.20.20) 56(84) bytes of data.
64 bytes from 172.16.20.20: icmp_seq=1 ttl=64 time=0.109 ms
64 bytes from 172.16.20.20: icmp_seq=2 ttl=64 time=0.104 ms
64 bytes from 172.16.20.20: icmp_seq=3 ttl=64 time=0.087 ms
^C
--- 172.16.20.20 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.087/0.100/0.109/0.009 ms
[root@worker-01 ~]# ip netns exec ns2 ping 172.16.10.10
PING 172.16.10.10 (172.16.10.10) 56(84) bytes of data.
64 bytes from 172.16.10.10: icmp_seq=1 ttl=64 time=0.078 ms
64 bytes from 172.16.10.10: icmp_seq=2 ttl=64 time=0.093 ms
64 bytes from 172.16.10.10: icmp_seq=3 ttl=64 time=0.373 ms
^C
--- 172.16.10.10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.078/0.181/0.373/0.136 ms
Success!
ipip tunnel between namespaces on different hosts
- Create the namespace on Host 02
ip netns add ns3
- On host 02, create the veth pair and move one end into the namespace
ip link add v3 type veth peer name v3_p
ip link set v3 netns ns3
- On host 02, configure the host-side end of the veth pair
ip addr add 10.10.30.1/24 dev v3_p
ip link set v3_p up
- On host 02, configure the other end of the veth pair inside the namespace
ip netns exec ns3 ip addr add 10.10.30.2/24 dev v3
ip netns exec ns3 ip link set v3 up
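- On host 02, check the ip_forward setting of the host network stack
Omitted (same procedure as on host 01).
- On host 01 and host 02, add routes so that each host can reach the other's veth subnet
host 01
route add -net 10.10.30.0/24 gw 192.168.21.12 dev ens33
host 02
route add -net 10.10.10.0/24 gw 192.168.21.11 dev ens33
- On host 01 and host 02, add the route to the peer subnet inside each namespace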
ip netns exec ns1 route add -net 10.10.30.0 netmask 255.255.255.0 gw 10.10.10.1
ip netns exec ns3 route add -net 10.10.10.0 netmask 255.255.255.0 gw 10.10.30.1
- On host 01 and host 02, create the tunnel inside each namespace and assign its IP address
- ns1
ip netns exec ns1 ip tunnel add tun2 mode ipip remote 10.10.30.2 local 10.10.10.2
ip netns exec ns1 ip link set tun2 up
ip netns exec ns1 ip addr add 172.16.100.10 peer 172.16.30.30 dev tun2
- ns3
ip netns exec ns3 ip tunnel add tun1 mode ipip remote 10.10.10.2 local 10.10.30.2
ip netns exec ns3 ip link set tun1 up
ip netns exec ns3 ip addr add 172.16.30.30 peer 172.16.100.10 dev tun1
Routes and tunnels inside the namespaces:
[root@worker-01 ~]# ip netns exec ns1 route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.10.10.0 0.0.0.0 255.255.255.0 U 0 0 0 v1
10.10.20.0 10.10.10.1 255.255.255.0 UG 0 0 0 v1
10.10.30.0 10.10.10.1 255.255.255.0 UG 0 0 0 v1
172.16.20.20 0.0.0.0 255.255.255.255 UH 0 0 0 tun1
172.16.30.30 0.0.0.0 255.255.255.255 UH 0 0 0 tun2
[root@worker-01 ~]# ip netns exec ns1 ip tunnel
tun1: ip/ip remote 10.10.20.2 local 10.10.10.2 ttl inherit
tun2: ip/ip remote 10.10.30.2 local 10.10.10.2 ttl inherit
tunl0: ip/ip remote any local any ttl inherit nopmtudisc
[root@worker-02 ~]# ip netns exec ns3 route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.10.10.0 10.10.30.1 255.255.255.0 UG 0 0 0 v3
10.10.30.0 0.0.0.0 255.255.255.0 U 0 0 0 v3
172.16.100.10 0.0.0.0 255.255.255.255 UH 0 0 0 tun1
[root@worker-02 ~]# ip netns exec ns3 ip tunnel
tun1: ip/ip remote 10.10.10.2 local 10.10.30.2 ttl inherit
tunl0: ip/ip remote any local any ttl inherit nopmtudisc
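A tunnel only works when each side's local address is the other side's remote address, so it can be worth extracting both from the `ip tunnel` listing and comparing them. The sketch below assumes the field order shown in the output above (remote before local); `tunnel_endpoints` is a hypothetical helper, and the fallback device tunl0 is skipped because its endpoints are "any".

```shell
#!/bin/sh
# Read `ip tunnel` output on stdin and print "name local remote"
# for every ipip tunnel that has concrete endpoints.
tunnel_endpoints() {
    awk '$2 == "ip/ip" && $4 != "any" {
        name = $1
        sub(/:$/, "", name)      # drop the trailing colon from the device name
        print name, $6, $4
    }'
}

# Sample listing copied from the ns3 output above.
tunnel_endpoints <<'EOF'
tun1: ip/ip remote 10.10.10.2 local 10.10.30.2 ttl inherit
tunl0: ip/ip remote any local any ttl inherit nopmtudisc
EOF

# Against a live namespace (root required):
#   ip netns exec ns3 ip tunnel | tunnel_endpoints
```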
Tunnel connectivity:
[root@worker-02 ~]# ip netns exec ns3 ping 172.16.100.10
PING 172.16.100.10 (172.16.100.10) 56(84) bytes of data.
64 bytes from 172.16.100.10: icmp_seq=1 ttl=64 time=0.743 ms
64 bytes from 172.16.100.10: icmp_seq=2 ttl=64 time=0.737 ms
64 bytes from 172.16.100.10: icmp_seq=3 ttl=64 time=0.746 ms
64 bytes from 172.16.100.10: icmp_seq=4 ttl=64 time=1.03 ms
^C
--- 172.16.100.10 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 0.737/0.815/1.035/0.128 ms
Success!
Appendix: ipip packet structure
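As a rough sketch of the layout: ipip places one outer IPv4 header (protocol number 4) in front of the original IPv4 packet. Without IP options that header is 20 bytes, which is why the tunnel interfaces above report an MTU of 1480 on a 1500-byte link. `ipip_mtu` is a hypothetical helper, and the diagram is an illustration rather than a capture.

```shell
#!/bin/sh
# Size in bytes of the outer IPv4 header added by ipip (no IP options assumed).
IPV4_HEADER=20

# Compute the tunnel MTU for a given underlying link MTU.
ipip_mtu() {
    echo $(( $1 - IPV4_HEADER ))
}

cat <<'EOF'
+---------------------+---------------------+------------------+
| outer IPv4 header   | inner IPv4 header   | inner payload    |
| src = tunnel local  | src = tunnel addr   | (ICMP, TCP, ...) |
| dst = tunnel remote | dst = peer addr     |                  |
| protocol = 4 (IPIP) |                     |                  |
+---------------------+---------------------+------------------+
EOF

echo "tunnel MTU over a 1500-byte link: $(ipip_mtu 1500)"
```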