Network Namespace

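Linux namespaces exist to isolate kernel resources. A network namespace (introduced in the Linux 2.6 kernel series) isolates network resources: network devices, IP addresses, ports, routing tables, firewall rules, and so on. Each network namespace therefore has its own network devices, IP addresses, routing tables, port range, /proc/net directory, etc.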

Linux virtual networking components

veth pair

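  • veth stands for Virtual Ethernet. veth devices always come in pairs, hence the name veth pair; a packet sent into one end comes out of the other.
  • veth pairs are often used for communication between network namespaces: the two ends of the pair are placed in different namespaces, as shown in the figure below.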

[Figure: a veth pair whose two ends sit in two different network namespaces]

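  • That setup only suits communication between two namespaces. More commonly, a veth pair is used to connect a namespace to the outside world (for example, the host's root namespace).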
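Connecting a namespace to the host with a veth pair boils down to a handful of `ip` commands, the same ones used in the experiment later in this article. As a sketch, the hypothetical Python helper below assembles that sequence for one namespace and only executes it (via `subprocess`, as root) when `dry_run=False`. The function name and parameters are illustrative assumptions, not part of any tool.

```python
import shlex
import subprocess


def veth_into_netns(ns: str, inner: str, outer: str,
                    inner_cidr: str, outer_cidr: str,
                    dry_run: bool = True) -> list[str]:
    """Connect a new network namespace to the root namespace with a veth pair.

    `inner` is moved into `ns`; `outer` stays in the root namespace.
    Hypothetical helper; with dry_run=False it needs root privileges.
    """
    cmds = [
        f"ip netns add {ns}",
        f"ip link add {inner} type veth peer name {outer}",
        f"ip link set {inner} netns {ns}",
        f"ip addr add {outer_cidr} dev {outer}",
        f"ip link set {outer} up",
        f"ip netns exec {ns} ip addr add {inner_cidr} dev {inner}",
        f"ip netns exec {ns} ip link set {inner} up",
    ]
    if not dry_run:
        for cmd in cmds:
            # check=True stops at the first command that fails.
            subprocess.run(shlex.split(cmd), check=True)
    return cmds


for line in veth_into_netns("ns1", "v1", "v1_p", "10.10.10.2/24", "10.10.10.1/24"):
    print(line)
```

With the default dry run, the printed commands can be reviewed before anything touches the system.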

Linux bridge

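A Linux bridge is the network bridge built into Linux, but it behaves more like a virtual network switch: any real physical device (such as eth0) or virtual device (such as one end of a veth pair, or a tap device) can be attached to it.

  • A Linux bridge cannot connect network devices across hosts: only devices on the local machine can be attached to it.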
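To make "behaves like a virtual switch" concrete, here is a toy Python model of a learning bridge: it records which port each source MAC address was seen on and floods frames whose destination it has not learned yet. The class, port names (such as `tap0`) and MAC addresses are hypothetical; the model omits entry ageing, STP and VLAN handling and is not the kernel's bridge code.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningBridge:
    """Toy learning switch: a MAC table plus flooding of unknown destinations."""
    ports: list[str]
    fdb: dict[str, str] = field(default_factory=dict)  # MAC address -> port

    def receive(self, in_port: str, src: str, dst: str) -> list[str]:
        """Return the ports a frame arriving on `in_port` is sent out of."""
        self.fdb[src] = in_port              # learn where the sender lives
        out = self.fdb.get(dst)
        if dst == BROADCAST or out is None:
            # Broadcast or not-yet-learned destination: flood to all other ports.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            return []                        # destination is behind the ingress port
        return [out]


br = LearningBridge(ports=["eth0", "v1_p", "tap0"])
print(br.receive("v1_p", "02:00:00:00:00:01", "02:00:00:00:00:02"))
print(br.receive("eth0", "02:00:00:00:00:02", "02:00:00:00:00:01"))
```

The first frame is flooded because the destination is unknown; the reply is delivered only to `v1_p`, since the bridge learned that address from the first frame.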

tun/tap

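  • tun is a virtual point-to-point device (L3)
  • tap is a virtual Ethernet device (L2)
  • The two devices handle packets at different layers, so what they send and receive is framed differently

How it works

In normal communication, an application reaches the Linux network stack through the socket API.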



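With tun/tap, a tun device sends and receives packets through a device file (/dev/tunX). Every write to this file is turned into a packet by the tun device and handed to the kernel network stack; in the other direction, packets that the kernel routes to the tun device can be read from the same file.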
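On current Linux kernels a program normally opens the clone device `/dev/net/tun` rather than a numbered `/dev/tunX` file, then selects or creates the interface with the `TUNSETIFF` ioctl. The sketch below assumes Linux and Python 3; the constants come from `<linux/if_tun.h>`, while the helper names and the default name `tun0` are hypothetical choices. Calling `open_tun` needs root or `CAP_NET_ADMIN`.

```python
import fcntl
import os
import struct

# Values from <linux/if_tun.h>.
TUNSETIFF = 0x400454CA  # _IOW('T', 202, int)
IFF_TUN = 0x0001        # L3 device: each read/write is one IP packet
IFF_TAP = 0x0002        # L2 device: each read/write is one Ethernet frame
IFF_NO_PI = 0x1000      # no 4-byte packet-information prefix on each packet


def build_ifreq(name: bytes, flags: int) -> bytes:
    """Pack the ifr_name (16 bytes, NUL-padded) and ifr_flags fields of struct ifreq."""
    return struct.pack("16sH", name, flags)


def open_tun(name: bytes = b"tun0", flags: int = IFF_TUN | IFF_NO_PI) -> int:
    """Create or attach to a tun/tap interface and return its file descriptor."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, flags))
    except OSError:
        os.close(fd)  # do not leak the descriptor if the ioctl is refused
        raise
    return fd
```

Once the interface has been given an address and set up with the usual `ip` commands, each `os.read(fd, 2048)` returns one packet the kernel routed to it, and `os.write(fd, packet)` injects one into the stack. A non-persistent device disappears again when the descriptor is closed.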


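  • The /dev/tunX file of a tun device carries IP packets, so tun works only at L3. It cannot be bridged with a physical NIC, but it can reach the physical NIC through L3 forwarding (routing, for example with ip_forward enabled).
  • The /dev/tapX file of a tap device carries link-layer (Ethernet) frames, so a tap device can be bridged with a physical NIC.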
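The difference shows up directly in what a program reads from the file descriptor. The sketch below assumes both devices were opened with `IFF_NO_PI` (no extra 4-byte packet-info prefix) and only inspects the first bytes. A tun read begins with an IP header. A tap read begins with an Ethernet header carrying MAC addresses and an EtherType (ARP included), which is why only the tap side can take part in bridging. The function names are hypothetical.

```python
import struct

ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}


def describe_tun_read(buf: bytes) -> str:
    """A read from a tun fd (IFF_NO_PI) starts directly with an IP header."""
    version = buf[0] >> 4  # high nibble of the first byte is the IP version
    return {4: "IPv4 packet", 6: "IPv6 packet"}.get(version, "not an IP packet")


def describe_tap_read(buf: bytes) -> str:
    """A read from a tap fd (IFF_NO_PI) starts with a 14-byte Ethernet header."""
    dst, src, ethertype = struct.unpack("!6s6sH", buf[:14])
    kind = ETHERTYPES.get(ethertype, hex(ethertype))
    return f"Ethernet frame {src.hex(':')} -> {dst.hex(':')}, type {kind}"
```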

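Linux L3 tunnels

Linux L3 tunnel devices work on the same principle as tun devices: they are point-to-point virtual interfaces that carry IP packets, although the kernel implements them with their own drivers (such as ipip, ip_gre and sit) rather than with the tun/tap driver. Linux L3 tunnels can therefore be seen as an advanced application of the tun idea.
Linux natively supports the following five kinds of L3 tunnel: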

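  • ipip: IPv4 in IPv4, i.e. an IPv4 packet encapsulated inside another IPv4 packet;
  • GRE: Generic Routing Encapsulation, a mechanism for encapsulating any network-layer protocol inside any other network-layer protocol; it works with both IPv4 and IPv6;
  • sit: similar to ipip, except that sit encapsulates IPv6 packets in IPv4, i.e. IPv6 over IPv4;
  • ISATAP: Intra-Site Automatic Tunnel Addressing Protocol; similar to sit, it is also used to tunnel IPv6;
  • VTI: Virtual Tunnel Interface, an IPsec tunnel technology introduced by Cisco.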
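Every tunnel type adds an outer header to each packet, which lowers the MTU left for the inner packet. The sketch below assumes an IPv4 underlay with a 1500-byte MTU and GRE without its optional key, checksum and sequence-number fields; VTI is left out because its overhead depends on the IPsec transform. The table and function are illustrative, not values read from the kernel.

```python
IPV4_HEADER = 20      # bytes: outer IPv4 header without options
GRE_BASE_HEADER = 4   # bytes: GRE header without optional fields

# Outer IP protocol number and per-packet overhead for each tunnel mode.
TUNNELS = {
    "ipip":   {"outer_proto": 4,  "overhead": IPV4_HEADER},
    "sit":    {"outer_proto": 41, "overhead": IPV4_HEADER},
    "isatap": {"outer_proto": 41, "overhead": IPV4_HEADER},
    "gre":    {"outer_proto": 47, "overhead": IPV4_HEADER + GRE_BASE_HEADER},
}


def tunnel_mtu(mode: str, underlay_mtu: int = 1500) -> int:
    """MTU left for the inner packet once the outer headers are added."""
    return underlay_mtu - TUNNELS[mode]["overhead"]


for mode in TUNNELS:
    print(mode, TUNNELS[mode]["outer_proto"], tunnel_mtu(mode))
```

For ipip this gives 1500 - 20 = 1480, which matches the `mtu 1480` that ifconfig reports for the tunnel interfaces in the experiment.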

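This article uses ipip as an example to study the basic principles of tunnel communication on Linux.

ipip experiment

The experiment has two parts:

  1. An ipip tunnel between different namespaces on the same host
  2. An ipip tunnel between namespaces on different hosts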

Environment topology

[Figure: experiment topology]

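| Name | Host | IP address | Peer | Notes |
|---|---|---|---|---|
| Host 01 ens33 | Host 01 | 192.168.21.11/24 | | physical NIC |
| Host 02 ens33 | Host 02 | 192.168.21.12/24 | | physical NIC |
| v1 | Host 01: ns1 | 10.10.10.2/24 | v1_p | veth pair |
| v2 | Host 01: ns2 | 10.10.20.2/24 | v2_p | veth pair |
| v3 | Host 02: ns3 | 10.10.30.2/24 | v3_p | veth pair |
| v1_p | Host 01 (root namespace) | 10.10.10.1/24 | v1 | veth pair |
| v2_p | Host 01 (root namespace) | 10.10.20.1/24 | v2 | veth pair |
| v3_p | Host 02 (root namespace) | 10.10.30.1/24 | v3 | veth pair |
| tun1@ns1 | Host 01: ns1 | 172.16.10.10/32 | tun1@ns2 | ipip tunnel |
| tun2@ns1 | Host 01: ns1 | 172.16.100.10/32 | tun1@ns3 | ipip tunnel |
| tun1@ns2 | Host 01: ns2 | 172.16.20.20/32 | tun1@ns1 | ipip tunnel |
| tun1@ns3 | Host 02: ns3 | 172.16.30.30/32 | tun2@ns1 | ipip tunnel |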

Experiment steps

ipip tunnel between namespaces on the same host

  1. Create the namespaces on Host 01

ip netns add ns1
ip netns add ns2

  1. Create the veth pairs and move one end of each into its namespace

ip link add v1 type veth peer name v1_p
ip link add v2 type veth peer name v2_p

ip link set v1 netns ns1
ip link set v2 netns ns2

  1. Configure the host-side end of each veth pair (in the root namespace)

ip addr add 10.10.10.1/24 dev v1_p
ip link set v1_p up
ip addr add 10.10.20.1/24 dev v2_p
ip link set v2_p up

  1. Configure the other end of each veth pair inside its namespace

ip netns exec ns1 ip addr add 10.10.10.2/24 dev v1
ip netns exec ns1 ip link set v1 up

ip netns exec ns2 ip addr add 10.10.20.2/24 dev v2
ip netns exec ns2 ip link set v2 up

  1. Check the ip_forward setting of the Linux network stack

[root@worker-01 ~]# cat /proc/sys/net/ipv4/ip_forward
1
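1 means IP forwarding is enabled. If it is 0, enable it in one of these ways:

  • echo 1 > /proc/sys/net/ipv4/ip_forward   # temporary, lost on reboot
  • Add or change net.ipv4.ip_forward = 1 in /etc/sysctl.conf, then run sysctl -p   # permanent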
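If the setup is scripted, the same check-then-enable logic takes a few lines of Python that read and write the /proc file above. This is a sketch: the function names are hypothetical, writing needs root, and the change does not survive a reboot (sysctl.conf handles that). The `path` parameter exists only so the logic can be exercised against an ordinary file.

```python
from pathlib import Path

IP_FORWARD = Path("/proc/sys/net/ipv4/ip_forward")


def ip_forward_enabled(path: Path = IP_FORWARD) -> bool:
    """True when the kernel forwards IPv4 packets between interfaces."""
    return path.read_text().strip() == "1"


def enable_ip_forward(path: Path = IP_FORWARD) -> None:
    """Runtime-only switch, like `echo 1 > ...`; needs root, lost on reboot."""
    if not ip_forward_enabled(path):
        path.write_text("1\n")


if IP_FORWARD.exists():
    print("ip_forward enabled:", ip_forward_enabled())
```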
  1. Inside each namespace, add a route to the other namespace's subnet

ip netns exec ns1 route add -net 10.10.20.0 netmask 255.255.255.0 gw 10.10.10.1
ip netns exec ns2 route add -net 10.10.10.0 netmask 255.255.255.0 gw 10.10.20.1

  1. From inside the namespaces, check connectivity to confirm that the two tunnel endpoints can reach each other
[root@worker-01 ~]#  ip netns exec ns1 ping 10.10.20.2
PING 10.10.20.2 (10.10.20.2) 56(84) bytes of data.
64 bytes from 10.10.20.2: icmp_seq=1 ttl=63 time=0.053 ms
64 bytes from 10.10.20.2: icmp_seq=2 ttl=63 time=0.125 ms
64 bytes from 10.10.20.2: icmp_seq=3 ttl=63 time=0.104 ms
^C
--- 10.10.20.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.053/0.094/0.125/0.030 ms
[root@worker-01 ~]#  ip netns exec ns2 ping 10.10.10.2
PING 10.10.10.2 (10.10.10.2) 56(84) bytes of data.
64 bytes from 10.10.10.2: icmp_seq=1 ttl=63 time=0.071 ms
64 bytes from 10.10.10.2: icmp_seq=2 ttl=63 time=0.086 ms
64 bytes from 10.10.10.2: icmp_seq=3 ttl=63 time=0.054 ms
64 bytes from 10.10.10.2: icmp_seq=4 ttl=63 time=0.098 ms
^C
--- 10.10.10.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 0.054/0.077/0.098/0.017 ms
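  1. Create the tunnel in each namespace and assign its IP addresses
  • ns1
    ip netns exec ns1 ip tunnel add tun1 mode ipip remote 10.10.20.2 local 10.10.10.2
    ip netns exec ns1 ip link set tun1 up
    ip netns exec ns1 ip addr add 172.16.10.10 peer 172.16.20.20 dev tun1
  • ns2
    ip netns exec ns2 ip tunnel add tun1 mode ipip remote 10.10.10.2 local 10.10.20.2
    ip netns exec ns2 ip link set tun1 up
    ip netns exec ns2 ip addr add 172.16.20.20 peer 172.16.10.10 dev tun1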
  1. Interface status and tunnel reachability
    Interfaces on Host 01:
[root@worker-01 ~]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.21.11  netmask 255.255.255.0  broadcast 192.168.21.255
        inet6 fe80::f2a:f203:693b:c087  prefixlen 64  scopeid 0x20<link>
        inet6 fe80::2c94:b9fd:ac8:6411  prefixlen 64  scopeid 0x20<link>
        inet6 fe80::d95:b4d8:be84:cafa  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:3b:c1:97  txqueuelen 1000  (Ethernet)
        RX packets 6123689  bytes 4238387536 (3.9 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5925660  bytes 4772785528 (4.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

...
v1_p: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.10.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::bc35:a6ff:fe8c:b03  prefixlen 64  scopeid 0x20<link>
        ether be:35:a6:8c:0b:03  txqueuelen 1000  (Ethernet)
        RX packets 138  bytes 12040 (11.7 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 32090  bytes 5790391 (5.5 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

v2_p: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.20.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::5808:d2ff:fee9:83ab  prefixlen 64  scopeid 0x20<link>
        ether 5a:08:d2:e9:83:ab  txqueuelen 1000  (Ethernet)
        RX packets 46  bytes 3656 (3.5 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 31218  bytes 5637199 (5.3 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Interfaces inside the namespaces:

[root@worker-01 ~]#  ip netns exec ns1 ifconfig
tun1: flags=209<UP,POINTOPOINT,RUNNING,NOARP>  mtu 1480
     inet 172.16.10.10  netmask 255.255.255.255  destination 172.16.20.20
     inet6 fe80::5efe:a0a:a02  prefixlen 64  scopeid 0x20<link>
     tunnel   txqueuelen 1000  (IPIP Tunnel)
     RX packets 3  bytes 252 (252.0 B)
     RX errors 0  dropped 0  overruns 0  frame 0
     TX packets 3  bytes 252 (252.0 B)
     TX errors 3  dropped 0 overruns 0  carrier 0  collisions 0

v1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
     inet 10.10.10.2  netmask 255.255.255.0  broadcast 0.0.0.0
     inet6 fe80::18f7:71ff:fecb:e113  prefixlen 64  scopeid 0x20<link>
     ether 1a:f7:71:cb:e1:13  txqueuelen 1000  (Ethernet)
     RX packets 32088  bytes 5790055 (5.5 MiB)
     RX errors 0  dropped 0  overruns 0  frame 0
     TX packets 138  bytes 12040 (11.7 KiB)
     TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
[root@worker-01 ~]#  ip netns exec ns2 ifconfig
tun2: flags=209<UP,POINTOPOINT,RUNNING,NOARP>  mtu 1480
        inet 172.16.20.20  netmask 255.255.255.255  destination 172.16.10.10
        inet6 fe80::5efe:a0a:1402  prefixlen 64  scopeid 0x20<link>
        tunnel   txqueuelen 1000  (IPIP Tunnel)
        RX packets 3  bytes 252 (252.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3  bytes 252 (252.0 B)
        TX errors 3  dropped 0 overruns 0  carrier 0  collisions 0

v2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.20.2  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::f483:3eff:feb2:8b3f  prefixlen 64  scopeid 0x20<link>
        ether f6:83:3e:b2:8b:3f  txqueuelen 1000  (Ethernet)
        RX packets 31369  bytes 5664578 (5.4 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 46  bytes 3656 (3.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The tunnel interfaces are up in both namespaces.

Tunnel reachability:

[root@worker-01 ~]#  ip netns exec ns1 ping 172.16.20.20
PING 172.16.20.20 (172.16.20.20) 56(84) bytes of data.
64 bytes from 172.16.20.20: icmp_seq=1 ttl=64 time=0.109 ms
64 bytes from 172.16.20.20: icmp_seq=2 ttl=64 time=0.104 ms
64 bytes from 172.16.20.20: icmp_seq=3 ttl=64 time=0.087 ms
^C
--- 172.16.20.20 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.087/0.100/0.109/0.009 ms
[root@worker-01 ~]#  ip netns exec ns2 ping 172.16.10.10
PING 172.16.10.10 (172.16.10.10) 56(84) bytes of data.
64 bytes from 172.16.10.10: icmp_seq=1 ttl=64 time=0.078 ms
64 bytes from 172.16.10.10: icmp_seq=2 ttl=64 time=0.093 ms
64 bytes from 172.16.10.10: icmp_seq=3 ttl=64 time=0.373 ms
^C
--- 172.16.10.10 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.078/0.181/0.373/0.136 ms

Success!

ipip tunnel between namespaces on different hosts

  1. Create the namespace on Host 02

ip netns add ns3

  1. On Host 02, create the veth pair and move one end into the namespace

ip link add v3 type veth peer name v3_p

ip link set v3 netns ns3

  1. On Host 02, configure the host-side end of the veth pair (in the root namespace)

ip addr add 10.10.30.1/24 dev v3_p
ip link set v3_p up

  1. On Host 02, configure the other end of the veth pair inside the namespace

ip netns exec ns3 ip addr add 10.10.30.2/24 dev v3
ip netns exec ns3 ip link set v3 up

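  1. On Host 02, check ip_forward in the same way as on Host 01; it must be 1
  2. On Host 01 and Host 02, add routes so that each host can reach the other host's veth subnet

Host 01
route add -net 10.10.30.0/24 gw 192.168.21.12 dev ens33
Host 02
route add -net 10.10.10.0/24 gw 192.168.21.11 dev ens33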

  1. On Host 01 and Host 02, add a route inside each namespace to the remote namespace's subnet

ip netns exec ns1 route add -net 10.10.30.0 netmask 255.255.255.0 gw 10.10.10.1
ip netns exec ns3 route add -net 10.10.10.0 netmask 255.255.255.0 gw 10.10.30.1

  1. On Host 01 and Host 02, create the tunnel in each namespace and configure its IP addresses
  • ns1
    ip netns exec ns1 ip tunnel add tun2 mode ipip remote 10.10.30.2 local 10.10.10.2
    ip netns exec ns1 ip link set tun2 up
    ip netns exec ns1 ip addr add 172.16.100.10 peer 172.16.30.30 dev tun2
  • ns3
    ip netns exec ns3 ip tunnel add tun1 mode ipip remote 10.10.10.2 local 10.10.30.2
    ip netns exec ns3 ip link set tun1 up
    ip netns exec ns3 ip addr add 172.16.30.30 peer 172.16.100.10 dev tun1
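The two ends of a tunnel must mirror each other exactly: `local` and `remote` swap, and so do the tunnel address and its `peer`. A small hypothetical Python helper can derive both sides from one description, which avoids copy-paste slips such as running a command in the wrong namespace. It only builds the command strings; running them (as root, on the right host) is left to the reader.

```python
def ipip_side(ns: str, dev: str, local: str, remote: str,
              addr: str, peer: str) -> list[str]:
    """Commands for one end of an ipip tunnel that lives in namespace `ns`."""
    run = f"ip netns exec {ns} "
    return [
        run + f"ip tunnel add {dev} mode ipip remote {remote} local {local}",
        run + f"ip link set {dev} up",
        run + f"ip addr add {addr} peer {peer} dev {dev}",
    ]


def ipip_pair(a: tuple[str, str, str, str],
              b: tuple[str, str, str, str]) -> list[list[str]]:
    """Each endpoint is (namespace, device, underlay IP, tunnel IP)."""
    ns_a, dev_a, under_a, over_a = a
    ns_b, dev_b, under_b, over_b = b
    return [
        ipip_side(ns_a, dev_a, under_a, under_b, over_a, over_b),
        ipip_side(ns_b, dev_b, under_b, under_a, over_b, over_a),
    ]


side_ns1, side_ns3 = ipip_pair(("ns1", "tun2", "10.10.10.2", "172.16.100.10"),
                               ("ns3", "tun1", "10.10.30.2", "172.16.30.30"))
print("\n".join(side_ns1 + side_ns3))
```

Fed the ns1/ns3 endpoints from this experiment, it reproduces the six commands listed above.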

Routes and tunnels inside the namespaces:

[root@worker-01 ~]# ip netns exec ns1 route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.10.10.0      0.0.0.0         255.255.255.0   U     0      0        0 v1
10.10.20.0      10.10.10.1      255.255.255.0   UG    0      0        0 v1
10.10.30.0      10.10.10.1      255.255.255.0   UG    0      0        0 v1
172.16.20.20    0.0.0.0         255.255.255.255 UH    0      0        0 tun1
172.16.30.30    0.0.0.0         255.255.255.255 UH    0      0        0 tun2

[root@worker-01 ~]# ip netns exec ns1 ip tunnel
tun1: ip/ip remote 10.10.20.2 local 10.10.10.2 ttl inherit
tun2: ip/ip remote 10.10.30.2 local 10.10.10.2 ttl inherit
tunl0: ip/ip remote any local any ttl inherit nopmtudisc
[root@worker-02 ~]# ip netns exec ns3 route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.10.10.0      10.10.30.1      255.255.255.0   UG    0      0        0 v3
10.10.30.0      0.0.0.0         255.255.255.0   U     0      0        0 v3
172.16.100.10   0.0.0.0         255.255.255.255 UH    0      0        0 tun1

[root@worker-02 ~]# ip netns exec ns3 ip tunnel
tun1: ip/ip remote 10.10.10.2 local 10.10.30.2 ttl inherit
tunl0: ip/ip remote any local any ttl inherit nopmtudisc
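The `route -n` output shows why the tunnel works: ns1 performs two route lookups. A packet to 172.16.30.30 first matches the host route on tun2; tun2 wraps it in an outer IPv4 header addressed to 10.10.30.2, and that outer packet is looked up again and leaves through v1 via 10.10.10.1. The Python sketch below copies ns1's table from the output above and does a plain longest-prefix match; it ignores metrics, policy routing and the kernel's real data structures.

```python
import ipaddress

# ns1's routing table, copied from the `route -n` output above:
# (destination prefix, gateway or None if directly connected, interface)
NS1_ROUTES = [
    ("10.10.10.0/24", None, "v1"),
    ("10.10.20.0/24", "10.10.10.1", "v1"),
    ("10.10.30.0/24", "10.10.10.1", "v1"),
    ("172.16.20.20/32", None, "tun1"),
    ("172.16.30.30/32", None, "tun2"),
]


def lookup(dst: str, routes=NS1_ROUTES):
    """Longest-prefix match; returns (gateway, interface), or None if no route matches."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, gateway, dev in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, gateway, dev)
    return None if best is None else best[1:]


# Stage 1: the inner packet to ns3's tunnel address is routed into tun2.
print(lookup("172.16.30.30"))
# Stage 2: tun2 adds an outer header addressed to 10.10.30.2, which is routed via v1.
print(lookup("10.10.30.2"))
```

Because ns1 has no default route, any destination outside these prefixes (for example the hosts' 192.168.21.0/24 network) gets no match.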

Tunnel reachability:

[root@worker-02 ~]# ip netns exec ns3 ping 172.16.100.10
PING 172.16.100.10 (172.16.100.10) 56(84) bytes of data.
64 bytes from 172.16.100.10: icmp_seq=1 ttl=64 time=0.743 ms
64 bytes from 172.16.100.10: icmp_seq=2 ttl=64 time=0.737 ms
64 bytes from 172.16.100.10: icmp_seq=3 ttl=64 time=0.746 ms
64 bytes from 172.16.100.10: icmp_seq=4 ttl=64 time=1.03 ms
^C
--- 172.16.100.10 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 0.737/0.815/1.035/0.128 ms

Success!

Appendix: ipip packet structure

[Figure: ipip packet structure]
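The Python sketch below builds an ipip packet byte by byte: an outer IPv4 header whose protocol field is 4 (IP-in-IP), followed by the unchanged inner IPv4 packet. The addresses come from the cross-host experiment (ns1 to ns3). The ICMP echo payload, TTL and identification values are arbitrary illustration choices, and the headers carry no options and no fragmentation.

```python
import socket
import struct


def checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: ones' complement of the 16-bit ones' complement sum."""
    if len(data) % 2:
        data += b"\x00"                    # pad odd-length data with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                     # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header(src: str, dst: str, proto: int, payload_len: int, ttl: int = 64) -> bytes:
    """20-byte IPv4 header without options; identification and flags left at 0."""
    hdr = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len,         # version 4 + IHL 5, DSCP/ECN, total length
        0, 0,                              # identification, flags/fragment offset
        ttl, proto, 0,                     # TTL, protocol, checksum placeholder
        socket.inet_aton(src), socket.inet_aton(dst),
    )
    return hdr[:10] + struct.pack("!H", checksum(hdr)) + hdr[12:]


def icmp_echo(ident: int = 1, seq: int = 1, payload: bytes = b"hello") -> bytes:
    """ICMP echo request (type 8, code 0) with its checksum filled in."""
    msg = struct.pack("!BBHHH", 8, 0, 0, ident, seq) + payload
    return msg[:2] + struct.pack("!H", checksum(msg)) + msg[4:]


IPPROTO_ICMP, IPPROTO_IPIP = 1, 4

# Inner packet: ns1's tunnel address to ns3's tunnel address.
icmp = icmp_echo()
inner = ipv4_header("172.16.100.10", "172.16.30.30", IPPROTO_ICMP, len(icmp)) + icmp
# Outer header: tun2's local/remote underlay addresses; protocol 4 marks IP-in-IP.
packet = ipv4_header("10.10.10.2", "10.10.30.2", IPPROTO_IPIP, len(inner)) + inner

print(len(packet), packet.hex())
```

In the result, bytes 0-19 are the outer header (10.10.10.2 to 10.10.30.2, protocol 4), bytes 20-39 are the inner header (172.16.100.10 to 172.16.30.30, protocol 1), and the rest is the ICMP message.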