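macvlan is a NIC virtualization technique provided by the kernel. It creates multiple virtual interfaces on top of one network interface (not necessarily a physical NIC; virtual NICs such as virtio work too). That interface is called the master or parent interface, and the virtual (child) interfaces reach the outside world through it.

macvlan provides five modes: bridge, vepa, private, passthru, and source.

private mode:
  Child interfaces under the same parent are isolated from one another
  and cannot communicate. Even if an external switch sends the traffic
  back so that the parent receives it again, the frames are dropped.

vepa (Virtual Ethernet Port Aggregator) mode:
  Traffic between child interfaces is sent out to an external switch
  (physical or virtual) that supports 802.1Qbg/VEPA, which forwards it
  back to the parent.
    Note: in short, 802.1Qbg/VEPA support means the switch can hairpin,
    i.e. send a frame back out of the port it was received on.

bridge mode:
  Behaves like a simple Linux bridge, with one advantage over a real
  bridge: the MAC address of every child is already known, so no MAC
  learning is needed. Child interfaces can talk to each other directly.

passthru mode:
  Only a single child interface may be attached to the parent.

source mode:
  Only frames whose source MAC is in a configured list are received.

The figure below, a screenshot found online, shows how data is forwarded in each mode. In every mode, the child interfaces cannot communicate with the parent interface itself.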




[Figure: packet forwarding in each macvlan mode]

Below is the help output of the ip command for creating macvlan interfaces:

root@node2:~# ip link add link ens8 dev macvlan1 type macvlan help
Usage: ... macvlan mode MODE [flag MODE_FLAG] MODE_OPTS

MODE: private | vepa | bridge | passthru | source
MODE_FLAG: null | nopromisc    --> passthru mode only
MODE_OPTS: for mode "source":  --> source mode only
        macaddr { { add | del } <macaddr> | set [ <macaddr> [ <macaddr>  ... ] ] | flush }

Two options deserve attention:

a. nopromisc applies only to passthru mode.
b. The macaddr options apply only to source mode. They maintain the list of allowed source MACs, so that in source mode only frames arriving from outside with one of these source MACs are accepted.

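How to find which child interface belongs to which parent
a. If the child and the parent are in the same namespace, for example both in the root namespace, check the symlinks under /sys/class/net/xxx/ (xxx being the interface name).
Note the two prefixes, lower and upper. In macvlan's layering the parent is the NIC and the children are virtual interfaces stacked on top of it, so the parent sits below and the children above: lower_ points to the parent and upper_ to a child.

# Show the parent of a macvlan child (the parent of macvlan1 is ens8)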
root@node2:~# ls -l /sys/class/net/macvlan1/lower*
lrwxrwxrwx 1 root root 0 Oct 27 13:55 /sys/class/net/macvlan1/lower_ens8 -> ../../../pci0000:00/0000:00:08.0/net/ens8

# Show the macvlan children of a parent (ens8 has two children: macvlan1 and macvlan2)
root@node2:~# ls -l /sys/class/net/ens8/upper*
lrwxrwxrwx 1 root root 0 Oct 27 13:55 /sys/class/net/ens8/upper_macvlan1 -> ../../../../virtual/net/macvlan1
lrwxrwxrwx 1 root root 0 Oct 27 13:56 /sys/class/net/ens8/upper_macvlan2 -> ../../../../virtual/net/macvlan2
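
Method a can be wrapped in a couple of small helpers. This is only a sketch: the names rel_devs, lower_devs and upper_devs, and the SYSFS_NET override (there only so the helpers can be pointed at a test directory), are made up for illustration. It assumes the lower_*/upper_* symlinks look like the listings above.

```shell
#!/bin/sh
# rel_devs PREFIX DEV: print the names linked as PREFIX_<name> in DEV's sysfs dir.
# The body runs in a subshell "( ... )" so its variables stay local.
rel_devs() (
    prefix=$1
    dev=$2
    root=${SYSFS_NET:-/sys/class/net}   # hypothetical override, defaults to real sysfs
    for p in "$root/$dev/${prefix}_"*; do
        # an unmatched glob stays literal, so skip paths that do not exist
        [ -e "$p" ] || [ -L "$p" ] || continue
        name=${p##*/}                   # strip the directory part
        echo "${name#${prefix}_}"       # strip the "lower_" / "upper_" prefix
    done
)

# lower_devs DEV: parent(s) of DEV; upper_devs DEV: children stacked on DEV
lower_devs() { rel_devs lower "$1"; }
upper_devs() { rel_devs upper "$1"; }
```

Usage would be `lower_devs macvlan1` or `upper_devs ens8`, run in the namespace that holds both interfaces.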

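b. If the child and the parent are in different namespaces, for example when the child has been moved into another namespace, method a no longer works. Instead, look at the number after the @ in the interface name. In macvlan1@if3 and macvlan2@if3 below, if3 is the interface index of the parent in the root namespace, i.e. the index of ens8.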

root@node2:~# ip netns exec test1 ip a
...
8308: macvlan1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ce:dc:2c:c2:e3:ca brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 1.1.1.3/24 scope global macvlan1
       valid_lft forever preferred_lft forever
    inet6 fe80::ccdc:2cff:fec2:e3ca/64 scope link
       valid_lft forever preferred_lft forever

root@node2:~# ip netns exec test2 ip a
...
8309: macvlan2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 22:c1:18:2a:68:25 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 1.1.1.4/24 scope global macvlan2
       valid_lft forever preferred_lft forever
    inet6 fe80::20c1:18ff:fe2a:6825/64 scope link
       valid_lft forever preferred_lft forever

root@node2:~# ip a
...
3: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:55:4e:f8 brd ff:ff:ff:ff:ff:ff
    inet 1.1.1.2/24 scope global ens8
       valid_lft forever preferred_lft forever
    inet6 fe80::5054:ff:fe55:4ef8/64 scope link
       valid_lft forever preferred_lft forever
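
The @ifN lookup in method b can be scripted roughly as follows. Both helper names (parent_ifindex, ifname_by_index) are invented for this sketch. They only parse text in the format shown above, and they assume the parent lives in the namespace the second command is run in (the root namespace here).

```shell
#!/bin/sh
# parent_ifindex: read `ip a` / `ip link` output on stdin and print
# "<child> <N>" for every interface shown as <child>@ifN.
parent_ifindex() {
    sed -n 's/^[0-9][0-9]*: \([^@:]*\)@if\([0-9][0-9]*\):.*/\1 \2/p'
}

# ifname_by_index IDX: read `ip -o link` output on stdin and print the
# name of the interface whose index is IDX.
ifname_by_index() {
    awk -F': ' -v idx="$1" '$1 == idx { sub(/@.*/, "", $2); print $2 }'
}
```

For example, `ip netns exec test1 ip -o link | parent_ifindex` would give the index, and `ip -o link | ifname_by_index 3` in the root namespace would turn it back into a name.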

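c. Whether or not the parent and child are in the same namespace, the parent's fdb table can also tell you: the MAC addresses of the macvlan children are normally added to the parent interface.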

root@node2:~# bridge fdb show dev ens8
ce:dc:2c:c2:e3:ca self permanent  ---> MAC of macvlan1
22:c1:18:2a:68:25 self permanent  ---> MAC of macvlan2
33:33:00:00:00:01 self permanent
01:00:5e:00:00:01 self permanent
33:33:ff:55:4e:f8 self permanent
33:33:ff:c2:e3:ca self permanent
33:33:ff:2a:68:25 self permanent
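
To pick the child MACs out of that listing, the multicast entries (33:33:..., 01:00:5e:...) can be filtered away. The helper below is a sketch with an invented name (fdb_unicast_macs). It assumes, as in the output above, that the unicast entries left over on the parent are the macvlan children, which may not hold on every setup.

```shell
#!/bin/sh
# fdb_unicast_macs: read `bridge fdb show dev <parent>` output on stdin and
# print only unicast MACs. A MAC is multicast when the lowest bit of its first
# octet is set, i.e. when the second hex digit is odd.
fdb_unicast_macs() {
    awk 'NF {
        c = tolower(substr($1, 2, 1))        # second hex digit of the first octet
        if (index("13579bdf", c) == 0)       # even digit => unicast
            print $1
    }'
}
```

Usage would be `bridge fdb show dev ens8 | fdb_unicast_macs`.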

About nopromisc

# Without the nopromisc flag, the parent ens8 is put into promiscuous mode by default (promiscuity is 1)
ip link add link ens8 dev macvlan1 type macvlan mode passthru
ip link set dev macvlan1 up
root@node2:~# ip -d link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:bd:2b:7d brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 60 maxmtu 4096 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
3: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:55:4e:f8 brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 60 maxmtu 4096 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
25: macvlan1@ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:55:4e:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 4096
    macvlan mode passthru addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

# With the nopromisc flag, the parent ens8 is not put into promiscuous mode (promiscuity is 0)
ip link add link ens8 dev macvlan1 type macvlan mode passthru nopromisc
ip link set dev macvlan1 up
root@node2:~# ip -d link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 promiscuity 0 minmtu 0 maxmtu 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:bd:2b:7d brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 60 maxmtu 4096 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
3: ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:55:4e:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 60 maxmtu 4096 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
26: macvlan1@ens8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:55:4e:f8 brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 68 maxmtu 4096
    macvlan mode passthru nopromisc addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
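
Rather than scanning the whole `ip -d link` listing by eye, the promiscuity counter of a single interface can be extracted. A minimal sketch, assuming the output format shown above; the helper name promiscuity_of is invented here.

```shell
#!/bin/sh
# promiscuity_of: read `ip -d link show dev <if>` output on stdin and print
# the number that follows the word "promiscuity" (first match only).
promiscuity_of() {
    sed -n 's/.*promiscuity \([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

For example, `ip -d link show dev ens8 | promiscuity_of` would print 1 in the first case above and 0 in the second.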

Practice

Create a VM on the host with two NICs, ens3 and ens8, which correspond to vnet0 and vnet1 on the host. vnet0 is attached to the default bridge virbr0, and vnet1 to a newly created bridge br1. All of the macvlan and namespace operations below are performed inside the VM.

a. bridge

# Bring up the parent interface and assign its IP
ip link set dev ens8 up
ip address add dev ens8 1.1.1.2/24

# Create two macvlan child interfaces in bridge mode
ip link add link ens8 dev macvlan1 type macvlan mode bridge
ip link add link ens8 dev macvlan2 type macvlan mode bridge

# Create two namespaces
ip netns add test1
ip netns add test2

# Move each macvlan child into its own namespace
ip link set dev macvlan1 netns test1
ip link set dev macvlan2 netns test2

# Bring up the children inside the namespaces and assign their IPs
ip netns exec test1 ip link set dev lo up
ip netns exec test1 ip link set dev macvlan1 up
ip netns exec test1 ip address add dev macvlan1 1.1.1.3/24

ip netns exec test2 ip link set dev lo up
ip netns exec test2 ip link set dev macvlan2 up
ip netns exec test2 ip address add dev macvlan2 1.1.1.4/24
root@node2:~# ip netns exec test1 ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state UP group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
92: macvlan1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ae:f3:a6:e4:72:5f brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 1.1.1.3/24 scope global macvlan1
       valid_lft forever preferred_lft forever
    inet6 fe80::acf3:a6ff:fee4:725f/64 scope link
       valid_lft forever preferred_lft forever

root@node2:~# ip netns exec test2 ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state UP group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
93: macvlan2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ba:11:1e:65:b6:89 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 1.1.1.4/24 scope global macvlan2
       valid_lft forever preferred_lft forever
    inet6 fe80::b811:1eff:fe65:b689/64 scope link
       valid_lft forever preferred_lft forever

# Ping the parent interface: fails
root@node2:~# ip netns exec test2 ping 1.1.1.2
PING 1.1.1.2 (1.1.1.2) 56(84) bytes of data.
^C
--- 1.1.1.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

# test2 pings test1: succeeds
root@node2:~# ip netns exec test2 ping 1.1.1.3
PING 1.1.1.3 (1.1.1.3) 56(84) bytes of data.
64 bytes from 1.1.1.3: icmp_seq=1 ttl=64 time=0.450 ms
^C
--- 1.1.1.3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.450/0.450/0.450/0.000 ms
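
The bridge, private and vepa experiments use the same steps and differ only in the mode keyword, so they can be wrapped in one helper. The following is a sketch, not part of the original walkthrough: run, setup_macvlan_pair, teardown_macvlan_pair and the DRY_RUN switch are names invented here, while the interface, namespace and address values are the ones used in this section. The parent interface is assumed to be up and configured already.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN=1 (hypothetical switch)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_macvlan_pair MODE [PARENT]
# Creates macvlan1/macvlan2 on PARENT (default ens8) in the given mode, moves
# them into namespaces test1/test2 and assigns 1.1.1.3/24 and 1.1.1.4/24.
setup_macvlan_pair() (
    mode=$1
    parent=${2:-ens8}
    for i in 1 2; do
        run ip link add link "$parent" dev "macvlan$i" type macvlan mode "$mode"
        run ip netns add "test$i"
        run ip link set dev "macvlan$i" netns "test$i"
        run ip netns exec "test$i" ip link set dev lo up
        run ip netns exec "test$i" ip link set dev "macvlan$i" up
        run ip netns exec "test$i" ip address add dev "macvlan$i" "1.1.1.$((i + 2))/24"
    done
)

# teardown_macvlan_pair: remove the namespaces before trying the next mode.
# Deleting a namespace should also destroy the macvlan device moved into it.
teardown_macvlan_pair() (
    for i in 1 2; do
        run ip netns del "test$i"
    done
)
```

A dry run such as `DRY_RUN=1; setup_macvlan_pair private` prints the commands without touching the system, which makes it easy to compare them with the manual steps.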

b. private

# Bring up the parent interface and assign its IP
ip link set dev ens8 up
ip address add dev ens8 1.1.1.2/24

# Create two macvlan child interfaces in private mode
ip link add link ens8 dev macvlan1 type macvlan mode private
ip link add link ens8 dev macvlan2 type macvlan mode private

# Create two namespaces
ip netns add test1
ip netns add test2

# Move each macvlan child into its own namespace
ip link set dev macvlan1 netns test1
ip link set dev macvlan2 netns test2

# Bring up the children inside the namespaces and assign their IPs
ip netns exec test1 ip link set dev lo up
ip netns exec test1 ip link set dev macvlan1 up
ip netns exec test1 ip address add dev macvlan1 1.1.1.3/24

ip netns exec test2 ip link set dev lo up
ip netns exec test2 ip link set dev macvlan2 up
ip netns exec test2 ip address add dev macvlan2 1.1.1.4/24
# Ping the parent interface: fails
root@node2:~# ip netns exec test2 ping 1.1.1.2
PING 1.1.1.2 (1.1.1.2) 56(84) bytes of data.
^C
--- 1.1.1.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

# test2 pings test1: fails
root@node2:~# ip netns exec test2 ping 1.1.1.3
PING 1.1.1.3 (1.1.1.3) 56(84) bytes of data.
^C
--- 1.1.1.3 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1001ms

c. vepa

# Bring up the parent interface and assign its IP
ip link set dev ens8 up
ip address add dev ens8 1.1.1.2/24

# Create two macvlan child interfaces in vepa mode
ip link add link ens8 dev macvlan1 type macvlan mode vepa
ip link add link ens8 dev macvlan2 type macvlan mode vepa

# Create two namespaces
ip netns add test1
ip netns add test2

# Move each macvlan child into its own namespace
ip link set dev macvlan1 netns test1
ip link set dev macvlan2 netns test2

# Bring up the children inside the namespaces and assign their IPs
ip netns exec test1 ip link set dev lo up
ip netns exec test1 ip link set dev macvlan1 up
ip netns exec test1 ip address add dev macvlan1 1.1.1.3/24

ip netns exec test2 ip link set dev lo up
ip netns exec test2 ip link set dev macvlan2 up
ip netns exec test2 ip address add dev macvlan2 1.1.1.4/24
# Ping the parent interface: fails
root@node2:~# ip netns exec test2 ping 1.1.1.2
PING 1.1.1.2 (1.1.1.2) 56(84) bytes of data.
^C
--- 1.1.1.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

# test2 pings test1: fails (presumably because hairpin is not enabled on the host bridge port vnet1)
root@node2:~# ip netns exec test2 ping 1.1.1.3
PING 1.1.1.3 (1.1.1.3) 56(84) bytes of data.
^C
--- 1.1.1.3 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
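
In vepa mode the traffic between children has to be turned around by the external switch, which in this setup is the host bridge br1 with the VM attached on port vnet1. If hairpin is off on that port, which is the usual default, enabling it on the host should let the two children reach each other. This was not tried in the walkthrough above, so treat it as a sketch. The helper names and the DRY_RUN switch are invented for illustration.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN=1 (hypothetical switch)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# enable_hairpin PORT: allow the bridge to send a frame back out of the
# bridge port it arrived on (run on the host, not inside the VM)
enable_hairpin() {
    run bridge link set dev "$1" hairpin on
}

# show_port PORT: print the port's detailed settings, including hairpin on/off
show_port() {
    run bridge -d link show dev "$1"
}
```

On the host this would be `enable_hairpin vnet1`, followed by repeating the ping from test2 to 1.1.1.3.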
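
d. passthru

# Bring up the parent interface and assign its IP
ip link set dev ens8 up
ip address add dev ens8 1.1.1.2/24

ip link add link ens8 dev macvlan1 type macvlan mode passthru
# Only one passthru child can be added; adding another one fails (a child in any other mode fails too)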
ip link add link ens8 dev macvlan2 type macvlan mode passthru
    RTNETLINK answers: File exists
    
ip netns add test1
ip link set dev macvlan1 netns test1

ip netns exec test1 ip link set dev lo up
ip netns exec test1 ip link set dev macvlan1 up
ip netns exec test1 ip address add dev macvlan1 1.1.1.3/24

# Ping the parent interface: fails
root@node2:~# ip netns exec test1 ping 1.1.1.2
PING 1.1.1.2 (1.1.1.2) 56(84) bytes of data.
^C
--- 1.1.1.2 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
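
e. source

# br1 lives on the host; vnet1, the host-side counterpart of the VM's ens8, is attached to br1.
# Assign IP 1.1.1.9 to br1.
# Note: the address output below shows MAC 5e:88:02:89:d5:51, so it appears to have been
# captured after the MAC change at the end of this section. At this point br1's MAC is
# 5e:88:02:89:d5:50, which matches the bridge id in the brctl output.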
root@ubuntu:~# ip address show dev br1
25: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 5e:88:02:89:d5:51 brd ff:ff:ff:ff:ff:ff
    inet 1.1.1.9/24 brd 1.1.1.255 scope global br1
       valid_lft forever preferred_lft forever
    inet6 fe80::5c88:2ff:fe89:d551/64 scope link
       valid_lft forever preferred_lft forever
       
root@ubuntu:~# brctl show
bridge name     bridge id               STP enabled     interfaces
br1             8000.5e880289d550       no              vnet1  ---> vnet1 corresponds to ens8 inside the VM

# Run the following commands inside the VM
# Create macvlan1 in source mode, allowing br1's MAC address as source
ip link add link ens8 dev macvlan1 type macvlan mode source macaddr add 5e:88:02:89:d5:50
ip link set dev macvlan1 up
ip address add dev macvlan1 1.1.1.4/24

# Pinging br1 from inside the VM works
root@node2:~# ping 1.1.1.9
PING 1.1.1.9 (1.1.1.9) 56(84) bytes of data.
64 bytes from 1.1.1.9: icmp_seq=1 ttl=64 time=1.45 ms
64 bytes from 1.1.1.9: icmp_seq=2 ttl=64 time=0.300 ms
64 bytes from 1.1.1.9: icmp_seq=3 ttl=64 time=0.339 ms
^C
--- 1.1.1.9 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2005ms
rtt min/avg/max/mdev = 0.300/0.696/1.450/0.533 ms

# On the host, change br1's MAC address
root@ubuntu:~# ip link set dev br1 address 5e:88:02:89:d5:51

# Pinging again from inside the VM now fails
root@node2:~# ping 1.1.1.9
PING 1.1.1.9 (1.1.1.9) 56(84) bytes of data.
^C
--- 1.1.1.9 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1028ms
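
The allowed-source list can also be changed after the interface exists, for example to admit br1's new MAC. The sketch below follows the MODE_OPTS syntax from the help output (macaddr add | del | flush); the exact `ip link set ... type macvlan` form is an assumption that should be checked against your iproute2 version. The helper names and the DRY_RUN switch are invented here.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only print it when DRY_RUN=1 (hypothetical switch)
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# source_mac_add DEV MAC: allow frames with source MAC on a source-mode macvlan
source_mac_add() {
    run ip link set dev "$1" type macvlan macaddr add "$2"
}

# source_mac_del DEV MAC: stop accepting frames with source MAC
source_mac_del() {
    run ip link set dev "$1" type macvlan macaddr del "$2"
}

# source_mac_flush DEV: empty the allowed-source list
source_mac_flush() {
    run ip link set dev "$1" type macvlan macaddr flush
}
```

In the experiment above, `source_mac_add macvlan1 5e:88:02:89:d5:51` inside the VM would be expected to make the ping to 1.1.1.9 work again.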
