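Abstract:
The system runs on a private LAN. The network configuration did not get enough attention early on, which led to a number of strange problems. The following is one of them.
1 Check the Node network
1.1 Network configuration on a Node in the VPC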
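The output shows that the NICs are in multicast mode (MultiCast Model, the MULTICAST flag is set), that receiving (RX packets) and transmitting (TX packets) both look normal, and that few packets are dropped (dropped). The txqueuelen value may be set a little low. It can be thought of roughly as the size of a traffic queue, and others have hit packet loss because the value was too low: https://mozillazg.com/2019/06/linux-client-io-timeout-server-lost-drop-packet-tcp-retransmitted-ifconfig-txqueuelen.html. For more on network stack queues, see: https://zhensheng.im/2017/08/11/2997/MIAO_LE_GE_MI.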
[root@node12 ~]# ifconfig
cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.6.1 netmask 255.255.255.0 broadcast 0.0.0.0
inet6 fe80::f839:b1ff:fe92:a27e prefixlen 64 scopeid 0x20<link>
ether fa:39:b1:92:a2:7e txqueuelen 1000 (Ethernet)
RX packets 220929038 bytes 26319025754 (24.5 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 222476494 bytes 188379339618 (175.4 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:0a:42:bd:5f txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.10.203 netmask 255.255.255.0 broadcast 10.0.10.255
inet6 fe80::1827:f41f:a885:e814 prefixlen 64 scopeid 0x20<link>
ether 52:54:00:34:d2:50 txqueuelen 1000 (Ethernet)
RX packets 878395880 bytes 1819091657583 (1.6 TiB)
RX errors 0 dropped 10 overruns 0 frame 0
TX packets 878197799 bytes 584187305684 (544.0 GiB)
TX errors 0 dropped 17 overruns 0 carrier 0 collisions 0
flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet 10.244.6.0 netmask 255.255.255.255 broadcast 0.0.0.0
inet6 fe80::80e8:edff:feac:4290 prefixlen 64 scopeid 0x20<link>
ether 82:e8:ed:ac:42:90 txqueuelen 0 (Ethernet)
RX packets 17209431 bytes 2340765757 (2.1 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 15718881 bytes 3617396244 (3.3 GiB)
TX errors 0 dropped 8 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 630180009 bytes 182960242846 (170.3 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 630180009 bytes 182960242846 (170.3 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
veth0ff2b34e: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1450
inet6 fe80::7460:feff:fe6c:5e94 prefixlen 64 scopeid 0x20<link>
ether 76:60:fe:6c:5e:94 txqueuelen 0 (Ethernet)
RX packets 2755598 bytes 396180446 (377.8 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 2514414 bytes 924700287 (881.8 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
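To judge whether "few packets are dropped" still holds later on, it helps to turn the counters into ratios instead of reading them by eye. The following is a minimal sketch, assuming the node exposes the same counters in /proc/net/dev (the file ifconfig reads on Linux). The function names parse_proc_net_dev and drop_ratio are made up for this example.

```python
def parse_proc_net_dev(text):
    """Parse /proc/net/dev content into {iface: packet and drop counters}."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue
        # Receive columns come first (8 of them), then the transmit columns.
        stats[name.strip()] = {
            "rx_packets": int(fields[1]),
            "rx_dropped": int(fields[3]),
            "tx_packets": int(fields[9]),
            "tx_dropped": int(fields[11]),
        }
    return stats


def drop_ratio(counters, direction):
    """Return dropped/packets for direction 'rx' or 'tx' (0.0 if no traffic)."""
    packets = counters[f"{direction}_packets"]
    dropped = counters[f"{direction}_dropped"]
    return dropped / packets if packets else 0.0


# Usage on a live node (not run here):
#   with open("/proc/net/dev") as f:
#       for iface, c in parse_proc_net_dev(f.read()).items():
#           print(iface, drop_ratio(c, "rx"), drop_ratio(c, "tx"))
```

With the eth0 numbers above (17 TX drops out of 878197799 packets) the ratio is on the order of 1e-8, which matches the reading that drops are negligible.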
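Note: NIC working modes
(1) Broadcast mode (Broad Cast Model): a frame whose destination physical (MAC) address is FF:FF:FF:FF:FF:FF is a broadcast frame. A NIC working in broadcast mode receives broadcast frames.
(2) Multicast mode (MultiCast Model): a frame whose destination physical address is a multicast address is received by all hosts in the group at the same time, while hosts outside the group do not receive it. However, if the NIC is set to multicast mode, it receives all multicast frames whether or not it is a member of the group.
(3) Direct mode (Direct Model): a NIC working in direct mode only receives frames whose destination address is its own MAC address.
(4) Promiscuous mode (Promiscuous Model): a NIC working in promiscuous mode receives every frame that passes through it. Packet capture programs run in this mode.
The default working mode of a NIC combines broadcast mode and direct mode, that is, it only receives broadcast frames and frames addressed to itself. In promiscuous mode, a station's NIC accepts the packets sent by every station on the same network, which makes it possible to monitor and capture network traffic.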
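The number in flags=4163<...> in the ifconfig output is a bit mask of these modes and states. As a small sketch, the flag values below are taken from the Linux header <linux/if.h> (only a subset is listed), and decode_flags is a name made up for this example:

```python
# Interface flag bits as defined in Linux <linux/if.h> (subset).
IFF_FLAGS = [
    (0x1, "UP"),
    (0x2, "BROADCAST"),
    (0x8, "LOOPBACK"),
    (0x10, "POINTOPOINT"),
    (0x40, "RUNNING"),
    (0x80, "NOARP"),
    (0x100, "PROMISC"),
    (0x200, "ALLMULTI"),
    (0x400, "MASTER"),
    (0x800, "SLAVE"),
    (0x1000, "MULTICAST"),
]


def decode_flags(value):
    """Return the names of the known flag bits set in value, lowest bit first."""
    return [name for bit, name in IFF_FLAGS if value & bit]


def is_promiscuous(value):
    """True if the PROMISC bit is set, i.e. the NIC accepts all frames."""
    return bool(value & 0x100)


print(decode_flags(4163))
print(decode_flags(5187))
```

None of the interfaces listed earlier has the PROMISC bit set, so no NIC on that node was capturing all traffic when the output was taken.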
1.2 Ping response time
1.2.1 Get the K8S pod replica information
[root@kvm17-pre ~]# kubectl get pods -n lenovo -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alarm-86c4cf5b48-8xbds 1/1 Running 0 29h 10.244.6.71 kvm17-pre <none> <none>
apigateway-595885b888-9xcd7 1/1 Running 0 11d 10.244.5.102 kvm15-pre <none> <none>
apigateway-595885b888-m2622 1/1 Running 0 11d 10.244.3.176 kvm11-pre <none> <none>
apigateway-595885b888-nbg7v 1/1 Running 0 11d 10.244.7.136 kvm18-pre <none> <none>
ceph-547b7cb5f6-pfh6g 1/1 Running 0 11d 10.244.3.175 kvm11-pre <none> <none>
ceph-547b7cb5f6-xllfn 1/1 Running 0 11d 10.244.6.45 kvm17-pre <none> <none>
device-9978469fb-769n8 1/1 Running 0 29h 10.244.3.190 kvm11-pre <none> <none>
device-9978469fb-hh7k6 1/1 Running 0 29h 10.244.4.187 kvm13-pre <none> <none>
device-9978469fb-q7b7j 1/1 Running 0 29h 10.244.6.72 kvm17-pre <none> <none>
gateway-8476c58bc4-2gtcm 1/1 Running 0 11d 10.244.5.101 kvm15-pre <none> <none>
gateway-8476c58bc4-dlmz7 1/1 Running 0 11d 10.244.6.43 kvm17-pre <none> <none>
gateway-8476c58bc4-vglr4 1/1 Running 0 11d 10.244.3.173 kvm11-pre <none> <none>
......
......
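The pod listing can also be turned into a list of ping targets, one per remote node, so the checks in the next step cover every node pair. This is a rough sketch that assumes the JSON form of the same listing (kubectl get pods -n lenovo -o json) rather than the table above, because the table columns are awkward to split reliably. The helper names pods_by_node and cross_node_targets are hypothetical.

```python
import json
from collections import defaultdict


def pods_by_node(pod_list_json):
    """Group (pod name, pod IP) pairs by node from `kubectl get pods -o json` output."""
    grouped = defaultdict(list)
    for item in json.loads(pod_list_json).get("items", []):
        ip = item.get("status", {}).get("podIP")
        node = item.get("spec", {}).get("nodeName")
        if ip and node:  # pending pods may lack one or both fields
            grouped[node].append((item["metadata"]["name"], ip))
    return dict(grouped)


def cross_node_targets(grouped, local_node):
    """Pick the first pod IP on every node other than local_node."""
    return {
        node: pods[0][1]
        for node, pods in sorted(grouped.items())
        if node != local_node and pods
    }


# Usage on a live cluster (not run here):
#   import subprocess
#   out = subprocess.run(
#       ["kubectl", "get", "pods", "-n", "lenovo", "-o", "json"],
#       check=True, capture_output=True, text=True,
#   ).stdout
#   print(cross_node_targets(pods_by_node(out), "kvm17-pre"))
```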
1.2.2 Check POD-to-POD and NODE-to-POD with the ping command
Essentially no problems were found.
One arbitrary example: NODE to POD
[root@kvm17-pre ~]# ping 10.244.5.102
PING 10.244.5.102 (10.244.5.102) 56(84) bytes of data.
64 bytes from 10.244.5.102: icmp_seq=1 ttl=63 time=0.744 ms
64 bytes from 10.244.5.102: icmp_seq=2 ttl=63 time=0.504 ms
64 bytes from 10.244.5.102: icmp_seq=3 ttl=63 time=0.522 ms
64 bytes from 10.244.5.102: icmp_seq=4 ttl=63 time=0.364 ms
One arbitrary example: POD to POD
[root@kvm17-pre ~]# kubectl exec -it alarm-86c4cf5b48-8xbds -n lenovo /bin/bash
root@alarm-86c4cf5b48-8xbds:/opt/service/lenovo_alarm# ping 10.244.7.145
PING 10.244.7.145 (10.244.7.145) 56(84) bytes of data.
64 bytes from 10.244.7.145: icmp_seq=1 ttl=62 time=1.17 ms
64 bytes from 10.244.7.145: icmp_seq=2 ttl=62 time=0.469 ms
64 bytes from 10.244.7.145: icmp_seq=3 ttl=62 time=0.569 ms
64 bytes from 10.244.7.145: icmp_seq=4 ttl=62 time=0.617 ms
64 bytes from 10.244.7.145: icmp_seq=5 ttl=62 time=0.589 ms
64 bytes from 10.244.7.145: icmp_seq=6 ttl=62 time=0.857 ms
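When many pod pairs have to be checked, reading RTTs line by line gets tedious. Below is a minimal sketch that summarizes ping output like the transcripts above. It assumes the Linux iputils line format shown there ("icmp_seq=N ttl=N time=X ms"), and the function names are made up for this example.

```python
import re
import statistics

# Matches reply lines such as "64 bytes from ...: icmp_seq=1 ttl=63 time=0.744 ms".
REPLY_RE = re.compile(r"icmp_seq=(\d+) .*time=([\d.]+) ms")


def parse_ping(text):
    """Return a list of (sequence number, RTT in ms) for every reply line."""
    return [(int(seq), float(ms)) for seq, ms in REPLY_RE.findall(text)]


def summarize(samples):
    """Return count/min/avg/max RTT, or None if there were no replies."""
    rtts = [ms for _, ms in samples]
    if not rtts:
        return None
    return {
        "count": len(rtts),
        "min": min(rtts),
        "avg": statistics.mean(rtts),
        "max": max(rtts),
    }


def missing_sequences(samples):
    """Return sequence numbers with no reply between the first and last one seen."""
    seen = {seq for seq, _ in samples}
    if not seen:
        return []
    return [n for n in range(min(seen), max(seen) + 1) if n not in seen]
```

For the two transcripts above, every RTT is around or below 1 ms and no sequence number is missing, which is what "no problems" means here.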
1.3 Are TCP packets abnormal
The number of dropped SYNs is small, and it does not grow across repeated checks
[root@kvm17-pre ~]# netstat -s | grep LISTEN
69 SYNs to LISTEN sockets dropped
[root@kvm17-pre ~]# netstat -s | grep LISTEN
69 SYNs to LISTEN sockets dropped
[root@kvm17-pre ~]# netstat -s | grep LISTEN
69 SYNs to LISTEN sockets dropped
[root@kvm17-pre ~]# netstat -s | grep LISTEN
69 SYNs to LISTEN sockets dropped
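What matters in the repeated runs above is that the counter stays at 69, not its absolute value. The same comparison can be scripted. This is a sketch under the assumption that the "SYNs to LISTEN sockets dropped" line is backed by the TcpExt ListenDrops counter in /proc/net/netstat, as it is in the net-tools source as far as I know. The function names are invented for illustration.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat content into {"TcpExt.Name": value, ...}.

    The file comes in line pairs: a header line with counter names and a
    line with the matching values, both starting with the same prefix.
    """
    counters = {}
    lines = text.splitlines()
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, _, names = header.partition(":")
        value_prefix, _, numbers = values.partition(":")
        if prefix != value_prefix:
            continue  # not a matching header/value pair
        for name, number in zip(names.split(), numbers.split()):
            counters[f"{prefix}.{name}"] = int(number)
    return counters


def listen_drop_delta(before, after):
    """Return how much the listen-queue counters grew between two snapshots."""
    keys = ("TcpExt.ListenOverflows", "TcpExt.ListenDrops")
    return {key: after.get(key, 0) - before.get(key, 0) for key in keys}


# Usage on a live node (not run here): read /proc/net/netstat twice,
# a few seconds apart, and pass both parsed snapshots to listen_drop_delta.
```

A delta of zero under load corresponds to the unchanged 69 above. A steadily growing delta would instead point at a full accept queue on some listening socket.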
2 Host NIC working mode and status
2.1 Query the NICs
First use ifconfig -a to list all installed NICs. In this example the result is bond0
[root@shasgm02-mdb ~]# ifconfig
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 1500
inet 10.0.10.42 netmask 255.255.255.0 broadcast 10.0.10.255
inet6 fe80::3a68:ddff:fe10:8a88 prefixlen 64 scopeid 0x20<link>
ether 38:68:dd:10:8a:88 txqueuelen 1000 (Ethernet)
RX packets 108228721 bytes 13224522885 (12.3 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 117889496 bytes 34058879891 (31.7 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
2.2 Check whether the working mode is correct
Any one of the following commands can be used:
mii-tool -v bond0
ethtool bond0
dmesg | grep ' bond0'
The output to check looks like this
[root@shasgm02-mdb ~]# mii-tool -v bond0
bond0: 10 Mbit, half duplex, link ok
product info: vendor 00:01:00, model 0 rev 4
basic mode: 10 Mbit, half duplex
basic status: link ok
capabilities:
advertising:
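Because bond0 is a logical device built on top of physical NICs, it can be worth cross-checking the speed and duplex of each slave as well. The sketch below parses the bonding driver's status file, which on Linux is /proc/net/bonding/bond0. The slave names eth0 and eth1 in the usage comment and the 1000 Mbit threshold are assumptions for illustration, not values taken from this host.

```python
def parse_bonding(text):
    """Return one dict per slave with its name, MII Status, Speed and Duplex."""
    slaves = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        elif current is not None and key in ("MII Status", "Speed", "Duplex"):
            current[key] = value
    return slaves


def suspicious_slaves(slaves, min_mbps=1000):
    """Return names of slaves that are down, not full duplex, or slower than min_mbps."""
    flagged = []
    for slave in slaves:
        parts = slave.get("Speed", "").split()
        mbps = int(parts[0]) if parts and parts[0].isdigit() else None
        healthy = (
            slave.get("MII Status") == "up"
            and slave.get("Duplex") == "full"
            and mbps is not None
            and mbps >= min_mbps
        )
        if not healthy:
            flagged.append(slave["name"])
    return flagged


# Usage on a live host (not run here):
#   with open("/proc/net/bonding/bond0") as f:
#       print(suspicious_slaves(parse_bonding(f.read())))
# A result such as ["eth1"] would mean that slave negotiated a low speed or half duplex.
```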
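3 Summary
On a LAN, a K8S cluster should not see slow or stalled requests caused by insufficient bandwidth. If such a problem does appear, the first thing to check is the router configuration or the NIC configuration. In addition, when running performance tests, it is best to connect directly with a network cable rather than joining the LAN through an AP hotspot.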