Summary:

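The system runs in a private LAN. Because network configuration was not given enough attention early on, many strange problems appeared. The following is one of them.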

1 Check the node network

1.1 Network configuration of a node in the VPC

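The flags show that the NICs have multicast enabled (MULTICAST). Received (RX packets) and transmitted (TX packets) traffic both look normal, and few packets are dropped (dropped). The txqueuelen value may be set a little low; it can be loosely understood as the length of the transmit queue, and others have reported packet loss caused by a value that was too low: https://mozillazg.com/2019/06/linux-client-io-timeout-server-lost-drop-packet-tcp-retransmitted-ifconfig-txqueuelen.html. For more on queues in the network stack, see: https://zhensheng.im/2017/08/11/2997/MIAO_LE_GE_MI.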

[root@node12 ~]# ifconfig

cni0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.6.1  netmask 255.255.255.0  broadcast 0.0.0.0
        inet6 fe80::f839:b1ff:fe92:a27e  prefixlen 64  scopeid 0x20<link>
        ether fa:39:b1:92:a2:7e  txqueuelen 1000  (Ethernet)
        RX packets 220929038  bytes 26319025754 (24.5 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 222476494  bytes 188379339618 (175.4 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 172.17.255.255
        ether 02:42:0a:42:bd:5f  txqueuelen 0  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.10.203  netmask 255.255.255.0  broadcast 10.0.10.255
        inet6 fe80::1827:f41f:a885:e814  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:34:d2:50  txqueuelen 1000  (Ethernet)
        RX packets 878395880  bytes 1819091657583 (1.6 TiB)
        RX errors 0  dropped 10  overruns 0  frame 0
        TX packets 878197799  bytes 584187305684 (544.0 GiB)
        TX errors 0  dropped 17 overruns 0  carrier 0  collisions 0

flannel.1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet 10.244.6.0  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::80e8:edff:feac:4290  prefixlen 64  scopeid 0x20<link>
        ether 82:e8:ed:ac:42:90  txqueuelen 0  (Ethernet)
        RX packets 17209431  bytes 2340765757 (2.1 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 15718881  bytes 3617396244 (3.3 GiB)
        TX errors 0  dropped 8 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 630180009  bytes 182960242846 (170.3 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 630180009  bytes 182960242846 (170.3 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth0ff2b34e: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1450
        inet6 fe80::7460:feff:fe6c:5e94  prefixlen 64  scopeid 0x20<link>
        ether 76:60:fe:6c:5e:94  txqueuelen 0  (Ethernet)
        RX packets 2755598  bytes 396180446 (377.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2514414  bytes 924700287 (881.8 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
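
To put "few packets are dropped" into numbers, the counters can be pulled out of the ifconfig text and turned into a drop ratio per interface. The following is only a sketch: it assumes the net-tools output layout shown above, and the names parse_ifconfig and drop_ratio are made up for this illustration.

```python
import re

# Sample copied from the eth0 stanza above; in practice the text would come
# from running ifconfig on the node.
SAMPLE = """\
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether 52:54:00:34:d2:50  txqueuelen 1000  (Ethernet)
        RX packets 878395880  bytes 1819091657583 (1.6 TiB)
        RX errors 0  dropped 10  overruns 0  frame 0
        TX packets 878197799  bytes 584187305684 (544.0 GiB)
        TX errors 0  dropped 17 overruns 0  carrier 0  collisions 0
"""


def parse_ifconfig(text):
    """Return {interface: counters} parsed from net-tools style ifconfig output."""
    stats = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(\S+): flags=", line)
        if header:
            # A new interface stanza starts at column 0.
            current = stats.setdefault(header.group(1), {})
            continue
        if current is None:
            continue
        m = re.search(r"txqueuelen (\d+)", line)
        if m:
            current["txqueuelen"] = int(m.group(1))
        m = re.search(r"(RX|TX) packets (\d+)", line)
        if m:
            current[m.group(1).lower() + "_packets"] = int(m.group(2))
        m = re.search(r"(RX|TX) errors \d+\s+dropped (\d+)", line)
        if m:
            current[m.group(1).lower() + "_dropped"] = int(m.group(2))
    return stats


def drop_ratio(counters, direction):
    """Dropped packets divided by total packets for 'rx' or 'tx'."""
    packets = counters.get(direction + "_packets", 0)
    dropped = counters.get(direction + "_dropped", 0)
    return dropped / packets if packets else 0.0


if __name__ == "__main__":
    for name, c in parse_ifconfig(SAMPLE).items():
        print(name, c.get("txqueuelen"), drop_ratio(c, "rx"), drop_ratio(c, "tx"))
```

With the eth0 counters above, both ratios are far below one in a million, which matches the reading that drops are not the problem here.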
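
Note: NIC working modes
(1) Broadcast mode (Broadcast Mode): a frame whose destination physical (MAC) address is FF:FF:FF:FF:FF:FF is a broadcast frame, and a NIC working in broadcast mode receives broadcast frames.
(2) Multicast mode (Multicast Mode): a frame whose destination physical address is a multicast address can be received simultaneously by the other hosts in the group, while hosts outside the group do not receive it. However, if the NIC is set to multicast mode, it can receive all multicast frames, whether or not it is a member of the group.
(3) Direct mode (Direct Mode): a NIC working in direct mode only receives frames whose destination address is its own MAC address.
(4) Promiscuous mode (Promiscuous Mode): a NIC working in promiscuous mode receives all frames that pass through it; packet capture programs run in this mode. The default working mode of a NIC combines broadcast mode and direct mode, i.e. it only receives broadcast frames and frames addressed to itself. In promiscuous mode, a station's NIC accepts the packets sent by every station on the same network, which makes it possible to monitor and capture network traffic.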
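
The number after flags= in the ifconfig output is a bit mask, and the names in angle brackets are its decoded form. A small sketch of that decoding follows, using the bit values from the Linux header linux/if.h; the helper names decode_flags and is_promiscuous are invented for this example. On Linux, PROMISC corresponds to promiscuous mode and ALLMULTI to receiving all multicast frames.

```python
# Interface flag bits as defined in the Linux header <linux/if.h>.
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x4: "DEBUG",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x20: "NOTRAILERS",
    0x40: "RUNNING",
    0x80: "NOARP",
    0x100: "PROMISC",
    0x200: "ALLMULTI",
    0x400: "MASTER",
    0x800: "SLAVE",
    0x1000: "MULTICAST",
}


def decode_flags(value):
    """Return the names of all known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


def is_promiscuous(value):
    """True if the PROMISC bit (0x100) is set."""
    return bool(value & 0x100)


if __name__ == "__main__":
    # 4163 is the value shown for eth0 and cni0, 73 the value shown for lo.
    for flags in (4163, 73):
        print(flags, decode_flags(flags), is_promiscuous(flags))
```

None of the interfaces listed above has PROMISC set, so no capture tool was holding them in promiscuous mode when the output was taken.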

1.2 Ping response time

1.2.1 Get the K8S pod replica information

[root@kvm17-pre ~]# kubectl get pods -n lenovo -o wide
NAME                            READY   STATUS    RESTARTS   AGE    IP             NODE        NOMINATED NODE   READINESS GATES
alarm-86c4cf5b48-8xbds          1/1     Running   0          29h    10.244.6.71    kvm17-pre   <none>           <none>
apigateway-595885b888-9xcd7     1/1     Running   0          11d    10.244.5.102   kvm15-pre   <none>           <none>
apigateway-595885b888-m2622     1/1     Running   0          11d    10.244.3.176   kvm11-pre   <none>           <none>
apigateway-595885b888-nbg7v     1/1     Running   0          11d    10.244.7.136   kvm18-pre   <none>           <none>
ceph-547b7cb5f6-pfh6g           1/1     Running   0          11d    10.244.3.175   kvm11-pre   <none>           <none>
ceph-547b7cb5f6-xllfn           1/1     Running   0          11d    10.244.6.45    kvm17-pre   <none>           <none>
device-9978469fb-769n8          1/1     Running   0          29h    10.244.3.190   kvm11-pre   <none>           <none>
device-9978469fb-hh7k6          1/1     Running   0          29h    10.244.4.187   kvm13-pre   <none>           <none>
device-9978469fb-q7b7j          1/1     Running   0          29h    10.244.6.72    kvm17-pre   <none>           <none>
gateway-8476c58bc4-2gtcm        1/1     Running   0          11d    10.244.5.101   kvm15-pre   <none>           <none>
gateway-8476c58bc4-dlmz7        1/1     Running   0          11d    10.244.6.43    kvm17-pre   <none>           <none>
gateway-8476c58bc4-vglr4        1/1     Running   0          11d    10.244.3.173   kvm11-pre   <none>           <none>
......
......
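
To pick ping targets that actually cross nodes, the table above can be grouped by the NODE column. This is a rough sketch that assumes the whitespace-separated "-o wide" layout shown here, with a plain number in the RESTARTS column; pods_by_node and cross_node_targets are hypothetical helper names.

```python
from collections import defaultdict

# A few rows copied from the listing above.
SAMPLE = """\
NAME                            READY   STATUS    RESTARTS   AGE    IP             NODE        NOMINATED NODE   READINESS GATES
alarm-86c4cf5b48-8xbds          1/1     Running   0          29h    10.244.6.71    kvm17-pre   <none>           <none>
apigateway-595885b888-9xcd7     1/1     Running   0          11d    10.244.5.102   kvm15-pre   <none>           <none>
apigateway-595885b888-m2622     1/1     Running   0          11d    10.244.3.176   kvm11-pre   <none>           <none>
"""


def pods_by_node(text):
    """Group (pod name, pod IP) pairs by node name."""
    lines = [line for line in text.splitlines() if line.strip()]
    nodes = defaultdict(list)
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7:
            continue  # not a pod row, e.g. an elision marker
        name, ip, node = fields[0], fields[5], fields[6]
        nodes[node].append((name, ip))
    return dict(nodes)


def cross_node_targets(nodes, local_node):
    """Pod IPs that live on any node other than local_node."""
    return [ip for node, pods in nodes.items()
            if node != local_node
            for _, ip in pods]


if __name__ == "__main__":
    print(cross_node_targets(pods_by_node(SAMPLE), "kvm17-pre"))
```

Pinging a pod on the same node only exercises the local bridge, so filtering out the local node keeps the test on the overlay path between nodes.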

1.2.2 Check pod-to-pod and node-to-pod connectivity with ping

Essentially no problems were found.

One example, node to pod:

[root@kvm17-pre ~]# ping 10.244.5.102
PING 10.244.5.102 (10.244.5.102) 56(84) bytes of data.
64 bytes from 10.244.5.102: icmp_seq=1 ttl=63 time=0.744 ms
64 bytes from 10.244.5.102: icmp_seq=2 ttl=63 time=0.504 ms
64 bytes from 10.244.5.102: icmp_seq=3 ttl=63 time=0.522 ms
64 bytes from 10.244.5.102: icmp_seq=4 ttl=63 time=0.364 ms

One example, pod to pod:

[root@kvm17-pre ~]# kubectl exec -it alarm-86c4cf5b48-8xbds   -n lenovo /bin/bash
root@alarm-86c4cf5b48-8xbds:/opt/service/lenovo_alarm# ping 10.244.7.145
PING 10.244.7.145 (10.244.7.145) 56(84) bytes of data.
64 bytes from 10.244.7.145: icmp_seq=1 ttl=62 time=1.17 ms
64 bytes from 10.244.7.145: icmp_seq=2 ttl=62 time=0.469 ms
64 bytes from 10.244.7.145: icmp_seq=3 ttl=62 time=0.569 ms
64 bytes from 10.244.7.145: icmp_seq=4 ttl=62 time=0.617 ms
64 bytes from 10.244.7.145: icmp_seq=5 ttl=62 time=0.589 ms
64 bytes from 10.244.7.145: icmp_seq=6 ttl=62 time=0.857 ms
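
When many pod pairs have to be checked, reading each ping transcript by eye gets tedious. The sketch below extracts the round-trip times from ping output and summarizes them. The 5 ms threshold for a "slow" reply is an arbitrary assumption for a LAN, and the function names are made up for this example.

```python
import re

# Node-to-pod sample copied from the transcript above.
SAMPLE = """\
PING 10.244.5.102 (10.244.5.102) 56(84) bytes of data.
64 bytes from 10.244.5.102: icmp_seq=1 ttl=63 time=0.744 ms
64 bytes from 10.244.5.102: icmp_seq=2 ttl=63 time=0.504 ms
64 bytes from 10.244.5.102: icmp_seq=3 ttl=63 time=0.522 ms
64 bytes from 10.244.5.102: icmp_seq=4 ttl=63 time=0.364 ms
"""


def rtt_values(text):
    """Return all round-trip times (in ms) found in ping output."""
    return [float(v) for v in re.findall(r"time=(\d+(?:\.\d+)?) ms", text)]


def summarize(values, threshold_ms=5.0):
    """Summarize RTTs; threshold_ms is an assumed limit, not a standard."""
    if not values:
        return None  # no replies at all
    return {
        "count": len(values),
        "min": min(values),
        "avg": sum(values) / len(values),
        "max": max(values),
        "slow": [v for v in values if v > threshold_ms],
    }


if __name__ == "__main__":
    print(summarize(rtt_values(SAMPLE)))
```

For the sample, every reply is under 1 ms and the slow list is empty, consistent with the conclusion that ping shows nothing wrong.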

1.3 Check for abnormal TCP packets

The number of dropped SYNs is small, and the counter does not grow across repeated checks.

[root@kvm17-pre ~]# netstat -s | grep LISTEN
    69 SYNs to LISTEN sockets dropped
[root@kvm17-pre ~]# netstat -s | grep LISTEN
    69 SYNs to LISTEN sockets dropped
[root@kvm17-pre ~]# netstat -s | grep LISTEN
    69 SYNs to LISTEN sockets dropped
[root@kvm17-pre ~]# netstat -s | grep LISTEN
    69 SYNs to LISTEN sockets dropped
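
The counter printed by netstat -s is cumulative since boot, so what matters is whether it increases between samples, not its absolute value. A minimal sketch of that comparison follows; syn_drops and is_growing are names chosen for this example.

```python
import re

# Four samples, as in the transcript above.
SAMPLES = ["    69 SYNs to LISTEN sockets dropped"] * 4


def syn_drops(line):
    """Extract the dropped-SYN counter from one line of netstat -s output."""
    m = re.search(r"(\d+) SYNs to LISTEN sockets dropped", line)
    return int(m.group(1)) if m else 0


def is_growing(counts):
    """True if any sample is larger than the one taken before it."""
    return any(later > earlier for earlier, later in zip(counts, counts[1:]))


if __name__ == "__main__":
    counts = [syn_drops(line) for line in SAMPLES]
    print(counts, is_growing(counts))
```

A flat series such as 69, 69, 69, 69 suggests the listen queues were not overflowing while the slow requests were being observed.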

2 Host NIC working mode and status

2.1 Query the NICs

First, use ifconfig -a to list all installed NICs. In this example the result is bond0.

[root@shasgm02-mdb ~]# ifconfig
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST>  mtu 1500
        inet 10.0.10.42  netmask 255.255.255.0  broadcast 10.0.10.255
        inet6 fe80::3a68:ddff:fe10:8a88  prefixlen 64  scopeid 0x20<link>
        ether 38:68:dd:10:8a:88  txqueuelen 1000  (Ethernet)
        RX packets 108228721  bytes 13224522885 (12.3 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 117889496  bytes 34058879891 (31.7 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

2.2 Check whether the working mode is correct

Any one of the following commands can be used:

mii-tool -v bond0
ethtool bond0
dmesg | grep ' bond0'

The output is as follows. In this case bond0 is reported as 10 Mbit, half duplex:

[root@shasgm02-mdb ~]# mii-tool -v  bond0
bond0: 10 Mbit, half duplex, link ok
  product info: vendor 00:01:00, model 0 rev 4
  basic mode:   10 Mbit, half duplex
  basic status: link ok
  capabilities:
  advertising:
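
A check like this can be automated by parsing the first line of the mii-tool output and comparing it with what the link is expected to negotiate. The sketch below only handles the "N Mbit, half/full duplex" form shown above. The 1000 Mbit minimum is an assumption about the intended link speed, and parse_link and is_degraded are hypothetical names.

```python
import re

# First line of the mii-tool output shown above.
SAMPLE = "bond0: 10 Mbit, half duplex, link ok"


def parse_link(line):
    """Parse '<iface>: <N> Mbit, <half|full> duplex, <status>' or return None."""
    m = re.match(r"^(\S+): (\d+) Mbit, (half|full) duplex, (.+)$", line.strip())
    if not m:
        return None  # other mii-tool output forms are not handled here
    return {
        "iface": m.group(1),
        "speed_mbit": int(m.group(2)),
        "duplex": m.group(3),
        "status": m.group(4),
    }


def is_degraded(link, min_speed_mbit=1000):
    """True if the link is half duplex or slower than the assumed minimum."""
    return link["duplex"] != "full" or link["speed_mbit"] < min_speed_mbit


if __name__ == "__main__":
    link = parse_link(SAMPLE)
    print(link, is_degraded(link))
```

For the bond0 line above, the check reports a degraded link, which is the kind of NIC misconfiguration this section is looking for.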

 

3 Summary

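In a LAN, a K8S cluster should not see slow or stalled requests caused by insufficient bandwidth. If such a problem does appear, the first thing to check is the router configuration or the NIC configuration. In addition, when running performance tests, it is best to connect directly with a network cable rather than joining the LAN through an AP hotspot.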