[20200220]Setting keepalive parameters on Windows.txt

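--//Yesterday I tested ENABLE=BROKEN in the connect string and found that it enables the TCP keep-alive feature on the client.
--//The default tcp_keepalive_time, however, is 7200 seconds, which is rather long. Many clients and middle-tier servers run Windows, so how is this changed in the registry?

--//A search turned up this link: http://www.cppblog.com/Robertxiao/articles/153510.html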

1) On the Windows NT platform, use regedit to modify the following three parameters under

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:
KeepAliveInterval        : set to 1000
KeepAliveTime            : set to 300000 (in milliseconds; 300000 is 5 minutes)
TcpMaxDataRetransmissions: set to 5
--//Let me try this on my work machine. Note: my test environment is Windows 7.

1. Modify the registry:

REGEDIT4
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters]
"KeepAliveTime"=dword:00001770
"KeepAliveInterval"=dword:000003e8
"MaxDataRetries"="5"
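--//KeepAliveTime=0x1770 = 6000 (ms, i.e. 6 seconds)
--//KeepAliveInterval=0x000003e8 = 1000 (ms, i.e. 1 second)
--//Note: I don't know whether the value is called MaxDataRetries or TcpMaxDataRetransmissions; Windows technical documentation on this is scarce. If anyone knows, please tell me. In the end I tested both and neither worked.
--//Or perhaps, as the documentation says, it simply cannot be set on the client.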
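
The hex values above can be derived mechanically rather than by hand. The following is a minimal Python sketch, assuming the same key path and value names used in the .reg file above; `reg_dword` and `build_reg` are hypothetical helper names introduced only for illustration. Note that in .reg syntax a quoted value such as "5" is stored as a string, whereas `dword:` stores a 32-bit number.

```python
# Sketch: format millisecond values as .reg DWORD literals.
# reg_dword and build_reg are illustrative helpers, not part of any Windows tool.

KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters"


def reg_dword(value: int) -> str:
    """Return a .reg DWORD literal: 8 lowercase hex digits after 'dword:'."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return f"dword:{value:08x}"


def build_reg(settings: dict) -> str:
    """Build the text of a REGEDIT4 file for the given name -> integer settings."""
    lines = ["REGEDIT4", f"[{KEY}]"]
    for name, value in settings.items():
        lines.append(f'"{name}"={reg_dword(value)}')
    return "\n".join(lines) + "\n"


# 6000 ms idle time and 1000 ms probe interval, as in the test above.
reg_text = build_reg({"KeepAliveTime": 6000, "KeepAliveInterval": 1000})
print(reg_text)
```

The printed text can be saved to a .reg file and imported; as the tests below show, a reboot was still needed on Windows 7 before the values took effect.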

2. Test:

--//Server-side settings:
# echo /proc/sys/net/ipv4/tcp_keepalive* | xargs   -n 1  strings -1 -f
/proc/sys/net/ipv4/tcp_keepalive_intvl: 75
/proc/sys/net/ipv4/tcp_keepalive_probes: 9
/proc/sys/net/ipv4/tcp_keepalive_time: 7200
$ grep SQLNET.EXPIRE_TIME $ORACLE_HOME/network/admin/sqlnet.ora
#SQLNET.EXPIRE_TIME = 1
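--//The server-side tcp_keepalive_time is kept long (and SQLNET.EXPIRE_TIME is commented out) so that the server does not interfere with the test.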
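--//A rough check of why the server side stays out of the way: with the Linux values listed above, the server sends its first probe after tcp_keepalive_time seconds of idle time and gives up after tcp_keepalive_probes unanswered probes spaced tcp_keepalive_intvl seconds apart. The Python sketch below only does that arithmetic; the function name is hypothetical.

```python
# Sketch: worst-case time for the Linux server to declare an idle peer dead.
# keepalive_detection_seconds is an illustrative name, not a system API.

def keepalive_detection_seconds(idle: int, interval: int, probes: int) -> int:
    """Idle time before the first probe plus one interval per unanswered probe."""
    return idle + interval * probes


# Values from /proc/sys/net/ipv4/tcp_keepalive_* shown above.
server_seconds = keepalive_detection_seconds(idle=7200, interval=75, probes=9)
print(server_seconds)  # 7875 seconds, a little over two hours
```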
sqlplus scott/book@"(DESCRIPTION=(ENABLE=BROKEN)(CONNECT_DATA=(SERVICE_NAME=book)(SERVER = DEDICATED))(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.78)(PORT=1521)))"
SCOTT@book> @ spid
SID    SERIAL# PROCESS                  SERVER    SPID                     PID  P_SERIAL# C50
---------- ---------- ------------------------ --------- -------------------- ------- ---------- --------------------------------------------------
3       2097 1688:7476                DEDICATED 21518                     24        159 alter system kill session '3,2097' immediate;
# netstat -npo 2>/dev/null | grep 21518
tcp        0      0 192.168.100.78:1521         192.168.98.6:56411          ESTABLISHED 21518/oraclebook    keepalive (7177.06/0/0)
--//The client port is 56411.
# tcpdump -vvnni eth0  port 56411
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
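--//Waited 1 minute and saw no packets at all. Oh! It just occurred to me that a reboot may be needed before testing. I don't know whether dropping and re-establishing the network connection is enough, so first test disabling and re-enabling the connection.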

3. Continue testing:

--//Disabled and re-enabled the network connection; details omitted.
sqlplus scott/book@"(DESCRIPTION=(ENABLE=BROKEN)(CONNECT_DATA=(SERVICE_NAME=book)(SERVER = DEDICATED))(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.78)(PORT=1521)))"
SCOTT@book> @ spid
SID    SERIAL# PROCESS                  SERVER    SPID                     PID  P_SERIAL# C50
---------- ---------- ------------------------ --------- -------------------- ------- ---------- --------------------------------------------------
58       1379 8420:2824                DEDICATED 21635                     28        114 alter system kill session '58,1379' immediate;
# netstat -npo 2>/dev/null | grep 21635
tcp        0      0 192.168.100.78:1521         192.168.98.6:57543          ESTABLISHED 21635/oraclebook    keepalive (7084.23/0/0)
--//The client port is 57543.
# tcpdump -vvnni eth0  port 57543
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
--//Waited 1 minute, still no packets.

4. Test again:

--//Rebooted the client test machine.
sqlplus scott/book@"(DESCRIPTION=(ENABLE=BROKEN)(CONNECT_DATA=(SERVICE_NAME=book)(SERVER = DEDICATED))(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.100.78)(PORT=1521)))"
SCOTT@book> @ spid
SID    SERIAL# PROCESS                  SERVER    SPID                     PID  P_SERIAL# C50
---------- ---------- ------------------------ --------- -------------------- ------- ---------- --------------------------------------------------
58       1391 4052:5612                DEDICATED 22059                     28        119 alter system kill session '58,1391' immediate;
$ netstat -npo 2>/dev/null | egrep "22059"
tcp        0      0 192.168.100.78:1521         192.168.98.6:49682          ESTABLISHED 22059/oraclebook    keepalive (7163.03/0/0)
# tcpdump -vvnni eth0  port 49682
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
10:06:23.007811 IP (tos 0x0, ttl 127, id 3580, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 3057020812:3057020813(1) ack 1803888726 win 16289
10:06:23.007991 IP (tos 0x0, ttl  64, id 63755, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.100.78.1521 > 192.168.98.6.49682: ., cksum 0x47cc (incorrect (-> 0x63a3), 1:1(0) ack 1 win 330 
10:06:29.010284 IP (tos 0x0, ttl 127, id 3611, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:06:29.010324 IP (tos 0x0, ttl  64, id 63756, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.100.78.1521 > 192.168.98.6.49682: ., cksum 0x47cc (incorrect (-> 0x63a3), 1:1(0) ack 1 win 330 
10:06:35.004759 IP (tos 0x0, ttl 127, id 3656, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:06:35.004797 IP (tos 0x0, ttl  64, id 63757, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.100.78.1521 > 192.168.98.6.49682: ., cksum 0x47cc (incorrect (-> 0x63a3), 1:1(0) ack 1 win 330 
10:06:41.013022 IP (tos 0x0, ttl 127, id 3695, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:06:41.013075 IP (tos 0x0, ttl  64, id 63758, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.100.78.1521 > 192.168.98.6.49682: ., cksum 0x47cc (incorrect (-> 0x63a3), 1:1(0) ack 1 win 330 
--//Finally it works. Note that the probes are exactly 6 seconds apart, matching KeepAliveTime.
10:09:23.021838 IP (tos 0x0, ttl 127, id 4958, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:09:23.021929 IP (tos 0x0, ttl  64, id 63785, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.100.78.1521 > 192.168.98.6.49682: ., cksum 0x47cc (incorrect (-> 0x63a3), 1:1(0) ack 1 win 330 
# iptables -I INPUT 1 -p tcp --dport 49682 -j drop
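--//Strangely, the command above has no effect, because no incoming packet matches it: on packets arriving from the client, 49682 is the source port, not the destination port.
--//Packets 192.168.100.78.1521 > 192.168.98.6.49682 (destination port 49682) go through the OUTPUT chain.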
# iptables -D INPUT 1
# iptables -I INPUT 1 -p tcp --sport 49682 -j DROP
10:15:59.099141 IP (tos 0x0, ttl 127, id 7794, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:15:59.099179 IP (tos 0x0, ttl  64, id 63851, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.100.78.1521 > 192.168.98.6.49682: ., cksum 0x47cc (incorrect (-> 0x63a3), 1:1(0) ack 1 win 330 
--//Normal so far. What follows is the capture after running iptables -I INPUT 1 -p tcp --sport 49682 -j DROP.
10:16:05.101208 IP (tos 0x0, ttl 127, id 7848, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:16:06.102474 IP (tos 0x0, ttl 127, id 7850, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:16:07.103134 IP (tos 0x0, ttl 127, id 7859, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:16:08.103356 IP (tos 0x0, ttl 127, id 7861, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:16:09.103903 IP (tos 0x0, ttl 127, id 7870, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:16:10.102031 IP (tos 0x0, ttl 127, id 7872, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:16:11.100107 IP (tos 0x0, ttl 127, id 7880, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:16:12.100189 IP (tos 0x0, ttl 127, id 7884, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:16:13.100203 IP (tos 0x0, ttl 127, id 7892, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:16:14.100300 IP (tos 0x0, ttl 127, id 7895, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49682 > 192.168.100.78.1521: ., cksum 0xa6ea (correct), 0:1(1) ack 1 win 16289
10:16:15.100914 IP (tos 0x0, ttl 127, id 7903, offset 0, flags [DF], proto: TCP (6), length: 40) 192.168.98.6.49682 > 192.168.100.78.1521: R, cksum 0xe687 (correct), 1:1(0) ack 1 win 0
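--//11 packets appear: 10 unanswered keep-alive probes followed by an RST. The 1-second spacing matches KeepAliveInterval. Ten probes rather than 5 shows that the MaxDataRetries registry value is not the right one.
--//Running a SQL statement on the client now fails immediately.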
10:12:18 SCOTT@book> set time on escape on
10:12:20 SCOTT@book> select sysdate from dual ;
select sysdate from dual
*
ERROR at line 1:
ORA-03135: connection lost contact
Process ID: 22059
Session ID: 58 Serial number: 1391
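
--//The probe count and spacing in the capture above can be checked mechanically. The Python sketch below uses timestamps copied by hand from that tcpdump output (the last acknowledged probe, the unanswered probes, and the RST); the variable names are only illustrative.

```python
from datetime import datetime

# Timestamps copied from the tcpdump output above.
last_acked = "10:15:59.099141"          # last probe the server answered
unanswered = [
    "10:16:05.101208", "10:16:06.102474", "10:16:07.103134",
    "10:16:08.103356", "10:16:09.103903", "10:16:10.102031",
    "10:16:11.100107", "10:16:12.100189", "10:16:13.100203",
    "10:16:14.100300",
]
rst = "10:16:15.100914"                 # client gives up and sends RST


def parse(ts: str) -> datetime:
    """Parse a tcpdump HH:MM:SS.ffffff timestamp (the date part is irrelevant here)."""
    return datetime.strptime(ts, "%H:%M:%S.%f")


# Gaps between consecutive unanswered probes, in seconds.
gaps = [
    (parse(b) - parse(a)).total_seconds()
    for a, b in zip(unanswered, unanswered[1:])
]
# Time from the last answered probe until the client resets the connection.
total = (parse(rst) - parse(last_acked)).total_seconds()

print(len(unanswered), "unanswered probes")
print("probe spacing: min %.3f s, max %.3f s" % (min(gaps), max(gaps)))
print("time to RST: %.3f s" % total)
```

--//The result is consistent with 6 seconds of idle time plus 10 probes at 1-second intervals, that is about 16 seconds from the last answered probe to the RST.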

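5. Check which parameter controls the retry count:

In the article's code, setsockopt enables keepalive mode, but the system defaults for keepalive may not suit our needs, for example probing the peer only after 2 hours of idle time, so the WSAIoctl function is used to set these parameters through the tcp_keepalive struct. The tcp_keepalive struct is defined in the mstcpip.h header: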

struct tcp_keepalive {
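ULONG onoff ;   // whether keepalive is enabled
ULONG keepalivetime ;  // idle time (ms) with no data before heartbeat packets start being sent
ULONG keepaliveinterval ; // interval (ms) between heartbeat packets,
// sent 5 times (default on 2000/XP/2003), 10 times (default on Vista and later)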
};

This struct sets the idle detection time and the interval between repeated probes. For details see msdn:(VS.85).aspx .
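
For an application that owns its socket, the same per-socket settings can be sketched in Python, whose socket module exposes SIO_KEEPALIVE_VALS on Windows as a tuple of (onoff, keepalivetime, keepaliveinterval). This is only an illustration of the mechanism described above: `enable_keepalive` is a hypothetical helper, the Linux branch is added so the sketch also runs there, and it does not change how the Oracle client opens its own connections.

```python
import socket
import sys


def enable_keepalive(sock: socket.socket, idle_ms: int = 6000, interval_ms: int = 1000) -> None:
    """Enable TCP keep-alive on one socket with custom idle time and probe interval."""
    # SO_KEEPALIVE alone enables keep-alive with the system default timers.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if sys.platform == "win32":
        # Same three fields as struct tcp_keepalive; times are in milliseconds.
        sock.ioctl(socket.SIO_KEEPALIVE_VALS, (1, idle_ms, interval_ms))
    else:
        # Linux per-socket options take whole seconds; guarded because not
        # every platform defines these constants.
        if hasattr(socket, "TCP_KEEPIDLE"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, max(1, idle_ms // 1000))
        if hasattr(socket, "TCP_KEEPINTVL"):
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, max(1, interval_ms // 1000))


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
# Nonzero means keep-alive is enabled on this socket.
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print("SO_KEEPALIVE:", keepalive_on)
sock.close()
```

Like the struct, this interface has no field for the number of probes.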

According to msdn, these parameters can also be set in the registry, at:

HKLM/SYSTEM/CurrentControlSet/Services/Tcpip/Parameters/KeepAliveTime

HKLM/SYSTEM/CurrentControlSet/Services/Tcpip/Parameters/KeepAliveInterval

Also, as some readers may have noticed, the tcp_keepalive struct has no field for the retry count. That parameter can be set through the registry, at:

HKLM/SYSTEM/CurrentControlSet/Services/Tcpip/Parameters/TcpMaxDataRetransmissions

I could not find these registry values on either XP or Server 2008. msdn seems to say they are only supported on Server 2003; I have not tested this, so I am not sure.

REGEDIT4
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\Tcpip\Parameters]
"KeepAliveTime"=dword:00001770
"KeepAliveInterval"=dword:000003e8
"MaxDataRetries"="5"
"TcpMaxDataRetransmissions"="5"

--//Rebooted and tested again; the other steps are not repeated here.

# tcpdump -vvnni eth0  port 49513
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
10:41:35.694775 IP (tos 0x0, ttl 127, id 3395, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 4042311713:4042311714(1) ack 2224808308 win 16289
10:41:35.694958 IP (tos 0x0, ttl  64, id 4842, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.100.78.1521 > 192.168.98.6.49513: ., cksum 0x47cc (incorrect (-> 0xd828), 1:1(0) ack 1 win 330 
10:41:41.692546 IP (tos 0x0, ttl 127, id 3440, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:41:41.692584 IP (tos 0x0, ttl  64, id 4843, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.100.78.1521 > 192.168.98.6.49513: ., cksum 0x47cc (incorrect (-> 0xd828), 1:1(0) ack 1 win 330 
10:41:47.694272 IP (tos 0x0, ttl 127, id 3514, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:41:47.694313 IP (tos 0x0, ttl  64, id 4844, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.100.78.1521 > 192.168.98.6.49513: ., cksum 0x47cc (incorrect (-> 0xd828), 1:1(0) ack 1 win 330 
10:41:53.697620 IP (tos 0x0, ttl 127, id 3586, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:41:53.697664 IP (tos 0x0, ttl  64, id 4845, offset 0, flags [DF], proto: TCP (6), length: 52) 192.168.100.78.1521 > 192.168.98.6.49513: ., cksum 0x47cc (incorrect (-> 0xd828), 1:1(0) ack 1 win 330 
# iptables -I INPUT 1 -p tcp --sport 49513 -j DROP
10:41:59.696732 IP (tos 0x0, ttl 127, id 3653, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:42:00.691771 IP (tos 0x0, ttl 127, id 3666, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:42:01.691836 IP (tos 0x0, ttl 127, id 3675, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:42:02.691884 IP (tos 0x0, ttl 127, id 3686, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:42:03.692474 IP (tos 0x0, ttl 127, id 3698, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:42:04.695563 IP (tos 0x0, ttl 127, id 3717, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:42:05.695603 IP (tos 0x0, ttl 127, id 3725, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:42:06.696245 IP (tos 0x0, ttl 127, id 3736, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:42:07.697502 IP (tos 0x0, ttl 127, id 3744, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:42:08.698103 IP (tos 0x0, ttl 127, id 3755, offset 0, flags [DF], proto: TCP (6), length: 41) 192.168.98.6.49513 > 192.168.100.78.1521: ., cksum 0x420f (correct), 0:1(1) ack 1 win 16289
10:42:09.697217 IP (tos 0x0, ttl 127, id 3764, offset 0, flags [DF], proto: TCP (6), length: 40) 192.168.98.6.49513 > 192.168.100.78.1521: R, cksum 0x81ac (correct), 1:1(0) ack 1 win 0
--//Still wrong: again 10 probes before the RST, not 5. Giving up on this test.

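Summary:

1. Testing on Windows is really tedious: 3 reboots in total. I don't know how to make a registry change take effect quickly.

2. I now know exactly where the relevant registry values live.

3. The retry count appears to default to 10. Setting "MaxDataRetries"="5" and "TcpMaxDataRetransmissions"="5" had no effect; perhaps it cannot be changed at all. See the link: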

(v=vs.85)?redirectedfrom=MSDN
/* Argument structure for SIO_KEEPALIVE_VALS */
struct tcp_keepalive {
u_long  onoff;
u_long  keepalivetime;
u_long  keepaliveinterval;
};

The value specified in the onoff member determines if TCP keep-alive is enabled or disabled. If the onoff member is set to a nonzero value, TCP keep-alive is enabled and the other members in the structure are used. The keepalivetime member specifies the timeout, in milliseconds, with no activity until the first keep-alive packet is sent. The keepaliveinterval member specifies the interval, in milliseconds, between when successive keep-alive packets are sent if no acknowledgement is received.

The SO_KEEPALIVE option, which is one of the SOL_SOCKET Socket Options, can also be used to enable or disable the TCP keep-alive on a connection, as well as query the current state of this option. To query whether TCP keep-alive is enabled on a socket, the getsockopt function can be called with the SO_KEEPALIVE option. To enable or disable TCP keep-alive, the setsockopt function can be called with the SO_KEEPALIVE option. If TCP keep-alive is enabled with SO_KEEPALIVE, then the default TCP settings are used for keep-alive timeout and interval unless these values have been changed using SIO_KEEPALIVE_VALS.

The default settings when a TCP socket is initialized sets the keep-alive timeout to 2 hours and the keep-alive interval to 1 second. The default system-wide value of the keep-alive timeout is controllable through the KeepAliveTime registry setting which takes a value in milliseconds. The default system-wide value of the keep-alive interval is controllable through the KeepAliveInterval registry setting which takes a value in milliseconds.

On Windows Vista and later, the number of keep-alive probes (data retransmissions) is set to 10 and cannot be changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

--//This states that the value cannot be changed.

On Windows Server 2003, Windows XP, and Windows 2000, the default setting for number of keep-alive probes is 5. The number of keep-alive probes is controllable through the TcpMaxDataRetransmissions and PPTPTcpMaxDataRetransmissions registry settings. The number of keep-alive probes is set to the larger of the two registry key values. If this number is 0, then keep-alive probes will not be sent. If this number is above 255, then it is adjusted to 255.
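
--//The rules quoted above can be restated as a small Python sketch. The function name is hypothetical, and treating an unset registry value as 5 is my assumption; the quoted text only says that the default number of probes is 5.

```python
# Sketch of the probe-count rules quoted from the documentation above.
# keepalive_probe_count is an illustrative name, not a Windows API.

def keepalive_probe_count(vista_or_later: bool,
                          tcp_max_data_retransmissions: int = 5,
                          pptp_tcp_max_data_retransmissions: int = 5) -> int:
    """Return how many keep-alive probes are sent before the connection is dropped."""
    if vista_or_later:
        # Fixed at 10 on Windows Vista and later; the registry values are ignored.
        return 10
    # Windows 2000 / XP / Server 2003: the larger of the two registry values,
    # capped at 255. A result of 0 means no probes are sent.
    count = max(tcp_max_data_retransmissions, pptp_tcp_max_data_retransmissions)
    return min(count, 255)


# The Windows 7 client in this test: setting 5 in the registry changes nothing.
print(keepalive_probe_count(True, 5, 5))
```

--//This agrees with the 10 probes seen in both captures on Windows 7.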