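In releases before 11.2.0.2, minimizing the chance that a single private NIC going down would get a node evicted usually meant relying on OS-vendor technologies such as bonding, trunking or teaming to bind redundant NICs together. Starting with 11.2.0.2, Oracle Clusterware provides an integrated solution that ensures interconnect redundancy through IP failover.
Multiple private NICs can be defined during installation or changed later with oifcfg. The ora.cluster_interconnect.haip resource picks one highly available virtual IP (HAIP) from the link-local range (169.254.*.*) for each private NIC. By default, private traffic is load balanced across all active interconnect NICs; if a private NIC fails or can no longer communicate, Oracle GI transparently moves the corresponding HAIP address to one of the remaining working NICs. Compared with third-party NIC bonding, this provides high availability while also making effective use of the available bandwidth.
Even if more private NICs are defined, GI can activate at most four of them, and the number of HAIP addresses the cluster actually uses depends on how many private NICs are active on the first node started in the cluster. So if private NICs are added, clusterware must be restarted on all nodes for the change to take effect.
The examples below demonstrate: 1. the benefits of using HAIP; 2. how to configure the private interconnect when it should keep using fixed IPs.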
Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.
PURPOSE
This document is intended to explain what the ora.cluster_interconnect.haip resource is in 11gR2 Grid Infrastructure.
SCOPE
DETAILS
Redundant Interconnect without any third-party IP failover technology (bonding, IPMP or similar) is supported natively by Grid Infrastructure starting from 11.2.0.2. Multiple private network adapters can be defined either during the installation phase or afterward using oifcfg. Oracle Database, CSS, OCR, CRS, CTSS, and EVM components in 11.2.0.2 employ it automatically.
Grid Infrastructure can activate a maximum of four private network adapters at a time even if more are defined. The ora.cluster_interconnect.haip resource will start one to four link-local HAIP addresses on private network adapters for interconnect communication for Oracle RAC, Oracle ASM, Oracle ACFS, etc.
Grid automatically picks free link-local addresses from the reserved 169.254.*.* subnet for HAIP. According to RFC 3927, the link-local subnet 169.254.*.* should not be used for any other purpose. With HAIP, by default, interconnect traffic will be load balanced across all active interconnect interfaces, and the corresponding HAIP address will be failed over transparently to other adapters if one fails or becomes non-communicative.
After GI is configured, more private network interfaces can be added with the "<GRID_HOME>/bin/oifcfg setif" command. The number of HAIP addresses is decided by how many private network adapters are active when Grid comes up on the first node in the cluster. If there is only one active private network, Grid will create one HAIP; if two, Grid will create two; and if more than two, Grid will create four. The number of HAIPs won't change even if more private network adapters are activated later; a restart of clusterware on all nodes is required for the number to change. The newly activated adapters can, however, be used for failover purposes.
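The counting rule in the paragraph above can be written down as a small function. This is only a sketch of the rule as stated in the text; the function name is hypothetical and nothing here calls into Grid Infrastructure.

```python
def expected_haip_count(active_private_adapters: int) -> int:
    """Number of HAIP addresses Grid is described as creating.

    The argument is the number of private adapters that are active when
    Grid comes up on the first node in the cluster.
    """
    if active_private_adapters < 1:
        raise ValueError("at least one active private adapter is required")
    if active_private_adapters == 1:
        return 1
    if active_private_adapters == 2:
        return 2
    # More than two active adapters: four HAIPs are created.
    return 4


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, "active adapter(s) ->", expected_haip_count(n), "HAIP(s)")
```

Note that three active adapters still yield four HAIPs, which is why one adapter carries two virtual IPs in Case 2.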
NOTE: If using the 11.2.0.2 (and above) Redundant Interconnect/HAIP feature (as documented in Case 2 below), it is at present REQUIRED that all interconnect interfaces be placed on separate subnets. If the interfaces are all on the same subnet and the cable is pulled from the first NIC in the routing table, a rebootless restart or node reboot will occur.
At the time of this writing, a redundant private network requires a different subnet for each network adapter. For example, if eth1, eth2 and eth3 are used for the private network, each should be on a different subnet. Refer to Case 2.
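The separate-subnet requirement can be checked before configuring the interconnect. The sketch below uses only the Python standard library; the function name and the sample addresses are hypothetical and are not taken from the note.

```python
import ipaddress
from collections import defaultdict


def shared_subnets(interfaces):
    """Return {subnet: [interface names]} for subnets used more than once.

    `interfaces` maps an interface name to an (ip, netmask) pair.
    An empty result means every interface is on its own subnet.
    """
    by_subnet = defaultdict(list)
    for name, (ip, mask) in interfaces.items():
        # strict=False lets us pass a host address and derive its network.
        net = ipaddress.ip_network(f"{ip}/{mask}", strict=False)
        by_subnet[str(net)].append(name)
    return {net: sorted(names) for net, names in by_subnet.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical private interfaces; eth1 and eth2 collide on purpose.
    sample = {
        "eth1": ("10.0.0.168", "255.255.255.128"),
        "eth2": ("10.0.0.200", "255.255.255.128"),
        "eth6": ("10.11.0.188", "255.255.255.128"),
    }
    print(shared_subnets(sample))
```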
When Oracle Clusterware is fully up, resource haip should show status of ONLINE:
$ $GRID_HOME/bin/crsctl stat res -t -init
..
ora.cluster_interconnect.haip
1 ONLINE ONLINE <node1>
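For scripted health checks, the status lines above can be pulled out of captured output. This is a minimal sketch that assumes the layout shown in the excerpt (a resource name line followed by numbered instance lines); the function name is hypothetical, and real crsctl output may contain extra header lines, which are ignored here.

```python
def resource_states(crsctl_output, resource):
    """Return a list of (target, state) pairs for one resource."""
    states = []
    in_resource = False
    for raw in crsctl_output.splitlines():
        fields = raw.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            # A new resource block begins; remember whether it is ours.
            in_resource = fields[0] == resource
            continue
        if in_resource and fields[0].isdigit() and len(fields) >= 3:
            states.append((fields[1], fields[2]))
    return states


if __name__ == "__main__":
    sample = """ora.cluster_interconnect.haip
1 ONLINE ONLINE node1
"""
    print(resource_states(sample, "ora.cluster_interconnect.haip"))
```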
Case 1: Single Private Network Adapter
If multiple physical network adapters are bonded together at the OS level and presented as a single device name, for example bond0, it is still considered a single network adapter environment. A single private network adapter does not offer true HAIP, as there is only one adapter; at least two are recommended to gain true HAIP. If only one private network adapter is defined, such as eth1 in the example below, one virtual IP will be created by HAIP. Here is what's expected when Grid is up and running:
$ $GRID_HOME/bin/oifcfg getif
eth1 10.x.x.128 global cluster_interconnect
eth3 10.1.x.x global public
$ $GRID_HOME/bin/oifcfg iflist -p -n
eth1 10.x.x.128 PRIVATE 255.255.255.128
eth1 169.254.0.0 UNKNOWN 255.255.0.0
eth3 10.1.x.x PRIVATE 255.255.255.128
Note: 1. The 169.254.0.0 subnet on eth1 is started by resource haip; 2. Refer to note 1386709.1 for an explanation of the output.
ifconfig
..
eth1 Link encap:Ethernet HWaddr 00:16:3E:11:11:22
inet addr:10.x.x.168 Bcast:10.1.0.255 Mask:255.255.255.128
inet6 addr: fe80::216:3eff:fe11:1122/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6369306 errors:0 dropped:0 overruns:0 frame:0
TX packets:4270790 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3037449975 (2.8 GiB) TX bytes:2705797005 (2.5 GiB)
eth1:1 Link encap:Ethernet HWaddr 00:16:3E:11:22:22
inet addr:169.254.x.x Bcast:169.254.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Instance alert.log (ASM and database):
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
[name='eth1:1', type=1, ip=169.254.x.x, mac=00-16-3e-11-11-22, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'eth3' configured from GPnP for use as a public interface.
[name='eth3', type=1, ip=10.x.x.168, mac=00-16-3e-11-11-44, net=10.1.x.x/25, mask=255.255.255.128, use=public/1]
..
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 3
..
Cluster communication is configured to use the following interface(s) for this instance
169.254.x.x
Note: the interconnect will use the virtual private IP 169.254.x.x instead of the real private IP. A pre-11.2.0.2 instance will still use the real private IP by default; to take advantage of the new feature, the init.ora parameter cluster_interconnects can be updated each time Grid is restarted.
For 11.2.0.2 and above, v$cluster_interconnects will show the HAIP info:
SQL> select name,ip_address from v$cluster_interconnects;
NAME IP_ADDRESS
--------------- ----------------
eth1:1 169.254.x.x
Case 2: Multiple Private Network Adapters
Multiple switches can be deployed if there is more than one private network adapter on each node. If one network adapter fails, the HAIP on that network segment will be failed over to the other adapters on all nodes.
2.1. Default Status
Here is an example of 3 private networks eth1, eth6 and eth7 when Grid is up and running:
$ $GRID_HOME/bin/oifcfg getif
eth1 10.x.x.128 global cluster_interconnect
eth3 10.1.x.x global public
eth6 10.11.x.x global cluster_interconnect
eth7 10.12.x.x global cluster_interconnect
$ $GRID_HOME/bin/oifcfg iflist -p -n
eth1 10.x.x.128 PRIVATE 255.255.255.128
eth1 169.254.0.x UNKNOWN 255.255.192.0
eth1 169.254.192.x UNKNOWN 255.255.192.0
eth3 10.1.x.x PRIVATE 255.255.255.128
eth6 10.11.x.x PRIVATE 255.255.255.128
eth6 169.254.64.x UNKNOWN 255.255.192.0
eth7 10.12.x.x PRIVATE 255.255.255.128
eth7 169.254.128.x UNKNOWN 255.255.192.0
Note: resource haip started four virtual private IPs: two on eth1, and one each on eth6 and eth7.
ifconfig
..
eth1 Link encap:Ethernet HWaddr 00:16:3E:11:11:22
inet addr:10.x.x.168 Bcast:10.1.0.255 Mask:255.255.255.128
inet6 addr: fe80::216:3eff:fe11:1122/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:15176906 errors:0 dropped:0 overruns:0 frame:0
TX packets:10239298 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:7929246238 (7.3 GiB) TX bytes:5768511630 (5.3 GiB)
eth1:1 Link encap:Ethernet HWaddr 00:16:3E:11:11:22
inet addr:169.254.x.x Bcast:169.254.63.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:2 Link encap:Ethernet HWaddr 00:16:3E:11:11:22
inet addr:169.254.x.x Bcast:169.254.255.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth6 Link encap:Ethernet HWaddr 00:16:3E:11:11:77
inet addr:10.11.x.x Bcast:10.11.0.255 Mask:255.255.255.128
inet6 addr: fe80::216:3eff:fe11:1177/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7068185 errors:0 dropped:0 overruns:0 frame:0
TX packets:595746 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2692567483 (2.5 GiB) TX bytes:382357191 (364.6 MiB)
eth6:1 Link encap:Ethernet HWaddr 00:16:3E:11:11:77
inet addr:169.254.x.x Bcast:169.254.127.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth7 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:10.12.x.x Bcast:10.12.0.255 Mask:255.255.255.128
inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6435829 errors:0 dropped:0 overruns:0 frame:0
TX packets:314780 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2024577502 (1.8 GiB) TX bytes:172461585 (164.4 MiB)
eth7:1 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:169.254.x.x Bcast:169.254.191.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Instance alert.log (ASM and database):
Private Interface 'eth1:1' configured from GPnP for use as a private interconnect.
[name='eth1:1', type=1, ip=169.254.xx.xx, mac=00-16-3e-11-11-22, net=169.254.x.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'eth6:1' configured from GPnP for use as a private interconnect.
[name='eth6:1', type=1, ip=169.254.xx.xx, mac=00-16-3e-11-11-77, net=169.254.x.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'eth7:1' configured from GPnP for use as a private interconnect.
[name='eth7:1', type=1, ip=169.254.x.x, mac=00-16-3e-11-11-88, net=169.254.x.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Private Interface 'eth1:2' configured from GPnP for use as a private interconnect.
[name='eth1:2', type=1, ip=169.254.x.x, mac=00-16-3e-11-11-22, net=169.254.x.0/18, mask=255.255.192.0, use=haip:cluster_interconnect/62]
Public Interface 'eth3' configured from GPnP for use as a public interface.
[name='eth3', type=1, ip=10.x.x.68, mac=00-16-3e-11-11-44, net=10.1.x.x/25, mask=255.255.255.128, use=public/1]
Picked latch-free SCN scheme 3
..
Cluster communication is configured to use the following interface(s) for this instance
169.254.x.98
169.254.x.250
169.254.x.237
169.254.x.103
Note: interconnect communication will use all four virtual private IPs; in case of network failure, as long as there is one private network adapter functioning, all four IPs will remain active.
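To see at a glance which adapter carries which HAIP subnet, the "oifcfg iflist -p -n" output from this section can be summarized per interface. The sketch below assumes the four-column layout shown above (name, subnet, type, netmask). The sample uses concrete made-up addresses in place of the masked ones, and the function name is hypothetical.

```python
import ipaddress
from collections import defaultdict

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")

# Splitting the link-local range four ways gives /18 networks, which matches
# the 255.255.192.0 mask seen in the three-adapter sample output.
FOUR_WAY_SPLIT = [str(n) for n in LINK_LOCAL.subnets(prefixlen_diff=2)]


def haip_subnets_by_interface(iflist_output):
    """Map each interface name to the link-local subnets listed for it."""
    result = defaultdict(list)
    for line in iflist_output.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        name, subnet, _if_type, mask = fields
        try:
            net = ipaddress.ip_network(f"{subnet}/{mask}", strict=False)
        except ValueError:
            continue  # e.g. masked sample values such as 10.x.x.128
        if net.subnet_of(LINK_LOCAL):
            result[name].append(str(net))
    return dict(result)


if __name__ == "__main__":
    sample = """eth1 10.0.0.128 PRIVATE 255.255.255.128
eth1 169.254.0.0 UNKNOWN 255.255.192.0
eth1 169.254.192.0 UNKNOWN 255.255.192.0
eth6 10.11.0.128 PRIVATE 255.255.255.128
eth6 169.254.64.0 UNKNOWN 255.255.192.0
eth7 10.12.0.0 PRIVATE 255.255.255.128
eth7 169.254.128.0 UNKNOWN 255.255.192.0"""
    print(haip_subnets_by_interface(sample))
```

Running the same function on output captured after a failure shows the relocated subnets under the surviving adapters.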
2.2. When Private Network Adapter Fails
If one private network adapter fails, in this example eth6, the virtual private IP on eth6 will be relocated automatically to a healthy adapter, and this is transparent to instances (ASM or database):
$ $GRID_HOME/bin/oifcfg iflist -p -n
eth1 10.x.x.128 PRIVATE 255.255.255.128
eth1 169.254.0.x UNKNOWN 255.255.192.0
eth1 169.254.128.x UNKNOWN 255.255.192.0
eth7 10.12.x.x PRIVATE 255.255.255.128
eth7 169.254.64.x UNKNOWN 255.255.192.0
eth7 169.254.192.x UNKNOWN 255.255.192.0
Note: the virtual private IP on eth6, subnet 169.254.64.x, was relocated to eth7.
ifconfig
..
eth1 Link encap:Ethernet HWaddr 00:16:3E:11:11:22
inet addr:10.x.x.168 Bcast:10.1.0.255 Mask:255.255.255.128
inet6 addr: fe80::216:3eff:fe11:1122/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:15183840 errors:0 dropped:0 overruns:0 frame:0
TX packets:10245071 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:7934311823 (7.3 GiB) TX bytes:5771878414 (5.3 GiB)
eth1:1 Link encap:Ethernet HWaddr 00:16:3E:11:11:22
inet addr:169.254.x.x Bcast:169.254.63.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth1:3 Link encap:Ethernet HWaddr 00:16:3E:11:11:22
inet addr:169.254.x.x Bcast:169.254.191.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth7 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:10.12.x.x Bcast:10.12.0.255 Mask:255.255.255.128
inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6438985 errors:0 dropped:0 overruns:0 frame:0
TX packets:315877 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2026266447 (1.8 GiB) TX bytes:173101641 (165.0 MiB)
eth7:2 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:169.254.x.x Bcast:169.254.127.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth7:3 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:169.254.x.x Bcast:169.254.255.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
2.3. When Another Private Network Adapter Fails
If another private network adapter goes down, in this example eth1, the virtual private IPs on it will be relocated automatically to another healthy adapter with no impact on instances (ASM or database):
$ $GRID_HOME/bin/oifcfg iflist -p -n
eth7 10.12.x.x PRIVATE 255.255.255.128
eth7 169.254.64.x UNKNOWN 255.255.192.0
eth7 169.254.192.x UNKNOWN 255.255.192.0
eth7 169.254.0.x UNKNOWN 255.255.192.0
eth7 169.254.128.x UNKNOWN 255.255.192.0
ifconfig
..
eth7 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:10.12.x.x Bcast:10.12.0.255 Mask:255.255.255.128
inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6441559 errors:0 dropped:0 overruns:0 frame:0
TX packets:317271 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2027824788 (1.8 GiB) TX bytes:173810658 (165.7 MiB)
eth7:1 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:169.254.x.x Bcast:169.254.63.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth7:2 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:169.254.x.x Bcast:169.254.127.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth7:3 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:169.254.x.x Bcast:169.254.255.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth7:4 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:169.254.x.x Bcast:169.254.191.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
2.4. When Private Network Adapter Restores
If private network adapter eth6 is restored, it will be activated automatically and virtual private IPs will be assigned to it:
$ $GRID_HOME/bin/oifcfg iflist -p -n
..
eth6 10.11.x.x PRIVATE 255.255.255.128
eth6 169.254.128.x UNKNOWN 255.255.192.0
eth6 169.254.0.x UNKNOWN 255.255.192.0
eth7 10.12.x.x PRIVATE 255.255.255.128
eth7 169.254.64.x UNKNOWN 255.255.192.0
eth7 169.254.192.x UNKNOWN 255.255.192.0
ifconfig
..
eth6 Link encap:Ethernet HWaddr 00:16:3E:11:11:77
inet addr:10.11.x.x Bcast:10.11.0.255 Mask:255.255.255.128
inet6 addr: fe80::216:3eff:fe11:1177/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:398 errors:0 dropped:0 overruns:0 frame:0
TX packets:121 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:185138 (180.7 KiB) TX bytes:56439 (55.1 KiB)
eth6:1 Link encap:Ethernet HWaddr 00:16:3E:11:11:77
inet addr:169.254.x.x Bcast:169.254.191.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth6:2 Link encap:Ethernet HWaddr 00:16:3E:11:11:77
inet addr:169.254.x.x Bcast:169.254.63.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth7 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:10.12.x.x Bcast:10.12.0.255 Mask:255.255.255.128
inet6 addr: fe80::216:3eff:fe11:1188/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6442552 errors:0 dropped:0 overruns:0 frame:0
TX packets:317983 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:2028404133 (1.8 GiB) TX bytes:174103017 (166.0 MiB)
eth7:2 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:169.254.x.x Bcast:169.254.127.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
eth7:3 Link encap:Ethernet HWaddr 00:16:3E:11:11:88
inet addr:169.254.x.x Bcast:169.254.255.255 Mask:255.255.192.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Miscellaneous
It's NOT supported to disable or stop HAIP while the cluster is up and running unless otherwise advised by Oracle Support/Development.
1. The feature is disabled in 11.2.0.2/11.2.0.3 if Sun Cluster exists
2. The feature does not exist in Windows 11.2.0.2/11.2.0.3
3. The feature is disabled in 11.2.0.2/11.2.0.3 if Fujitsu PRIMECLUSTER exists
4. With the fix of bug 11077756 (fixed in 11.2.0.2 GI PSU6, 11.2.0.3), HAIP will be disabled if it fails to start while running the root script (root.sh or rootupgrade.sh). For more details, refer to Section "bug 11077756".
5. The feature is disabled on Solaris 11 if IPMP is used for the private network. Tracked in bug 16982332.
6. The feature is disabled on HP-UX and AIX if cluster_interconnect/"private network" is Infiniband
HAIP Log File
Resource haip is managed by ohasd.bin. Its resource logs are located in $GRID_HOME/log/<nodename>/ohasd/ohasd.log and $GRID_HOME/log/<nodename>/agent/ohasd/orarootagent_root/orarootagent_root.log
L1. Log Sample When Private Network Adapter Fails
In a multiple private network adapter environment, if one of the adapters fails:
• ohasd.log
2010-09-24 09:10:00.891: [GIPCHGEN][1083025728]gipchaInterfaceFail: marking interface failing 0x2aaab0269a10 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x4d }
2010-09-24 09:10:00.902: [GIPCHGEN][1138145600]gipchaInterfaceDisable: disabling interface 0x2aaab0269a10 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x1cd }
2010-09-24 09:10:00.902: [GIPCHDEM][1138145600]gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x2aaab0269a10 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.0.188', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x1ed }
• orarootagent_root.log
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} failed to receive ARP request
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} Assigned IP 169.254.x.x no longer valid on inf eth6
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} VipActions::startIp {
2010-09-24 09:09:57.708: [ USRTHRD][1129138496] {0:0:2} Adding 169.254.x.x on eth6:1
2010-09-24 09:09:57.719: [ USRTHRD][1129138496] {0:0:2} VipActions::startIp }
2010-09-24 09:09:57.719: [ USRTHRD][1129138496] {0:0:2} Reassigned IP: 169.254.x.x on interface eth6
2010-09-24 09:09:58.013: [ USRTHRD][1082325312] {0:0:2} HAIP: Updating member info HAIP1;10.11.x.x#0;10.11.x.x#1
2010-09-24 09:09:58.015: [ USRTHRD][1082325312] {0:0:2} HAIP: Moving ip '169.254.x.x' from inf 'eth6' to inf 'eth7'
2010-09-24 09:09:58.015: [ USRTHRD][1082325312] {0:0:2} pausing thread
2010-09-24 09:09:58.015: [ USRTHRD][1082325312] {0:0:2} posting thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start {
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start }
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} HAIP: Moving ip '169.254.x.x' from inf 'eth1' to inf 'eth7'
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} pausing thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} posting thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start {
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start }
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} HAIP: Moving ip '169.254.x.x' from inf 'eth7' to inf 'eth1'
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} pausing thread
2010-09-24 09:09:58.016: [ USRTHRD][1082325312] {0:0:2} posting thread
2010-09-24 09:09:58.017: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start {
2010-09-24 09:09:58.017: [ USRTHRD][1116531008] {0:0:2} [NetHAWork] thread started
2010-09-24 09:09:58.017: [ USRTHRD][1116531008] {0:0:2} Arp::sCreateSocket {
2010-09-24 09:09:58.017: [ USRTHRD][1093232960] {0:0:2} [NetHAWork] thread started
2010-09-24 09:09:58.017: [ USRTHRD][1093232960] {0:0:2} Arp::sCreateSocket {
2010-09-24 09:09:58.017: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]start }
2010-09-24 09:09:58.018: [ USRTHRD][1143847232] {0:0:2} [NetHAWork] thread started
2010-09-24 09:09:58.018: [ USRTHRD][1143847232] {0:0:2} Arp::sCreateSocket {
2010-09-24 09:09:58.034: [ USRTHRD][1116531008] {0:0:2} Arp::sCreateSocket }
2010-09-24 09:09:58.034: [ USRTHRD][1116531008] {0:0:2} Starting Probe for ip 169.254.x.x
2010-09-24 09:09:58.034: [ USRTHRD][1116531008] {0:0:2} Transitioning to Probe State
2010-09-24 09:09:58.034: [ USRTHRD][1093232960] {0:0:2} Arp::sCreateSocket }
2010-09-24 09:09:58.035: [ USRTHRD][1093232960] {0:0:2} Starting Probe for ip 169.254.x.x
2010-09-24 09:09:58.035: [ USRTHRD][1093232960] {0:0:2} Transitioning to Probe State
2010-09-24 09:09:58.050: [ USRTHRD][1143847232] {0:0:2} Arp::sCreateSocket }
2010-09-24 09:09:58.050: [ USRTHRD][1143847232] {0:0:2} Starting Probe for ip 169.254.x.x
2010-09-24 09:09:58.050: [ USRTHRD][1143847232] {0:0:2} Transitioning to Probe State
2010-09-24 09:09:58.231: [ USRTHRD][1093232960] {0:0:2} Arp::sProbe {
2010-09-24 09:09:58.231: [ USRTHRD][1093232960] {0:0:2} Arp::sSend: sending type 1
2010-09-24 09:09:58.231: [ USRTHRD][1093232960] {0:0:2} Arp::sProbe }
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} Arp::sAnnounce {
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} Arp::sSend: sending type 1
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} Arp::sAnnounce }
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} Transitioning to Defend State
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} VipActions::startIp {
2010-09-24 09:10:04.879: [ USRTHRD][1116531008] {0:0:2} Adding 169.254.x.x on eth7:2
2010-09-24 09:10:04.880: [ USRTHRD][1116531008] {0:0:2} VipActions::startIp }
2010-09-24 09:10:04.880: [ USRTHRD][1116531008] {0:0:2} Assigned IP: 169.254.x.x on interface eth7
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2} Arp::sAnnounce {
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2} Arp::sSend: sending type 1
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2} Arp::sAnnounce }
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2} Transitioning to Defend State
2010-09-24 09:10:05.150: [ USRTHRD][1143847232] {0:0:2} VipActions::startIp {
2010-09-24 09:10:05.151: [ USRTHRD][1143847232] {0:0:2} Adding 169.254.x.x on eth1:3
2010-09-24 09:10:05.151: [ USRTHRD][1143847232] {0:0:2} VipActions::startIp }
2010-09-24 09:10:05.151: [ USRTHRD][1143847232] {0:0:2} Assigned IP: 169.254.x.x on interface eth1
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2} Arp::sAnnounce {
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2} Arp::sSend: sending type 1
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2} Arp::sAnnounce }
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2} Transitioning to Defend State
2010-09-24 09:10:05.470: [ USRTHRD][1093232960] {0:0:2} VipActions::startIp {
2010-09-24 09:10:05.471: [ USRTHRD][1093232960] {0:0:2} Adding 169.254.x.x on eth7:3
2010-09-24 09:10:05.471: [ USRTHRD][1093232960] {0:0:2} VipActions::startIp }
2010-09-24 09:10:05.471: [ USRTHRD][1093232960] {0:0:2} Assigned IP: 169.254.x.x on interface eth7
2010-09-24 09:10:06.047: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop {
2010-09-24 09:10:06.282: [ USRTHRD][1129138496] {0:0:2} [NetHAWork] thread stopping
2010-09-24 09:10:06.282: [ USRTHRD][1129138496] {0:0:2} Thread:[NetHAWork]isRunning is reset to false here
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop }
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp {
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp {
2010-09-24 09:10:06.282: [ USRTHRD][1082325312] {0:0:2} Stopping ip '169.254.x.x', inf 'eth6', mask '10.11.x.x'
2010-09-24 09:10:06.288: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp }
2010-09-24 09:10:06.288: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp }
2010-09-24 09:10:06.288: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop {
2010-09-24 09:10:06.298: [ USRTHRD][1131239744] {0:0:2} [NetHAWork] thread stopping
2010-09-24 09:10:06.298: [ USRTHRD][1131239744] {0:0:2} Thread:[NetHAWork]isRunning is reset to false here
2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop }
2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp {
2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp {
2010-09-24 09:10:06.298: [ USRTHRD][1082325312] {0:0:2} Stopping ip '169.254.x.x', inf 'eth7', mask '10.12.x.x'
2010-09-24 09:10:06.299: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp }
2010-09-24 09:10:06.299: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp }
2010-09-24 09:10:06.299: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop {
2010-09-24 09:10:06.802: [ USRTHRD][1133340992] {0:0:2} [NetHAWork] thread stopping
2010-09-24 09:10:06.802: [ USRTHRD][1133340992] {0:0:2} Thread:[NetHAWork]isRunning is reset to false here
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} Thread:[NetHAWork]stop }
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp {
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp {
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} Stopping ip '169.254.x.x', inf 'eth1', mask '10.1.x.x'
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} NetInterface::sStopIp }
2010-09-24 09:10:06.802: [ USRTHRD][1082325312] {0:0:2} VipActions::stopIp }
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[ 0 ]: eth7 - 169.254.112.x
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[ 1 ]: eth1 - 169.254.178.x
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[ 2 ]: eth7 - 169.254.244.x
2010-09-24 09:10:06.803: [ USRTHRD][1082325312] {0:0:2} USING HAIP[ 3 ]: eth1 - 169.254.30.x
Note: as shown above, even though only NIC eth6 failed, multiple virtual private IPs may move among the surviving NICs.
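The relocations are easier to follow once the "HAIP: Moving ip" lines are pulled out of orarootagent_root.log. Below is a minimal sketch that matches the line format seen in the sample above; the function name is hypothetical, and other Grid versions may word the message differently.

```python
import re

# Matches lines such as:
# 2010-09-24 09:09:58.015: [ USRTHRD][...] {0:0:2} HAIP: Moving ip '...' from inf 'eth6' to inf 'eth7'
MOVE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):.*"
    r"HAIP: Moving ip '(?P<ip>[^']+)' from inf '(?P<src>[^']+)' to inf '(?P<dst>[^']+)'"
)


def haip_moves(log_text):
    """Return (timestamp, ip, from_interface, to_interface) for each move."""
    moves = []
    for line in log_text.splitlines():
        match = MOVE_RE.search(line.strip())
        if match:
            moves.append((match["ts"], match["ip"], match["src"], match["dst"]))
    return moves


if __name__ == "__main__":
    sample = (
        "2010-09-24 09:09:58.015: [ USRTHRD][1082325312] {0:0:2} "
        "HAIP: Moving ip '169.254.x.x' from inf 'eth6' to inf 'eth7'"
    )
    for move in haip_moves(sample):
        print(move)
```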
• ocssd.log
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: [network] failed send attempt endp 0xe1b9150 [0000000000000399] { gipcEndpoint : localAddr 'udp://10.11.x.x:60169', remoteAddr '', numPend 5, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x2, usrFlags 0x4000 }, req 0x2aaab00117f0 [00000000004b0cae] { gipcSendRequest : addr 'udp://10.11.x.x:41486', data 0x2aaab0050be8, len 80, olen 0, parentEndp 0xe1b9150, ret gipcretEndpointNotAvailable (40), objFlags 0x0, reqFlags 0x2 }
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos op : sgipcnValidateSocket
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos dep : Invalid argument (22)
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos loc : address not
2010-09-24 09:09:58.314: [ GIPCNET][1089964352] gipcmodNetworkProcessSend: slos info: addr '10.11.x.x:60169', len 80, buf 0x2aaab0050be8, cookie 0x2aaab00117f0
2010-09-24 09:09:58.314: [GIPCXCPT][1089964352] gipcInternalSendSync: failed sync request, ret gipcretEndpointNotAvailable (40)
2010-09-24 09:09:58.314: [GIPCXCPT][1089964352] gipcSendSyncF [gipchaLowerInternalSend : gipchaLower.c : 755]: EXCEPTION[ ret gipcretEndpointNotAvailable (40) ] failed to send on endp 0xe1b9150 [0000000000000399] { gipcEndpoint : localAddr 'udp://10.11.x.x:60169', remoteAddr '', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, flags 0x2, usrFlags 0x4000 }, addr 0xe4e6d10 [00000000000007ed] { gipcAddress : name 'udp://10.11.x.x:41486', objFlags 0x0, addrFlags 0x1 }, buf 0x2aaab0050be8, len 80, flags 0x0
2010-09-24 09:09:58.314: [GIPCHGEN][1089964352] gipchaInterfaceFail: marking interface failing 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.x.x:41486', subnet '10.11.0.128', mask '255.255.255.128', numRef 0, numFail 0, flags 0x6 }
2010-09-24 09:09:58.314: [GIPCHALO][1089964352] gipchaLowerInternalSend: failed to initiate send on interface 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.x.x:41486', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x86 }, hctx 0xde81d10 [0000000000000010] { gipchaContext : host '<node1>', name 'CSS_a2b2', luid '4f06f2aa-00000000', numNode 1, numInf 3, usrFlags 0x0, flags 0x7 }
2010-09-24 09:09:58.326: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0x2aaaac2098e0 { host '', haName 'CSS_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 1, flags 0x14d }
2010-09-24 09:09:58.326: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.x.x:41486', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x86 }
2010-09-24 09:09:58.327: [GIPCHALO][1089964352] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.x.x:41486', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0xa6 }
2010-09-24 09:09:58.327: [GIPCHGEN][1089964352] gipchaInterfaceReset: resetting interface 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2098e0, ip '10.11.x.x:41486', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0xa6 }
2010-09-24 09:09:58.338: [GIPCHDEM][1089964352] gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x2aaaac2098e0 { host '', haName 'CSS_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x16d }
2010-09-24 09:09:58.338: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created remote interface for node '<node2>', haName 'CSS_a2b2', inf 'udp://10.11.x.x:41486'
2010-09-24 09:09:58.338: [GIPCHGEN][1089964352] gipchaWorkerAttachInterface: Interface attached inf 0xe2bd5f0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaaac2014f0, ip '10.11.x.x:41486', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x6 }
2010-09-24 09:10:00.454: [ CSSD][1108904256]clssnmSendingThread: sending status msg to all nodes
Note: as shown above, ocssd.bin won't fail as long as at least one private network adapter is working.
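The GIPC lines in ohasd.log and ocssd.log are long, so it can help to reduce them to the event name, host and IP. This is a sketch that covers only the three function names appearing in the samples of this document; the helper name is hypothetical.

```python
import re

EVENT_RE = re.compile(
    r"\]\s*(?P<event>gipchaInterfaceFail|gipchaInterfaceDisable|gipchaNodeAddInterface):"
    r".*?host '(?P<host>[^']*)'"
    r".*?\bip '(?P<ip>[^']*)'"
)


def gipc_interface_events(log_text):
    """Return (event, host, ip) for each matching GIPC interface line.

    An empty host string is kept as-is, as it appears in the samples
    for local interfaces.
    """
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match:
            events.append((match["event"], match["host"], match["ip"]))
    return events


if __name__ == "__main__":
    sample = (
        "2010-09-24 09:10:00.891: [GIPCHGEN][1083025728]gipchaInterfaceFail: "
        "marking interface failing 0x2aaab0269a10 { host '', haName 'CLSFRAME_a2b2', "
        "local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', "
        "numRef 0, numFail 0, flags 0x4d }"
    )
    print(gipc_interface_events(sample))
```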
L2. Log Sample When Private Network Adapter Restores
In a multiple private network adapter environment, if one of the failed adapters is restored:
• ohasd.log
2010-09-24 09:14:30.962: [GIPCHGEN][1083025728]gipchaNodeAddInterface: adding interface information for inf 0x2aaaac1a53d0 { host '', haName 'CLSFRAME_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x41 }
2010-09-24 09:14:30.972: [GIPCHTHR][1138145600]gipchaWorkerUpdateInterface: created local bootstrap interface for node '<node1>', haName 'CLSFRAME_a2b2', inf 'mcast://230.0.1.0:42424/10.11.x.x'
2010-09-24 09:14:30.972: [GIPCHTHR][1138145600]gipchaWorkerUpdateInterface: created local interface for node '<node1>', haName 'CLSFRAME_a2b2', inf '10.11.x.x:13235'
• ocssd.log
2010-09-24 09:14:30.961: [GIPCHGEN][1091541312] gipchaNodeAddInterface: adding interface information for inf 0x2aaab005af00 { host '', haName 'CSS_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x41 }
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created local bootstrap interface for node '<node1>', haName 'CSS_a2b2', inf 'mcast://230.0.1.0:42424/10.11.x.x'
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created local interface for node '<node1>', haName 'CSS_a2b2', inf '10.11.x.x:10884'
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaNodeAddInterface: adding interface information for inf 0x2aaab0035490 { host '<node2>', haName 'CSS_a2b2', local (nil), ip '10.21.x.x', subnet '10.12.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x42 }
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaNodeAddInterface: adding interface information for inf 0x2aaab00355c0 { host '<node2>', haName 'CSS_a2b2', local (nil), ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x42 }
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created remote interface for node '<node2>', haName 'CSS_a2b2', inf 'mcast://230.0.1.0:42424/10.12.x.x'
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaWorkerAttachInterface: Interface attached inf 0x2aaab0035490 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.12.x.x', subnet '10.12.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:30.972: [GIPCHTHR][1089964352] gipchaWorkerUpdateInterface: created remote interface for node '<node2>', haName 'CSS_a2b2', inf 'mcast://230.0.1.0:42424/10.11.x.x'
2010-09-24 09:14:30.972: [GIPCHGEN][1089964352] gipchaWorkerAttachInterface: Interface attached inf 0x2aaab00355c0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:31.437: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0x2aaab00355c0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:31.437: [GIPCHALO][1089964352] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x2aaab00355c0 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.11.x.x', subnet '10.11.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x66 }
2010-09-24 09:14:31.446: [GIPCHGEN][1089964352] gipchaInterfaceDisable: disabling interface 0x2aaab0035490 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.12.x.x', subnet '10.12.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x46 }
2010-09-24 09:14:31.446: [GIPCHALO][1089964352] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x2aaab0035490 { host '<node2>', haName 'CSS_a2b2', local 0x2aaab005af00, ip '10.12.x.x', subnet '10.12.x.x', mask '255.255.255.128', numRef 0, numFail 0, flags 0x66 }
Known Issues
Refer to note 1640865.1 for known HAIP issues in 11gR2/12c Grid Infrastructure
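1. Make sure a 169.254.x.x address is bound to the private NIC.
2. Make sure the address begins with 169.254.
3. Make sure there is no firewall between the private networks of the nodes.
4. Make sure the ora.cluster_interconnect.haip resource has started successfully on all nodes.
5. Once the ora.cluster_interconnect.haip resource has started on all nodes, make sure the 169.254.x.x addresses bound on the nodes can all ping each other.
Note: before the ora.cluster_interconnect.haip resource starts, the cssd process checks the health of the private network to decide whether cssd can start; at this stage the private network IP is the address configured at the operating system level. After the ora.cluster_interconnect.haip resource has started, processes such as LMON in ora.asm check the health of private network communication to decide whether the clustered ora.asm can start; at this stage the private network IP is the 169.254.x.x address. If one or more 169.254.x.x addresses are unreachable between nodes, the situation is effectively a split brain: the ASM instance can run on only some of the nodes, and where the ASM instance cannot start, neither Clusterware nor the database instances can start.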
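Items 2 and 5 of the checklist above can be prepared with a few lines of code. The sketch below only validates addresses and lists which node should ping which HAIP; it does not send any traffic. The function names, node names and addresses are hypothetical.

```python
import ipaddress

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def is_haip_address(addr):
    """True if the address lies in the 169.254.0.0/16 link-local range."""
    return ipaddress.ip_address(addr) in LINK_LOCAL


def ping_checks(haips_by_node):
    """List (source_node, target_ip) pairs covering every cross-node check."""
    checks = []
    for src in sorted(haips_by_node):
        for dst in sorted(haips_by_node):
            if src == dst:
                continue
            for ip in haips_by_node[dst]:
                checks.append((src, ip))
    return checks


if __name__ == "__main__":
    # Hypothetical HAIP addresses collected from ifconfig on each node.
    haips = {"node1": ["169.254.10.1"], "node2": ["169.254.20.2"]}
    for node, ips in haips.items():
        for ip in ips:
            print(node, ip, "link-local" if is_haip_address(ip) else "NOT link-local")
    for src, ip in ping_checks(haips):
        print(f"from {src}: ping {ip}")
```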