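Oracle cluster heartbeat mechanisms:
How does an Oracle cluster maintain consistency? Cluster consistency means that every member knows the state of the other members, and that all members hold the same view of node states and of the cluster membership list. This is the most basic requirement of a cluster.
Oracle uses three mechanisms to achieve cluster consistency:
Network heartbeat: determines connectivity between nodes so that nodes can learn each other's state.
Disk heartbeat: uses one or more shared locations to store connectivity information between nodes, so that when the cluster needs to be reconfigured it can make the correct decision and record the latest cluster state.
Local heartbeat: a self-monitoring mechanism on the local node, so that a node with a problem can leave the cluster on its own initiative and avoid inconsistency.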
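I: Network heartbeat
Every second, the ocssd process sends a network heartbeat to the other nodes in the cluster over the private network.
For example, in a 4-node cluster, each node sends a network heartbeat to the other three nodes every second, which also means each node receives the heartbeats of the other nodes every second. Because nodes exchange heartbeats with each other, a mechanism is needed to determine connectivity between nodes and to handle network heartbeat failures.
The network heartbeat is implemented mainly by the following ocssd.bin threads:
Sending thread: sends a network heartbeat to all nodes in the cluster every second.
Analysis thread: analyzes and processes the received network heartbeats; if a node keeps missing network heartbeats, it notifies the cluster to reconfigure.
Cluster reconfiguration thread: performs the cluster reconfiguration.
Dispatch thread: receives messages from remote nodes and forwards each one to the appropriate thread according to its type.
How it works:
1. The sending thread sends a network heartbeat to the remote nodes every second.
2. The dispatch thread receives the network heartbeats sent by remote nodes.
3. The analysis thread processes the heartbeats received by the dispatch thread and confirms node connectivity.
For example: when the analysis thread finds a connectivity problem with some nodes, that is, no network heartbeat has been seen from one or more nodes for a continuous period, the cluster is reconfigured. The result of such a reconfiguration is usually that one or more nodes leave the cluster, which is why private network communication problems between nodes damage cluster consistency.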
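The check performed by the analysis thread can be sketched as follows. This is only a minimal Python illustration, not Oracle's implementation: the function name `find_missing_nodes`, the dictionary of timestamps and the sample values are assumptions made for the example; only the 30-second default comes from the misscount value discussed in this document.

```python
def find_missing_nodes(last_heartbeat, now, misscount=30):
    """Return the nodes whose last network heartbeat is older than misscount seconds.

    last_heartbeat maps a node name to the time (in seconds) at which the
    most recent heartbeat from that node was received.
    """
    return sorted(
        node for node, seen in last_heartbeat.items()
        if now - seen > misscount
    )


# Hypothetical 4-node cluster as seen from node1 at t = 100 s.
last_seen = {"node2": 99, "node3": 98, "node4": 65}
missing = find_missing_nodes(last_seen, now=100)
print(missing)  # node4 has been silent for 35 s, longer than misscount
```

In this toy model a non-empty result is what would trigger a reconfiguration.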
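II: Disk heartbeat
The main purpose of the disk heartbeat is to help determine how to resolve a split-brain when one occurs.
Explanation
Every second, each node of an Oracle cluster registers its own disk heartbeat in all of the cluster's voting files (VF), and also writes the list of other nodes it can reach. When a split-brain occurs, the CSS reconfiguration thread uses the information in the voting files to learn the connectivity between nodes, and from that determines how many sub-clusters the cluster has split into, which nodes each sub-cluster contains, and the state of each node.
Examples:
A two-node cluster (node1, node2) is configured with three VFs (VF1, VF2, VF3). node1 cannot access VF1 and node2 cannot access VF2, so both nodes can still access VF3 and the cluster can still use VF3 to decide which node stays. By contrast, if a node cannot access a majority of the VFs ([VF/2]+1), then when VF information is needed to decide which nodes stay, there may be no VF that all the nodes involved can access, and it becomes impossible to decide which nodes should leave the cluster and which should be kept.
A two-node cluster (node1, node2) is configured with three VFs (VF1, VF2, VF3). node1 cannot access VF1 or VF2, and node2 cannot access VF3. When a network problem occurs, the cluster cannot obtain a consistent view of all node states from the VFs and therefore cannot complete the reconfiguration. This is why each node is required to access at least [VF/2]+1 VFs: under that rule, any two nodes are guaranteed to share at least one VF that both can access.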
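The [VF/2]+1 rule can be checked with a small brute-force sketch. This is an illustrative Python model only: the function names are made up for the example, and voting files are represented simply as integers.

```python
from itertools import combinations


def required_voting_disks(total):
    """Minimum number of voting files a node must access: floor(total / 2) + 1."""
    return total // 2 + 1


def any_two_nodes_share_a_disk(total):
    """Brute-force check: every pair of majority-sized access sets overlaps."""
    disks = range(total)
    need = required_voting_disks(total)
    majorities = [set(c) for k in range(need, total + 1)
                  for c in combinations(disks, k)]
    return all(a & b for a in majorities for b in majorities)


for total in (1, 3, 5):
    print(total, required_voting_disks(total), any_two_nodes_share_a_disk(total))
```

The second example above (node1 sees only VF3, node2 sees VF1 and VF2) fails precisely because node1 holds fewer than `required_voting_disks(3)` files, so the two access sets do not overlap.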
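III: Local heartbeat
The local heartbeat monitors the ocssd.bin process and the state of the local node.
cssdagent and cssdmonitor monitor the local ocssd.bin process and the local node. Monitoring of ocssd.bin is done through the local heartbeat: every second, at the same time as the network heartbeat is sent, Oracle sends the status of the local ocssd.bin process (the local heartbeat) to cssdagent and cssdmonitor. As long as the local heartbeat is healthy, cssdagent considers ocssd.bin to be working normally. If ocssd.bin keeps missing local heartbeats until the misscount time is reached, cssdagent concludes that the local ocssd.bin process has a problem and restarts the node.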
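The consecutive-miss behaviour described above can be modelled with a simple counter. The class below is a hypothetical Python sketch, not how cssdagent is actually written; the class name, the `tick` method and the shortened misscount of 3 are assumptions chosen to keep the example small.

```python
class LocalHeartbeatMonitor:
    """Toy model of an agent watching a once-per-second local heartbeat."""

    def __init__(self, misscount=30):
        self.misscount = misscount
        self.missed = 0  # consecutive seconds without a heartbeat

    def tick(self, heartbeat_received):
        """Process one second; return True when the node should be restarted."""
        if heartbeat_received:
            self.missed = 0
        else:
            self.missed += 1
        return self.missed >= self.misscount


monitor = LocalHeartbeatMonitor(misscount=3)
decisions = [monitor.tick(hb)
             for hb in (True, False, False, True, False, False, False)]
print(decisions)
```

A single received heartbeat resets the counter, so only an uninterrupted run of misses reaches the threshold.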
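Split-brain:
The cluster's network heartbeat is lost but the disk heartbeat is still normal. When a split-brain occurs, the cluster splits into several sub-clusters, and the cluster must be reconfigured.
Basic reconfiguration rule: the sub-cluster with the most nodes survives; if the sub-clusters have the same number of nodes, the sub-cluster containing the lowest-numbered node survives.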
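The reconfiguration rule can be written as a single selection step. The following Python sketch is an illustration under simple assumptions: sub-clusters are non-empty sets of integer node numbers, and the function name `surviving_subcluster` is invented for the example.

```python
def surviving_subcluster(subclusters):
    """Pick the survivor: the most nodes wins; on a tie, the group
    that contains the lowest node number wins."""
    return max(subclusters, key=lambda group: (len(group), -min(group)))


# Node 1 is isolated from nodes 2 and 3: the larger group survives.
print(surviving_subcluster([{1}, {2, 3}]))
# Two groups of equal size: the group containing node 1 survives.
print(surviving_subcluster([{1, 4}, {2, 3}]))
```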
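IV: Querying and setting the network heartbeat misscount and the disk heartbeat disktimeout
misscount: defines the timeout of the cluster network heartbeat; the default is 30 seconds. When one or more nodes keep missing network heartbeats for longer than misscount, the cluster must be reconfigured and one or more nodes must leave the cluster. In 11gR2 clusters this value is also the local heartbeat timeout, because the local heartbeat and the network heartbeat are sent by the same thread.
Query the network heartbeat (NHB) misscount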
[root@node1 ~]# crsctl get css misscount;
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
Query the disk heartbeat (DHB) disktimeout
[root@node1 ~]# crsctl get css disktimeout;
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.
Change the network heartbeat (NHB) misscount
[root@node1 ~]# crsctl set css misscount 50;
CRS-4684: Successful set of parameter misscount to 50 for Cluster Synchronization Services.
[root@node1 ~]# crsctl set css misscount 30;
CRS-4684: Successful set of parameter misscount to 30 for Cluster Synchronization Services.
Change the disk heartbeat disktimeout
crsctl set css disktimeout 300
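When these values are collected by a script, the CRS-4678 line shown above can be parsed with a regular expression. This is a small Python sketch that assumes the message has exactly the wording shown in this document; the wording may differ between versions, and the function name `parse_css_get` is made up for the example.

```python
import re

# Pattern for the CRS-4678 message shown above, e.g.
# "CRS-4678: Successful get misscount 30 for Cluster Synchronization Services."
CRS_GET_PATTERN = re.compile(r"CRS-4678: Successful get (\w+) (\d+) for")


def parse_css_get(output):
    """Return (parameter, value_in_seconds) from `crsctl get css ...` output,
    or None if the expected message is not found."""
    match = CRS_GET_PATTERN.search(output)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


sample = "CRS-4678: Successful get misscount 30 for Cluster Synchronization Services."
print(parse_css_get(sample))
```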
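-----Can the Oracle interconnect (heartbeat) use a direct cable connection?
1. What the RAC interconnect does:
It detects the network health between cluster nodes and is also used for cache synchronization and global resource maintenance. With Cache Fusion it also carries data blocks between instances, so interconnect traffic is fairly heavy; Gigabit Ethernet is typical, and 10 Gigabit is better.
2. Can the RAC interconnect use a crossover (direct) cable?
A direct cable limits RAC to two nodes. It is also unstable, and Oracle does not provide technical support for bugs and technical problems caused by it.
See Oracle's official explanation,
as described in RAC: Frequently Asked Questions [ID 220970.1]: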
Is crossover cable supported as an interconnect with RAC on any platform ?
NO. CROSS OVER CABLES ARE NOT SUPPORTED. The requirement is to use a switch:
Detailed Reasons:
1) cross-cabling limits the expansion of RAC to two nodes
2) cross-cabling is unstable:
a) Some NIC cards do not work properly with it. They are not able to negotiate the DTE/DCE clocking, and will thus not function. These NICS were made cheaper by assuming that the switch was going to have the clock. Unfortunately there is no way to know which NICs do not have that clock.
b) Media sense behaviour on various OS's (most notably Windows) will bring a NIC down when a cable is disconnected. Either of these issues can lead to cluster instability and lead to ORA-29740 errors (node evictions).
Due to the benefits and stability provided by a switch, and their affordability ($200 for a simple 16 port GigE switch), and the expense and time related to dealing with issues when one does not exist, this is the only supported configuration.
From a purely technology point of view Oracle does not care if the customer uses cross over cable or router or switches to deliver a message. However, we know from experience that a lot of adapters misbehave when used in a crossover configuration and cause a lot of problems for RAC. Hence we have stated on certify that we do not support crossover cables to avoid false bugs and finger pointing amongst the various parties: Oracle, Hardware vendors, Os vendors etc...
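3. High availability of the RAC interconnect
High availability for the RAC interconnect can be achieved with NIC bonding at the operating system level. The common bonding modes are load balancing and active-backup. Load balancing can in theory provide twice the bandwidth (in practice it does not reach that, it is only somewhat faster), but from a reliability point of view active-backup mode is recommended. In active-backup mode, when one network interface fails (for example, the primary switch loses power), the network is not interrupted: the system works through the NICs in the order specified in /etc/rc.d/rc.local, the machine keeps serving requests, and failure protection is achieved.
Supplementary material:
Linux bond mode parameters (mode=4 is recommended when the switch supports LACP, as it provides better performance and stability):
0 - Round-robin mode: traffic is divided evenly across the bonded NICs using a round-robin algorithm.
1 - High-availability (active-backup) mode: only one NIC is used at run time and the others act as backups; recommended when the load does not exceed the bandwidth or capacity of a single NIC.
2 - Hash-based load-balancing mode: traffic is distributed by a hash computed according to the xmit_hash_policy setting, so that traffic from the same source is handled on the same NIC as far as possible.
3 - Broadcast mode: all bonded NICs transmit the same data; generally used for very special network requirements, such as sending the same data to two switches that are not connected to each other.
4 - 802.3ad load-balancing mode: requires the switch to support 802.3ad as well; in theory, when both the server and the switch support this mode, NIC bandwidth can at most double (for example, from 1 Gbps to 2 Gbps).
5 - Adapter transmit load-balancing mode: outgoing data is sent through all bonded NICs, while incoming data is received on only one selected NIC. If the receiving NIC fails, another NIC takes over. The NICs and their drivers must be able to report speed information through ethtool.
6 - Adapter transmit/receive load-balancing mode: on top of "mode 5", received traffic is load balanced as well; in addition to reporting speed information through ethtool, the NICs must support changing their MAC address dynamically.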
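The mode numbers above correspond to the names used by the Linux bonding driver. The Python sketch below maps one to the other and builds an option string; the helper name `bonding_options` and the `miimon` value of 100 ms are illustrative assumptions, not settings taken from this document.

```python
# Linux bonding driver mode numbers and their names.
BOND_MODES = {
    0: "balance-rr",
    1: "active-backup",
    2: "balance-xor",
    3: "broadcast",
    4: "802.3ad",
    5: "balance-tlb",
    6: "balance-alb",
}


def bonding_options(mode, miimon_ms=100):
    """Build a bonding option string such as 'mode=active-backup miimon=100'.

    miimon_ms is the link monitoring interval; 100 is only an example value.
    """
    if mode not in BOND_MODES:
        raise ValueError(f"unknown bonding mode: {mode}")
    return f"mode={BOND_MODES[mode]} miimon={miimon_ms}"


print(bonding_options(1))
```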
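4. Feasibility of dual interconnects for RAC
When the RAC interconnect uses NIC bonding, there is a single private address belonging to one VLAN, in active-backup mode, with the two cables connected to two different switches. This can be done entirely at the operating system level. If the RAC interconnect uses two private VLANs instead, the interconnect has two private addresses, and load balancing or failover between them is handled by Oracle itself (no bonding is done at the operating system level). Oracle supports this from 11.2.0.2 onward in 11g R2. Because the HAIP feature had bugs when it was first released, version 11.2.0.4 is recommended as more stable. The official example targets multiple database instances with high interconnect bandwidth requirements.
For the official description see http://docs.oracle.com/database/121/RACAD/admin.htm#RACAD7295
Document ID 1210883.1 describes HAIP in detail; its description of HAIP is as follows: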
Redundant Interconnect without any 3rd-party IP failover technology (bond, IPMP or similar) is supported natively by Grid Infrastructure starting from 11.2.0.2. Multiple private network adapters can be defined either during the installation phase or afterward using the oifcfg. Oracle Database, CSS, OCR, CRS, CTSS, and EVM components in 11.2.0.2 employ it automatically.
Grid Infrastructure can activate a maximum of four private network adapters at a time even if more are defined. The ora.cluster_interconnect.haip resource will start one to four link local HAIP on private network adapters for interconnect communication for Oracle RAC, Oracle ASM, and Oracle ACFS etc.
Grid automatically picks free link local addresses from reserved 169.254.*.* subnet for HAIP. According to RFC-3927, link local subnet 169.254.*.* should not be used for any other purpose. With HAIP, by default, interconnect traffic will be load balanced across all active interconnect interfaces, and corresponding HAIP address will be failed over transparently to other adapters if one fails or becomes non-communicative.
The number of HAIP addresses is decided by how many private network adapters are active when Grid comes up on the first node in the cluster. If there's only one active private network, Grid will create one; if two, Grid will create two; and if more than two, Grid will create four HAIPs. The number of HAIPs won't change even if more private network adapters are activated later; a restart of clusterware on all nodes is required for the number to change. However, the newly activated adapters can be used for fail over purpose.
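The rule quoted above for the number of HAIP addresses can be expressed as a short function. This Python sketch simply restates the quoted rule; the function name `haip_count` is an assumption for illustration.

```python
def haip_count(active_private_adapters):
    """Number of HAIP addresses created, given how many private network
    adapters are active when Grid starts on the first node."""
    if active_private_adapters < 1:
        raise ValueError("at least one active private network adapter is required")
    if active_private_adapters <= 2:
        return active_private_adapters
    return 4  # more than two active adapters always yields four HAIPs


for adapters in (1, 2, 3, 4):
    print(adapters, haip_count(adapters))
```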
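5. Does the RAC interconnect of each business system's database need VLAN isolation?
Oracle gives no explicit official guidance. VLAN isolation can be applied to meet specific security requirements, but having many small VLANs adds some management and configuration cost.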
-------
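1. The NIC of node 1 is damaged, so node 1 cannot receive heartbeats from the other nodes.
Node 2 can receive node 3's heartbeat, and node 3 can receive node 2's heartbeat.
Node 1's heartbeat information tells the voting disk: "Only I am alive!"
The heartbeat information of nodes 2 and 3 tells the voting disk: "Nodes 2 and 3 are both alive."
A kill block is written into node 1's region of the voting disk; node 1 reads it and terminates itself.
(Rule: the part with the most nodes is kept.)
2. Node 1 can connect to voting disks 1, 2 and 3; node 2 can connect only to voting disk 3.
A kill block is written into node 2's region of the voting disk; node 2 reads it and terminates itself.
(A node survives when the number of voting disks it can access is greater than the number it cannot access; when it is smaller, the node cannot survive.)
3. In a two-node RAC, the NIC of node 1 or node 2 is damaged and the nodes cannot communicate. Node 2 is evicted.
(When the two parts of a split-brain have the same number of nodes, the node with the lower node number survives.)
4. All connections between the nodes and the voting disks are lost, but the heartbeats between the nodes are all fine. All nodes will reboot!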
-------------------------------------Oracle Clusterware Software Concepts and Requirements
Oracle Clusterware uses voting disk files to provide fencing and cluster node membership determination. OCR provides cluster configuration information. You can place the Oracle Clusterware files on either Oracle ASM or on shared common disk storage. If you configure Oracle Clusterware on storage that does not provide file redundancy, then Oracle recommends that you configure multiple locations for OCR and voting disks. The voting disks and OCR are described as follows:
· Voting Disks
Oracle Clusterware uses voting disk files to determine which nodes are members of a cluster. You can configure voting disks on Oracle ASM, or you can configure voting disks on shared storage.
If you configure voting disks on Oracle ASM, then you do not need to manually configure the voting disks. Depending on the redundancy of your disk group, an appropriate number of voting disks are created.
If you do not configure voting disks on Oracle ASM, then for high availability, Oracle recommends that you have a minimum of three voting disks on physically separate storage. This avoids having a single point of failure. If you configure a single voting disk, then you must use external mirroring to provide redundancy.
You should have at least three voting disks, unless you have a storage device, such as a disk array that provides external redundancy. Oracle recommends that you do not use more than five voting disks. The maximum number of voting disks that is supported is 15. (Note: 1, 3, or 5 are all acceptable.)
· Oracle Cluster Registry
Oracle Clusterware uses the Oracle Cluster Registry (OCR) to store and manage information about the components that Oracle Clusterware controls, such as Oracle RAC databases, listeners, virtual IP addresses (VIPs), and services and any applications. OCR stores configuration information in a series of key-value pairs in a tree structure. To ensure cluster high availability, Oracle recommends that you define multiple OCR locations. In addition:
o You can have up to five OCR locations
o Each OCR location must reside on shared storage that is accessible by all of the nodes
o You can replace a failed OCR location online if it is not the only OCR location
o You must update OCR through supported utilities
See Also:
Chapter 2, "Administering Oracle Clusterware" for more information about voting disks and OCR
Oracle Clusterware Network Configuration Concepts
Oracle Clusterware enables a dynamic Grid Infrastructure through the self-management of the network requirements for the cluster. Oracle Clusterware 11g release 2 (11.2) supports the use of dynamic host configuration protocol (DHCP)
When you are using Oracle RAC, all of the clients must be able to reach the database. This means that the VIP addresses must be resolved by the clients. This problem is solved by the addition of the Oracle Grid Naming Service (GNS) to the cluster. GNS is linked to the corporate domain name service (DNS) so that clients can easily connect to the cluster and the databases running there. Activating GNS in a cluster requires a DHCP service on the public network. (Note: it is best not to use either GNS or DHCP.)
Implementing GNS
To implement GNS, you must collaborate with your network administrator to obtain an IP address on the public network for the GNS VIP. DNS uses the GNS VIP to forward requests for access to the cluster to GNS. The network administrator must delegate a subdomain in the network to the cluster. The subdomain forwards all requests for addresses in the subdomain to the GNS VIP.
GNS and the GNS VIP run on one node in the cluster. The GNS daemon listens on the GNS VIP using port 53 for DNS requests. Oracle Clusterware manages the GNS and the GNS VIP to ensure that they are always available. If the server on which GNS is running fails, then Oracle Clusterware fails GNS over, along with the GNS VIP, to another node in the cluster.
With DHCP on the network, Oracle Clusterware obtains an IP address from the server along with other network information, such as what gateway to use, what DNS servers to use, what domain to use, and what NTP server to use. Oracle Clusterware initially obtains the necessary IP addresses during cluster configuration and it updates the Oracle Clusterware resources with the correct information obtained from the DHCP server.
Single Client Access Name (SCAN)
Oracle RAC 11g release 2 (11.2) introduces the Single Client Access Name (SCAN). The SCAN is a single name that resolves to three IP addresses in the public network. When using GNS and DHCP, Oracle Clusterware configures the VIP addresses for the SCAN name that is provided during cluster configuration.
The node VIP and the three SCAN VIPs are obtained from the DHCP server when using GNS. If a new server joins the cluster, then Oracle Clusterware dynamically obtains the required VIP address from the DHCP server, updates the cluster resource, and makes the server accessible through GNS.
Example 1-1 shows the DNS entries that delegate a domain to the cluster.
Example 1-1 DNS Entries
# Delegate to gns on mycluster
mycluster.example.com NS myclustergns.example.com
#Let the world know to go to the GNS vip
myclustergns.example.com. 10.9.8.7
See Also:
Oracle Grid Infrastructure Installation Guide for details about establishing resolution through DNS
Configuring Addresses Manually
Alternatively, you can choose manual address configuration, in which you configure the following:
· One public host name for each node.
· One VIP address for each node.
You must assign a VIP address to each node in the cluster. Each VIP address must be on the same subnet as the public IP address for the node and should be an address that is assigned a name in the DNS. Each VIP address must also be unused and unpingable from within the network before you install Oracle Clusterware.
· Up to three SCAN addresses for the entire cluster.
Note:
The SCAN must resolve to at least one address on the public network. For high availability and scalability, Oracle recommends that you configure the SCAN to resolve to three addresses.
See Also:
Your platform-specific Oracle Grid Infrastructure Installation Guide installation documentation for information about system requirements and configuring network addresses
Overview of Oracle Clusterware Platform-Specific Software Components
When Oracle Clusterware is operational, several platform-specific processes or services run on each node in the cluster. This section describes these various processes and services.
----nodeapps
----cluster resource
The Oracle Clusterware Stack
Oracle Clusterware consists of two separate stacks: an upper stack anchored by the Cluster Ready Services (CRS) daemon (crsd) and a lower stack anchored by the Oracle High Availability Services daemon (ohasd). These two stacks have several processes that facilitate cluster operations. The following sections describe these stacks in more detail: (Note: what about cssd?)
· The Cluster Ready Services Stack
· The Oracle High Availability Services Stack
The Cluster Ready Services Stack
The list in this section describes the processes that comprise CRS. The list includes components that are processes on Linux and UNIX operating systems, or services on Windows.
· Cluster Ready Services (CRS): The primary program for managing high availability operations in a cluster.
The CRS daemon (crsd) manages cluster resources based on the configuration information that is stored in OCR for each resource. This includes start, stop, monitor, and failover operations. The crsd process generates events when the status of a resource changes. When you have Oracle RAC installed, the crsd process monitors the Oracle database instance, listener, and so on, and automatically restarts these components when a failure occurs.
· Cluster Synchronization Services
The cssdagent process monitors the cluster and provides I/O fencing. This service formerly was provided by Oracle Process Monitor Daemon (oprocd), also known as OraFenceService on Windows. A cssdagent failure may result in Oracle Clusterware restarting the node. nodeapps
· Oracle ASM: Provides disk management for Oracle Clusterware and Oracle Database.
· Cluster Time Synchronization Service (CTSS): Provides time management in a cluster for Oracle Clusterware.
· Event Management (EVM): A background process that publishes events
· Oracle Notification Service (ONS): A publish and subscribe service for communicating Fast Application Notification (FAN) events.
· Oracle Agent (oraagent): Extends clusterware to support Oracle-specific requirements and complex resources. This process runs server callout scripts when FAN events occur. This process was known as RACG in Oracle Clusterware 11g release 1 (11.1).
· Oracle Root Agent (orarootagent): A specialized oraagent process that helps crsd manage resources owned by root, such as the network, and the Grid virtual IP address.
The Cluster Synchronization Service (CSS), Event Management (EVM), and Oracle Notification Services (ONS) components communicate with other cluster component layers on other nodes in the same cluster database environment. These components are also the main communication links between Oracle Database, applications, and the Oracle Clusterware high availability components. In addition, these background processes monitor and manage database operations.
css ons evmd
The Oracle High Availability Services Stack
This section describes the processes that comprise the Oracle High Availability Services stack. The list includes components that are processes on Linux and UNIX operating systems, or services on Windows.
· Cluster Logger Service (ologgerd): Receives information from all the nodes in the cluster and persists in a CHM Repository-based database. This service runs on only two nodes in a cluster.
· System Monitor Service (osysmond): The monitoring and operating system metric collection service that sends the data to the cluster logger service. This service runs on every node in a cluster.
· Grid Plug and Play (GPNPD): Provides access to the Grid Plug and Play profile, and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile. OLR?
· Grid Interprocess Communication (GIPC): A support daemon that enables Redundant Interconnect Usage.
· Multicast Domain Name Service (mDNS): Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX, and a service on Windows.
· Oracle Grid Naming Service (GNS): Handles requests sent by external DNS servers, performing name resolution for names defined by the cluster.
II. Viewing OHASD resources
Oracle High Availability Services Daemon (OHASD) :This process anchors the lower part of the Oracle Clusterware stack, which consists of processes that facilitate cluster operations.
In 11gR2, starting CRS reports that ohasd has already started. So which resources does OHASD actually contain? They can be listed with the following command:
[grid@racnode1 ~]$ crsctl stat res -init -t
---------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
---------------------------------------------------------
Cluster Resources
---------------------------------------------------------
ora.asm
1 ONLINE ONLINE racnode1 Started
ora.crsd
1 ONLINE ONLINE racnode1
ora.cssd
1 ONLINE ONLINE racnode1
ora.cssdmonitor
1 ONLINE ONLINE racnode1
ora.ctssd
1 ONLINE ONLINE racnode1 OBSERVER
ora.diskmon
1 ONLINE ONLINE racnode1
ora.drivers.acfs
1 ONLINE UNKNOWN racnode1
ora.evmd
1 ONLINE ONLINE racnode1
ora.gipcd
1 ONLINE ONLINE racnode1
ora.gpnpd
1 ONLINE ONLINE racnode1
ora.mdnsd
1 ONLINE ONLINE racnode1
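Let's look at these processes one by one:
(1) ora.asm: the ASM instance process. In 10g, the OCR and voting disk were placed on other shared devices. In 11gR2 they are placed in ASM by default. Clusterware needs to read this information at startup, so the ASM instance must be started first when the cluster starts.
(2) ora.crsd, ora.cssd and ora.evmd:
These are the three most important Clusterware processes.
In 10g, the installation requires running the root.sh script on each node. The script appends these three processes to the end of /etc/inittab so that Clusterware starts automatically every time the system boots. If EVMD or CRSD fails, the system restarts that process automatically; if CSSD fails, the system reboots immediately.
In 11gR2, only ohasd is written to /etc/inittab.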
[grid@racnode1 init.d]$ cat /etc/inittab
h1:35:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 </dev/null
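So the commands commonly used in 10g, such as /etc/init.d/init.crs, are gone. Only the /etc/init.d/init.ohasd script remains.
OCSSD: the most critical Clusterware process; if it fails, the system reboots. It provides the CSS (Cluster Synchronization Service) service. CSS monitors cluster state in real time through several heartbeat mechanisms and provides basic cluster services such as split-brain protection.
CRSD: the main process that implements high availability (HA); the service it provides is called CRS (Cluster Ready Service). Every component that needs high availability is registered in the OCR as a CRS resource during installation and configuration, and CRSD uses the contents of the OCR to decide which processes to monitor, how to monitor them, and what to do when a problem occurs. In other words, CRSD monitors the state of CRS resources and starts, stops, monitors and fails over those resources. By default, CRS automatically tries to restart a resource 5 times and gives up if it still fails.
CRS resources include GSD (Global Services Daemon), ONS (Oracle Notification Service), VIP, Database, Instance and Service. nodeapps resources are local resources.
EVMD: publishes the events generated by CRS. These events can be delivered to clients in two ways: ONS and callout scripts.
For the role of each of these three processes, see the explanation in
"Some conceptual and theoretical knowledge about RAC"
http://www.cndba.cn/Dave/article/1021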
(3)Grid Plug and Play (GPNPD):
Provides access to the Grid Plug and Play profile, and coordinates updates to the profile among the nodes of the cluster to ensure that all of the nodes have the most recent profile.
(4)Grid Interprocess Communication (GIPC):
A support daemon that enables Redundant Interconnect Usage.
(5)ora.mdns
Used by Grid Plug and Play to locate profiles in the cluster, as well as by GNS to perform name resolution. The mDNS process is a background process on Linux and UNIX, and a service on Windows.
(6)Cluster Time Synchronization Service (CTSS):
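Provides time management in a cluster for Oracle Clusterware. In the query output above, the CTSS state is OBSERVER.
In 11gR2, time synchronization for RAC can be provided in two ways: NTP or CTSS. If the installer finds that NTP is not active, the Cluster Time Synchronization Service is installed in active mode and synchronizes time across all nodes. If NTP is found to be configured, the Cluster Time Synchronization Service starts in observer mode and Oracle Clusterware does not perform active time synchronization in the cluster.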
(7)Automatic Storage Management Cluster File System (Oracle ACFS):
Oracle Automatic Storage Management Cluster File System (Oracle ACFS) is a multi-platform, scalable file system, and storage management technology that extends Oracle Automatic Storage Management (Oracle ASM) functionality to support customer files maintained outside of Oracle Database. Oracle ACFS supports many database and application files, including executables, database trace files, database alert logs, application reports, BFILEs, and configuration files. Other supported files are video, audio, text, images, engineering drawings, and other general-purpose application file data.
An Oracle ACFS file system is a layer on Oracle ASM and is configured with Oracle ASM storage, as shown in Figure 5-1. Oracle ACFS leverages Oracle ASM functionality that enables:
· Oracle ACFS dynamic file system resizing
· Maximized performance through direct access to Oracle ASM disk group storage
· Balanced distribution of Oracle ACFS across Oracle ASM disk group storage for increased I/O parallelism
· Data reliability through Oracle ASM mirroring protection mechanisms
For more information, see:
http://download.oracle.com/docs/cd/E11882_01/server.112/e16102/asmfilesystem.htm#OSTMG31000
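III. Viewing CRS resources
In 11.2, CRSD resources are classified into two groups: Local Resources and Cluster Resources. The OHASD resources shown earlier are the ones listed under the Cluster Resources heading of that output.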
[grid@racnode1 ~]$ crsctl stat res -t
---------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
---------------------------------------------------------
Local Resources
---------------------------------------------------------
ora.CRS.dg
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.DATA.dg
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.FRA.dg
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.LISTENER.lsnr
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.asm
ONLINE ONLINE racnode1 Started
ONLINE ONLINE racnode2 Started
ora.eons
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.gsd
OFFLINE OFFLINE racnode1
OFFLINE OFFLINE racnode2
ora.net1.network
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.ons
ONLINE ONLINE racnode1
ONLINE ONLINE racnode2
ora.registry.acfs
ONLINE UNKNOWN racnode1
ONLINE ONLINE racnode2
---------------------------------------------------------
Cluster Resources
---------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE racnode2
ora.oc4j
1 OFFLINE OFFLINE
ora.racdb.db
1 ONLINE ONLINE racnode1 Open
2 ONLINE ONLINE racnode2 Open
ora.racnode1.vip
1 ONLINE ONLINE racnode1
ora.racnode2.vip
1 ONLINE ONLINE racnode2
ora.scan1.vip
1 ONLINE ONLINE racnode2
[grid@racnode1 ~]$
The output above shows that 11gR2 also treats network, diskgroup, eons and asm as resources.
One more point to note: the gsd and oc4j resources are offline. The explanation is as follows:
ora.gsd is OFFLINE by default if there is no 9i database in the cluster.
ora.oc4j is OFFLINE in 11.2.0.1 as Database Workload Management (DBWLM) is unavailable. These can be ignored in 11gR2 RAC.
The resources can also be viewed with the following command:
[root@racnode1 ~]# crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora.CRS.dg ora....up.type ONLINE ONLINE racnode1
ora.DATA.dg ora....up.type ONLINE ONLINE racnode1
ora.FRA.dg ora....up.type ONLINE ONLINE racnode1
ora....ER.lsnr ora....er.type ONLINE ONLINE racnode1
ora....N1.lsnr ora....er.type ONLINE ONLINE racnode2
ora.asm ora.asm.type ONLINE ONLINE racnode1
ora.eons ora.eons.type ONLINE ONLINE racnode1
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....network ora....rk.type ONLINE ONLINE racnode1
ora.oc4j ora.oc4j.type OFFLINE OFFLINE
ora.ons ora.ons.type ONLINE ONLINE racnode1
ora.racdb.db ora....se.type ONLINE ONLINE racnode1
ora....SM1.asm application ONLINE ONLINE racnode1
ora....E1.lsnr application ONLINE ONLINE racnode1
ora....de1.gsd application OFFLINE OFFLINE
ora....de1.ons application ONLINE ONLINE racnode1
ora....de1.vip ora....t1.type ONLINE ONLINE racnode1
ora....SM2.asm application ONLINE ONLINE racnode2
ora....E2.lsnr application ONLINE ONLINE racnode2
ora....de2.gsd application OFFLINE OFFLINE
ora....de2.ons application ONLINE ONLINE racnode2
ora....de2.vip ora....t1.type ONLINE ONLINE racnode2
ora....ry.acfs ora....fs.type ONLINE ONLINE racnode2
ora.scan1.vip ora....ip.type ONLINE ONLINE racnode1
ora.scan2.vip ora....ip.type ONLINE ONLINE racnode2
[root@racnode1 ~]#
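IV. Viewing the dependencies between resources
For example, the DG resource depends on ASM, and the VIP depends on the network. These dependencies can be seen in the detailed attributes of each resource: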
[root@racnode1 ~]# crsctl stat res ora.DATA.dg -p
NAME=ora.DATA.dg
TYPE=ora.diskgroup.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=
AUTO_START=never
CHECK_INTERVAL=300
CHECK_TIMEOUT=600
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION=CRS resource type definition for ASM disk group resource
ENABLED=1
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=hard(ora.asm) pullup(ora.asm)
START_TIMEOUT=900
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(intermediate:ora.asm)
STOP_TIMEOUT=180
UPTIME_THRESHOLD=1d
USR_ORA_ENV=
USR_ORA_OPI=false
USR_ORA_STOP_MODE=
VERSION=11.2.0.1.0
[grid@racnode1 ~]$ crsctl stat res ora.racnode1.vip -p
NAME=ora.racnode1.vip
TYPE=ora.cluster_vip_net1.type
ACL=owner:root:rwx,pgrp:root:r-x,other::r--,group:oinstall:r-x,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=1
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=restore
CARDINALITY=1
CHECK_INTERVAL=1
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=vip)
DEGREE=1
DESCRIPTION=Oracle VIP resource
ENABLED=1
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=racnode1
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=favored
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=0
SCRIPT_TIMEOUT=60
SERVER_POOLS=*
START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network)
START_TIMEOUT=0
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(ora.net1.network)
STOP_TIMEOUT=0
UPTIME_THRESHOLD=1h
USR_ORA_ENV=
USR_ORA_VIP=racnode1-vip
VERSION=11.2.0.1.0
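The dependencies can also be extracted from this NAME=VALUE output programmatically. The following is a minimal Python sketch that works on text in the format shown above; the function names are invented for the example, and any modifiers inside hard(...) (such as the intermediate: prefix seen in STOP_DEPENDENCIES) are returned unchanged rather than interpreted.

```python
import re


def parse_resource_profile(text):
    """Parse NAME=VALUE lines, as printed by `crsctl stat res <name> -p`,
    into a dict. Only the first '=' on each line separates key and value."""
    profile = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            profile[key.strip()] = value.strip()
    return profile


def hard_start_dependencies(profile):
    """Return the entries listed inside hard(...) in START_DEPENDENCIES."""
    deps = profile.get("START_DEPENDENCIES", "")
    names = []
    for group in re.findall(r"hard\(([^)]*)\)", deps):
        names.extend(name.strip() for name in group.split(",") if name.strip())
    return names


# A few lines copied from the VIP resource output above.
sample = """NAME=ora.racnode1.vip
TYPE=ora.cluster_vip_net1.type
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=vip)
START_DEPENDENCIES=hard(ora.net1.network) pullup(ora.net1.network)
STOP_DEPENDENCIES=hard(ora.net1.network)"""
profile = parse_resource_profile(sample)
print(profile["NAME"], "->", hard_start_dependencies(profile))
```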