1. Lab environment:

Node1: 192.168.1.17 (RHEL 5.8 32-bit, web server)

Node2: 192.168.1.18 (RHEL 5.8 32-bit, web server)

SteppingStone: 192.168.1.19 (RHEL 5.8 32-bit)

VIP: 192.168.1.20


2. Preparation

<1> Configure hostnames

Node names are resolved through /etc/hosts, and each node name must match the output of uname -n.

Node1:

# hostname node1.ikki.com
# vim /etc/sysconfig/network
HOSTNAME=node1.ikki.com

Node2:

# hostname node2.ikki.com
# vim /etc/sysconfig/network
HOSTNAME=node2.ikki.com

<2> Set up key-based SSH between the nodes

Node1:

# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2

Node2:

# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1

<3> Make the nodes reachable by hostname

Node1 & Node2:

# vim /etc/hosts
192.168.1.17   node1.ikki.com node1
192.168.1.18   node2.ikki.com node2
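
Because the node name has to match uname -n and also resolve through /etc/hosts, it can be worth checking both before moving on. The following is a minimal sketch of such a check; the function name check_node_name and its optional arguments are hypothetical and not part of any tool used in this setup.

```shell
#!/bin/bash
# check_node_name EXPECTED [HOSTS_FILE] [ACTUAL]
# Succeeds when the running node name equals EXPECTED and EXPECTED
# appears as a name in the hosts file. HOSTS_FILE and ACTUAL default to
# /etc/hosts and the output of uname -n; they exist only to ease testing.
check_node_name() {
  local expected="$1"
  local hosts_file="${2:-/etc/hosts}"
  local actual="${3:-$(uname -n)}"

  if [ "$actual" != "$expected" ]; then
    echo "uname -n reports '$actual', expected '$expected'" >&2
    return 1
  fi
  # Look for the name as a whole field (after the address) on a non-comment line.
  if ! awk -v n="$expected" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$hosts_file"; then
    echo "'$expected' not found in $hosts_file" >&2
    return 1
  fi
  echo "OK: $expected"
}
```

On Node1 this would be called as check_node_name node1.ikki.com, and on Node2 with node2.ikki.com.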

<4> Configure time synchronization on each node

Node1 & Node2:

# crontab -e
*/5 * * * *     /sbin/ntpdate 202.120.2.101 &> /dev/null

<5> Configure the jump host (SteppingStone)

Set up SSH trust with Node1 and Node2, and resolve both by hostname:

# ssh-keygen -t rsa
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
# vim /etc/hosts
192.168.1.17   node1.ikki.com node1
192.168.1.18   node2.ikki.com node2

Create the step helper script, which runs the same command on both nodes over SSH:

# vim step
#!/bin/bash
if [ $# -eq 1 ]; then
  for I in {1..2}; do
    ssh node$I $1;
  done
else
  echo "Usage:step 'COMMANDs'"
fi
# chmod +x step
# mv step /usr/sbin
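
The step script above does not report which node produced which output, and it ignores failures. As a rough sketch of one possible variant (not the script used in the rest of this article), the function below labels the output per node and returns non-zero if the command fails on any node. The name step_all and the STEP_SSH variable are assumptions introduced here for illustration; STEP_SSH only exists so the function can be exercised without real SSH access.

```shell
#!/bin/bash
# step_all 'COMMANDs'
# Hypothetical variant of the step helper: runs the command on node1 and
# node2, labels the output, and returns 1 if any node reports a failure.
# STEP_SSH is an illustrative override; it defaults to plain ssh.
step_all() {
  if [ $# -ne 1 ]; then
    echo "Usage: step_all 'COMMANDs'" >&2
    return 2
  fi
  local cmd="$1" rc=0 node
  for node in node1 node2; do
    echo "== $node =="
    if ! ${STEP_SSH:-ssh} "$node" "$cmd"; then
      echo "command failed on $node" >&2
      rc=1
    fi
  done
  return $rc
}
```

For example, step_all 'uname -n' would print each node's name under its own header.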

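<6> Provide a partition of the same size on Node1 and Node2 as the DRBD backing device

Create a 1 GB logical partition (inside an extended partition) on each node: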

# fdisk /dev/sda
n --> e --> n --> +1G --> w
# partprobe /dev/sda


3. Install the kernel module and management tools

Install the latest 8.3 release:

drbd83-8.3.15-2.el5.centos.i386.rpm

kmod-drbd83-8.3.15-3.el5.centos.i686.rpm

Run the installation remotely from SteppingStone:

# step 'yum -y --nogpgcheck localinstall drbd83-8.3.15-2.el5.centos.i386.rpm kmod-drbd83-8.3.15-3.el5.centos.i686.rpm'


4. Configure DRBD (Node1)

<1> Copy the sample file into place as the configuration file:

# cp /usr/share/doc/drbd83-8.3.15/drbd.conf  /etc

<2> Edit /etc/drbd.d/global-common.conf

global {
        usage-count no;    # disable usage statistics reporting
        # minor-count dialog-refresh disable-ip-verification
}
common {
        protocol C;    # use the synchronous protocol by default
        handlers {
                # These are EXAMPLE handlers only.
                # They may have severe implications,
                # like hard resetting the node under certain circumstances.
                # Be careful when chosing your poison.
                pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }
        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }
        disk {
                on-io-error detach;    # detach the backing device on a disk I/O error
                # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
                # no-disk-drain no-md-flushes max-bio-bvecs
        }
        net {
                # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
                # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
                # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
                cram-hmac-alg "sha1";    # HMAC algorithm used to authenticate the peer
                shared-secret "mydrbd7788";    # shared secret
        }
        syncer {
                rate 200M;    # resynchronization rate
                # rate after al-extents use-rle cpu-mask verify-alg csums-alg
        }
}

<3> Define a resource in /etc/drbd.d/mydrbd.res with the following content:

resource mydrbd {
        device  /dev/drbd0;
        disk    /dev/sda5;
        meta-disk internal;
        on node1.ikki.com {
                address 192.168.1.17:7789;
        }
        on node2.ikki.com {
                address 192.168.1.18:7789;
        }
}

Copy all of the configuration files above to the other node:

# scp -r /etc/drbd.*  node2:/etc


5. Initialize the defined resource on both nodes and start the service:

<1> Initialize the resource (Node1 and Node2):

# drbdadm create-md mydrbd

<2> Start the service (Node1 and Node2):

# /etc/init.d/drbd start

<3> Check the startup state (Node1):

# cat /proc/drbd
version: 8.3.15 (api:88/proto:86-97)
GIT-hash: 0ce4d235fc02b5c53c1c52c53433d11a694eab8c build by mockbuild@builder17.centos.org, 2013-03-27 16:04:08
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:987896
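
In the status line above, cs is the connection state, ro the roles (local/peer) and ds the disk states (local/peer); Inconsistent/Inconsistent is expected before the first synchronization. As a small sketch for scripting against this output, the hypothetical helper drbd_field below extracts one of these fields for minor 0. It reads only the "0:" line, so counters on the following line (such as oos) are out of its scope.

```shell
#!/bin/bash
# drbd_field FIELD [FILE]
# Prints the value of FIELD (cs, ro or ds) from the status line of DRBD
# minor 0. FILE defaults to /proc/drbd; the argument exists for testing.
drbd_field() {
  local field="$1" file="${2:-/proc/drbd}"
  awk -v f="$field" '
    $1 == "0:" {
      for (i = 2; i <= NF; i++) {
        if (index($i, f ":") == 1) {      # field starts with "FIELD:"
          print substr($i, length(f) + 2) # everything after the colon
          exit
        }
      }
    }' "$file"
}
```

With the output shown above, drbd_field cs prints Connected and drbd_field ds prints Inconsistent/Inconsistent.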

<4> Promote the current node to primary (Node1)

# drbdadm -- --overwrite-data-of-peer primary mydrbd

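Note: use this form only for the very first promotion, because it overwrites the peer's data with a full sync from this node.

Check the state again once the initial sync has finished: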

# drbd-overview
  0:mydrbd  Connected Primary/Secondary UpToDate/UpToDate C r-----

Note: Primary/Secondary is read as local node/peer node.
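
The initial sync takes some time, so the UpToDate/UpToDate state does not appear immediately after the promotion. A minimal sketch of a polling loop is shown below; the function name wait_uptodate and its defaults (60 checks, 5 seconds apart) are arbitrary assumptions, and it simply looks for the ds:UpToDate/UpToDate string in /proc/drbd.

```shell
#!/bin/bash
# wait_uptodate [FILE] [TRIES] [DELAY_SECONDS]
# Polls FILE (default /proc/drbd) until both disks are reported UpToDate.
# Returns 0 once in sync, 1 if the state is not reached within TRIES checks.
wait_uptodate() {
  local file="${1:-/proc/drbd}" tries="${2:-60}" delay="${3:-5}" i=0
  while [ "$i" -lt "$tries" ]; do
    if grep -q 'ds:UpToDate/UpToDate' "$file"; then
      echo "in sync"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "still not in sync after $tries checks" >&2
  return 1
}
```

Running wait_uptodate on Node1 before creating the filesystem is one way to avoid formatting while the device is still syncing, although DRBD itself does not require waiting.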


6. Create the filesystem and mount it (primary node, Node1)

A DRBD device can only be mounted on the Primary node, so format the DRBD device there:

# mke2fs -j /dev/drbd0
# mkdir /mydata
# mount /dev/drbd0 /mydata


7. Switch the primary and secondary roles to test

Node1:

# cp /etc/inittab /mydata
# umount /mydata
# drbdadm secondary mydrbd
# drbd-overview
  0:mydrbd  Connected Secondary/Secondary UpToDate/UpToDate C r-----

Node2:

# drbdadm primary mydrbd
# drbd-overview
  0:mydrbd  Connected Primary/Secondary UpToDate/UpToDate C r-----
# mkdir /mydata
# mount /dev/drbd0 /mydata
# ls /mydata


8. Configure openais/corosync + pacemaker

<1> Install corosync and pacemaker (SteppingStone)

# cd /root/corosync/
# ls
cluster-glue-1.0.6-1.6.el5.i386.rpm
cluster-glue-libs-1.0.6-1.6.el5.i386.rpm
corosync-1.2.7-1.1.el5.i386.rpm
corosynclib-1.2.7-1.1.el5.i386.rpm
heartbeat-3.0.3-2.3.el5.i386.rpm
heartbeat-libs-3.0.3-2.3.el5.i386.rpm
libesmtp-1.0.4-5.el5.i386.rpm
pacemaker-1.1.5-1.1.el5.i386.rpm
pacemaker-libs-1.1.5-1.1.el5.i386.rpm
resource-agents-1.0.4-1.1.el5.i386.rpm
# step 'mkdir /root/corosync'
# for I in {1..2};do scp *.rpm node$I:/root/corosync;done
# step 'yum -y --nogpgcheck localinstall /root/corosync/*.rpm'
# step 'mkdir /var/log/cluster'

<2> Edit the corosync configuration and generate the authentication key (Node1)

# cd /etc/corosync/
# cp corosync.conf.example corosync.conf
# vim corosync.conf
# Change the following settings:
secauth: on
threads: 2
bindnetaddr: 192.168.1.0
to_syslog: no
# vim corosync.conf
# Add the following:
service {
        ver:    0
        name:   pacemaker
}
aisexec {
        user:   root
        group:  root
}
# corosync-keygen
# scp -p authkey corosync.conf node2:/etc/corosync/

<3> Start the service and check it (Node1)

# service corosync start
# ssh node2 'service corosync start'
# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
# grep  TOTEM  /var/log/cluster/corosync.log
# grep pcmk_startup /var/log/cluster/corosync.log
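
The three grep commands above look for the engine start-up message, the TOTEM membership messages, and the pacemaker plugin start-up. As a sketch, they can be folded into one check that reports which marker is missing; the function name check_corosync_log is hypothetical, and the markers are exactly the strings used in the commands above.

```shell
#!/bin/bash
# check_corosync_log [LOGFILE]
# Reports whether each start-up marker appears in the corosync log.
# Returns 0 if all markers are present, 1 if any is missing.
check_corosync_log() {
  local log="${1:-/var/log/cluster/corosync.log}" marker missing=0
  for marker in "Corosync Cluster Engine" "TOTEM" "pcmk_startup"; do
    if grep -q -- "$marker" "$log"; then
      echo "found:   $marker"
    else
      echo "missing: $marker"
      missing=1
    fi
  done
  return $missing
}
```

A missing marker only says that the string was not logged; the log itself still has to be read to find the cause.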

<4> Configure cluster properties

Disable STONITH, ignore loss of quorum, and set the default resource stickiness:

# crm configure property stonith-enabled=false
# crm configure property no-quorum-policy=ignore
# crm configure rsc_defaults resource-stickiness=100

Show the cluster configuration:

# crm configure show
node node1.ikki.com
node node2.ikki.com
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"


9. Define the configured DRBD device /dev/drbd0 as a cluster service

<1> Stop the drbd service and disable it at boot (Node1 and Node2)

# service drbd stop
# chkconfig drbd off

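<2> Configure DRBD as a cluster resource (Node1)

Add the mysqldrbd primitive for the DRBD resource mydrbd and make it a master/slave resource: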

# crm configure primitive mysqldrbd ocf:linbit:drbd params drbd_resource=mydrbd op start timeout=240 op stop timeout=100 op monitor role=Master interval=20 timeout=30 op monitor role=Slave interval=30 timeout=30
# crm configure ms ms_mysqldrbd mysqldrbd meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

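Note: the cluster resource must not share a name with the DRBD resource. If crm status reports errors, check the configuration and then restart the corosync service.

Check the current cluster state: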

# crm status    
============
Last updated: Sat Sep 21 23:27:01 2013
Stack: openais
Current DC: node1.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
1 Resources configured.
============
Online: [ node2.ikki.com node1.ikki.com ]
 Master/Slave Set: ms_mysqldrbd [mysqldrbd]
     Masters: [ node1.ikki.com ]
     Slaves: [ node2.ikki.com ]

<3> Create a cluster resource that mounts the mydrbd device on the master node (Node1)

# crm configure primitive mystore ocf:heartbeat:Filesystem params device=/dev/drbd0 directory=/mydata fstype=ext3 op start timeout=60 op stop timeout=60
# crm configure colocation mystore_with_ms_mysqldrbd inf: mystore ms_mysqldrbd:Master
# crm configure order mystore_after_ms_mysqldrbd mandatory: ms_mysqldrbd:promote mystore:start

Check the state of the resources:

# crm status
============
Last updated: Sat Sep 21 23:55:01 2013
Stack: openais
Current DC: node1.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node2.ikki.com node1.ikki.com ]
 Master/Slave Set: ms_mysqldrbd [mysqldrbd]
     Masters: [ node1.ikki.com ]
     Slaves: [ node2.ikki.com ]
 mystore        (ocf::heartbeat:Filesystem):    Started node1.ikki.com

<4> Simulate a failure to test

Put node1 into standby; the resources move to node2:

# crm node standby
# crm status
============
Last updated: Sat Sep 21 23:59:38 2013
Stack: openais
Current DC: node1.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Node node1.ikki.com: standby
Online: [ node2.ikki.com ]
 Master/Slave Set: ms_mysqldrbd [mysqldrbd]
     Masters: [ node2.ikki.com ]
     Stopped: [ mysqldrbd:0 ]
 mystore        (ocf::heartbeat:Filesystem):    Started node2.ikki.com
# ls /mydata/
inittab  lost+found

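Bring node1 back online. node2 remains the master, because the resource-stickiness=100 default keeps resources where they are already running: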

# crm node online
# crm status
============
Last updated: Sat Sep 21 23:59:59 2013
Stack: openais
Current DC: node1.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ node2.ikki.com node1.ikki.com ]
 Master/Slave Set: ms_mysqldrbd [mysqldrbd]
     Masters: [ node2.ikki.com ]
     Slaves: [ node1.ikki.com ]
 mystore        (ocf::heartbeat:Filesystem):    Started node2.ikki.com
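
When repeating the standby/online test, it can help to pull the current master out of the crm status output instead of reading the whole report. The sketch below assumes the output format shown above (a "Masters: [ ... ]" line); the function name current_master is hypothetical.

```shell
#!/bin/bash
# current_master
# Reads `crm status` output on stdin and prints the node name(s) listed
# on the "Masters: [ ... ]" line, separated by single spaces.
current_master() {
  awk '$1 == "Masters:" {
         # Fields 3 .. NF-1 are the node names between "[" and "]".
         for (i = 3; i < NF; i++)
           printf "%s%s", $i, (i < NF - 1 ? " " : "\n")
       }'
}
```

Usage would be crm status | current_master, which for the output above prints node2.ikki.com.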


10. Configure the highly available MySQL service

<1> Install MySQL on each node (SteppingStone)

The generic binary distribution of mysql-5.5.28 is used here:

# for I in {1..2};do scp mysql-5.5.28-linux2.6-i686.tar.gz node$I:/usr/src/;done
# step 'tar -xf /usr/src/mysql-5.5.28-linux2.6-i686.tar.gz -C /usr/local'
# step 'ln -sv /usr/local/mysql-5.5.28-linux2.6-i686 /usr/local/mysql'
# step 'groupadd -g 3306 mysql'      
# step 'useradd -u 3306 -g mysql -s /sbin/nologin -M mysql'
# step 'mkdir /mydata/data'
# step 'chown -R mysql.mysql /mydata/data'
# step 'chown -R root.mysql /usr/local/mysql/*'
# step 'cp /usr/local/mysql/support-files/my-large.cnf /etc/my.cnf'
# step 'cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysqld'
# step 'chkconfig --add mysqld'

<2> Initialize MySQL on the master node and test that it starts (Node2)

# cd /usr/local/mysql
# scripts/mysql_install_db --user=mysql --datadir=/mydata/data
# vim /etc/my.cnf
Add the following under [mysqld]:
datadir = /mydata/data
# service mysqld start
# service mysqld stop
# chkconfig mysqld off

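<3> Make Node1 the master and configure MySQL there (no need to initialize again)

Put node2 into standby so that the resources move to node1, then bring node2 back online (run on Node2):

# crm node standby
# crm node online

Configure MySQL on node1 and test that it starts: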

# vim /etc/my.cnf
Add the following under [mysqld]:
datadir = /mydata/data
# service mysqld start
# service mysqld stop
# chkconfig mysqld off

<4> Configure the mysqld and vip resources (Node1)

# crm configure primitive mysqld lsb:mysqld
# crm configure colocation mysqld_with_mystore inf: mysqld mystore
# crm configure order mysqld_after_mystore mandatory: mystore mysqld
# crm configure primitive vip ocf:heartbeat:IPaddr params ip=192.168.1.20 nic=eth0 cidr_netmask=24
# crm configure colocation vip_with_ms_mysqldrbd inf: ms_mysqldrbd:Master vip

Check the state of the resources:

# crm status
============
Last updated: Sun Sep 22 13:03:27 2013
Stack: openais
Current DC: node1.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Online: [ node2.ikki.com node1.ikki.com ]
 Master/Slave Set: ms_mysqldrbd [mysqldrbd]
     Masters: [ node1.ikki.com ]
     Slaves: [ node2.ikki.com ]
 mystore        (ocf::heartbeat:Filesystem):    Started node1.ikki.com
 mysqld (lsb:mysqld):   Started node1.ikki.com
 vip    (ocf::heartbeat:IPaddr):        Started node1.ikki.com

Show the cluster configuration:

# crm configure show
node node1.ikki.com \
        attributes standby="off"
node node2.ikki.com \
        attributes standby="off"
primitive mysqld lsb:mysqld
primitive mysqldrbd ocf:linbit:drbd \
        params drbd_resource="mydrbd" \
        op start interval="0" timeout="240" \
        op stop interval="0" timeout="100" \
        op monitor interval="20" role="Master" timeout="30" \
        op monitor interval="30" role="Slave" timeout="30"
primitive mystore ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/mydata" fstype="ext3" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60"
primitive vip ocf:heartbeat:IPaddr \
        params ip="192.168.1.20" nic="eth0" cidr_netmask="24"
ms ms_mysqldrbd mysqldrbd \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation mysqld_with_mystore inf: mysqld mystore
colocation mystore_with_ms_mysqldrbd inf: mystore ms_mysqldrbd:Master
colocation vip_with_ms_mysqldrbd inf: ms_mysqldrbd:Master vip
order mysqld_after_mystore inf: mystore mysqld
order mystore_after_ms_mysqldrbd inf: ms_mysqldrbd:promote mystore:start
property $id="cib-bootstrap-options" \
        dc-version="1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="false" \
        no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
        resource-stickiness="100"


11. Simulated failure test

Create a MySQL account for remote access on the master node (Node1)

# /usr/local/mysql/bin/mysql
mysql> grant all on *.* to root@'%' identified by 'ikki';
mysql> flush privileges;

Test remote access from the jump host (SteppingStone)

# mysql -uroot -h192.168.1.20 -p

Put node1 into standby and check the cluster state (Node1)

# crm node standby
# crm status
============
Last updated: Sun Sep 22 13:47:00 2013
Stack: openais
Current DC: node1.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Node node1.ikki.com: standby
Online: [ node2.ikki.com ]
 Master/Slave Set: ms_mysqldrbd [mysqldrbd]
     Masters: [ node2.ikki.com ]
     Stopped: [ mysqldrbd:0 ]
 mystore        (ocf::heartbeat:Filesystem):    Started node2.ikki.com
 mysqld (lsb:mysqld):   Started node2.ikki.com
 vip    (ocf::heartbeat:IPaddr):        Started node2.ikki.com

Test remote access again from the jump host (SteppingStone)

# mysql -uroot -h192.168.1.20 -p
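
Right after a failover the VIP and mysqld may need a few seconds to come up on the other node, so a single connection attempt can fail even when the cluster is healthy. The sketch below is a generic retry wrapper; the name retry and the attempt counts are assumptions made for illustration, and it makes no claim about how long a failover takes in this setup.

```shell
#!/bin/bash
# retry TRIES DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, at most TRIES times, sleeping
# DELAY_SECONDS between attempts. Returns 0 on the first success.
retry() {
  local tries="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$tries" ]; do
    if "$@"; then
      echo "succeeded on attempt $i"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "failed after $tries attempts" >&2
  return 1
}
```

From SteppingStone it could wrap a non-interactive client call, for example retry 10 3 mysql -uroot -h192.168.1.20 -pikki -e 'SELECT 1', using the account created earlier; putting the password on the command line is only acceptable in a lab like this one.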

Bring node1 back online and check the cluster state (Node1)

# crm node online
# crm status
============
Last updated: Sun Sep 22 13:52:09 2013
Stack: openais
Current DC: node1.ikki.com - partition with quorum
Version: 1.1.5-1.1.el5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, 2 expected votes
4 Resources configured.
============
Online: [ node2.ikki.com node1.ikki.com ]
 Master/Slave Set: ms_mysqldrbd [mysqldrbd]
     Masters: [ node2.ikki.com ]
     Slaves: [ node1.ikki.com ]
 mystore        (ocf::heartbeat:Filesystem):    Started node2.ikki.com
 mysqld (lsb:mysqld):   Started node2.ikki.com
 vip    (ocf::heartbeat:IPaddr):        Started node2.ikki.com