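I) How heartbeat works
How heartbeat (Linux-HA) works: heartbeat has two core parts, heartbeat monitoring and resource takeover. Heartbeat monitoring can run over network links and serial ports, and it supports redundant links. The nodes send each other messages reporting their current state. If a node receives no message from its peer within the configured time, it considers the peer failed and starts the resource takeover module, which takes over the resources or services that were running on the peer.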
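The timeout rule described above can be sketched in a few lines of shell. This is only an illustration, not heartbeat's own code: the function name peer_state and its arguments are hypothetical, the thresholds 4 and 8 mirror the warntime and deadtime values used in ha.cf later in this post, and the exact boundary handling (>=) is an assumption.

```shell
#!/bin/sh
# Illustrative sketch only: classify a peer by how long it has been silent.
# peer_state and its arguments are hypothetical names, not part of heartbeat.
peer_state() {
    last_seen=$1   # epoch seconds when the last heartbeat message arrived
    now=$2         # current time in epoch seconds
    warntime=$3    # seconds of silence before a late-heartbeat warning
    deadtime=$4    # seconds of silence before the peer is considered failed
    silence=$((now - last_seen))
    if [ "$silence" -ge "$deadtime" ]; then
        echo dead      # this is the point where resource takeover would start
    elif [ "$silence" -ge "$warntime" ]; then
        echo late
    else
        echo alive
    fi
}

peer_state 100 102 4 8   # 2 seconds of silence
peer_state 100 105 4 8   # 5 seconds of silence
peer_state 100 109 4 8   # 9 seconds of silence
```

The three calls print alive, late, and dead in that order.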
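II) Configuring heartbeat
Goal: when node 1 goes down, node 2 takes over the service immediately.
heartbeat1: 192.168.1.122
heartbeat2: 192.168.1.114
nfs: 192.168.1.122
Prerequisites:
1) Give every node a name that all other nodes can resolve; the names can be defined in /etc/hosts
2) With many nodes, keep the clocks synchronized, preferably against an NTP time server
3) Every node can reach the others over SSH with key-based authentication
Set the hostname on each node: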
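The first prerequisite can be sketched as follows. The mapping of node1 to 192.168.1.122 and node2 to 192.168.1.114 is taken from the test pages created later in this post; the helper add_host and the HOSTS_FILE variable are hypothetical names for this example. By default the script writes to a scratch file so it is safe to try; on the real nodes it would be pointed at /etc/hosts.

```shell
#!/bin/sh
# Sketch: add name resolution for both cluster nodes to a hosts file.
# HOSTS_FILE defaults to a scratch file; set HOSTS_FILE=/etc/hosts (as root)
# on the real nodes. add_host is a hypothetical helper for this example.
HOSTS_FILE=${HOSTS_FILE:-$(mktemp)}

add_host() {
    ip=$1; fqdn=$2; short=$3
    # Append the entry only if the full name is not already listed.
    if ! grep -qwF -- "$fqdn" "$HOSTS_FILE" 2>/dev/null; then
        printf '%s %s %s\n' "$ip" "$fqdn" "$short" >> "$HOSTS_FILE"
    fi
}

add_host 192.168.1.122 node1.shunzi.com node1
add_host 192.168.1.114 node2.shunzi.com node2
add_host 192.168.1.122 node1.shunzi.com node1   # repeated call changes nothing
cat "$HOSTS_FILE"
```

Running the same script on both nodes gives them identical entries, which is what heartbeat needs to match the node names in its configuration.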
[root@node1 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node1.shunzi.com
[root@node1 ~]# uname -n
node1.shunzi.com
[root@node2 ~]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=node2.shunzi.com
[root@node2 ~]# uname -n
node2.shunzi.com
Synchronize the time on each node:
ntpdate 0.centos.pool.ntp.org
Set up key-based SSH between the nodes so that no password is needed.
On node1:
ssh-keygen -t rsa -P ''
ssh-copy-id -i .ssh/id_rsa.pub root@node2.shunzi.com
On node2:
ssh-keygen -t rsa -P ''
ssh-copy-id -i .ssh/id_rsa.pub root@node1.shunzi.com
Install the required package groups and packages first:
yum groupinstall "Development tools"
yum groupinstall "Server Platform Development"
yum -y install libnet PyXML perl-TimeDate net-snmp-libs
Install the heartbeat packages; this must be done on both nodes.
heartbeat-2.1.4-12.el6.x86_64.rpm (core package)
heartbeat-debuginfo-2.1.4-12.el6.x86_64.rpm
heartbeat-devel-2.1.4-12.el6.x86_64.rpm (development components)
heartbeat-gui-2.1.4-12.el6.x86_64.rpm (graphical interface)
heartbeat-ldirectord-2.1.4-12.el6.x86_64.rpm
heartbeat-pils-2.1.4-12.el6.x86_64.rpm
heartbeat-stonith-2.1.4-12.el6.x86_64.rpm
rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm
After installation, copy the sample configuration, authentication, and resource files into place:
cp /usr/share/doc/heartbeat-2.1.4/authkeys /etc/ha.d/
cp /usr/share/doc/heartbeat-2.1.4/ha.cf /etc/ha.d/
cp /usr/share/doc/heartbeat-2.1.4/haresources /etc/ha.d
Edit the authentication file, authkeys.
A strong key can be generated with openssl:
[root@node1 ha.d]# openssl rand -hex 8    # generates a 16-character random hex string
7dd04dabdfd104cd
vim /etc/ha.d/authkeys
The file permissions must be 600:
chmod 600 authkeys
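The contents of authkeys are not shown above, so here is a minimal sketch of one way to produce the file. It assumes the sha1 method and key number 1, which is a choice made for this example rather than something stated in the post; HA_DIR is a hypothetical variable that defaults to a scratch directory and would be /etc/ha.d on the real nodes.

```shell
#!/bin/sh
# Sketch: write an authkeys file containing one random sha1 key.
# HA_DIR defaults to a scratch directory; use HA_DIR=/etc/ha.d on the nodes.
# The sha1 method and key number 1 are assumptions made for this example.
HA_DIR=${HA_DIR:-$(mktemp -d)}

# Prefer openssl as in the text; fall back to /dev/urandom if it is missing.
if command -v openssl >/dev/null 2>&1; then
    key=$(openssl rand -hex 8)
else
    key=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n')
fi

# "auth 1" selects key number 1; the next line defines that key.
printf 'auth 1\n1 sha1 %s\n' "$key" > "$HA_DIR/authkeys"
chmod 600 "$HA_DIR/authkeys"   # mode 600, as required above
ls -l "$HA_DIR/authkeys"
```

The key has to be identical on both nodes, so the file should be generated once and copied, not generated separately on each node.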
Edit the main configuration file:
[root@node1 ha.d]# vim ha.cf
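[root@node1 ha.d]# egrep -v "^$|^#" ha.cf
logfile /var/log/ha-log  --> location of the log file
deadtime 8  --> seconds of silence before the peer is declared dead
warntime 4  --> seconds of silence before a late-heartbeat warning is logged
udpport 694  --> UDP port used for heartbeat traffic
mcast eth0 225.0.0.1 694 1 0  --> multicast heartbeat: interface, group address, port, TTL, loop
auto_failback on  --> resources move back to the preferred node when it returns
node node1.shunzi.com  --> cluster node
node node2.shunzi.com  --> cluster node
ping 192.168.1.253  --> ping node used to test the network; here, the router gateway
compression bz2  --> compress heartbeat messages with bz2
compression_threshold 2  --> only compress messages larger than 2 KB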
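Before copying ha.cf around, a quick sanity check can catch obvious mistakes. The sketch below is not a heartbeat tool: check_hacf is a hypothetical helper, and the rules it applies (warntime below deadtime, at least two node lines) are common-sense checks chosen for this example. It runs against a scratch copy of the directives listed above.

```shell
#!/bin/sh
# Sketch: sanity-check the timing and node directives of an ha.cf-style file.
# check_hacf is a hypothetical helper, not something shipped with heartbeat.
check_hacf() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/ { next }       # skip comments and blank lines
        $1 == "deadtime" { dead = $2 }
        $1 == "warntime" { warn = $2 }
        $1 == "node"     { nodes++ }            # count node directives
        END {
            if (dead == "" || warn == "") { print "missing warntime or deadtime"; exit 1 }
            if (warn + 0 >= dead + 0)     { print "warntime should be below deadtime"; exit 1 }
            if (nodes < 2)                { print "need at least two node lines"; exit 1 }
            printf "ok: warntime=%s deadtime=%s nodes=%d\n", warn, dead, nodes
        }
    ' "$1"
}

# Scratch copy of the settings from this post (without the annotations).
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
logfile /var/log/ha-log
deadtime 8
warntime 4
udpport 694
mcast eth0 225.0.0.1 694 1 0
auto_failback on
node node1.shunzi.com
node node2.shunzi.com
ping 192.168.1.253
EOF
check_hacf "$CONF"
```

On the nodes the same function could be called as check_hacf /etc/ha.d/ha.cf; it exits non-zero when one of the checks fails.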
Since the cluster will provide a web service, install httpd for testing.
Setup on node1:
echo "192.168.1.122:node1" >> index.html
Start httpd:
service httpd start
Setup on node2:
[root@node2 htdocs]# echo "192.168.1.114:node2" >> index.html
[root@node2 htdocs]# curl http://192.168.1.114/index.html
192.168.1.114:node2
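Once the test succeeds, stop httpd and disable it at boot, because in a cluster the service on every node must be managed uniformly by the resource agent.
node1
service httpd stop
chkconfig httpd off
node2
service httpd stop
chkconfig httpd off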
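Configure the cluster resources
Because I am using two virtual machines, I configure an alias IP on node1 here to serve as the address in the resource record: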
ifconfig eth0:0 192.168.1.100/24 up
vim haresources
node1.shunzi.com 192.168.1.100/24/eth0 httpd
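The line above defines two resources: an IP address and the web service.
It means that requests to 192.168.1.100 are served by node1 by preference. When node1 goes down, the resources start on node2; when node1 recovers, it immediately takes the services back from node2.
Copy the modified configuration file, authentication file, and resource file to every node: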
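To make the structure of that haresources line concrete, here is a small sketch that splits it into its parts. describe_haresources is a hypothetical helper written for this post; heartbeat itself hands the address to its IPaddr script and starts httpd through the init script, as the log further down shows.

```shell
#!/bin/sh
# Sketch: split one haresources line into preferred node and resources.
# describe_haresources is a hypothetical helper for illustration only.
describe_haresources() {
    set -- $1                 # word-split the line: node first, then resources
    echo "preferred node: $1"
    shift
    for res in "$@"; do
        case $res in
            */*/*)  # address in IP/prefix/interface form
                ip=${res%%/*}
                rest=${res#*/}
                echo "virtual IP: $ip prefix=${rest%%/*} interface=${rest#*/}"
                ;;
            *)      # anything else is treated as a service name
                echo "service: $res"
                ;;
        esac
    done
}

describe_haresources "node1.shunzi.com 192.168.1.100/24/eth0 httpd"
```

Resources are listed left to right, so the address comes up before httpd starts.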
scp -p authkeys haresources ha.cf node2:/etc/ha.d/
Start the heartbeat service on node1:
[root@node1 ha.d]# service heartbeat restart
Start it on node2 as well:
ssh node2.shunzi.com 'service heartbeat start'
Check the log:
[root@node1 ha.d]# tail -f /var/log/ha-log
heartbeat[39001]: 2014/04/25_00:25:38 info: Status update for node node2.shunzi.com: status active
harc[39008]: 2014/04/25_00:25:38 info: Running /etc/ha.d/rc.d/status status
heartbeat[39001]: 2014/04/25_00:25:38 info: Link 192.168.1.253:192.168.1.253 up.
heartbeat[39001]: 2014/04/25_00:25:38 info: Status update for node 192.168.1.253: status ping
heartbeat[39001]: 2014/04/25_00:25:39 info: Comm_now_up(): updating status to active
heartbeat[39001]: 2014/04/25_00:25:39 info: Local status now set to: 'active'
heartbeat[39001]: 2014/04/25_00:25:39 info: remote resource transition completed.
heartbeat[39001]: 2014/04/25_00:25:39 info: remote resource transition completed.
heartbeat[39001]: 2014/04/25_00:25:39 info: Local Resource acquisition completed. (none)
heartbeat[39001]: 2014/04/25_00:25:40 info: node2.shunzi.com wants to go standby [foreign]
heartbeat[39001]: 2014/04/25_00:25:51 info: standby: acquire [foreign] resources from node2.shunzi.com
heartbeat[39029]: 2014/04/25_00:25:51 info: acquire local HA resources (standby).
ResourceManager[39042]: 2014/04/25_00:25:51 info: Acquiring resource group: node1.shunzi.com 192.168.1.100/24/eth0 httpd
IPaddr[39068]: 2014/04/25_00:25:51 INFO: Resource is stopped
ResourceManager[39042]: 2014/04/25_00:25:51 info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.100/24/eth0 start
IPaddr[39165]: 2014/04/25_00:25:51 INFO: Using calculated netmask for 192.168.1.100: 255.255.255.0
IPaddr[39165]: 2014/04/25_00:25:51 INFO: eval ifconfig eth0:0 192.168.1.100 netmask 255.255.255.0 broadcast 192.168.1.255
IPaddr[39136]: 2014/04/25_00:25:51 INFO: Success
ResourceManager[39042]: 2014/04/25_00:25:51 info: Running /etc/init.d/httpd start
heartbeat[39029]: 2014/04/25_00:25:51 info: local HA resource acquisition completed (standby).
heartbeat[39001]: 2014/04/25_00:25:51 info: Standby resource acquisition done [foreign].
heartbeat[39001]: 2014/04/25_00:25:51 info: Initial resource acquisition complete (auto_failback)
heartbeat[39001]: 2014/04/25_00:25:52 info: remote resource transition completed.
heartbeat[39001]: 2014/04/25_00:25:59 info: node2.shunzi.com wants to go standby [foreign]
heartbeat[39001]: 2014/04/25_00:26:10 info: standby: acquire [foreign] resources from node2.shunzi.com
heartbeat[39296]: 2014/04/25_00:26:10 info: acquire local HA resources (standby).
ResourceManager[39309]: 2014/04/25_00:26:10 info: Acquiring resource group: node1.shunzi.com 192.168.1.100/24/eth0 httpd
IPaddr[39335]: 2014/04/25_00:26:10 INFO: Running OK
heartbeat[39296]: 2014/04/25_00:26:10 info: local HA resource acquisition completed (standby).
heartbeat[39001]: 2014/04/25_00:26:10 info: Standby resource acquisition done [foreign].
heartbeat[39001]: 2014/04/25_00:26:10 info: remote resource transition completed.
Access test
Stop heartbeat on node1 and check whether the service fails over to node2 automatically:
service heartbeat stop
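After node1 is stopped, checking the IP addresses on node2 shows that it has automatically taken over the VIP.
Requests to the VIP are now served after the automatic takeover.
After node1 recovers and comes back online, it takes the resources back from node2, because node1 was defined earlier as the preferred node.
PS:
heartbeat V1 is operated entirely through configuration files. With it, the web service is now highly available.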
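One way to watch the takeover from a third machine is to poll the VIP and print which node answers. This is a sketch under a few assumptions: it relies on the test pages created earlier, whose content looks like "192.168.1.122:node1", and the names serving_node, watch_vip, and RUN_WATCH are hypothetical. The polling loop only runs when RUN_WATCH=1 is set, so loading the script has no side effects.

```shell
#!/bin/sh
# Sketch: report which node currently answers on the virtual IP.
# serving_node, watch_vip and RUN_WATCH are hypothetical names for this example.
VIP=${VIP:-192.168.1.100}

# The test pages look like "192.168.1.122:node1"; keep the part after the colon.
serving_node() {
    echo "${1##*:}"
}

# Poll the VIP once per second for the given number of rounds.
watch_vip() {
    i=0
    while [ "$i" -lt "$1" ]; do
        if body=$(curl -s --max-time 2 "http://$VIP/index.html"); then
            echo "$(date +%T) served by $(serving_node "$body")"
        else
            echo "$(date +%T) no answer"
        fi
        i=$((i + 1))
        sleep 1
    done
}

# Run only when asked, for example: RUN_WATCH=1 sh watch_vip.sh
if [ "${RUN_WATCH:-0}" = 1 ]; then watch_vip 30; fi
```

Started before heartbeat is stopped on node1, the output should switch from node1 to node2, possibly with a few "no answer" lines in between while the takeover happens.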