一,安装MHA基本环境

安装MHA节点

(1)基本环境说明


角色IP地址主机名     

=========================================     

主机192.168.1.121节点1     

从机192.168.1.122节点2     

从机192.168.1.123节点3     

监视主机192.168.1.125节点5 

(2)在node1,node2,node3,node5中操作:

#vi / etc / hosts

192.168.1.121 node1   

192.168.1.122 node2     

192.168.1.123 node3     

192.168.1.125 node5

安装MHA节点节点软件包:

#rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm#yum    

install perl-DBD-MySQL perl-CPAN -y     

#tar xf mha4mysql -node-0.56.tar.gz     

#cd mha4mysql-node-0.56     

#perl Makefile.PL        

#make && make install    

 

安装MHA

在节点5管理节点上操作:注:MHA管理器主机也是需要安装MHA节点,MHA管理器

#yum install perl-DBD-MySQL perl-CPAN perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-time-HiRes -y   

#tar xf mha4mysql-manager-0.56.tar.gz     

#cd mha4mysql-manager -0.56     

#perl Makefile.PL     

#make && make install

#说明:安装的脚本程序都在/ usr / local / bin /目录下。   

 

3.节点间配置SSH登录无密码验证(MHA主机之间使用密钥登录)

在node5(Monitor)中:

#ssh-keygen -t rsa   

#ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node1     

#ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node2 

#ssh-     copy-id -i /root/.ssh/id_rsa.pub root @ node3

在node1(Master)中:

#ssh-keygen -t rsa   

#ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node2     

#ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node3 

#ssh-     copy-id -i /root/.ssh/id_rsa.pub root @ node5

在node2(slave)中:

#ssh-keygen -t rsa   

#ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node1     

#ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node3 

#ssh-     copy-id -i /root/.ssh/id_rsa.pub root @ node5

在node3(slave)中:

#ssh-keygen -t rsa   

#ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node1     

#ssh-copy-id -i /root/.ssh/id_rsa.pub root @ node2 

#ssh-     copy-id -i /root/.ssh/id_rsa.pub root @ node5

二,搭建主从复制环境

主从复制环境配置过程

(1)mysql安装过程略,但是三节点要创建如下链接

node1(主),node2(主备从),node3(从)

注意:创建如下链接:   

ln -s / usr / local / mysql / bin / * / usr / local / bin /

node1 my.cnf

server-id = 1   

binlog-format = ROW     

log-bin = master-bin     

log-bin-index = master-bin.index     

log-slave-updates = true     

relay_log_purge = 0

node2 my.cnf

server-id = 2   

binlog-format = ROW     

log-bin = master-bin     

log-bin-index = master-bin.index     

log-slave-updates = true     

relay_log_purge = 0

node3 my.cnf

binlog-format = ROW   

log-bin = mysql-bin     

relay-log = slave-relay-bin     

relay-log-index = slave-relay-bin.index     

log-slave-updates = true     

server-id = 11     

skip-name- resolve     

relay_log_purge = 0

(2)在node1(Master)上备份一份完整的数据:

#mysqldump -uroot -p123456 --master-data = 2 - 单事务-R - triggers -A> all.sql

其中--master-data = 2代表备份时刻记录主的Binlog位置和位置。

(3)在node1(Master)上创建复制用户:

mysql>授予复制从机*。*到'123456'确定的'repl'@'192.168.1.%';    

刷新权限;

(4)查看主库备份时的binlog名称和位置,MASTER_LOG_FILE和MASTER_LOG_POS:

#head -n 30 all.sql | grep'CHANGE MASTER TO'

- CHANGE MASTER TO MASTER_LOG_FILE ='master-bin.000004',MASTER_LOG_POS = 120;

(5)把备份复制到192.168.1.122和192.168.1.123

#scp all.sql   

192.168.1.122:/root/#scp all.sql 192.168.1.123:/root/

(6)分别在两台服务器上导入备份,执行复制相关命令

在node2,node3主机上操作:

#mysql -uroot -p123456 <all.sql

停止从站

CHANGE MASTER TO     

MASTER_HOST ='192.168.1.121',     

MASTER_USER ='repl',     

MASTER_PASSWORD ='123456',     

MASTER_LOG_FILE ='master-bin.000004',     

MASTER_LOG_POS = 120;

mysql> start slave;   

显示从属状态\ G

创建MHA管理用户,在主上创建。

将*。*的所有权限授予'123456'标识为'root'@'192.168.1.%'的权限;    

刷新权限;

 

三,配置Keepal VIP

vip配置可以采用两种方式,一种通过keepalived的方式管理虚拟ip的浮动;一人是通过脚本方式,本文通过keepalived方式实现

1.在node1(Master)与node2(备选主节点)安装keepalived。

#wget的http://www.keepalived.org/software/keepalived-1.2.12.tar.gz    

#焦油XF的keepalived-1.2.12.tar.gz     

#CD的keepalived-1.2.12     

#的./configure前缀= / usr / local / keepalived     

#make && make install     

#cp /usr/local/keepalived/etc/rc.d/init.d/keepalived /etc/init.d/#cp     

/ usr / local / keepalived / etc / sysconfig / keepalived / etc / sysconfig /     

#mkdir / etc / keepalived     

#cp /usr/local/keepalived/etc/keepalived/keepalived.conf / etc / keepalived /     

#cp / usr / local / keepalived / sbin / keepalived / usr / sbin /

配置keepalived的配置文件,在node1(master)上配置操作如下:

注意:keepalived配置成备份 - 备份,即IP地址切换后,主起来后IP地址不切换,本文监控脚本由MHA提供,keepalived不提供对mysqld的监控。

#vi /etc/keepalived/keepalived.conf     

!保持配置文件

global_defs {   

     notification_email {     

     abc@163.com     

   }     

   notification_email_from dba@dbserver.com     

   smtp_server 127.0.0.1     

   smtp_connect_timeout 30     

   router_id MySQL-HA     

}

vrrp_instance VI_1 {   

    state BACKUP     

    interface eth0     

    virtual_router_id 51     

    priority 150     

    advert_int 1     

    nopreempt

    身份验证{   

    auth_type PASS     

    auth_pass 1111     

    }

    virtual_ipaddress {   

        192.168.1.130     

    }     

}

配置keepalived的配置文件,在node2(备用节点)上配置操作如下:

#vi /etc/keepalived/keepalived.conf     

!保持配置文件

global_defs {   

     notification_email {     

     abc@163.com     

   }     

   notification_email_from dba@dbserver.com     

   smtp_server 127.0.0.1     

   smtp_connect_timeout 30     

   router_id MySQL-HA     

}

vrrp_instance VI_1 {   

    state BACKUP     

    interface eth0     

    virtual_router_id 51     

    priority 120     

    advert_int 1     

    nopreempt

    身份验证{   

    auth_type PASS     

    auth_pass 1111     

    }

    virtual_ipaddress {   

        192.168.1.130     

    }     

}

4. node1,node2启动keepalived服务

#service keepalived start   

#chkconfig keepalived on

5. node1查看VIP启动情况

[root @ 

node1]#ip a   1:lo:<LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN     

    link / loopback 00:00:00:00:00:00 brd 00:00:00:00:00 :00     

    inet 127.0.0.1/8 scope host lo     

    inet6 :: 1/128 scope host     

       valid_lft forever preferred_lft forever     

2:eth0:<BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000     

    link / ether 00:0c :29:4e:53:71 brd ff:ff:ff:ff:ff:ff     

    inet 192.168.1.121/24 brd 192.168.1.255范围全局eth0     

    inet 192.168.1.130/32范围全局eth0     

    inet6 fe80 :: 20c:29ff: fe4e:5371/64范围链接     

       valid_lft永远preferred_lft永远

四,配置MHA

监控创建MHA的工作目录,并且创建相关配置文件(在软件包解压后的目录里面有样例配置文件)。

#mkdir -p / etc / masterha   

#mkdir -p / var / log / masterha / app1     

#cp mha4mysql-manager-0.56 / samples / conf / app1.cnf / etc / masterha /

修改app1.cnf配置文件,修改后的文件内容如下:

#cat /etc/masterha/app1.cnf

[server default]   

manager_workdir = / var / log / masterha / app1     

manager_log = / var / log / masterha / app1 / manager.log     

master_binlog_dir = / usr / local / mysql / data /     

master_ip_failover_script = / usr / local / bin / master_ip_failover     

master_ip_online_change_script = / usr / local / bin / master_ip_online_change     

password = 123456     

user = root     

ping_interval = 1     

remote_workdir = / tmp     

repl_password = 123456     

repl_user = repl     

report_script = / usr / local / bin / send_report     

ssh_user = root  

[server1]   

hostname = 192.168.1.121     

port = 3306

[server2]   

hostname = 192.168.1.122     

port = 3306     

candidate_master = 1     

check_repl_delay = 0

[server3]   

hostname = 192.168.1.123     

port = 3306

说明:     

master_ip_failover_script = / usr / local / bin / master_ip_failover            #MHA自动切换执行的脚本,需要修改     

master_ip_online_change_script = / usr / local / bin / master_ip_online_change #手动

切换  需要执行的脚本,需要修改     report_script = / usr / local / bin / send_report 切换                                时发送邮件进行报告,需要修改

2.设置中继日志的清除方式(在每个从节点上):

(1)在节点2,节点3从节点上操作:

将relay_log_purge = 0加入my.cnf配置文件,前面已经配置。

(2)设定定期清理继电器脚本(node2,node3上操作):

#猫purge_relay_log.sh     

#!/斌/ bash的     

用户=根     

的passwd = 123456     

端口= 3306     

LOG_DIR = '/数据/ masterha /日志'     

WORK_DIR = '/数据'     

清除= '在/ usr / local / bin目录/ purge_relay_logs'

如果[!-d $ log_dir]   

then     

   mkdir $ log_dir -p     

fi

$ purge --user = $ user --password = $ passwd --disable_relay_log_purge --port = $ port --workdir = $ work_dir >> $ log_dir / purge_relay_logs.log 2>&1

配置定时计划任务

#crontab -e   0 4 * * * / bin / bash /root/purge_relay_log.sh

 

要求把keepalived服务引入MHA,我们只需要修改切换是触发的脚本文件master_ip_failover即可,在该脚本中添加在master发生宕机时对keepalived的处理。

(1)编辑脚本/ usr / local / bin / master_ip_failover,修改后如下:

#vi / usr / local / bin / master_ip_failover     

#!/ usr / bin / env perl     

use strict;    

使用警告FATAL =>'all';

使用Getopt :: Long;

我的(   

    $命令,$ ssh_user,$ orig_master_host,$ orig_master_ip,     

    $ orig_master_port,$ new_master_host,$ new_master_ip,$ new_master_port     

);

我的$ vip ='192.168.1.130';   

我的$ ssh_start_vip =“/etc/init.d/keepalived start”;    

我的$ ssh_stop_vip =“/etc/init.d/keepalived stop”;

GetOptions(   

    'command = s'=> \ $ command,     

    'ssh_user = s'=> \ $ ssh_user,     

    'orig_master_host = s'=> \ $ orig_master_host,     

    'orig_master_ip = s'=> \ $ orig_master_ip,     

    'orig_master_port = '=> \ $ orig_master_port,     

    'new_master_host = s'=> \ $ new_master_host,     

    'new_master_ip = s'=> \ $ new_master_ip,     

    'new_master_port = i'=> \ $ new_master_port,     

);

exit&main();   

sub main {     

    print“\ n \ nIN SCRIPT TEST ==== $ ssh_stop_vip == $ ssh_start_vip === \ n \ n”;    

    if($ command eq“stop”|| $ command eq“stopssh”){     

        my $ exit_code = 1;    

        eval {     

            print“禁用旧主机上的VIP:$ orig_master_host \ n”;    

            &stop_vip();    

            $ exit_code = 0;    

        };    

        if($ @){     

            warn“Got Error:$ @ \ n”;    

            退出$ exit_code;    

        }     

        exit $ exit_code;    

    }     

    elsif($ command eq“start”){

        我的$ exit_code = 10;   

        eval {     

            print“启用VIP - $ vip on the new master - $ new_master_host \ n”;    

            &start_vip();    

            $ exit_code = 0;    

        };    

        if($ @){     

            warn $ @;    

            退出$ exit_code;    

        }     

        exit $ exit_code;    

    } { 

    el } { 

        “ } ”{     “ }     ”\“”    

        退出0;    

    }     

    else {     

        &usage();    

        出口1;    

    }     

}     

sub start_vip(){     

    `ssh $ ssh_user \ @ $ new_master_host \“$ ssh_start_vip \”`;    

}     

#一个简单的系统调用,禁用在old_master 

子上的VIP     stop_vip(){     

    `ssh $ ssh_user \ @ $ orig_master_host \“$ ssh_stop_vip \”`;    

}}

sub usage {   

    print     

    “用法:master_ip_failover --command = start | stop | stopssh | status --orig_master_host = host --orig_master_ip = ip --orig_master_port = port --new_master_host = host --new_master_ip = ip --new_master_port = port \ n“;    

}

(2)编辑脚本master_ip_online_change,修改后如下:

#!/ usr / bin / env perl

#版权所有(C)2011 DeNA有限公司   

##     

这个程序是免费的软件; 您可以

根据

#自由软件基金会发布的GNU通用公共许可证的条款重新分配和/或修改     #    许可证的版本2或     

#(根据您的选择)任何更高版本。    

##     

这个程序是分发的,希望它是有用的,     

但没有任何的保证; 甚至没有

#适销性或适用于特定用途的默示保证     。有关

详细信息,请参阅     #GNU通用公共许可证。    

##     

您应该已经收到了GNU通用公共许可证

#的副本     以及该程序; 如果不,    

##注意:这是一个示例脚本,不完整。根据您的环境修改脚本。

使用严格   

使用警告FATAL =>'all';

使用Getopt :: Long;   

使用MHA :: DBHelper;    

使用MHA :: NodeUtil;    

使用Time :: HiRes qw(sleep gettimeofday tv_interval);    

使用Data :: Dumper;

我的$ _tstart;   

我的$ _running_interval = 0.1;    

我的(     

  $命令,$ orig_master_is_new_slave,$ orig_master_host,     

  $ orig_master_ip,$ orig_master_port,$ orig_master_user,     

  $ orig_master_password,$ orig_master_ssh_user,$ new_master_host,     

  $ new_master_ip,$ new_master_port,$ new_master_user,     

  $ new_master_password,$ new_master_ssh_user     

)    

我的$ vip ='192.168.1.130/24';    

我的$ key ='1';    

我的$ ssh_start_vip =“/ sbin / ifconfig eth0:$ key $ vip”;    

我的$ ssh_stop_vip =“/ sbin / ifconfig eth0:$ key down”;    

我的$ orig_master_ssh_port = 22;    

我的$ new_master_ssh_port = 22;    

exit&main();

sub current_time_us {   

  my($ sec,$ microsec)= gettimeofday();    

  我的$ curdate = localtime($ sec);    

  返回$ curdate。“”。sprintf(“%06d”,$ microsec);    

}

sub sleep_until {   

  my $ elapsed = tv_interval($ _ tstart);    

  if($ _running_interval> $ elapsed){     

    sleep($ _running_interval - $ elapsed);    

  }     

}

sub get_threads_util {   

  my $ dbh = shift;    

  我的$ my_connection_id = shift;    

  我的$ running_time_threshold = shift;    

  我的$ type = shift;    

  $ running_time_threshold = 0,除非($ running_time_threshold);    

  $ type = 0,除非($ type);    

  我的@threads;

  我的$ sth = $ dbh-> prepare(“SHOW PROCESSLIST”);   

  $ sth-> execute();

  while(my $ ref = $ sth-> fetchrow_hashref()){   

    my $ id = $ ref - > {Id};    

    我的$ user = $ ref - > {User};    

    我的$ host = $ ref - > {Host};    

    我的$ command = $ ref - > {Command};    

    我的$ state = $ ref - > {State};    

    我的$ query_time = $ ref - > {Time};    

    我的$ info = $ ref - > {Info};    

    $ info =s / ^ \ s *(。*?)\ s * $ / $ 1 / if defined($ info);    

    next if($ my_connection_id == $ id);    

    next if(defined($ query_time)&& $ query_time <$ running_time_threshold);    

    next if(defined($ command)&& $ command eq“Binlog Dump”);    

    next if(defined($ user)&& $ user eq“system user”);    

    if($ type> = 1){   

      next if(defined($ command)&& $ command eq“Sleep”);    

      next if(defined($ command)&& $ command eq“Connect”);    

    }

    if($ type> = 2){   

      next if(defined($ info)&& $ info =m / ^ select / i);    

      next if(defined($ info)&& $ info =m / ^ show / i);    

    }

    推送@threads,$ ref;   

  }     

  return @threads;    

}

sub main {   

  if($ command eq“stop”){     

    ##正当地杀死当前主机上的连接     

    #1.在新主机

    #2 上设置read_only = 1。DROP     USER使得没有应用用户可以建立新的连接     

    #3。在当前主机

    #4 上设置read_only = 1。杀死当前查询     

    #*任何数据库访问失败都会导致脚本死机     。    

    我的$ exit_code = 1;    

    eval {     

      ##在新主机上设置read_only = 1(以避免意外)     

      我的$ new_master_handler = new MHA :: DBHelper();

      #args:hostname,port,user,password,raise_error(die_on_error)_or_not   

      $ new_master_handler-> connect($ new_master_ip,$ new_master_port,     

        $ new_master_user,$ new_master_password,1);    

      打印current_time_us()。“设置read_only在新的主人..”;    

      $ new_master_handler-> enable_read_only();    

      if($ new_master_handler-> is_read_only()){     

        print“ok。\ n”;    

      }     

      else {     

        die“Failed!\ n”;    

      }     

      $ new_master_handler-> disconnect();

      #连接到原始主机,如果发生任何数据库错误,则会死亡   

      $ orig_master_handler = new MHA :: DBHelper();    

      $ orig_master_handler-> connect($ orig_master_ip,$ orig_master_port,     

        $ orig_master_user,$ orig_master_password,1);

      ##删除应用程序用户,以便没有人可以连接。事先禁用每会话binlog   

      $ orig_master_handler-> disable_log_bin_local();    

      打印current_time_us()。“在原始主机上吸引应用用户.. \ n”;    

      #FIXME_xxx_drop_app_user($ orig_master_handler);

      等待N * 100毫秒,以便当前的连接可以退出   

      我的$ time_until_read_only = 15;    

      $ _tstart = [gettimeofday];    

      我的@threads = get_threads_util($ orig_master_handler - > {dbh},     

        $ orig_master_handler - > {connection_id});    

      while($ time_until_read_only> 0 && $#threads> = 0){     

        if($ time_until_read_only%5 == 0){     

          printf     

“%s等待所有正在运行的%d线程断开连接..(最大%d毫秒)\ n”     

            current_time_us(),$#threads + 1,$ time_until_read_only * 100;    

          if($#threads <5){     

            print Data :: Dumper-> new([$ _]) - > Indent(0) - > Terse(1) - > Dump。    

              “\ n” foreach(@threads);    

          }     

        }     

        sleep_until();    

        $ _tstart = [gettimeofday];    

        $ time_until_read_only--;    

        @threads = get_threads_util($ orig_master_handler - > {dbh},     

          $ orig_master_handler - > {connection_id});    

      }}

      ##在当前主设备上设置read_only = 1,以便没有人(SUPER除外)可以写入   

      print_time_us()。“在原始主机上设置read_only = 1”。    

      $ orig_master_handler-> enable_read_only();    

      if($ orig_master_handler-> is_read_only()){     

        print“ok。\ n”;    

      }     

      else {     

        die“Failed!\ n”;    

      }}

      等待M * 100毫秒,以便当前的更新查询可以完成   

      我的$ time_until_kill_threads = 5;    

      @threads = get_threads_util($ orig_master_handler - > {dbh},     

        $ orig_master_handler - > {connection_id});    

      while($ time_until_kill_threads> 0 && $#threads> = 0){     

        if($ time_until_kill_threads%5 == 0){     

          printf     

“%s等待所有运行的%d查询断开连接..(最大%d毫秒)\ n”     

            current_time_us(),$#threads + 1,$ time_until_kill_threads * 100;    

          if($#threads <5){     

            print Data :: Dumper-> new([$ _]) - > Indent(0) - > Terse(1) - > Dump。“\ n”    

              foreach(@threads);    

          }     

        }     

        sleep_until();    

        $ _tstart = [gettimeofday];    

        $ time_until_kill_threads--;    

        @threads = get_threads_util($ orig_master_handler - > {dbh},     

          $ orig_master_handler - > {connection_id});    

      }}

      ## Terminating all threads   

      print current_time_us() . " Killing all application threads..\n";    

      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );    

      print current_time_us() . " done.\n";    

      $orig_master_handler->enable_log_bin_local();    

      $orig_master_handler->disconnect();

      ## After finishing the script, MHA executes FLUSH TABLES WITH READ LOCK   

      eval {    

      `ssh -p$orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;    

        };    

        if ($@) {    

            warn $@;    

        }    

      $exit_code = 0;    

    };    

    if ($@) {    

      warn "Got Error: $@\n";    

      exit $exit_code;    

    }    

    exit $exit_code;    

  }    

  elsif ( $command eq "start" ) {    

    ## Activating master ip on the new master    

    # 1. Create app user with write privileges    

    # 2. Moving backup script if needed    

    # 3. Register new master's ip to the catalog database

# We don't return error even though activating updatable accounts/ip failed so that we don't interrupt slaves' recovery.   

# If exit code is 0 or 10, MHA does not abort    

    my $exit_code = 10;    

    eval {    

      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not   

      $new_master_handler->connect( $new_master_ip, $new_master_port,    

        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master   

      $new_master_handler->disable_log_bin_local();    

      print current_time_us() . " Set read_only=0 on the new master.\n";    

      $new_master_handler->disable_read_only();

      ## Creating an app user on the new master   

      print current_time_us() . " Creating app user on the new master..\n";    

      #FIXME_xxx_create_app_user($new_master_handler);    

      $new_master_handler->enable_log_bin_local();    

      $new_master_handler->disconnect();

      ## Update master ip on the catalog database, etc   

      `ssh -p$new_master_ssh_port $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;    

      $exit_code = 0;    

    };    

    if ($@) {    

      warn "Got Error: $@\n";    

      exit $exit_code;    

    }    

    exit $exit_code;    

  }    

  elsif ( $command eq "status" ) {

    # do nothing   

    exit 0;    

  }    

  else {    

    &usage();    

    exit 1;    

  }    

}

sub usage {   

  print    

"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";    

  die;    

}

(3) 编辑脚本send_report,修改后如下:

#!/usr/bin/perl

use strict;   

use warnings FATAL => 'all';    

use Mail::Sender;    

use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded   

my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );    

my $smtp='smtp.163.com';    

my $mail_from='xxxx';    

my $mail_user='xxxxx';    

my $mail_pass='xxxxx';    

my $mail_to=['xxxx','xxxx'];    

GetOptions(    

  'orig_master_host=s' => \$dead_master_host,    

  'new_master_host=s'  => \$new_master_host,    

  'new_slave_hosts=s'  => \$new_slave_hosts,    

  'subject=s'          => \$subject,    

  'body=s'             => \$body,    

);

mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {   

    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;    

    open my $DEBUG, "> /tmp/monitormail.log"    

        or die "Can't open the debug      file:$!\n";    

    my $sender = new Mail::Sender {    

        ctype       => 'text/plain; charset=utf-8',    

        encoding    => 'utf-8',    

        smtp        => $smtp,    

        from        => $mail_from,    

        auth        => 'LOGIN',    

        TLS_allowed => '0',    

        authid      => $user,    

        authpwd     => $passwd,    

        to          => $mail_to,    

        subject     => $subject,    

        debug       => $DEBUG    

    };

    $sender->MailMsg(   

        {   msg   => $msg,    

            debug => $DEBUG    

        }    

    ) or print $Mail::Sender::Error;    

    return 1;    

}

# Do whatever you want here

exit 0;

 

五、MHA的日常管理

1. 检查SSH配置(node5 Monitor 监控节点上操作),如下:

# masterha_check_ssh --conf=/etc/masterha/app1.cnf

Sun May  1 22:05:12 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.   

Sun May  1 22:05:12 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..    

Sun May  1 22:05:12 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..    

Sun May  1 22:05:12 2016 - [info] Starting SSH connection tests..    

Sun May  1 22:05:14 2016 - [debug]     

Sun May  1 22:05:12 2016 - [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.122(192.168.1.122:22)..    

Sun May  1 22:05:13 2016 - [debug]   ok.    

Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.121(192.168.1.121:22) to root@192.168.1.123(192.168.1.123:22)..    

Sun May  1 22:05:13 2016 - [debug]   ok.    

Sun May  1 22:05:14 2016 - [debug]     

Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.121(192.168.1.121:22)..    

Sun May  1 22:05:13 2016 - [debug]   ok.    

Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.122(192.168.1.122:22) to root@192.168.1.123(192.168.1.123:22)..    

Sun May  1 22:05:14 2016 - [debug]   ok.    

Sun May  1 22:05:14 2016 - [debug]     

Sun May  1 22:05:13 2016 - [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.121(192.168.1.121:22)..    

Sun May  1 22:05:14 2016 - [debug]   ok.    

Sun May  1 22:05:14 2016 - [debug]  Connecting via SSH from root@192.168.1.123(192.168.1.123:22) to root@192.168.1.122(192.168.1.122:22)..    

Sun May  1 22:05:14 2016 - [debug]   ok.    

Sun May  1 22:05:14 2016 - [info] All SSH connection tests passed successfully.

2. 检查整个复制环境状况(node5 监控节点上操作),如下:

# masterha_check_repl --conf=/etc/masterha/app1.cnf

Sun May  1 22:46:44 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.   

Sun May  1 22:46:44 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..    

Sun May  1 22:46:44 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..    

Sun May  1 22:46:44 2016 - [info] MHA::MasterMonitor version 0.56.    

Sun May  1 22:46:45 2016 - [info] GTID failover mode = 0    

Sun May  1 22:46:45 2016 - [info] Dead Servers:    

Sun May  1 22:46:45 2016 - [info] Alive Servers:    

Sun May  1 22:46:45 2016 - [info]   192.168.1.121(192.168.1.121:3306)    

Sun May  1 22:46:45 2016 - [info]   192.168.1.122(192.168.1.122:3306)    

Sun May  1 22:46:45 2016 - [info]   192.168.1.123(192.168.1.123:3306)    

Sun May  1 22:46:45 2016 - [info] Alive Slaves:    

Sun May  1 22:46:45 2016 - [info]   192.168.1.122(192.168.1.122:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled    

Sun May  1 22:46:45 2016 - [info]     Replicating from 192.168.1.121(192.168.1.121:3306)    

Sun May  1 22:46:45 2016 - [info]     Primary candidate for the new Master (candidate_master is set)    

Sun May  1 22:46:45 2016 - [info]   192.168.1.123(192.168.1.123:3306)  Version=5.6.29-log (oldest major version between slaves) log-bin:enabled    

Sun May  1 22:46:45 2016 - [info]     Replicating from 192.168.1.121(192.168.1.121:3306)    

Sun May  1 22:46:45 2016 - [info] Current Alive Master: 192.168.1.121(192.168.1.121:3306)    

Sun May  1 22:46:45 2016 - [info] Checking slave configurations..    

Sun May  1 22:46:45 2016 - [info]  read_only=1 is not set on slave 192.168.1.122(192.168.1.122:3306).    

Sun May  1 22:46:45 2016 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.122(192.168.1.122:3306).    

Sun May  1 22:46:45 2016 - [info]  read_only=1 is not set on slave 192.168.1.123(192.168.1.123:3306).    

Sun May  1 22:46:45 2016 - [warning]  relay_log_purge=0 is not set on slave 192.168.1.123(192.168.1.123:3306).    

Sun May  1 22:46:45 2016 - [info] Checking replication filtering settings..    

Sun May  1 22:46:45 2016 - [info]  binlog_do_db= , binlog_ignore_db=     

Sun May  1 22:46:45 2016 - [info]  Replication filtering check ok.    

Sun May  1 22:46:45 2016 - [info] GTID (with auto-pos) is not supported    

Sun May  1 22:46:45 2016 - [info] Starting SSH connection tests..    

Sun May  1 22:46:46 2016 - [info] All SSH connection tests passed successfully.    

Sun May  1 22:46:46 2016 - [info] Checking MHA Node version..    

Sun May  1 22:46:47 2016 - [info]  Version check ok.    

Sun May  1 22:46:47 2016 - [info] Checking SSH publickey authentication settings on the current master..    

Sun May  1 22:46:47 2016 - [info] HealthCheck: SSH to 192.168.1.121 is reachable.    

Sun May  1 22:46:47 2016 - [info] Master MHA Node version is 0.56.    

Sun May  1 22:46:47 2016 - [info] Checking recovery script configurations on 192.168.1.121(192.168.1.121:3306)..    

Sun May  1 22:46:47 2016 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/usr/local/mysql/data/ --output_file=/tmp/save_binary_logs_test --manager_version=0.56 --start_file=master-bin.000008     

Sun May  1 22:46:47 2016 - [info]   Connecting to root@192.168.1.121(192.168.1.121:22)..     

  Creating /tmp if not exists..    ok.    

  Checking output directory is accessible or not..    

   ok.    

  Binlog found at /usr/local/mysql/data/, up to master-bin.000008    

Sun May  1 22:46:48 2016 - [info] Binlog setting check done.    

Sun May  1 22:46:48 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..    

Sun May  1 22:46:48 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.122 --slave_ip=192.168.1.122 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx    

Sun May  1 22:46:48 2016 - [info]   Connecting to root@192.168.1.122(192.168.1.122:22)..     

  Checking slave recovery environment settings..    

    Opening /usr/local/mysql/data/relay-log.info ... ok.    

    Relay log found at /usr/local/mysql/data, up to node2-relay-bin.000002    

    Temporary relay log file is /usr/local/mysql/data/node2-relay-bin.000002    

    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.    

done.    

    Testing mysqlbinlog output.. done.    

    Cleaning up test file(s).. done.    

Sun May  1 22:46:48 2016 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.123 --slave_ip=192.168.1.123 --slave_port=3306 --workdir=/tmp --target_version=5.6.29-log --manager_version=0.56 --relay_log_info=/usr/local/mysql/data/relay-log.info  --relay_dir=/usr/local/mysql/data/  --slave_pass=xxx    

Sun May  1 22:46:48 2016 - [info]   Connecting to root@192.168.1.123(192.168.1.123:22)..     

  Checking slave recovery environment settings..    

    Opening /usr/local/mysql/data/relay-log.info ... ok.    

    Relay log found at /usr/local/mysql/data, up to slave-relay-bin.000012    

    Temporary relay log file is /usr/local/mysql/data/slave-relay-bin.000012    

    Testing mysql connection and privileges..Warning: Using a password on the command line interface can be insecure.    

done.    

    Testing mysqlbinlog output.. done.    

    Cleaning up test file(s).. done.    

Sun May  1 22:46:48 2016 - [info] Slaves settings check done.    

Sun May  1 22:46:48 2016 - [info]     

192.168.1.121(192.168.1.121:3306) (current master)    

+--192.168.1.122(192.168.1.122:3306)    

+--192.168.1.123(192.168.1.123:3306)

Sun May  1 22:46:48 2016 - [info] Checking replication health on 192.168.1.122..   

Sun May  1 22:46:48 2016 - [info]  ok.    

Sun May  1 22:46:48 2016 - [info] Checking replication health on 192.168.1.123..    

Sun May  1 22:46:48 2016 - [info]  ok.    

Sun May  1 22:46:48 2016 - [info] Checking master_ip_failover_script status:    

Sun May  1 22:46:48 2016 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.121 --orig_master_ip=192.168.1.121 --orig_master_port=3306

IN SCRIPT TEST====/etc/init.d/keepalived stop==/etc/init.d/keepalived start===

Checking the Status of the script.. OK    

Sun May  1 22:46:48 2016 - [info]  OK.    

Sun May  1 22:46:48 2016 - [warning] shutdown_script is not defined.    

Sun May  1 22:46:48 2016 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.   

[root@node5 masterha]#

3. 开启MHA Manager监控(node5操作)如下:

# mkdir -p  /var/log/masterha/app1/   

# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1 & 

参数说明:   

--remove_dead_master_conf     #该参数代表当发生主从切换后,老的主库的ip将会从配置文件中移除。    

--manger_log                  #日志存放位置    

--ignore_last_failover        #在缺省情况下,如果MHA检测到连续发生宕机,会生成app1.failover.complete文件,会造成MHA管理进程无法启动。

4. 查看MHA Manager监控是否正常:

# masterha_check_status --conf=/etc/masterha/app1.cnf   

app1 (pid:2480) is running(0:PING_OK), master:192.168.1.121

5. 查看启动日志(node5操作)如下:

# tail -n20 /var/log/masterha/app1/manager.log

6. 关闭MHA Manage监控:

(1) 关闭

# masterha_stop --conf=/etc/masterha/app1.cnf

(2) 启动

# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/masterha/app1/manager.log 2>&1

六、MHA Failover切换

1. 自动Failover切换

(1) 模拟master mysql关闭   

(2) VIP将会切换到node2    

(3) /etc/masterha/app1.cnf中将原主服务器配置文件清掉。    

(4) masterha_manager监控进程会自动退出关闭,并在/var/log/masterha/app1下生成app1.failover.complete文件,manager.log会记录全过程,从服务器会自动从新的主服务器复制。    

(5) 原主服务器mysqld启动的,需要清掉/var/log/masterha/app1下生成app1.failover.complete文件,添加node1配置文件到/etc/masterha/app1.cnf,通过manager.log中的记录的故障点,重新同步主服务器,成为从节点。

2. 手动Failover切换

(1) 先停MHA Manager进程。   

masterha_stop --conf=/etc/masterha/app1.cnf    

(2) 停掉master mysqld    

(3) 手动切换,在Manager主机上操作如下:    

# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=192.168.1.122 --dead_master_port=3306 --new_master_host=192.168.1.121 --new_master_port=3306 --ignore_last_failover

通过观察日志可以观察切换全过程。   

(4) 如上节方式恢复节点为从服务器。

3.  正常运行情况下切换(Master正在运行)

等补充。

4. 小结

通过对MMM,MHA的环境搭建测试,MHA由于采用复制架构,原理简单,在一些对数据要求比较高的环境,为了保证可靠性,最好与半同步结合使用。