手把手教你搭建Mysql8.0.22 MGR高可用集群:

为了应对事务一致性要求很高的系统对高可用数据库系统的要求,并且增强高可用集群的自管理能力,避免节点故障后的failover需要人工干预或其它辅助工具干预,MySQL5.7新引入了Group Replication,用于搭建更高事务一致性的高可用数据库集群系统。MGR是基于Paxos协议的Group Replication搭建的系统,不仅可以自动进行failover,而且同时保证系统中多个节点之间的事务一致性,避免因节点故障或网络问题而导致的节点间事务不一致。此外还提供了节点管理的能力,真正将整个集群做为一个整体对外提供服务。

MGR是基于原生复制及paxos协议的组复制技术,并以插件的方式提供,可以采取多主模式和单主模式。

  在单主模式下,会自动选主,只有一个节点可以对外提供写/读事务的服务,而其它所有节点只能提供只读事务的服务,是官方推荐的Group Replication复制模式。(目前最多支持9个节点) 

手把手教你搭建Mysql8.0.22 MGR高可用集群_java

在多主模式下,每个节点都可以对外提供读写事务的服务。但在多主模式下,多个节点间的事务可能有比较大的冲突,从而影响性能,并且对查询语句也有更多的限制。

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_02


 

搭建规划:


手把手教你搭建Mysql8.0.22 MGR高可用集群_java_03

 

数据库参数配置:

Demo01:

 

cat >/etc/my.cnf <<"EOF"

[mysqld]

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

 

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid

 

#

# Disable other storage engines

#

disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"

 

#

# Replication configuration parameters

#

server_id=1

gtid_mode=ON

enforce_gtid_consistency=ON

binlog_checksum=NONE           # Not needed from 8.0.21

 

log_bin=binlog

log_slave_updates=ON

binlog_format=ROW

master_info_repository=TABLE

relay_log_info_repository=TABLE

transaction_write_set_extraction=XXHASH64

 

#

# Group Replication configuration

#

plugin_load_add='group_replication.so'

group_replication_group_name="34975c79-405c-11eb-9f4c-5254044caef1"       ## 此处必须为uuid的格式

group_replication_start_on_boot=off

group_replication_local_address= "172.30.45.2:33061"

group_replication_group_seeds= "172.30.45.2:33061,172.30.45.3:33061,172.30.45.4:33061"

group_replication_bootstrap_group= off

 

loose-group_replication_recovery_retry_count=31536000

loose-group_replication_single_primary_mode=on

loose-group_replication_enforce_update_everywhere_checks=off

loose-group_replication_ip_whitelist="172.30.45.0/24,127.0.0.1/8"

 

log-error=/tmp/demo01_DB.err

 

EOF

 

 

Demo02:

cat >/etc/my.cnf <<"EOF"

[mysqld]

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

 

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid

 

#

# Disable other storage engines

#

disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"

 

#

# Replication configuration parameters

#

server_id=2

gtid_mode=ON

enforce_gtid_consistency=ON

binlog_checksum=NONE           # Not needed from 8.0.21

 

log_bin=binlog

log_slave_updates=ON

binlog_format=ROW

master_info_repository=TABLE

relay_log_info_repository=TABLE

transaction_write_set_extraction=XXHASH64

 

#

# Group Replication configuration

#

plugin_load_add='group_replication.so'

group_replication_group_name="34975c79-405c-11eb-9f4c-5254044caef1"      ## 此处必须为uuid的格式

group_replication_start_on_boot=off

group_replication_local_address= "172.30.45.3:33061"

group_replication_group_seeds= "172.30.45.2:33061,172.30.45.3:33061,172.30.45.4:33061"

group_replication_bootstrap_group= off

 

loose-group_replication_recovery_retry_count=31536000

loose-group_replication_single_primary_mode=on

loose-group_replication_enforce_update_everywhere_checks=off

loose-group_replication_ip_whitelist="172.30.45.0/24,127.0.0.1/8"

 

log-error=/tmp/demo02_DB.err

 

EOF

Demo03:

cat >/etc/my.cnf <<"EOF"

[mysqld]

 

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

 

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid

 

#

# Disable other storage engines

#

disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"

 

#

# Replication configuration parameters

#

server_id=3

gtid_mode=ON

enforce_gtid_consistency=ON

binlog_checksum=NONE           # Not needed from 8.0.21

 

log_bin=binlog

log_slave_updates=ON

binlog_format=ROW

master_info_repository=TABLE

relay_log_info_repository=TABLE

transaction_write_set_extraction=XXHASH64

 

#

# Group Replication configuration

#

plugin_load_add='group_replication.so'

group_replication_group_name="34975c79-405c-11eb-9f4c-5254044caef1"     ## 此处必须为uuid的格式

group_replication_start_on_boot=off

group_replication_local_address= "172.30.45.4:33061"

group_replication_group_seeds= "172.30.45.2:33061,172.30.45.3:33061,172.30.45.4:33061"

group_replication_bootstrap_group= off

 

loose-group_replication_recovery_retry_count=31536000

loose-group_replication_single_primary_mode=on

loose-group_replication_enforce_update_everywhere_checks=off

loose-group_replication_ip_whitelist="172.30.45.0/24,127.0.0.1/8"

 

 

log-error=/tmp/demo03_DB.err

 

EOF

 

创建复制用户

3个节点都执行一遍

SET SQL_LOG_BIN=0;

 

CREATE USER rpl_user@'%' IDENTIFIED BY 'rpl_user';

## ALTER USER 'rpl_user'@'%' IDENTIFIED WITH mysql_native_password BY 'rpl_user';

GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';

GRANT BACKUP_ADMIN ON *.* TO rpl_user@'%';

 

CREATE USER rpl_user@'localhost' IDENTIFIED BY 'rpl_user';

##ALTER USER 'rpl_user'@'localhost' IDENTIFIED WITH mysql_native_password BY 'rpl_user';

GRANT REPLICATION SLAVE ON *.* TO rpl_user@'localhost';

GRANT BACKUP_ADMIN ON *.* TO rpl_user@'localhost';

 

CREATE USER rpl_user@'127.0.0.1' IDENTIFIED BY 'rpl_user';

##ALTER USER 'rpl_user'@'127.0.0.1' IDENTIFIED WITH mysql_native_password BY 'rpl_user';

GRANT REPLICATION SLAVE ON *.* TO rpl_user@'127.0.0.1';

GRANT BACKUP_ADMIN ON *.* TO rpl_user@'127.0.0.1';

 

FLUSH PRIVILEGES;

SET SQL_LOG_BIN=1;

 

安装组复制插件

每个节点上都执行一遍

mysql>INSTALL PLUGIN group_replication SONAME 'group_replication.so';

 

查看插件是否安装成功

mysql> show plugins;

+---------------------------------+----------+--------------------+----------------------+---------+

| Name                            | Status   | Type               | Library              | License |

+---------------------------------+----------+--------------------+----------------------+---------+

| binlog                          | ACTIVE   | STORAGE ENGINE     | NULL                 | GPL     |

| mysql_native_password           | ACTIVE   | AUTHENTICATION     | NULL                 | GPL     |

| sha256_password                 | ACTIVE   | AUTHENTICATION     | NULL                 | GPL     |

| caching_sha2_password           | ACTIVE   | AUTHENTICATION     | NULL                 | GPL     |

| sha2_cache_cleaner              | ACTIVE   | AUDIT              | NULL                 | GPL     |

| CSV                             | ACTIVE   | STORAGE ENGINE     | NULL                 | GPL     |

| MEMORY                          | ACTIVE   | STORAGE ENGINE     | NULL                 | GPL     |

| InnoDB                          | ACTIVE   | STORAGE ENGINE     | NULL                 | GPL     |

| INNODB_TRX                      | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_CMP                      | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_CMP_RESET                | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_CMPMEM                   | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_CMPMEM_RESET             | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_CMP_PER_INDEX            | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_CMP_PER_INDEX_RESET      | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_BUFFER_PAGE              | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_BUFFER_PAGE_LRU          | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_BUFFER_POOL_STATS        | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_TEMP_TABLE_INFO          | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_METRICS                  | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_FT_DEFAULT_STOPWORD      | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_FT_DELETED               | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_FT_BEING_DELETED         | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_FT_CONFIG                | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_FT_INDEX_CACHE           | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_FT_INDEX_TABLE           | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_TABLES                   | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_TABLESTATS               | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_INDEXES                  | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_TABLESPACES              | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_COLUMNS                  | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_VIRTUAL                  | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_CACHED_INDEXES           | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| INNODB_SESSION_TEMP_TABLESPACES | ACTIVE   | INFORMATION SCHEMA | NULL                 | GPL     |

| MyISAM                          | ACTIVE   | STORAGE ENGINE     | NULL                 | GPL     |

| MRG_MYISAM                      | ACTIVE   | STORAGE ENGINE     | NULL                 | GPL     |

| PERFORMANCE_SCHEMA              | ACTIVE   | STORAGE ENGINE     | NULL                 | GPL     |

| TempTable                       | ACTIVE   | STORAGE ENGINE     | NULL                 | GPL     |

| ARCHIVE                         | ACTIVE   | STORAGE ENGINE     | NULL                 | GPL     |

| BLACKHOLE                       | ACTIVE   | STORAGE ENGINE     | NULL                 | GPL     |

| FEDERATED                       | DISABLED | STORAGE ENGINE     | NULL                 | GPL     |

| ngram                           | ACTIVE   | FTPARSER           | NULL                 | GPL     |

| mysqlx_cache_cleaner            | ACTIVE   | AUDIT              | NULL                 | GPL     |

| mysqlx                          | ACTIVE   | DAEMON             | NULL                 | GPL     |

| group_replication               | ACTIVE   | GROUP REPLICATION  | group_replication.so | GPL     |

+---------------------------------+----------+--------------------+----------------------+---------+

45 rows in set (0.00 sec)

 

插件安装成功!

 

配置通道的恢复凭据

分布式恢复,节点间执行恢复过程并与其他组成员同步,加入group组.

3个节点上都执行一遍:

CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_user'  FOR CHANNEL 'group_replication_recovery';

 

Demo01(主节点)

mysql> SET GLOBAL group_replication_bootstrap_group=ON;

mysql> START GROUP_REPLICATION USER='repl', PASSWORD='repl';   ## 8.0.17以后可以加入用户名,密码方式启动,        也可省略,直接启动

mysql> SET GLOBAL group_replication_bootstrap_group=OFF;

 

加入Secondary节点,在demo02、demo03上执行:

mysql> START GROUP_REPLICATION

 

查看MGR集群组状态:

demo01

demo02

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_04

demo03

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_05


故障排除各类后正常启动:

Master :

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_06


创建一个测试库、表,并插入数据:

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_07


登陆Slave1进行查询:

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_08


查询结果:数据与主库一致

登陆Slave2进行查询:

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_09


查询结果:数据与主库一致

主备自动切换测试

1.关闭主库

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_10

 

2.登陆原Slave1库查询:

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_11


结果:slave1成为新主库的从库

 

3.登陆原Slave2库查询:

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_12


手把手教你搭建Mysql8.0.22 MGR高可用集群_java_13


结果:原slave2自动升级为主库

 

 

4. 原主库修复后拉起,并加入group replication组:

 

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_14


手把手教你搭建Mysql8.0.22 MGR高可用集群_java_15

5.在demo03节点上插入生成新表插入数据:


手把手教你搭建Mysql8.0.22 MGR高可用集群_java_16



6.登陆demo01查询:

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_17



结果:数据与主库一致,MGR故障恢复测试成功。

 

跌坑回顾:

 

故障现象1: 三个Node都可以做为主库拉起,但相互不能加入replication group组,报错如下:

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_18


尝试过以下方法:

a. 初始以为是网络、防火墙、Selinux等问题,经一一排除后,slave仍旧无法加入Master的组。

b. 测试三台server之间的互通性,通过复制用户即mysql -urpl_user -prpl_user -h172.30.45.X -P33061 全部正常,但就是无法加入replication group

c. 排查参数文件my.cnf中 group_replication_group_seeds配置:

     i.检查IP、端口号全、半角和隐藏字符(调整到半角重输一次)

     ii.网上有帖子说属于Mgr的bug需要用主机名替换掉IP地址(故做更换测试)

     III.将33061端口用别的端口替换(甚至3台server分别用3个不同的端口,列如:33062、33063、33064) 

d. 检查/etc/hosts主机解析设置

e. 检查DNS域名解析配置 /etc/resolv.conf

f. 将my.cnf中增加single-primary参数 group_replication_single_primary_mode=on

g. 将my.cnf中增加IP白名单参数group_replication_ip_whitelist="172.30.45.0/24,127.0.0.1/8"

h. group_replication_enforce_update_everywhere_checks=off

i. 检查Mgr组配置里的name设置,将group_replication_group_name设置为uuid格式(此处设置错误,mysql会启动不了,并报出错误提示)。

 

这个问题困扰了我很久,网上各类帖子、论坛翻遍,官方文档反复阅读都没找到排除故障的方法。

结果:发现是理解上的错误,group_name的值不能设置为每个节点的uuid,无论有多少个节点(目前mgr最多支持9个),uuid都必须一致。 

 

2020-12-28T09:43:20.965368Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.22'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MySQL Community Server - GPL.

2020-12-28T09:46:29.317175Z 8 [System] [MY-013587] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' is starting.'

2020-12-28T09:46:29.318237Z 9 [System] [MY-011565] [Repl] Plugin group_replication reported: 'Setting super_read_only=ON.'

2020-12-28T09:46:29.320444Z 8 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the allowlist. It is mandatory that it is added.'

2020-12-28T09:46:29.320488Z 8 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Automatically adding IPv6 localhost address to the allowlist. It is mandatory that it is added.'

2020-12-28T09:46:29.354687Z 10 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.

2020-12-28T09:46:30.762234Z 8 [System] [MY-011511] [Repl] Plugin group_replication reported: 'This server is working as secondary member with primary member address demo01:3306.'

2020-12-28T09:46:31.762960Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'

2020-12-28T09:46:31.763576Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to demo02:3306, demo01:3306 on view 16091487317148096:2.

 

     3个节点设置一致后,第二、第三节点很快做为secondary加入replication group组。

mysql> Stop GROUP_REPLICATION ;

Query OK, 0 rows affected (4.07 sec)

 

故障现象2:主库一写数据,secondary库就离线。在日志中查询为如下错误:

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_19


通过 select * from performance_schema.replication_connection_status \G;

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_20


原因:mysql为8.0.22默认使用 caching_sha2_password 身份验证机制——从原来的 mysql_native_password 更改为 caching_sha2_password。 从 5.7 升级 8.0 版本的不会改变现有用户的身份验证方法,但新用户会默认使用新的 caching_sha2_password 

解决方法:

ALTER USER 'rpl_user'@'%' IDENTIFIED WITH mysql_native_password BY 'rpl_user';

ALTER USER 'rpl_user'@'localhost' IDENTIFIED WITH mysql_native_password BY 'rpl_user';

ALTER USER 'rpl_user'@'127.0.0.1' IDENTIFIED WITH mysql_native_password BY 'rpl_user';

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_21


Primary库与secondary库间数据传递正常,问题解决。

 

故障现象3:Primary库与secondary库间数据不一致,复制数据时发生致命错误,备库离开replication 组

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_22

解决方法:

1. 清空从节点的数据

2. 从主库备份数据传至备库进行恢复

3. 在备库执行reset master

4. 重新执行 CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_user'  FOR CHANNEL 'group_replication_recovery'

5. START GROUP_REPLICATION 

 

故障现象4:插入数据报错

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_23

手把手教你搭建Mysql8.0.22 MGR高可用集群_java_24

解决方法:

给表添加主键:

mysql> alter table test3 add  primary key(id);

Query OK, 0 rows affected (0.05 sec)

Records: 0  Duplicates: 0  Warnings: 0

 

再次对表执行插入数据操作,成功

mysql> insert into test3 values(1,'aaa');

Query OK, 1 row affected (0.01 sec)

 

故障现象5:主库全备传到备库,备库恢复后,发现未给复制组用户授权,在主、备库上分别执行授权后发现 一旦执行事务,备库就退出group

原因:主备库单独执行了事务,写入了自己的binlog,导致不一致

 

解决方法: 

发现不一致时执行关闭写入binlog命令,授权完毕后再开启。

SET SQL_LOG_BIN=0;

GRANT BACKUP_ADMIN ON *.* TO repl_user@'%';

FLUSH PRIVILEGES;

SET SQL_LOG_BIN=1;

 

若已经不一致了,需在备库reset master再执行加入group的命令。

 

运维路上不平坦,没有趟不过去的坑,只有尚未趟过的坑、和将熬而未熬的漫漫长夜。工作经验分享给大家,希望能帮到大家少加班、少熬夜!