手把手教你搭建Mysql8.0.22 MGR高可用集群:
为了应对事务一致性要求很高的系统对高可用数据库系统的要求,并且增强高可用集群的自管理能力,避免节点故障后的failover需要人工干预或其它辅助工具干预,MySQL5.7新引入了Group Replication,用于搭建更高事务一致性的高可用数据库集群系统。MGR是基于Paxos协议的Group Replication搭建的系统,不仅可以自动进行failover,而且同时保证系统中多个节点之间的事务一致性,避免因节点故障或网络问题而导致的节点间事务不一致。此外还提供了节点管理的能力,真正将整个集群做为一个整体对外提供服务。
MGR是基于原生复制及paxos协议的组复制技术,并以插件的方式提供,可以采取多主模式和单主模式。
在单主模式下,会自动选主,只有一个节点可以对外提供写/读事务的服务,而其它所有节点只能提供只读事务的服务,是官方推荐的Group Replication复制模式。(目前最多支持9个节点)
在多主模式下,每个节点都可以对外提供读写事务的服务。但在多主模式下,多个节点间的事务可能有比较大的冲突,从而影响性能,并且对查询语句也有更多的限制。
搭建规划:
数据库参数配置:
Demo01:
cat >/etc/my.cnf <<"EOF"
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
#
# Disable other storage engines
#
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"
#
# Replication configuration parameters
#
server_id=1
gtid_mode=ON
enforce_gtid_consistency=ON
binlog_checksum=NONE # Not needed from 8.0.21
log_bin=binlog
log_slave_updates=ON
binlog_format=ROW
master_info_repository=TABLE
relay_log_info_repository=TABLE
transaction_write_set_extraction=XXHASH64
#
# Group Replication configuration
#
plugin_load_add='group_replication.so'
group_replication_group_name="34975c79-405c-11eb-9f4c-5254044caef1" ## 此处必须为uuid的格式
group_replication_start_on_boot=off
group_replication_local_address= "172.30.45.2:33061"
group_replication_group_seeds= "172.30.45.2:33061,172.30.45.3:33061,172.30.45.4:33061"
group_replication_bootstrap_group= off
loose-group_replication_recovery_retry_count=31536000
loose-group_replication_single_primary_mode=on
loose-group_replication_enforce_update_everywhere_checks=off
loose-group_replication_ip_whitelist="172.30.45.0/24,127.0.0.1/8"
log-error=/tmp/demo01_DB.err
EOF
Demo02:
cat >/etc/my.cnf <<"EOF"
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
#
# Disable other storage engines
#
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"
#
# Replication configuration parameters
#
server_id=2
gtid_mode=ON
enforce_gtid_consistency=ON
binlog_checksum=NONE # Not needed from 8.0.21
log_bin=binlog
log_slave_updates=ON
binlog_format=ROW
master_info_repository=TABLE
relay_log_info_repository=TABLE
transaction_write_set_extraction=XXHASH64
#
# Group Replication configuration
#
plugin_load_add='group_replication.so'
group_replication_group_name="34975c79-405c-11eb-9f4c-5254044caef1" ## 此处必须为uuid的格式
group_replication_start_on_boot=off
group_replication_local_address= "172.30.45.3:33061"
group_replication_group_seeds= "172.30.45.2:33061,172.30.45.3:33061,172.30.45.4:33061"
group_replication_bootstrap_group= off
loose-group_replication_recovery_retry_count=31536000
loose-group_replication_single_primary_mode=on
loose-group_replication_enforce_update_everywhere_checks=off
loose-group_replication_ip_whitelist="172.30.45.0/24,127.0.0.1/8"
log-error=/tmp/demo02_DB.err
EOF
Demo03:
cat >/etc/my.cnf <<"EOF"
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
#
# Disable other storage engines
#
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"
#
# Replication configuration parameters
#
server_id=3
gtid_mode=ON
enforce_gtid_consistency=ON
binlog_checksum=NONE # Not needed from 8.0.21
log_bin=binlog
log_slave_updates=ON
binlog_format=ROW
master_info_repository=TABLE
relay_log_info_repository=TABLE
transaction_write_set_extraction=XXHASH64
#
# Group Replication configuration
#
plugin_load_add='group_replication.so'
group_replication_group_name="34975c79-405c-11eb-9f4c-5254044caef1" ## 此处必须为uuid的格式
group_replication_start_on_boot=off
group_replication_local_address= "172.30.45.4:33061"
group_replication_group_seeds= "172.30.45.2:33061,172.30.45.3:33061,172.30.45.4:33061"
group_replication_bootstrap_group= off
loose-group_replication_recovery_retry_count=31536000
loose-group_replication_single_primary_mode=on
loose-group_replication_enforce_update_everywhere_checks=off
loose-group_replication_ip_whitelist="172.30.45.0/24,127.0.0.1/8"
log-error=/tmp/demo03_DB.err
EOF
创建复制用户
3个节点都执行一遍
SET SQL_LOG_BIN=0;
CREATE USER rpl_user@'%' IDENTIFIED BY 'rpl_user';
## ALTER USER 'rpl_user'@'%' IDENTIFIED WITH mysql_native_password BY 'rpl_user';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';
GRANT BACKUP_ADMIN ON *.* TO rpl_user@'%';
CREATE USER rpl_user@'localhost' IDENTIFIED BY 'rpl_user';
##ALTER USER 'rpl_user'@'localhost' IDENTIFIED WITH mysql_native_password BY 'rpl_user';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'localhost';
GRANT BACKUP_ADMIN ON *.* TO rpl_user@'localhost';
CREATE USER rpl_user@'127.0.0.1' IDENTIFIED BY 'rpl_user';
##ALTER USER 'rpl_user'@'127.0.0.1' IDENTIFIED WITH mysql_native_password BY 'rpl_user';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'127.0.0.1';
GRANT BACKUP_ADMIN ON *.* TO rpl_user@'127.0.0.1';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
安装组复制插件
每个节点上都执行一遍
mysql>INSTALL PLUGIN group_replication SONAME 'group_replication.so';
查看插件是否安装成功
mysql> show plugins;
+---------------------------------+----------+--------------------+----------------------+---------+
| Name | Status | Type | Library | License |
+---------------------------------+----------+--------------------+----------------------+---------+
| binlog | ACTIVE | STORAGE ENGINE | NULL | GPL |
| mysql_native_password | ACTIVE | AUTHENTICATION | NULL | GPL |
| sha256_password | ACTIVE | AUTHENTICATION | NULL | GPL |
| caching_sha2_password | ACTIVE | AUTHENTICATION | NULL | GPL |
| sha2_cache_cleaner | ACTIVE | AUDIT | NULL | GPL |
| CSV | ACTIVE | STORAGE ENGINE | NULL | GPL |
| MEMORY | ACTIVE | STORAGE ENGINE | NULL | GPL |
| InnoDB | ACTIVE | STORAGE ENGINE | NULL | GPL |
| INNODB_TRX | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMP | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMP_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMPMEM | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMPMEM_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMP_PER_INDEX | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CMP_PER_INDEX_RESET | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_BUFFER_PAGE | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_BUFFER_PAGE_LRU | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_BUFFER_POOL_STATS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_TEMP_TABLE_INFO | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_METRICS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_DEFAULT_STOPWORD | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_DELETED | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_BEING_DELETED | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_CONFIG | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_INDEX_CACHE | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_FT_INDEX_TABLE | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_TABLES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_TABLESTATS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_INDEXES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_TABLESPACES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_COLUMNS | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_VIRTUAL | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_CACHED_INDEXES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| INNODB_SESSION_TEMP_TABLESPACES | ACTIVE | INFORMATION SCHEMA | NULL | GPL |
| MyISAM | ACTIVE | STORAGE ENGINE | NULL | GPL |
| MRG_MYISAM | ACTIVE | STORAGE ENGINE | NULL | GPL |
| PERFORMANCE_SCHEMA | ACTIVE | STORAGE ENGINE | NULL | GPL |
| TempTable | ACTIVE | STORAGE ENGINE | NULL | GPL |
| ARCHIVE | ACTIVE | STORAGE ENGINE | NULL | GPL |
| BLACKHOLE | ACTIVE | STORAGE ENGINE | NULL | GPL |
| FEDERATED | DISABLED | STORAGE ENGINE | NULL | GPL |
| ngram | ACTIVE | FTPARSER | NULL | GPL |
| mysqlx_cache_cleaner | ACTIVE | AUDIT | NULL | GPL |
| mysqlx | ACTIVE | DAEMON | NULL | GPL |
| group_replication | ACTIVE | GROUP REPLICATION | group_replication.so | GPL |
+---------------------------------+----------+--------------------+----------------------+---------+
45 rows in set (0.00 sec)
插件安装成功!
配置通道的恢复凭据
分布式恢复,节点间执行恢复过程并与其他组成员同步,加入group组.
3个节点上都执行一遍:
CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_user' FOR CHANNEL 'group_replication_recovery';
Demo01(主节点)
mysql> SET GLOBAL group_replication_bootstrap_group=ON;
mysql> START GROUP_REPLICATION USER='repl', PASSWORD='repl'; ## 8.0.17以后可以加入用户名,密码方式启动, 也可省略,直接启动
mysql> SET GLOBAL group_replication_bootstrap_group=OFF;
加入Secondary节点,在demo02、demo03上执行:
mysql> START GROUP_REPLICATION
查看MGR集群组状态:
demo01
demo02
demo03
故障排除各类后正常启动:
Master :
创建一个测试库、表,并插入数据:
登陆Slave1进行查询:
查询结果:数据与主库一致
登陆Slave2进行查询:
查询结果:数据与主库一致
主备自动切换测试
1.关闭主库
2.登陆原Slave1库查询:
结果:slave1成为新主库的从库
3.登陆原Slave2库查询:
结果:原slave2自动升级为主库
4. 原主库修复后拉起,并加入group replication组:
5.在demo03节点上插入生成新表插入数据:
6.登陆demo01查询:
结果:数据与主库一致,MGR故障恢复测试成功。
跌坑回顾:
故障现象1: 三个Node都可以做为主库拉起,但相互不能加入replication group组,报错如下:
尝试过以下方法:
a. 初始以为是网络、防火墙、Selinux等问题,经一一排除后,slave仍旧无法加入Master的组。
b. 测试三台server之间的互通性,通过复制用户即mysql -urpl_user -prpl_user -h172.30.45.X -P33061 全部正常,但就是无法加入replication group
c. 排查参数文件my.cnf中 group_replication_group_seeds配置:
i.检查IP、端口号全、半角和隐藏字符(调整到半角重输一次)
ii.网上有帖子说属于Mgr的bug需要用主机名替换掉IP地址(故做更换测试)
III.将33061端口用别的端口替换(甚至3台server分别用3个不同的端口,列如:33062、33063、33064)
d. 检查/etc/hosts主机解析设置
e. 检查DNS域名解析配置 /etc/resolv.conf
f. 将my.cnf中增加single-primary参数 group_replication_single_primary_mode=on
g. 将my.cnf中增加IP白名单参数group_replication_ip_whitelist="172.30.45.0/24,127.0.0.1/8"
h. group_replication_enforce_update_everywhere_checks=off
i. 检查Mgr组配置里的name设置,将group_replication_group_name设置为uuid格式(此处设置错误,mysql会启动不了,并报出错误提示)。
这个问题困扰了我很久,网上各类帖子、论坛翻遍,官方文档反复阅读都没找到排除故障的方法。
结果:发现是理解上的错误,group_name的值不能设置为每个节点的uuid,无论有多少个节点(目前mgr最多支持9个),uuid都必须一致。
2020-12-28T09:43:20.965368Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.0.22' socket: '/var/lib/mysql/mysql.sock' port: 3306 MySQL Community Server - GPL.
2020-12-28T09:46:29.317175Z 8 [System] [MY-013587] [Repl] Plugin group_replication reported: 'Plugin 'group_replication' is starting.'
2020-12-28T09:46:29.318237Z 9 [System] [MY-011565] [Repl] Plugin group_replication reported: 'Setting super_read_only=ON.'
2020-12-28T09:46:29.320444Z 8 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Automatically adding IPv4 localhost address to the allowlist. It is mandatory that it is added.'
2020-12-28T09:46:29.320488Z 8 [Warning] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Automatically adding IPv6 localhost address to the allowlist. It is mandatory that it is added.'
2020-12-28T09:46:29.354687Z 10 [System] [MY-010597] [Repl] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_applier' executed'. Previous state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='<NULL>', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2020-12-28T09:46:30.762234Z 8 [System] [MY-011511] [Repl] Plugin group_replication reported: 'This server is working as secondary member with primary member address demo01:3306.'
2020-12-28T09:46:31.762960Z 0 [System] [MY-013471] [Repl] Plugin group_replication reported: 'Distributed recovery will transfer data using: Incremental recovery from a group donor'
2020-12-28T09:46:31.763576Z 0 [System] [MY-011503] [Repl] Plugin group_replication reported: 'Group membership changed to demo02:3306, demo01:3306 on view 16091487317148096:2.
3个节点设置一致后,第二、第三节点很快做为secondary加入replication group组。
mysql> Stop GROUP_REPLICATION ;
Query OK, 0 rows affected (4.07 sec)
故障现象2:主库一写数据,secondary库就离线。在日志中查询为如下错误:
通过 select * from performance_schema.replication_connection_status \G;
原因:mysql为8.0.22默认使用 caching_sha2_password 身份验证机制——从原来的 mysql_native_password 更改为 caching_sha2_password。 从 5.7 升级 8.0 版本的不会改变现有用户的身份验证方法,但新用户会默认使用新的 caching_sha2_password
解决方法:
ALTER USER 'rpl_user'@'%' IDENTIFIED WITH mysql_native_password BY 'rpl_user';
ALTER USER 'rpl_user'@'localhost' IDENTIFIED WITH mysql_native_password BY 'rpl_user';
ALTER USER 'rpl_user'@'127.0.0.1' IDENTIFIED WITH mysql_native_password BY 'rpl_user';
Primary库与secondary库间数据传递正常,问题解决。
故障现象3:Primary库与secondary库间数据不一致,复制数据时发生致命错误,备库离开replication 组
解决方法:
1. 清空从节点的数据
2. 从主库备份数据传至备库进行恢复
3. 在备库执行reset master
4. 重新执行 CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='rpl_user' FOR CHANNEL 'group_replication_recovery'
5. START GROUP_REPLICATION
故障现象4:插入数据报错
解决方法:
给表添加主键:
mysql> alter table test3 add primary key(id);
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
再次对表执行插入数据操作,成功
mysql> insert into test3 values(1,'aaa');
Query OK, 1 row affected (0.01 sec)
故障现象5:主库全备传到备库,备库恢复后,发现未给复制组用户授权,在主、备库上分别执行授权后发现 一旦执行事务,备库就退出group
原因:主备库单独执行了事务,写入了自己的binlog,导致不一致
解决方法:
发现不一致时执行关闭写入binlog命令,授权完毕后再开启。
SET SQL_LOG_BIN=0;
GRANT BACKUP_ADMIN ON *.* TO repl_user@'%';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
若已经不一致了,需在备库reset master再执行加入group的命令。
运维路上不平坦,没有趟不过去的坑,只有尚未趟过的坑、和将熬而未熬的漫漫长夜。工作经验分享给大家,希望能帮到大家少加班、少熬夜!