mysql主从忽略从写入错误 mysql 1317主从中断

转载

mob64ca14085c24 2023-10-16 21:34:36

文章标签 mysql主从忽略从写入错误 SSL SQL mysql 文章分类 MySQL 数据库

一、说明

mysql5.7的GTID多线程主从模式的复制，复制延时过大：

【 mysql主从告警】时间:20170725 14:25:02|【192.xx : xx】与【从库：192.xx : xx 】主从延时过大,超过457642 s!

mysql主从忽略从写入错误 mysql 1317主从中断_SSL

异常分析：

1、从库基本不可用

2、slave备份数据不可靠，延迟太大；

3、若半同步复制情况，影响主库的性能

二、排查问题

2.1 查看日志

首先想到的都是先show processlist下，查看状态：

mysql> show slave status \G ************************* 1. row ************************* Slave_IO_State: Waiting for master to send event Master_Host: 192.xx Master_User: myrep Master_Port: xx20 Connect_Retry: 60 Master_Log_File: mysql-bin.000091 Read_Master_Log_Pos: 850335882 Relay_Log_File: relay-bin.000249 Relay_Log_Pos: 414 Relay_Master_Log_File: mysql-bin.000078 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: information_schema,performance_schema,sys Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 1048017045 Relay_Log_Space: 14058364808 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 1484 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 56001 Master_UUID: 2199b808-1dc9-11e7-97a0-1418773c7c83 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Waiting for dependent transaction to commit Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 2199b808-1dc9-11e7-97a0-1418773c7c83:203746-654994 Executed_Gtid_Set: 2199b808-1dc9-11e7-97a0-1418773c7c83:1-203745, 984721db-1dd1-11e7-b548-b8ca3af0976a:1-3 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec)

红色标记的是表示连续多次查看Relay_Log_Pos和Exec_Master_Log_Pos不变；到主库查看binlog

# at 1048017045 #170720 7:17:39 server id 56001 end_log_pos 1048017110 CRC32 0x595de4a4 GTID last_committed=59399 sequence_number=59400 SET @@SESSION.GTID_NEXT= '2199b808-1dc9-11e7-97a0-1418773c7c83:203746'/!/; # at 1048017110 #170720 7:17:39 server id 56001 end_log_pos 1048017187 CRC32 0x2bb724c0 Query thread_id=19064 exec_time=0 error_code=0 SET TIMESTAMP=1500506259/!/; SET @@session.pseudo_thread_id=19064/!/; SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/!/; SET @@session.sql_mode=0/!/; SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/!/; /!\C utf8 //!/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/!/; SET @@session.lc_time_names=0/!/; SET @@session.collation_database=DEFAULT/!/; BEGIN /!/; # at 1048017187 #170720 7:17:39 server id 56001 end_log_pos 1048017288 CRC32 0xfc36c6b9 Table_map: `dbname`.`app_xx_area_day` mapped to number 1071 # at 1048017288 #170720 7:17:39 server id 56001 end_log_pos 1048025438 CRC32 0xea7754b2 Delete_rows: table id 1071 # at 1048025438 #170720 7:17:39 server id 56001 end_log_pos 1048033606 CRC32 0xc6ca1995 Delete_rows: table id 1071 # at 1048033606 #170720 7:17:39 server id 56001 end_log_pos 1048041769 CRC32 0xb99f13f4 Delete_rows: table id 1071 # at 1048041769 #170720 7:17:39 server id 56001 end_log_pos 1048049923 CRC32 0x811e0cd7 Delete_rows: table id 1071 # at 1048049923 #170720 7:17:39 server id 56001 end_log_pos 1048058135 CRC32 0xbcbcae3a Delete_rows: table id 1071 # at 1048058135 #170720 7:17:39 server id 56001 end_log_pos 1048066299 CRC32 0xec29b40e Delete_rows: table id 1071 # at 1048066299 #170720 7:17:39 server id 56001 end_log_pos 1048074464 CRC32 0x9b0d3c60 Delete_rows: table id 1071 # at 1048074464 #170720 7:17:39 server id 56001 end_log_pos 1048082635 CRC32 0xf80b65a1 Delete_rows: table id 1071 # at 1048082635 #170720 7:17:39 server id 56001 end_log_pos 1048090830 CRC32 0x4518cd5c Delete_rows: table id 1071 # at 1048090830 #170720 7:17:39 server id 56001 end_log_pos 1048098993 CRC32 0x3330a34b Delete_rows: table id 1071 # at 1048098993 #170720 7:17:39 server id 56001 end_log_pos 1048107195 CRC32 0xbaa07468 Delete_rows: table id 1071 # at 1048107195 ....

binlog看到事务过长，表Table_map: `dbname`.`app_xx_area_day` mapped to number 事务操作过长。

2.2 重新开启slave

由于之前在没验证之前，直接想通过stop slave后再start slave，发现stop slave后一直挂住：

mysql主从忽略从写入错误 mysql 1317主从中断_mysql_02

通过2.1查看事物日志可用知道，在stop slave时候需要做一次事物回滚操作，导致stop slave夯住，无法停止slave操作；

2.3 同步挂住到表

由于我们知道是某个表导致同步异常的，但是不清楚为什么这个表会导致同步异常；我们查看这个表的情况，这里说明下我们同步记录的日志模式是ROW模式；

mysql主从忽略从写入错误 mysql 1317主从中断_SQL_03

该表一直是In_use状态，查看表结构：

mysql主从忽略从写入错误 mysql 1317主从中断_SQL_04

查看线上该表是1个一千个分区的表，而且该表有一个很明显的问题是没有主键，分区字段区分度很低；

我们总结下：

a.show slave status 显示position一直没有变化；

b.show open tables显示某个表一直是in_use为1；

c. show create table 查看表结构可以看到无主键，或者无任何索引，或者索引区分度很差。

三、解决方法

a. 找到表区分度比较高的几个字段, 可以使用这个方法判断：
select count(*) from xx;
select count(*) from (select distinct xx from xxx) t;
如果2个查询count(*)的结果差不多，说明可以对这些字段加索引
b. 备库stop slave;
可能会执行比较久，因为需要回滚事务。
c. 备库
set sql_log_bin=0;
alter table xx add key xx(xx);
老的版本slave应用binlog时只会选择第一个索引，需要把新加的索引放在最前面，可以先把老的索引删掉，建新的索引，再把老的索引建上。可以放到一个sql中执行。
d. 备库start slave
如果是innodb,可以通过show innodb status来查看 rows_inserted,updated,deleted,selected这几个指标来判断。
如果每秒修改的记录数比较多，说明复制正在以比较快的速度执行。

当然最终还是要解决该表分区太大的问题，从根源上去解决。

我是直接xtrabackup重做的，主从延时过大了。

本文章为转载内容，我们尊重原作者对文章享有的著作权。如有内容错误或侵权问题，欢迎原作者联系我们进行内容更正或删除文章。