//mysql slave-skip-errors=all 深层理解
# by coco
# 2014-08-04
周末的时候遇到一个问题,百思不得其解,突然感觉是掉坑里了。
问题重现: 在从库上删除一个数据库,早上9点钟删除的,因为配置文件里面是:slave-skip-errors=all
观察了40min中间也有对该库的更新操作,一直没报错(主从没有中断),心里很放心,想着没事,配置的
跳过所有错误,而且接着执行的该库的更新,从库也没有中断,心里挺感激slave-skip-errors=all参数
的,也不用修改从库的配置,重启了,完事都省事了。
疏忽大意的后果就是:故障的危险存在,却感觉不到。结果下午15:01的时候主从终于中断。
报错日志:
2014-07-07 05:34:50 27358 [Note] InnoDB: The InnoDB memory heap is disabled
2014-07-07 05:34:50 27358 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
8794,1 97%
2014-08-02 15:01:00 27358 [ERROR] Slave SQL: Error executing row event: 'Table 'productinfo.abnorpd_info_prodes
c_1' doesn't exist', Error_code: 1146
2014-08-02 15:01:00 27358 [Warning] Slave: Table 'productinfo.abnorpd_info_prodesc_1' doesn't exist Error_code:
1146
2014-08-02 15:01:00 27358 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart t
he slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000420' position 247745876
解析中断这一段时间的二进制日志,看到如下的信息:(报错那一刻操作了什么,一大串看不懂呀。
)
# at 247745876
#140802 15:01:00 server id 161 end_log_pos 247745955 CRC32 0x9856d159 Query thread_id=28531854 exec_time=0 e
rror_code=0
SET TIMESTAMP=1406962860/*!*/;
BEGIN
/*!*/;
# at 247745955
#140802 15:01:00 server id 161 end_log_pos 247746029 CRC32 0xe8b15c40 Table_map: `productinfo`.`abnorpd_info_prodesc_1`
mapped to number 32795
# at 247746029
#140802 15:01:00 server id 161 end_log_pos 247757994 CRC32 0x1460ce91 Delete_rows: table id 32795 flags: STMT_END_F
BINLOG '
rIzcUxOhAAAASgAAAO1NxA4AABuAAAAAAAEAC3Byb2R1Y3RpbmZvABZhYm5vcnBkX2luZm9fcHJv
ZGVzY18xAAID/AEDAEBcseg=
rIzcUyChAAAAvS4AAKp8xA4AABuAAAAAAAEAAgAC//y8BAYAki4AJmx0O3Agc3R5bGU9JnF1b3Q7
bWFyZ2luLXRvcDowcHg7bWFyZ2luLWJvdHRvbTowcHg7cGFkZGluZzowcHg7bGluZS1oZWlnaHQ6
MS41O2NsZWFyOmJvdGg7Y29sb3I6IzMzMzMzMztmb250LWZhbWlseTpUYWhvbWEsIEFyaWFsLCDl
rovkvZMsIHNhbnMtc2VyaWY7Zm9udC1zaXplOjEycHg7d2hpdGUtc3BhY2U6bm9ybWFsO2JhY2tn
cm91bmQtY29sb3I6I0ZGRkZGRjsmcXVvdDsmZ3Q7Jmx0O3NwYW4gc3R5bGU9JnF1b3Q7Zm9udC1z
aXplOjEycHQ7bGluZS1oZWlnaHQ6MS41OyZxdW90OyZndDtIQljlm57msrnmu6Tmsrnlmajmu6To
iq/vvIzljovlipvnrqHot6/ov4fmu6Tlmajmu6Toiq/vvIxIQlguQkjmsLTkuIDkuZnkuozphofm
u6Toiq/vvIzpq5jljovov4fmu6Tlmajmu6Toiq/vvIznlLXljoLmtrLljovnq5nmu6Toiq/vvIzm
u6Toiq/nlJ/kuqfljoLlrrbvvIzmu6Toiq/ku7fmoLzjgIImbHQ7L3NwYW4mZ3Q7Jmx0Oy9wJmd0
OyZsdDtwIHN0eWxlPSZxdW90O21hcmdpbi10b3A6MHB4O21hcmdpbi1ib3R0b206MHB4O3doaXRl
LXNwYWNlOm5vcm1hbDtwYWRkaW5nOjBweDtsaW5lLWhlaWdodDoxLjU7Y2xlYXI6Ym90aDtjb2xv
cjojMzMzMzMzO2ZvbnQtZmFtaWx5OlRhaG9tYSwgQXJpYWwsIOWui+S9kywgc2Fucy1zZXJpZjtm
b250LXNpemU6MTJweDtiYWNrZ3JvdW5kLWNvbG9yOiNGRkZGRkY7JnF1b3Q7Jmd0OyZsdDtzcGFu
IHN0eWxlPSZxdW90O2ZvbnQtc2l6ZToxMnB0O2NvbG9yOiNFNTMzMzM7JnF1b3Q7Jmd0O0hCWOmr
mOWOi+i/h+a7pOWZqOa7pOiKr+mqqOaetuS4uuWGsuWtlOe9kei/h+a7pOadkOaWmeS4uue6uOa7
pOWSjOaIluWMlue6pO+8jOi/h+a7pOeyvuW6pjEtNDDOvO+8jOW9k+a7pOiKr+axoeafk+WgteWh
nuWIsOi/m+WHuuayueWPo+WOi+W3ruS4ujAuMzVNUGHml7bvvIzljbPlj5Hlh7rlvIDlhbPkv6Hl
j7fvvIzmraTml7bmm7TmjaLmu6Toiq/vvIzku6Xovr7liLDkv53miqTns7vnu5/lronlhajnmoTn
show binlog events in 'mysql-bin.000420' from 247745876 limit 10;
+------------------+-----------+-------------+-----------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+------------------+-----------+-------------+-----------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mysql-bin.000420 | 247745876 | Query | 161 | 247745955 | BEGIN |
| mysql-bin.000420 | 247745955 | Table_map | 161 | 247746029 | table_id: 32795 (productinfo.abnorpd_info_prodesc_1) |
| mysql-bin.000420 | 247746029 | Delete_rows | 161 | 247757994 | table_id: 32795 flags: STMT_END_F |
| mysql-bin.000420 | 247757994 | Xid | 161 | 247758025 | COMMIT /* xid=252181343 */ |
| mysql-bin.000420 | 247758025 | Query | 161 | 247758118 | BEGIN |
| mysql-bin.000420 | 247758118 | Query | 161 | 247758267 | use `productinfo`; delete from productinfo.abnorpd_pidgid where `pid` = '394428' |
| mysql-bin.000420 | 247758267 | Xid | 161 | 247758298 | COMMIT /* xid=252181344 */ |
| mysql-bin.000420 | 247758298 | Query | 161 | 247758391 | BEGIN
| mysql-bin.000420 | 247758391 | Query | 161 | 247758686 | use `productinfo`; update productinfo.pd_info set `repubtime`=1406962860,`endtime`=1407567660 where `pid` IN(26539170,31552303,36949618,37316846,37317735,37319669,37320493,37354757,37355154,37355457,39893784,39895067,40449867) |
| mysql-bin.000420 | 247758686 | Xid | 161 | 247758717 | COMMIT /* xid=252181348 */ |
+------------------+-----------+-------------+-----------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (0.00 sec)
中断的那一刻执行的是:对productinfo.abnorpd_info_prodesc_1更新操作,为什么不像往常那样
显示更新内容呢?在这里具体解释一下:
Row Based Replication(下称RBR)环境中遇到了一个Bug。这个bug简单的描述就是:
RBR对于DML需要通过table-map的event来标注每一个有更新的表。而当一个DML同时操作多个表,
且其中2个表的mapid相同时(通常为0),会导致slave执行这个event时crash,并重启mysqld实例
可见这个bug的毁灭性极大。
结合本次故障:
table_id: 32795 (productinfo.abnorpd_info_prodesc_1)
table_id: 32795 flags: STMT_END_F
确实是同样的id,所以就会造成主从crash.
同时还要记录一下,主从同步虽然可以slave-skip-errors=all,但是并不能跳过所有DDL所引起的主从,
还是详细参考官网吧:
mysql 官网也详细的给出了,具体跳过错误的3种形式,再次记录一下,下次千万不要挑这个坑了。
--slave-skip-errors=[err_code1,err_code2,...|all|ddl_exist_errors]
Command-Line Format | | ||
Option-File Format | | ||
System Variable Name | | ||
Variable Scope | Global | ||
Dynamic Variable | No | ||
| Permitted Values | ||
Type | | ||
Default | | ||
Valid Values | | ||
| |||
| |||
|
MySQL 5.6 as well as MySQL Cluster NDB 7.3 support an additional shorthand value
ddl_exist_errors, which is equivalent to the error code list 1007,1008,1050,1051,
1054,1060,1061,1068,1094,1146.
Examples:
--slave-skip-errors=1062,1053
--slave-skip-errors=all
--slave-skip-errors=ddl_exist_errors