//mysql slave-skip-errors=all 深层理解



# by coco

# 2014-08-04



周末的时候遇到一个问题,百思不得其解,突然感觉是掉坑里了。

问题重现: 在从库上删除一个数据库,早上9点钟删除的,因为配置文件里面是:slave-skip-errors=all

观察了40min中间也有对该库的更新操作,一直没报错(主从没有中断),心里很放心,想着没事,配置的

跳过所有错误,而且接着执行的该库的更新,从库也没有中断,心里挺感激slave-skip-errors=all参数

的,也不用修改从库的配置,重启了,完事都省事了。



疏忽大意的后果就是:故障的危险存在,却感觉不到。结果下午15:01的时候主从终于中断。



报错日志:

2014-07-07 05:34:50 27358 [Note] InnoDB: The InnoDB memory heap is disabled

2014-07-07 05:34:50 27358 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins                          

                                                                                             8794,1        97%

2014-08-02 15:01:00 27358 [ERROR] Slave SQL: Error executing row event: 'Table 'productinfo.abnorpd_info_prodes

c_1' doesn't exist', Error_code: 1146                                                                          

2014-08-02 15:01:00 27358 [Warning] Slave: Table 'productinfo.abnorpd_info_prodesc_1' doesn't exist Error_code:

 1146

2014-08-02 15:01:00 27358 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart t

he slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000420' position 247745876



解析中断这一段时间的二进制日志,看到如下的信息:(报错那一刻操作了什么,一大串看不懂呀。

)
 # at 247745876
 #140802 15:01:00 server id 161  end_log_pos 247745955 CRC32 0x9856d159  Query   thread_id=28531854      exec_time=0     e
 rror_code=0
 SET TIMESTAMP=1406962860/*!*/;
 BEGIN
 /*!*/;
# at 247745955
 #140802 15:01:00 server id 161  end_log_pos 247746029 CRC32 0xe8b15c40  Table_map: `productinfo`.`abnorpd_info_prodesc_1`
  mapped to number 32795
 # at 247746029
 #140802 15:01:00 server id 161  end_log_pos 247757994 CRC32 0x1460ce91  Delete_rows: table id 32795 flags: STMT_END_F


 BINLOG '
 rIzcUxOhAAAASgAAAO1NxA4AABuAAAAAAAEAC3Byb2R1Y3RpbmZvABZhYm5vcnBkX2luZm9fcHJv
 ZGVzY18xAAID/AEDAEBcseg=
 rIzcUyChAAAAvS4AAKp8xA4AABuAAAAAAAEAAgAC//y8BAYAki4AJmx0O3Agc3R5bGU9JnF1b3Q7
 bWFyZ2luLXRvcDowcHg7bWFyZ2luLWJvdHRvbTowcHg7cGFkZGluZzowcHg7bGluZS1oZWlnaHQ6
 MS41O2NsZWFyOmJvdGg7Y29sb3I6IzMzMzMzMztmb250LWZhbWlseTpUYWhvbWEsIEFyaWFsLCDl
 rovkvZMsIHNhbnMtc2VyaWY7Zm9udC1zaXplOjEycHg7d2hpdGUtc3BhY2U6bm9ybWFsO2JhY2tn
 cm91bmQtY29sb3I6I0ZGRkZGRjsmcXVvdDsmZ3Q7Jmx0O3NwYW4gc3R5bGU9JnF1b3Q7Zm9udC1z
 aXplOjEycHQ7bGluZS1oZWlnaHQ6MS41OyZxdW90OyZndDtIQljlm57msrnmu6Tmsrnlmajmu6To
 iq/vvIzljovlipvnrqHot6/ov4fmu6Tlmajmu6Toiq/vvIxIQlguQkjmsLTkuIDkuZnkuozphofm
 u6Toiq/vvIzpq5jljovov4fmu6Tlmajmu6Toiq/vvIznlLXljoLmtrLljovnq5nmu6Toiq/vvIzm
 u6Toiq/nlJ/kuqfljoLlrrbvvIzmu6Toiq/ku7fmoLzjgIImbHQ7L3NwYW4mZ3Q7Jmx0Oy9wJmd0
 OyZsdDtwIHN0eWxlPSZxdW90O21hcmdpbi10b3A6MHB4O21hcmdpbi1ib3R0b206MHB4O3doaXRl
 LXNwYWNlOm5vcm1hbDtwYWRkaW5nOjBweDtsaW5lLWhlaWdodDoxLjU7Y2xlYXI6Ym90aDtjb2xv
 cjojMzMzMzMzO2ZvbnQtZmFtaWx5OlRhaG9tYSwgQXJpYWwsIOWui+S9kywgc2Fucy1zZXJpZjtm
 b250LXNpemU6MTJweDtiYWNrZ3JvdW5kLWNvbG9yOiNGRkZGRkY7JnF1b3Q7Jmd0OyZsdDtzcGFu
 IHN0eWxlPSZxdW90O2ZvbnQtc2l6ZToxMnB0O2NvbG9yOiNFNTMzMzM7JnF1b3Q7Jmd0O0hCWOmr
 mOWOi+i/h+a7pOWZqOa7pOiKr+mqqOaetuS4uuWGsuWtlOe9kei/h+a7pOadkOaWmeS4uue6uOa7
 pOWSjOaIluWMlue6pO+8jOi/h+a7pOeyvuW6pjEtNDDOvO+8jOW9k+a7pOiKr+axoeafk+WgteWh
 nuWIsOi/m+WHuuayueWPo+WOi+W3ruS4ujAuMzVNUGHml7bvvIzljbPlj5Hlh7rlvIDlhbPkv6Hl
 j7fvvIzmraTml7bmm7TmjaLmu6Toiq/vvIzku6Xovr7liLDkv53miqTns7vnu5/lronlhajnmoTn
 show binlog events in 'mysql-bin.000420' from 247745876 limit 10;
 +------------------+-----------+-------------+-----------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | Log_name         | Pos       | Event_type  | Server_id | End_log_pos | Info                                                                                                                                                                                                                               |
 +------------------+-----------+-------------+-----------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | mysql-bin.000420 | 247745876 | Query       |       161 |   247745955 | BEGIN                                                                                                                                                                                                                              |
 | mysql-bin.000420 | 247745955 | Table_map   |       161 |   247746029 | table_id: 32795 (productinfo.abnorpd_info_prodesc_1)                                                                                                                                                                               |
 | mysql-bin.000420 | 247746029 | Delete_rows |       161 |   247757994 | table_id: 32795 flags: STMT_END_F                                                                                                                                                                                                  |
 | mysql-bin.000420 | 247757994 | Xid         |       161 |   247758025 | COMMIT /* xid=252181343 */                                                                                                                                                                                                         |
 | mysql-bin.000420 | 247758025 | Query       |       161 |   247758118 | BEGIN                                                                                                                                                                                                                              |
 | mysql-bin.000420 | 247758118 | Query       |       161 |   247758267 | use `productinfo`; delete from productinfo.abnorpd_pidgid where `pid` = '394428'                                                                                                                                                   |
 | mysql-bin.000420 | 247758267 | Xid         |       161 |   247758298 | COMMIT /* xid=252181344 */                                                                                                                                                                                                         |
 | mysql-bin.000420 | 247758298 | Query       |       161 |   247758391 | BEGIN
 | mysql-bin.000420 | 247758391 | Query       |       161 |   247758686 | use `productinfo`; update productinfo.pd_info set `repubtime`=1406962860,`endtime`=1407567660 where `pid` IN(26539170,31552303,36949618,37316846,37317735,37319669,37320493,37354757,37355154,37355457,39893784,39895067,40449867) |
 | mysql-bin.000420 | 247758686 | Xid         |       161 |   247758717 | COMMIT /* xid=252181348 */                                                                                                                                                                                                         |
 +------------------+-----------+-------------+-----------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 10 rows in set (0.00 sec)


中断的那一刻执行的是:对productinfo.abnorpd_info_prodesc_1更新操作,为什么不像往常那样

显示更新内容呢?在这里具体解释一下:

Row Based Replication(下称RBR)环境中遇到了一个Bug。这个bug简单的描述就是:

RBR对于DML需要通过table-map的event来标注每一个有更新的表。而当一个DML同时操作多个表,

且其中2个表的mapid相同时(通常为0),会导致slave执行这个event时crash,并重启mysqld实例

可见这个bug的毁灭性极大。



结合本次故障:

table_id: 32795 (productinfo.abnorpd_info_prodesc_1)   

table_id: 32795 flags: STMT_END_F                      

确实是同样的id,所以就会造成主从crash.



同时还要记录一下,主从同步虽然可以slave-skip-errors=all,但是并不能跳过所有DDL所引起的主从,


还是详细参考官网吧:


mysql 官网也详细的给出了,具体跳过错误的3种形式,再次记录一下,下次千万不要挑这个坑了。



--slave-skip-errors=[err_code1,err_code2,...|all|ddl_exist_errors]


Command-Line Format

--slave-skip-errors=name

Option-File Format

slave-skip-errors

System Variable Name

slave_skip_errors

Variable Scope

Global

Dynamic Variable

No

 

Permitted Values

Type

string

Default

OFF

Valid Values

OFF

[list of error codes]

all

ddl_exist_errors




MySQL 5.6 as well as MySQL Cluster NDB 7.3 support an additional shorthand value 


ddl_exist_errors, which is equivalent to the error code list 1007,1008,1050,1051,


1054,1060,1061,1068,1094,1146.


Examples:



--slave-skip-errors=1062,1053


--slave-skip-errors=all


--slave-skip-errors=ddl_exist_errors