MySQL性能优化之filesort
0. 问题
因为项目需求,需要使用一个group + join操作,但是因为表的数据量特别大,导致执行SQL的过程极其缓慢。在azkaban中调用大概需要100min。
里面的SQL怎么写的呢?主要如下:
select
t_1.*
,dd.department_type as department_type
,'${DATE_KEY}' as date_key
,current_timestamp as ETL_Created_Date
,current_timestamp as ETL_Modified_Date
,'${PROCESS_ID}' as process_id
,'mnt_fact' as rec_source
,dd.department_name as bns_area_name
from
(select
0 as is_qr
,5 as qualifier_id
···
,sum(mf.bare_order_qt) as bare_order_qt
,sum(mf.bare_logic_bns_amount) as bare_logic_bns_amount
,sum(mf.bare_sell_devices) as bare_sell_devices
,count(mf.device_code) as device_qt
,count(distinct mf.shop_id) as shop_qt
from mnt_fact mf
where date_key <= '${DATE_KEY}'
and date_key >= date_format(date_sub('${DATE_KEY}', interval day('${DATE_KEY}')-1 day),'%Y%m%d')
group by
mf.bns_area1_id
,mf.device_type_class
)t_1
inner join datamart.vw_dim_department_cur dd
on t_1.bns_area_id = dd.department_id
使用explain 查看执行计划如下:
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1564460
filtered: 100.00
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: dim_department
partitions: NULL
type: ref
possible_keys: idx_department_id
key: idx_department_id
key_len: 8
ref: t_1.bns_area_id
rows: 5
filtered: 10.00
Extra: Using index condition
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: mf
partitions: part_20181101,part_20181102,part_20181103,part_20181104,part_20181105,part_20181106,part_20181107,part_20181108,part_20181109,part_20181110,part_20181111,part_20181112,part_20181113,part_20181114,part_20181115,part_20181116,part_20181117,part_20181118,part_20181119,part_20181120,part_20181121,part_20181122,part_20181123,part_20181124,part_20181125,part_20181126,part_20181127,part_20181128
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 14082963
filtered: 11.11
Extra: Using where; Using filesort
3 rows in set, 1 warning (0.00 sec)
可以看到这里的第三行,出现了Using filesort
,这个就是在group by时,我们就会出现文件排序的问题。下面我详细解释一下这个filesort
1. filesort
是什么?
2. 原因
2. 解决办法
创建索引,避免使用filesort。
查看原表索引如下:
mysql> show index from mnt_fact;
+----------+------------+-----------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+-----------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| mnt_fact | 0 | PRIMARY | 1 | ID | A | 15902350 | NULL | NULL | | BTREE | | |
| mnt_fact | 0 | PRIMARY | 2 | date_key | A | 15978838 | NULL | NULL | | BTREE | | |
| mnt_fact | 1 | idx_device | 1 | device_code | A | 15914067 | NULL | NULL | YES | BTREE | | |
| mnt_fact | 1 | idx_dept_key | 1 | department_key | A | 60584 | NULL | NULL | YES | BTREE | | |
| mnt_fact | 1 | idx_emp_shop_devtype | 1 | employee_code | A | 66114 | NULL | NULL | | BTREE | | |
| mnt_fact | 1 | idx_emp_shop_devtype | 2 | shop_id | A | 8483195 | NULL | NULL | YES | BTREE | | |
| mnt_fact | 1 | idx_emp_shop_devtype | 3 | device_type_class | A | 7341482 | NULL | NULL | YES | BTREE | | |
可以看到是没有mf.bns_area1_id
和 mf.device_type_class
的索引,虽然这个索引的Cardinality
是非常的小,但是因为其可以在group by中起到非常重要的作用,所以在执行上述这个SQL时,仍然会按照索引group。
针对group by 字段添加如下索引:
| mnt_fact | 1 | idx_bns6_deviceTypeClass_empCode_shopID | 1 | bns_area6_id | A | 21778 | NULL | NULL | YES | BTREE | | |
| mnt_fact | 1 | idx_bns6_deviceTypeClass_empCode_shopID | 2 | device_type_class | A | 66270 | NULL | NULL | YES | BTREE | | |
| mnt_fact | 1 | idx_bns6_deviceTypeClass_empCode_shopID | 3 | employee_code | A | 176908 | NULL | NULL | | BTREE | | |
| mnt_fact | 1 | idx_bns6_deviceTypeClass_empCode_shopID | 4 | shop_id | A | 7485426 | NULL | NULL | YES | BTREE | | |
执行计划如下:
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: mf
partitions: part_20181101,part_20181102,part_20181103,part_20181104,part_20181105,part_20181106,part_20181107,part_20181108,part_20181109,part_20181110,part_20181111,part_20181112,part_20181113,part_20181114,part_20181115,part_20181116,part_20181117,part_20181118,part_20181119,part_20181120,part_20181121,part_20181122,part_20181123,part_20181124,part_20181125,part_20181126,part_20181127,part_20181128
type: index
possible_keys: idx_device_type_class_bns4,idx_bns1_bns2_bns3_bns4_bns5_bns6_deviceTypeClass
key: idx_device_type_class_bns4
key_len: 214
ref: NULL
rows: 14103728
filtered: 11.11
Extra: Using where
3 rows in set, 1 warning (0.00 sec)
可以看到这次没有执行filesort
。然后执行SQL会从原先的100min 缩减到10 min。