MySQL性能优化之filesort

0. 问题

因为项目需求,需要使用一个group + join操作,但是因为表的数据量特别大,导致执行SQL的过程极其缓慢。在azkaban中调用大概需要100min。
MySQL性能优化之filesort_MySQL里面的SQL怎么写的呢?主要如下:

select 
t_1.*
,dd.department_type as department_type
,'${DATE_KEY}' as date_key
,current_timestamp 		as ETL_Created_Date
,current_timestamp 		as ETL_Modified_Date
,'${PROCESS_ID}' 		as process_id
,'mnt_fact' 			as rec_source
,dd.department_name  as bns_area_name
from
(select       
 0 as is_qr 
,5 as qualifier_id
···
,sum(mf.bare_order_qt) 				as bare_order_qt
,sum(mf.bare_logic_bns_amount) 		as bare_logic_bns_amount
,sum(mf.bare_sell_devices) 			as bare_sell_devices
,count(mf.device_code) 				as device_qt
,count(distinct mf.shop_id) 		as shop_qt
from mnt_fact mf
where date_key <= '${DATE_KEY}'
and date_key >= date_format(date_sub('${DATE_KEY}', interval day('${DATE_KEY}')-1 day),'%Y%m%d')
group by 
mf.bns_area1_id
,mf.device_type_class
)t_1
inner join datamart.vw_dim_department_cur dd
on t_1.bns_area_id = dd.department_id

使用explain 查看执行计划如下:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1564460
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: dim_department
   partitions: NULL
         type: ref
possible_keys: idx_department_id
          key: idx_department_id
      key_len: 8
          ref: t_1.bns_area_id
         rows: 5
     filtered: 10.00
        Extra: Using index condition
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: mf
   partitions: part_20181101,part_20181102,part_20181103,part_20181104,part_20181105,part_20181106,part_20181107,part_20181108,part_20181109,part_20181110,part_20181111,part_20181112,part_20181113,part_20181114,part_20181115,part_20181116,part_20181117,part_20181118,part_20181119,part_20181120,part_20181121,part_20181122,part_20181123,part_20181124,part_20181125,part_20181126,part_20181127,part_20181128
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 14082963
     filtered: 11.11
        Extra: Using where; Using filesort
3 rows in set, 1 warning (0.00 sec)

可以看到这里的第三行,出现了Using filesort,这个就是在group by时,我们就会出现文件排序的问题。下面我详细解释一下这个filesort

1. filesort 是什么?

2. 原因

2. 解决办法

创建索引,避免使用filesort。
查看原表索引如下:

mysql> show index from mnt_fact;
+----------+------------+-----------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name                                | Seq_in_index | Column_name       | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+-----------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| mnt_fact |          0 | PRIMARY                                 |            1 | ID                | A         |    15902350 |     NULL | NULL   |      | BTREE      |         |               |
| mnt_fact |          0 | PRIMARY                                 |            2 | date_key          | A         |    15978838 |     NULL | NULL   |      | BTREE      |         |               |
| mnt_fact |          1 | idx_device                              |            1 | device_code       | A         |    15914067 |     NULL | NULL   | YES  | BTREE      |         |               |
| mnt_fact |          1 | idx_dept_key                            |            1 | department_key    | A         |       60584 |     NULL | NULL   | YES  | BTREE      |         |               |
| mnt_fact |          1 | idx_emp_shop_devtype                    |            1 | employee_code     | A         |       66114 |     NULL | NULL   |      | BTREE      |         |               |
| mnt_fact |          1 | idx_emp_shop_devtype                    |            2 | shop_id           | A         |     8483195 |     NULL | NULL   | YES  | BTREE      |         |               |
| mnt_fact |          1 | idx_emp_shop_devtype                    |            3 | device_type_class | A         |     7341482 |     NULL | NULL   | YES  | BTREE      |         |               |

可以看到是没有mf.bns_area1_idmf.device_type_class 的索引,虽然这个索引的Cardinality是非常的小,但是因为其可以在group by中起到非常重要的作用,所以在执行上述这个SQL时,仍然会按照索引group。
针对group by 字段添加如下索引:

| mnt_fact |          1 | idx_bns6_deviceTypeClass_empCode_shopID |            1 | bns_area6_id      | A         |       21778 |     NULL | NULL   | YES  | BTREE      |         |               |
| mnt_fact |          1 | idx_bns6_deviceTypeClass_empCode_shopID |            2 | device_type_class | A         |       66270 |     NULL | NULL   | YES  | BTREE      |         |               |
| mnt_fact |          1 | idx_bns6_deviceTypeClass_empCode_shopID |            3 | employee_code     | A         |      176908 |     NULL | NULL   |      | BTREE      |         |               |
| mnt_fact |          1 | idx_bns6_deviceTypeClass_empCode_shopID |            4 | shop_id           | A         |     7485426 |     NULL | NULL   | YES  | BTREE      |         |               |

执行计划如下:

*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: mf
   partitions: part_20181101,part_20181102,part_20181103,part_20181104,part_20181105,part_20181106,part_20181107,part_20181108,part_20181109,part_20181110,part_20181111,part_20181112,part_20181113,part_20181114,part_20181115,part_20181116,part_20181117,part_20181118,part_20181119,part_20181120,part_20181121,part_20181122,part_20181123,part_20181124,part_20181125,part_20181126,part_20181127,part_20181128
         type: index
possible_keys: idx_device_type_class_bns4,idx_bns1_bns2_bns3_bns4_bns5_bns6_deviceTypeClass
          key: idx_device_type_class_bns4
      key_len: 214
          ref: NULL
         rows: 14103728
     filtered: 11.11
        Extra: Using where
3 rows in set, 1 warning (0.00 sec)

可以看到这次没有执行filesort。然后执行SQL会从原先的100min 缩减到10 min。