This is a scenario with no actual data skew: after the job started, only one reduce task was running. The root cause was that the table's statistics in the Hive metastore were wrong and did not match the actual data volume. The optimizer, parsing the execution plan from those statistics, assumed the data volume was tiny and allocated a single task to process it, so that one task appeared to run forever without finishing.

Problem description:

When the following statement was executed, the WebUI showed one task that kept running, and the logs contained no useful information for pinpointing the problem.

SELECT user_id,
MAX(max_value) AS max_value
FROM test.dwd_test_table
WHERE dt = '20211208'
GROUP BY user_id;

Troubleshooting and resolution:

1. Inspect the EXPLAIN output: the Statistics do not match reality

The execution plan obtained via EXPLAIN showed Statistics: Num rows: 79394, while the partition actually held 568,185,268 rows.

2. Recompute the statistics with ANALYZE TABLE


ANALYZE TABLE test.dwd_test_table partition(dt = '20211208') COMPUTE STATISTICS;

or, to recompute statistics for every partition of dt:


ANALYZE TABLE test.dwd_test_table partition(dt) COMPUTE STATISTICS;
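Note that the command above gathers only basic statistics (row count, data size); the plans below still show `Column stats: NONE`. If the optimizer should also see per-column data, column statistics can be gathered as well. A minimal sketch, assuming the same table and partition:

```sql
-- Gather column-level statistics (distinct values, min/max, null counts)
-- on top of the basic row/size statistics, for use by the cost-based optimizer.
ANALYZE TABLE test.dwd_test_table PARTITION (dt = '20211208')
COMPUTE STATISTICS FOR COLUMNS;
```

Column statistics take noticeably longer to compute than basic statistics, so they are usually worth gathering only on tables that feed complex joins or aggregations.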


3. Re-run EXPLAIN and check the statistics.

Statistics: Num rows: 568185268, which matches the actual row count.
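Besides EXPLAIN, the stored statistics can be read directly from the metastore; the numRows and totalSize values appear under Partition Parameters in the output. A sketch, assuming the same partition:

```sql
-- Show partition metadata, including the numRows, rawDataSize and totalSize
-- values recorded in the Hive metastore by ANALYZE / autogather.
DESCRIBE FORMATTED test.dwd_test_table PARTITION (dt = '20211208');
```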


What ANALYZE TABLE does:

Hive DML operations generate table statistics as a side effect (with the exception of LOAD DATA statements). One of the most important uses of these statistics is to supply data for query optimization.

The statistics serve as input to the optimizer, which uses them to compare candidate execution plans and choose the best one.

For existing tables and partitions, the ANALYZE command recollects statistics and saves them to the Hive metastore.
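To reduce the chance of stale statistics in the first place, Hive can gather basic statistics automatically during DML writes. A sketch of the relevant session settings, shown here as an illustration of the defaults rather than a required fix:

```sql
-- Gather basic table/partition statistics automatically on DML writes
-- (does not cover LOAD DATA, which is why ANALYZE is still needed there).
SET hive.stats.autogather=true;
-- Enable the cost-based optimizer so the gathered statistics are used.
SET hive.cbo.enable=true;
```

Statistics can still drift when data is written outside of Hive (e.g. files placed directly in HDFS), so ANALYZE remains the reliable way to resynchronize them.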

The abnormal execution plan in this case was caused precisely by inaccurate statistics; after running the ANALYZE command, the query's execution plan returned to normal.

Comparison before and after running ANALYZE TABLE:

Before: Statistics: Num rows: 79394 Data size: 238182 Basic stats: COMPLETE Column stats: NONE

After: Statistics: Num rows: 568185268 Data size: 2272741072 Basic stats: COMPLETE Column stats: NONE

0: jdbc:hive2://172.168.0.1:10010/>
0: jdbc:hive2://172.168.0.1:10010/> explain
. . . . . . . . . . . . . . . . . .> SELECT user_id,
. . . . . . . . . . . . . . . . . .> MAX(max_value) AS max_value
. . . . . . . . . . . . . . . . . .> FROM test.dwd_test_table
. . . . . . . . . . . . . . . . . .> WHERE dt = '20211208'
. . . . . . . . . . . . . . . . . .> GROUP BY user_id;
INFO : Compiling command(queryId=hadoop_20211209194954_192b9bba-52fb-4f04-846d-42a68ac29a70): explain
SELECT user_id,
MAX(max_value) AS max_value
FROM test.dwd_test_table
WHERE dt = '20211208'
GROUP BY user_id
INFO : Semantic Analysis Completed

+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Spark |
| Edges: |
| Reducer 2 <- Map 1 (GROUP, 1) |
| DagName: hadoop_20211209194954_192b9bba-52fb-4f04-846d-42a68ac29a70:7 |
| Vertices: |
| Map 1 |
| Map Operator Tree: |
| TableScan |
| alias: dwd_test_table |
| filterExpr: (dt = '20211208') (type: boolean) |
| Statistics: Num rows: 79394 Data size: 238182 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: user_id (type: string), max_value (type: smallint) |
| outputColumnNames: user_id, max_value |
| Statistics: Num rows: 79394 Data size: 238182 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| aggregations: max(max_value) |
| keys: user_id (type: string) |
| mode: hash |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 79394 Data size: 238182 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: string) |
| sort order: + |
| Map-reduce partition columns: _col0 (type: string) |
| Statistics: Num rows: 79394 Data size: 238182 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col1 (type: smallint) |
| Reducer 2 |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: max(VALUE._col0) |
| keys: KEY._col0 (type: string) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 39697 Data size: 119091 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 39697 Data size: 119091 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+
55 rows selected (0.067 seconds)
0: jdbc:hive2://172.168.0.1:10010/> ANALYZE TABLE test.dwd_test_table partition(dt = '20211208') COMPUTE STATISTICS;
INFO : Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount
INFO : 2021-12-09 19:50:29,885 Stage-0_0: 0(+1)/70
INFO : 2021-12-09 19:50:32,906 Stage-0_0: 0(+1)/70
……
INFO : 2021-12-09 19:51:03,068 Stage-0_0: 68(+2)/70
INFO : 2021-12-09 19:51:06,123 Stage-0_0: 70/70 Finished
No rows affected (63.593 seconds)
0: jdbc:hive2://172.168.0.1:10010/> explain
. . . . . . . . . . . . . . . . . .> SELECT user_id,
. . . . . . . . . . . . . . . . . .> MAX(max_value) AS max_value
. . . . . . . . . . . . . . . . . .> FROM test.dwd_test_table
. . . . . . . . . . . . . . . . . .> WHERE dt = '20211208'
. . . . . . . . . . . . . . . . . .> GROUP BY user_id;
INFO : Starting task [Stage-2:EXPLAIN] in serial mode
INFO : Completed executing command(queryId=hadoop_20211209195144_960ed455-0d04-4fd3-b034-b81b24a9d5b3); Time taken: 0.002 seconds
INFO : OK
+----------------------------------------------------+
| Explain |
+----------------------------------------------------+
| STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1 |
| |
| STAGE PLANS: |
| Stage: Stage-1 |
| Spark |
| Edges: |
| Reducer 2 <- Map 1 (GROUP, 9) |
| DagName: hadoop_20211209195144_960ed455-0d04-4fd3-b034-b81b24a9d5b3:9 |
| Vertices: |
| Map 1 |
| Map Operator Tree: |
| TableScan |
| alias: dwd_test_table |
| filterExpr: (dt = '20211208') (type: boolean) |
| Statistics: Num rows: 568185268 Data size: 2272741072 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: user_id (type: string), max_value (type: smallint) |
| outputColumnNames: user_id, max_value |
| Statistics: Num rows: 568185268 Data size: 2272741072 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| aggregations: max(max_value) |
| keys: user_id (type: string) |
| mode: hash |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 568185268 Data size: 2272741072 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: string) |
| sort order: + |
| Map-reduce partition columns: _col0 (type: string) |
| Statistics: Num rows: 568185268 Data size: 2272741072 Basic stats: COMPLETE Column stats: NONE |
| value expressions: _col1 (type: smallint) |
| Reducer 2 |
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: max(VALUE._col0) |
| keys: KEY._col0 (type: string) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 284092634 Data size: 1136370536 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 284092634 Data size: 1136370536 Basic stats: COMPLETE Column stats: NONE |
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
| |
| Stage: Stage-0 |
| Fetch Operator |
| limit: -1 |
| Processor Tree: |
| ListSink |
| |
+----------------------------------------------------+
55 rows selected (0.25 seconds)
0: jdbc:hive2://172.168.0.1:10010/>