计算每小时累计销售额
需求:
使用sql计算出销售订单 从0点到当前小时累计销售, 要求展示每小时的累计销售额(同样每分钟, 每半小时都可以使用此方法得出)
订单表:t_sales (order_id: 订单id; order_time: 订单时间(格式:2024-01-06 11:01:08); gmv: 交易金额;)
1- 数据准备
set odps.sql.hive.compatible=true;
WITH t_sales AS (
SELECT * FROM (
VALUES (100, '2023-11-21 00:05:32', 100)
, (101, '2023-11-21 01:09:20', 200)
, (102, '2023-11-21 04:09:53', 100)
, (103, '2023-11-21 04:32:20', 300)
, (104, '2023-11-21 07:05:20', 200)
, (105, '2023-11-21 09:08:25', 100)
, (106, '2023-11-21 11:05:20', 100)
, (107, '2023-11-21 16:05:41', 200)
, (108, '2023-11-21 20:00:20', 100)
, (109, '2023-11-21 22:57:00', 300)
, (110, '2023-11-22 01:10:10', 100)
, (111, '2023-11-22 05:21:00', 200)
) AS table_name(order_id, order_time, gmv)
)
2- 代码实现
- 取出 订单表的最小日期和最大日期
-- 1. 取出 订单表的最小日期和最大日期
SELECT
DATE_FORMAT(MIN(order_time),'yyyy-MM-dd 00:00:00') AS min_order_time
, DATE_FORMAT(MAX(order_time),'yyyy-MM-dd 23:59:59') AS max_order_time
FROM t_sales
;
min_order_time | max_order_time |
2023-11-21 00:00:00 | 2023-11-22 23:59:59 |
- 利用最小日期和最大日期, 得出连续的小时序列, 从最小日期的0点到最大日期的23点
-- 2. 利用最小日期和最大日期, 得出连续的小时序列, 从最小日期的0点到最大日期的23点
SELECT
min_order_time, pos, val
, FROM_UNIXTIME(UNIX_TIMESTAMP(min_order_time) + pos * 60 * 60, 'yyyy-MM-dd HH') AS hour_rn
FROM (
SELECT -- 1. 取出 订单表的最小日期和最大日期
DATE_FORMAT(MIN(order_time),'yyyy-MM-dd 00:00:00') AS min_order_time
, DATE_FORMAT(MAX(order_time),'yyyy-MM-dd 23:59:59') AS max_order_time
FROM t_sales
) a
LATERAL VIEW POSEXPLODE( SPLIT(REPEAT('#,', CAST( (UNIX_TIMESTAMP(max_order_time ) - UNIX_TIMESTAMP(min_order_time) + 1 )/(60*60) AS INT ) ), ',') ) tab as pos,val
-- 注意: 1. 时间戳相减需要加1, 需要计算最大日期最小日期的区间值; 2. 替换时使用双字符串, 切分的时候才不为空
;
min_order_time | pos | val | hour_rn |
2023-11-21 00:00:00 | 0 | # | 2023-11-21 00 |
2023-11-21 00:00:00 | 1 | # | 2023-11-21 01 |
2023-11-21 00:00:00 | 2 | # | 2023-11-21 02 |
2023-11-21 00:00:00 | 3 | # | 2023-11-21 03 |
2023-11-21 00:00:00 | 4 | # | 2023-11-21 04 |
2023-11-21 00:00:00 | 5 | # | 2023-11-21 05 |
2023-11-21 00:00:00 | 6 | # | 2023-11-21 06 |
2023-11-21 00:00:00 | 7 | # | 2023-11-21 07 |
2023-11-21 00:00:00 | 8 | # | 2023-11-21 08 |
2023-11-21 00:00:00 | 9 | # | 2023-11-21 09 |
2023-11-21 00:00:00 | 10 | # | 2023-11-21 10 |
… | … | … | … |
2023-11-21 00:00:00 | 23 | # | 2023-11-21 23 |
… | … | … | … |
2023-11-21 00:00:00 | 47 | # | 2023-11-22 23 |
- 将小时序列和订单时间(转换成小时), 一一关联
-- 3. 将小时序列和订单时间(转换成小时), 一一关联
SELECT
hour_rn
, NVL(gmv, 0) AS gmv
FROM (
SELECT -- 2. 利用最小日期和最大日期, 得出连续的小时序列, 从最小日期的0点到最大日期的23点
FROM_UNIXTIME(UNIX_TIMESTAMP(min_order_time) + pos * 60 * 60, 'yyyy-MM-dd HH') AS hour_rn
FROM (
SELECT -- 1. 取出 订单表的最小日期和最大日期
DATE_FORMAT(MIN(order_time),'yyyy-MM-dd 00:00:00') AS min_order_time
, DATE_FORMAT(MAX(order_time),'yyyy-MM-dd 23:59:59') AS max_order_time
FROM t_sales
) a
LATERAL VIEW POSEXPLODE( SPLIT(REPEAT('#,', CAST( (UNIX_TIMESTAMP(max_order_time ) - UNIX_TIMESTAMP(min_order_time) + 1 )/(60*60) AS INT ) ), ',') ) tab as pos,val
-- 注意: 1. 时间戳相减需要加1, 需要计算最大日期最小日期的区间值; 2. 替换时使用双字符串, 切分的时候才不为空
) a
LEFT JOIN
(
SELECT
SUBSTRING(order_time, 1, 13) AS order_time
, SUM(gmv) AS gmv
FROM t_sales
GROUP BY SUBSTRING(order_time, 1, 13)
) b
ON a.hour_rn = b.order_time
;
hour_rn | gmv |
2023-11-21 00 | 100 |
2023-11-21 01 | 200 |
2023-11-21 02 | 0 |
2023-11-21 03 | 0 |
2023-11-21 04 | 400 |
2023-11-21 05 | 0 |
2023-11-21 06 | 0 |
2023-11-21 07 | 200 |
2023-11-21 08 | 0 |
2023-11-21 09 | 100 |
2023-11-21 10 | 0 |
… | … |
- 使用窗口函数累加求和
-- 4. 使用窗口函数累加求和
SELECT
hour_rn
, SUM(gmv) OVER(PARTITION BY SUBSTRING(hour_rn, 1, 10) ORDER BY hour_rn) AS d_gmv
FROM (
SELECT -- 3. 将小时序列和订单时间(转换成小时), 一一关联
hour_rn
, NVL(gmv, 0) AS gmv
FROM (
SELECT -- 2. 利用最小日期和最大日期, 得出连续的小时序列, 从最小日期的0点到最大日期的23点
FROM_UNIXTIME(UNIX_TIMESTAMP(min_order_time) + pos * 60 * 60, 'yyyy-MM-dd HH') AS hour_rn
FROM (
SELECT -- 1. 取出 订单表的最小日期和最大日期
DATE_FORMAT(MIN(order_time),'yyyy-MM-dd 00:00:00') AS min_order_time
, DATE_FORMAT(MAX(order_time),'yyyy-MM-dd 23:59:59') AS max_order_time
FROM t_sales
) a
LATERAL VIEW POSEXPLODE( SPLIT(REPEAT('#,', CAST( (UNIX_TIMESTAMP(max_order_time ) - UNIX_TIMESTAMP(min_order_time) + 1 )/(60*60) AS INT ) ), ',') ) tab as pos,val
-- 注意: 1. 时间戳相减需要加1, 需要计算最大日期最小日期的区间值; 2. 替换时使用双字符串, 切分的时候才不为空
) a
LEFT JOIN
(
SELECT
SUBSTRING(order_time, 1, 13) AS order_time
, SUM(gmv) AS gmv
FROM t_sales
GROUP BY SUBSTRING(order_time, 1, 13)
) b
ON a.hour_rn = b.order_time
) t
;
hour_rn | d_gmv |
2023-11-21 00 | 100 |
2023-11-21 01 | 300 |
2023-11-21 02 | 300 |
2023-11-21 03 | 300 |
2023-11-21 04 | 700 |
2023-11-21 05 | 700 |
2023-11-21 06 | 700 |
2023-11-21 07 | 900 |
2023-11-21 08 | 900 |
2023-11-21 09 | 1000 |
2023-11-21 10 | 1000 |
2023-11-21 11 | 1100 |
2023-11-21 12 | 1100 |
2023-11-21 13 | 1100 |
2023-11-21 14 | 1100 |
2023-11-21 15 | 1100 |
2023-11-21 16 | 1300 |
2023-11-21 17 | 1300 |
2023-11-21 18 | 1300 |
2023-11-21 19 | 1300 |
2023-11-21 20 | 1400 |
2023-11-21 21 | 1400 |
2023-11-21 22 | 1700 |
2023-11-21 23 | 1700 |
2023-11-22 00 | 0 |
2023-11-22 01 | 100 |
2023-11-22 02 | 100 |
2023-11-22 03 | 100 |
2023-11-22 04 | 100 |
2023-11-22 05 | 300 |
2023-11-22 06 | 300 |
… | … |
2023-11-22 23 | 300 |
3- 总结
本题的核心是找到小时级别的序列,‘统计每年在校人数’ 也是相同类型的题目, 可以熟练使用, 不管是天级别,小时级别,分钟级别 都可以使用次方法实现;
主要使用了 LATERAL VIEW POSEXPLODE() 试图表 + 开窗函数 技术实现;
‘统计每年在校人数’:
end