窗口聚合函数
- 窗口聚合函数:SUM、AVG、COUNT、MAX、MIN
- 以SUM为例
- 总结
- 实现分区内起始到当前行的pv累加,指定排序默认窗口:取分区第一行开始到当前行的和
- 实现分区内起始到当前行的pv累加,指定排序默认窗口:取分区第一行开始到当前行的和,每组的相同level分享同一累加值。
- 实现分区内所有pv的累加,不指定排序默认窗口:从第一行到最后一行
- 实现分区内起始到当前行的pv累加,手动指定窗口大小从第一行到当前行
- 实现分区内指定前N行到当前行的pv累加
- 实现分区内指定前N行到后N行的pv累加
- 实现分区内指定当前行到后N行的pv累加
窗口聚合函数:SUM、AVG、COUNT、MAX、MIN
以SUM为例
总结
1.如果不使用 over(),窗口大小是针对查询产生的所有数据
- 功能:用于实现数据分区后的聚合
- 语法:fun_name(col1) over (partition by col2 order by col3)
- 示例:实现分区内的累加,其他的原理类似
- 不指定窗口时,不排序默认第一行到最后一行,排序默认第一行到当前行
- 指定窗口时 --rows between 起始位置 and 结束位置
- N preceding:往前多少行
- N following:往后多少行
- current row:当前行
- unbounded:起点或者终点,没有边界
- unbounded preceding 表示从前面的起点
- unbounded following:表示到后面的终点
- 注意
分组列表,组内按排列顺序求累加和, 每组的相同level(排序字段)分享同一累加值。
如果level(排序字段)有相同,需要注意是否是你想要的结果,是否需要考虑指定窗口位置
实现分区内起始到当前行的pv累加,指定排序默认窗口:取分区第一行开始到当前行的和
--创建文件:vim /export/datas/window.txt
cookie1,2018-04-10,1
cookie1,2018-04-11,5
cookie1,2018-04-12,7
cookie1,2018-04-13,3
cookie2,2018-04-13,3
cookie2,2018-04-14,2
cookie2,2018-04-15,4
cookie1,2018-04-14,2
cookie1,2018-04-15,4
cookie1,2018-04-16,4
cookie2,2018-04-10,1
cookie2,2018-04-11,5
cookie2,2018-04-12,7
--创建表
create database db_function;
use db_function;
create table hanjiaxiaozhi_f1(
cookieid string,
daystr string,
pv int
) row format delimited fields terminated by ',';
--加载数据
load data local inpath '/export/datas/window.txt' into table hanjiaxiaozhi_f1;
set hive.exec.mode.local.auto=true;
--实现分区内起始到当前行的pv累加,默认窗口:取分区第一行开始到当前行的和
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid order by daystr) as pv1
from
hanjiaxiaozhi_f1;
+-----------+-------------+-----+------+--+
| cookieid | daystr | pv | pv3 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 1 |
| cookie1 | 2018-04-11 | 5 | 6 |
| cookie1 | 2018-04-12 | 7 | 13 |
| cookie1 | 2018-04-13 | 3 | 16 |
| cookie1 | 2018-04-14 | 2 | 18 |
| cookie1 | 2018-04-15 | 4 | 22 |
| cookie1 | 2018-04-16 | 4 | 26 |
| cookie2 | 2018-04-10 | 1 | 1 |
| cookie2 | 2018-04-11 | 5 | 6 |
| cookie2 | 2018-04-12 | 7 | 13 |
| cookie2 | 2018-04-13 | 3 | 16 |
| cookie2 | 2018-04-14 | 2 | 18 |
| cookie2 | 2018-04-15 | 4 | 22 |
+-----------+-------------+-----+------+--+
实现分区内起始到当前行的pv累加,指定排序默认窗口:取分区第一行开始到当前行的和,每组的相同level分享同一累加值。
select id,score,
sum(score) over(patition by id order by score) cumm_scores
from GRADE
id | score | cumm_scores |
1 | 98 | 98 |
1 | 100 | 198 |
2 | 95 | 190 |
2 | 95 | 190 |
实现分区内所有pv的累加,不指定排序默认窗口:从第一行到最后一行
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid ) as pv2
from
hanjiaxiaozhi_f1;
+-----------+-------------+-----+------+--+
| cookieid | daystr | pv | pv2 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-16 | 4 | 26 |
| cookie1 | 2018-04-15 | 4 | 26 |
| cookie1 | 2018-04-13 | 3 | 26 |
| cookie1 | 2018-04-12 | 7 | 26 |
| cookie1 | 2018-04-11 | 5 | 26 |
| cookie1 | 2018-04-14 | 2 | 26 |
| cookie1 | 2018-04-10 | 1 | 26 |
| cookie2 | 2018-04-12 | 7 | 22 |
| cookie2 | 2018-04-11 | 5 | 22 |
| cookie2 | 2018-04-10 | 1 | 22 |
| cookie2 | 2018-04-14 | 2 | 22 |
| cookie2 | 2018-04-13 | 3 | 22 |
| cookie2 | 2018-04-15 | 4 | 22 |
+-----------+-------------+-----+------+--+
手动指定窗口的大小:分区
rows between 起始位置 and 结束位置
指定窗口的关键字:rows between 前 and 后
- N preceding:往前多少行
- N following:往后多少行
- current row:当前行
- unbounded:起点或者终点,没有边界
- unbounded preceding 表示从前面的起点
- unbounded following:表示到后面的终点
rows between unbounded preceding and current row
--实现分区内从第一行到当前行的pv累加
--手动指定窗口的大小:分区
rows between 起始位置 and 结束位置
rows between unbounded preceding and current row
实现分区内起始到当前行的pv累加,手动指定窗口大小从第一行到当前行
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid order by daystr rows between unbounded preceding and current row) as pv3
from
hanjiaxiaozhi_f1;
+-----------+-------------+-----+------+--+
| cookieid | daystr | pv | pv3 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 1 |
| cookie1 | 2018-04-11 | 5 | 6 |
| cookie1 | 2018-04-12 | 7 | 13 |
| cookie1 | 2018-04-13 | 3 | 16 |
| cookie1 | 2018-04-14 | 2 | 18 |
| cookie1 | 2018-04-15 | 4 | 22 |
| cookie1 | 2018-04-16 | 4 | 26
| cookie2 | 2018-04-10 | 1 | 1 |
| cookie2 | 2018-04-11 | 5 | 6 |
| cookie2 | 2018-04-12 | 7 | 13 |
| cookie2 | 2018-04-13 | 3 | 16 |
| cookie2 | 2018-04-14 | 2 | 18 |
| cookie2 | 2018-04-15 | 4 | 22 |
+-----------+-------------+-----+------+--+
实现分区内指定前N行到当前行的pv累加
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid order by daystr rows between 3 preceding and current row) as pv4
from
hanjiaxiaozhi_f1;
+-----------+-------------+-----+------+--+
| cookieid | daystr | pv | pv4 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 1 |
| cookie1 | 2018-04-11 | 5 | 6 |
| cookie1 | 2018-04-12 | 7 | 13 |
| cookie1 | 2018-04-13 | 3 | 16 |
| cookie1 | 2018-04-14 | 2 | 17 |
| cookie1 | 2018-04-15 | 4 | 16 |
| cookie1 | 2018-04-16 | 4 | 13 |
| cookie2 | 2018-04-10 | 1 | 1 |
| cookie2 | 2018-04-11 | 5 | 6 |
| cookie2 | 2018-04-12 | 7 | 13 |
| cookie2 | 2018-04-13 | 3 | 16 |
| cookie2 | 2018-04-14 | 2 | 17 |
| cookie2 | 2018-04-15 | 4 | 16 |
+-----------+-------------+-----+------+--+
实现分区内指定前N行到后N行的pv累加
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid order by daystr rows between 3 preceding and 1 following) as pv5
from
hanjiaxiaozhi_f1;
+-----------+-------------+-----+------+--+
| cookieid | daystr | pv | pv5 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 6 |
| cookie1 | 2018-04-11 | 5 | 13 |
| cookie1 | 2018-04-12 | 7 | 16 |
| cookie1 | 2018-04-13 | 3 | 18 |
| cookie1 | 2018-04-14 | 2 | 21 |
| cookie1 | 2018-04-15 | 4 | 20 |
| cookie1 | 2018-04-16 | 4 | 13 |
| cookie2 | 2018-04-10 | 1 | 6 |
| cookie2 | 2018-04-11 | 5 | 13 |
| cookie2 | 2018-04-12 | 7 | 16 |
| cookie2 | 2018-04-13 | 3 | 18 |
| cookie2 | 2018-04-14 | 2 | 21 |
| cookie2 | 2018-04-15 | 4 | 16 |
+-----------+-------------+-----+------+--+
实现分区内指定当前行到后N行的pv累加
select
cookieid,
daystr,
pv,
sum(pv) over(partition by cookieid order by daystr rows between current row and unbounded following) as pv6
from
hanjiaxiaozhi_f1;
+-----------+-------------+-----+------+--+
| cookieid | daystr | pv | pv6 |
+-----------+-------------+-----+------+--+
| cookie1 | 2018-04-10 | 1 | 26 |
| cookie1 | 2018-04-11 | 5 | 25 |
| cookie1 | 2018-04-12 | 7 | 20 |
| cookie1 | 2018-04-13 | 3 | 13 |
| cookie1 | 2018-04-14 | 2 | 10 |
| cookie1 | 2018-04-15 | 4 | 8 |
| cookie1 | 2018-04-16 | 4 | 4 |
| cookie2 | 2018-04-10 | 1 | 22 |
| cookie2 | 2018-04-11 | 5 | 21 |
| cookie2 | 2018-04-12 | 7 | 16 |
| cookie2 | 2018-04-13 | 3 | 9 |
| cookie2 | 2018-04-14 | 2 | 6 |
| cookie2 | 2018-04-15 | 4 | 4 |
+-----------+-------------+-----+------+--+