目录

  • mysql语法
  • 数据准备
  • 1.聚合函数(分组函数)
  • 1.聚合统计逻辑
  • 2.函数使用
  • 2.开窗函数
  • 1.语法
  • 2.聚合函数:多行数据 按照一定规则 进行聚合 为一行
  • 3.内置窗口函数
  • 4.内置窗口函数
  • 1.取值 串行
  • 1.串行
  • 2.取值
  • 2.排序
  • 分组
  • 排序

mysql语法

数据准备

create table emp (
    empno numeric(4) not null,
    ename varchar(10),
    job varchar(9),
    mgr numeric(4),
    hiredate datetime,
    sal numeric(7, 2),
    comm numeric(7, 2),
    deptno numeric(2)
);

insert into emp values (7369, 'SMITH', 'CLERK', 7902, '1980-12-17', 800, null, 20);
insert into emp values (7499, 'ALLEN', 'SALESMAN', 7698, '1981-02-20', 1600, 300, 30);
insert into emp values (7521, 'WARD', 'SALESMAN', 7698, '1981-02-22', 1250, 500, 30);
insert into emp values (7566, 'JONES', 'MANAGER', 7839, '1981-04-02', 2975, null, 20);
insert into emp values (7654, 'MARTIN', 'SALESMAN', 7698, '1981-09-28', 1250, 1400, 30);
insert into emp values (7698, 'BLAKE', 'MANAGER', 7839, '1981-05-01', 2850, null, 30);
insert into emp values (7782, 'CLARK', 'MANAGER', 7839, '1981-06-09', 2450, null, 10);
insert into emp values (7788, 'SCOTT', 'ANALYST', 7566, '1982-12-09', 3000, null, 20);
insert into emp values (7839, 'KING', 'PRESIDENT', null, '1981-11-17', 5000, null, 10);
insert into emp values (7844, 'TURNER', 'SALESMAN', 7698, '1981-09-08', 1500, 0, 30);
insert into emp values (7876, 'ADAMS', 'CLERK', 7788, '1983-01-12', 1100, null, 20);
insert into emp values (7900, 'JAMES', 'CLERK', 7698, '1981-12-03', 950, null, 30);
insert into emp values (7902, 'FORD', 'ANALYST', 7566, '1981-12-03', 3000, null, 20);
insert into emp values (7934, 'MILLER', 'CLERK', 7782, '1982-01-23', 1300, null, 10);

1.聚合函数(分组函数)

1.聚合统计逻辑

聚合统计:
        group by => 分组
            xianyu,<1,a,xc,asd>
            lxy,<as,zxf,zxf,qwr,ags>
        聚合函数 => 指标
            xianyu,4
            lxy,5

2.函数使用

group by =》 分组
聚合函数 =》 指标统计 sun avg max min count


需求:
    统计每个部门有多少个人?

    查什么?
        维度:部门
        指标:人数        

        select
        deptno,
        count(1) as cnt
        from emp
        group by deptno;

解释:
    count(1) 【1.代表 先放置一个假数,然后再查询】
        【2.理解为按照第几个字段进行查数】
    select
        select + 函数 => 可以校验函数是否存在

2.开窗函数

1.语法

窗口函数:
    窗口 + 函数
    窗口:函数运行时 计算数据集的范围
    函数:运行时的函数
        1.聚合函数
            sun avg max min count
        2.内置窗口函数
        
    语法结构:
        函数 over([partition by xxx,...] [order by xxx,...])
        over() 是以谁进行开窗【table or 数据集】
        partition by:以谁进行分组 【group by column】
        order by:以谁进行排序【column】

2.聚合函数:多行数据 按照一定规则 进行聚合 为一行

sum avg max...
    理论上:聚合后的行数 <= 聚合前的行数 【主要是看维度选取 group by 里面的字段】

    需求:
        既要显示 聚合前的数据 又要显示 聚合后的数据 ?

        id name sal   dt        sal_all
        1   zs  1000 2022-4     1000
        2   ls  2000 2022-4     2000
        3   wu  3000 2022-4     3000
        4   zs  1000 2022-5     2000
        5   ls  2000 2022-5     4000
        6   wu  3000 2022-5     6000

数据:
服务器 每天的启动 次数
linux01,2022-04-15,1
linux01,2022-04-16,5
linux01,2022-04-17,7
linux01,2022-04-18,2
linux01,2022-04-19,3
linux01,2022-04-20,10
linux01,2022-04-21,4

统计累计问题:
    创建表
        create table window01(
            name varchar(50),
            dt varchar(20),
            cnt int
        );
    插入数据
        insert into window01 values("linux01","2022-04-15",1);
        insert into window01 values("linux01","2022-04-16",5);
        insert into window01 values("linux01","2022-04-17",7);
        insert into window01 values("linux01","2022-04-18",2);
        insert into window01 values("linux01","2022-04-19",3);
        insert into window01 values("linux01","2022-04-20",10);
        insert into window01 values("linux01","2022-04-21",4);


        insert into window01 values("linux02","2022-04-18",20);
        insert into window01 values("linux02","2022-04-19",30);
        insert into window01 values("linux02","2022-04-20",10);
        insert into window01 values("linux02","2022-04-21",40);


    需求:
        每个服务器 每天 累积启动次数
        select
        name,
        dt,
        cnt,
        sum(cnt) over(partition by name order by dt) as cut_all
        from window01;

        +---------+------------+------+---------+
        | name    | dt         | cnt  | cut_all |
        +---------+------------+------+---------+
        | linux01 | 2022-04-15 |    1 |       1 |
        | linux01 | 2022-04-16 |    5 |       6 |
        | linux01 | 2022-04-17 |    7 |      13 |
        | linux01 | 2022-04-18 |    2 |      15 |
        | linux01 | 2022-04-19 |    3 |      18 |
        | linux01 | 2022-04-20 |   10 |      28 |
        | linux01 | 2022-04-21 |    4 |      32 |
        | linux02 | 2022-04-18 |   20 |      20 |
        | linux02 | 2022-04-19 |   30 |      50 |
        | linux02 | 2022-04-20 |   10 |      60 |
        | linux02 | 2022-04-21 |   40 |     100 |
        +---------+------------+------+---------+

        1 9 10 11 str 【字典序】
        1 10 11 9

         * 从1开始,1,2,3,4,5,6,7,8,9
         * 从10开始,1,10…19,2,3,4,5,6,7,8,9 
         * 从20开始,1,10…19,2,20…29,3,4,5,6,7,8,9
         * 以此类推,所有的10位数,都插入到与他们十位数位置上相等的个位数后面。

3.内置窗口函数

窗口大小 xxx between xxx and xxx

参数
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING

select
name,
dt,
cnt,
sum(cnt) over(partition by name order by dt) as cut_all,
-- 无边界
sum(cnt) over(partition by name order by dt rows between unbounded preceding and current row) as cut_all2,
-- 前三行 + 当前行
sum(cnt) over(partition by name order by dt rows between 3 preceding and current row) as cut_all3,
-- 前三行 + 当前行 + 下一行
sum(cnt) over(partition by name order by dt rows between 3 preceding and 1 following) as cut_all4,
-- 上面无边界 + 下面无边界
sum(cnt) over(partition by name order by dt rows between unbounded preceding and UNBOUNDED FOLLOWING) as cut_all5
from window01;

select
name,
dt,
cnt,
-- 常规分组排序求加和
sum(cnt) over(partition by name order by dt) as cut_all,
-- 整张表对时间排序,然后加和,作用到整张表,理解为18号并列有两条数据
sum(cnt) over(order by dt) as cut_all2,
-- 对整张表进行加和
sum(cnt) over() as cut_all3,
-- 直接按照名字进分组
sum(cnt) over(partition by name) as cut_all4
from window01
order by dt;

1.partition by 不加 => 作用整张表


数仓顺序
    ods不动
    union all + group by select ifnull case when
    join
    group by
    grouping sets 【维度组合】

4.内置窗口函数

1.取值 串行

1.串行
LAG 【窗口内 向上 第n行的值 当前行向上取一行】
                LAG(column [, N[, default]])
                column => 列名
                n => 取几行
                default => 取不到给默认值
            LEAD 【窗口内 向下 第n行的值 当前行向下取一行】

            select
            name,
            dt,
            cnt,
            sum(cnt) over(partition by name order by dt) as cut_all,
            lead(dt,1,"9999-99-99") over(partition by name order by dt) as lead_alias,
            lead(dt,1,"9999-99-99") over(partition by name order by dt) as lag_alias
            from window01;
2.取值
FIRST_VALUE() : 取分组内排序后 截止到当前行 第一个值
            LAST_VALUE():取分组内排序后 截止到当前行 最后一个值

            select
            name,
            dt,
            cnt,
            first_value(cnt) over(partition by name order by dt) as f_value,
            last_value(cnt) over(partition by name order by dt) as l_value
            from window01;

2.排序

分组
ntile
            需求:
                把数据按照某个字段进行排序,把数据分成n份ntile(n)
                如果不能平均分配 优先分配到编号小的里面
            select
            name,
            dt,
            cnt,
            sum(cnt) over(partition by name order by dt) as cut_all,
            -- 平均分成n份,不能平均分,优先把多余的放到最小的里面
            ntile(2) over(partition by name order by dt) as n2,
            ntile(3) over(partition by name order by dt) as n3
            from window01
            order by dt;
排序
rank : 从1开始,按照排序 相同会重复,名次会留下空位 生成组内的记录编号
            row_number: 从1开始,按照排序 生成组内的记录编号
            dense_rank:从1开始,按照排序 相同会重复,名次不会留下空位 生成组内的记录编号

            select
            name,
            dt,
            cnt,
            sum(cnt) over(partition by name order by dt) as cut_all,
            rank() over(partition by name order by cnt desc) as rk,
            row_number() over(partition by name order by cnt desc) as rw,
            dense_rank() over(partition by name order by cnt desc) as d_rk
            from window01;