Table of Contents
- 1. Complete SELECT syntax
- 1.1 Preparing the data
- 2. with
- 3. from
- 4. sample
- 5. array join
- 6. join
- 6.1 Data preparation
- 6.2 Join strictness all with join types (inner, left, right, full)
- 6.3 Join strictness any with join types (inner, left, right)
- 6.4 Join strictness asof with join types (inner, left)
- 6.5 Join type cross join
- 6.6 Multi-table joins and join caveats
- 7. where and prewhere
- 8. group by
- 9. having
- 10. order by
- 11. limit by
- 12. limit
- 13. select
- 14. union all
- 15. Execution plan
1. Complete SELECT syntax
[WITH expr |(subquery)]
SELECT [DISTINCT] expr
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE expr]
[[LEFT] ARRAY JOIN]
[GLOBAL] [ALL|ANY|ASOF] [INNER | CROSS | [LEFT|RIGHT|FULL [OUTER]] ] JOIN
(subquery)|table ON|USING columns_list
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr] [WITH ROLLUP|CUBE|TOTALS]
[HAVING expr]
[ORDER BY expr]
[LIMIT [n, ]m]
[UNION ALL]
[INTO OUTFILE filename]
[FORMAT format]
[LIMIT [offset] n BY columns]
The clauses are parsed and executed roughly in this order.
1.1 Preparing the data
clickhouse1 :)
clickhouse1 :) create table select_table_test(
:-] id UInt32,
:-] name String,
:-] age UInt8,
:-] city String,
:-] score Float64
:-] ) engine = MergeTree()
:-] order by (id, intHash32(id))
:-] sample by intHash32(id);
clickhouse1 :)
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(1, 'Zhangsan', 30, 'Beijing', 70);
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(2, 'Lisi', 40, 'Shanghai', 80);
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(3, 'Wangwu', 50, 'Guangzhou', 90);
clickhouse1 :)
clickhouse1 :) select * from select_table_test;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│ 3 │ Wangwu │ 50 │ Guangzhou │ 90 │
└────┴────────┴─────┴───────────┴───────┘
2. with
clickhouse1 :)
clickhouse1 :) with 40 as fit_age
:-] select * from select_table_test where age = fit_age;
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :)
clickhouse1 :) with sum(score) as sum_score
:-] select city, toString(sum_score), toUInt16(sum_score) from select_table_test group by city;
┌─city──────┬─toString(sum_score)─┬─toUInt16(sum_score)─┐
│ Shanghai │ 80 │ 80 │
│ Beijing │ 70 │ 70 │
│ Guangzhou │ 90 │ 90 │
└───────────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) with (
:-] select age from select_table_test where city = 'Beijing'
:-] ) as fit_age
:-] select * from select_table_test where age = fit_age;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
clickhouse1 :)
- WITH can also be used inside a SELECT subquery in the FROM clause (see the sketch below)
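A minimal sketch of this, reusing select_table_test from section 1.1 (output not shown):

select id, name, age
from
(
    -- the WITH is defined inside the FROM subquery
    with 40 as fit_age
    select * from select_table_test where age = fit_age
);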
3. from
clickhouse1 :)
clickhouse1 :) select number from numbers(3);
┌─number─┐
│ 0 │
│ 1 │
│ 2 │
└────────┘
clickhouse1 :)
clickhouse1 :) select 10 from system.one;
┌─10─┐
│ 10 │
└────┘
clickhouse1 :)
clickhouse1 :) select 10;
┌─10─┐
│ 10 │
└────┘
clickhouse1 :)
- select 10 reads from the virtual table system.one
- The final keyword after from forces the data parts to be merged at query time before the result is computed (see the sketch below)
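A hedged sketch of final; the final_demo table here is hypothetical (any table of the ReplacingMergeTree family would do), and output is not shown:

create table final_demo(
    id UInt32,
    value String
) engine = ReplacingMergeTree()
order by id;

insert into final_demo values (1, 'old');
insert into final_demo values (1, 'new');

-- without final, rows from both unmerged parts may be returned;
-- final merges the parts at query time, leaving one row per key
select * from final_demo final;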
4. sample
- Given the same data and the same sampling expression, the sampling result is always the same (deterministic)
- The sample by expression in create table must be of an integer type
- sample factor (the factor is a number between 0 and 1)
clickhouse1 :)
clickhouse1 :) select name, _sample_factor from select_table_test sample 0.4;
┌─name─────┬─_sample_factor─┐
│ Zhangsan │ 2.5 │
└──────────┴────────────────┘
┌─name───┬─_sample_factor─┐
│ Wangwu │ 2.5 │
└────────┴────────────────┘
clickhouse1 :)
clickhouse1 :) select count(*) * any(_sample_factor) from select_table_test sample 4 / 10;
┌─multiply(count(), any(_sample_factor))─┐
│ 5 │
└────────────────────────────────────────┘
clickhouse1 :)
- A sampling factor of 0 means no sampling is performed and all rows are returned
- sample factor offset offset_value (both between 0 and 1)
clickhouse1 :)
clickhouse1 :) select * from select_table_test sample 0.3 offset 0.1;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│ 3 │ Wangwu │ 50 │ Guangzhou │ 90 │
└────┴────────┴─────┴───────────┴───────┘
clickhouse1 :)
- For example, with 10 rows in total, offset 0.8 and sample 0.3: sampling starts at the 8/10 point of the data (the 9th row) and tries to take 3 rows (10 * 0.3), but only 2 rows remain, so the final sample contains 2 rows
- sample rows
clickhouse1 :)
clickhouse1 :) select * from select_table_test sample 2;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│ 3 │ Wangwu │ 50 │ Guangzhou │ 90 │
└────┴────────┴─────┴───────────┴───────┘
clickhouse1 :)
- Returns approximately the requested number of rows after sampling
- The rows parameter must be greater than 1; when it is much smaller than the index_granularity setting, roughly index_granularity rows are still returned
5. array join
- Table A array join an Array or Nested column of table A flattens one row into multiple rows
- A select statement may contain only one array join (subqueries in the from clause excepted)
- For a Nested column you can write array join nested_parent_column as n directly (see the Nested sketch at the end of this section)
- Preparing the data
clickhouse1 :)
clickhouse1 :) create table array_join_table(
:-] fruit String,
:-] values Array(UInt8),
:-] city String,
:-] values2 Array(UInt8)
:-] ) engine = TinyLog;
clickhouse1 :)
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('banana', [1,2,3], 'Hainan', [18,19]);
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('orange', [66,88], 'Hunan', [6,7,8,9]);
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('apple',[], 'Shangxi', [11,12,13]);
clickhouse1 :)
clickhouse1 :) select * from array_join_table;
┌─fruit──┬─values──┬─city────┬─values2────┐
│ banana │ [1,2,3] │ Hainan │ [18,19] │
│ orange │ [66,88] │ Hunan │ [6,7,8,9] │
│ apple │ [] │ Shangxi │ [11,12,13] │
└────────┴─────────┴─────────┴────────────┘
clickhouse1 :)
- inner array join (the default)
clickhouse1 :)
clickhouse1 :) select fruit, values from array_join_table array join values;
┌─fruit──┬─values─┐
│ banana │ 1 │
│ banana │ 2 │
│ banana │ 3 │
│ orange │ 66 │
│ orange │ 88 │
└────────┴────────┘
clickhouse1 :)
clickhouse1 :) select fruit, values, v from array_join_table array join values as v;
┌─fruit──┬─values──┬──v─┐
│ banana │ [1,2,3] │ 1 │
│ banana │ [1,2,3] │ 2 │
│ banana │ [1,2,3] │ 3 │
│ orange │ [66,88] │ 66 │
│ orange │ [66,88] │ 88 │
└────────┴─────────┴────┘
clickhouse1 :)
- Rows whose array is empty are excluded
- left array join
clickhouse1 :)
clickhouse1 :) select fruit, values, v from array_join_table left array join values as v;
┌─fruit──┬─values──┬──v─┐
│ banana │ [1,2,3] │ 1 │
│ banana │ [1,2,3] │ 2 │
│ banana │ [1,2,3] │ 3 │
│ orange │ [66,88] │ 66 │
│ orange │ [66,88] │ 88 │
│ apple │ [] │ 0 │
└────────┴─────────┴────┘
clickhouse1 :)
clickhouse1 :) select fruit, values, v, arrayMap(x -> x * 2, values) as map_values, m_v from array_join_table left array join values as v, map_values as m_v;
┌─fruit──┬─values──┬──v─┬─map_values─┬─m_v─┐
│ banana │ [1,2,3] │ 1 │ [2,4,6] │ 2 │
│ banana │ [1,2,3] │ 2 │ [2,4,6] │ 4 │
│ banana │ [1,2,3] │ 3 │ [2,4,6] │ 6 │
│ orange │ [66,88] │ 66 │ [132,176] │ 132 │
│ orange │ [66,88] │ 88 │ [132,176] │ 176 │
│ apple │ [] │ 0 │ [] │ 0 │
└────────┴─────────┴────┴────────────┴─────┘
clickhouse1 :)
- Rows whose array is empty do appear in the result (filled with default values)
- When array joining several array columns, the arrays within one row must have the same number of elements; the row is likewise flattened into multiple rows
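A sketch of array join on a Nested column, as mentioned at the beginning of this section; nested_join_demo and its columns are made-up names, and output is not shown:

create table nested_join_demo(
    fruit String,
    props Nested(
        k String,
        v UInt8
    )
) engine = TinyLog;

-- a Nested column is stored as the parallel arrays props.k and props.v
insert into nested_join_demo(fruit, props.k, props.v) values ('banana', ['color', 'size'], [1, 2]);

-- array join the parent column directly and give it an alias
select fruit, p.k, p.v from nested_join_demo array join props as p;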
6. join
6.1 Data preparation
join_table_test1
clickhouse1 :)
clickhouse1 :) create table join_table_test1(
:-] id UInt32,
:-] name String,
:-] datetime Datetime
:-] ) engine = Memory();
clickhouse1 :)
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(1, 'clickhouse', '2021-08-10 11:00:00');
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(2, 'spark', '2021-08-10 12:01:00');
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(2, 'flink', '2021-08-10 12:02:00');
clickhouse1 :)
clickhouse1 :) select * from join_table_test1;
┌─id─┬─name───────┬────────────datetime─┐
│ 1 │ clickhouse │ 2021-08-10 11:00:00 │
└────┴────────────┴─────────────────────┘
┌─id─┬─name──┬────────────datetime─┐
│ 2 │ spark │ 2021-08-10 12:01:00 │
└────┴───────┴─────────────────────┘
┌─id─┬─name──┬────────────datetime─┐
│ 2 │ flink │ 2021-08-10 12:02:00 │
└────┴───────┴─────────────────────┘
clickhouse1 :)
join_table_test2
clickhouse1 :)
clickhouse1 :) create table join_table_test2(
:-] id UInt32,
:-] rate UInt8,
:-] datetime Datetime
:-] ) engine = Memory();
clickhouse1 :)
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(2, 100, '2021-08-10 11:58:00');
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(2, 90, '2021-08-10 11:59:00');
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(3, 80, '2021-08-10 13:00:00');
clickhouse1 :)
clickhouse1 :) select * from join_table_test2;
┌─id─┬─rate─┬────────────datetime─┐
│ 2 │ 100 │ 2021-08-10 11:58:00 │
└────┴──────┴─────────────────────┘
┌─id─┬─rate─┬────────────datetime─┐
│ 2 │ 90 │ 2021-08-10 11:59:00 │
└────┴──────┴─────────────────────┘
┌─id─┬─rate─┬────────────datetime─┐
│ 3 │ 80 │ 2021-08-10 13:00:00 │
└────┴──────┴─────────────────────┘
clickhouse1 :)
join_table_test3
clickhouse1 :)
clickhouse1 :) create table join_table_test3(
:-] id UInt32,
:-] star UInt32
:-] ) engine = Memory();
clickhouse1 :)
clickhouse1 :) insert into join_table_test3(id, star) values(1, 1000);
clickhouse1 :) insert into join_table_test3(id, star) values(2, 900);
clickhouse1 :)
clickhouse1 :) select * from join_table_test3;
┌─id─┬─star─┐
│ 1 │ 1000 │
└────┴──────┘
┌─id─┬─star─┐
│ 2 │ 900 │
└────┴──────┘
clickhouse1 :)
6.2 Join strictness all with join types (inner, left, right, full)
- all is the default join strictness; it can be changed via the join_default_strictness setting (a sketch follows)
- When one row of the base table matches multiple rows of the other table, all matching rows of the other table are returned
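A sketch of changing the default strictness; here it is set for the session, but it can also be attached to a single query with a settings clause (output not shown):

set join_default_strictness = 'ANY';

-- with the setting above, a join written without all/any behaves as any inner join
select a.id, a.name, b.rate
from join_table_test1 a
inner join join_table_test2 b on a.id = b.id;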
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all inner join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all left join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│ 1 │ clickhouse │ 0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all right join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│ 0 │ │ 80 │
└────┴──────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all full join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│ 1 │ clickhouse │ 0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│ 0 │ │ 80 │
└────┴──────┴──────┘
clickhouse1 :)
6.3 Join strictness any with join types (inner, left, right)
- When one row of the base table matches multiple rows of the other table, only the first matching row of the other table is returned
- Note that the any inner join below returns only one row
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any inner join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any left join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│ 1 │ clickhouse │ 0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any right join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│ 0 │ │ 80 │
└────┴──────┴──────┘
clickhouse1 :)
6.4 Join strictness asof with join types (inner, left)
- Join keys normally use equality matching (join key1 = join key2); asof allows one or more equality conditions plus one inequality condition (join key1 >, >=, <, <= join key2)
- How the join proceeds
    - the equality keys are first matched with all strictness
    - the inequality key is then used to keep only the closest match; with using, the last column is the asof column, and the right table's asof value closest to the left table's asof value is matched
- The asof column must be an ordered data type such as an integer, floating-point, or date/datetime type
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof inner join join_table_test2 b on a.id = b.id and a.datetime > b.datetime;
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ spark │ 90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ flink │ 90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof left join join_table_test2 b on a.id = b.id and a.datetime > b.datetime;
┌─id─┬─name───────┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 1 │ clickhouse │ 0 │ 2021-08-10 11:00:00 │ 1970-01-01 08:00:00 │
└────┴────────────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ spark │ 90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ flink │ 90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof inner join join_table_test2 b using(id, datetime);
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ spark │ 90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ flink │ 90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof left join join_table_test2 b using(id, datetime);
┌─id─┬─name───────┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 1 │ clickhouse │ 0 │ 2021-08-10 11:00:00 │ 1970-01-01 08:00:00 │
└────┴────────────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ spark │ 90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ flink │ 90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
6.5 Join type cross join
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a cross join join_table_test2 b;
┌─id─┬─name───────┬─rate─┐
│ 1 │ clickhouse │ 100 │
│ 1 │ clickhouse │ 90 │
│ 1 │ clickhouse │ 80 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
│ 2 │ spark │ 80 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
│ 2 │ flink │ 80 │
└────┴───────┴──────┘
clickhouse1 :)
6.6 Multi-table joins and join caveats
- Multi-table join
    With three tables, the first two are joined first, and their result set is then joined with the third; more tables follow the same pattern (see the sketch at the end of this section)
- Performance
    - Follow "large table on the left, small table on the right": the right table is loaded into memory and then matched against the left table
    - Consider the Join table engine or dictionaries to improve join performance
- Unmatched values
    - By default, unmatched columns are filled with the data type's default value; set join_use_nulls to 1 to fill them with NULL instead (see the sketch at the end of this section)
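A sketch of both points above, using the three tables from 6.1 (output not shown; very old versions may require rewriting the second join as a nested subquery):

-- three-table join: test1 and test2 are joined first, the result is then joined with test3
select a.id, a.name, b.rate, c.star
from join_table_test1 a
all inner join join_table_test2 b on a.id = b.id
all inner join join_table_test3 c on a.id = c.id;

-- fill unmatched columns with NULL instead of the type defaults
select a.id, a.name, b.rate
from join_table_test1 a
all left join join_table_test2 b on a.id = b.id
settings join_use_nulls = 1;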
7. where and prewhere
- ClickHouse uses the columns in the where condition to decide whether the select can use the table's index
- prewhere is only available for the MergeTree family: the condition columns are read and filtered first, and the other selected columns are read only for the rows that pass the filter (a sketch of an explicit prewhere follows this list)
- When where is used, ClickHouse decides from the condition columns whether to move the condition to prewhere automatically; it will not do so in the following cases
    - Constant expressions, e.g. where 1 = 1
    - where a = 3, where column a is part of the primary key or has an alias default expression
    - select a ...... where a = 3, where the selected columns are the same as the where columns
    - ...... from tb array join nest_field where nest_field.col1 = 10, where the query contains array join, global in, global not in, or the indexHint function
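A sketch of writing prewhere explicitly on the MergeTree table from 1.1 (output not shown):

-- the city column is read and filtered first; id, name and score are read
-- only for the rows that pass the filter
select id, name, score
from select_table_test
prewhere city = 'Beijing';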
8. group by
- Preparing the data
clickhouse1 :) create table group_table_test(
:-] province String,
:-] city String,
:-] season String,
:-] sales Float64
:-] ) engine = Memory();
clickhouse1 :)
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '春', 1100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '夏', 1200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '秋', 1300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '冬', 1400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '春', 2100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '夏', 2200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '秋', 2300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '冬', 2400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '春', 3100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '夏', 3200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '秋', 3300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '冬', 3400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '春', 4100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '夏', 4200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '秋', 4300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '冬', 4400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '春', 5100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '夏', 5200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '秋', 5300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '冬', 5400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '春', 6100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '夏', 6200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '秋', 6300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '冬', 6400);
clickhouse1 :)
- with rollup
- group by province, city with rollup: rolls up from right to left, producing the groupings [province, city], [province], [ ], and then computes sum(sales) for each grouping
clickhouse1 :)
clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with rollup order by province;
┌─province─┬─city─┬─sum(sales)─┐
│ │ │ 90000 │
│ 广东 │ │ 63000 │
│ 广东 │ 深圳 │ 25000 │
│ 广东 │ 广州 │ 17000 │
│ 广东 │ 东莞 │ 21000 │
│ 湖南 │ │ 27000 │
│ 湖南 │ 株洲 │ 9000 │
│ 湖南 │ 湘潭 │ 13000 │
│ 湖南 │ 长沙 │ 5000 │
└──────────┴──────┴────────────┘
clickhouse1 :)
- with cube
- group by province, city with cube: combines all dimensions, producing the groupings [province, city], [province], [city], [ ], and then computes sum(sales) for each grouping
clickhouse1 :)
clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with cube order by province, city;
┌─province─┬─city─┬─sum(sales)─┐
│ │ │ 90000 │
│ │ 东莞 │ 21000 │
│ │ 广州 │ 17000 │
│ │ 株洲 │ 9000 │
│ │ 深圳 │ 25000 │
│ │ 湘潭 │ 13000 │
│ │ 长沙 │ 5000 │
│ 广东 │ │ 63000 │
│ 广东 │ 东莞 │ 21000 │
│ 广东 │ 广州 │ 17000 │
│ 广东 │ 深圳 │ 25000 │
│ 湖南 │ │ 27000 │
│ 湖南 │ 株洲 │ 9000 │
│ 湖南 │ 湘潭 │ 13000 │
│ 湖南 │ 长沙 │ 5000 │
└──────────┴──────┴────────────┘
clickhouse1 :)
- with totals
- Additionally computes sum(sales) over all rows, returned as a separate Totals block
clickhouse1 :)
clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with totals order by province, city;
┌─province─┬─city─┬─sum(sales)─┐
│ 广东 │ 东莞 │ 21000 │
│ 广东 │ 广州 │ 17000 │
│ 广东 │ 深圳 │ 25000 │
│ 湖南 │ 株洲 │ 9000 │
│ 湖南 │ 湘潭 │ 13000 │
│ 湖南 │ 长沙 │ 5000 │
└──────────┴──────┴────────────┘
Totals:
┌─province─┬─city─┬─sum(sales)─┐
│ │ │ 90000 │
└──────────┴──────┴────────────┘
clickhouse1 :)
9. having
Filters the result of the group by aggregation once more.
clickhouse1 :)
clickhouse1 :) select province, city, sum(sales) total_sales from group_table_test group by province, city having total_sales > 10000 order by province, city;
┌─province─┬─city─┬─total_sales─┐
│ 广东 │ 东莞 │ 21000 │
│ 广东 │ 广州 │ 17000 │
│ 广东 │ 深圳 │ 25000 │
│ 湖南 │ 湘潭 │ 13000 │
└──────────┴──────┴─────────────┘
clickhouse1 :)
10. order by
- The table-level order by only sorts data within each partition; rows in different partitions are not sorted relative to one another (use the query-level order by for a globally sorted result)
- nulls first
clickhouse1 :)
clickhouse1 :) with arrayJoin([30, null, 0/0, 50]) as v
:-] select v order by v nulls first;
┌────v─┐
│ ᴺᵁᴸᴸ │
│ nan │
│ 30 │
│ 50 │
└──────┘
clickhouse1 :)
- nulls last
- This is the default behavior
clickhouse1 :)
clickhouse1 :) with arrayJoin([30, null, 0/0, 50]) as v
:-] select v order by v nulls last;
┌────v─┐
│ 30 │
│ 50 │
│ nan │
│ ᴺᵁᴸᴸ │
└──────┘
clickhouse1 :)
11. limit by
clickhouse1 :)
clickhouse1 :) select province, city, season, sales from group_table_test limit 1 offset 1 by province, city;
┌─province─┬─city─┬─season─┬─sales─┐
│ 湖南 │ 长沙 │ 夏 │ 1200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 湖南 │ 株洲 │ 夏 │ 2200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 湖南 │ 湘潭 │ 夏 │ 3200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 广东 │ 广州 │ 夏 │ 4200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 广东 │ 东莞 │ 夏 │ 5200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 广东 │ 深圳 │ 夏 │ 6200 │
└──────────┴──────┴────────┴───────┘
clickhouse1 :)
- Runs after order by and before limit
- The query above groups by province and city; within each group it skips 1 row and then takes 1 row
- Shorthand form: limit 1, 1 by province, city (a top-N-per-group sketch follows)
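limit by is typically used for top-N-per-group queries; a sketch on group_table_test (output not shown):

-- the 2 best-selling rows of each province
select province, city, season, sales
from group_table_test
order by province, sales desc
limit 2 by province;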
12. limit
clickhouse1 :)
clickhouse1 :) select * from select_table_test limit 1 offset 1;
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :)
clickhouse1 :) select * from select_table_test limit 1, 1;
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :)
- If the result spans multiple partitions and no order by is used, limit may return different rows on each run
13. select
Selecting columns by regular expression
clickhouse1 :)
clickhouse1 :) select columns('^i'), columns('a') from select_table_test;
┌─id─┬─name─────┬─age─┐
│ 1 │ Zhangsan │ 30 │
│ 2 │ Lisi │ 40 │
│ 3 │ Wangwu │ 50 │
└────┴──────────┴─────┘
clickhouse1 :)
- Returns the columns whose names start with i and the columns whose names contain a
14. union all
clickhouse1 :)
clickhouse1 :) select * from select_table_test where id = 1
:-] union all
:-] select * from select_table_test where id = 2;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :)
- The column names of a union all result are taken from the left (first) query, as sketched below
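A sketch illustrating this on select_table_test (output not shown): the result column is named n, taken from the first query, even though the second query selects city.

select name as n from select_table_test where id = 1
union all
select city from select_table_test where id = 2;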
15. Execution plan
clickhouse1 :)
clickhouse1 :) explain
:-] with 30 as fit_age
:-] select age, count() num from select_table_test
:-] where age = fit_age
:-] group by age
:-] having num = 1
:-] order by num
:-] limit 1;
┌─explain─────────────────────────────────────────────────────────────────────────────────────┐
│ Expression (Projection) │
│ Limit (preliminary LIMIT) │
│ MergingSorted (Merge sorted streams for ORDER BY) │
│ MergeSorting (Merge sorted blocks for ORDER BY) │
│ PartialSorting (Sort each block for ORDER BY) │
│ Expression (Before ORDER BY) │
│ Filter (HAVING) │
│ Aggregating │
│ Expression (Before GROUP BY) │
│ Filter (WHERE) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromMergeTree │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
clickhouse1 :)
You can also inspect the details by looking at the query execution log:
[root@clickhouse1 ~]#
[root@clickhouse1 ~]# clickhouse-client -u default --password default123 --send_logs_level=trace <<< 'select * from select_table_test' > /dev/null
[clickhouse1] 2021.08.09 18:20:59.417657 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> executeQuery: (from [::1]:39664, using production parser) select * from select_table_test
[clickhouse1] 2021.08.09 18:20:59.418669 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Trace> ContextAccess (default): Access granted: SELECT(id, name, age, city, score) ON default.select_table_test
[clickhouse1] 2021.08.09 18:20:59.418786 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> default.select_table_test (0d5feec0-6dbe-4e23-8d5f-eec06dbe4e23) (SelectExecutor): Key condition: unknown
[clickhouse1] 2021.08.09 18:20:59.418808 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> default.select_table_test (0d5feec0-6dbe-4e23-8d5f-eec06dbe4e23) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 1/1 marks by primary key, 1 marks to read from 1 ranges
[clickhouse1] 2021.08.09 18:20:59.418867 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[clickhouse1] 2021.08.09 18:20:59.418954 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> MergeTreeSelectProcessor: Reading 1 ranges from part all_1_3_1, approx. 3 rows starting from 0
[clickhouse1] 2021.08.09 18:20:59.420643 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Information> executeQuery: Read 3 rows, 135.00 B in 0.0028947 sec., 1036 rows/sec., 45.54 KiB/sec.
[clickhouse1] 2021.08.09 18:20:59.420674 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
[root@clickhouse1 ~]#