Table of Contents
- 1. Complete SELECT syntax
- 1.1 Preparing the data
- 2. with
- 3. from
- 4. sample
- 5. array join
- 6. join
- 6.1 Data preparation
- 6.2 Join strictness all with join types (inner, left, right, full)
- 6.3 Join strictness any with join types (inner, left, right)
- 6.4 Join strictness asof with join types (inner, left)
- 6.5 Join type cross join
- 6.6 Multi-table joins and join caveats
- 7. where and prewhere
- 8. group by
- 9. having
- 10. order by
- 11. limit by
- 12. limit
- 13. select
- 14. union all
- 15. Execution plan
1. Complete SELECT syntax
[WITH expr |(subquery)]
SELECT [DISTINCT] expr
[FROM [db.]table | (subquery) | table_function] [FINAL]
[SAMPLE expr]
[[LEFT] ARRAY JOIN]
[GLOBAL] [ALL|ANY|ASOF] [INNER | CROSS | [LEFT|RIGHT|FULL [OUTER]] ] JOIN
(subquery)|table ON|USING columns_list
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr] [WITH ROLLUP|CUBE|TOTALS]
[HAVING expr]
[ORDER BY expr]
[LIMIT [n, ]m]
[UNION ALL]
[INTO OUTFILE filename]
[FORMAT format]
[LIMIT [offset] n BY columns]
The clauses are parsed and executed roughly in this order.
1.1 Preparing the data
clickhouse1 :)
clickhouse1 :) create table select_table_test(
:-] id UInt32,
:-] name String,
:-] age UInt8,
:-] city String,
:-] score Float64
:-] ) engine = MergeTree()
:-] order by (id, intHash32(id))
:-] sample by intHash32(id);
clickhouse1 :)
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(1, 'Zhangsan', 30, 'Beijing', 70);
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(2, 'Lisi', 40, 'Shanghai', 80);
clickhouse1 :) insert into select_table_test(id, name, age, city, score) values(3, 'Wangwu', 50, 'Guangzhou', 90);
clickhouse1 :)
clickhouse1 :) select * from select_table_test;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│ 3 │ Wangwu │ 50 │ Guangzhou │ 90 │
└────┴────────┴─────┴───────────┴───────┘
2. with
clickhouse1 :)
clickhouse1 :) with 40 as fit_age
:-] select * from select_table_test where age = fit_age;
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :)
clickhouse1 :) with sum(score) as sum_score
:-] select city, toString(sum_score), toUInt16(sum_score) from select_table_test group by city;
┌─city──────┬─toString(sum_score)─┬─toUInt16(sum_score)─┐
│ Shanghai │ 80 │ 80 │
│ Beijing │ 70 │ 70 │
│ Guangzhou │ 90 │ 90 │
└───────────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) with (
:-] select age from select_table_test where city = 'Beijing'
:-] ) as fit_age
:-] select * from select_table_test where age = fit_age;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
clickhouse1 :)
- WITH can also be used inside a SELECT subquery in the FROM clause (see the sketch below)
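A minimal sketch of this, reusing select_table_test from section 1.1 (output not shown):

select id, name, age
from
(
    -- the WITH is defined inside the FROM subquery
    with 40 as fit_age
    select * from select_table_test where age = fit_age
);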
3. from
clickhouse1 :)
clickhouse1 :) select number from numbers(3);
┌─number─┐
│ 0 │
│ 1 │
│ 2 │
└────────┘
clickhouse1 :)
clickhouse1 :) select 10 from system.one;
┌─10─┐
│ 10 │
└────┘
clickhouse1 :)
clickhouse1 :) select 10;
┌─10─┐
│ 10 │
└────┘
clickhouse1 :)
- select 10 reads from the virtual table system.one
- The final keyword after from forces the data parts to be merged at query time before the result is computed (see the sketch below)
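A hedged sketch of final; the final_demo table here is hypothetical (any table of the ReplacingMergeTree family would do), and output is not shown:

create table final_demo(
    id UInt32,
    value String
) engine = ReplacingMergeTree()
order by id;

insert into final_demo values (1, 'old');
insert into final_demo values (1, 'new');

-- without final, rows from both unmerged parts may be returned;
-- final merges the parts at query time, leaving one row per key
select * from final_demo final;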
4. sample
- Given the same data and the same sampling expression, the sampling result is always the same (deterministic)
- The sample by expression in create table must be of an integer type
- sample factor (the factor is a number between 0 and 1)
clickhouse1 :)
clickhouse1 :) select name, _sample_factor from select_table_test sample 0.4;
┌─name─────┬─_sample_factor─┐
│ Zhangsan │ 2.5 │
└──────────┴────────────────┘
┌─name───┬─_sample_factor─┐
│ Wangwu │ 2.5 │
└────────┴────────────────┘
clickhouse1 :)
clickhouse1 :) select count(*) * any(_sample_factor) from select_table_test sample 4 / 10;
┌─multiply(count(), any(_sample_factor))─┐
│ 5 │
└────────────────────────────────────────┘
clickhouse1 :)
- A sampling factor of 0 means no sampling is performed and all rows are returned
- sample factor offset offset_value (both between 0 and 1)
clickhouse1 :)
clickhouse1 :) select * from select_table_test sample 0.3 offset 0.1;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│ 3 │ Wangwu │ 50 │ Guangzhou │ 90 │
└────┴────────┴─────┴───────────┴───────┘
clickhouse1 :)
- For example, with 10 rows in total, offset 0.8 and sample 0.3: sampling starts at the 8/10 point of the data (the 9th row) and tries to take 3 rows (10 * 0.3), but only 2 rows remain, so the final sample contains 2 rows
- sample rows
clickhouse1 :)
clickhouse1 :) select * from select_table_test sample 2;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
┌─id─┬─name───┬─age─┬─city──────┬─score─┐
│ 3 │ Wangwu │ 50 │ Guangzhou │ 90 │
└────┴────────┴─────┴───────────┴───────┘
clickhouse1 :)
- Returns approximately the requested number of rows after sampling
- The rows parameter must be greater than 1; when it is much smaller than the index_granularity setting, roughly index_granularity rows are still returned
5. array join
- Table A array join an Array or Nested column of table A flattens one row into multiple rows
- A select statement may contain only one array join (subqueries in the from clause excepted)
- For a Nested column you can write array join nested_parent_column as n directly (see the Nested sketch at the end of this section)
- Preparing the data
clickhouse1 :)
clickhouse1 :) create table array_join_table(
:-] fruit String,
:-] values Array(UInt8),
:-] city String,
:-] values2 Array(UInt8)
:-] ) engine = TinyLog;
clickhouse1 :)
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('banana', [1,2,3], 'Hainan', [18,19]);
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('orange', [66,88], 'Hunan', [6,7,8,9]);
clickhouse1 :) insert into array_join_table(fruit, values, city, values2) values('apple',[], 'Shangxi', [11,12,13]);
clickhouse1 :)
clickhouse1 :) select * from array_join_table;
┌─fruit──┬─values──┬─city────┬─values2────┐
│ banana │ [1,2,3] │ Hainan │ [18,19] │
│ orange │ [66,88] │ Hunan │ [6,7,8,9] │
│ apple │ [] │ Shangxi │ [11,12,13] │
└────────┴─────────┴─────────┴────────────┘
clickhouse1 :)
- inner array join (the default)
clickhouse1 :)
clickhouse1 :) select fruit, values from array_join_table array join values;
┌─fruit──┬─values─┐
│ banana │ 1 │
│ banana │ 2 │
│ banana │ 3 │
│ orange │ 66 │
│ orange │ 88 │
└────────┴────────┘
clickhouse1 :)
clickhouse1 :) select fruit, values, v from array_join_table array join values as v;
┌─fruit──┬─values──┬──v─┐
│ banana │ [1,2,3] │ 1 │
│ banana │ [1,2,3] │ 2 │
│ banana │ [1,2,3] │ 3 │
│ orange │ [66,88] │ 66 │
│ orange │ [66,88] │ 88 │
└────────┴─────────┴────┘
clickhouse1 :)
- Rows whose array is empty are excluded
- left array join
clickhouse1 :)
clickhouse1 :) select fruit, values, v from array_join_table left array join values as v;
┌─fruit──┬─values──┬──v─┐
│ banana │ [1,2,3] │ 1 │
│ banana │ [1,2,3] │ 2 │
│ banana │ [1,2,3] │ 3 │
│ orange │ [66,88] │ 66 │
│ orange │ [66,88] │ 88 │
│ apple │ [] │ 0 │
└────────┴─────────┴────┘
clickhouse1 :)
clickhouse1 :) select fruit, values, v, arrayMap(x -> x * 2, values) as map_values, m_v from array_join_table left array join values as v, map_values as m_v;
┌─fruit──┬─values──┬──v─┬─map_values─┬─m_v─┐
│ banana │ [1,2,3] │ 1 │ [2,4,6] │ 2 │
│ banana │ [1,2,3] │ 2 │ [2,4,6] │ 4 │
│ banana │ [1,2,3] │ 3 │ [2,4,6] │ 6 │
│ orange │ [66,88] │ 66 │ [132,176] │ 132 │
│ orange │ [66,88] │ 88 │ [132,176] │ 176 │
│ apple │ [] │ 0 │ [] │ 0 │
└────────┴─────────┴────┴────────────┴─────┘
clickhouse1 :)
- Rows whose array is empty do appear in the result (filled with default values)
- When array joining several array columns, the arrays within one row must have the same number of elements; the row is likewise flattened into multiple rows
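A sketch of array join on a Nested column, as mentioned at the beginning of this section; nested_join_demo and its columns are made-up names, and output is not shown:

create table nested_join_demo(
    fruit String,
    props Nested(
        k String,
        v UInt8
    )
) engine = TinyLog;

-- a Nested column is stored as the parallel arrays props.k and props.v
insert into nested_join_demo(fruit, props.k, props.v) values ('banana', ['color', 'size'], [1, 2]);

-- array join the parent column directly and give it an alias
select fruit, p.k, p.v from nested_join_demo array join props as p;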
6. join
6.1 Data preparation
join_table_test1
clickhouse1 :)
clickhouse1 :) create table join_table_test1(
:-] id UInt32,
:-] name String,
:-] datetime Datetime
:-] ) engine = Memory();
clickhouse1 :)
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(1, 'clickhouse', '2021-08-10 11:00:00');
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(2, 'spark', '2021-08-10 12:01:00');
clickhouse1 :) insert into join_table_test1(id, name, datetime) values(2, 'flink', '2021-08-10 12:02:00');
clickhouse1 :)
clickhouse1 :) select * from join_table_test1;
┌─id─┬─name───────┬────────────datetime─┐
│ 1 │ clickhouse │ 2021-08-10 11:00:00 │
└────┴────────────┴─────────────────────┘
┌─id─┬─name──┬────────────datetime─┐
│ 2 │ spark │ 2021-08-10 12:01:00 │
└────┴───────┴─────────────────────┘
┌─id─┬─name──┬────────────datetime─┐
│ 2 │ flink │ 2021-08-10 12:02:00 │
└────┴───────┴─────────────────────┘
clickhouse1 :)
join_table_test2
clickhouse1 :)
clickhouse1 :) create table join_table_test2(
:-] id UInt32,
:-] rate UInt8,
:-] datetime Datetime
:-] ) engine = Memory();
clickhouse1 :)
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(2, 100, '2021-08-10 11:58:00');
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(2, 90, '2021-08-10 11:59:00');
clickhouse1 :) insert into join_table_test2(id, rate, datetime) values(3, 80, '2021-08-10 13:00:00');
clickhouse1 :)
clickhouse1 :) select * from join_table_test2;
┌─id─┬─rate─┬────────────datetime─┐
│ 2 │ 100 │ 2021-08-10 11:58:00 │
└────┴──────┴─────────────────────┘
┌─id─┬─rate─┬────────────datetime─┐
│ 2 │ 90 │ 2021-08-10 11:59:00 │
└────┴──────┴─────────────────────┘
┌─id─┬─rate─┬────────────datetime─┐
│ 3 │ 80 │ 2021-08-10 13:00:00 │
└────┴──────┴─────────────────────┘
clickhouse1 :)
join_table_test3
clickhouse1 :)
clickhouse1 :) create table join_table_test3(
:-] id UInt32,
:-] star UInt32
:-] ) engine = Memory();
clickhouse1 :)
clickhouse1 :) insert into join_table_test3(id, star) values(1, 1000);
clickhouse1 :) insert into join_table_test3(id, star) values(2, 900);
clickhouse1 :)
clickhouse1 :) select * from join_table_test3;
┌─id─┬─star─┐
│ 1 │ 1000 │
└────┴──────┘
┌─id─┬─star─┐
│ 2 │ 900 │
└────┴──────┘
clickhouse1 :)
6.2 Join strictness all with join types (inner, left, right, full)
- all is the default join strictness; it can be changed via the join_default_strictness setting (a sketch follows)
- When one row of the base table matches multiple rows of the other table, all matching rows of the other table are returned
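A sketch of changing the default strictness; here it is set for the session, but it can also be attached to a single query with a settings clause (output not shown):

set join_default_strictness = 'ANY';

-- with the setting above, a join written without all/any behaves as any inner join
select a.id, a.name, b.rate
from join_table_test1 a
inner join join_table_test2 b on a.id = b.id;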
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all inner join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all left join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│ 1 │ clickhouse │ 0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all right join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│ 0 │ │ 80 │
└────┴──────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a all full join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│ 1 │ clickhouse │ 0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│ 0 │ │ 80 │
└────┴──────┴──────┘
clickhouse1 :)
6.3 Join strictness any with join types (inner, left, right)
- When one row of the base table matches multiple rows of the other table, only the first matching row of the other table is returned
- Note that the any inner join below returns only one row
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any inner join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any left join join_table_test2 b on a.id = b.id;
┌─id─┬─name───────┬─rate─┐
│ 1 │ clickhouse │ 0 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
└────┴───────┴──────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a any right join join_table_test2 b on a.id = b.id;
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
└────┴───────┴──────┘
┌─id─┬─name─┬─rate─┐
│ 0 │ │ 80 │
└────┴──────┴──────┘
clickhouse1 :)
6.4 Join strictness asof with join types (inner, left)
- Join keys normally use equality matching (join key1 = join key2); asof allows one or more equality conditions plus one inequality condition (join key1 >, >=, <, <= join key2)
- How the join proceeds
    - the equality keys are first matched with all strictness
    - the inequality key is then used to keep only the closest match; with using, the last column is the asof column, and the right table's asof value closest to the left table's asof value is matched
- The asof column must be an ordered data type such as an integer, floating-point, or date/datetime type
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof inner join join_table_test2 b on a.id = b.id and a.datetime > b.datetime;
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ spark │ 90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ flink │ 90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof left join join_table_test2 b on a.id = b.id and a.datetime > b.datetime;
┌─id─┬─name───────┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 1 │ clickhouse │ 0 │ 2021-08-10 11:00:00 │ 1970-01-01 08:00:00 │
└────┴────────────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ spark │ 90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ flink │ 90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof inner join join_table_test2 b using(id, datetime);
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ spark │ 90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ flink │ 90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate, a.datetime, b.datetime from join_table_test1 a asof left join join_table_test2 b using(id, datetime);
┌─id─┬─name───────┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 1 │ clickhouse │ 0 │ 2021-08-10 11:00:00 │ 1970-01-01 08:00:00 │
└────┴────────────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ spark │ 90 │ 2021-08-10 12:01:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
┌─id─┬─name──┬─rate─┬────────────datetime─┬──────────b.datetime─┐
│ 2 │ flink │ 90 │ 2021-08-10 12:02:00 │ 2021-08-10 11:59:00 │
└────┴───────┴──────┴─────────────────────┴─────────────────────┘
clickhouse1 :)
6.5 Join type cross join
clickhouse1 :)
clickhouse1 :) select a.id, a.name, b.rate from join_table_test1 a cross join join_table_test2 b;
┌─id─┬─name───────┬─rate─┐
│ 1 │ clickhouse │ 100 │
│ 1 │ clickhouse │ 90 │
│ 1 │ clickhouse │ 80 │
└────┴────────────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ spark │ 100 │
│ 2 │ spark │ 90 │
│ 2 │ spark │ 80 │
└────┴───────┴──────┘
┌─id─┬─name──┬─rate─┐
│ 2 │ flink │ 100 │
│ 2 │ flink │ 90 │
│ 2 │ flink │ 80 │
└────┴───────┴──────┘
clickhouse1 :)
6.6 Multi-table joins and join caveats
- Multi-table join
    With three tables, the first two are joined first, and their result set is then joined with the third; more tables follow the same pattern (see the sketch at the end of this section)
- Performance
    - Follow "large table on the left, small table on the right": the right table is loaded into memory and then matched against the left table
    - Consider the Join table engine or dictionaries to improve join performance
- Unmatched values
    - By default, unmatched columns are filled with the data type's default value; set join_use_nulls to 1 to fill them with NULL instead (see the sketch at the end of this section)
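A sketch of both points above, using the three tables from 6.1 (output not shown; very old versions may require rewriting the second join as a nested subquery):

-- three-table join: test1 and test2 are joined first, the result is then joined with test3
select a.id, a.name, b.rate, c.star
from join_table_test1 a
all inner join join_table_test2 b on a.id = b.id
all inner join join_table_test3 c on a.id = c.id;

-- fill unmatched columns with NULL instead of the type defaults
select a.id, a.name, b.rate
from join_table_test1 a
all left join join_table_test2 b on a.id = b.id
settings join_use_nulls = 1;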
7. where and prewhere
- ClickHouse uses the columns in the where condition to decide whether the select can use the table's index
- prewhere is only available for the MergeTree family: the condition columns are read and filtered first, and the other selected columns are read only for the rows that pass the filter (a sketch of an explicit prewhere follows this list)
- When where is used, ClickHouse decides from the condition columns whether to move the condition to prewhere automatically; it will not do so in the following cases
    - Constant expressions, e.g. where 1 = 1
    - where a = 3, where column a is part of the primary key or has an alias default expression
    - select a ...... where a = 3, where the selected columns are the same as the where columns
    - ...... from tb array join nest_field where nest_field.col1 = 10, where the query contains array join, global in, global not in, or the indexHint function
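A sketch of writing prewhere explicitly on the MergeTree table from 1.1 (output not shown):

-- the city column is read and filtered first; id, name and score are read
-- only for the rows that pass the filter
select id, name, score
from select_table_test
prewhere city = 'Beijing';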
8. group by
- Preparing the data
clickhouse1 :) create table group_table_test(
:-] province String,
:-] city String,
:-] season String,
:-] sales Float64
:-] ) engine = Memory();
clickhouse1 :)
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '春', 1100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '夏', 1200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '秋', 1300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '长沙', '冬', 1400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '春', 2100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '夏', 2200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '秋', 2300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '株洲', '冬', 2400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '春', 3100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '夏', 3200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '秋', 3300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('湖南', '湘潭', '冬', 3400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '春', 4100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '夏', 4200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '秋', 4300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '广州', '冬', 4400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '春', 5100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '夏', 5200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '秋', 5300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '东莞', '冬', 5400);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '春', 6100);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '夏', 6200);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '秋', 6300);
clickhouse1 :) insert into group_table_test(province, city, season, sales) values('广东', '深圳', '冬', 6400);
clickhouse1 :)
- with rollup
- group by province, city with rollup: rolls up from right to left, producing the groupings [province, city], [province], [ ], and then computes sum(sales) for each grouping
clickhouse1 :)
clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with rollup order by province;
┌─province─┬─city─┬─sum(sales)─┐
│ │ │ 90000 │
│ 广东 │ │ 63000 │
│ 广东 │ 深圳 │ 25000 │
│ 广东 │ 广州 │ 17000 │
│ 广东 │ 东莞 │ 21000 │
│ 湖南 │ │ 27000 │
│ 湖南 │ 株洲 │ 9000 │
│ 湖南 │ 湘潭 │ 13000 │
│ 湖南 │ 长沙 │ 5000 │
└──────────┴──────┴────────────┘
clickhouse1 :)
- with cube
- group by province, city with cube: combines all dimensions, producing the groupings [province, city], [province], [city], [ ], and then computes sum(sales) for each grouping
clickhouse1 :)
clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with cube order by province, city;
┌─province─┬─city─┬─sum(sales)─┐
│ │ │ 90000 │
│ │ 东莞 │ 21000 │
│ │ 广州 │ 17000 │
│ │ 株洲 │ 9000 │
│ │ 深圳 │ 25000 │
│ │ 湘潭 │ 13000 │
│ │ 长沙 │ 5000 │
│ 广东 │ │ 63000 │
│ 广东 │ 东莞 │ 21000 │
│ 广东 │ 广州 │ 17000 │
│ 广东 │ 深圳 │ 25000 │
│ 湖南 │ │ 27000 │
│ 湖南 │ 株洲 │ 9000 │
│ 湖南 │ 湘潭 │ 13000 │
│ 湖南 │ 长沙 │ 5000 │
└──────────┴──────┴────────────┘
clickhouse1 :)
- with totals
- Additionally computes sum(sales) over all rows, returned as a separate Totals block
clickhouse1 :)
clickhouse1 :) select province, city, sum(sales) from group_table_test group by province, city with totals order by province, city;
┌─province─┬─city─┬─sum(sales)─┐
│ 广东 │ 东莞 │ 21000 │
│ 广东 │ 广州 │ 17000 │
│ 广东 │ 深圳 │ 25000 │
│ 湖南 │ 株洲 │ 9000 │
│ 湖南 │ 湘潭 │ 13000 │
│ 湖南 │ 长沙 │ 5000 │
└──────────┴──────┴────────────┘
Totals:
┌─province─┬─city─┬─sum(sales)─┐
│ │ │ 90000 │
└──────────┴──────┴────────────┘
clickhouse1 :)
9. having
Filters the result of the group by aggregation once more.
clickhouse1 :)
clickhouse1 :) select province, city, sum(sales) total_sales from group_table_test group by province, city having total_sales > 10000 order by province, city;
┌─province─┬─city─┬─total_sales─┐
│ 广东 │ 东莞 │ 21000 │
│ 广东 │ 广州 │ 17000 │
│ 广东 │ 深圳 │ 25000 │
│ 湖南 │ 湘潭 │ 13000 │
└──────────┴──────┴─────────────┘
clickhouse1 :)
10. order by
- The table-level order by only sorts data within each partition; rows in different partitions are not sorted relative to one another (use the query-level order by for a globally sorted result)
- nulls first
clickhouse1 :)
clickhouse1 :) with arrayJoin([30, null, 0/0, 50]) as v
:-] select v order by v nulls first;
┌────v─┐
│ ᴺᵁᴸᴸ │
│ nan │
│ 30 │
│ 50 │
└──────┘
clickhouse1 :)
- nulls last
- This is the default behavior
clickhouse1 :)
clickhouse1 :) with arrayJoin([30, null, 0/0, 50]) as v
:-] select v order by v nulls last;
┌────v─┐
│ 30 │
│ 50 │
│ nan │
│ ᴺᵁᴸᴸ │
└──────┘
clickhouse1 :)
11. limit by
clickhouse1 :)
clickhouse1 :) select province, city, season, sales from group_table_test limit 1 offset 1 by province, city;
┌─province─┬─city─┬─season─┬─sales─┐
│ 湖南 │ 长沙 │ 夏 │ 1200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 湖南 │ 株洲 │ 夏 │ 2200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 湖南 │ 湘潭 │ 夏 │ 3200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 广东 │ 广州 │ 夏 │ 4200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 广东 │ 东莞 │ 夏 │ 5200 │
└──────────┴──────┴────────┴───────┘
┌─province─┬─city─┬─season─┬─sales─┐
│ 广东 │ 深圳 │ 夏 │ 6200 │
└──────────┴──────┴────────┴───────┘
clickhouse1 :)
- Runs after order by and before limit
- The query above groups by province and city; within each group it skips 1 row and then takes 1 row
- Shorthand form: limit 1, 1 by province, city (a top-N-per-group sketch follows)
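limit by is typically used for top-N-per-group queries; a sketch on group_table_test (output not shown):

-- the 2 best-selling rows of each province
select province, city, season, sales
from group_table_test
order by province, sales desc
limit 2 by province;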
12. limit
clickhouse1 :)
clickhouse1 :) select * from select_table_test limit 1 offset 1;
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :)
clickhouse1 :) select * from select_table_test limit 1, 1;
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :)
- If the result spans multiple partitions and no order by is used, limit may return different rows on each run
13. select
Selecting columns by regular expression
clickhouse1 :)
clickhouse1 :) select columns('^i'), columns('a') from select_table_test;
┌─id─┬─name─────┬─age─┐
│ 1 │ Zhangsan │ 30 │
│ 2 │ Lisi │ 40 │
│ 3 │ Wangwu │ 50 │
└────┴──────────┴─────┘
clickhouse1 :)
- Returns the columns whose names start with i and the columns whose names contain a
14. union all
clickhouse1 :)
clickhouse1 :) select * from select_table_test where id = 1
:-] union all
:-] select * from select_table_test where id = 2;
┌─id─┬─name─────┬─age─┬─city────┬─score─┐
│ 1 │ Zhangsan │ 30 │ Beijing │ 70 │
└────┴──────────┴─────┴─────────┴───────┘
┌─id─┬─name─┬─age─┬─city─────┬─score─┐
│ 2 │ Lisi │ 40 │ Shanghai │ 80 │
└────┴──────┴─────┴──────────┴───────┘
clickhouse1 :)
- The column names of a union all result are taken from the left (first) query, as sketched below
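A sketch illustrating this on select_table_test (output not shown): the result column is named n, taken from the first query, even though the second query selects city.

select name as n from select_table_test where id = 1
union all
select city from select_table_test where id = 2;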
15. Execution plan
clickhouse1 :)
clickhouse1 :) explain
:-] with 30 as fit_age
:-] select age, count() num from select_table_test
:-] where age = fit_age
:-] group by age
:-] having num = 1
:-] order by num
:-] limit 1;
┌─explain─────────────────────────────────────────────────────────────────────────────────────┐
│ Expression (Projection) │
│ Limit (preliminary LIMIT) │
│ MergingSorted (Merge sorted streams for ORDER BY) │
│ MergeSorting (Merge sorted blocks for ORDER BY) │
│ PartialSorting (Sort each block for ORDER BY) │
│ Expression (Before ORDER BY) │
│ Filter (HAVING) │
│ Aggregating │
│ Expression (Before GROUP BY) │
│ Filter (WHERE) │
│ SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│ ReadFromMergeTree │
└─────────────────────────────────────────────────────────────────────────────────────────────┘
clickhouse1 :)
You can also inspect the details by looking at the query execution log:
[root@clickhouse1 ~]#
[root@clickhouse1 ~]# clickhouse-client -u default --password default123 --send_logs_level=trace <<< 'select * from select_table_test' > /dev/null
[clickhouse1] 2021.08.09 18:20:59.417657 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> executeQuery: (from [::1]:39664, using production parser) select * from select_table_test
[clickhouse1] 2021.08.09 18:20:59.418669 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Trace> ContextAccess (default): Access granted: SELECT(id, name, age, city, score) ON default.select_table_test
[clickhouse1] 2021.08.09 18:20:59.418786 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> default.select_table_test (0d5feec0-6dbe-4e23-8d5f-eec06dbe4e23) (SelectExecutor): Key condition: unknown
[clickhouse1] 2021.08.09 18:20:59.418808 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> default.select_table_test (0d5feec0-6dbe-4e23-8d5f-eec06dbe4e23) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 1/1 marks by primary key, 1 marks to read from 1 ranges
[clickhouse1] 2021.08.09 18:20:59.418867 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[clickhouse1] 2021.08.09 18:20:59.418954 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> MergeTreeSelectProcessor: Reading 1 ranges from part all_1_3_1, approx. 3 rows starting from 0
[clickhouse1] 2021.08.09 18:20:59.420643 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Information> executeQuery: Read 3 rows, 135.00 B in 0.0028947 sec., 1036 rows/sec., 45.54 KiB/sec.
[clickhouse1] 2021.08.09 18:20:59.420674 [ 1608 ] {9f16732f-dd99-454d-8878-f6be5bba8345} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
[root@clickhouse1 ~]#