数据库中的视图(View) 指的是通过一张或多张表查询出来的逻辑表 ,本身只是一段 SQL 的封装并不存储数据。
对于ClickHouse来说,视图分为普通视图和物化视图两类。其中普通视图只是一种简单查询映射,类似于linux的软连接;而物化视图数据独立存储,类似于linux的硬链接。
普通视图:不存储数据,仅存储指定的 SELECT 查询(基表的查询映射)
物化视图:数据独立存储(持久化),有表结构,有引擎
准备环境:
假设有一个表test来记录用户下载的信息,并且可以追踪用户每天下的信息:
clickhouse-client -m
create database test;
use test;
CREATE TABLE test (
when DateTime,
userid UInt32,
bytes Float32
) ENGINE=MergeTree
PARTITION BY toYYYYMM(when)
ORDER BY (userid, when);
INSERT INTO test
SELECT
now() - number * 60 as when,
25,
rand() % 100000000
FROM system.numbers
LIMIT 5000;
SELECT count(*)
FROM test;
┌─count()─┐
│ 5000 │
└─────────┘
SELECT *
FROM test
LIMIT 10
┌────────────────when─┬─userid─┬────bytes─┐
│ 2021-04-11 06:32:27 │ 25 │ 71684700 │
│ 2021-04-11 06:33:27 │ 25 │ 91794480 │
│ 2021-04-11 06:34:27 │ 25 │ 9945427 │
│ 2021-04-11 06:35:27 │ 25 │ 19805956 │
│ 2021-04-11 06:36:27 │ 25 │ 58563650 │
│ 2021-04-11 06:37:27 │ 25 │ 89094024 │
│ 2021-04-11 06:38:27 │ 25 │ 16388096 │
│ 2021-04-11 06:39:27 │ 25 │ 44223570 │
│ 2021-04-11 06:40:27 │ 25 │ 33782450 │
│ 2021-04-11 06:41:27 │ 25 │ 73357980 │
└─────────────────────┴────────┴──────────┘
举例:应用场景-求和
计算:每个用户每天下载的次数和流量:
SELECT
toStartOfDay(when) AS day,
userid,
count() as downloads,
sum(bytes) AS bytes
FROM test
GROUP BY userid, day
ORDER BY userid, day;
┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
│ 2021-04-11 00:00:00 │ 25 │ 1048 │ 51955725222 │
│ 2021-04-12 00:00:00 │ 25 │ 1440 │ 71811987127 │
│ 2021-04-13 00:00:00 │ 25 │ 1440 │ 71116881712 │
│ 2021-04-14 00:00:00 │ 25 │ 1072 │ 54787389274 │
└─────────────────────┴────────┴───────────┴─────────────┘
创建普通视图
对于普通视图每次运行查询来交互式地计算这些每日总计,对于大型表而言,会出现计算浪费资源,速度慢等问题。
CREATE VIEW test_view AS
SELECT
toStartOfDay(when) AS day,
userid,
count() AS downloads,
sum(bytes) AS bytes
FROM test
GROUP BY
userid,
day
ORDER BY
userid ASC,
day ASC;
SELECT *
FROM test_view;
┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
│ 2021-04-11 00:00:00 │ 25 │ 1048 │ 51955725222 │
│ 2021-04-12 00:00:00 │ 25 │ 1440 │ 71811987127 │
│ 2021-04-13 00:00:00 │ 25 │ 1440 │ 71116881712 │
│ 2021-04-14 00:00:00 │ 25 │ 1072 │ 54787389274 │
└─────────────────────┴────────┴───────────┴─────────────┘
创建物化视图
物化视图每次运行查询来交互式地这些每日总计,将结果放在单独的表格中,该表格可以连续跟踪每天每个用户的下载总数。
创建语法:
CREATE [MATERIALIZED] VIEW [IF NOT EXISTS] [db.]table_name [TO[db.]name]
[ENGINE = engine]
[POPULATE]
AS SELECT ...
CREATE MATERIALIZED VIEW test_mv
ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(day) ORDER BY (userid, day)
POPULATE
AS SELECT
toStartOfDay(when) AS day,
userid,
count() as downloads,
sum(bytes) AS bytes
FROM test
GROUP BY userid, day;
SELECT *
FROM test_mv
ORDER BY
day,
userid
LIMIT 5;
┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
│ 2021-04-11 00:00:00 │ 25 │ 1048 │ 51955725222 │
│ 2021-04-12 00:00:00 │ 25 │ 1440 │ 71811987127 │
│ 2021-04-13 00:00:00 │ 25 │ 1440 │ 71116881712 │
│ 2021-04-14 00:00:00 │ 25 │ 1072 │ 54787389274 │
└─────────────────────┴────────┴───────────┴─────────────┘
增加物化视图后的架构如下图所示
物化视图说明:
1.使用了简化总和计数的ClickHouse引擎:SummingMergeTree。
2.视图定义包括关键字POPULATE,决定了物化视图的更新策略:
若有POPULATE 则在创建视图的过程会将源表已经存在的数据一并导入,类似于 create table … as
若无POPULATE 则物化视图在创建之后没有数据
3.视图定义包括SELECT语句,在加载视图时如何转换数据。该查询在表中的新数据上运行,以计算每天的下载量和每个用户ID的总字节数。可以跳过排序,因为视图定义已经确保了排序顺序。
模拟产生新的数据:
INSERT INTO test SELECT
now() - (number * 60) AS when,
22,
rand() % 100000000
FROM system.numbers
LIMIT 5000
SELECT
toStartOfMonth(day) AS month,
userid,
sum(downloads),
sum(bytes)
FROM test_mv
GROUP BY
userid,
month
WITH TOTALS
ORDER BY
userid ASC,
month ASC
┌──────month─┬─userid─┬─sum(downloads)─┬───sum(bytes)─┐
│ 2021-04-01 │ 22 │ 10000 │ 502557007867 │
│ 2021-04-01 │ 25 │ 10000 │ 499000207535 │
└────────────┴────────┴────────────────┴──────────────┘
SELECT *
FROM test_mv
ORDER BY
userid ASC,
day ASC
┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
│ 2021-04-11 00:00:00 │ 22 │ 1025 │ 52647926147 │
│ 2021-04-12 00:00:00 │ 22 │ 1440 │ 72949246117 │
│ 2021-04-13 00:00:00 │ 22 │ 1440 │ 72346659092 │
│ 2021-04-14 00:00:00 │ 22 │ 1095 │ 55468665883 │
└─────────────────────┴────────┴───────────┴─────────────┘
┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
│ 2021-04-14 00:00:00 │ 22 │ 351 │ 16904055296 │
│ 2021-04-15 00:00:00 │ 22 │ 1440 │ 73164486383 │
│ 2021-04-16 00:00:00 │ 22 │ 1440 │ 71056228447 │
│ 2021-04-17 00:00:00 │ 22 │ 1440 │ 71659611401 │
│ 2021-04-18 00:00:00 │ 22 │ 329 │ 16360129101 │
└─────────────────────┴────────┴───────────┴─────────────┘
┌─────────────────day─┬─userid─┬─downloads─┬───────bytes─┐
│ 2021-04-11 00:00:00 │ 25 │ 1025 │ 50765199966 │
│ 2021-04-11 00:00:00 │ 25 │ 1048 │ 51955725222 │
│ 2021-04-12 00:00:00 │ 25 │ 1440 │ 73344565998 │
│ 2021-04-12 00:00:00 │ 25 │ 1440 │ 71811987127 │
│ 2021-04-13 00:00:00 │ 25 │ 1440 │ 71120216456 │
│ 2021-04-13 00:00:00 │ 25 │ 1440 │ 71116881712 │
│ 2021-04-14 00:00:00 │ 25 │ 1095 │ 54098241780 │
│ 2021-04-14 00:00:00 │ 25 │ 1072 │ 54787389274 │
└─────────────────────┴────────┴───────────┴─────────────
使用物化视图盲区:
A materialized view is implemented as follows: when inserting data to
the table specified in SELECT, part of the inserted data is converted
by this SELECT query, and the result is inserted in the view.clickhouse官档:https://clickhouse.tech/docs/en/sql-reference/statements/create/view/#materialized
Important Materialized views in ClickHouse are implemented more like
insert triggers. If there’s some aggregation in the view query, it’s
applied only to the batch of freshly inserted data. Any changes to
existing data of source table (like update, delete, drop partition,
etc.) doesn’t change the materialized view.
clickhouse官档:https://clickhouse.tech/docs/en/sql-reference/statements/create/view/#materialized
当向SELECT中指定的表插入数据时,部分插入的数据会被这个SELECT查询转换,结果会插入到视图中。
物化视图本质就像insert语句的触发器;如果有什么集合的运算,他会应用于最新插入的数据当中;对于其他原表的变化,比如说,更新,删除,删除分区,都不会影响到物化视图的变化。
A SELECT query can contain DISTINCT, GROUP BY, ORDER BY, LIMIT… Note that the corresponding conversions are performed independently on each block of inserted data. For example, if GROUP BY is set, data is aggregated during insertion, but only within a single packet of inserted data. The data won’t be further aggregated.
https://clickhouse.tech/docs/en/sql-reference/statements/create/view/#materialized
SELECT查询可以包含DISTINCT, GROUP BY, ORDER BY, LIMIT等等,特别注意这些相关联的约束只能应用于每个新插入的数据块中;比如说,如果设置了group by ,这些语句只会应用于新插入的的数据当中,不会作用于已经插入的分区当中。