位图索引的一大优势就是可以进行and or等逻辑运算。星型转换的实现便是依赖了位图索引的这种特性。
星型转换多用于数据仓库环境中,之所以称之为星型是因为其适用的模型通过图形展现出来像一个海星。这种模型多数是由一个或者多个事实表,多个维度表构成。事实表的数据量较大,而维度表的数据量较小而且往往存在严格的过滤条件,因此,理想的情况下,数据库应该首先方位维度表过滤掉大部分数据,然后关联查询事实表,从而提高查询效率。
在oracle 的example schema 用户sh中,存在很好的演示星型转换的表结构:
SQL> show user
USER 为 "SH"
SQL> select * from tab;
TNAME TABTYPE CLUSTERID
------------------------------ ------- ----------
CAL_MONTH_SALES_MV TABLE
CHANNELS TABLE
COSTS TABLE
COUNTRIES TABLE
CUSTOMERS TABLE
DIMENSION_EXCEPTIONS TABLE
DR$SUP_TEXT_IDX$I TABLE
DR$SUP_TEXT_IDX$K TABLE
DR$SUP_TEXT_IDX$N TABLE
DR$SUP_TEXT_IDX$R TABLE
FWEEK_PSCAT_SALES_MV TABLE
TNAME TABTYPE CLUSTERID
------------------------------ ------- ----------
PRODUCTS TABLE
PROFITS VIEW
PROMOTIONS TABLE
SALES TABLE
SALES_TRANSACTIONS_EXT TABLE
SUPPLEMENTARY_DEMOGRAPHICS TABLE
TIMES TABLE
如何启用星型查询
星型转换是由start_transformation_eanbled参数控制的,该参数具有三个选项:
true:开启星型转换,优化器会根据情况决定是否使用星型转换
flase:关闭星型转换
temp_disable:开启星型转换,但是在星型转换中禁止使用临时表来缓存维度表的过滤结构(关于临时表,在后面我们会详细解释)。
星型转换没有启动的情况
默认情况下,星型转换并没启用
SQL> show parameter transform
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
star_transformation_enabled string FALSE
这时我们看一下查询语句的执行计划:
SQL> l
1 select c.cust_city,
2 t.calendar_quarter_desc,
3 sum(s.amount_sold) sales_amount
4 from sales s,times t,customers c,channels ch
5 where s.time_id=t.time_id
6 and s.cust_id=c.cust_id
7 and s.channel_id = ch.channel_id
8 and c.cust_state_province='FL'
9 and ch.channel_desc='Direct Sales'
10 and t.calendar_quarter_desc in ('2000-01','2000-02','1999-12')
11* GROUP BY c.cust_city ,t.calendar_quarter_desc
SQL> /
CUST_CITY CALENDA SALES_AMOUNT
------------------------------ ------- ------------
Cypress Gardens 2000-01 3545.82
Candler 2000-01 4166.32
Sanibel 2000-02 17908.67
Ocala 2000-02 7081.73
Molino 2000-01 18765.25
Ocala 2000-01 7146.73
Palmdale 2000-02 25829.24
Palmdale 2000-01 37793.44
Molino 2000-02 17144.7
Saint Marks 2000-01 55781.37
Noma 2000-01 33572.55
CUST_CITY CALENDA SALES_AMOUNT
------------------------------ ------- ------------
Evinston 2000-02 62657.21
Candler 2000-02 6493.94
Winter Springs 2000-02 20
Sugarloaf Key 2000-01 12027.66
Saint Marks 2000-02 48858.7
Blountstown 2000-02 38052.58
Sugarloaf Key 2000-02 9659.44
Cypress Gardens 2000-02 4928.93
Evinston 2000-01 53509.69
Blountstown 2000-01 27024.7
Sanibel 2000-01 15870.34
CUST_CITY CALENDA SALES_AMOUNT
------------------------------ ------- ------------
Winter Springs 2000-01 31.46
Noma 2000-02 23903.58
已选择24行。
执行计划
----------------------------------------------------------
Plan hash value: 1865285285
------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 607 | 46132 | 964 (3)| 00:00:12 | | |
| 1 | HASH GROUP BY | | 607 | 46132 | 964 (3)| 00:00:12 | | |
|* 2 | HASH JOIN | | 2337 | 173K| 963 (3)| 00:00:12 | | |
| 3 | PART JOIN FILTER CREATE | :BF0000 | 274 | 4384 | 18 (0)| 00:00:01 | | |
|* 4 | TABLE ACCESS FULL | TIMES | 274 | 4384 | 18 (0)| 00:00:01 | | |
|* 5 | HASH JOIN | | 12456 | 729K| 944 (3)| 00:00:12 | | |
| 6 | MERGE JOIN CARTESIAN | | 383 | 14937 | 409 (1)| 00:00:05 | | |
|* 7 | TABLE ACCESS FULL | CHANNELS | 1 | 13 | 3 (0)| 00:00:01 | | |
| 8 | BUFFER SORT | | 383 | 9958 | 406 (1)| 00:00:05 | | |
|* 9 | TABLE ACCESS FULL | CUSTOMERS | 383 | 9958 | 406 (1)| 00:00:05 | | |
| 10 | PARTITION RANGE JOIN-FILTER| | 918K| 18M| 530 (3)| 00:00:07 |:BF0000|:BF0000|
| 11 | TABLE ACCESS FULL | SALES | 918K| 18M| 530 (3)| 00:00:07 |:BF0000|:BF0000|
------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("S"."TIME_ID"="T"."TIME_ID")
4 - filter("T"."CALENDAR_QUARTER_DESC"='1999-12' OR "T"."CALENDAR_QUARTER_DESC"='2000-01' OR
"T"."CALENDAR_QUARTER_DESC"='2000-02')
5 - access("S"."CUST_ID"="C"."CUST_ID" AND "S"."CHANNEL_ID"="CH"."CHANNEL_ID")
7 - filter("CH"."CHANNEL_DESC"='Direct Sales')
9 - filter("C"."CUST_STATE_PROVINCE"='FL')
统计信息
----------------------------------------------------------
1 recursive calls
0 db block gets
1724 consistent gets
1663 physical reads
0 redo size
1502 bytes sent via SQL*Net to client
531 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
24 rows processed
SQL> select count(*) from sales;
COUNT(*)
----------
918843
从这里可以看出,优化器对表sales选择了全表扫描,这是我们不希望看到的,因为sales 表中记录数量非常大,而查询结构的基数却是非常小的。
下面看看开启星型转换的情况:
SQL> conn / as sysdba
已连接。
SQL> alter system set star_transformation_enabled=true;
SQL> select c.cust_city,
2 t.calendar_quarter_desc,
3 sum(s.amount_sold) sales_amount
4 from sales s,times t,customers c,channels ch
5 where s.time_id=t.time_id
6 and s.cust_id=c.cust_id
7 and s.channel_id = ch.channel_id
8 and c.cust_state_province='FL'
9 and ch.channel_desc='Direct Sales'
10 and t.calendar_quarter_desc in ('2000-01','2000-02','1999-12')
11 GROUP BY c.cust_city ,t.calendar_quarter_desc;
CUST_CITY CALENDA SALES_AMOUNT
------------------------------ ------- ------------
Cypress Gardens 2000-01 3545.82
Candler 2000-01 4166.32
Sanibel 2000-02 17908.67
Ocala 2000-02 7081.73
Molino 2000-01 18765.25
Ocala 2000-01 7146.73
Palmdale 2000-02 25829.24
Palmdale 2000-01 37793.44
Molino 2000-02 17144.7
Saint Marks 2000-01 55781.37
Noma 2000-01 33572.55
CUST_CITY CALENDA SALES_AMOUNT
------------------------------ ------- ------------
Evinston 2000-02 62657.21
Candler 2000-02 6493.94
Winter Springs 2000-02 20
Sugarloaf Key 2000-01 12027.66
Saint Marks 2000-02 48858.7
Blountstown 2000-02 38052.58
Sugarloaf Key 2000-02 9659.44
Cypress Gardens 2000-02 4928.93
Evinston 2000-01 53509.69
Blountstown 2000-01 27024.7
Sanibel 2000-01 15870.34
CUST_CITY CALENDA SALES_AMOUNT
------------------------------ ------- ------------
Winter Springs 2000-01 31.46
Noma 2000-02 23903.58
已选择24行。
执行计划
----------------------------------------------------------
Plan hash value: 1814704031
--------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 238 | 13566 | 551 (1)| 00:00:07 | | |
| 1 | TEMP TABLE TRANSFORMATION | | | | | | | |
| 2 | LOAD AS SELECT | SYS_TEMP_0FD9D6609_28CC46 | | | | | | |
|* 3 | TABLE ACCESS FULL | CUSTOMERS | 383 | 9958 | 406 (1)| 00:00:05 | | |
| 4 | HASH GROUP BY | | 238 | 13566 | 145 (3)| 00:00:02 | | |
|* 5 | HASH JOIN | | 238 | 13566 | 144 (2)| 00:00:02 | | |
|* 6 | HASH JOIN | | 238 | 9996 | 141 (1)| 00:00:02 | | |
|* 7 | TABLE ACCESS FULL | TIMES | 274 | 4384 | 18 (0)| 00:00:01 | | |
| 8 | VIEW | VW_ST_A3F94988 | 238 | 6188 | 123 (1)| 00:00:02 | | |
| 9 | NESTED LOOPS | | 238 | 13566 | 100 (1)| 00:00:02 | | |
| 10 | PARTITION RANGE SUBQUERY | | 237 | 6660 | 56 (2)| 00:00:01 |KEY(SQ)|KEY(SQ)|
| 11 | BITMAP CONVERSION TO ROWIDS| | 237 | 6660 | 56 (2)| 00:00:01 | | |
| 12 | BITMAP AND | | | | | | | |
| 13 | BITMAP MERGE | | | | | | | |
| 14 | BITMAP KEY ITERATION | | | | | | | |
| 15 | BUFFER SORT | | | | | | | |
|* 16 | TABLE ACCESS FULL | CHANNELS | 1 | 13 | 3 (0)| 00:00:01 | | |
|* 17 | BITMAP INDEX RANGE SCAN| SALES_CHANNEL_BIX | | | | |KEY(SQ)|KEY(SQ)|
| 18 | BITMAP MERGE | | | | | | | |
| 19 | BITMAP KEY ITERATION | | | | | | | |
| 20 | BUFFER SORT | | | | | | | |
|* 21 | TABLE ACCESS FULL | TIMES | 274 | 4384 | 18 (0)| 00:00:01 | | |
|* 22 | BITMAP INDEX RANGE SCAN| SALES_TIME_BIX | | | | |KEY(SQ)|KEY(SQ)|
| 23 | BITMAP MERGE | | | | | | | |
| 24 | BITMAP KEY ITERATION | | | | | | | |
| 25 | BUFFER SORT | | | | | | | |
| 26 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6609_28CC46 | 383 | 1915 | 2 (0)| 00:00:01 | | |
|* 27 | BITMAP INDEX RANGE SCAN| SALES_CUST_BIX | | | | |KEY(SQ)|KEY(SQ)|
| 28 | TABLE ACCESS BY USER ROWID | SALES | 1 | 29 | 67 (0)| 00:00:01 | ROWID | ROWID |
| 29 | TABLE ACCESS FULL | SYS_TEMP_0FD9D6609_28CC46 | 383 | 5745 | 2 (0)| 00:00:01 | | |
--------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("C"."CUST_STATE_PROVINCE"='FL')
5 - access("ITEM_1"="C0")
6 - access("ITEM_2"="T"."TIME_ID")
7 - filter("T"."CALENDAR_QUARTER_DESC"='1999-12' OR "T"."CALENDAR_QUARTER_DESC"='2000-01' OR
"T"."CALENDAR_QUARTER_DESC"='2000-02')
16 - filter("CH"."CHANNEL_DESC"='Direct Sales')
17 - access("S"."CHANNEL_ID"="CH"."CHANNEL_ID")
21 - filter("T"."CALENDAR_QUARTER_DESC"='1999-12' OR "T"."CALENDAR_QUARTER_DESC"='2000-01' OR
"T"."CALENDAR_QUARTER_DESC"='2000-02')
22 - access("S"."TIME_ID"="T"."TIME_ID")
27 - access("S"."CUST_ID"="C0")
Note
-----
-- - star transformation used for this statement
统计信息
----------------------------------------------------------
134 recursive calls
14 db block gets
12496 consistent gets
1461 physical reads
576 redo size
1502 bytes sent via SQL*Net to client
531 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
4 sorts (memory)
0 sorts (disk)
24 rows processed
通过10053事件查看星型转换的过程,可以发现,优化器执行了类似如下的语句转换:
SELECT "C"."CUST_CITY" "CUST_CITY",
"T"."CALENDAR_QUARTER_DESC" "CALENDAR_QUARTER_DESC",
SUM("S"."AMOUNT_SOLD") "SALES_AMOUNT"
FROM "SH"."SALES" "S",
"SH"."TIMES" "T",
"SH"."CUSTOMERS" "C",
"SH"."CHANNELS" "CH"
WHERE "S"."CHANNEL_ID"=ANY
(SELECT "CH"."CHANNEL_ID" "ITEM_1"
FROM "SH"."CHANNELS" "CH"
WHERE "CH"."CHANNEL_DESC"='Direct Sales'
)
AND "S"."CUST_ID"=ANY
(SELECT "C"."CUST_ID" "ITEM_1"
FROM "SH"."CUSTOMERS" "C"
WHERE "C"."CUST_STATE_PROVINCE"='FL'
)
AND "S"."TIME_ID"=ANY
(SELECT "T"."TIME_ID" "ITEM_1"
FROM "SH"."TIMES" "T"
WHERE "T"."CALENDAR_QUARTER_DESC"='1999-12'
OR "T"."CALENDAR_QUARTER_DESC" ='2000-01'
OR "T"."CALENDAR_QUARTER_DESC" ='2000-02'
)
AND "S"."TIME_ID" ="T"."TIME_ID"
AND "S"."CUST_ID" ="C"."CUST_ID"
AND "S"."CHANNEL_ID" ="CH"."CHANNEL_ID"
AND "C"."CUST_STATE_PROVINCE" ='FL'
AND "CH"."CHANNEL_DESC" ='Direct Sales'
AND ("T"."CALENDAR_QUARTER_DESC"='2000-01'
OR "T"."CALENDAR_QUARTER_DESC" ='2000-02'
OR "T"."CALENDAR_QUARTER_DESC" ='1999-12')
GROUP BY "C"."CUST_CITY",
"T"."CALENDAR_QUARTER_DESC"
省略重复链接
从上文可以看出,star transformation 往往会伴随着事实表对维度表的重复链接,在某些情况下,我们可以避免这种重复(同时满足这些条件):
- 维度表上所有的谓词都是半连接谓词子查询的一部分
- 由子查询检索到的列均唯一(unique)
- 维度表的列不被select或group by涉及
临时表的使用:
若在已知星型转换中重复连接维度表无法被省略的话,Oracle可以将对维度表的子查询结果集存储到内存中的全局临时表(global temporary table)上以避免重复扫描维度表。通用通过10053可以观察到临时表的使用
SELECT "T1"."C1" "CUST_CITY",
"T"."CALENDAR_QUARTER_DESC" "CALENDAR_QUARTER_DESC",
SUM("VW_ST_A3F94988"."ITEM_3") "SALES_AMOUNT"
FROM
(SELECT
/*+ ORDERED USE_NL ("SYS_CP_S") NOPARALLEL ("S") */
"SYS_CP_S"."CUST_ID" "ITEM_1",
"SYS_CP_S"."TIME_ID" "ITEM_2",
"SYS_CP_S"."AMOUNT_SOLD" "ITEM_3"
FROM "SH"."SALES" "S",
"SH"."SALES" "SYS_CP_S"
WHERE "S".ROWID ="SYS_CP_S".ROWID
AND "S"."CUST_ID"=ANY
(SELECT
/*+ SEMIJOIN_DRIVER OPT_ESTIMATE (TABLE "T1" MIN=2438.000000 ) CACHE_TEMP_TABLE ("T1") */
"T1"."C0" "C0"
-- FROM "SYS"."SYS_TEMP_0FD9D660E_293CEA" "T1"
)
AND "S"."CHANNEL_ID"=ANY
(SELECT
/*+ SEMIJOIN_DRIVER */
"CH"."CHANNEL_ID" "ITEM_1"
FROM "SH"."CHANNELS" "CH"
WHERE "CH"."CHANNEL_DESC"='Direct Sales'
)
AND "S"."TIME_ID"=ANY
(SELECT
/*+ SEMIJOIN_DRIVER */
"T"."TIME_ID" "ITEM_1"
FROM "SH"."TIMES" "T"
WHERE "T"."CALENDAR_QUARTER_DESC"='2000-01'
OR "T"."CALENDAR_QUARTER_DESC" ='2000-02'
OR "T"."CALENDAR_QUARTER_DESC" ='1999-12'
)
) "VW_ST_A3F94988",
"SH"."TIMES" "T",
-- "SYS"."SYS_TEMP_0FD9D660E_293CEA" "T1"
WHERE "VW_ST_A3F94988"."ITEM_2" ="T"."TIME_ID"
AND "VW_ST_A3F94988"."ITEM_1" ="T1"."C0"
AND ("T"."CALENDAR_QUARTER_DESC"='2000-01'
OR "T"."CALENDAR_QUARTER_DESC" ='2000-02'
OR "T"."CALENDAR_QUARTER_DESC" ='1999-12')
GROUP BY "T1"."C1",
"T"."CALENDAR_QUARTER_DESC"