注:low-cardinality是指该列或者列的组合具有的不同值的个数较少,即该列有很多重复值。high-cardinality是指该列或者列的组合具有不同的值的个数较多,即该列有很少的重复值。

理解每种索引的适用场合将对性能产生重大影响。

传统观念认为位图索引最适用于拥有很少不同值的列 ---- 例如GENDER, MARITAL_STATUS,和RELATION。但是,这种假设是不准确的。实际上,对于大多非频繁更新的并发系统,位图索引也是适用的。事实上,下面将会展示,对以一个具有100%唯一值的列(主键的候选列)来说,位图索引和B树索引一样有效。

本文将提供一些例子以及优化建议,它们对于low-cardinality和high-cardinality上的两种索引都是通用的。这些例子将帮助DBA理解位图索引不是依赖于cardinality而是依赖于程序自身。

索引比较

在唯一列上使用位图索引有一些缺点 --- 其中一个是需要足够的空间(Oracle不推荐使用)。但是,位图索引的大小不但与位图索引列的cardinality有关,还与数据的分布有关。因此,GENDER列上的位图索引比B树索引要小,相反,EMPNO上的位图索引比B树索引大的多。但是,相对于OLTP系统来说,决策支持系统只有很少的用户访问,因而对于这些系统,资源不是问题。

为了阐明这个观点,我创建了两个表,test_normal和test_random。用PL/SQL块在test_normal中插入100万条记录,然后在test_random表中随机插入相同的记录。

1. Create table test_normal (empno number(10), ename varchar2(30), sal number(10)); 
2. 
3. Begin 
4. For i in
5. Loop 
6. Insert into test_normal 
7. values(i, dbms_random.string('U',30), dbms_random.value(1000,7000)); 
8. If mod(i, 10000) = 0 then 
9. Commit; 
10. End if; 
11. End loop; 
12. End; 
13. / 
14. 
15. Create table test_random 
16. as
17. select /*+ append */
18. 
19. SQL> select count(*) "Total Rows"
20. 
21. Total Rows 
22. ---------- 
23. 1000000 
24. 
25. Elapsed: 00:00:01.09 
26. 
27. SQL> select count(distinct empno) "Distinct Values"
28. 
29. Distinct Values 
30. --------------- 
31. 1000000 
32. 
33. Elapsed: 00:00:06.09 
34. SQL> select count(*) "Total Rows"
35. 
36. Total Rows 
37. ---------- 
38. 1000000 
39. 
40. Elapsed: 00:00:03.05 
41. SQL> select count(distinct empno) "Distinct Values"
42. 
43. Distinct Values 
44. --------------- 
45. 1000000 
46. 
47. Elapsed: 00:00:12.07 
 
Create table test_normal (empno number(10), ename varchar2(30), sal number(10));

Begin
For i in 1..1000000
Loop
   Insert into test_normal 
   values(i, dbms_random.string('U',30), dbms_random.value(1000,7000));
   If mod(i, 10000) = 0 then
   Commit;
  End if;
End loop;
End;
/
  
Create table test_random 
as 
select /*+ append */ * from test_normal order by dbms_random.random;

SQL> select count(*) "Total Rows" from test_normal;

Total Rows
----------
   1000000

Elapsed: 00:00:01.09

SQL> select count(distinct empno) "Distinct Values" from test_normal;

Distinct Values
---------------
        1000000

Elapsed: 00:00:06.09
SQL> select count(*) "Total Rows" from test_random;

Total Rows
----------
   1000000

Elapsed: 00:00:03.05
SQL> select count(distinct empno) "Distinct Values" from test_random;

Distinct Values
---------------
        1000000

Elapsed: 00:00:12.07

注意,test_normal表是组织良好的,test_random表是随机创建的,因此,其中的数据是无组织的。在上面的表中,EMPNO列上的值完全不同,因此可以作为候选主键。如果你把该列定义为主键,oracle将会建立一个B树索引,因为Oracle不支持主键位图索引。

为了分析这些索引的行为,我们执行下面的步骤:

  1. 在表test_normal上:
  1. 在EMPNO列上建立一个位图索引,并执行一些相等性查询。
  2. 在EMPNO列上建立一个B树索引,执行一些相等性查询,并且比较获得不同结果集所执行的查询需要的物理I/O和逻辑I/O的次数。
  1. 在表test_random表上:
  1. 和1.1相同的步骤
  2. 和1.2相同的步骤
  1. 在表test_normal上:
  1. 和1.1相同的步骤,但是执行范围查询。
  2. 和1.2相同的步骤,但是执行范围查询。比较统计结果。
  1. 在表test_random表上:
  1. 和3.1相同的步骤。
  2. 和3.2相同的步骤
  1. 在表test_normal上:
  1. 在SAL列上建立一个位图索引,并且执行一些相等性查询和范围查询。
  2. 在SAL列上建立一个B树索引,并且执行一些相等性查询和范围查询(和5.1相同的结果集),比较获取结果执行的I/O次数。
  1. 在两个表中添加GENDER列,并且把该列更新为3个可能的值:M(女性), F(男性), null(未知)。根据一些条件更新该列的值。
  2. 在该列上建立一个位图索引并且执行一些相等性查询。
  3. 在GENDER列上建立一个B树索引并且执行一些相等性查询,和步骤7的结果比较。

步骤1到4涉及一个high-cardinality列(完全不同),步骤5是一个normal-cardinality列,步骤7和8是一个low-cardinality列。

步骤1.1(在表test_normal上)

在该步中,我们在表test_normal上建立一个位图索引,然后检查索引的大小、聚簇因子(clustering factor)和表的大小。然后执行一些相等性查询并且查看使用位图索引时查询需要的I/O次数。


1. SQL> create bitmap index normal_empno_bmx on test_normal(empno); 
2. 
3. Index created. 
4. 
5. Elapsed: 00:00:29.06 
6. SQL> analyze table test_normal compute statistics for table for all indexes for
7. 
8. 
9. Table analyzed. 
10. 
11. Elapsed: 00:00:19.01 
12. SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024  "Size in MB"
13. 2 from user_segments 
14. 3* where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_BMX'); 
15. 
16. SEGMENT_NAME Size in
17. ------------------------------------ ---------------  
18. TEST_NORMAL 50 
19. NORMAL_EMPNO_BMX 28 
20. 
21. Elapsed: 00:00:02.00 
22. SQL> select index_name, clustering_factor from user_indexes; 
23. 
24. INDEX_NAME CLUSTERING_FACTOR 
25. ------------------------------ --------------------------------- 
26. NORMAL_EMPNO_BMX 1000000 
27. 
28. Elapsed: 00:00:00.00 
 
SQL> create bitmap index normal_empno_bmx on test_normal(empno);

Index created.

Elapsed: 00:00:29.06
SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;


Table analyzed.

Elapsed: 00:00:19.01
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3* where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_BMX');
 
SEGMENT_NAME                               Size in MB
------------------------------------       ---------------
TEST_NORMAL                                50
NORMAL_EMPNO_BMX                           28

Elapsed: 00:00:02.00
SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                             CLUSTERING_FACTOR
------------------------------         ---------------------------------
NORMAL_EMPNO_BMX                       1000000

Elapsed: 00:00:00.00
可以看到,表上索引的大小是28M并且聚簇因子的大小等于表中的行数。现在我们为不同的结果集执行一些相等性查询:
 
 
 
1. SQL> set
2. SQL> select * from test_normal where empno=&empno; 
3. Enter value for
4. old 1: select * from test_normal where empno=&empno 
5. new
6. 
7. Elapsed: 00:00:00.01 
8. 
9. Execution Plan 
10. ---------------------------------------------------------- 
11. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) 
12. 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL'
13. d=1 Bytes=34) 
14. 2 1 BITMAP CONVERSION (TO ROWIDS) 
15. 3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_EMPNO_BMX'
16. 
17. Statistics 
18. ---------------------------------------------------------- 
19. 0 recursive calls 
20. 0 db block gets 
21. 5 consistent gets 
22. 0 physical reads 
23. 0 redo size 
24. 515 bytes sent via SQL*Net to client 
25. 499 bytes received via SQL*Net from client 
26. 2 SQL*Net roundtrips to/from client 
27. 0 sorts (memory) 
28. 0 sorts (disk) 
29. 1 rows processed 
 
SQL> set autotrace only
SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_normal where empno=&empno
new   1: select * from test_normal where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
          d=1 Bytes=34)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed
步骤1.2(在表test_normal上)

现在删除表中EMPNO列上的位图索引并创建一个B树索引。像前面一样我们查看索引的大小、聚簇因子的大小并且执行相同的查询,比较I/O的次数。




1. SQL> drop index NORMAL_EMPNO_BMX; 
2. 
3. Index dropped. 
4. 
5. SQL> create index normal_empno_idx on test_normal(empno);  
6. 
7. Index created. 
8. 
9. SQL> analyze table test_normal compute statistics for table for all indexes for
10. 
11. Table analyzed. 
12. 
13. SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
14. 2 from user_segments 
15. 3 where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_IDX'); 
16. 
17. SEGMENT_NAME Size in
18. ---------------------------------- --------------- 
19. TEST_NORMAL 50 
20. NORMAL_EMPNO_IDX 18 
21. 
22. SQL> select index_name, clustering_factor from user_indexes; 
23. 
24. INDEX_NAME CLUSTERING_FACTOR 
25. ---------------------------------- ---------------------------------- 
26. NORMAL_EMPNO_IDX 6210 
 
SQL> drop index NORMAL_EMPNO_BMX;

Index dropped.

SQL> create index normal_empno_idx on test_normal(empno);

Index created.

SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_EMPNO_IDX');

SEGMENT_NAME                               Size in MB
----------------------------------         ---------------
TEST_NORMAL                                50
NORMAL_EMPNO_IDX                           18

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                            CLUSTERING_FACTOR
----------------------------------    ----------------------------------
NORMAL_EMPNO_IDX                      6210

很明显,在该表的EMPNO列上,B树索引比位图索引要小。B树索引上的聚簇因子接近于表中的数据块数;因此B树索引对于范围查询更有效。

现在,我们使用B树索引执行相同的查询。


1. SQL> set
2. SQL> select * from test_normal where empno=&empno; 
3. Enter value for
4. old 1: select * from test_normal where empno=&empno 
5. new
6. 
7. Elapsed: 00:00:00.01 
8. 
9. Execution Plan 
10. ---------------------------------------------------------- 
11. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) 
12. 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL'
13. d=1 Bytes=34) 
14. 2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX'
15. ost=3 Card=1) 
16. 
17. Statistics 
18. ---------------------------------------------------------- 
19. 29 recursive calls 
20. 0 db block gets 
21. 5 consistent gets 
22. 0 physical reads 
23. 0 redo size 
24. 515 bytes sent via SQL*Net to client 
25. 499 bytes received via SQL*Net from client 
26. 2 SQL*Net roundtrips to/from client 
27. 0 sorts (memory) 
28. 0 sorts (disk) 
29. 1 rows processed 
 
SQL> set autot trace
SQL> select * from test_normal where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_normal where empno=&empno
new   1: select * from test_normal where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=4 Car
          d=1 Bytes=34)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (C
          ost=3 Card=1)

Statistics
----------------------------------------------------------
         29  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

可以看到,对于相同的结果集,在唯一列上的位图索引和B树索引需要相同的物理和逻辑读取次数。

BITMAP(位图)

EMPNO

B-TREE(B树)

Consistent Reads

Physical Reads

Consistent Reads

Physical Reads

5

0

1000

5

0

5

2

2398

5

2

5

2

8545

5

2

5

2

98008

5

2

5

2

85342

5

2

5

2

128444

5

2

5

2

858

5

2

步骤2.1(在表test_random上)

现在,在test_random表上执行相同的操作:



1. SQL> create bitmap index random_empno_bmx on test_random(empno); 
2. 
3. Index created. 
4. 
5. SQL> analyze table test_random compute statistics for table for all indexes for
6. 
7. Table analyzed. 
8. 
9. SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
10. 2 from user_segments 
11. 3* where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_BMX'); 
12. 
13. SEGMENT_NAME Size in
14. ------------------------------------ --------------- 
15. TEST_RANDOM 50 
16. RANDOM_EMPNO_BMX 28 
17. 
18. SQL> select index_name, clustering_factor from user_indexes; 
19. 
20. INDEX_NAME CLUSTERING_FACTOR 
21. ------------------------------ --------------------------------- 
22. RANDOM_EMPNO_BMX 1000000 
 
SQL> create bitmap index random_empno_bmx on test_random(empno);

Index created.

SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3* where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_BMX');
 
SEGMENT_NAME                               Size in MB
------------------------------------       ---------------
TEST_RANDOM                                50
RANDOM_EMPNO_BMX                           28

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                             CLUSTERING_FACTOR
------------------------------         ---------------------------------
RANDOM_EMPNO_BMX                       1000000

再次,索引上的统计结果(大小和聚簇因子)和在表test_normal中是相同的:



1. SQL> select * from test_random where empno=&empno;  
2. Enter value for
3. old 1: select * from test_random where empno=&empno  
4. new
5. 
6. Elapsed: 00:00:00.01 
7. 
8. Execution Plan 
9. ----------------------------------------------------------  
10. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) 
11. 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM'
12. 2 1 BITMAP CONVERSION (TO ROWIDS) 
13. 3 2 BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX'
14. 
15. Statistics 
16. ---------------------------------------------------------- 
17. 0 recursive calls 
18. 0 db block gets 
19. 5 consistent gets 
20. 0 physical reads 
21. 0 redo size 
22. 515 bytes sent via SQL*Net to client 
23. 499 bytes received via SQL*Net from client 
24. 2 SQL*Net roundtrips to/from client 
25. 0 sorts (memory) 
26. 0 sorts (disk) 
27. 1 rows processed 
 
SQL> select * from test_random where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_random where empno=&empno
new   1: select * from test_random where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'RANDOM_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

步骤2.2(在表test_random上)

现在,和步骤1.2一样,我们删除EMPNO列上的位图索引并且创建一个B树索引。


1. SQL> drop index RANDOM_EMPNO_BMX; 
2. 
3. Index dropped. 
4. 
5. SQL> create index random_empno_idx on test_random(empno);  
6. 
7. Index created. 
8. 
9. SQL> analyze table test_random compute statistics for table for all indexes for
10. 
11. Table analyzed. 
12. 
13. SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
14. 2 from user_segments 
15. 3 where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_IDX'); 
16. 
17. SEGMENT_NAME Size in
18. ---------------------------------- --------------- 
19. TEST_RANDOM 50 
20. RANDOM_EMPNO_IDX 18 
21. 
22. SQL> select index_name, clustering_factor from user_indexes; 
23. 
24. INDEX_NAME CLUSTERING_FACTOR 
25. ---------------------------------- ---------------------------------- 
26. RANDOM_EMPNO_IDX 999830 
 
SQL> drop index RANDOM_EMPNO_BMX;

Index dropped.

SQL> create index random_empno_idx on test_random(empno);

Index created.

SQL> analyze table test_random compute statistics for table for all indexes for all indexed columns;

Table analyzed.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_RANDOM','RANDOM_EMPNO_IDX');

SEGMENT_NAME                               Size in MB
----------------------------------         ---------------
TEST_RANDOM                                50
RANDOM_EMPNO_IDX                           18

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                            CLUSTERING_FACTOR
----------------------------------    ----------------------------------
RANDOM_EMPNO_IDX                      999830

该表的索引大小和表test_normal是一样的,但是聚簇因子更接近于行数,这就使得该索引对于范围查询不再高效。该聚簇因子不影响相等性查询,因为该列的值是唯一的,每个键对应1行记录。

现在,在相同的结果集上执行相等性查询。



1. SQL> select * from test_random where empno=&empno;  
2. Enter value for
3. old 1: select * from test_random where empno=&empno  
4. new
5. 
6. Elapsed: 00:00:00.01 
7. 
8. Execution Plan 
9. ----------------------------------------------------------  
10. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34) 
11. 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM'
12. 2 1 INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX'
13. 
14. Statistics 
15. ----------------------------------------------------------  
16. 0 recursive calls 
17. 0 db block gets 
18. 5 consistent gets 
19. 0 physical reads 
20. 0 redo size 
21. 515 bytes sent via SQL*Net to client 
22. 499 bytes received via SQL*Net from client 
23. 2 SQL*Net roundtrips to/from client 
24. 0 sorts (memory) 
25. 0 sorts (disk) 
26. 1 rows processed 
 
SQL> select * from test_random where empno=&empno;
Enter value for empno: 1000
old   1: select * from test_random where empno=&empno
new   1: select * from test_random where empno=1000

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=1 Bytes=34)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=4 Card=1 Bytes=34)
   2    1     INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_IDX' (NON-UNIQUE) (Cost=3 Card=1)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        515  bytes sent via SQL*Net to client
        499  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

再次表明,结果和步骤1.1和1.2几乎相同。对于唯一列来说,数据分布不影响逻辑和物理I/O。

步骤3.1(在表test_normal上)

在该步中,我们将创建一个位图索引。我们知道索引的聚簇因子大小和表中的行数相同。现在我们执行一些范围查询。



1. SQL> select * from test_normal where empno between &range1 and &range2; 
2. Enter value for
3. Enter value for
4. old 1: select * from test_normal where empno between &range1 and &range2 
5. new
6. 
7. 2300 rows selected. 
8. 
9. Elapsed: 00:00:00.03 
10. 
11. Execution Plan 
12. ---------------------------------------------------------- 
13. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=451 Card=2299 Bytes=78166) 
14. 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL'
15. 2 1 BITMAP CONVERSION (TO ROWIDS) 
16. 3 2 BITMAP INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_BMX'
17. 
18. Statistics 
19. ----------------------------------------------------------  
20. 0 recursive calls 
21. 0 db block gets 
22. 331 consistent gets 
23. 0 physical reads 
24. 0 redo size 
25. 111416 bytes sent via SQL*Net to client 
26. 2182 bytes received via SQL*Net from client 
27. 155 SQL*Net roundtrips to/from client 
28. 0 sorts (memory) 
29. 0 sorts (disk) 
30. 2300 rows processed 
 
SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_normal where empno between &range1 and &range2
new   1: select * from test_normal where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=451 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=451 Card=2299 Bytes=78166)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        331  consistent gets
          0  physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed

步骤3.2(在表test_normal上)

该步中,我们在test_normal的B树索引上执行查询。


1. SQL> select * from test_normal where empno between &range1 and &range2; 
2. Enter value for
3. Enter value for
4. old 1: select * from test_normal where empno between &range1 and &range2 
5. new
6. 
7. 2300 rows selected. 
8. 
9. Elapsed: 00:00:00.02 
10. 
11. Execution Plan 
12. ---------------------------------------------------------- 
13. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=2299 Bytes=78166) 
14. 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL'
15. 2 1 INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX'
16. 
17. Statistics 
18. ---------------------------------------------------------- 
19. 0 recursive calls 
20. 0 db block gets 
21. 329 consistent gets 
22. 15 physical reads 
23. 0 redo size 
24. 111416 bytes sent via SQL*Net to client 
25. 2182 bytes received via SQL*Net from client 
26. 155 SQL*Net roundtrips to/from client 
27. 0 sorts (memory) 
28. 0 sorts (disk) 
29. 2300 rows processed 
 
SQL> select * from test_normal where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_normal where empno between &range1 and &range2
new   1: select * from test_normal where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=23 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=23 Card=2299 Bytes=78166)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_EMPNO_IDX' (NON-UNIQUE) (Cost=8 Card=2299)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        329  consistent gets
         15  physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed

在不同的范围上执行的查询结果如下:

BITMAP

EMPNO (Range)

B-TREE

Consistent Reads

Physical Reads

Consistent Reads

Physical Reads

331

0

1-2300

329

0

285

0

8-1980

283

0

346

19

1850-4250

344

16

427

31

28888-31850

424

28

371

27

82900-85478

367

23

2157

149

984888-1000000

2139

35

可以看到,在两种索引上需要的逻辑和物理IO基本上是相同的。最后一个范围(984888-1000000)差不多返回了15,000行,是所有范围查询中最大的。当我们执行全表扫描时(通过/*+ full(test_normal) */

步骤4.1(在表test_random上)

在该步中,我们将在表test_random的位图索引上执行范围查询,在这儿,你将看到聚簇因子的影响。


1. SQL>select * from test_random where empno between &range1 and &range2; 
2. Enter value for
3. Enter value for
4. old 1: select * from test_random where empno between &range1 and &range2 
5. new
6. 
7. 2300 rows selected. 
8. 
9. Elapsed: 00:00:08.01 
10. 
11. Execution Plan 
12. ---------------------------------------------------------- 
13. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=453 Card=2299 Bytes=78166) 
14. 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM'
15. 2 1 BITMAP CONVERSION (TO ROWIDS) 
16. 3 2 BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX'
17. 
18. Statistics 
19. ----------------------------------------------------------  
20. 0 recursive calls 
21. 0 db block gets 
22. 2463 consistent gets 
23. 1200 physical reads 
24. 0 redo size 
25. 111416 bytes sent via SQL*Net to client 
26. 2182 bytes received via SQL*Net from client 
27. 155 SQL*Net roundtrips to/from client 
28. 0 sorts (memory) 
29. 0 sorts (disk) 
30. 2300 rows processed 
 
SQL>select * from test_random where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_random where empno between &range1 and &range2
new   1: select * from test_random where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:08.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=453 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_RANDOM' (Cost=453 Card=2299 Bytes=78166)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (RANGE SCAN) OF 'RANDOM_EMPNO_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       2463  consistent gets
       1200  physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed

步骤4.2(在表test_random上)

在该步中,我们将在test_random的B树索引上执行范围查询。回想一下,该索引上的聚簇因子接近于表中记录的行数。下面是优化器的输出:


1. SQL> select * from test_random where empno between &range1 and &range2; 
2. Enter value for
3. Enter value for
4. old 1: select * from test_random where empno between &range1 and &range2 
5. new
6. 
7. 2300 rows selected. 
8. 
9. Elapsed: 00:00:03.04 
10. 
11. Execution Plan 
12. ---------------------------------------------------------- 
13. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=613 Card=2299 Bytes=78166) 
14. 1 0 TABLE ACCESS (FULL) OF 'TEST_RANDOM'
15. 
16. Statistics 
17. ----------------------------------------------------------  
18. 0 recursive calls 
19. 0 db block gets 
20. 6415 consistent gets 
21. 4910 physical reads 
22. 0 redo size 
23. 111416 bytes sent via SQL*Net to client 
24. 2182 bytes received via SQL*Net from client 
25. 155 SQL*Net roundtrips to/from client 
26. 0 sorts (memory) 
27. 0 sorts (disk) 
28. 2300 rows processed 
 
SQL> select * from test_random where empno between &range1 and &range2;
Enter value for range1: 1
Enter value for range2: 2300
old   1: select * from test_random where empno between &range1 and &range2
new   1: select * from test_random where empno between 1 and 2300

2300 rows selected.

Elapsed: 00:00:03.04

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=613 Card=2299 Bytes=78166)
   1    0   TABLE ACCESS (FULL) OF 'TEST_RANDOM' (Cost=613 Card=2299 Bytes=78166)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6415  consistent gets
       4910  physical reads
          0  redo size
     111416  bytes sent via SQL*Net to client
       2182  bytes received via SQL*Net from client
        155  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       2300  rows processed
因为聚簇因子的缘故,优化器选择了全表扫描而不是使用索引:

BITMAP

EMPNO (Range)

B-TREE

Consistent Reads

Physical Reads

Consistent Reads

Physical Reads

2463

1200

1-2300

6415

4910

2114

31

8-1980

6389

4910

2572

1135

1850-4250

6418

4909

3173

1620

28888-31850

6456

4909

2762

1358

82900-85478

6431

4909

7254

3329

984888-1000000

7254

4909

仅对于最后一个范围(984888-1000000),对于位图索引优化器选择了全表扫描。然而,对于B树索引,全部使用全表扫描。引起这种差异的原因是聚簇因子:优化器在产生执行计划时不考虑位图索引的聚簇因子,但是对于B树索引来说,则需要 考虑聚簇因子。在上面的情况中,位图索引比B树索引更有效。

下面的步骤揭示了这些索引更有趣的方面。

步骤5.1(在表test_normal上)

在表test_normal的SAL列上建立一个位图索引,该列拥有普通的cardinality。


1. SQL> create bitmap index normal_sal_bmx on test_normal(sal); 
2. 
3. Index created. 
4. 
5. SQL> analyze table test_normal compute statistics for table for all indexes for
6. 
7. Table analyzed. 
 
SQL> create bitmap index normal_sal_bmx on test_normal(sal);

Index created.

SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;

Table analyzed.
得到索引的大小和聚簇因子:
 
 
 
1. SQL>select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
2. 2* from user_segments 
3. 3* where segment_name in ('TEST_NORMAL','NORMAL_SAL_BMX'); 
4. 
5. SEGMENT_NAME Size in
6. ------------------------------ -------------- 
7. TEST_NORMAL 50 
8. NORMAL_SAL_BMX 4 
9. 
10. SQL> select index_name, clustering_factor from user_indexes; 
11. 
12. INDEX_NAME CLUSTERING_FACTOR 
13. ------------------------------ ---------------------------------- 
14. NORMAL_SAL_BMX 6001 
 
SQL>select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2* from user_segments
  3* where segment_name in ('TEST_NORMAL','NORMAL_SAL_BMX');

SEGMENT_NAME                                Size in MB
------------------------------              --------------
TEST_NORMAL                                 50
NORMAL_SAL_BMX                              4

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                             CLUSTERING_FACTOR
------------------------------         ----------------------------------
NORMAL_SAL_BMX                         6001

下面执行查询,首先执行相等性查询:



1. SQL> set
2. SQL> select * from test_normal where sal=&sal; 
3. Enter value for
4. old 1: select * from test_normal where sal=&sal 
5. new
6. 
7. 164 rows selected. 
8. 
9. Elapsed: 00:00:00.08 
10. 
11. Execution Plan 
12. ---------------------------------------------------------- 
13. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=168 Bytes=4032) 
14. 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL'
15. 2 1 BITMAP CONVERSION (TO ROWIDS) 
16. 3 2 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
17. 
18. Statistics 
19. ----------------------------------------------------------  
20. 0 recursive calls 
21. 0 db block gets 
22. 165 consistent gets 
23. 0 physical reads 
24. 0 redo size 
25. 8461 bytes sent via SQL*Net to client 
26. 609 bytes received via SQL*Net from client 
27. 12 SQL*Net roundtrips to/from client 
28. 0 sorts (memory) 
29. 0 sorts (disk) 
30. 164 rows processed 
 
SQL> set autot trace
SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old   1: select * from test_normal where sal=&sal
new   1: select * from test_normal where sal=1869

164 rows selected.

Elapsed: 00:00:00.08

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=39 Card=168 Bytes=4032)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=39 Card=168 Bytes=4032)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        165  consistent gets
          0  physical reads
          0  redo size
       8461  bytes sent via SQL*Net to client
        609  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        164  rows processed
接下来是范围查询:
 
 
 
1. SQL> select * from test_normal where sal between &sal1 and &sal2; 
2. Enter value for
3. Enter value for
4. old 1: select * from test_normal where sal between &sal1 and &sal2  
5. new
6. 
7. 83743 rows selected. 
8. 
9. Elapsed: 00:00:05.00 
10. 
11. Execution Plan 
12. ---------------------------------------------------------- 
13. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes 
14. =2001024) 
15. 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL'
16. Bytes=2001024) 
17. 
18. Statistics 
19. ----------------------------------------------------------  
20. 0 recursive calls 
21. 0 db block gets 
22. 11778 consistent gets 
23. 5850 physical reads 
24. 0 redo size 
25. 4123553 bytes sent via SQL*Net to client 
26. 61901 bytes received via SQL*Net from client 
27. 5584 SQL*Net roundtrips to/from client 
28. 0 sorts (memory) 
29. 0 sorts (disk) 
30. 83743 rows processed 
 
SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000
old   1: select * from test_normal where sal between &sal1 and &sal2
new   1: select * from test_normal where sal between 1500 and 2000

83743 rows selected.

Elapsed: 00:00:05.00

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
          =2001024)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
          Bytes=2001024)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11778  consistent gets
       5850  physical reads
          0  redo size
    4123553  bytes sent via SQL*Net to client
      61901  bytes received via SQL*Net from client
       5584  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      83743  rows processed
现在,删除test_normal上的位图索引并且建立一个B树索引。


1. SQL> create index normal_sal_idx on test_normal(sal); 
2. 
3. Index created. 
4. 
5. SQL> analyze table test_normal compute statistics for table for all indexes for
6. 
7. Table analyzed. 
 
SQL> create index normal_sal_idx on test_normal(sal);

Index created.

SQL> analyze table test_normal compute statistics for table for all indexes for all indexed columns;

Table analyzed.
查看索引大小和聚簇因子:
 
 
 
1. SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
2. 2 from user_segments 
3. 3 where segment_name in ('TEST_NORMAL','NORMAL_SAL_IDX'); 
4. 
5. SEGMENT_NAME Size in
6. ------------------------------ --------------- 
7. TEST_NORMAL 50 
8. NORMAL_SAL_IDX 17 
9. 
10. SQL> select index_name, clustering_factor from user_indexes; 
11. 
12. INDEX_NAME CLUSTERING_FACTOR 
13. ------------------------------ ---------------------------------- 
14. NORMAL_SAL_IDX 986778 
 
SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_SAL_IDX');

SEGMENT_NAME                         Size in MB
------------------------------       ---------------
TEST_NORMAL                          50
NORMAL_SAL_IDX                       17

SQL> select index_name, clustering_factor from user_indexes;

INDEX_NAME                           CLUSTERING_FACTOR
------------------------------       ----------------------------------
NORMAL_SAL_IDX                       986778

从上表可以看出,B树索引大于相同列上的位图索引,它的聚簇因子接近于表中的行数。

现在,先执行相等性查询:


1. SQL> set
2. SQL> select * from test_normal where sal=&sal; 
3. Enter value for
4. old 1: select * from test_normal where sal=&sal 
5. new
6. 
7. 164 rows selected. 
8. 
9. Elapsed: 00:00:00.01 
10. 
11. Execution Plan 
12. ---------------------------------------------------------- 
13. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=169 Card=168 Bytes=4032) 
14. 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL'
15. 2 1 INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX'
16. 
17. Statistics 
18. ---------------------------------------------------------- 
19. 0 recursive calls 
20. 0 db block gets 
21. 177 consistent gets 
22. 0 physical reads 
23. 0 redo size 
24. 8461 bytes sent via SQL*Net to client 
25. 609 bytes received via SQL*Net from client 
26. 12 SQL*Net roundtrips to/from client 
27. 0 sorts (memory) 
28. 0 sorts (disk) 
29. 164 rows processed 
 
SQL> set autot trace
SQL> select * from test_normal where sal=&sal;
Enter value for sal: 1869
old   1: select * from test_normal where sal=&sal
new   1: select * from test_normal where sal=1869

164 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=169 Card=168 Bytes=4032)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=169 Card=168 Bytes=4032)
   2    1     INDEX (RANGE SCAN) OF 'NORMAL_SAL_IDX' (NON-UNIQUE) (Cost=3 Card=168)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        177  consistent gets
          0  physical reads
          0  redo size
       8461  bytes sent via SQL*Net to client
        609  bytes received via SQL*Net from client
         12  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        164  rows processed

接下来是范围查询:



1. SQL> select * from test_normal where sal between &sal1 and &sal2; 
2. Enter value for
3. Enter value for
4. old 1: select * from test_normal where sal between &sal1 and &sal2  
5. new
6. 
7. 83743 rows selected. 
8. 
9. Elapsed: 00:00:04.03 
10. 
11. Execution Plan 
12. ---------------------------------------------------------- 
13. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes 
14. =2001024) 
15. 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL'
16. Bytes=2001024) 
17. 
18. 
19. Statistics 
20. ---------------------------------------------------------- 
21. 0 recursive calls 
22. 0 db block gets 
23. 11778 consistent gets 
24. 3891 physical reads 
25. 0 redo size 
26. 4123553 bytes sent via SQL*Net to client 
27. 61901 bytes received via SQL*Net from client 
28. 5584 SQL*Net roundtrips to/from client 
29. 0 sorts (memory) 
30. 0 sorts (disk) 
31. 83743 rows processed 
 
SQL> select * from test_normal where sal between &sal1 and &sal2;
Enter value for sal1: 1500
Enter value for sal2: 2000
old   1: select * from test_normal where sal between &sal1 and &sal2
new   1: select * from test_normal where sal between 1500 and 2000

83743 rows selected.

Elapsed: 00:00:04.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=83376 Bytes
          =2001024)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=83376
          Bytes=2001024)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      11778  consistent gets
       3891  physical reads
          0  redo size
    4123553  bytes sent via SQL*Net to client
      61901  bytes received via SQL*Net from client
       5584  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
      83743  rows processed

在不同的数据集上执行查询的结果如下,可以看出逻辑和物理I/O的次数基本上是相同的。

BITMAP

SAL (Equality)

B-TREE

Rows Fetched

Consistent Reads

Physical Reads

Consistent Reads

Physical Reads

165

0

1869

177

164

 

169

163

3548

181

167

 

174

166

6500

187

172

 

75

69

7000

81

73

 

177

163

2500

190

175

 

BITMAP

SAL (Range)

B-TREE

Rows Fetched

Consistent Reads

Physical Reads

Consistent Reads

Physical Reads

11778

5850

1500-2000

11778

3891

83743

11765

5468

2000-2500

11765

3879

83328

11753

5471

2500-3000

11753

3884

83318

17309

5472

3000-4000

17309

3892

166999

39398

5454

4000-7000

39398

3973

500520

对于范围查询,优化器选择了全表扫描,根本没有使用索引。但是对于相等性查询,优化器使用了索引。再次,逻辑和物理I/O是相同的。

因此,可以得出结论,对于一个具有normal-cardinality的列来说,优化器对于两种类型的索引的选择是相同的,并且没有明显的I/O差异。

步骤6(增加GENDER列)

在测试low-cardinality列之前,我们先增加一个GENDER列并且把它的值更新成M,F或者null。



1. SQL> alter table test_normal add GENDER varchar2(1);  
2. 
3. Table altered. 
4. 
5. SQL> select GENDER, count(*) from test_normal group by GENDER; 
6. 
7. S COUNT(*) 
8. - ---------- 
9. F 333769 
10. M 499921 
11. 166310 
12. 
13. 3 rows selected. 
 
SQL> alter table test_normal add GENDER varchar2(1);

Table altered.

SQL> select GENDER, count(*) from test_normal group by GENDER;

S     COUNT(*)
-     ----------
F     333769
M     499921
      166310

3 rows selected.

该列上位图索引的大小大约为570KB,如下表所示:



1. SQL> create bitmap index normal_GENDER_bmx on test_normal(GENDER); 
2. 
3. Index created. 
4. 
5. Elapsed: 00:00:02.08 
6. 
7. SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
8. 2 from user_segments 
9. 3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_BMX'); 
10. 
11. SEGMENT_NAME Size in
12. ------------------------------ --------------- 
13. TEST_NORMAL 50 
14. NORMAL_GENDER_BMX .5625 
15. 
16. 2 rows selected. 
 
SQL> create bitmap index normal_GENDER_bmx on test_normal(GENDER);

Index created.

Elapsed: 00:00:02.08

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_GENDER_BMX');

SEGMENT_NAME                        Size in MB
------------------------------      ---------------
TEST_NORMAL                         50
NORMAL_GENDER_BMX                   .5625

2 rows selected.
相对而言,改列上的B树索引的大小为13M,比位图索引大的多。
 
 
 
1. SQL> create index normal_GENDER_idx on test_normal(GENDER); 
2. 
3. Index created. 
4. 
5. SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
6. 2 from user_segments 
7. 3 where segment_name in ('TEST_NORMAL','NORMAL_GENDER_IDX'); 
8. 
9. SEGMENT_NAME Size in
10. ------------------------------ --------------- 
11. TEST_NORMAL 50 
12. NORMAL_GENDER_IDX 13 
13. 
14. 2 rows selected. 
 
SQL> create index normal_GENDER_idx on test_normal(GENDER);

Index created.

SQL> select substr(segment_name,1,30) segment_name, bytes/1024/1024 "Size in MB"
  2  from user_segments
  3  where segment_name in ('TEST_NORMAL','NORMAL_GENDER_IDX');

SEGMENT_NAME                       Size in MB
------------------------------     ---------------
TEST_NORMAL                        50
NORMAL_GENDER_IDX                  13

2 rows selected.

现在,执行相等性查询,优化器将不使用该索引,不论是位图索引还是B树索引,它将使用全部扫描。



1. SQL> select * from test_normal where GENDER is null; 
2. 
3. 166310 rows selected. 
4. 
5. Elapsed: 00:00:06.08 
6. 
7. Execution Plan 
8. ---------------------------------------------------------- 
9. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=166310 Bytes=4157750) 
10. 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL'
11. 
12. SQL> select * from test_normal where GENDER='M'; 
13. 
14. 499921 rows selected. 
15. 
16. Elapsed: 00:00:16.07 
17. 
18. Execution Plan 
19. ----------------------------------------------------------  
20. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=499921 Bytes=12498025) 
21. 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL'
22. 
23. SQL>select * from test_normal where GENDER='F'
24. / 
25. 
26. 333769 rows selected. 
27. 
28. Elapsed: 00:00:12.02 
29. 
30. Execution Plan 
31. ----------------------------------------------------------  
32. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=333769 Byte 
33. s=8344225) 
34. 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL'
35. Bytes=8344225) 
 
SQL> select * from test_normal where GENDER is null;

166310 rows selected.

Elapsed: 00:00:06.08

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=166310 Bytes=4157750)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=166310 Bytes=4157750)

SQL> select * from test_normal where GENDER='M';

499921 rows selected.

Elapsed: 00:00:16.07

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=499921 Bytes=12498025)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=499921Bytes=12498025)

SQL>select * from test_normal where GENDER='F'
 /

333769 rows selected.

Elapsed: 00:00:12.02

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=333769 Byte
          s=8344225)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=333769
           Bytes=8344225)

结论

现在我们了解了优化器对于这些技术做出的反应,现在我们来看对于位图索引和B树索引最适合的程序。

保持GENDER列上的位图索引,在SAL列上再建立一个位图索引然后执行一些查询。对这些列上的B树索引执行同样的查询。

在表test_normal中,你需要所有工资等于下列值的所有女性雇员的雇员号码:

1000
 1500
 2000
 2500
 3000
 3500
 4000
 4500因此:
 
 
 
1. SQL>select * from test_normal 
2. where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M'; 
 
SQL>select * from test_normal 
where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';

这是一个典型的数据仓库查询,绝对不要在OLTP系统中执行该查询。下面是两列上具有位图索引时的结果:


1. SQL>select * from test_normal 
2. where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M'; 
3. 
4. 1453 rows selected. 
5. 
6. Elapsed: 00:00:02.03 
7. 
8. Execution Plan 
9. ----------------------------------------------------------  
10. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=198 Card=754 Bytes=18850)  
11. 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL'
12. 2 1 BITMAP CONVERSION (TO ROWIDS) 
13. 3 2 BITMAP AND 
14. 4 3 BITMAP OR 
15. 5 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
16. 6 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
17. 7 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
18. 8 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
19. 9 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
20. 10 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
21. 11 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
22. 12 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
23. 13 4 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
24. 14 3 BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_GENDER_BMX'
25. 
26. Statistics 
27. ----------------------------------------------------------  
28. 0 recursive calls 
29. 0 db block gets 
30. 1353 consistent gets 
31. 920 physical reads 
32. 0 redo size 
33. 75604 bytes sent via SQL*Net to client 
34. 1555 bytes received via SQL*Net from client 
35. 98 SQL*Net roundtrips to/from client 
36. 0 sorts (memory) 
37. 0 sorts (disk) 
38. 1453 rows processed 
 
SQL>select * from test_normal 
where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';

1453 rows selected.

Elapsed: 00:00:02.03

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=198 Card=754 Bytes=18850)
   1    0   TABLE ACCESS (BY INDEX ROWID) OF 'TEST_NORMAL' (Cost=198 Card=754 Bytes=18850)
   2    1     BITMAP CONVERSION (TO ROWIDS)
   3    2       BITMAP AND
   4    3         BITMAP OR
   5    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   6    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   7    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   8    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
   9    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  10    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  11    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  12    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  13    4           BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_SAL_BMX'
  14    3         BITMAP INDEX (SINGLE VALUE) OF 'NORMAL_GENDER_BMX'

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1353  consistent gets
        920  physical reads
          0  redo size
      75604  bytes sent via SQL*Net to client
       1555  bytes received via SQL*Net from client
         98  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1453  rows processed
下面是B树索引时的结果:




1. SQL>select * from test_normal 
2. where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M'; 
3. 
4. 1453 rows selected. 
5. 
6. Elapsed: 00:00:03.01 
7. 
8. Execution Plan 
9. ----------------------------------------------------------  
10. 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=754 Bytes=18850)  
11. 1 0 TABLE ACCESS (FULL) OF 'TEST_NORMAL'
12. 
13. Statistics 
14. ---------------------------------------------------------- 
15. 0 recursive calls 
16. 0 db block gets 
17. 6333 consistent gets 
18. 4412 physical reads 
19. 0 redo size 
20. 75604 bytes sent via SQL*Net to client 
21. 1555 bytes received via SQL*Net from client 
22. 98 SQL*Net roundtrips to/from client 
23. 0 sorts (memory) 
24. 0 sorts (disk) 
25. 1453 rows processed 
 
SQL>select * from test_normal 
where sal in (1000,1500,2000,2500,3000,3500,4000,4500,5000) and GENDER='M';

1453 rows selected.

Elapsed: 00:00:03.01

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=601 Card=754 Bytes=18850)
   1    0   TABLE ACCESS (FULL) OF 'TEST_NORMAL' (Cost=601 Card=754 Bytes=18850)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6333  consistent gets
       4412  physical reads
          0  redo size
      75604  bytes sent via SQL*Net to client
       1555  bytes received via SQL*Net from client
         98  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       1453  row

s processed

可以看出,如果使用B树索引,优化器使用全表扫描;而对于位图索引,则使用索引。从获取结果需要的I/O次数可以推断出性能。

总之,基于如下的原因,位图索引适用于决策支持系统,而不管cardinality的高低:

  • 使用位图索引,优化器可以高效地执行包含AND,OR或者XOR的查询。
  • 使用位图索引,优化器可以回答对null的查询和计数。null值在位图索引时同样被加上索引(不像B树索引)。
  • 最重要的是,在决策支持系统中,位图索引支持特殊的查询,但B树索引则不能。具体来说,如果你有一个包含50列的表,用户经常查询其中的10列 ---- 10列的组合或者有时是其中一列,创建B树索引会比价困难。如果你在这些列上建立10个位图索引,这些查询都可以通过索引回答,不论你查询的是全部10列,还是10列中的4列或者6列,或者其中的一列。

相反,B树索引非常适合于OLTP系统,其用户执行的都是常规的查询。因为在OLTP系统中,数据会频繁地更新和删除,如果使用位图索引将会引起严重的锁定性能问题。

两种索引都有一个共同的目的:尽快地得到结果。但是你应该依据程序的类型来选择其中之一,而不是根据cardinality水平。