在我们一般的观点中,总是认为使用EXISTS(或NOT EXISTS)通常将提高查询的效率,所以一般推荐使用exists来代替in。但实际情况是不是这个样子呢?我们用实际的例子来看一下,我们分别在两种不同的优化器模式下来看。
create table test1 as select * from dba_objects ;
create table test2 as select * from dba_tables ;
create index idx_object_name on test1(object_name) ;
create index idx_table_name on test2(table_name) ;
exec dbms_stats.gather_table_stats('XIGUA','TEST1') ;
exec dbms_stats.gather_table_stats('XIGUA','TEST2') ;
在CBO模式下:
select * from test1
where object_name in
( select table_name from test2 where table_name like 'M%') ;
419 rows selected.
Elapsed: 00:00:00.09
Execution Plan
----------------------------------------------------------
Plan hash value: 2580328806
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Tim
e |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 423 | 47376 | 163 (2)| 00:
00:02 |
|* 1 | HASH JOIN RIGHT SEMI| | 423 | 47376 | 163 (2)| 00:
00:02 |
|* 2 | INDEX RANGE SCAN | IDX_TABLE_NAME | 420 | 7980 | 3 (0)| 00:
00:01 |
|* 3 | TABLE ACCESS FULL | TEST1 | 1125 | 102K| 160 (2)| 00:
00:02 |
-------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("OBJECT_NAME"="TABLE_NAME")
2 - access("TABLE_NAME" LIKE 'M%')
filter("TABLE_NAME" LIKE 'M%')
3 - filter("OBJECT_NAME" LIKE 'M%')
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
727 consistent gets
0 physical reads
0 redo size
22152 bytes sent via SQL*Net to client
682 bytes received via SQL*Net from client
29 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
419 rows processed
select * from test1
where exists
( select 1 from test2 where test1.object_name = test2.table_name and test2.table_name like 'M%') ;
419 rows selected.
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
Plan hash value: 2580328806
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Tim
e |
-------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 423 | 47376 | 163 (2)| 00:
00:02 |
|* 1 | HASH JOIN RIGHT SEMI| | 423 | 47376 | 163 (2)| 00:
00:02 |
|* 2 | INDEX RANGE SCAN | IDX_TABLE_NAME | 420 | 7980 | 3 (0)| 00:
00:01 |
|* 3 | TABLE ACCESS FULL | TEST1 | 1125 | 102K| 160 (2)| 00:
00:02 |
-------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TEST1"."OBJECT_NAME"="TEST2"."TABLE_NAME")
2 - access("TEST2"."TABLE_NAME" LIKE 'M%')
filter("TEST2"."TABLE_NAME" LIKE 'M%')
3 - filter("TEST1"."OBJECT_NAME" LIKE 'M%')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
727 consistent gets
0 physical reads
0 redo size
22152 bytes sent via SQL*Net to client
682 bytes received via SQL*Net from client
29 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
419 rows processed
在CBO模式下,我们可以看到这两者的执行计划完全相同,统计数据也相同。
我们再来看一下RBO模式下的情况,这种情况相对复杂一些。
SQL> select /*+ rule */ * from test1
2 where object_name in
3 ( select table_name from test2 where table_name like 'M%') ;
419 rows selected.
Elapsed: 00:00:00.13
Execution Plan
----------------------------------------------------------
Plan hash value: 618497113
-------------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST1 |
| 2 | NESTED LOOPS | |
| 3 | VIEW | VW_NSO_1 |
| 4 | SORT UNIQUE | |
|* 5 | INDEX RANGE SCAN | IDX_TABLE_NAME |
|* 6 | INDEX RANGE SCAN | IDX_OBJECT_NAME |
-------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("TABLE_NAME" LIKE 'M%')
filter("TABLE_NAME" LIKE 'M%')
6 - access("OBJECT_NAME"="$nso_col_1")
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
646 consistent gets
0 physical reads
0 redo size
27884 bytes sent via SQL*Net to client
682 bytes received via SQL*Net from client
29 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
419 rows processed
SQL> select /*+ rule */ * from test1
2 where exists
3 ( select 1 from test2 where test1.object_name = test2.table_name and test2.table_name like 'M%') ;
419 rows selected.
Elapsed: 00:00:00.87
Execution Plan
----------------------------------------------------------
Plan hash value: 1694944701
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | FILTER | |
| 2 | TABLE ACCESS FULL| TEST1 |
|* 3 | INDEX RANGE SCAN | IDX_TABLE_NAME |
---------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "TEST2" "TEST2" WHERE
"TEST2"."TABLE_NAME"=:B1 AND "TEST2"."TABLE_NAME" LIKE 'M%'))
3 - access("TEST2"."TABLE_NAME"=:B1)
filter("TEST2"."TABLE_NAME" LIKE 'M%')
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
70840 consistent gets
0 physical reads
0 redo size
22152 bytes sent via SQL*Net to client
682 bytes received via SQL*Net from client
29 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
419 rows processed
在这里,我们可以看到实际上,使用in效率比exists效率更高。我们可以这样来理解这种情况:
对于in,rule优化器选择的内存查询的结果作为驱动表来进行nest loops连接,所以当内存查询的结果集比较小的时候,这个in的效率还是比较高的。
对于exists,则是利用外查询表的全表扫描结果集过滤内查询的结果集,当外查询的表比较大的时候,相对效率比较低。
再来看一下内查询结果集比较大,而外查询较小的时候的情况。
SQL> select /*+ rule */ * from test2
where exists
2 3 ( select 1 from test1 where test1.object_name = test2.table_name and test1.object_name like 'S%') ;
172 rows selected.
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
Plan hash value: 833525739
----------------------------------------------
| Id | Operation | Name |
----------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | FILTER | |
| 2 | TABLE ACCESS FULL| TEST2 |
|* 3 | INDEX RANGE SCAN | IDX_OBJECT_NAME |
----------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "TEST1" "TEST1" WHERE
"TEST1"."OBJECT_NAME"=:B1 AND "TEST1"."OBJECT_NAME" LIKE 'S%'))
3 - access("TEST1"."OBJECT_NAME"=:B1)
filter("TEST1"."OBJECT_NAME" LIKE 'S%')
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4837 consistent gets
0 physical reads
2596 redo size
13642 bytes sent via SQL*Net to client
506 bytes received via SQL*Net from client
13 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
172 rows processed
SQL> select /*+ rule */ * from test2
2 where table_name in
3 ( select object_name from test1 where object_name like 'S%') ;
172 rows selected.
Elapsed: 00:00:00.12
Execution Plan
----------------------------------------------------------
Plan hash value: 2497755124
-------------------------------------------------------
| Id | Operation | Name |
-------------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST2 |
| 2 | NESTED LOOPS | |
| 3 | VIEW | VW_NSO_1 |
| 4 | SORT UNIQUE | |
|* 5 | INDEX RANGE SCAN | IDX_OBJECT_NAME |
|* 6 | INDEX RANGE SCAN | IDX_TABLE_NAME |
-------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("OBJECT_NAME" LIKE 'S%')
filter("OBJECT_NAME" LIKE 'S%')
6 - access("TABLE_NAME"="$nso_col_1")
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2214 consistent gets
0 physical reads
7556 redo size
13378 bytes sent via SQL*Net to client
506 bytes received via SQL*Net from client
13 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
172 rows processed
从理论上分析,当内存查询的结果集比较大,而外查询比较小的时候,应该增加in的cost,对于exists的filter来说,也应该效率有所提高,所以再这种情况下,我们应该看到exists效率更高。但是在实际的测试中,我们可以看到,这两者并不存在明显的性能上的差异,甚至于用in的效果比exists更好。
总结:
1、在CBO模式下,两者效率一致。
2、在RBO模式下,当外层数据集远大于内层数据集时,使用in的效率比较高。
3、在RBO模式下,当外层数据集远小于内层数据集时,两者效率相差不大。
可能我的测试数据选择不太理想,欢迎大家讨论。
==========
补充not exists和not in的测试数据
先来看一下CBO模式下的情况:
SQL> select * from test2
2 where table_name not in
3 ( select object_name from test1 where object_name like 'A%') ;
2950 rows selected.
Elapsed: 00:00:13.93
Execution Plan
----------------------------------------------------------
Plan hash value: 833525739
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time
|
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1576 | 315K| 6364 (1)| 00:0
1:17 |
|* 1 | FILTER | | | | |
|
| 2 | TABLE ACCESS FULL| TEST2 | 1577 | 315K| 14 (0)| 00:0
0:01 |
|* 3 | INDEX RANGE SCAN | IDX_OBJECT_NAME | 1 | 25 | 8 (0)| 00:0
0:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "TEST1" "TEST1" WHERE
"OBJECT_NAME" LIKE 'A%' AND LNNVL("OBJECT_NAME"<>:B1)))
3 - access("OBJECT_NAME" LIKE 'A%')
filter("OBJECT_NAME" LIKE 'A%' AND LNNVL("OBJECT_NAME"<>:B1))
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
60806 consistent gets
29 physical reads
1104 redo size
165661 bytes sent via SQL*Net to client
2541 bytes received via SQL*Net from client
198 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2950 rows processed
SQL> select * from test2
2 where not exists
3 ( select 1 from test1 where test1.object_name = test2.table_name and test1.object_name like 'A%') ;
2950 rows selected.
Elapsed: 00:00:00.19
Execution Plan
----------------------------------------------------------
Plan hash value: 349475409
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Ti
me |
--------------------------------------------------------------------------------
--------
| 0 | SELECT STATEMENT | | 1538 | 345K| 23 (5)| 00
:00:01 |
|* 1 | HASH JOIN RIGHT ANTI| | 1538 | 345K| 23 (5)| 00
:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_OBJECT_NAME | 1259 | 31475 | 8 (0)| 00
:00:01 |
| 3 | TABLE ACCESS FULL | TEST2 | 1577 | 315K| 14 (0)| 00
:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TEST1"."OBJECT_NAME"="TEST2"."TABLE_NAME")
2 - access("TEST1"."OBJECT_NAME" LIKE 'A%')
filter("TEST1"."OBJECT_NAME" LIKE 'A%')
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
314 consistent gets
0 physical reads
0 redo size
165661 bytes sent via SQL*Net to client
2541 bytes received via SQL*Net from client
198 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
2950 rows processed
在CBO模式下,not in采用的是filter,对外层表做全表扫描,再去filter内层查询结果集。not exists采用内层查询结果集作hash join连接。
从结果明显可以看到,not exists效率比较高。把内层和外层的表对换一下,结果也是相同的,not exists的效率明显高很多。
在rule模式下:
SQL> select /*+ rule */ * from test1
2 where object_name not in
3 ( select table_name from test2 where table_name like 'A%' ) ;
229092 rows selected.
Elapsed: 00:00:22.34
Execution Plan
----------------------------------------------------------
Plan hash value: 1694944701
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | FILTER | |
| 2 | TABLE ACCESS FULL| TEST1 |
|* 3 | INDEX RANGE SCAN | IDX_TABLE_NAME |
---------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT 0 FROM "TEST2" "TEST2" WHERE
LNNVL("TABLE_NAME"<>:B1) AND "TABLE_NAME" LIKE 'A%'))
3 - access("TABLE_NAME" LIKE 'A%')
filter(LNNVL("TABLE_NAME"<>:B1) AND "TABLE_NAME" LIKE 'A%')
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
348779 consistent gets
0 physical reads
0 redo size
11059645 bytes sent via SQL*Net to client
168377 bytes received via SQL*Net from client
15274 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
229092 rows processed
SQL>
SQL> select /*+ rule */ * from test1
2 where not exists
3 ( select 1 from test2 where test1.object_name = test2.table_name and test2.table_name like 'A%' ) ;
229092 rows selected.
Elapsed: 00:00:08.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1694944701
---------------------------------------------
| Id | Operation | Name |
---------------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | FILTER | |
| 2 | TABLE ACCESS FULL| TEST1 |
|* 3 | INDEX RANGE SCAN | IDX_TABLE_NAME |
---------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT 0 FROM "TEST2" "TEST2" WHERE
"TEST2"."TABLE_NAME"=:B1 AND "TEST2"."TABLE_NAME" LIKE 'A%'))
3 - access("TEST2"."TABLE_NAME"=:B1)
filter("TEST2"."TABLE_NAME" LIKE 'A%')
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
352897 consistent gets
0 physical reads
0 redo size
11059645 bytes sent via SQL*Net to client
168377 bytes received via SQL*Net from client
15274 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
229092 rows processed
在RBO模式下,两者的执行计划完全相同。
总结:
1、CBO模式下,not exists效率较高。
2、RBO模式下,两者效率差不多。
[本帖最后由 BTxigua 于 2008-10-14 10:42 编辑]