在我们一般的观点中,总是认为使用EXISTS(或NOT EXISTS)通常将提高查询的效率,所以一般推荐使用exists来代替in。但实际情况是不是这个样子呢?我们用实际的例子来看一下,我们分别在两种不同的优化器模式下来看。

create table test1 as select * from dba_objects ;
create table test2 as select * from dba_tables ;
create index idx_object_name on test1(object_name) ;
create index idx_table_name on test2(table_name) ;
exec dbms_stats.gather_table_stats('XIGUA','TEST1') ;
exec dbms_stats.gather_table_stats('XIGUA','TEST2') ;

在CBO模式下:

select * from test1
where object_name in
( select table_name from test2 where table_name like 'M%') ;
419 rows selected.
Elapsed: 00:00:00.09
Execution Plan
----------------------------------------------------------
Plan hash value: 2580328806
-------------------------------------------------------------------------------
| Id  | Operation            | Name           | Rows  | Bytes | Cost (%CPU)| Tim
e     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                |   423 | 47376 |   163   (2)| 00:
00:02 |
|*  1 |  HASH JOIN RIGHT SEMI|                |   423 | 47376 |   163   (2)| 00:
00:02 |
|*  2 |   INDEX RANGE SCAN   | IDX_TABLE_NAME |   420 |  7980 |     3   (0)| 00:
00:01 |
|*  3 |   TABLE ACCESS FULL  | TEST1          |  1125 |   102K|   160   (2)| 00:
00:02 |
-------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("OBJECT_NAME"="TABLE_NAME")
2 - access("TABLE_NAME" LIKE 'M%')
filter("TABLE_NAME" LIKE 'M%')
3 - filter("OBJECT_NAME" LIKE 'M%')
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
727  consistent gets
0  physical reads
0  redo size
22152  bytes sent via SQL*Net to client
682  bytes received via SQL*Net from client
29  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
419  rows processed
select * from test1
where exists
( select 1 from test2 where test1.object_name = test2.table_name and test2.table_name like 'M%') ;
419 rows selected.
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
Plan hash value: 2580328806
| Id  | Operation            | Name           | Rows  | Bytes | Cost (%CPU)| Tim
e     |
-------------------------------------------------------------------------------|   0 | SELECT STATEMENT     |                |   423 | 47376 |   163   (2)| 00:
00:02 |
|*  1 |  HASH JOIN RIGHT SEMI|                |   423 | 47376 |   163   (2)| 00:
00:02 |
|*  2 |   INDEX RANGE SCAN   | IDX_TABLE_NAME |   420 |  7980 |     3   (0)| 00:
00:01 |
|*  3 |   TABLE ACCESS FULL  | TEST1          |  1125 |   102K|   160   (2)| 00:
00:02 |
-------------------------------------------------------------------------------Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TEST1"."OBJECT_NAME"="TEST2"."TABLE_NAME")
2 - access("TEST2"."TABLE_NAME" LIKE 'M%')
filter("TEST2"."TABLE_NAME" LIKE 'M%')
3 - filter("TEST1"."OBJECT_NAME" LIKE 'M%')
Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
727  consistent gets
0  physical reads
0  redo size
22152  bytes sent via SQL*Net to client
682  bytes received via SQL*Net from client
29  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
419  rows processed

在CBO模式下,我们可以看到这两者的执行计划完全相同,统计数据也相同。

我们再来看一下RBO模式下的情况,这种情况相对复杂一些。

SQL> select /*+ rule */ * from test1
2  where object_name in
3        ( select table_name from test2 where table_name like 'M%') ;
419 rows selected.
Elapsed: 00:00:00.13
Execution Plan
----------------------------------------------------------
Plan hash value: 618497113
-------------------------------------------------------
| Id  | Operation                   | Name            |
-------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST1           |
|   2 |   NESTED LOOPS              |                 |
|   3 |    VIEW                     | VW_NSO_1        |
|   4 |     SORT UNIQUE             |                 |
|*  5 |      INDEX RANGE SCAN       | IDX_TABLE_NAME  |
|*  6 |    INDEX RANGE SCAN         | IDX_OBJECT_NAME |
-------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("TABLE_NAME" LIKE 'M%')
filter("TABLE_NAME" LIKE 'M%')
6 - access("OBJECT_NAME"="$nso_col_1")
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
646  consistent gets
0  physical reads
0  redo size
27884  bytes sent via SQL*Net to client
682  bytes received via SQL*Net from client
29  SQL*Net roundtrips to/from client
1  sorts (memory)
0  sorts (disk)
419  rows processed
SQL> select /*+ rule */ * from test1
2  where exists
3        ( select 1 from test2 where test1.object_name = test2.table_name and test2.table_name like 'M%') ;
419 rows selected.
Elapsed: 00:00:00.87
Execution Plan
----------------------------------------------------------
Plan hash value: 1694944701
---------------------------------------------
| Id  | Operation          | Name           |
---------------------------------------------
|   0 | SELECT STATEMENT   |                |
|*  1 |  FILTER            |                |
|   2 |   TABLE ACCESS FULL| TEST1          |
|*  3 |   INDEX RANGE SCAN | IDX_TABLE_NAME |
---------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "TEST2" "TEST2" WHERE
"TEST2"."TABLE_NAME"=:B1 AND "TEST2"."TABLE_NAME" LIKE 'M%'))
3 - access("TEST2"."TABLE_NAME"=:B1)
filter("TEST2"."TABLE_NAME" LIKE 'M%')
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
70840  consistent gets
0  physical reads
0  redo size
22152  bytes sent via SQL*Net to client
682  bytes received via SQL*Net from client
29  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
419  rows processed

在这里,我们可以看到实际上,使用in效率比exists效率更高。我们可以这样来理解这种情况:

对于in,rule优化器选择的内存查询的结果作为驱动表来进行nest loops连接,所以当内存查询的结果集比较小的时候,这个in的效率还是比较高的。

对于exists,则是利用外查询表的全表扫描结果集过滤内查询的结果集,当外查询的表比较大的时候,相对效率比较低。

再来看一下内查询结果集比较大,而外查询较小的时候的情况。

SQL> select /*+ rule */ * from test2
where exists
2    3        ( select 1 from test1 where test1.object_name = test2.table_name and test1.object_name like 'S%') ;
172 rows selected.
Elapsed: 00:00:00.08
Execution Plan
----------------------------------------------------------
Plan hash value: 833525739
----------------------------------------------
| Id  | Operation          | Name            |
----------------------------------------------
|   0 | SELECT STATEMENT   |                 |
|*  1 |  FILTER            |                 |
|   2 |   TABLE ACCESS FULL| TEST2           |
|*  3 |   INDEX RANGE SCAN | IDX_OBJECT_NAME |
----------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( EXISTS (SELECT 0 FROM "TEST1" "TEST1" WHERE
"TEST1"."OBJECT_NAME"=:B1 AND "TEST1"."OBJECT_NAME" LIKE 'S%'))
3 - access("TEST1"."OBJECT_NAME"=:B1)
filter("TEST1"."OBJECT_NAME" LIKE 'S%')
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
4837  consistent gets
0  physical reads
2596  redo size
13642  bytes sent via SQL*Net to client
506  bytes received via SQL*Net from client
13  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
172  rows processed
SQL> select /*+ rule */ * from test2
2  where table_name in
3        ( select object_name from test1 where object_name like 'S%') ;
172 rows selected.
Elapsed: 00:00:00.12
Execution Plan
----------------------------------------------------------
Plan hash value: 2497755124
-------------------------------------------------------
| Id  | Operation                   | Name            |
-------------------------------------------------------
|   0 | SELECT STATEMENT            |                 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TEST2           |
|   2 |   NESTED LOOPS              |                 |
|   3 |    VIEW                     | VW_NSO_1        |
|   4 |     SORT UNIQUE             |                 |
|*  5 |      INDEX RANGE SCAN       | IDX_OBJECT_NAME |
|*  6 |    INDEX RANGE SCAN         | IDX_TABLE_NAME  |
-------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("OBJECT_NAME" LIKE 'S%')
filter("OBJECT_NAME" LIKE 'S%')
6 - access("TABLE_NAME"="$nso_col_1")
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
2214  consistent gets
0  physical reads
7556  redo size
13378  bytes sent via SQL*Net to client
506  bytes received via SQL*Net from client
13  SQL*Net roundtrips to/from client
1  sorts (memory)
0  sorts (disk)
172  rows processed

从理论上分析,当内存查询的结果集比较大,而外查询比较小的时候,应该增加in的cost,对于exists的filter来说,也应该效率有所提高,所以再这种情况下,我们应该看到exists效率更高。但是在实际的测试中,我们可以看到,这两者并不存在明显的性能上的差异,甚至于用in的效果比exists更好。

总结:

1、在CBO模式下,两者效率一致。

2、在RBO模式下,当外层数据集远大于内层数据集时,使用in的效率比较高。

3、在RBO模式下,当外层数据集远小于内层数据集时,两者效率相差不大。

可能我的测试数据选择不太理想,欢迎大家讨论。

==========

补充not exists和not in的测试数据

先来看一下CBO模式下的情况:

SQL> select * from test2
2  where table_name not in
3        ( select object_name from test1 where object_name like 'A%') ;
2950 rows selected.
Elapsed: 00:00:13.93
Execution Plan
----------------------------------------------------------
Plan hash value: 833525739
---------------------------------------------------------------------------
| Id  | Operation          | Name            | Rows  | Bytes | Cost (%CPU)| Time
|
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                 |  1576 |   315K|  6364   (1)| 00:0
1:17 |
|*  1 |  FILTER            |                 |       |       |            |
|
|   2 |   TABLE ACCESS FULL| TEST2           |  1577 |   315K|    14   (0)| 00:0
0:01 |
|*  3 |   INDEX RANGE SCAN | IDX_OBJECT_NAME |     1 |    25 |     8   (0)| 00:0
0:01 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT /*+ */ 0 FROM "TEST1" "TEST1" WHERE
"OBJECT_NAME" LIKE 'A%' AND LNNVL("OBJECT_NAME"<>:B1)))
3 - access("OBJECT_NAME" LIKE 'A%')
filter("OBJECT_NAME" LIKE 'A%' AND LNNVL("OBJECT_NAME"<>:B1))
Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
60806  consistent gets
29  physical reads
1104  redo size
165661  bytes sent via SQL*Net to client
2541  bytes received via SQL*Net from client
198  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2950  rows processed
SQL> select * from test2
2  where not exists
3        ( select 1 from test1 where test1.object_name = test2.table_name and test1.object_name like 'A%') ;
2950 rows selected.
Elapsed: 00:00:00.19
Execution Plan
----------------------------------------------------------
Plan hash value: 349475409
-------------------------------------------------------------------------------
| Id  | Operation            | Name            | Rows  | Bytes | Cost (%CPU)| Ti
me     |
--------------------------------------------------------------------------------
--------
|   0 | SELECT STATEMENT     |                 |  1538 |   345K|    23   (5)| 00
:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI|                 |  1538 |   345K|    23   (5)| 00
:00:01 |
|*  2 |   INDEX RANGE SCAN   | IDX_OBJECT_NAME |  1259 | 31475 |     8   (0)| 00
:00:01 |
|   3 |   TABLE ACCESS FULL  | TEST2           |  1577 |   315K|    14   (0)| 00
:00:01 |
-------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("TEST1"."OBJECT_NAME"="TEST2"."TABLE_NAME")
2 - access("TEST1"."OBJECT_NAME" LIKE 'A%')
filter("TEST1"."OBJECT_NAME" LIKE 'A%')
Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
314  consistent gets
0  physical reads
0  redo size
165661  bytes sent via SQL*Net to client
2541  bytes received via SQL*Net from client
198  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2950  rows processed

在CBO模式下,not in采用的是filter,对外层表做全表扫描,再去filter内层查询结果集。not exists采用内层查询结果集作hash join连接。

从结果明显可以看到,not exists效率比较高。把内层和外层的表对换一下,结果也是相同的,not exists的效率明显高很多。

在rule模式下:

SQL> select /*+ rule */ * from test1
2  where object_name not in
3        ( select table_name from test2 where table_name like 'A%' ) ;
229092 rows selected.
Elapsed: 00:00:22.34
Execution Plan
----------------------------------------------------------
Plan hash value: 1694944701
---------------------------------------------
| Id  | Operation          | Name           |
---------------------------------------------
|   0 | SELECT STATEMENT   |                |
|*  1 |  FILTER            |                |
|   2 |   TABLE ACCESS FULL| TEST1          |
|*  3 |   INDEX RANGE SCAN | IDX_TABLE_NAME |
---------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT 0 FROM "TEST2" "TEST2" WHERE
LNNVL("TABLE_NAME"<>:B1) AND "TABLE_NAME" LIKE 'A%'))
3 - access("TABLE_NAME" LIKE 'A%')
filter(LNNVL("TABLE_NAME"<>:B1) AND "TABLE_NAME" LIKE 'A%')
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
348779  consistent gets
0  physical reads
0  redo size
11059645  bytes sent via SQL*Net to client
168377  bytes received via SQL*Net from client
15274  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
229092  rows processed
SQL>
SQL> select /*+ rule */ * from test1
2  where not exists
3        ( select 1 from test2 where test1.object_name = test2.table_name and test2.table_name like 'A%' ) ;
229092 rows selected.
Elapsed: 00:00:08.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1694944701
---------------------------------------------
| Id  | Operation          | Name           |
---------------------------------------------
|   0 | SELECT STATEMENT   |                |
|*  1 |  FILTER            |                |
|   2 |   TABLE ACCESS FULL| TEST1          |
|*  3 |   INDEX RANGE SCAN | IDX_TABLE_NAME |
---------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( NOT EXISTS (SELECT 0 FROM "TEST2" "TEST2" WHERE
"TEST2"."TABLE_NAME"=:B1 AND "TEST2"."TABLE_NAME" LIKE 'A%'))
3 - access("TEST2"."TABLE_NAME"=:B1)
filter("TEST2"."TABLE_NAME" LIKE 'A%')
Note
-----
- rule based optimizer used (consider using cbo)
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
352897  consistent gets
0  physical reads
0  redo size
11059645  bytes sent via SQL*Net to client
168377  bytes received via SQL*Net from client
15274  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
229092  rows processed

在RBO模式下,两者的执行计划完全相同。

总结:

1、CBO模式下,not exists效率较高。

2、RBO模式下,两者效率差不多。

[本帖最后由 BTxigua 于 2008-10-14 10:42 编辑]