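While I was discussing a technical issue with a colleague this week, he told me that a locally managed tablespace (LMT) in a customer's 11.1.0.6 database contained a large number of extent fragments: contiguous free extents that had not been merged into one large extent. He suspected that a bug in 11.1.0.6 was causing this fragmentation on the LMT.
His evidence for the fragmentation was the large number of contiguous free extents returned by the dba_free_space view: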
SQL> select tablespace_name,EXTENT_MANAGEMENT,ALLOCATION_TYPE from dba_tablespaces where tablespace_name='FRAGMENT';
TABLESPACE_NAME                EXTENT_MAN ALLOCATIO
------------------------------ ---------- ---------
FRAGMENT                       LOCAL      SYSTEM
SQL> select block_id,blocks from dba_free_space where tablespace_name='FRAGMENT' and rownum<10;
  BLOCK_ID     BLOCKS
---------- ----------
     40009     222136
        25          8
         9          8
        17          8
        33          8
        41          8
        49          8
        57          8
        65          8
..............
SQL> select count(*),blocks from dba_free_space where tablespace_name='FRAGMENT' and blocks=8 group by blocks;
  COUNT(*)     BLOCKS
---------- ----------
      5000          8
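As shown above, the FRAGMENT tablespace uses local extent management with autoallocate (ALLOCATION_TYPE = SYSTEM), and dba_free_space does report a large number of contiguous extents that have not been merged. In a dictionary-managed tablespace (DMT), the SMON process has to maintain the FET$ base table periodically, coalescing contiguous free extents in the tablespace into one larger extent. In an LMT, extents are tracked by bitmaps in the datafile header (blocks 3-8 of the datafile in 10g), so adjacent free space needs no background process to coalesce it explicitly.
So why would contiguous free extents in an LMT stay unmerged and cause fragmentation? This database runs 11.1.0.6, a relatively unstable 11gR1 release, so blaming a bug seemed plausible. At first I largely agreed with my colleague's bug theory, and together we searched Metalink for known 11gR1 bugs, but we found no bug note whose symptoms matched.
That made me rethink the problem. Pinning the cause on a bug so early was too subjective: not every unexpected behavior is a bug.
In fact, what dba_free_space shows may not be the "real" picture. The illusion is often caused by the recycle bin behind the flashback table (flashback drop) feature introduced in 10g: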
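The "contiguous but unmerged" pattern can be checked with a query rather than by eye. The following is a minimal sketch, assuming the tablespace name FRAGMENT from this case and using only the documented dba_free_space columns. It lists every free extent that ends exactly where the next reported free extent in the same file begins.

```sql
-- Sketch: find free extents that are immediately followed by another
-- free extent in the same datafile (adjacent, yet reported as separate rows).
select file_id, block_id, blocks, next_block_id
  from (select file_id,
               block_id,
               blocks,
               lead(block_id) over (partition by file_id
                                    order by block_id) as next_block_id
          from dba_free_space
         where tablespace_name = 'FRAGMENT')
 where block_id + blocks = next_block_id
 order by file_id, block_id;
```

Rows returned here show only that dba_free_space reports adjacent ranges as separate rows. On their own they do not prove that the underlying bitmap is fragmented.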
SQL> select text from dba_views where view_name='DBA_FREE_SPACE';
TEXT
--------------------------------------------------------------------------------
-- ====== DMT: REAL FREE EXTENTS ======
select ts.name, fi.file#, f.block#,
f.length * ts.blocksize, f.length, f.file#
from sys.ts$ ts, sys.fet$ f, sys.file$ fi
where ts.ts# = f.ts#
and f.ts# = fi.ts#
and f.file# = fi.relfile#
and ts.bitmapped = 0
union all
-- ====== LMT: REAL FREE EXTENTS ======
select /*+ ordered use_nl(f) use_nl(fi) */
ts.name, fi.file#, f.ktfbfebno,
f.ktfbfeblks * ts.blocksize, f.ktfbfeblks, f.ktfbfefno
from sys.ts$ ts, sys.x$ktfbfe f, sys.file$ fi
where ts.ts# = f.ktfbfetsn
and f.ktfbfetsn = fi.ts#
and f.ktfbfefno = fi.relfile#
and ts.bitmapped <> 0 and ts.online$ in (1,4) and ts.contents$ = 0
union all
-- ====== LMT: RECYCLEBIN "FREE" EXTENTS ======
select /*+ ordered use_nl(u) use_nl(fi) */
ts.name, fi.file#, u.ktfbuebno,
u.ktfbueblks * ts.blocksize, u.ktfbueblks, u.ktfbuefno
from sys.recyclebin$ rb, sys.ts$ ts, sys.x$ktfbue u, sys.file$ fi
where ts.ts# = rb.ts#
and rb.ts# = fi.ts#
and u.ktfbuefno = fi.relfile#
and u.ktfbuesegtsn = rb.ts#
and u.ktfbuesegfno = rb.file#
and u.ktfbuesegbno = rb.block#
and ts.bitmapped <> 0 and ts.online$ in (1,4) and ts.contents$ = 0
union all
-- ====== DMT: RECYCLEBIN "FREE" EXTENTS ======
select ts.name, fi.file#, u.block#,
u.length * ts.blocksize, u.length, u.file#
from sys.ts$ ts, sys.uet$ u, sys.file$ fi, sys.recyclebin$ rb
where ts.ts# = u.ts#
and u.ts# = fi.ts#
and u.segfile# = fi.relfile#
and u.ts# = rb.ts#
and u.segfile# = rb.file#
and u.segblock# = rb.block#
and ts.bitmapped = 0
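Reading the definition of dba_free_space in 10g shows that the free extents it reports come from four sources:
- truly free extents in LMT tablespaces
- truly free extents in DMT tablespaces
- extents in LMT tablespaces occupied by objects in the RECYCLEBIN
- extents in DMT tablespaces occupied by objects in the RECYCLEBIN
Releases before 10g had no recyclebin feature to "interfere", so there dba_free_space reported only the first two sources. In 10g and later we can therefore create a pair of compatibility views, one for truly free space and one for recycle-bin space (create them as SYS, because they reference SYS-owned base tables and X$ fixed tables):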
create view dba_free_space_pre10g as
select ts.name TABLESPACE_NAME,
fi.file# FILE_ID,
f.block# BLOCK_ID,
f.length * ts.blocksize BYTES,
f.length BLOCKS,
f.file# RELATIVE_FNO
from sys.ts$ ts, sys.fet$ f, sys.file$ fi
where ts.ts# = f.ts#
and f.ts# = fi.ts#
and f.file# = fi.relfile#
and ts.bitmapped = 0
union all
select /*+ ordered use_nl(f) use_nl(fi) */
ts.name TABLESPACE_NAME,
fi.file# FILE_ID,
f.ktfbfebno BLOCK_ID,
f.ktfbfeblks * ts.blocksize BYTES,
f.ktfbfeblks BLOCKS,
f.ktfbfefno RELATIVE_FNO
from sys.ts$ ts, sys.x$ktfbfe f, sys.file$ fi
where ts.ts# = f.ktfbfetsn
and f.ktfbfetsn = fi.ts#
and f.ktfbfefno = fi.relfile#
and ts.bitmapped <> 0
and ts.online$ in (1, 4)
and ts.contents$ = 0
/
create view dba_free_space_recyclebin as
select /*+ ordered use_nl(u) use_nl(fi) */
ts.name TABLESPACE_NAME,
fi.file# FILE_ID,
u.ktfbuebno BLOCK_ID,
u.ktfbueblks * ts.blocksize BYTES,
u.ktfbueblks BLOCKS,
u.ktfbuefno RELATIVE_FNO
from sys.recyclebin$ rb, sys.ts$ ts, sys.x$ktfbue u, sys.file$ fi
where ts.ts# = rb.ts#
and rb.ts# = fi.ts#
and u.ktfbuefno = fi.relfile#
and u.ktfbuesegtsn = rb.ts#
and u.ktfbuesegfno = rb.file#
and u.ktfbuesegbno = rb.block#
and ts.bitmapped <> 0
and ts.online$ in (1, 4)
and ts.contents$ = 0
union all
select ts.name TABLESPACE_NAME,
fi.file# FILE_ID,
u.block# BLOCK_ID,
u.length * ts.blocksize BYTES,
u.length BLOCKS,
u.file# RELATIVE_FNO
from sys.ts$ ts, sys.uet$ u, sys.file$ fi, sys.recyclebin$ rb
where ts.ts# = u.ts#
and u.ts# = fi.ts#
and u.segfile# = fi.relfile#
and u.ts# = rb.ts#
and u.segfile# = rb.file#
and u.segblock# = rb.block#
and ts.bitmapped = 0
/
With the dba_free_space_pre10g and dba_free_space_recyclebin views created above, we can clearly tell truly free extents apart from extents still held by recycle-bin objects.
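As one way to use the two views together, the sketch below summarizes each tablespace's truly free extents and recycle-bin extents side by side. It assumes both views were created exactly as defined above. The column aliases (extent_source, extents) are illustrative names, not part of any Oracle view.

```sql
-- Sketch: per-tablespace comparison of truly free space vs. space that is
-- only "free" because it belongs to dropped objects in the recycle bin.
select tablespace_name,
       'TRULY FREE' as extent_source,
       count(*)     as extents,
       sum(blocks)  as blocks,
       sum(bytes)   as bytes
  from dba_free_space_pre10g
 group by tablespace_name
union all
select tablespace_name,
       'RECYCLEBIN',
       count(*),
       sum(blocks),
       sum(bytes)
  from dba_free_space_recyclebin
 group by tablespace_name
 order by 1, 2;
```

A tablespace whose RECYCLEBIN row has a high extent count and a small total size is a likely candidate for the apparent fragmentation discussed in this post.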
For the many contiguous free-extent "fragments" on the LMT in this case, these views give the answer directly:
SQL> select * from dba_free_space_pre10g where tablespace_name='FRAGMENT';
TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
FRAGMENT                               13      40009 1819738112     222136           13
SQL> select count(*),blocks from dba_free_space_recyclebin where tablespace_name='FRAGMENT' group by blocks;
  COUNT(*)     BLOCKS
---------- ----------
      5000          8
Clearly, a large number of small "objects" in the RECYCLEBIN created the illusion of heavy fragmentation on the LMT:
SQL> select space,count(*) from dba_recyclebin where ts_name='FRAGMENT' group by space;
     SPACE   COUNT(*)
---------- ----------
         8       5000
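Before purging anything, it can help to see who owns the dropped objects and when they were dropped. This is a minimal sketch against the documented dba_recyclebin columns, again assuming the FRAGMENT tablespace from this case. The SPACE column is expressed in blocks.

```sql
-- Sketch: group recycle-bin contents of one tablespace by owner and type
-- to identify the schema (and therefore the application) that drops them.
select owner,
       type,
       count(*)      as objects,
       sum(space)    as blocks,
       min(droptime) as first_drop,
       max(droptime) as last_drop
  from dba_recyclebin
 where ts_name = 'FRAGMENT'
 group by owner, type
 order by blocks desc;
```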
We can "merge" these extent fragments by purging the recycle bin:
SQL> purge dba_recyclebin;
DBA Recyclebin purged.
SQL> select count(*),blocks from dba_free_space where tablespace_name='FRAGMENT' group by blocks;
  COUNT(*)     BLOCKS
---------- ----------
         1     262136
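Note that purge dba_recyclebin empties the recycle bin for the whole database, which removes the option of flashing back any dropped table. Where that is too broad, the PURGE statement also accepts a tablespace scope. The sketch below assumes the FRAGMENT tablespace from this case. APP_USER is a hypothetical schema name used only for illustration.

```sql
-- Sketch: purge only the recycle-bin objects stored in one tablespace.
purge tablespace FRAGMENT;

-- Or narrow it further to one owner's objects in that tablespace.
-- APP_USER is a hypothetical schema name.
purge tablespace FRAGMENT user APP_USER;
```

After either command, re-running the count against dba_free_space for the tablespace should show whether the small 8-block entries have disappeared.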
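If an application creates many small heap tables to hold temporary data and drops them once the data is no longer needed, it can produce the LMT "fragmentation" described above. When handling this kind of space problem on 10g and later, be sure to establish which extents are truly free and which belong to the RECYCLEBIN.
This case is also a reminder not to blame a bug the moment we run into unexpected behavior, because doing so locks in our diagnostic expectations too early. To keep our thinking as open as possible, we should arrange diagnostic steps the way Go players are advised to "keep variations in reserve" (保留变化).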
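If the application side can be changed, the dropped tables can be kept out of the recycle bin in the first place. The following sketch shows three standard options. The table names tmp_stage and tmp_stage_gtt and their columns are hypothetical, and which option fits depends on how the application actually uses the data.

```sql
-- Option 1: bypass the recycle bin for an individual drop.
-- tmp_stage is a hypothetical table name.
drop table tmp_stage purge;

-- Option 2: disable the recycle bin for the current session only,
-- so later DROP TABLE statements in this session release space at once.
alter session set recyclebin = off;

-- Option 3: avoid the create/drop cycle with a global temporary table,
-- whose rows live in temporary segments and are private to each session.
-- tmp_stage_gtt and its columns are hypothetical.
create global temporary table tmp_stage_gtt (
  id      number,
  payload varchar2(4000)
) on commit preserve rows;
```

Options 1 and 2 give up the ability to recover those tables with flashback drop, so they suit data that is genuinely disposable.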