CLUSTERING_FACTOR对于执行计划的影响
本案例的产品环境是AIX 5.2操作系统,数据库版本为 Oracle 9.2.0.7。在一次常规的表分析后,系统出现异常,主机CPU 平均使用率达到了70%,极大影响了系统的使用与服务的提供。
以前系统的CPU使用率从未如此之高,而分析系统的变化,仅仅是执行过一个例行的表分析脚本,使用同样脚本做表分析从没有发生过这样的事情,该脚本被认为是安全的。
根据定期收集的Statspack数据生成报告,发现数据库的具体体现为CPU Time很高,这与主机体现吻合,Latch free紧跟CPU Time列在第二位,这也是一个高CPU消耗的数据库事件(采样周期为15分钟):
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time(s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 4,598 60.61
latch free 183,374 1,905 25.11
log file sync 131,171 345 4.55
db file sequential read 30,745 295 3.88
direct path write 126,462 123 1.62
-------------------------------------------------------------
在随后的Top SQL中发现一条异常的SQL(了解一个稳定系统中常见SQL的性能非常重要),其中Hash Value=2011675572,这条SQL执行一个单表访问,每次执行的逻辑读大约在316次左右:
SQL ordered by Gets forDB:
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- -------------------------- ------ -------- --------- ----------
9,219,425 29,158 316.2 15.9 304.47 425.922011675572
Module: InSiteXMLServer.exe
SELECT MfgLineEntry.MfgLineEntryIdFROM mesw.MfgLineEntry WHERE
MfgLineEntry.MfgLineId = :"SYS_B_0"ORDER BY MfgLineEntry.Priority
从全局来看,这个SQL的逻辑读数量不算太高,但是根据经验以及对于数据库的了解,我们知道这个SQL的执行可能出现了问题。通过V$SQL_PLAN查看这个SQL的执行计划,发现数据库使用了全表扫描来执行这个SQL:
SQL> select operation,options,object_name,optimizer
2 from v$sql_plan wherehash_value='2011675572';
OPERATION OPTIONS OBJECT_NAME OPTIMIZER
----------------- ----------- --------------- ----------
SELECT STATEMENT CHOOSE
SORT ORDERBY
TABLE ACCESS FULL MFGLINEENTRY ANALYZED
格式化一下前面的SQL便于分析:
SELECT mfglineentry.mfglineentryid
FROM mesw.mfglineentry WHEREmfglineentry.mfglineid = :"SYS_B_0"
ORDER BY mfglineentry.priority;
查询引用了MFGLINEID字段,而表MFGLINEENTRY上实际上有一个MFGLINEID和PRIORITY列的组合索引,在查询中,这个索引没有被用到:
SQL> select dbms_metadata.get_ddl('INDEX','MFGLINEENTRY2')from dual;
DBMS_METADATA.GET_DDL('INDEX',
--------------------------------------------------------------------------------
CREATE INDEX "MESW"."MFGLINEENTRY2"ON "MESW"."MFGLINEENTRY" ("MFGLINEID", "PRIORITY")
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGING COMPUTESTATISTICS
STORAGE(INITIAL 1048576 NEXT 1048576 MINEXTENTS1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOLDEFAULT)
TABLESPACE"MESI_8K_1M" PARALLEL 8
这个表记录数在1万行左右,使用极其频繁。与开发人员联系,得知正常情况下这个SQL返回最多100行左右,按理说不应该走全表扫描;使用autotrace查看返回57行时的执行计划及逻辑读如下:
SQL> SELECTmfglineentry.mfglineentryid
2 FROM mfglineentry
3 WHERE mfglineentry.mfglineid = '480013800006c7f2'
4 ORDER BY mfglineentry.priority;
57 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=27 Card=56 Bytes=2128)
1 0 SORT (ORDER BY) (Cost=27Card=56 Bytes=2128)
2 1 TABLE ACCESS (FULL) OF'MFGLINEENTRY' (Cost=21 Card=56 Bytes=2128)
Statistics
----------------------------------------------------------
0 recursivecalls
0 db block gets
315 consistentgets
0 physical reads
0 redo size
1434 bytes sent via SQL*Net to client
298 bytesreceived via SQL*Net from client
5 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
57 rowsprocessed
发现查询在返回57行的时候也是全表扫描,逻辑读315次(这和statspack中的结果一致),为了分析原因,我们首先尝试使用hints强制使用索引MFGLINEENTRY2,看看SQL的执行效率:
SQL> SELECT/*+INDEX(mfglineentry, MFGLINEENTRY2 )*/ mfglineentry.mfglineentryid
2 FROM mfglineentry
3 WHERE mfglineentry.mfglineid = '480013800006c7f2'
4 ORDERBY mfglineentry.priority;
57 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=35 Card=56 Bytes=2128)
1 0 TABLE ACCESS (BY INDEX ROWID)OF 'MFGLINEENTRY' (Cost=35 Card=56 Bytes=2128)
2 1 INDEX (RANGE SCAN) OF'MFGLINEENTRY2' (NON-UNIQUE) (Cost=2 Card=56)
Statistics
----------------------------------------------------------
0 recursivecalls
0 db block gets
51 consistentgets
0 physical reads
0 redo size
1466 bytessent via SQL*Net to client
298 bytesreceived via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts(disk)
57 rowsprocessed
此时的逻辑读只有51次,在效率上比全表扫描要好;那么为什么CBO选择了TABLE ACCESS FULL呢?
查看一下USER_TABLES中记录的表统计数据:
SQL> SELECT BLOCKS,NUM_ROWSFROM USER_TABLES WHERE TABLE_NAME='MFGLINEENTRY';
BLOCKS NUM_ROWS
---------- ----------
312 10649
SQL> SELECT CLUSTERING_FACTORFROM USER_INDEXES WHERE INDEX_NAME='MFGLINEENTRY2';
CLUSTERING_FACTOR
-----------------
5758
从这里可以看出由于MFGLINEENTRY2的CLUSTERING_FACTOR的值很大,导致CBO计算通过索引返表查询的成本过高,因而使用了TABLE ACCESSFULL。
在测试环境中人为修改了CLUSTERING_FACTOR证实了这个想法:
SQL> EXEC DBMS_STATS.set_index_stats(ownname=> 'MESW',indname => 'MFGLINEENTRY2',clstfct => 1000);
PL/SQLprocedure successfully completed
SQL>set autotrace traceonly
SQL>SELECT mfglineentry.mfglineentryid
2 FROM mfglineentry
3 WHERE mfglineentry.mfglineid= '480013800006c7f2'
4 ORDER BY mfglineentry.priority
5 /
57rows selected.
ExecutionPlan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=35 Card=56 Bytes=2128)
1 0 TABLE ACCESS (BY INDEX ROWID) OF 'MFGLINEENTRY'(Cost=35 Card=56 Bytes=2128)
2 1 INDEX (RANGE SCAN) OF 'MFGLINEENTRY2' (NON-UNIQUE)(Cost=2 Card=56)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
51 consistentgets
0 physical reads
0 redo size
1466 bytes sent via SQL*Net to client
298 bytes receivedvia SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
57 rows processed
联系开发人员得知,前一段时间这张表加载了一些比较随机的数据,导致CLUSTERING_FACTOR变大,最近的表分析使统计信息发生了变化,所以执行计划变成了全表扫描。
为安全起见,这个表增加了索引:
SQL> selectdbms_metadata.get_ddl('INDEX','MFGLINEENTRY3') from dual;
DBMS_METADATA.GET_DDL('INDEX',
--------------------------------------------------------------------------------
CREATE INDEX"MESW"."MFGLINEENTRY3" ON"MESW"."MFGLINEENTRY"
("MFGLINEID", "PRIORITY","MFGLINEENTRYID")
PCTFREE 10 INITRANS 2 MAXTRANS 255 NOLOGGINGCOMPUTE STATISTICS
STORAGE(INITIAL 1048576 NEXT 1048576MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1BUFFER_POOL DEFAULT)
TABLESPACE "MESI_8K_1M" PARALLEL 8
使这个SQL的执行计划变为不返表不排序的MFGLINEENTRY3 INDEXRANGE SCAN来减少逻辑读。
SQL> SELECTmfglineentry.mfglineentryid
2 FROM mfglineentry
3 WHEREmfglineentry.mfglineid = '480013800006c7f2'
4 ORDERBY mfglineentry.priority
5 /
57 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=56 Bytes=2128)
1 0 INDEX (RANGE SCAN) OF'MFGLINEENTRY3' (NON-UNIQUE) (Cost=3 Card=56 Bytes=2128)
Statistics
----------------------------------------------------------
0 recursivecalls
0 db block gets
9 consistent gets
0 physical reads
0 redo size
1434 bytessent via SQL*Net to client
298 bytesreceived via SQL*Net from client
5 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
57 rowsprocessed
建立索引后CPU使用率下降有限,继续看statspack 发现。
Instance EfficiencyPercentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.98 Redo NoWait %: 100.00
Buffer Hit %: 99.27 In-memorySort %: 100.00
Library Hit %: 96.42 Soft Parse %: 87.50
Executeto Parse %: 81.83 LatchHit %: 99.46
Parse CPU to Parse Elapsd %: 33.37 % Non-Parse CPU: 89.61
软解析率很低,只有87.50%,说明数据库中存在过多的硬解析。
Latch Sleepbreakdown for DB: MESWP Instance:meswp Snaps: 2512 -2513
-> ordered bymisses desc
Get Spin &
Latch Name Requests Misses Sleeps Sleeps 1->4
-------------------------- -------------- ----------- ----------- ------------
library cache 11,487,159 109,504 24,235 85521/23736/242/5/0
session allocation 9,506,763 99,905 1,459 98447/1457/1/0/0
enqueues 5,947,923 42,175 2,062 40114/2060/1/0/0
shared pool 4,667,481 34,604 892 33716/884/4/0/0
row cache objects 6,959,017 33,322 52 33270/52/0/0/0
row cache enqueue latch 5,801,806 29,754 25 29729/25/0/0/0
查看latch部分的内容,Library cache latch 竞争很严重,同时在SQL ordered by Version Count部分,出现了大量SQL的version count,并且这些SQL都是根据主键的等值查询。
文章来源:《OracleDBA手记1》第17章聚簇因子、柱状图与执行计划 作者:段林仲
配图来源:http://www.itsne.com/image_upload/179693981.jpg
回复“手记1”可下载本书前三篇内容:DBA工作手记/诊断案例篇/SQL调优篇(建议复制链接至电脑端)