近期通过后台的监控,看到某台 ORACLE 9i数据库服务器整体负载有升高趋势,部分时段loadavg峰值到8,平均在4-5之间,调整之后,系统负载下降非常显著,现将调整的过程分享出来,文档可能会有些长度,为了打消阅读的畏难情绪和普遍面对调优无从着手的印象,我想先强调一点:数据库性能调优,有时候简单到超乎你的想像~~~
现状描述:
先来看看调整前后整机负载状况的对比,如图:
收集时间为上午10:00-12:00 之间,分别收集10.27和28号两天数据的对比~~
由上图可以看出,通过特别调整后,系统整体负载下降了数倍,现在平均负载保持在0.x左右,高峰时期也没有超过1!
接下来再通过ORACLE的STATSPACK生成性能报告来对比,报表信息采集时间均为下午14:00---15:00之间:
A>. 数据读写
修改前:
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 26,567.21 1,611.03
Logical reads: 389,897.11 23,643.26
Block changes: 165.96 10.06
Physical reads: 30,068.50 1,823.35
修改后:
Load Profile
~~~~~~~~~~~~ Per Second Per Transaction
--------------- ---------------
Redo size: 32,102.45 2,132.94
Logical reads: 31,283.94 2,078.55
Block changes: 173.87 11.55
Physical reads: 457.09 30.37
=================================
由上述信息可以看出,Redo size/Block changes基本无变化(后者甚至更高),说明涉及数据的修改操作基本相同,数据读则有十倍以上的差距,修改后系统物理读和逻辑读都有近10倍的下降,可以简单理解为,同等应用环境情况下,对IO的负载要求减轻了10倍!
B>. TOP5等待事件
修改前:
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 11,495 92.03
db file sequential read 47,569,569 369 2.95
db file scattered read 16,255,400 355 2.84
log file sync 60,633 101 .81
log file parallel write 61,910 60 .48
修改后:
Top 5 Timed Events
~~~~~~~~~~~~~~~~~~ % Total
Event Waits Time (s) Ela Time
-------------------------------------------- ------------ ----------- --------
CPU time 1,254 87.71
db file sequential read 367,961 55 3.85
log file sync 55,857 35 2.42
LGWR wait on LNS 195,613 17 1.22
LGWR-LNS wait on channel 68,833 17 1.19
=================================
从排名前5的等待事件能够清晰的看到,曾经最严重的磁盘文件读等待现在已经完全可以忽视,并且CPU time时间也有10倍的下降~
C>. Buffer Gets
修改前:
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
450,247,504 1,974 228,088.9 32.1 2445.17 2405.51 1551938908
select count(1) as num from t_wiki_doc_topics where doc_id_encry
pt=¨BBwReRlFZV0RRXlkG¨ and topic_state=1
446,599,884 1,959 227,973.4 31.8 2450.06 2406.25 1250588569
select b.* from (select rownum as r,a.* from (select t.user_id,t
.last_post_user_id,t.doc_title,t.topic_title,t.click_count+1 as
click_count,t.posts_count-1 as posts_count,t.user_nick,to_char(t
.last_post_time,¨yy-mm-dd hh24:mi:ss¨) last_post_time,t.last_pos
t_user_nick,t.last_post_user_id_encrypt,t.user_id_encrypt,t.onto
216,979,036 3,602 60,238.5 15.4 1878.72 1850.94 52310006
select e.*, rownum from T_WIKI_EVENT e where (rownum <= :1) and
NEXT_SEND_TIME < SYSDATE and EVENT_TYPE>=0 order by NEXT_SEND_TIME
98,751,515 14,404 6,855.8 7.4 2176.33 2182.08 3309920380
update T_WIKI_DOC_LOCK set DOC_LOCK_USER_ID=:1,DOC_LOCK_START_TI
ME=to_date(:2,¨yyyy-mm-dd hh24:mi:ss¨),DOC_LOCK_HEARTBEAT_TIME=t
o_date(:3,¨yyyy-mm-dd hh24:mi:ss¨),DOC_LOCK_STATE=:4,DOC_LOCK_US
ER_ID_ENCRYPT=:5,DOC_ID_ENCRYPT=:6,DOC_LOCK_USER_IP=:7,DOC_TITLE
=:8,DOC_USER_ID_EN=:9,DOC_USER_NICK=:10 where DOC_ID=:11
修改后:
CPU Elapsd
Buffer Gets Executions Gets per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
31,281,969 18,319 1,707.6 27.8 86.73 96.33 3049882434
select COUNT(*) from t_wiki_user_msg where ((dest_user_id_encryp
t = :1) AND (msg_state = 1) AND (open_time IS NULL))
.........
...........
13,723,327 2,037 6,737.0 12.2 345.14 344.57 1551938908
select count(1) as num from t_wiki_doc_topics where doc_id_encry
pt=¨BBwReRlFZV0RRXlkG¨ and topic_state=1
12,717,774 1,999 6,362.1 11.3 115.08 115.46 1250588569
select b.* from (select rownum as r,a.* from (select t.user_id,t
.last_post_user_id,t.doc_title,t.topic_title,t.click_count+1 as
click_count,t.posts_count-1 as posts_count,t.user_nick,to_char(t
.last_post_time,¨yy-mm-dd hh24:mi:ss¨) last_post_time,t.last_pos
t_user_nick,t.last_post_user_id_encrypt,t.user_id_encrypt,t.onto
=================================
可以看到,修改前那几条占用CPU/BUFFER最为夸张的几条语句,在新的性能REPORT中都基本可以忽略,新的统计报表中,排名最靠前的buffer gets不过3kw,CPU时间不到1分半钟,与之前的SQL语句完全不在同等重量级,相同SQL语句的统计信息相比修改之前,都有十数倍以上的降低。
D>. 物理读
修改前:
SQL ordered by Reads for DB: ORACLE9I Instance: oracle9i Snaps: 1014 -1015
-> End Disk Reads Threshold: 1000
CPU Elapsd
Physical Reads Executions Reads per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
104,960,046 3,602 29,139.4 96.9 1878.72 1850.94 52310006
select e.*, rownum from T_WIKI_EVENT e where (rownum <= :1) and
NEXT_SEND_TIME < SYSDATE and EVENT_TYPE>=0 order by NEXT_SEND_TIME
修改后:
SQL ordered by Reads for DB: ORACLE9I Instance: oracle9i Snaps: 1090 -1091
-> End Disk Reads Threshold: 1000
CPU Elapsd
Physical Reads Executions Reads per Exec %Total Time (s) Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
4,225 18,319 0.2 0.3 86.73 96.33 3049882434
select COUNT(*) from t_wiki_user_msg where ((dest_user_id_encryp
t = :1) AND (msg_state = 1) AND (open_time IS NULL))
=================================
曾经排名最靠前的产生物理读的SQL语句,在新的报表中已经不再出现,并且新的性能报表中,排名最高的SQL所产生的物理读也只有4,225个(该语句执行次数稍高了一点,不过这个就非DBA所能控制了,需要开发人员及设计人员配合处理才能有可能降低其请求次数)。
E>. 实例整体统计信息
修改前:
Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
CPU used by this session 1,149,502 319.1 19.4
CPU used when call started 1,149,615 319.2 19.4
CR blocks created 10,632 3.0 0.2
Cached Commit SCN referenced 11,403,832 3,166.0 192.0
Commit SCN cached 105 0.0 0.0
DBWR buffers scanned 68,437 19.0 1.2
DBWR summed scan depth 68,437 19.0 1.2
buffer is not pinned count 1,369,310,429 380,152.8 23,052.4
buffer is pinned count 6,491,211,605 1,802,113.2 109,279.7
.......
consistent changes 16,402 4.6 0.3
consistent gets 1,403,720,942 389,706.0 23,631.7
consistent gets - examination 6,368,534 1,768.1 107.2
.........
no work - consistent read gets 1,396,760,295 387,773.5 23,514.5
.........
physical reads 108,306,724 30,068.5 1,823.4
physical reads direct 46,758 13.0 0.8
physical reads direct (lob) 2,161 0.6 0.0
.........
session logical reads 1,404,409,388 389,897.1 23,643.3
.........
table fetch by rowid 3,496,022,454 970,578.1 58,855.6
table fetch continued row 395,372,678 109,764.8 6,656.1
table scan blocks gotten 453,859,894 126,002.2 7,640.7
table scan rows gotten 6,925,949,059 1,922,806.5 116,598.5
修改后:
Statistic Total per Second per Trans
--------------------------------- ------------------ -------------- ------------
CPU used by this session 125,354 34.8 2.3
CPU used when call started 125,227 34.8 2.3
CR blocks created 278 0.1 0.0
Cached Commit SCN referenced 12,475,990 3,464.6 230.2
Commit SCN cached 66 0.0 0.0
DBWR buffers scanned 50 0.0 0.0
DBWR summed scan depth 50 0.0 0.0
buffer is not pinned count 86,322,220 23,971.7 1,592.7
buffer is pinned count 315,042,719 87,487.6 5,812.8
.........
no work - consistent read gets 103,809,445 28,828.0 1,915.4
.........
physical reads 1,645,979 457.1 30.4
physical reads direct 0 0.0 0.0
physical reads direct (lob) 11,441 3.2 0.2
.........
session logical reads 112,653,483 31,283.9 2,078.6
.........
table fetch by rowid 178,037,671 49,441.2 3,285.0
table fetch continued row 640,957 178.0 11.8
table scan blocks gotten 27,871,674 7,740.0 514.3
table scan rows gotten 435,414,252 120,914.8 8,033.8
=================================
在consistent gets,consistent read gets,physical reads,session logical reads,table fetch,table scan 等方面都有数倍到数十倍的下降
F>. 表空间IO
修改前:
Tablespace
------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
GAIBAN_DATA
63,821,302 17,718 0.0 1.7 10,734 3 5,633 1.5
修改后:
Tablespace
------------------------------
Av Av Av Av Buffer Av Buf
Reads Reads/s Rd(ms) Blks/Rd Writes Writes/s Waits Wt(ms)
-------------- ------- ------ ------- ------------ -------- ---------- ------
GAIBAN_DATA
562,450 156 0.1 2.9 17,842 5 6 6.7
=================================
从统计结果看,调整主要涉及表空间的读和等待均有百倍的提升~~~
|