Performance Monitoring: Active Session History at work
Teaching an Oracle Database 10g Performance Tuning course this week, I introduced the 10g New Feature Active Session History (ASH) to the students. Together with the Automatic Workload Repository (AWR) and the Automatic Database Diagnostic Monitor (ADDM), it was one of the major improvements of the 10g version – way better than STATSPACK was before!
Imagine you are a DBA on a production system and get an emergency call like "The database is dead slow!". You are supposed to spot the cause as soon as possible. This is where ASH kicks in: the wait events of active sessions are sampled every second into the ASH buffer. It is accessed most comfortably with the Enterprise Manager GUI from the Performance page (button ASH Report there), or with little effort from the command line like this:
-----------------------------------------
--
-- Top 10 CPU consumers in last 5 minutes
--
-----------------------------------------
SQL> select * from
       (select session_id, session_serial#, count(*)
          from v$active_session_history
         where session_state = 'ON CPU'
           and sample_time > sysdate - interval '5' minute
         group by session_id, session_serial#
         order by count(*) desc)
     where rownum <= 10;

--------------------------------------------
--
-- Top 10 waiting sessions in last 5 minutes
--
--------------------------------------------
SQL> select * from
       (select session_id, session_serial#, count(*)
          from v$active_session_history
         where session_state = 'WAITING'
           and sample_time > sysdate - interval '5' minute
         group by session_id, session_serial#
         order by count(*) desc)
     where rownum <= 10;
These two queries should spot the most incriminating sessions of the last 5 minutes. But who is behind those sessions, and what SQL were they running?
--------------------
--
-- Who is that SID?
--
--------------------
set lines 200
col username for a10
col osuser for a10
col machine for a10
col program for a10
col resource_consumer_group for a10
col client_info for a10
SQL> select serial#, username, osuser, machine, program,
            resource_consumer_group, client_info
       from v$session
      where sid = &sid;

-------------------------
--
-- What did that SID do?
--
-------------------------
SQL> select distinct sql_id, session_serial#
       from v$active_session_history
      where sample_time > sysdate - interval '5' minute
        and session_id = &sid;

---------------------------------------------
--
-- Retrieve the SQL from the Library Cache:
--
---------------------------------------------
col sql_text for a80
SQL> select sql_text from v$sql where sql_id = '&sqlid';
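If the statement is still in the library cache, we can also look at the execution plan it actually used. A short sketch with DBMS_XPLAN, assuming the sql_id found above is pasted in for &sqlid:

```sql
-- Display the cached execution plan for a given sql_id.
-- Without a child number, all child cursors are shown.
SQL> select plan_table_output
       from table(dbms_xplan.display_cursor('&sqlid'));
```

This saves us from guessing the plan by runtime alone.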
With the technique shown, you may spot the cause of the current performance problem in a very short time. But beware: you need to purchase the Diagnostic Pack in order to be allowed to use AWR, ADDM and ASH 🙂
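As a side note: starting with 11g, the database can be told which packs are licensed via an initialization parameter, which then governs access to the AWR/ASH features. A small sketch (11g onwards; in 10g, pack usage is purely a licensing matter):

```sql
-- Possible values: NONE, DIAGNOSTIC, DIAGNOSTIC+TUNING
SQL> show parameter control_management_pack_access

-- To disable the Diagnostic and Tuning Pack features entirely:
-- SQL> alter system set control_management_pack_access='NONE';
```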
Restoring old Optimizer Statistics for troubleshooting purposes
We deliver an Oracle Database 10g (sic!) New Features course this week in our education center in Düsseldorf. Many customers are still on 10g in production, and some are even just upgrading from 9i to 10g, still in the process of evaluating 11g. Many new features of the 10g version are related to the optimizer. Some are quite well known to the public, like the obsolescence of the rule-based optimizer and the presence of a scheduler job that gathers optimizer statistics every night out of the box.
Others lead a quieter life; one of them is the possibility to restore old optimizer statistics (e.g. as a troubleshooting measure), should there be issues with the newly gathered ones. Since 10g, we do not simply overwrite old optimizer statistics by gathering new ones. Instead, the old statistics are automatically archived and kept for one month by default, so that they can be restored easily if desired. Did you know that? Here we go with a short demonstration of restoring historic optimizer statistics:
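The one-month default retention can be checked and changed with DBMS_STATS. A short sketch:

```sql
-- Current retention of historic statistics, in days (default: 31)
SQL> select dbms_stats.get_stats_history_retention from dual;

-- Oldest point in time to which statistics can still be restored
SQL> select dbms_stats.get_stats_history_availability from dual;

-- Keep historic statistics for 60 days instead
SQL> exec dbms_stats.alter_stats_history_retention(60)
```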
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam
Connected.

SQL> create table sales as
     select rownum as id,
            mod(rownum,5) as channel_id,
            mod(rownum,1000) as cust_id,
            5000 as amount_sold,
            sysdate as time_id
       from dual connect by level<=3e7;

Table created.

SQL> create index sales_idx on sales(id) nologging;

Index created.

SQL> select segment_name, bytes/1024/1024 as mb from user_segments;

SEGMENT_NAME                     MB
------------------------ ----------
SALES                           942
SALES_IDX                       566
I just created a demo user, a table and an index on that table. Notice that the two segments take about 1.5 GB of space, should you like to reproduce the demo yourself. Right now, there are no optimizer statistics for the table:
SQL> select num_rows, blocks from user_tables;

  NUM_ROWS     BLOCKS
---------- ----------
           (NULL values here)
I am now going to gather statistics on the table manually – the same would be done automatically by the standard scheduler job during the night:
SQL> exec dbms_stats.gather_table_stats('ADAM','SALES')
PL/SQL procedure successfully completed.
SQL> select num_rows,blocks from user_tables;
NUM_ROWS BLOCKS
---------- ----------
29933962 119585
SQL> select count(*) from sales;
COUNT(*)
----------
30000000
As we can see, the statistics are quite accurate, reflecting the actual size of the table well. The index is used for the following query, as we can already tell by the runtime:
SQL> set timing on
SQL> select amount_sold from sales where id=4711;

AMOUNT_SOLD
-----------
       5000

Elapsed: 00:00:00.00
I am now going to introduce a problem with the optimizer statistics artificially, by simply setting them to very inaccurate values. A real-world problem caused by new optimizer statistics is a little harder to come up with – probably you will never encounter one during your career…
SQL> exec dbms_stats.set_table_stats('ADAM','SALES',numrows=>100,numblks=>1)

PL/SQL procedure successfully completed.

SQL> select num_rows, blocks from user_tables;

  NUM_ROWS     BLOCKS
---------- ----------
       100          1
With the above (completely misleading) statistics, the optimizer will think that a full table scan of the sales table is fairly cheap. Please notice that I query for id 4712 and not 4711 again, because the already computed execution plan (index range scan) could still be available for reuse in the library cache. I could also flush the shared pool here to make sure that a new execution plan has to be generated for id 4711.
SQL> select amount_sold from sales where id=4712;

AMOUNT_SOLD
-----------
       5000

Elapsed: 00:00:01.91
We can tell by the runtime of almost 2 seconds that this was a full table scan. Proof would be to retrieve the execution plan from the library cache – I leave that to your studies. Please be aware that the autotrace feature might be misleading here. For our scope, it is enough to say that we have an issue caused by the generation of new optimizer statistics. We want our good old statistics back! Therefore, we look at the historic optimizer statistics:
SQL> alter session set NLS_TIMESTAMP_TZ_FORMAT='yyyy-mm-dd:hh24:mi:ss';
Session altered.
SQL> select table_name,stats_update_time from user_tab_stats_history;
TABLE_NAME STATS_UPDATE_TIME
------------------------------ --------------------------------------
SALES 2010-05-18:09:47:16
SALES 2010-05-18:09:47:38
We see two rows, representing the old statistics of the sales table. The first row is from the time when there were still NULL entries (before the first gather_table_stats). The second row represents the accurate statistics. I am going to restore them:
SQL> begin
dbms_stats.restore_table_stats('ADAM','SALES',
to_timestamp('2010-05-18:09:47:38','yyyy-mm-dd:hh24:mi:ss'));
end;
/
PL/SQL procedure successfully completed.
SQL> select num_rows,blocks from user_tables;
NUM_ROWS BLOCKS
---------- ----------
29933962 119585
The good old statistics are back, but the statements are still in the library cache and may execute with the same misguided execution plan:
SQL> set timing on
SQL> select amount_sold from sales where id=4712;

AMOUNT_SOLD
-----------
       5000

Elapsed: 00:00:01.92
Still a full table scan! The following empties the shared pool and forces the computation of new execution plans, using the accurate statistics:
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.04

SQL> select amount_sold from sales where id=4712;

AMOUNT_SOLD
-----------
       5000

Elapsed: 00:00:00.01
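Flushing the entire shared pool is a rather blunt instrument on a busy production system, since all cursors have to be reparsed afterwards. As a side note, a single cursor can be purged instead with DBMS_SHARED_POOL – available from 10.2.0.4 onwards (in 10.2.0.4 only after setting an additional event, from 11g out of the box). A sketch:

```sql
-- Look up address and hash value of the offending cursor
SQL> select address, hash_value from v$sqlarea where sql_id = '&sqlid';

-- Purge just this cursor ('C' = cursor) instead of the whole pool
SQL> exec dbms_shared_pool.purge('&address,&hash_value','C')
```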
We fixed the problem by restoring the old statistics. Should the problem persist – because apparently the optimizer statistics did not cause it – we can also get the newly computed statistics back:
SQL> select table_name,stats_update_time from user_tab_stats_history;
TABLE_NAME STATS_UPDATE_TIME
------------------------------ ---------------------------------------------------------------------------
SALES 2010-05-18:09:47:16
SALES 2010-05-18:09:47:38
SALES 2010-05-18:09:52:30
The last line represents the most recently archived statistics.
SQL> begin
  dbms_stats.restore_table_stats('ADAM','SALES',
  to_timestamp('2010-05-18:09:52:30','yyyy-mm-dd:hh24:mi:ss'));
end;
/

PL/SQL procedure successfully completed.

SQL> select num_rows, blocks from user_tables;

  NUM_ROWS     BLOCKS
---------- ----------
       100          1
Conclusion: if we suspect that new optimizer statistics are responsible for a performance problem, we can restore the "good old statistics" very easily since 10g.
Retrieve SQL and Execution Plan from AWR Snapshots
A handy little script that I have used in the Oracle Database 11g Performance Tuning course this week. It queries the AWR snapshots for SQL statements and sorts them by elapsed time:
SQL> col parsed format a6
col sql_text format a40
set lines 200
set pages 300
select sql_text,
       parsing_schema_name as parsed,
       elapsed_time_delta/1000/1000 as elapsed_sec,
       stat.snap_id,
       to_char(snap.end_interval_time,'dd.mm hh24:mi:ss') as snaptime,
       txt.sql_id
  from dba_hist_sqlstat stat,
       dba_hist_sqltext txt,
       dba_hist_snapshot snap
 where stat.sql_id = txt.sql_id
   and stat.snap_id = snap.snap_id
   and snap.begin_interval_time >= sysdate - 1
   and lower(sql_text) like '%&t%'
   and parsing_schema_name not in ('SYS','SYSMAN','MDSYS','WKSYS')
 order by elapsed_time_delta asc;
sysdate-1 restricts the result to SQL from the last day; it can be modified to retrieve SQL as far back as there are AWR snapshots, of course. &t is a substring of the SQL we may be interested in. With the sql_id, we can also retrieve the execution plan from the snapshots:
SQL> select plan_table_output from table (dbms_xplan.display_awr('&sqlid'));
The amount of SQL that gets collected into a snapshot can be modified with
SQL> begin
dbms_workload_repository.modify_snapshot_settings (
topnsql=>'MAXIMUM');
end;
/
MAXIMUM (uppercase!) takes all SQL from the library cache into the snapshot – alternatively, we can specify a number of top SQL statements to collect. The example is from an 11g database, but it should already work the same with 10g.
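The same procedure also controls how often snapshots are taken and how long they are kept – both specified in minutes. A short sketch:

```sql
SQL> begin
  dbms_workload_repository.modify_snapshot_settings(
    interval  => 30,           -- take a snapshot every 30 minutes
    retention => 14*24*60);    -- keep snapshots for 14 days
end;
/
```

The current settings can be checked in DBA_HIST_WR_CONTROL.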