Beiträge getaggt mit 10g New Features

Performance Monitoring: Active Session History at work

Teaching an Oracle Database 10g Performance Tuning course this week, I introduced the 10g New Feature Active Session History (ASH) to the students. That was one major improvement – together with the Automatic Workload Repository (AWR) and the Automatic Database Diagnostic Monitor (ADDM) – of the 10g version. Way better than STATSPACK was before!

Imagine you are a DBA on a production system and get an emergency call like „The Database is dead slow!“. You are supposed to spot the cause as soon as possible. ASH kicks in here: We sample the Wait-Events of active sessions every second into the ASH-Buffer. It is accessed most comfortable with the Enterprise Manager GUI from the Performance Page (Button ASH Report there). Or with little effort from the command line like this:

-----------------------------------------
--
-- Top 10 CPU consumers in last 5 minutes
--
-----------------------------------------
SQL> select * from
(
select session_id, session_serial#, count(*)
from v$active_session_history
where session_state= 'ON CPU' and
 sample_time > sysdate - interval '5' minute
group by session_id, session_serial#
order by count(*) desc
)
where rownum <= 10;
--------------------------------------------
--
-- Top 10 waiting sessions in last 5 minutes
--
--------------------------------------------
SQL> select * from
(
select session_id, session_serial#,count(*)
from v$active_session_history
where session_state='WAITING'  and
 sample_time >  sysdate - interval '5' minute
group by session_id, session_serial#
order by count(*) desc
)
where rownum <= 10;

These 2 queries should spot the most incriminating sessions of the last 5 minutes. But who is that and what SQL was running?

--------------------
--
-- Who is that SID?
--
--------------------

set lines 200
col username for a10
col osuser for a10
col machine for a10
col program for a10
col resource_consumer_group for a10
col client_info for a10

SQL> select  serial#,
 username,
 osuser,
 machine,
 program,
 resource_consumer_group,
 client_info
from v$session where sid=&sid;

-------------------------
--
-- What did that SID do?
--
-------------------------

SQL> select distinct sql_id, session_serial# from v$active_session_history
where sample_time >  sysdate - interval '5' minute
and session_id=&sid;
----------------------------------------------
--
-- Retrieve the SQL from the Library Cache:
--
----------------------------------------------
col sql_text for a80
SQL> select sql_text from v$sql where sql_id='&sqlid';

You may spot the cause of the current performance problem in very short time with the shown technique. But beware: You need to purchase the Diagnostic Pack in order to be allowed to use AWR, ADDM and ASH 🙂

,

13 Kommentare

Restoring old Optimizer Statistics for troubleshooting purpose

We deliver a Oracle Database 10g (sic!) New Features course this week in our education center in Düsseldorf. Many customers are still on 10g productively and some even just upgrade from 9i to 10g, still being in the process of evaluating 11g. Many new features in the 10g version are related to the optimizer. Some are quite well known to the public, like the obsolence of the rule based optimizer and the presence of a scheduler job that gathers optimizer statistics every night out-of-the-box.

Others lead a more quiet life, one of them is the possibility to restore old optimizer statistics (i.e. as a troubleshooting measure), should there be issues with the newly gathered ones. Since 10g, we do not simply overwrite old optimizer statistics by gathering new. Instead, the old optimizer statistics are automatically historized and kept for one month by default, so that they can be restored easily if that should be desired. Now did you know that? Here we go with a short demonstration of restoring historic optimizer statistics:

SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> grant dba to adam identified by adam;
Grant succeeded.

SQL> connect adam/adam
Connected.

SQL> create table sales as select
rownum as id,
mod(rownum,5) as channel_id,
mod(rownum,1000) as cust_id,
5000 as amount_sold,
sysdate as time_id
from dual connect by level<=3e7;
Table created.

SQL> create index sales_idx on sales(id) nologging;
Index created.

SQL> select segment_name,bytes/1024/1024 as mb from user_segments;

SEGMENT_NAME                     MB
------------------------ ----------
SALES                           942
SALES_IDX                       566

I just created a demo user, a table and an index on that table. Notice that the two segments take about 1.5 Gig space, should you like to reproduce the demo yourself. Right now, there are no optimizer statistics for the table:

SQL> select num_rows,blocks from user_tables;

 NUM_ROWS     BLOCKS
---------- ----------
"NULL values here"

I am now going to gather statistics on the table manually – the same would be done automatically by the standard scheduler job during the night:

SQL> exec dbms_stats.gather_table_stats('ADAM','SALES')

PL/SQL procedure successfully completed.

SQL> select num_rows,blocks from user_tables;

 NUM_ROWS     BLOCKS
---------- ----------
 29933962     119585

SQL> select count(*) from sales;

 COUNT(*)
----------
 30000000

As we can see, the statistics are quite accurate, reflecting well the actual size of the table. The index is used for the following query, as we can tell by runtime already:

SQL> set timing on
SQL> select amount_sold from sales where id=4711;

AMOUNT_SOLD
-----------
 5000

Elapsed: 00:00:00.00

I am now going to introduce a problem with the optimizer statistics artificially by just setting them very inaccurate. A real-world problem that is caused by new optimizer statistics is a little harder to come up with – probably you will never encounter it during your career…

SQL> exec dbms_stats.set_table_stats('ADAM','SALES',numrows=>100,numblks=>1)

PL/SQL procedure successfully completed.

SQL> select num_rows,blocks from user_tables;

 NUM_ROWS     BLOCKS
---------- ----------
 100          1

With the above (completely misleading) statistics, the optimizer will think that a full table scan of the sales table is fairly cheap. Please notice that I ask after id 4712 and not 4711 again, because it could happen that the already computed execution plan (index range scan) is still in the library cache available for reuse. I could also flush the shared pool here to make sure that a new execution plan has to be generated for the id 4711.

SQL> select amount_sold from sales where id=4712;

AMOUNT_SOLD
-----------
 5000

Elapsed: 00:00:01.91

We can tell by the runtime of almost 2 seconds here that this was a full table scan. Proof would be to retrieve the execution plan from the library cache. I leave that to your studies. Please be aware that the autotrace feature might be misleading here. For our scope, it is enough to say that we have an issue caused by generation of new optimizer statistics. We want to get back our good old statistics! Therefore, we look at the historic optimizer statistics:

SQL> alter session set NLS_TIMESTAMP_TZ_FORMAT='yyyy-mm-dd:hh24:mi:ss';

Session altered.

SQL> select table_name,stats_update_time from user_tab_stats_history;

TABLE_NAME                     STATS_UPDATE_TIME
------------------------------ --------------------------------------
SALES                          2010-05-18:09:47:16
SALES                          2010-05-18:09:47:38

We see two rows, representing the old statistics of the sales table. The first was from the time, as there where NULL entries  (before the first gather_table_stats). The second row represents the accurate statistics. I am going to restore them:

SQL> begin
dbms_stats.restore_table_stats('ADAM','SALES',
to_timestamp('2010-05-18:09:47:38','yyyy-mm-dd:hh24:mi:ss'));
end;
/
PL/SQL procedure successfully completed.
SQL> select num_rows,blocks from user_tables;
 NUM_ROWS     BLOCKS
---------- ----------
 29933962     119585

The good old statistics are back, but the statements are still in the library cache and may execute with the same misguided execution plan:

SQL> set timing on
SQL> select amount_sold from sales where id=4712;

AMOUNT_SOLD
-----------
 5000

Elapsed: 00:00:01.92

Still full table scan! The following empties the shared pool and forces the computation of new execution plans, using the accurate statistics:

SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.04
SQL> select amount_sold from sales where id=4712;

AMOUNT_SOLD
-----------
 5000

Elapsed: 00:00:00.01

We fixed the problem by restoring the old stats. Should the problem persist – because apparently, optimizer statistics did not cause it – we can get back the new computed statistics also:

SQL> select table_name,stats_update_time from user_tab_stats_history;

TABLE_NAME                     STATS_UPDATE_TIME
------------------------------ ---------------------------------------------------------------------------
SALES                          2010-05-18:09:47:16
SALES                          2010-05-18:09:47:38
SALES                          2010-05-18:09:52:30

The last line represents the last historized stats.

SQL> begin
dbms_stats.restore_table_stats('ADAM','SALES',
to_timestamp('2010-05-18:09:52:30','yyyy-mm-dd:hh24:mi:ss'));
end;
/  2    3    4    5

PL/SQL procedure successfully completed.

SQL> select num_rows,blocks from user_tables;

 NUM_ROWS     BLOCKS
---------- ----------
 100          1

Conclusion: If we suspect that new Optimizer Statistics are responsible for a performance problem, we can restore the „good old statistics“ very easy since 10g.

,

12 Kommentare

Retrieve SQL and Execution Plan from AWR Snapshots

snap

A handy little script that I have used in the Oracle Database 11g Performance Tuning course this week. It queries the AWR Snapshots for SQL statements and sorts them by elapsed time:

SQL> col parsed format a6
col sql_text format a40
set lines 200
set pages 300

select     
 sql_text,
 parsing_schema_name as parsed,
 elapsed_time_delta/1000/1000 as elapsed_sec,
 stat.snap_id,
 to_char(snap.end_interval_time,'dd.mm hh24:mi:ss') as snaptime,
 txt.sql_id
from     
 dba_hist_sqlstat stat,
 dba_hist_sqltext txt,
 dba_hist_snapshot snap
where     
 stat.sql_id=txt.sql_id and
 stat.snap_id=snap.snap_id and
 snap.begin_interval_time>=sysdate-1 and
 lower(sql_text) like '%&t%' and
 parsing_schema_name not in ('SYS','SYSMAN','MDSYS','WKSYS')
order by elapsed_time_delta asc;

sysdate-1 returns SQL up to yesterday and can be modified to retrieve SQL as long as there are AWR snapshots of course. &t is a substring of the SQL, we may be interested in. With the sql_id, we can also retrieve the execution plan from the snapshots:

SQL> select plan_table_output from table (dbms_xplan.display_awr('&sqlid'));

The amount of SQL that gets collected into a snapshot can be modified with

SQL> begin
dbms_workload_repository.modify_snapshot_settings (
 topnsql=>'MAXIMUM');
end;
/

MAXIMUM (uppercase!) takes all SQL from the Library Cache into the snapshot – alternative, we can specify a number of Top SQL to collect. The example is from an 11g Database, but it should work the same with 10g already.

, ,

8 Kommentare