Archive for the category TOI
New organization of the Cluster Stack with Oracle Grid Infrastructure 11gR2
Just a short note to point you to a very instructive posting by my dear colleague Joel Goodman – the start of a new series on Oracle Grid Infrastructure & Real Application Clusters 11gR2 that I am looking forward to with much interest!
Restoring old Optimizer Statistics for troubleshooting purposes
We are delivering an Oracle Database 10g (sic!) New Features course this week in our education center in Düsseldorf. Many customers are still on 10g in production, and some are even just upgrading from 9i to 10g, still in the process of evaluating 11g. Many new features in the 10g version are related to the optimizer. Some are quite well known to the public, like the desupport of the rule-based optimizer and the presence of a scheduler job that gathers optimizer statistics every night out of the box.
Others lead a quieter life; one of them is the possibility to restore old optimizer statistics (e.g. as a troubleshooting measure), should there be issues with the newly gathered ones. Since 10g, we no longer simply overwrite old optimizer statistics by gathering new ones. Instead, the old optimizer statistics are automatically historized and kept for 31 days by default, so that they can be restored easily if desired. Did you know that? Here we go with a short demonstration of restoring historic optimizer statistics:
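As an aside, the retention period of this statistics history can be queried and changed via DBMS_STATS – a short sketch (the function and procedure names below exist since 10g; the value of 60 days is just an example):

```sql
-- Current retention of historic optimizer statistics (default: 31 days)
select dbms_stats.get_stats_history_retention from dual;

-- Oldest point in time to which statistics can still be restored
select dbms_stats.get_stats_history_availability from dual;

-- Keep historic statistics for 60 days instead
exec dbms_stats.alter_stats_history_retention(60)
```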
SQL> select * from v$version;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
PL/SQL Release 10.2.0.1.0 - Production
CORE    10.2.0.1.0      Production
TNS for Linux: Version 10.2.0.1.0 - Production
NLSRTL Version 10.2.0.1.0 - Production

SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam
Connected.

SQL> create table sales as
     select rownum as id,
            mod(rownum,5) as channel_id,
            mod(rownum,1000) as cust_id,
            5000 as amount_sold,
            sysdate as time_id
     from dual connect by level<=3e7;

Table created.

SQL> create index sales_idx on sales(id) nologging;

Index created.

SQL> select segment_name,bytes/1024/1024 as mb from user_segments;

SEGMENT_NAME                     MB
------------------------ ----------
SALES                           942
SALES_IDX                       566
I just created a demo user, a table and an index on that table. Notice that the two segments occupy about 1.5 GB, should you like to reproduce the demo yourself. Right now, there are no optimizer statistics for the table:
SQL> select num_rows,blocks from user_tables;

  NUM_ROWS     BLOCKS
---------- ----------
"NULL values here"
I am now going to gather statistics on the table manually – the same would be done automatically by the standard scheduler job during the night:
SQL> exec dbms_stats.gather_table_stats('ADAM','SALES')
PL/SQL procedure successfully completed.
SQL> select num_rows,blocks from user_tables;
NUM_ROWS BLOCKS
---------- ----------
29933962 119585
SQL> select count(*) from sales;
COUNT(*)
----------
30000000
As we can see, the statistics are quite accurate, reflecting the actual size of the table well. The index is used for the following query, as the runtime already tells us:
SQL> set timing on
SQL> select amount_sold from sales where id=4711;

AMOUNT_SOLD
-----------
       5000

Elapsed: 00:00:00.00
I am now going to introduce a problem with the optimizer statistics artificially, by simply setting them to very inaccurate values. A real-world problem caused by new optimizer statistics is a little harder to come up with – you will probably never encounter one during your career…
SQL> exec dbms_stats.set_table_stats('ADAM','SALES',numrows=>100,numblks=>1)

PL/SQL procedure successfully completed.

SQL> select num_rows,blocks from user_tables;

  NUM_ROWS     BLOCKS
---------- ----------
       100          1
With the above (completely misleading) statistics, the optimizer will think that a full table scan of the sales table is fairly cheap. Please notice that I query id 4712 and not 4711 again, because the already computed execution plan (index range scan) might still be available for reuse in the library cache. I could also flush the shared pool here to make sure that a new execution plan has to be generated for id 4711.
SQL> select amount_sold from sales where id=4712;

AMOUNT_SOLD
-----------
       5000

Elapsed: 00:00:01.91
We can tell by the runtime of almost 2 seconds that this was a full table scan. Proof would be to retrieve the execution plan from the library cache; I leave that to your studies. Please be aware that the autotrace feature might be misleading here. For our purpose, it is enough to say that we have an issue caused by the generation of new optimizer statistics. We want our good old statistics back! Therefore, we look at the historic optimizer statistics:
SQL> alter session set NLS_TIMESTAMP_TZ_FORMAT='yyyy-mm-dd:hh24:mi:ss';
Session altered.
SQL> select table_name,stats_update_time from user_tab_stats_history;
TABLE_NAME STATS_UPDATE_TIME
------------------------------ --------------------------------------
SALES 2010-05-18:09:47:16
SALES 2010-05-18:09:47:38
We see two rows, representing the old statistics of the sales table. The first row is from the time when there were still NULL entries (before the first gather_table_stats). The second row represents the accurate statistics, which I am going to restore:
SQL> begin
dbms_stats.restore_table_stats('ADAM','SALES',
to_timestamp('2010-05-18:09:47:38','yyyy-mm-dd:hh24:mi:ss'));
end;
/
PL/SQL procedure successfully completed.
SQL> select num_rows,blocks from user_tables;
NUM_ROWS BLOCKS
---------- ----------
29933962 119585
The good old statistics are back, but the statements are still in the library cache and may execute with the same misguided execution plan:
SQL> set timing on
SQL> select amount_sold from sales where id=4712;

AMOUNT_SOLD
-----------
       5000

Elapsed: 00:00:01.92
Still full table scan! The following empties the shared pool and forces the computation of new execution plans, using the accurate statistics:
SQL> alter system flush shared_pool;

System altered.

Elapsed: 00:00:00.04

SQL> select amount_sold from sales where id=4712;

AMOUNT_SOLD
-----------
       5000

Elapsed: 00:00:00.01
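To confirm that the fast response now really comes from an index range scan, we could pull the plan that was actually used from the library cache – a sketch with DBMS_XPLAN.DISPLAY_CURSOR, which is available since 10g (run set serveroutput off first, otherwise the last cursor in the session belongs to DBMS_OUTPUT):

```sql
set serveroutput off
select amount_sold from sales where id=4712;

-- With NULL arguments, DISPLAY_CURSOR shows the plan of the
-- last statement executed in this session
select * from table(dbms_xplan.display_cursor(null, null));
```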
We fixed the problem by restoring the old stats. Should the problem persist – because apparently the optimizer statistics did not cause it – we can also restore the newly computed statistics:
SQL> select table_name,stats_update_time from user_tab_stats_history;
TABLE_NAME STATS_UPDATE_TIME
------------------------------ ---------------------------------------------------------------------------
SALES 2010-05-18:09:47:16
SALES 2010-05-18:09:47:38
SALES 2010-05-18:09:52:30
The last row represents the most recently historized statistics.
SQL> begin
dbms_stats.restore_table_stats('ADAM','SALES',
to_timestamp('2010-05-18:09:52:30','yyyy-mm-dd:hh24:mi:ss'));
end;
/
PL/SQL procedure successfully completed.
SQL> select num_rows,blocks from user_tables;
NUM_ROWS BLOCKS
---------- ----------
100 1
Conclusion: If we suspect that new optimizer statistics are responsible for a performance problem, we can restore the "good old statistics" very easily since 10g.
Performance Impact of Logical Standby on the Primary
Implementing High Availability for an Oracle Database may impact the performance of the protected database, depending on the method used. If Data Guard is used, the two protection levels Maximum Availability and Maximum Protection may impact the performance of the Primary, as discussed for example in Smith, Michael T. (2007): Data Guard Redo Transport and Network Best Practices: Oracle Database 10g Release 2, available on OTN.
A special case is the use of a Logical Standby Database, because there is an additional possible performance impact, regardless of the protection level: since the Logical Standby is maintained with SQL Apply, additional redo is generated on the Primary to enable the identification of the changed rows on the Standby. With a Physical Standby, that is not necessary, because a ROWID recorded on the Primary is the same on the Standby. On a Logical Standby Database, however, rows can reside in completely different blocks, so the ROWID from the Primary is meaningless.
Therefore, with a Logical Standby present, at least the Primary Key or a unique column of the rows modified on the Primary Database is additionally recorded in the redo. If there is no Primary Key on the table whose rows were modified, all columns are recorded additionally. So depending on the number of columns and the absence of Primary Keys or unique columns on the production system, a Logical Standby may significantly impact the performance of the Primary Database. Fortunately, we can test that easily before we actually implement a Logical Standby – this posting shows how to do that.
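Before testing, it can be useful to find candidate tables that lack a primary key or non-null unique constraint – the data dictionary view DBA_LOGSTDBY_NOT_UNIQUE exists for exactly that purpose (a sketch; the schema name is just our demo user, and BAD_COLUMN='Y' indicates a column datatype that prevents unambiguous row identification):

```sql
-- Tables whose rows cannot be identified via a primary key
-- or non-null unique constraint
select owner, table_name, bad_column
from   dba_logstdby_not_unique
where  owner = 'ADAM';
```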
First, we prepare a table without Primary/Unique Columns and relatively many columns:
SQL> drop user adam cascade;
SQL> grant dba to adam identified by adam;
SQL> connect adam/adam@prima
SQL> create table sales as
     select rownum as id,
            mod(rownum,5) as channel_id,
            mod(rownum,1000) as cust_id,
            5000 as amount_sold,
            sysdate as time_id,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col1,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col2,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col3,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col4,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col5,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col6,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col7,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col8,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col9,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col10,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col11,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col12,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col13,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col14,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col15,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col16,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col17,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col18,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col19,
            'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA' as col20
     from dual connect by level<=2e5;
That created a table with 200,000 rows, about 176 MB in size. We will now record statistics under normal circumstances, when no Logical Standby is present.
SQL> exec dbms_workload_repository.create_snapshot
update sales set amount_sold=amount_sold-1;
commit;
exec dbms_workload_repository.create_snapshot
@?/rdbms/admin/awrrpt
The update above took about 27 seconds to complete. We now look at two sections of the AWR report:
Elapsed Time was 0.86 minutes, DB Time 0.54 minutes, and LGWR wrote 113 MB during that period. There is nothing special about these numbers; they only become meaningful by comparison. Now we set up supplemental log data, as if a Logical Standby Database were present that would need it. If you are on 11g, the creation of a Logical Standby Database automatically turns on supplemental logging (you can spot this by looking at the alert log of the Primary during creation of the Logical Standby, or by querying v$database).
SQL> alter database add supplemental log data
(primary key, unique index) columns;
exec dbms_workload_repository.create_snapshot
update sales set amount_sold=amount_sold-1;
commit;
exec dbms_workload_repository.create_snapshot
@?/rdbms/admin/awrrpt
alter database drop supplemental log data
(primary key, unique index) columns;
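As mentioned above, whether supplemental logging is currently enabled can be verified in v$database – a quick sketch (these columns exist since 10g):

```sql
-- 'YES'/'NO' flags for minimal, primary key and
-- unique index supplemental logging
select supplemental_log_data_min,
       supplemental_log_data_pk,
       supplemental_log_data_ui
from   v$database;
```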
The very same update as before now took about 45 seconds. If we look at the same AWR sections as before, we can see why:
Elapsed Time increased to 1.18 minutes, DB Time increased to 0.78 minutes, and LGWR had to write 249 MB now – more than twice as much as before! Of course, the impact is artificially high in this example, but you can see how easily this can be tested before a Logical Standby is actually created. Although this demonstration was done with 11g Release 2, it works the same way with earlier versions.




