Posts tagged with 10g New Features

DIFF_TABLE_STATS_IN_HISTORY Example

In 10g, we introduced the automatic retention of old Optimizer Statistics. We might want to compare the Optimizer Statistics presently in use with the old stats. This posting gives an example that shows how to do that. The Online Documentation for the function has no example for that – probably it is considered too trivial 🙂
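
How far back we can go is governed by the stats history retention, which defaults to 31 days. A quick way to check (and, if desired, change) it – a minimal sketch:

SQL> select dbms_stats.get_stats_history_retention as retention_days,
            dbms_stats.get_stats_history_availability as oldest_available
     from dual;

SQL> exec dbms_stats.alter_stats_history_retention(60)  -- example: keep 60 days instead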

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0    Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production
SQL> connect scott/tiger
Connected.

SQL> alter session set nls_timestamp_tz_format='yyyy-mm-dd:hh24:mi:ss';

Session altered.

SQL> alter session set nls_date_format='yyyy-mm-dd:hh24:mi:ss';

Session altered.
SQL> select num_rows,last_analyzed from user_tables where table_name='DEPT';

  NUM_ROWS LAST_ANAL
---------- ---------

There are no Optimizer Statistics yet for the DEPT table of SCOTT. We change that:

SQL> exec dbms_stats.gather_table_stats('SCOTT','DEPT')

PL/SQL procedure successfully completed.
SQL> select num_rows,last_analyzed from user_tables where table_name='DEPT';

  NUM_ROWS LAST_ANALYZED
---------- -------------------
     4 2012-04-23:16:14:38

This is my demo setup. Now we do some DML on the table and gather statistics afterwards:

SQL> insert into dept values (50,'TEST','TEST');

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('SCOTT','DEPT')

PL/SQL procedure successfully completed.

SQL> select num_rows,last_analyzed from user_tables where table_name='DEPT';

  NUM_ROWS LAST_ANALYZED
---------- -------------------
     5 2012-04-23:16:17:42

The above stats are now in use, while the old stats got recorded in the history. They show up in user_tab_stats_history (or dba_tab_stats_history, respectively) with the time they got stored there:

SQL> select table_name,stats_update_time from user_tab_stats_history;

TABLE_NAME               STATS_UPDATE_TIME
------------------------------ ---------------------------------------------------------------------------
DEPT                   2012-04-23:16:14:38
DEPT                   2012-04-23:16:17:42

I got two rows above; the first represents the NULL stats from before the first gather, the second represents the stats from before the second gather. We can now compare the present stats with the old stats in this way:

SQL> select * from table(dbms_stats.diff_table_stats_in_history(
                    ownname => user,
                    tabname => upper('&tabname'),
                    time1 => systimestamp,
                    time2 => to_timestamp('&time2','yyyy-mm-dd:hh24:mi:ss'),
                    pctthreshold => 0));   
Enter value for tabname: dept
old   3:             tabname => upper('&tabname'),
new   3:             tabname => upper('dept'),
Enter value for time2: 2012-04-23:16:17:42
old   5:             time2 => to_timestamp('&time2','yyyy-mm-dd:hh24:mi:ss'),
new   5:             time2 => to_timestamp('2012-04-23:16:17:42','yyyy-mm-dd:hh24:mi:ss'),

STATISTICS DIFFERENCE REPORT FOR:
.................................

TABLE          : DEPT
OWNER          : SCOTT
SOURCE A      : Statistics as of 2012-04-23:16:25:16
SOURCE B      : Statistics as of 2012-04-23:16:17:42
PCTTHRESHOLD  : 0
TABLE / (SUB)PARTITION STATISTICS DIFFERENCE:
.............................................

OBJECTNAME            TYP SRC ROWS       BLOCKS      ROWLEN     SAMPSIZE
...............................................................................

DEPT                   T    A   5           8          18         5
                            B   4           4          20         4

I had to shorten and adjust the above report to make it look nice on WordPress – there is also a section about Column Statistics in it that I left out here.
Conclusion: We can not only restore old Optimizer Statistics easily, we can also compare the old stats with the present stats beforehand.
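
The restore itself is then just one call – a minimal sketch, assuming we want the stats back as they were shortly before the second gather (note the timestamp just before 16:17:42):

SQL> begin
       dbms_stats.restore_table_stats(
         ownname         => 'SCOTT',
         tabname         => 'DEPT',
         as_of_timestamp => to_timestamp('2012-04-23:16:17:00','yyyy-mm-dd:hh24:mi:ss'));
     end;
     /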


Optimizer Stats: Treat some tables differently than AutoTask does

This posting was inspired by an OTN thread. Since 10g, we have an Automatic Maintenance Task (AutoTask) that gathers Optimizer Statistics during the night (a quick way to verify that it is enabled follows after the list below). This task uses DBMS_STATS.GATHER_DATABASE_STATS with default values. For whatever reason, we may consider these defaults inappropriate for some tables. We then have two options:

  1. Write our own procedure/job to gather statistics differently (the 10g way)
  2. Change the Statistics Preferences for these tables (the 11g way)
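
A quick sketch of that check for 11g, querying dba_autotask_client (on 10g, look for the GATHER_STATS_JOB in dba_scheduler_jobs instead):

SQL> select client_name, status
     from dba_autotask_client
     where client_name = 'auto optimizer stats collection';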

We look at the 11g New Feature first, because it is the more efficient way to go:

SQL> create table adam.a as
 select
 rownum as id,
 rownum + 1 as flag,
 'Oracle Enterprise Edition' as product,
 mod(rownum,5) as channel_id,
 mod(rownum,1000) as cust_id ,
 5000 as amount_sold,
 trunc(sysdate - 10000 + mod(rownum,10000)) as order_date,
 trunc(sysdate - 9999 + mod(rownum,10000)) as ship_date
 from dual connect by level<=1e6;

Table created.


SQL> create table adam.b as select * from adam.a;

Table created.

SQL> select table_name,num_rows,sample_size,to_char(last_analyzed,'hh24:mi:ss') from user_tables;

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE TO_CHAR(
------------------------------ ---------- ----------- --------
A
B
SQL> exec dbms_stats.gather_schema_stats('ADAM')

PL/SQL procedure successfully completed.

SQL> select table_name,num_rows,sample_size,to_char(last_analyzed,'hh24:mi:ss') from user_tables;

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE TO_CHAR(
------------------------------ ---------- ----------- --------
A                                 1000000     1000000 10:03:11
B                                 1000000     1000000 10:03:12

You may notice that DBMS_STATS analyzed all the blocks/rows by default – that is because the tables are rather small in size. I want to change that for table b:

SQL> exec dbms_stats.set_table_prefs('ADAM','B','ESTIMATE_PERCENT','50')

PL/SQL procedure successfully completed.
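
To verify that the preference took effect, we can ask DBMS_STATS.GET_PREFS – a quick check:

SQL> select dbms_stats.get_prefs('ESTIMATE_PERCENT','ADAM','B') as estimate_percent from dual;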

Next, I will delete more than 10 percent of both tables, so that DBMS_STATS will consider both as stale and collect new stats for them. Then I call GATHER_SCHEMA_STATS again as before, but B is treated differently:

SQL> delete from a where rownum<200000;

199999 rows deleted.

SQL> delete from b where rownum<200000;

199999 rows deleted.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_schema_stats('ADAM')

PL/SQL procedure successfully completed.

SQL> select table_name,num_rows,sample_size,to_char(last_analyzed,'hh24:mi:ss') from user_tables;

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE TO_CHAR(
------------------------------ ---------- ----------- --------
A                                  800001      800001 10:10:06
B                                  801196      400598 10:10:09

Notice the different sample size of B. Now to the old (10g) way to achieve the same. If we just let the AutoTask collect all the stats (including those of our "different" tables) and collect stats again on those tables afterwards, we would do double the work for them. Therefore we lock the stats (a 10g New Feature). I drop & recreate the tables a & b the same as before. Then:

SQL> exec dbms_stats.lock_table_stats('ADAM','B')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_schema_stats('ADAM')

PL/SQL procedure successfully completed.
SQL> create procedure mygather
as
begin
dbms_stats.unlock_table_stats('ADAM','B');
dbms_stats.gather_table_stats('ADAM','B',estimate_percent=>50);
dbms_stats.lock_table_stats('ADAM','B');
end;
/

Procedure created.

SQL> exec mygather

PL/SQL procedure successfully completed.

SQL> select table_name,num_rows,sample_size,to_char(last_analyzed,'hh24:mi:ss') from user_tables;

TABLE_NAME                       NUM_ROWS SAMPLE_SIZE TO_CHAR(
------------------------------ ---------- ----------- --------
A                                 1000000     1000000 10:18:27
B                                  999662      499831 10:18:54
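
In practice, MYGATHER would then run on its own schedule, e.g. nightly after the maintenance window – a minimal sketch with DBMS_SCHEDULER; the job name and the 22:00 run time are just my assumptions:

SQL> begin
       dbms_scheduler.create_job(
         job_name        => 'MYGATHER_JOB',           -- hypothetical name
         job_type        => 'STORED_PROCEDURE',
         job_action      => 'MYGATHER',
         repeat_interval => 'FREQ=DAILY;BYHOUR=22',   -- assumed schedule
         enabled         => true);
     end;
     /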

Conclusion: If we are aware of the (New) Features of each Oracle Database version, we are able to do our work more efficiently. One aspect of this general rule is the handling of Optimizer Statistics, which we can collect differently from the AutoTask's default for some tables if we are inclined to do so.


Reorganizing Tables in Oracle – is it worth the effort?


This topic seems to be some kind of "evergreen", since it comes up regularly in my courses and also in the OTN Discussion Forum. I therefore decided to cover it briefly here, in order to be able to point to this post in the future.

Short answer: Probably not

If the intention of the reorganization is to regain space or to "defragment" the table, the operation is very likely just a waste of effort and resources. I will try to illustrate my point with a simple demonstration that anybody with access to an Oracle Database can easily reproduce. My demo database is 11gR2, but the same can be done with 10g as well. If you are on an even older version, SHRINK SPACE is not available, so you would have to use MOVE instead. I now prepare a little demo table with 1 million rows – the time_id population was the hardest part of that demo for me 🙂

SQL> create table sales as
select
'Oracle Enterprise Edition' as product,
mod(rownum,5) as channel_id,
mod(rownum,1000) as cust_id ,
5000 as amount_sold,
to_date
('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2010' ,'dd.mm.yyyy')
as time_id
from dual connect by level<=1e6;

Table created.

SQL> select time_id ,count(*)
from sales group by time_id
order by 1;

TIME_ID     COUNT(*)
--------- ----------
01-JAN-10      83333
01-FEB-10      83334
01-MAR-10      83334
01-APR-10      83334
01-MAY-10      83334
01-JUN-10      83333
01-JUL-10      83333
01-AUG-10      83333
01-SEP-10      83333
01-OCT-10      83333
01-NOV-10      83333
01-DEC-10      83333

12 rows selected.
SQL> select segment_name,bytes/1024/1024 as mb from user_segments;

SEGMENT_NAME                 MB
-------------------- ----------
SALES                        54

The table contains about 83000 rows per month. Now I will delete the first quarter of rows:

SQL> delete from sales where time_id<to_date('01.04.2010','dd.mm.yyyy');

250001 rows deleted.

SQL> commit;

Commit complete.
SQL> select segment_name,bytes/1024/1024 as mb from user_segments;

SEGMENT_NAME                 MB
-------------------- ----------
SALES                        54

This is the starting point of a possible reorganization: although 250k rows got deleted, the table consumes the same space as before. In other words: the High Water Mark did not move. A reorganization would move the High Water Mark down and regain the space that was consumed by the 250k rows, as shown in the picture below:

[Picture: a table before and after the reorganization]
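
You can even watch the High Water Mark yourself: DBMS_SPACE.UNUSED_SPACE reports the never-used blocks above it. A minimal sketch, assuming you are connected as the owner of SALES:

SQL> set serveroutput on
SQL> declare
       v_total_blocks  number;
       v_total_bytes   number;
       v_unused_blocks number;
       v_unused_bytes  number;
       v_file_id       number;
       v_block_id      number;
       v_last_block    number;
     begin
       dbms_space.unused_space(
         segment_owner             => user,
         segment_name              => 'SALES',
         segment_type              => 'TABLE',
         total_blocks              => v_total_blocks,
         total_bytes               => v_total_bytes,
         unused_blocks             => v_unused_blocks,
         unused_bytes              => v_unused_bytes,
         last_used_extent_file_id  => v_file_id,
         last_used_extent_block_id => v_block_id,
         last_used_block           => v_last_block);
       -- the blocks below the HWM stay the same after the delete
       dbms_output.put_line('Blocks below HWM: '||(v_total_blocks - v_unused_blocks));
     end;
     /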

The question is: is that necessary? If inserts take place again after the deletion, the space gets reused without any need to reorganize:

SQL> insert into sales
select
'Oracle Enterprise Edition' as product,
mod(rownum,5) as channel_id,
mod(rownum,1000) as cust_id ,
5000 as amount_sold,
to_date
('01.' || lpad(to_char(mod(rownum,3)+1),2,'0') || '.2011' ,'dd.mm.yyyy')
as time_id
from dual connect by level<=2.5e5;

250000 rows created.

SQL> commit;

Commit complete.

SQL> select time_id ,count(*)
from sales group by time_id
order by 1;  

TIME_ID     COUNT(*)
--------- ----------
01-APR-10      83334
01-MAY-10      83334
01-JUN-10      83333
01-JUL-10      83333
01-AUG-10      83333
01-SEP-10      83333
01-OCT-10      83333
01-NOV-10      83333
01-DEC-10      83333
01-JAN-11      83333
01-FEB-11      83334
01-MAR-11      83333

12 rows selected.

I inserted a new quarter of rows. The table remains the same size as before:

SQL> select segment_name,bytes/1024/1024 as mb from user_segments;

SEGMENT_NAME                 MB
-------------------- ----------
SALES                        54

That is exactly the point I would like to emphasize: if you have inserts following the deletes, a reorganization of the table is not necessary! Only if that is not the case (the table no longer gets inserts after the deletion) may you want to reorganize it:

SQL> delete from sales where time_id<to_date('01.07.2010','dd.mm.yyyy');

250001 rows deleted.

SQL> commit;

Commit complete.

SQL> alter table sales enable row movement;

Table altered.
SQL> alter table sales shrink space;

Table altered.
SQL> select segment_name,bytes/1024/1024 as mb from user_segments;

SEGMENT_NAME                 MB
-------------------- ----------
SALES                     40.25

The space consumed by the table is now reduced and usable for other segments. Although the table is not locked during the SHRINK SPACE operation and users can do DML on it as usual, the operation is not "for free" in terms of resource consumption: it does a lot of I/O, creates many before images that consume space in the UNDO tablespace, and modifies many blocks, so a lot of redo (and therefore, probably, many archived logs) gets generated.
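
To spread that cost, the shrink can be split into two phases: SHRINK SPACE COMPACT relocates the rows but leaves the High Water Mark in place, so the final SHRINK SPACE afterwards is only a short operation. A sketch (on releases before 10g, ALTER TABLE ... MOVE plus index rebuilds would be the rough equivalent, at the price of locking the table):

SQL> alter table sales shrink space compact;  -- phase 1: relocate the rows, HWM stays put
SQL> alter table sales shrink space;          -- phase 2: short step that finally moves the HWM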

If you really think that you need that kind of reorganization regularly, you should probably evaluate the Partitioning Option here:

SQL> create table sales_part
(product char(25), channel_id number,
 cust_id number, amount_sold number, time_id date)  
partition by range (time_id)  
interval (numtoyminterval(3,'month'))
(partition p1 values less than (to_date('01.04.2010','dd.mm.yyyy')))
;

The above command creates a table that is partitioned by quarter. We could simply drop a partition instead of deleting lots of rows, and we would never need to reorganize to regain space here. If you are on 10g, INTERVAL PARTITIONING is not available; you can use RANGE PARTITIONING instead, with the additional effort of creating the necessary range partitions manually.
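
Getting rid of a quarter then looks like this – a minimal sketch, assuming the rows to be removed live in the first range partition P1:

SQL> alter table sales_part drop partition p1;  -- add UPDATE GLOBAL INDEXES if global indexes exist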

Conclusion: Before you decide to reorganize a table, make sure that it is really necessary, because it likely isn't 🙂

Watch me explain the above on YouTube:
