Posts Tagged partitioning

Materialized Views & Partition Change Tracking

During the 11g Data Warehouse Administration course that I delivered this week in Munich, I have shown this demonstration about Partition Change Tracking (PCT) that I’d like to share with the Oracle Community. You may have a look here for an Introduction into Materialized Views if that topic is new for you. Often, the Base Tables of Materialized Views are partitioned, which gives use additional options for the Refresh as well as for the Rewrite:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam
Connected.
SQL> create table sales
 (product     varchar2(50),
  channel_id  number,
  cust_id     number,
  amount_sold number(10,2),
  time_id     date)
partition by list (channel_id)
(partition c0 values (0),
 partition c1 values (1),
 partition c2 values (2),
 partition c3 values (3),
 partition c4 values (4)
);
Table created.
SQL> alter table sales nologging;

Table altered.
SQL> insert /*+ append */ into sales
select
'Oracle Enterprise Edition' as product,
mod(rownum,5) as channel_id,
mod(rownum,1000) as cust_id ,
5000 as amount_sold,
to_date
('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2010' ,'dd.mm.yyyy')
as time_id
from dual connect by level<=1e6; 
1000000 rows created.

SQL> commit;

Commit complete.

The above was my demo setup, done with 11.2.0.1 Most of the shown should work with lower versions also, though. We will now see a Materialized View that contains the Partition Key of the Base Table, which makes it easily possible for the system to associate the rows of the Materialized View with the Partitions. Otherwise, we would need to explain that with a Partition Marker.

SQL> create materialized view mv_pct
enable query rewrite as
select channel_id,sum(amount_sold) from sales group by channel_id;  

Materialized view created.

The MV is now fresh and may get used for Query Rewrite:

SQL> select mview_name,last_refresh_type,staleness from user_mviews;

MVIEW_NAME                     LAST_REF STALENESS
------------------------------ -------- -------------------
MV_PCT                         COMPLETE FRESH

SQL> set autotrace on explain
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
         0       1000000000
         1       1000000000
         2       1000000000
         3       1000000000
         4       1000000000

Execution Plan
----------------------------------------------------------
Plan hash value: 470332451

---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |     5 |   130 |     3   (0)| 00:00:01 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| MV_PCT |     5 |   130 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

Now we modify one row:

SQL> set autotrace off
SQL> update sales set amount_sold=1 where rownum<2;

1 row updated.

SQL> commit;

Commit complete.:

This makes the MV stale right? Well actually not all of it:

SQL> select mview_name,last_refresh_type,staleness from user_mviews;

MVIEW_NAME                     LAST_REF STALENESS
------------------------------ -------- -------------------
MV_PCT                         COMPLETE NEEDS_COMPILE

SQL> select detail_partition_name, freshness from user_mview_detail_partition;

DETAIL_PARTITION_NAME          FRESH
------------------------------ -----
C4                             FRESH
C3                             FRESH
C2                             FRESH
C1                             FRESH
C0                             STALE

The second Dictionary View is new in 11g, not the behavior of the next query. Only Partition C0 is stale, which is why queries that do not need channel_id 0 can still use the MV:

SQL> set autotrace on explain
SQL> select sum(amount_sold) from sales where channel_id=1;

SUM(AMOUNT_SOLD)
----------------
      1000000000


Execution Plan
----------------------------------------------------------
Plan hash value: 1805549181

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |     1 |    26 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |        |     1 |    26 |            |          |
|*  2 |   MAT_VIEW REWRITE ACCESS FULL| MV_PCT |     1 |    26 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MV_PCT"."CHANNEL_ID"=1)

Note
-----
   - dynamic sampling used for this statement (level=2)

The Rewrite is still possible although part of the MV is stale. We call that PCT Rewrite. Furthermore, we can now do a PCT Fast Refresh, although there is no MV Log on the Base Table sales. The PCT Refresh does not need to scan the whole Base Table but only the stale partitions, which is in this case about 5 times faster than a Complete Refresh, because we have 5 equally sized partitions. The value ‘P‘ is new in 11g, but it also works with ‘F’ or ‘?’ in earlier versions.

SQL> set autotrace off
SQL> exec dbms_mview.refresh('MV_PCT','P')

PL/SQL procedure successfully completed.

SQL> select mview_name,last_refresh_type,staleness from user_mviews;

MVIEW_NAME                     LAST_REF STALENESS
------------------------------ -------- -------------------
MV_PCT                         FAST_PCT FRESH

Conclusion: Partitioning delivers some extra benefits when used together with Materialized Views. Especially, Rewrite & Refresh can be done on the Partition layer then. As always: “Don’t believe it, test it!” :-)

, , ,

Leave a Comment

CELL_PARTITION_LARGE_EXTENTS now obsolete

During the Exadata course that I am just delivering in Munich, I noticed that the fairly new parameter CELL_PARTITION_LARGE_EXTENTS is already obsolete now:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL> alter system set cell_partition_large_extents=true;
alter system set cell_partition_large_extents=true
*
ERROR at line 1:
ORA-25138: CELL_PARTITION_LARGE_EXTENTS initialization parameter has been made
obsolete

This parameter was introduced in 11.2.0.1 especially for Exadata Database Machine because the Allocation Unit Size (AU_SIZE) for Diskgroups built upon Exadata Cells is recommended with 4 MB. Large Segments should therefore use a multiple of 4 MB already for their initial extents. Although the parameter was made obsolete, the effect that was achievable with it is still present:

SQL> create table t (n number) partition by range (n) (partition p1 values less than (2));

Table created.

SQL> insert into t values (1);

1 row created.

SQL> select bytes/1024/1024 as mb from user_extents where segment_name='T';

 MB
----------
 8

I inserted before checking USER_EXTENTS because of the 11g New Feature deferred segment creation:

SQL> drop table t purge;

Table dropped.

SQL> create table t (n number);

Table created.

SQL> select bytes/1024/1024 as mb from user_extents where segment_name='T';

no rows selected

SQL> insert into t values (1);

1 row created.

SQL> select bytes/1024/1024 as mb from user_extents where segment_name='T';

 MB
----------
 .0625

Notice that only partitioned tables are affected by the 8 MB initial extent behavior. The new hidden parameter _PARTITION_LARGE_EXTENTS (defaults to true!) is now responsible for that:

SQL> alter session set "_partition_large_extents"=false;

Session altered.

SQL> drop table t purge;

Table dropped.

SQL> create table t (n number) partition by range (n) (partition p1 values less than (2));

Table created.

SQL> insert into t values (1);

1 row created.

SQL> select bytes/1024/1024 as mb from user_extents where segment_name='T';

 MB
----------
 .0625

Notice that the setting of CELL_PARTITION_LARGE_EXTENTS with alter session is silently overridden by the underscore parameter:

SQL> drop table t purge;

Table dropped.

SQL> alter session set cell_partition_large_extents=true;

Session altered.

SQL> create table t (n number) partition by range (n) (partition p1 values less than (2));

Table created.

SQL> insert into t values (1);

1 row created.

SQL> select bytes/1024/1024 as mb from user_extents where segment_name='T';

 MB
----------
 .0625

The parameter setting of the underscore parameter was still false.

SQL> drop table t purge;

Table dropped.

SQL> alter session set "_partition_large_extents"=true;

Session altered.

SQL> alter session set cell_partition_large_extents=false;

Session altered.

SQL> create table t (n number) partition by range (n) (partition p1 values less than (2));

Table created.

SQL> insert into t values (1);

1 row created.

SQL> select bytes/1024/1024 as mb from user_extents where segment_name='T';

 MB
----------
 8

Conclusion: With 11.2.0.2, partitioned tables get initial extents of 8 MB in size, which is particular useful in Exadata Environments where the ASM AU_SIZE will be 4 MB. But also ordinary Databases are affected – which is probably a good thing if we assume that partitioned tables will be large in size anyway and will therefore benefit from a large initial extent size as well.

, ,

4 Comments

Exadata Part IV: Flash Cache

With this posting we will look at the Flash Cache, built into the Exadata Database Machine. This feature is one major reason why Exadata is not only beneficial for Data Warehouse but also for OLTP. In a Full Rack, we have 8 Database Servers and 14 Storage Servers. Each Storage Server (brief: Cell) contains 4 PCIe Flash Cards:

This Flash Card delivers 96 GB Flash Storage, devided into 4 Flash Drives. This summarizes to 5 TB Flash Storage for a Full Rack – many Databases will probably fit completely into it. In an OLTP Database, the number of IOs per second deliverable is one of the most critical factors. With the above configuration, we can deliver up to 1 Million IOs per second.

The default way to deal with the Flash Storage is to use it completely as Flash Cache. You may think of Flash Cache as a prolongation of the Database Buffer Cache. It is populated automatically by the system with objects deemed useful to cache them. Without any intervention it is used that way:

CellCLI> list flashcache detail
 name:                   exa5cel02_FLASHCACHE
 cellDisk:                [... list of all 16 flashdisks]
 creationTime:           2011-02-02T01:14:04-08:00
 degradedCelldisks:
 effectiveCacheSize:     365.25G
 id:                     880d826b-47cc-4cf5-95fc-c36d6d315ba8
 size:                   365.25G
 status:                 normal

That was from one of the cells as the celladmin user connected using the command line interface. The whole Flash Storage is in use for Flash Cache on that cell. On the Database Layer, we may see the effect like this:

SQL> select name,value  from v$sysstat where name in
('physical read total IO requests','cell flash cache read hits');  

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total IO requests                                    51572139
cell flash cache read hits                                         32344851

That is already a good ratio: The majority of IO requests is resolved from the Flash Cache. We can specify storage attributes on the segment layer to influence the caching behavior of that segment:

SQL> select segment_name,cell_flash_cache from user_segments;
SEGMENT_NA CELL_FL
---------- -------
SALES      KEEP

We have 3 possible values here: DEFAULT (the default), KEEP and NONE. Keep means that the sales table will be stored in the Flash Cache “more aggressively” than the default. It will in other words increase the chance to read it from there. The table is the same as in the previous posting. Because of this setting and previous selects on the table that populated the Flash Cache with it, I am now able to read it from there. I reconnect to initialize v$mystat:

SQL> connect adam/adam
Connected.
SQL> set timing on
SQL>  select count(*) from sales;

 COUNT(*)
----------
 20000000

Elapsed: 00:00:00.50
SQL> select name,value  from v$mystat natural join v$statname where name in
('physical read total IO requests','cell flash cache read hits');  

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
physical read total IO requests                                       10265
cell flash cache read hits                                            10265

The second possibility to deal with the Flash Storage is to take a part of it for building ASM diskgroups upon. All files on these ASM diskgroups will then reside permanently on Flash Storage:

CellCLI> drop flashcache
Flash cache exa5cel01_FLASHCACHE successfully dropped

CellCLI> create flashcache all size=100g
Flash cache exa5cel01_FLASHCACHE successfully created

CellCLI> create griddisk all flashdisk prefix=flashdrive
GridDisk flashdrive_FD_00_exa5cel01 successfully created
GridDisk flashdrive_FD_01_exa5cel01 successfully created
GridDisk flashdrive_FD_02_exa5cel01 successfully created
GridDisk flashdrive_FD_03_exa5cel01 successfully created
GridDisk flashdrive_FD_04_exa5cel01 successfully created
GridDisk flashdrive_FD_05_exa5cel01 successfully created
GridDisk flashdrive_FD_06_exa5cel01 successfully created
GridDisk flashdrive_FD_07_exa5cel01 successfully created
GridDisk flashdrive_FD_08_exa5cel01 successfully created
GridDisk flashdrive_FD_09_exa5cel01 successfully created
GridDisk flashdrive_FD_10_exa5cel01 successfully created
GridDisk flashdrive_FD_11_exa5cel01 successfully created
GridDisk flashdrive_FD_12_exa5cel01 successfully created
GridDisk flashdrive_FD_13_exa5cel01 successfully created
GridDisk flashdrive_FD_14_exa5cel01 successfully created
GridDisk flashdrive_FD_15_exa5cel01 successfully created

The Flash Cache for this cell is now reduced to 100 GB; all means “upon all 16 Flash Drives” here. I am doing the same on the second cell – my Database Machine is limited to only one Server Node and two Cells. That gives me 32 Grid Disks based on Flash Drives to create ASM diskgroups upon:

CellCLI> drop flashcache
Flash cache exa5cel02_FLASHCACHE successfully dropped

CellCLI> create flashcache all size=100g
Flash cache exa5cel02_FLASHCACHE successfully created

CellCLI> create griddisk all flashdisk prefix=flashdrive
GridDisk flashdrive_FD_00_exa5cel02 successfully created
GridDisk flashdrive_FD_01_exa5cel02 successfully created
GridDisk flashdrive_FD_02_exa5cel02 successfully created
GridDisk flashdrive_FD_03_exa5cel02 successfully created
GridDisk flashdrive_FD_04_exa5cel02 successfully created
GridDisk flashdrive_FD_05_exa5cel02 successfully created
GridDisk flashdrive_FD_06_exa5cel02 successfully created
GridDisk flashdrive_FD_07_exa5cel02 successfully created
GridDisk flashdrive_FD_08_exa5cel02 successfully created
GridDisk flashdrive_FD_09_exa5cel02 successfully created
GridDisk flashdrive_FD_10_exa5cel02 successfully created
GridDisk flashdrive_FD_11_exa5cel02 successfully created
GridDisk flashdrive_FD_12_exa5cel02 successfully created
GridDisk flashdrive_FD_13_exa5cel02 successfully created
GridDisk flashdrive_FD_14_exa5cel02 successfully created
GridDisk flashdrive_FD_15_exa5cel02 successfully created

CellCLI> list griddisk flashdrive_FD_10_exa5cel02 detail
 name:                   flashdrive_FD_10_exa5cel02
 availableTo:
 cellDisk:               FD_10_exa5cel02
 comment:
 creationTime:           2011-02-02T02:56:52-08:00
 diskType:               FlashDisk
 errorCount:             0
 id:                     0000012d-e604-87f2-0000-000000000000
 offset:                 6.28125G
 size:                   16.578125G
 status:                 active

Changing to the Database Server Node to create the ASM diskgroup as sysasm:

SQL> select path,header_status, os_mb,free_mb from v$asm_disk where path like '%flash%'

PATH                                               HEADER_STATU      OS_MB    FREE_MB
-------------------------------------------------- ------------ ---------- ----------
o/192.168.14.10/flashdrive_FD_14_exa5cel02         CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_12_exa5cel01          CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_05_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_11_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_08_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_15_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_00_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_03_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_06_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_12_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_09_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_01_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_04_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_13_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_07_exa5cel02         CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_10_exa5cel02         CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_07_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_04_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_10_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_01_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_13_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_08_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_05_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_02_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_14_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_11_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_00_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_09_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_03_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_15_exa5cel01          CANDIDATE         16976          0
o/192.168.14.9/flashdrive_FD_06_exa5cel01          CANDIDATE         16976          0
o/192.168.14.10/flashdrive_FD_02_exa5cel02         CANDIDATE         16976          0

32 rows selected.
SQL> create diskgroup flashdrive normal redundancy
disk 'o/*/flashdrive*'
attribute 'compatible.rdbms'='11.2.0.0.0',
'compatible.asm'='11.2.0.0.0',
'cell.smart_scan_capable'='TRUE',
'au_size'='4M';  

Diskgroup created.

Please notice the Allocation Unit size of 4MB, necessary for Exadata. Normal Redundancy is strongly recommended – automatically each cell is also a Failure Group. In other words: Should one Storage Server crash, no loss of Data will happen, regardless whether we use Flash Drives or spinning drives to build the diskgroups upon. After the creation, we use the diskgroup like any other:

SQL>  create tablespace veryfast datafile '+flashdrive' size 10g;

Tablespace created.

Any segment created in this tablespace will reside on Flash Drives permanently.

Let’s take the opportunity to give an example for Information Lifecycle Management (ILM):

SQL> create tablespace compahigh datafile size 1g;
Tablespace created.
SQL> alter tablespace compahigh default compress for archive high;
Tablespace altered.

The tablespace got created on spinning drives, because my DB_CREATE_FILE_DEST parameter points to such a diskgroup. Same for the next two:

SQL> create tablespace querylow datafile size 1g;
Tablespace created.
SQL> alter tablespace querylow default compress for query low;
Tablespace altered.
SQL> create tablespace ordinary datafile size 1g;
Tablespace created.

My plan is to store one large partitioned table partly compressed, partly uncompressed on spinning drives and partly on Flash Drives – the newest and most volatile part.

SQL> create table sales_part
(id number, flag number, product char(25),channel_id number,cust_id number,
amount_sold number, order_date date, ship_date date)
partition by range (order_date)
interval (numtoyminterval(1,'year'))
store in (veryfast)
(
partition archhigh values less than (to_date('01.01.1991','dd.mm.yyyy')) tablespace compahigh,
partition querylow values less than (to_date('01.01.1998','dd.mm.yyyy')) tablespace querylow,
partition ordi1998 values less than (to_date('01.01.1999','dd.mm.yyyy')) tablespace ordinary,
partition ordi1999 values less than (to_date('01.01.2000','dd.mm.yyyy')) tablespace ordinary,
partition ordi2000 values less than (to_date('01.01.2001','dd.mm.yyyy')) tablespace ordinary,
partition ordi2001 values less than (to_date('01.01.2002','dd.mm.yyyy')) tablespace ordinary,
partition ordi2002 values less than (to_date('01.01.2003','dd.mm.yyyy')) tablespace ordinary,
partition ordi2003 values less than (to_date('01.01.2004','dd.mm.yyyy')) tablespace ordinary,
partition ordi2004 values less than (to_date('01.01.2005','dd.mm.yyyy')) tablespace ordinary,
partition ordi2005 values less than (to_date('01.01.2006','dd.mm.yyyy')) tablespace ordinary
)
;  

Table created.

This uses the 11g New Feature Interval Partitioning to create new partitions automatically on Flash Drives. Now loading the table from the old sales table:

SQL> alter table sales_part nologging;

Table altered.

SQL> insert /*+ append */ into sales_part select * from sales order by order_date;

20000000 rows created.

Notice the order by above. It makes it later on possible not only to do partition pruning but also to use Storage Indexes if we query after ORDER_DATE or even SHIP_DATE. The single partitions now look like this:

SQL> select partition_name,tablespace_name,bytes/1024/1024 as mb
from user_segments where segment_name='SALES_PART';  

PARTITION_NAME                 TABLESPACE_NAME                        MB
------------------------------ ------------------------------ ----------
ARCHHIGH                       COMPAHIGH                               8
ORDI1998                       ORDINARY                               56
ORDI1999                       ORDINARY                               56
ORDI2000                       ORDINARY                               56
ORDI2001                       ORDINARY                               56
ORDI2002                       ORDINARY                               56
ORDI2003                       ORDINARY                               56
ORDI2004                       ORDINARY                               56
ORDI2005                       ORDINARY                               56
QUERYLOW                       QUERYLOW                               40
SYS_P101                       VERYFAST                               56
SYS_P102                       VERYFAST                               56
SYS_P103                       VERYFAST                               56
SYS_P104                       VERYFAST                               56
SYS_P105                       VERYFAST                               56
SYS_P106                       VERYFAST                                8

16 rows selected.

SQL> select count(*) from sales_part partition (archhigh);

 COUNT(*)
----------
 5330000

SQL>  select count(*) from sales_part partition (querylow);

 COUNT(*)
----------
 5114000

SQL> select count(*) from sales_part partition (ordi1998);

 COUNT(*)
----------
 730000

SQL>  select count(*) from sales_part partition (sys_p101);

 COUNT(*)
----------
 730000

During the life cycle of the data, partitions may no longer be highly volatile and can be moved to spinning drives or even get compressed:

SQL> alter table sales_part move partition sys_p101 tablespace ordinary;

Table altered.

SQL> alter table sales_part move partition ordi1998 compress for query low tablespace querylow;

Table altered.

SQL> select partition_name,tablespace_name,bytes/1024/1024 as mb
from user_segments where segment_name='SALES_PART';  

PARTITION_NAME                 TABLESPACE_NAME                        MB
------------------------------ ------------------------------ ----------
ARCHHIGH                       COMPAHIGH                               8
ORDI1998                       QUERYLOW                                8
ORDI1999                       ORDINARY                               56
ORDI2000                       ORDINARY                               56
ORDI2001                       ORDINARY                               56
ORDI2002                       ORDINARY                               56
ORDI2003                       ORDINARY                               56
ORDI2004                       ORDINARY                               56
ORDI2005                       ORDINARY                               56
QUERYLOW                       QUERYLOW                               40
SYS_P101                       ORDINARY                               56
SYS_P102                       VERYFAST                               56
SYS_P103                       VERYFAST                               56
SYS_P104                       VERYFAST                               56
SYS_P105                       VERYFAST                               56
SYS_P106                       VERYFAST                                8

16 rows selected.

We have no indexes in place – so there is no rebuild needed :-)

Summary: Flash Storage inside the Oracle Exadata Database Machine is used completely as Flash Cache by default, effectively working as an extension of the Database Buffer Cache  and delivering faster Access together with a very high IO per Second rate which is especially important for OLTP. Additionally, we may take a part of the Flash Storage to build ASM diskgroups upon it. Files placed on these diskgroups will reside permanently on Flash Storage – no Caching needed.

 

 

, , ,

19 Comments

Partitioning a table online with DBMS_REDEFINITION

If there is a requirement to change the structure of a table that is already in use productively, it may be impossible to get a maintenance downtime for that table, because it is constantly in use. That can be the case for all kind of structural changes of a table, particulary for the change from an ordinary heap table into a partitioned table, which I am going to take here as an example, because I am getting asked frequently in my courses how to achieve it. In order to demonstrate that, I will create a demonstration user with a non-partitioned table with privileges and additional dependent objects on it:

SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam
Connected.

SQL> create table original as select
rownum as id,
mod(rownum,5) as channel_id,
5000 as amount_sold,
mod (rownum,1000) as cust_id,
sysdate as time_id
from dual connect by level<=1e6;  

Table created.

SQL> create index original_id_idx on original(id) nologging;

Index created.
SQL> grant select on original to hr;

Grant succeeded.

The challenge is now to change this table into a partitioned one while it is used with DML & queries by end users. For this purpose, we introduced already in 9i (if I recall it right) the package DBMS_REDEFINITION. First step would be to ask, whether it can be used in this case:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> begin
dbms_redefinition.can_redef_table
 (uname=>'ADAM',
 tname=>'ORIGINAL',
 options_flag=>DBMS_REDEFINITION.CONS_USE_ROWID);
end;
/
PL/SQL procedure successfully completed.

Because there is no Primary Key on the original table, I have to use CONS_USE_ROWID, else I could use CONS_USE_PK. There are no objections against the online redefinition of the table here – else an error message would appear. Next step is to create an interim table of the structure, desired for the original table. In my case, I create it interval partitioned (an 11g New Feature). I could also change storage attributes and add or remove columns during that process.

SQL> create table interim
(id number,
channel_id number(1),
amount_sold number(4),
cust_id number(4),
time_id date)
partition by range (cust_id)
interval (10)
(partition p1 values less than (10));

Table created.

My original table has 1000 distinct cust_ids, so this will lead to 100 partitions – each partion will contain 10 distinct cust_ids. One benefit of that would be the possibility of partition pruning, should there be statements, specifying the cust_id in the where-condition. These statements will be about 100 times faster as a full table scan. The next step will basically insert all the rows from the orginal table into the interim table (thereby automatically generating 99 partitions), while DML during that period is recorded:

SQL> set timing on
SQL>
BEGIN
DBMS_REDEFINITION.START_REDEF_TABLE
 (uname=>'ADAM',
 orig_table=>'ORIGINAL',
 int_table=>'INTERIM',
 options_flag=>DBMS_REDEFINITION.CONS_USE_ROWID);
end;
/

PL/SQL procedure successfully completed.
Elapsed: 00:00:22.76

If this step takes a long time to run it might be beneficial to use the SYNC_INTERIM_TABLE procedure occasionally from another session. That prevents a longer locking time for the last step, the calling of FINISH_REDEF_TABLE. Next step is now to add the dependent objects/privileges to the interim table:

SQL> set timing off
SQL> vari num_errors number
BEGIN
DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS
 (uname=>'ADAM',
 orig_table=>'ORIGINAL',
 int_table=>'INTERIM',
 num_errors=>:num_errors);
END;
/
PL/SQL procedure successfully completed.
SQL> print num_errors
NUM_ERRORS
----------
 0

There was no problem with this step. Until now the original table is still an ordinary heap table – only the interim table is partitioned:

SQL> select table_name from user_part_tables;

TABLE_NAME
------------------------------
INTERIM

In the last step, the two tables change their names and the recorded DML that occured in the meantime gets used for actualization:

SQL> begin
dbms_redefinition.finish_redef_table
 (uname=>'ADAM',
 orig_table=>'ORIGINAL',
 int_table=>'INTERIM');
end;
/  

PL/SQL procedure successfully completed.

We will now determine that the original table is partitioned and the dependencies are still there:

SQL> select table_name,partitioning_type from user_part_tables;
TABLE_NAME                     PARTITION
------------------------------ ---------
ORIGINAL                       RANGE
SQL> select count(*) from user_tab_partitions;
 COUNT(*)
----------
 100
SQL> select grantee,privilege from  user_tab_privs_made where table_name='ORIGINAL';
GRANTEE                        PRIVILEGE
------------------------------ ----------------------------------------
HR                             SELECT
SQL> select index_name,table_name from user_indexes;
INDEX_NAME                     TABLE_NAME
------------------------------ ------------------------------
ORIGINAL_ID_IDX                ORIGINAL
TMP$$_ORIGINAL_ID_IDX0         INTERIM

The interim table can now be dropped. We changed the table into a partitioned table without any end user noticing it!

, ,

16 Comments

Reducing Buffer Busy Waits with Automatic Segment Space Management & Hash Partitioning

Last week, I was in Frankfurt (Germany) teaching about Performance Tuning & Partitioning in a Private Event for LHS Telecommunication. One demonstration that I developed for that event was about Reducing Buffer Busy Waits. One major reason why the wait event Buffer Busy Waits can occur is if many sessions are inserting at the same time in the same table. Large OLTP installations therefore will see it most likely. This can lead to a performance problem, especially if the site does not make use of Automatic Segment Space Management – a technique introduced in 9i and used by default since 10g. The old fashioned way to determine where to insert a new row in blocks before the high water mark uses Freelists. So if you are going with that “traditional” technique with freelists (and PCTUSED), then your segment has one freelist by default, which leads to contention easy, because multiple simultaneous sessions that insert look at that one freelist and all pick the same Blocks to insert in. Here is an example:

SQL> create tablespace old datafile '/u01/app/oracle/oradata/orcl/old01.dbf'
     size 50m segment space management manual;

Tablespace created.

SQL> create table bbw_freelist (n number, name varchar2(50)) tablespace old;

Table created.

That table now uses (one) freelist and PCTUSED (40 by default) in order to determine what blocks may be used

for insert. The next steps are going to create 10 simultaneous sessions that insert 1 Million rows into the table.

SQL> create or replace procedure bufwait1 as
begin
for i in 1..100000 loop
insert into bbw_freelist values (i, 'BUFFER BUSY WAITS?');
end loop;
commit;
end;
/  2    3    4    5    6    7    8

Procedure created.

SQL> create or replace procedure do_bbw1 as
v_jobno number:=0;begin
FOR i in 1..10 LOOP
 dbms_job.submit(v_jobno,'bufwait1;', sysdate);
END LOOP;
commit;
end;
/  2    3    4    5    6    7    8

Procedure created.
SQL> exec do_bbw1

PL/SQL procedure successfully completed.

SQL> select count(*) from dba_jobs_running

COUNT(*)
----------
0
One easy way to discover Buffer Busy Waits is v$segment_statistics.

Of course, you may see them also in Statspack/AWR reports.

SQL> select sum(value) from v$segment_statistics
 where OBJECT_NAME='BBW_FREELIST' and statistic_name='buffer busy waits';

SUM(VALUE)
----------
 1623

Next we are doing exactly the same inserts with a table that uses Automatic Segment Space Management.

Simply speaking, ASSM achieves a better distribution of the new rows across multiple blocks if multiple

inserts occur at the same time. It is determined at the tablespace level:

SQL> select SEGMENT_SPACE_MANAGEMENT from dba_tablespaces where tablespace_name='USERS';

SEGMEN
------
AUTO

SQL> create table bbw_assm (n number, name varchar2(50)) tablespace users;

Table created.

create or replace procedure bufwait2 as
begin
for i in 1..100000 loop
insert into bbw_assm values (i, 'BUFFER BUSY WAITS?');
end loop;
commit;
end;
/

create or replace procedure do_bbw2 as
v_jobno number:=0;begin
FOR i in 1..10 LOOP   
 dbms_job.submit(v_jobno,'bufwait2;', sysdate);
END LOOP;
commit;
end;
/

SQL> exec do_bbw2

PL/SQL procedure successfully completed.

SQL> select count(*) from dba_jobs_running;

 COUNT(*)
----------
 10

SQL> select count(*) from dba_jobs_running;

 COUNT(*)
----------
 0

SQL> select sum(value) from v$segment_statistics
 where OBJECT_NAME='BBW_ASSM' and statistic_name='buffer busy waits';

SUM(VALUE)
----------
 295

As you can see, ASSM reduced strongly the number of Buffer Busy Waits! But still there are some,

and still this may be a performance problem for very large OLTP sites. Right now, the table is

not partitioned, so we have only one "hot extent" if the table is continually growing, all inserts

occur in the newest allocated extent - where they are distributed nicely via ASSM. If that same table

would be Hash Partitioned, then we could have multiple "hot extents" further more reducing the contention:

SQL> create table bbw_hash (n number, name varchar2(50))tablespace users
 partition by hash (n) partitions 256;

Table created.

SQL> create or replace procedure bufwait3 as
begin
for i in 1..100000 loop
insert into bbw_hash values (i, 'BUFFER BUSY WAITS?');
end loop;
commit;
end;
/  2    3    4    5    6    7    8

Procedure created.

SQL> create or replace procedure do_bbw3 as
v_jobno number:=0;begin
FOR i in 1..10 LOOP
 dbms_job.submit(v_jobno,'bufwait3;', sysdate);
END LOOP;
commit;
end;
/  2    3    4    5    6    7    8

Procedure created.

SQL> exec do_bbw3

PL/SQL procedure successfully completed.

SQL> select sum(value) from v$segment_statistics
 where OBJECT_NAME='BBW_HASH' and statistic_name='buffer busy waits';

SUM(VALUE)
----------
 226

We observe a (in this case) moderate further decrease of the Buffer Busy Waits. That

is one major reason for hash partitioning: Fighting against contention. Another reason

is - as in other kinds of partitioning - the possibilty for Partiton Pruning, if the

partition key is part of the where-clause of our statement. Then the optimizer

implicitly knows what partitions to scan. Even in my (relatively tiny) two tables, that

effect is visible:

SQL> set timing on
SQL> select * from bbw_assm where n=4711;

 N NAME
---------- --------------------------------------------------
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?

10 rows selected.

Elapsed: 00:00:00.11

SQL> select * from bbw_hash where n=4711;

 N NAME
---------- --------------------------------------------------
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?
 4711 BUFFER BUSY WAITS?

10 rows selected.

Elapsed: 00:00:00.01

,

6 Comments

Follow

Get every new post delivered to your Inbox.

Join 635 other followers