Posts tagged with Data Warehouse

Materialized Views & Partition Change Tracking

During the 11g Data Warehouse Administration course that I delivered this week in Munich, I showed a demonstration about Partition Change Tracking (PCT) that I'd like to share with the Oracle Community. You may have a look here for an Introduction into Materialized Views if that topic is new for you. Often, the Base Tables of Materialized Views are partitioned, which gives us additional options for the Refresh as well as for the Rewrite:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam
Connected.
SQL> create table sales
 (product     varchar2(50),
  channel_id  number,
  cust_id     number,
  amount_sold number(10,2),
  time_id     date)
partition by list (channel_id)
(partition c0 values (0),
 partition c1 values (1),
 partition c2 values (2),
 partition c3 values (3),
 partition c4 values (4)
);
Table created.
SQL> alter table sales nologging;

Table altered.
SQL> insert /*+ append */ into sales
select
'Oracle Enterprise Edition' as product,
mod(rownum,5) as channel_id,
mod(rownum,1000) as cust_id ,
5000 as amount_sold,
to_date
('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2010' ,'dd.mm.yyyy')
as time_id
from dual connect by level<=1e6; 
1000000 rows created.

SQL> commit;

Commit complete.

The above was my demo setup, done with 11.2.0.1. Most of what is shown should work with lower versions as well, though. We will now create a Materialized View that contains the Partition Key of the Base Table, which makes it easy for the system to associate the rows of the Materialized View with the Partitions. Otherwise, we would need to establish that association with a Partition Marker, as the following sketch illustrates.
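Just for illustration, such a marker-based MV could look like the following sketch (mv_marker is hypothetical and not used in this demo; DBMS_MVIEW.PMARKER returns a partition identifier for a given rowid):

SQL> create materialized view mv_marker
enable query rewrite as
select dbms_mview.pmarker(s.rowid) as pmark,
       sum(s.amount_sold) as amount_sold
from sales s
group by dbms_mview.pmarker(s.rowid);

But our demo MV can simply include the Partition Key itself: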

SQL> create materialized view mv_pct
enable query rewrite as
select channel_id,sum(amount_sold) from sales group by channel_id;  

Materialized view created.

The MV is now fresh and may get used for Query Rewrite:

SQL> select mview_name,last_refresh_type,staleness from user_mviews;

MVIEW_NAME                     LAST_REF STALENESS
------------------------------ -------- -------------------
MV_PCT                         COMPLETE FRESH

SQL> set autotrace on explain
SQL> select channel_id,sum(amount_sold) from sales group by channel_id;

CHANNEL_ID SUM(AMOUNT_SOLD)
---------- ----------------
         0       1000000000
         1       1000000000
         2       1000000000
         3       1000000000
         4       1000000000

Execution Plan
----------------------------------------------------------
Plan hash value: 470332451

---------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |        |     5 |   130 |     3   (0)| 00:00:01 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| MV_PCT |     5 |   130 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement (level=2)

Now we modify one row:

SQL> set autotrace off
SQL> update sales set amount_sold=1 where rownum<2;

1 row updated.

SQL> commit;

Commit complete.

This makes the MV stale, right? Well, actually not all of it:

SQL> select mview_name,last_refresh_type,staleness from user_mviews;

MVIEW_NAME                     LAST_REF STALENESS
------------------------------ -------- -------------------
MV_PCT                         COMPLETE NEEDS_COMPILE

SQL> select detail_partition_name, freshness from user_mview_detail_partition;

DETAIL_PARTITION_NAME          FRESH
------------------------------ -----
C4                             FRESH
C3                             FRESH
C2                             FRESH
C1                             FRESH
C0                             STALE

The Dictionary View USER_MVIEW_DETAIL_PARTITION queried above is new in 11g; the behavior shown by the next query is not. Only Partition C0 is stale, which is why queries that do not need channel_id 0 can still use the MV:

SQL> set autotrace on explain
SQL> select sum(amount_sold) from sales where channel_id=1;

SUM(AMOUNT_SOLD)
----------------
      1000000000


Execution Plan
----------------------------------------------------------
Plan hash value: 1805549181

----------------------------------------------------------------------------------------
| Id  | Operation                     | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |        |     1 |    26 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE               |        |     1 |    26 |            |          |
|*  2 |   MAT_VIEW REWRITE ACCESS FULL| MV_PCT |     1 |    26 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("MV_PCT"."CHANNEL_ID"=1)

Note
-----
   - dynamic sampling used for this statement (level=2)

The Rewrite is still possible although part of the MV is stale. We call that PCT Rewrite. Furthermore, we can now do a PCT Fast Refresh, although there is no MV Log on the Base Table sales. The PCT Refresh does not need to scan the whole Base Table but only the stale partitions, which is in this case about 5 times faster than a Complete Refresh, because we have 5 equally sized partitions. The refresh method value 'P' is new in 11g, but PCT Refresh also works with 'F' or '?' in earlier versions.
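By the way, whether an MV supports PCT Fast Refresh and PCT Rewrite at all can be checked with DBMS_MVIEW.EXPLAIN_MVIEW; a sketch, assuming MV_CAPABILITIES_TABLE has been created with the utlxmv.sql script:

SQL> @?/rdbms/admin/utlxmv.sql
SQL> exec dbms_mview.explain_mview('MV_PCT')
SQL> select capability_name, possible
from mv_capabilities_table
where capability_name like '%PCT%';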

SQL> set autotrace off
SQL> exec dbms_mview.refresh('MV_PCT','P')

PL/SQL procedure successfully completed.

SQL> select mview_name,last_refresh_type,staleness from user_mviews;

MVIEW_NAME                     LAST_REF STALENESS
------------------------------ -------- -------------------
MV_PCT                         FAST_PCT FRESH

Conclusion: Partitioning delivers some extra benefits when used together with Materialized Views. In particular, Rewrite & Refresh can then be done on the Partition layer. As always: "Don't believe it, test it!" 🙂



Auto DOP: Differences between parallel_degree_policy=auto/limited

Recently, I delivered a Seminar about Parallel Processing in 11g where I came across some interesting findings that I'd like to share with the Oracle Community. See my introduction into the 11g New Feature Auto DOP here if that topic is completely new for you. There are big differences in the handling of Hints and table-assigned parallel degrees, depending on the setting of parallel_degree_policy.

The parameter defaults to MANUAL, which gives you the known behavior of versions before 11g. LIMITED will only assign a system-computed degree of parallelism (DOP) to tables decorated with a parallel degree of DEFAULT, while parallel_degree_policy=AUTO will consider assigning a system-computed DOP to all tables. Let's see some details:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE    11.2.0.1.0      Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production

I did my tests on 11.2.0.1 and 11.2.0.2.

SQL> grant dba to adam identified by adam;

Grant succeeded.

SQL> connect adam/adam@prima
Connected.
SQL> set pages 300
SQL> show parameter parallel

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
fast_start_parallel_rollback         string      LOW
parallel_adaptive_multi_user         boolean     TRUE
parallel_automatic_tuning            boolean     FALSE
parallel_degree_limit                string      CPU
parallel_degree_policy               string      MANUAL
parallel_execution_message_size      integer     16384
parallel_force_local                 boolean     FALSE
parallel_instance_group              string
parallel_io_cap_enabled              boolean     FALSE
parallel_max_servers                 integer     20
parallel_min_percent                 integer     0
parallel_min_servers                 integer     0
parallel_min_time_threshold          string      AUTO
parallel_server                      boolean     FALSE
parallel_server_instances            integer     1
parallel_servers_target              integer     8
parallel_threads_per_cpu             integer     2
recovery_parallelism                 integer     0

SQL> create table t as select * from dual; 
Table created.

That is my test case. All the parameters shown above have their default values. The table t is of course way too small to justify a parallel operation; in particular, it will not meet the parallel_min_time_threshold of estimated runtime (about 10 seconds with AUTO). The setting parallel_degree_policy=MANUAL would leave the system as dumb as in earlier versions regarding an appropriate DOP, though: it would give me any DOP I demand with Hints or with a Parallel Degree on the table. See how that is different with AUTO/LIMITED:

SQL> alter system set parallel_degree_policy=auto;

System altered.

SQL> select /*+ parallel (t,8) */ * from t;

D
-
X

SQL> select * from v$pq_sesstat;

STATISTIC                      LAST_QUERY SESSION_TOTAL
------------------------------ ---------- -------------
Queries Parallelized                    0             0
DML Parallelized                        0             0
DDL Parallelized                        0             0
DFO Trees                               0             0
Server Threads                          0             0
Allocation Height                       0             0
Allocation Width                        0             0
Local Msgs Sent                         0             0
Distr Msgs Sent                         0             0
Local Msgs Recv'd                       0             0
Distr Msgs Recv'd                       0             0

11 rows selected.

Although this was a valid hint, I got no parallel operation! That is different with LIMITED:

SQL> alter system set parallel_degree_policy=limited;

System altered.

SQL> select /*+ parallel (t,8) */ * from t;

D
-
X

SQL> select * from v$pq_sesstat;

STATISTIC                      LAST_QUERY SESSION_TOTAL
------------------------------ ---------- -------------
Queries Parallelized                    1             1
DML Parallelized                        0             0
DDL Parallelized                        0             0
DFO Trees                               1             1
Server Threads                          8             0
Allocation Height                       8             0
Allocation Width                        1             0
Local Msgs Sent                        26            26
Distr Msgs Sent                         0             0
Local Msgs Recv'd                      26            26
Distr Msgs Recv'd                       0             0

11 rows selected.

Same statement, but now I got my (not sensible) DOP. There is a new statement-level Hint in 11g, though, that also delivers my requested DOP with AUTO:

SQL> connect adam/adam@prima
Connected.
SQL> alter system set parallel_degree_policy=auto;

System altered.
SQL> select /*+ parallel (8) */ * from t;

D
-
X

SQL> select * from v$pq_sesstat;

STATISTIC                      LAST_QUERY SESSION_TOTAL
------------------------------ ---------- -------------
Queries Parallelized                    1             1
DML Parallelized                        0             0
DDL Parallelized                        0             0
DFO Trees                               1             1
Server Threads                          8             0
Allocation Height                       8             0
Allocation Width                        1             0
Local Msgs Sent                        26            26
Distr Msgs Sent                         0             0
Local Msgs Recv'd                      26            26
Distr Msgs Recv'd                       0             0

11 rows selected.

That is the only way to overrule the Auto DOP with parallel_degree_policy=AUTO. It is similar with a Parallel Degree on the table:

SQL> connect adam/adam@prima
Connected.
SQL> alter system set parallel_degree_policy=auto;

System altered.

SQL> alter table t parallel;

Table altered.

SQL> select * from t;

D
-
X

SQL> select * from v$pq_sesstat;

STATISTIC                      LAST_QUERY SESSION_TOTAL
------------------------------ ---------- -------------
Queries Parallelized                    0             0
DML Parallelized                        0             0
DDL Parallelized                        0             0
DFO Trees                               0             0
Server Threads                          0             0
Allocation Height                       0             0
Allocation Width                        0             0
Local Msgs Sent                         0             0
Distr Msgs Sent                         0             0
Local Msgs Recv'd                       0             0
Distr Msgs Recv'd                       0             0

11 rows selected.

SQL> alter table t parallel 8;

Table altered.

SQL> select * from t;

D
-
X

SQL> select * from v$pq_sesstat;

STATISTIC                      LAST_QUERY SESSION_TOTAL
------------------------------ ---------- -------------
Queries Parallelized                    0             0
DML Parallelized                        0             0
DDL Parallelized                        0             0
DFO Trees                               0             0
Server Threads                          0             0
Allocation Height                       0             0
Allocation Width                        0             0
Local Msgs Sent                         0             0
Distr Msgs Sent                         0             0
Local Msgs Recv'd                       0             0
Distr Msgs Recv'd                       0             0

11 rows selected.

SQL>  alter system set parallel_degree_policy=limited;

System altered.

SQL> alter table t parallel;

Table altered.

SQL> select * from t;

D
-
X

SQL> select * from v$pq_sesstat;

STATISTIC                      LAST_QUERY SESSION_TOTAL
------------------------------ ---------- -------------
Queries Parallelized                    0             0
DML Parallelized                        0             0
DDL Parallelized                        0             0
DFO Trees                               0             0
Server Threads                          0             0
Allocation Height                       0             0
Allocation Width                        0             0
Local Msgs Sent                         0             0
Distr Msgs Sent                         0             0
Local Msgs Recv'd                       0             0
Distr Msgs Recv'd                       0             0

11 rows selected.

SQL> alter table t parallel 8;

Table altered.

SQL> select * from t;

D
-
X

SQL> select * from v$pq_sesstat;

STATISTIC                      LAST_QUERY SESSION_TOTAL
------------------------------ ---------- -------------
Queries Parallelized                    1             1
DML Parallelized                        0             0
DDL Parallelized                        0             0
DFO Trees                               1             1
Server Threads                          8             0
Allocation Height                       8             0
Allocation Width                        1             0
Local Msgs Sent                        26            26
Distr Msgs Sent                         0             0
Local Msgs Recv'd                      26            26
Distr Msgs Recv'd                       0             0

11 rows selected.
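By the way, when AUTO refuses to parallelize a statement, the reason shows up in the Note section of the execution plan. A sketch of how to check that (with parallel_degree_policy=AUTO; the note text is reproduced from memory and may differ slightly):

SQL> explain plan for select /*+ parallel (t,8) */ * from t;

Explained.

SQL> select * from table(dbms_xplan.display);
...
Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold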

The behavior you saw is summarized in the table below:

parallel_degree_policy   parallel (t,8)   parallel (8)   degree DEFAULT   degree 8
----------------------   --------------   ------------   --------------   --------
manual                   8                8              4                8
limited                  8                8              1                8
auto                     1*               8              1                1

The default degree with parallel_degree_policy=MANUAL is cpu_count * parallel_threads_per_cpu; 4 in my case.
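That value can be computed directly from the parameters; for example:

SQL> select p1.value * p2.value as default_dop
from v$parameter p1, v$parameter p2
where p1.name = 'cpu_count'
and p2.name = 'parallel_threads_per_cpu';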

Apart from the differences shown between AUTO and LIMITED, only AUTO also enables the New Features Parallel Statement Queueing and In-Memory Parallel Execution. My personal impression is that LIMITED works like we always hoped parallel_automatic_tuning would but never did 🙂
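With Parallel Statement Queueing active, statements waiting for parallel servers can be spotted in V$SQL_MONITOR; a minimal sketch:

SQL> select sid, sql_id, status
from v$sql_monitor
where status = 'QUEUED';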

Conclusion: parallel_degree_policy=LIMITED will give you the DOP you request and will compute an appropriate DOP only for tables with a parallel degree of DEFAULT. Use this setting if you trust that your applications/designers know why they use a certain DOP. parallel_degree_policy=AUTO will overrule any specific DOP you gave (except the new 11g parallel (n) Hint) and will consider doing things in parallel for all tables, even without a Hint or Degree.

Addendum: See this nice related posting by Gwen Shapira, especially the part about the I/O calibration.

* Second Addendum: With 11.2.0.3, the hint /*+ parallel (t,8) */ sets the DOP to 8, regardless of the parallel_degree_policy setting. Everything else is the same as shown, especially the different behavior of the values AUTO and LIMITED with the parallel degree of the table t explicitly set to 8.



Exadata Part III: Compression

With this posting, we look at the New Feature Hybrid Columnar Compression, which is only available with the Exadata Database Machine and offers stunning compression ratios together with high query speed. Meanwhile, we have no fewer than six different Table Data Compression methods:

  1. BASIC compression, introduced back in Oracle 8 and recommended only for Data Warehouses
  2. OLTP compression, introduced in Oracle 11 and recommended for OLTP Databases as well
  3. QUERY LOW compression (Exadata only), recommended for Data Warehouse with Load Time as a critical factor
  4. QUERY HIGH compression (Exadata only), recommended for Data Warehouse with focus on Space Saving
  5. ARCHIVE LOW compression (Exadata only), recommended for Archival Data with Load Time as a critical factor
  6. ARCHIVE HIGH compression (Exadata only), recommended for Archival Data with maximum Space Saving

Methods 1 and 2 work by compressing the rows inside of Oracle Blocks: the more redundant values inside the block, the better the compression ratio. OLTP compression might be used with the Exadata Database Machine, because Exadata V2 is designed to support OLTP as well as Data Warehouse. Methods 3 to 6 use Hybrid Columnar Compression (HCC), which is designed to achieve better compression ratios by organizing sets of rows into Compression Units and then detaching the columns from the rows and collecting them. Since the same column is likely to store similar (redundant) data, the compression ratio improves with HCC.

I will give examples for the six compression methods now. Keep in mind that I have only limited access to our Oracle University Exadata Database Machine, restricting me to just one Database Server and two Storage Servers; speed on a Full Rack will probably be higher. Also, the demo table is somewhat artificial and may therefore deliver better compression ratios than Real World tables.

In short: Although you should get the picture from the examples, results will differ at your site, and thorough testing is required before you take any kind of compression into production.
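For that kind of testing, the Compression Advisor in DBMS_COMPRESSION can estimate the ratio for a given method without creating the compressed table first. A minimal sketch, assuming a scratch tablespace named USERS and the 11.2 version of the package:

SQL> set serveroutput on
SQL> declare
  l_blkcnt_cmp   pls_integer;
  l_blkcnt_uncmp pls_integer;
  l_row_cmp      pls_integer;
  l_row_uncmp    pls_integer;
  l_cmp_ratio    number;
  l_comptype_str varchar2(100);
begin
  dbms_compression.get_compression_ratio(
    scratchtbsname => 'USERS',   -- assumption: a tablespace with enough free space
    ownname        => user,
    tabname        => 'SALES',
    partname       => null,
    comptype       => dbms_compression.comp_for_query_high,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  dbms_output.put_line(l_comptype_str || ' estimated ratio: ' || l_cmp_ratio);
end;
/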

The (uncompressed) demonstration table is the same as in the previous posting and was created with this statement:

SQL> create table sales as
select
rownum as id,
rownum + 1 as flag,
'Oracle Enterprise Edition' as product,
mod(rownum,5) as channel_id,
mod(rownum,1000) as cust_id ,
5000 as amount_sold,
trunc(sysdate - 10000 + mod(rownum,10000))
as order_date,
trunc(sysdate - 9999 + mod(rownum,10000))
as ship_date
from dual connect by level<=2e7;

It is 1.5 GB in size. Let’s look how small we can compress and how fast we can load!

BASIC method

SQL> create table comp_classic compress as select * from sales where 1=2;
Table created.
SQL> alter table comp_classic nologging;
Table altered.
SQL> insert /*+ append */ into comp_classic select * from sales;
20000000 rows created.

Elapsed: 00:00:39.80
SQL> commit;
Commit complete.

OLTP method. Notice that we cannot assume direct path loads in an OLTP environment; therefore, a conventional insert:

SQL> create table comp_oltp compress for oltp as select * from sales where 1=2;
Table created.
SQL> insert into comp_oltp select * from sales;
20000000 rows created.

Elapsed: 00:02:16.56
SQL> commit;
Commit complete.

QUERY LOW method. We expect this method to have the fastest Load Time:

SQL> create table comp_query_low compress for query low as select * from sales where 1=2;
Table created.
SQL> alter table comp_query_low nologging;
Table altered.
SQL> insert /*+ append */ into comp_query_low select * from sales;
20000000 rows created.

Elapsed: 00:00:26.96
SQL> commit;
Commit complete.

QUERY HIGH method

SQL> create table comp_query_high compress for query high as select * from sales where 1=2;
Table created.
SQL> alter table comp_query_high nologging;
Table altered.

SQL> insert /*+ append */ into comp_query_high select * from sales;
20000000 rows created.

Elapsed: 00:00:33.99
SQL> commit;
Commit complete.

ARCHIVE LOW method

SQL> create table comp_archive_low compress for archive low as select * from sales where 1=2;
Table created.
SQL> alter table comp_archive_low nologging;
Table altered.

SQL> insert /*+ append */ into comp_archive_low select * from sales;
20000000 rows created.

Elapsed: 00:00:35.14
SQL> commit;
Commit complete.

ARCHIVE HIGH method. We expect this method to have the slowest Load Time and the best Compression Ratio:

SQL> create table comp_archive_high compress for archive high as select * from sales where 1=2;
Table created.
SQL> alter table comp_archive_high nologging;
Table altered.

SQL> insert /*+ append */ into comp_archive_high select * from sales;
20000000 rows created.

Elapsed: 00:04:13.13
SQL> commit;
Commit complete.

That was indeed slow! Now let’s look at the outcome:

SQL> select segment_name,bytes/1024/1024 as mb from user_segments order by 1;

SEGMENT_NAME                           MB
------------------------------ ----------
COMP_ARCHIVE_HIGH                       5
COMP_ARCHIVE_LOW                        8
COMP_CLASSIC                          736
COMP_OLTP                             864
COMP_QUERY_HIGH                         8
COMP_QUERY_LOW                        304
SALES                              1535.5

7 rows selected.

We have condensed 1.5 GB into 5 MB with the ARCHIVE HIGH method!
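To double-check which method the rows of a segment actually ended up with, DBMS_COMPRESSION.GET_COMPRESSION_TYPE can be queried per row; a sketch (the decode mapping reflects the package constants as I recall them for 11.2):

SQL> select decode(dbms_compression.get_compression_type(user, 'COMP_QUERY_HIGH', rowid),
        1, 'NOCOMPRESS',
        2, 'OLTP',
        4, 'QUERY HIGH',
        8, 'QUERY LOW',
       16, 'ARCHIVE HIGH',
       32, 'ARCHIVE LOW') as compression_type
from comp_query_high
where rownum < 2;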

Summary: Except for ARCHIVE HIGH, we have faster Load Times with HCC than with Block Compression. Every HCC method delivers a better compression ratio than the Block Compression methods. Now how about Query Times?

SQL> set timing on
SQL> connect adam/adam
Connected.
SQL> select count(*) from sales where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:02.07
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
  2  where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan        46.807457

We take the FTS (with Smart Scan) on the uncompressed table as Baseline. I always reconnect to initialize v$mystat. Now Query Performance of BASIC compression:

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_classic where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.79
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
  2  where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan       123.650536

Runtime was faster than on the uncompressed table; Smart Scan was possible. Now OLTP compression:

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_oltp  where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.80
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
  2  where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan        130.22644

Ditto. Now QUERY LOW (HCC) compression:

SQL> connect adam/adam
Connected.
SQL>  select count(*) from comp_query_low where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:00.86
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
  2  where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan       2.68922424

Fastest Runtime yet. Smart Scan was possible. Now QUERY HIGH (HCC):

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_query_high where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.27
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
  2  where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Still faster than uncompressed, although not using Smart Scan. Now ARCHIVE LOW (HCC):

SQL>  connect adam/adam
Connected.
SQL>  select count(*) from comp_archive_low where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.13
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
  2  where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Faster than uncompressed (and surprisingly faster than QUERY HIGH) without Smart Scan. Finally ARCHIVE HIGH (HCC):

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_archive_high where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:02.93
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
  2  where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Slowest Query Performance (the only one slower than uncompressed in this experiment) without Smart Scan.

Conclusion: HCC delivers (far) better Compression Ratios than the Block Compression methods. Load Time increases from QUERY LOW (best) over QUERY HIGH and ARCHIVE LOW (both moderate) to ARCHIVE HIGH (longest Load Time). Query Performance decreases similarly from QUERY LOW (best) to ARCHIVE HIGH (longest Query Time). All HCC methods except ARCHIVE HIGH delivered better Query Performance than uncompressed, though. In short: HCC is most likely able to save large amounts of space in your Data Warehouse without decreasing Query Performance much, if at all. It will probably compress your Archival Data significantly.

