hcc | Pickleball spielen

Exadata Part III: Compression

Veröffentlicht von Uwe Hesse in TOI am Januar 21, 2011

With this posting, we look at the New Feature Hybrid Columnar Compression that is only available with the Exadata Database Machine, offering stunning compression ratios together with high query speed. Meanwhile we have not less than 6 different kinds of Table Data Compression methods:

BASIC compression, introduced in Oracle 8 already and only recommended for Data Warehouse
OLTP compression, introduced in Oracle 11 and recommended for OLTP Databases as well
QUERY LOW compression (Exadata only), recommended for Data Warehouse with Load Time as a critical factor
QUERY HIGH compression (Exadata only), recommended for Data Warehouse with focus on Space Saving
ARCHIVE LOW compression (Exadata only), recommended for Archival Data with Load Time as a critical factor
ARCHIVE HIGH compression (Exadata only), recommended for Archival Data with maximum Space Saving

Method 1 and 2 are working by compressing the rows inside of Oracle Blocks – the more redundant values inside the block, the better the compression ratio. OLTP compression might be used with Exadata Database Machine, because Exadata V2 is designed to support OLTP as well as Data Warehouse. Methods 3 to 6 use Hybrid Columnar Compression (HCC), which is designed to achieve better compression ratios by organizing sets of rows into compression units and then detaching the columns from the rows and collecting them. It is more likely that the same columns store similar (redundant) data, therefore the compression ratio improves with HCC.

I will give examples for the 6 compression methods now. Keep in mind that I have only limited access to our Oracle University Exadata Database Machine, restricting me to just one Database Server and two Storage Servers – speed on a Full Rack will probably be faster. Also, the demo table is somewhat artificial and may therefore deliver better compression ratios compared to Real World tables.

In short: Although you should get the picture from the examples, it will be different on your site and it requires thorough testing before you take any kind of compression into production.

The (uncompressed) demonstration table is the same as in the previous posting and was created with this statement:

SQL> create table sales as
select
rownum as id,
rownum + 1 as flag,
'Oracle Enterprise Edition' as product,
mod(rownum,5) as channel_id,
mod(rownum,1000) as cust_id ,
5000 as amount_sold,
trunc(sysdate - 10000 + mod(rownum,10000))
as order_date,
trunc(sysdate - 9999 + mod(rownum,10000))
as ship_date
from dual connect by level<=2e7;

It is 1.5 GB in size. Let’s look how small we can compress and how fast we can load!

BASIC method

SQL> create table comp_classic compress as select * from sales where 1=2;
Table created.
SQL> alter table comp_classic nologging;
Table altered.
SQL> insert /*+ append */ into comp_classic select * from sales;
20000000 rows created.

Elapsed: 00:00:39.80
SQL> commit;
Commit complete.

OLTP method. Notice that we cannot assume direct loads in an OLTP environment. Therefore conventional insert:

SQL> create table comp_oltp compress for oltp as select * from sales where 1=2;
Table created.
SQL> insert into comp_oltp select * from sales;
20000000 rows created.

Elapsed: 00:02:16.56
SQL> commit;
Commit complete.

QUERY LOW method. We expect this method to have the fastest Load Time:

SQL> create table comp_query_low compress for query low as select * from sales where 1=2;
Table created.
SQL> alter table comp_query_low nologging;
Table altered.
SQL> insert /*+ append */ into comp_query_low select * from sales;
20000000 rows created.

Elapsed: 00:00:26.96
SQL> commit;
Commit complete.

QUERY HIGH method

SQL> create table comp_query_high compress for query high as select * from sales where 1=2;
Table created.
SQL> alter table comp_query_high nologging;
Table altered.

SQL> insert /*+ append */ into comp_query_high select * from sales;
20000000 rows created.

Elapsed: 00:00:33.99
SQL> commit;
Commit complete.

ARCHIVE LOW method

SQL> create table comp_archive_low compress for archive low as select * from sales where 1=2;
Table created.
SQL> alter table comp_archive_low nologging;
Table altered.

SQL> insert /*+ append */ into comp_archive_low select * from sales;
20000000 rows created.

Elapsed: 00:00:35.14
SQL> commit;
Commit complete.

ARCHIVE HIGH method. We expect this method to have the slowest Load Time and the best Compression Ratio:

SQL> create table comp_archive_high compress for archive high as select * from sales where 1=2;
Table created.
SQL> alter table comp_archive_high nologging;
Table altered.

SQL> insert /*+ append */ into comp_archive_high select * from sales;
20000000 rows created.

Elapsed: 00:04:13.13
SQL> commit;
Commit complete.

That was indeed slow! Now let’s look at the outcome:

SQL> select segment_name,bytes/1024/1024 as mb from user_segments order by 1;

SEGMENT_NAME                           MB
------------------------------ ----------
COMP_ARCHIVE_HIGH                       5
COMP_ARCHIVE_LOW                        8
COMP_CLASSIC                          736
COMP_OLTP                             864
COMP_QUERY_HIGH                         8
COMP_QUERY_LOW                        304
SALES                              1535.5

7 rows selected.

We have condensend 1.5 Gig into 5 Megabyte with the ARCHIVE HIGH method!

Summary: Except for ARCHIVE HIGH, we have faster Load Times with HCC than with Block Compression. Every HCC method delivers a better compression ratio than the Block Compression methods. Now how about Query Times?

SQL> set timing on
SQL> connect adam/adam
Connected.
SQL> select count(*) from sales where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:02.07
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';  2

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan        46.807457

We take the FTS (with Smart Scan) on the uncompressed table as Baseline. I always reconnect to initialize v$mystat. Now Query Performance of BASIC compression:

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_classic where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.79
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';  2

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan       123.650536

Runtime was faster as uncompressed, Smart Scan was possible. Now OLTP compression:

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_oltp  where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.80
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';  2

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan        130.22644

Dito. Now QUERY LOW (HCC) compression:

SQL> connect adam/adam
Connected.
SQL>  select count(*) from comp_query_low where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:00.86
SQL>  select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';  2

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan       2.68922424

Fastest Runtime yet. Smart Scan was possible. Now QUERY HIGH (HCC):

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_query_high where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.27
SQL>  select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';  2

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Still faster than uncompressed, although not using Smart Scan. Now ARCHIVE LOW (HCC):

SQL>  connect adam/adam
Connected.
SQL>  select count(*) from comp_archive_low where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.13
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';  2

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Faster than uncompressed (and surprisingly faster than QUERY HIGH) without Smart Scan. Finally ARCHIVE HIGH (HCC):

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_archive_high where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:02.93
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';  2

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Slowest Query Performance (the only one slower than uncompressed in this experiment) without Smart Scan.

Conclusion: HCC delivers (far) better Compression Ratios than the Block Compression methods. Load Time increases from QUERY LOW (best) over QUERY HIGH and ARCHIVE LOW (both moderate) to ARCHIVE HIGH (longest Load Time). Query Performance decreases similar from QUERY LOW (best) to ARCHIVE HIGH (longest Query Time). All HCC methods except ARCHIVE HIGH delivered better Query Performance than uncompressed, though. In short: HCC is most likely able to save large amounts of space in your Data Warehouse without decreasing Query Performance much if at all. It will probably compress your Archival Data significantly.