Exadata Part III: Compression

With this posting, we look at Hybrid Columnar Compression, a New Feature that is only available with the Exadata Database Machine, offering stunning compression ratios together with high query speed. We now have no fewer than six different Table Data Compression methods:

  1. BASIC compression, introduced back in Oracle 8 and recommended only for Data Warehouses
  2. OLTP compression, introduced in Oracle 11 and recommended for OLTP Databases as well
  3. QUERY LOW compression (Exadata only), recommended for Data Warehouses where Load Time is critical
  4. QUERY HIGH compression (Exadata only), recommended for Data Warehouses with a focus on Space Saving
  5. ARCHIVE LOW compression (Exadata only), recommended for Archival Data where Load Time is critical
  6. ARCHIVE HIGH compression (Exadata only), recommended for Archival Data with maximum Space Saving

Methods 1 and 2 work by compressing the rows inside Oracle blocks – the more redundant values inside a block, the better the compression ratio. OLTP compression may well be used with the Exadata Database Machine, because Exadata V2 is designed to support OLTP as well as Data Warehouse workloads. Methods 3 to 6 use Hybrid Columnar Compression (HCC), which achieves better compression ratios by organizing sets of rows into compression units, detaching the columns from the rows, and storing each column's values together. A column is far more likely to contain similar (redundant) data than a row is, which is why the compression ratio improves with HCC.
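To see why collecting column values together helps, here is a toy sketch outside the database – plain Python with generic zlib compression, not Oracle internals – that serializes the same redundant rows once row-by-row and once column-by-column:

```python
# Toy illustration (NOT Oracle internals): the same rows compress better
# when serialized column-by-column, because similar values end up adjacent.
import zlib

# Mimic the demo table: a highly redundant product column, cycling ids
rows = [(i, "Oracle Enterprise Edition", i % 5, i % 1000) for i in range(10000)]

# Row-major serialization: values of different columns interleaved
row_major = "|".join(",".join(map(str, r)) for r in rows).encode()

# Column-major serialization: each column's values collected together
cols = list(zip(*rows))
col_major = "|".join(",".join(map(str, c)) for c in cols).encode()

ratio_rows = len(row_major) / len(zlib.compress(row_major))
ratio_cols = len(col_major) / len(zlib.compress(col_major))
print(f"row-major ratio:    {ratio_rows:.1f}x")
print(f"column-major ratio: {ratio_cols:.1f}x")
```

The column-major layout should show the noticeably better ratio – the same effect HCC exploits inside its compression units.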

I will now give examples for the six compression methods. Keep in mind that I have only limited access to our Oracle University Exadata Database Machine, restricting me to just one Database Server and two Storage Servers – a Full Rack will probably be faster. Also, the demo table is somewhat artificial and may therefore deliver better compression ratios than Real World tables.

In short: although the examples should give you the picture, results will differ at your site, and thorough testing is required before you take any kind of compression into production.

The (uncompressed) demonstration table is the same as in the previous posting and was created with this statement:

SQL> create table sales as
     select
       rownum as id,
       rownum + 1 as flag,
       'Oracle Enterprise Edition' as product,
       mod(rownum,5) as channel_id,
       mod(rownum,1000) as cust_id,
       5000 as amount_sold,
       trunc(sysdate - 10000 + mod(rownum,10000)) as order_date,
       trunc(sysdate - 9999 + mod(rownum,10000)) as ship_date
     from dual connect by level<=2e7;

It is 1.5 GB in size. Let's see how small we can compress it and how fast we can load!

BASIC method

SQL> create table comp_classic compress as select * from sales where 1=2;
Table created.
SQL> alter table comp_classic nologging;
Table altered.
SQL> insert /*+ append */ into comp_classic select * from sales;
20000000 rows created.

Elapsed: 00:00:39.80
SQL> commit;
Commit complete.

OLTP method. Notice that we cannot assume direct path loads in an OLTP environment – therefore a conventional insert:

SQL> create table comp_oltp compress for oltp as select * from sales where 1=2;
Table created.
SQL> insert into comp_oltp select * from sales;
20000000 rows created.

Elapsed: 00:02:16.56
SQL> commit;
Commit complete.

QUERY LOW method. We expect this method to have the fastest Load Time:

SQL> create table comp_query_low compress for query low as select * from sales where 1=2;
Table created.
SQL> alter table comp_query_low nologging;
Table altered.
SQL> insert /*+ append */ into comp_query_low select * from sales;
20000000 rows created.

Elapsed: 00:00:26.96
SQL> commit;
Commit complete.

QUERY HIGH method

SQL> create table comp_query_high compress for query high as select * from sales where 1=2;
Table created.
SQL> alter table comp_query_high nologging;
Table altered.

SQL> insert /*+ append */ into comp_query_high select * from sales;
20000000 rows created.

Elapsed: 00:00:33.99
SQL> commit;
Commit complete.

ARCHIVE LOW method

SQL> create table comp_archive_low compress for archive low as select * from sales where 1=2;
Table created.
SQL> alter table comp_archive_low nologging;
Table altered.

SQL> insert /*+ append */ into comp_archive_low select * from sales;
20000000 rows created.

Elapsed: 00:00:35.14
SQL> commit;
Commit complete.

ARCHIVE HIGH method. We expect this method to have the slowest Load Time and the best Compression Ratio:

SQL> create table comp_archive_high compress for archive high as select * from sales where 1=2;
Table created.
SQL> alter table comp_archive_high nologging;
Table altered.

SQL> insert /*+ append */ into comp_archive_high select * from sales;
20000000 rows created.

Elapsed: 00:04:13.13
SQL> commit;
Commit complete.

That was indeed slow! Now let’s look at the outcome:

SQL> select segment_name,bytes/1024/1024 as mb from user_segments order by 1;

SEGMENT_NAME                           MB
------------------------------ ----------
COMP_ARCHIVE_HIGH                       5
COMP_ARCHIVE_LOW                        8
COMP_CLASSIC                          736
COMP_OLTP                             864
COMP_QUERY_HIGH                         8
COMP_QUERY_LOW                        304
SALES                              1535.5

7 rows selected.

We have condensed 1.5 GB into 5 MB with the ARCHIVE HIGH method!
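For reference, the compression ratios implied by these segment sizes are easy to compute – a quick sketch using the MB figures from the USER_SEGMENTS output above:

```python
# Segment sizes (MB) as measured in USER_SEGMENTS above
sizes_mb = {
    "COMP_CLASSIC (BASIC)": 736.0,
    "COMP_OLTP": 864.0,
    "COMP_QUERY_LOW": 304.0,
    "COMP_QUERY_HIGH": 8.0,
    "COMP_ARCHIVE_LOW": 8.0,
    "COMP_ARCHIVE_HIGH": 5.0,
}
uncompressed_mb = 1535.5  # the SALES table

for name, mb in sizes_mb.items():
    print(f"{name:24} {mb:7.1f} MB  ->  {uncompressed_mb / mb:6.1f}x")
```

That is roughly 2x for BASIC and OLTP, 5x for QUERY LOW, and around 190x to 307x for the remaining HCC methods – on this (artificial) demo table.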

Summary: Except for ARCHIVE HIGH, Load Times are faster with HCC than with Block Compression. Every HCC method delivers a better compression ratio than the Block Compression methods. Now how about Query Times?

SQL> set timing on
SQL> connect adam/adam
Connected.
SQL> select count(*) from sales where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:02.07
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan        46.807457

We take the Full Table Scan (with Smart Scan) on the uncompressed table as the Baseline. I always reconnect to reinitialize v$mystat. Now the Query Performance of BASIC compression:

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_classic where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.79
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan       123.650536

Runtime was faster than on the uncompressed table; Smart Scan was possible. Now OLTP compression:

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_oltp  where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.80
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan        130.22644

Ditto. Now QUERY LOW (HCC) compression:

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_query_low where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:00.86
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan       2.68922424

Fastest Runtime yet. Smart Scan was possible. Now QUERY HIGH (HCC):

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_query_high where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.27
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Still faster than uncompressed, although not using Smart Scan. Now ARCHIVE LOW (HCC):

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_archive_low where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.13
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Faster than uncompressed (and surprisingly faster than QUERY HIGH) without Smart Scan. Finally ARCHIVE HIGH (HCC):

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_archive_high where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:02.93
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Slowest Query Performance (the only one slower than the uncompressed table in this experiment), without Smart Scan.

Conclusion: HCC delivers (far) better Compression Ratios than the Block Compression methods. Load Time increases from QUERY LOW (best) over QUERY HIGH and ARCHIVE LOW (both moderate) to ARCHIVE HIGH (longest Load Time). Query Performance decreases similarly from QUERY LOW (best) to ARCHIVE HIGH (longest Query Time). All HCC methods except ARCHIVE HIGH delivered better Query Performance than the uncompressed table, though. In short: HCC can most likely save large amounts of space in your Data Warehouse without decreasing Query Performance much, if at all, and it will probably compress your Archival Data significantly.


  1. #1 by Ignacio Ruiz on March 7, 2011 - 09:06

    “Also, the demo table is somewhat artificial and may therefore deliver better compression ratios compared to Real World tables”… you’re very right!!! I had one customer whose db had a huge 310+ column table, and the compression improvement from QUERY LOW to HIGH didn’t change the 10x factor, while moving up to the next level, ARCHIVE, only reached a 15x compression factor… thus compression results are very dependent on data sets.

  2. #2 by Uwe Hesse on March 14, 2011 - 16:48

    Ignacio,
    thank you for pointing this out – all I can say about it is: “I told you so” :-)

  3. #3 by Bhavik Desai on April 11, 2011 - 08:34

    Hi Uwe,

    Thanks for such wonderful series on exadata. I appreciate your efforts.

    I have a question on exadata and active standby db.
    I am using a quarter rack Exadata machine with EHCC on all my old partitions. I wanted to confirm whether I can create an active physical standby on a non-Exadata machine. If yes, how would Oracle store data on the physical standby? Compressed or non-compressed?

    Can you clear one more doubt ? I cannot have smart scan on reports running on active standby since they are on non-exadata….is that right ?

  4. #4 by Uwe Hesse on April 11, 2011 - 19:15

    Hi Bhavik,
    you’re welcome :-)
    To your questions:
    1) It is possible to have a non-Exadata Standby Database for an Exadata Primary, even with HCC Compression on the Primary. The HCC compressed columns are kept current on the Standby by Redo Apply – they cannot be accessed by select statements, though. This compromises the Availability of these tables in case of a failover accordingly – you probably should not go with that strategy
    2) Since Smart Scan is a feature of Exadata Cells, you cannot have it on the non Exadata Standby. You will have just Dumb Scans :-)

  5. #5 by Bhavik Desai on April 12, 2011 - 13:35

    Thanks Uwe for clarifying this.

    So, for my first question, there could be two scenarios
    1) Primary running on Exadata and Standby on traditional storage system

    In this case, all reports running on primary will get benefited by SMART SCAN. Also at storage level EHCC would be in effect.
    However, on standby, MRP will apply NON-COMPRESSED data and thus there will not be any compression. If standby is accessed by ACTIVE DATA GUARD, then also reports will not get any advantage of SMART SCAN as active standby is running on non-exadata platform.

    2) Primary is running on non-exadata platform and standby on exadata.
    In this case, There will not be any EHCC on primary as well as on standby as well. (?) If exadata standby is accessed as ACTIVE standby, then reports can take advantage of SMART SCAN…

    Am i right in understanding your point ?

    Regards,
    Bhavik Desai

  6. #6 by Uwe Hesse on April 12, 2011 - 20:35

    Bhavik, in 1), Redo Apply is done WITHOUT DECOMPRESSING. After a failover/switchover to the non Exadata site, you cannot access the tables with HCC compressed columns. Instead, YOU need to decompress them now. That is why I said it will compromise the availability of these tables.

    2) yes, I think so. Haven’t seen this configuration yet (non Exadata Primary and Exadata Physical Standby with Real-Time Query), though :-)

  7. #7 by John Clarke on May 17, 2011 - 06:15

    In your second query against comp_oltp, it looks as if more bytes are returned over the interconnect although size and block counts are less than the “normal” table. In fact, I’d bet that “cell IO uncompressed bytes” is less for the query against comp_oltp and “cell physical IO interconnect bytes” statistic is higher proportional to “something”, because my tests have shown the same thing. My question is this – what is the “something”? Is the cell un-compressing the OLTP blocks before sending over the interconnect, and if so, why would it do this?

    – John

  8. #8 by Uwe Hesse on May 18, 2011 - 13:06

    John,
    it is indeed so that the Cells do the decompression for HCC compressed tables as well as decryption for TDE encrypted data. So “offloading” does not only contain I/O but also CPU intensive operations. HCC decompression is in fact very CPU intensive, so it makes sense to offload that to the Cells. That is the technical reason why HCC was made exclusively available for Exadata – after a short period in 11gR1 where it was available on conventional Oracle also.

  9. #9 by Ballough on August 23, 2011 - 15:37

    Very informative posting. I’m disappointed though that you didn’t include a load time comparison with nologging OLTP compression, rather than subjecting only this type of compression to redo log overhead.

  10. #10 by Uwe Hesse on August 24, 2011 - 10:54

    Thank you, Ballough, for your comment! To your question: The population of the OLTP table in a real world scenario would probably be done with multiple sessions, inserting single rows over a very long time. Insofar, the load time comparison is somewhat artificial, and maybe I should have avoided mentioning it altogether :-)
    The important point to emphasize here was (in my opinion) that although we can’t assume a direct load on an OLTP table, the compression ratio is almost as good as with the BASIC method.

  11. #11 by Michael on September 27, 2011 - 10:20

    Hi Uwe,

    can HCC be combined with Secure Files Compression or will we get better results if we are using HCC instead of Secure Files compression for our LOBs?

    Thanks

    Michael

  12. #12 by JSM on October 10, 2011 - 19:03

    Secure Files Compression can do LOB compression/deduplication. HCC doesn’t.

  13. #13 by Anthony on November 20, 2011 - 08:05

    Hi.

    Can you please explain why the selects on the basic compressed and OLTP compressed tables returned approximately 3 times as much data as the select on the uncompressed original table?

    Thanks

    Anthony

  14. #14 by Uwe Hesse on November 20, 2011 - 21:54

    Anthony,
    I think that is because the Cells did not do the decompression for the BASIC and OLTP compressed tables.

    They DID the filtering according to the where condition (the data volume delivered by smart scan is the compressed blocks containing channel_id 1) but NOT the column projection for count(*) like they did for the uncompressed table.

  15. #15 by Marco on December 7, 2011 - 17:51

    Verry good.

  16. #16 by vnmaster on December 22, 2011 - 14:25

    Hi
    EHCC is used for compressing tables. How do we compress the indexes. Is there an option for compressing indexes?

  17. #17 by Mark on August 18, 2012 - 01:34

    All documentation and examples are for compressing tables. Can you address compressing indexes on Exadata.
    1. Is HCC available for indexes? How would you specify, all I see is a “compress” parameter with no additional detail. What type of compression is this enabling?
    2. Can you modify or rebuild a non-compressed index to compressed without dropping?

  18. #18 by Uwe Hesse on August 25, 2012 - 08:41

    1) No. That is the ordinary key compression that has been available since Oracle 8
    2) Yes. See http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_1010.htm#BABHFFHI

  19. #19 by Kumar Ramalingam on November 1, 2012 - 02:59

    Hi
    I was testing with EHCC compression and storage indexes. In my testing, first I compressed my subpartitions for query high and I ended up with 74K blocks for about 9M rows. I created another test table with the same structure of the previous (same storage attributes) and loaded the data from the partition mentioned above (that has 9M rows) with sorting on two columns. Now, after firing the query with the predicate on the sorted columns, I could get the storage indexes working perfectly.
    But, when I checked the number of blocks occupied for the partition that has the sorted data, it was 20K blocks more than the unsorted partition.
    Do you know, from your testing if it shows such a behavior? Would it be a bug? Thanks in advance.

    –Kumar Ramalingam

