Exadata Part III: Compression

With this posting, we look at Hybrid Columnar Compression, a New Feature that is only available with the Exadata Database Machine and offers stunning compression ratios together with high query speed. By now, we have no fewer than 6 different kinds of Table Data Compression methods:

  1. BASIC compression, introduced back in Oracle 9i Release 2 and recommended for Data Warehouses only
  2. OLTP compression, introduced with Oracle 11g and recommended for OLTP Databases as well
  3. QUERY LOW compression (Exadata only), recommended for Data Warehouses where Load Time is a critical factor
  4. QUERY HIGH compression (Exadata only), recommended for Data Warehouses with a focus on Space Saving
  5. ARCHIVE LOW compression (Exadata only), recommended for Archival Data where Load Time is a critical factor
  6. ARCHIVE HIGH compression (Exadata only), recommended for Archival Data with maximum Space Saving

Methods 1 and 2 work by compressing the rows inside of Oracle Blocks – the more redundant values inside the block, the better the compression ratio. OLTP compression may well be used with the Exadata Database Machine, because Exadata V2 is designed to support OLTP as well as Data Warehouse workloads. Methods 3 to 6 use Hybrid Columnar Compression (HCC), which is designed to achieve better compression ratios by organizing sets of rows into Compression Units and then detaching the columns from the rows and storing them together, column by column. Since the same column tends to hold similar (redundant) data, the compression ratio improves with HCC.
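
Whether a table has a compression attribute set, and which one, can be seen in the data dictionary. A minimal sketch (run it after the test tables below have been created; the COMPRESSION and COMPRESS_FOR columns are available as of 11g Release 2):

SQL> select table_name, compression, compress_for
from user_tables
where table_name like 'COMP%';

Keep in mind that this shows only the attribute that applies to future (direct path) loads – rows that were already in the table before the attribute was changed keep their old format.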

I will give examples for the 6 compression methods now. Keep in mind that I have only limited access to our Oracle University Exadata Database Machine, restricting me to just one Database Server and two Storage Servers – speed on a Full Rack will probably be faster. Also, the demo table is somewhat artificial and may therefore deliver better compression ratios compared to Real World tables.

In short: Although you should get the picture from the examples, it will be different at your site, and it requires thorough testing before you take any kind of compression into production.
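
A good starting point for that testing is the Compression Advisor in the DBMS_COMPRESSION package (available in 11.2), which samples the table and estimates the compression ratio without compressing it as a whole. A minimal sketch – the scratch tablespace USERS is just an assumption, adjust it to your environment:

SQL> set serveroutput on
SQL> declare
  v_blkcnt_cmp   pls_integer;
  v_blkcnt_uncmp pls_integer;
  v_row_cmp      pls_integer;
  v_row_uncmp    pls_integer;
  v_cmp_ratio    number;
  v_comptype_str varchar2(100);
begin
  dbms_compression.get_compression_ratio(
    scratchtbsname => 'USERS',  -- scratch tablespace for the temporary sample objects (assumption)
    ownname        => user,
    tabname        => 'SALES',
    partname       => null,
    comptype       => dbms_compression.comp_for_query_high,
    blkcnt_cmp     => v_blkcnt_cmp,
    blkcnt_uncmp   => v_blkcnt_uncmp,
    row_cmp        => v_row_cmp,
    row_uncmp      => v_row_uncmp,
    cmp_ratio      => v_cmp_ratio,
    comptype_str   => v_comptype_str);
  dbms_output.put_line(v_comptype_str || ' estimated compression ratio: ' || round(v_cmp_ratio,1));
end;
/

Changing the comptype constant to comp_for_oltp, comp_for_query_low, comp_for_archive_low or comp_for_archive_high gives estimates for the other methods.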

The (uncompressed) demonstration table is the same as in the previous posting and was created with this statement:

SQL> create table sales as
select
rownum as id,
rownum + 1 as flag,
'Oracle Enterprise Edition' as product,
mod(rownum,5) as channel_id,
mod(rownum,1000) as cust_id ,
5000 as amount_sold,
trunc(sysdate - 10000 + mod(rownum,10000))
as order_date,
trunc(sysdate - 9999 + mod(rownum,10000))
as ship_date
from dual connect by level<=2e7;

It is 1.5 GB in size. Let’s see how small we can compress it and how fast we can load!

BASIC method

SQL> create table comp_classic compress as select * from sales where 1=2;
Table created.
SQL> alter table comp_classic nologging;
Table altered.
SQL> insert /*+ append */ into comp_classic select * from sales;
20000000 rows created.

Elapsed: 00:00:39.80
SQL> commit;
Commit complete.

OLTP method. Notice that we cannot assume direct path loads in an OLTP environment, therefore a conventional insert is used:

SQL> create table comp_oltp compress for oltp as select * from sales where 1=2;
Table created.
SQL> insert into comp_oltp select * from sales;
20000000 rows created.

Elapsed: 00:02:16.56
SQL> commit;
Commit complete.

QUERY LOW method. We expect this method to have the fastest Load Time:

SQL> create table comp_query_low compress for query low as select * from sales where 1=2;
Table created.
SQL> alter table comp_query_low nologging;
Table altered.
SQL> insert /*+ append */ into comp_query_low select * from sales;
20000000 rows created.

Elapsed: 00:00:26.96
SQL> commit;
Commit complete.

QUERY HIGH method

SQL> create table comp_query_high compress for query high as select * from sales where 1=2;
Table created.
SQL> alter table comp_query_high nologging;
Table altered.

SQL> insert /*+ append */ into comp_query_high select * from sales;
20000000 rows created.

Elapsed: 00:00:33.99
SQL> commit;
Commit complete.

ARCHIVE LOW method

SQL> create table comp_archive_low compress for archive low as select * from sales where 1=2;
Table created.
SQL> alter table comp_archive_low nologging;
Table altered.

SQL> insert /*+ append */ into comp_archive_low select * from sales;
20000000 rows created.

Elapsed: 00:00:35.14
SQL> commit;
Commit complete.

ARCHIVE HIGH method. We expect this method to have the slowest Load Time and the best Compression Ratio:

SQL> create table comp_archive_high compress for archive high as select * from sales where 1=2;
Table created.
SQL> alter table comp_archive_high nologging;
Table altered.

SQL> insert /*+ append */ into comp_archive_high select * from sales;
20000000 rows created.

Elapsed: 00:04:13.13
SQL> commit;
Commit complete.

That was indeed slow! Now let’s look at the outcome:

SQL> select segment_name,bytes/1024/1024 as mb from user_segments order by 1;

SEGMENT_NAME                           MB
------------------------------ ----------
COMP_ARCHIVE_HIGH                       5
COMP_ARCHIVE_LOW                        8
COMP_CLASSIC                          736
COMP_OLTP                             864
COMP_QUERY_HIGH                         8
COMP_QUERY_LOW                        304
SALES                              1535.5

7 rows selected.

We have condensed 1.5 GB into 5 MB with the ARCHIVE HIGH method!

Summary: Except for ARCHIVE HIGH, we have faster Load Times with HCC than with Block Compression. Every HCC method delivers a better compression ratio than the Block Compression methods. Now how about Query Times?

SQL> set timing on
SQL> connect adam/adam
Connected.
SQL> select count(*) from sales where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:02.07
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan        46.807457

We take the FTS (with Smart Scan) on the uncompressed table as the Baseline. I always reconnect in order to reset v$mystat.
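
Whether a statement was really offloaded can be cross-checked with a second cell statistic. A minimal sketch, to be run in the same session right after the test query (statistic names as of 11.2):

SQL> select name, round(value/1024/1024,2) as mb
from v$statname natural join v$mystat
where name in ('cell physical IO bytes eligible for predicate offload',
'cell physical IO interconnect bytes returned by smart scan');

If the 'eligible for predicate offload' value stays at zero, no Smart Scan took place for that statement. Now the Query Performance of BASIC compression: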

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_classic where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.79
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan       123.650536

Runtime was faster than on the uncompressed table; Smart Scan was possible. Now OLTP compression:

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_oltp  where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.80
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan        130.22644

Ditto. Now QUERY LOW (HCC) compression:

SQL> connect adam/adam
Connected.
SQL>  select count(*) from comp_query_low where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:00.86
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan       2.68922424

Fastest Runtime yet. Smart Scan was possible. Now QUERY HIGH (HCC):

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_query_high where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.27
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Still faster than uncompressed, although not using Smart Scan. Now ARCHIVE LOW (HCC):

SQL>  connect adam/adam
Connected.
SQL>  select count(*) from comp_archive_low where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:01.13
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Faster than uncompressed (and surprisingly faster than QUERY HIGH) without Smart Scan. Finally ARCHIVE HIGH (HCC):

SQL> connect adam/adam
Connected.
SQL> select count(*) from comp_archive_high where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:02.93
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name = 'cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

Slowest Query Performance (the only one slower than uncompressed in this experiment) without Smart Scan.

Conclusion: HCC delivers (far) better Compression Ratios than the Block Compression methods. Load Time increases from QUERY LOW (best) over QUERY HIGH and ARCHIVE LOW (both moderate) to ARCHIVE HIGH (longest Load Time). Query Performance decreases similarly from QUERY LOW (best) to ARCHIVE HIGH (longest Query Time). All HCC methods except ARCHIVE HIGH delivered better Query Performance than the uncompressed table, though. In short: HCC will most likely be able to save large amounts of space in your Data Warehouse without hurting Query Performance much, if at all, and it will probably compress your Archival Data significantly.
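
If you want to apply HCC to a table that already exists (instead of loading into a new one as above), this can be done with ALTER TABLE … MOVE. A minimal sketch against the demo table – keep in mind that the MOVE rebuilds the segment and leaves any indexes on the table in UNUSABLE state, so they have to be rebuilt afterwards:

SQL> alter table sales move compress for query high;

For partitioned tables, the same works on the partition level, which allows you for example to keep recent partitions in QUERY LOW and move older partitions to one of the ARCHIVE methods over time.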


  1. #1 by Ignacio Ruiz on March 7, 2011 - 09:06

    „Also, the demo table is somewhat artificial and may therefore deliver better compression ratios compared to Real World tables“… you’re very right!!! I had one customer whose db had this huge 310+ column table, and the compression improvement from QUERY LOW to HIGH didn’t change the 10x factor, while moving up to the next level, ARCHIVE, only jumped to a 15x compression factor… thus compression results are very dependent on data sets.

  2. #2 by Uwe Hesse on March 14, 2011 - 16:48

    Ignacio,
    thank you for pointing this out – all I can say about it is: „I told you so“ 🙂

  3. #3 by Bhavik Desai on April 11, 2011 - 08:34

    Hi Uwe,

    Thanks for such wonderful series on exadata. I appreciate your efforts.

    I have a question on Exadata and an active standby db.
    I am using a quarter rack Exadata machine with EHCC on all my old partitions. I wanted to confirm whether I can create an active physical standby on a non-Exadata machine? If yes, how would Oracle store the data on the physical standby? Compressed or non-compressed?

    Can you clear up one more doubt? I cannot have Smart Scan for reports running on the active standby since it is on non-Exadata… is that right?

  4. #4 by Uwe Hesse on April 11, 2011 - 19:15

    Hi Bhavik,
    you’re welcome 🙂
    To your questions:
    1) It is possible to have a non Exadata Standby Database for an Exadata Primary, even with HCC Compression on the Primary. The HCC compressed columns get updated on the Standby by Redo Apply – they cannot be accessed by select statements, though. This will compromise the Availability of these tables in case of a failover accordingly – you probably should not go with that strategy.
    2) Since Smart Scan is a feature of Exadata Cells, you cannot have it on the non Exadata Standby. You will have just Dumb Scans 🙂

  5. #5 by Bhavik Desai on April 12, 2011 - 13:35

    Thanks Uwe for clarifying this.

    So, for my first question, there could be two scenarios
    1) Primary running on Exadata and Standby on traditional storage system

    In this case, all reports running on the primary will benefit from SMART SCAN. Also, at the storage level, EHCC would be in effect.
    However, on the standby, MRP will apply NON-COMPRESSED data and thus there will not be any compression. If the standby is accessed via ACTIVE DATA GUARD, reports will also not get any advantage from SMART SCAN, as the active standby is running on a non-Exadata platform.

    2) Primary is running on a non-Exadata platform and the standby on Exadata.
    In this case, there will not be any EHCC on the primary nor on the standby. (?) If the Exadata standby is accessed as an ACTIVE standby, then reports can take advantage of SMART SCAN…

    Am I right in understanding your point?

    Regards,
    Bhavik Desai

  6. #6 by Uwe Hesse on April 12, 2011 - 20:35

    Bhavik, in 1), Redo Apply is done WITHOUT DECOMPRESSING. After a failover/switchover to the non Exadata site, you cannot access the tables with HCC compressed columns. Instead, YOU need to decompress them now. That is why I said it will compromise the availability of these tables.

    2) yes, I think so. Haven’t seen this configuration yet (non Exadata Primary and Exadata Physical Standby with Real-Time Query), though 🙂

  7. #7 by John Clarke on May 17, 2011 - 06:15

    In your second query against comp_oltp, it looks as if more bytes are returned over the interconnect although size and block counts are lower than for the „normal“ table. In fact, I’d bet that „cell IO uncompressed bytes“ is lower for the query against comp_oltp and the „cell physical IO interconnect bytes“ statistic is higher in proportion to „something“, because my tests have shown the same thing. My question is this – what is the „something“? Is the cell un-compressing the OLTP blocks before sending them over the interconnect, and if so, why would it do this?

    – John

  8. #8 by Uwe Hesse on May 18, 2011 - 13:06

    John,
    it is indeed the case that the Cells do the decompression for HCC compressed tables as well as the decryption of TDE encrypted data. So „offloading“ covers not only I/O but also CPU intensive operations. HCC decompression is in fact very CPU intensive, so it makes sense to offload it to the Cells. That is the technical reason why HCC was made exclusively available for Exadata – after a short period in 11gR1 where it was available on conventional Oracle also.

  9. #9 by Ballough on August 23, 2011 - 15:37

    Very informative posting. I’m disappointed though that you didn’t include a load time comparison with nologging OLTP compression, rather than subjecting only this type of compression to redo log overhead.

  10. #10 by Uwe Hesse on August 24, 2011 - 10:54

    Thank you, Ballough, for your comment! To your question: The population of the OLTP table in a real world scenario would probably be done with multiple sessions, inserting single rows over a very long time. In that respect, the load time is somewhat meaningless and maybe I should have avoided mentioning it altogether 🙂
    The important point to emphasize here was (in my opinion) that although we can’t assume a direct load on an OLTP table, the compression ratio is similarly good as with the BASIC method.

  11. #11 by Michael on September 27, 2011 - 10:20

    Hi Uwe,

    can HCC be combined with Secure Files Compression or will we get better results if we are using HCC instead of Secure Files compression for our LOBs?

    Thanks

    Michael

  12. #12 by JSM on October 10, 2011 - 19:03

    Secure Files Compression can do LOB compression/deduplication. HCC doesn’t.

  13. #13 by Anthony on November 20, 2011 - 08:05

    Hi.

    Can you please explain why the selects on the basic compressed and OLTP compressed tables returned approximately 3 times as much data as the select on the uncompressed original table?

    Thanks

    Anthony

  14. #14 by Uwe Hesse on November 20, 2011 - 21:54

    Anthony,
    I think that is because the Cells did not do the decompression for the BASIC and OLTP compressed tables.

    They DID the filtering according to the where condition (the data volume delivered by smart scan is the compressed blocks containing channel_id 1) but NOT the column projection for count(*) like they did for the uncompressed table.

  15. #15 by Marco on December 7, 2011 - 17:51

    Very good.

  16. #16 by vnmaster on December 22, 2011 - 14:25

    Hi
    EHCC is used for compressing tables. How do we compress the indexes? Is there an option for compressing indexes?

  17. #17 by Mark on August 18, 2012 - 01:34

    All documentation and examples are for compressing tables. Can you address compressing indexes on Exadata?
    1. Is HCC available for indexes? How would you specify it? All I see is a „compress“ parameter with no additional detail. What type of compression is this enabling?
    2. Can you modify or rebuild a non-compressed index to compressed without dropping?

  18. #18 by Uwe Hesse on August 25, 2012 - 08:41

    1) No. That is the ordinary key compression that has been available since Oracle 8
    2) Yes. See http://docs.oracle.com/cd/E11882_01/server.112/e26088/statements_1010.htm#BABHFFHI

  19. #19 by Kumar Ramalingam on November 1, 2012 - 02:59

    Hi
    I was testing with EHCC compression and storage indexes. In my testing, I first compressed my subpartitions for query high and ended up with 74K blocks for about 9M rows. I created another test table with the same structure as the previous one (same storage attributes) and loaded the data from the partition mentioned above (that has 9M rows) with sorting on two columns. Now, after firing the query with the predicate on the sorted columns, I could get the storage indexes working perfectly.
    But when I checked the number of blocks occupied by the partition that has the sorted data, it was 20K blocks more than for the unsorted partition.
    Do you know from your testing if it shows such a behavior? Would it be a bug? Thanks in advance.

    –Kumar Ramalingam

  20. #20 by Mike Ault on December 2, 2014 - 18:58

    I realize this is an old blog post, but I wanted to ask a question: using AWR data only, how can you determine the bandwidth used when HCC is used?

  21. #21 by sshdba on January 9, 2015 - 22:32

    Reblogged this on Easy Oracle DBA.

  22. #22 by Nitin Kashiv on June 22, 2015 - 13:24

    Hello dear
    I like this post very much and would like to be a member/follower – please add me.

    Regards
    Nitin

  23. #23 by Uwe Hesse on June 30, 2015 - 07:14

    Nitin, you can easily subscribe yourself: Just click the subscribe button in the upper left which says: ‚Sign me up!‘

  24. #24 by Mike Kim on August 15, 2015 - 20:26

    Uwe,

    I am trying to find a reference for the fact that decompression for Basic and OLTP does occur in the exadata cells and not in the database servers. I have read conflicting information and need some guidance.

  25. #25 by Uwe Hesse on August 17, 2015 - 13:35

    Mike, to my knowledge, decompression can only be offloaded for hybrid columnar compression.

  26. #26 by Mike Kim on August 17, 2015 - 18:18

    Uwe
    Thanks for the prompt reply. I thought the same way and wanted some opinion from the master.

    Mike

  27. #27 by FAM on February 7, 2017 - 12:19

    Hi Uwe

    In my environment, HCC is enabled for some tables with the QUERY HIGH option and for others with the OLTP option. May I know how to disable this compression?

  28. #28 by Uwe Hesse on February 7, 2017 - 13:25

    FAM, you have to do ALTER TABLE … MOVE NOCOMPRESS in order to remove any compression, including HCC
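
    A minimal sketch of what that could look like (the table is one from the examples above, your table names will differ); as with any ALTER TABLE … MOVE, indexes on the table become UNUSABLE and have to be rebuilt with ALTER INDEX … REBUILD afterwards:

    SQL> alter table comp_query_high move nocompress;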

