How #Exadata benefits your Data Warehouse

A migration of your Data Warehouse to Exadata will most likely deliver the combined benefits of Hybrid Columnar Compression, Smart Scan and Storage Indexes. That means better performance with less storage space consumption. Let's look at an example:

SQL> desc sales_noexa
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 PRODUCT                                            CHAR(25)
 CHANNEL_ID                                         NUMBER
 CUST_ID                                            NUMBER
 AMOUNT_SOLD                                        NUMBER
 ORDER_DATE                                         DATE
 SHIP_DATE                                          DATE

SQL> select count(*) from sales_noexa;

  COUNT(*)
----------
  80000000

SQL> select sum(bytes)/1024/1024 as mb from user_segments where segment_name='SALES_NOEXA';

        MB
----------
      5520

This table is partitioned by ORDER_DATE with one partition per year, along the lines of the sketch below.
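
The CREATE TABLE statement for SALES_NOEXA is not shown in full; as a sketch, assume it matches the SALES_EXA DDL further down, just without the compression clauses:

create table sales_noexa (id number, product char(25), channel_id number, cust_id number, amount_sold number, order_date date, ship_date date)
partition by range (order_date)
(
partition p1990 values less than (to_date('01.01.1991','dd.mm.yyyy')),
partition p1991 values less than (to_date('01.01.1992','dd.mm.yyyy')),
-- ... one partition per year, exactly as in the SALES_EXA DDL below ...
partition p2017 values less than (to_date('01.01.2018','dd.mm.yyyy'))
);

Without Exadata, performance is as follows: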

SQL> select count(*),avg(amount_sold) from sales_noexa where channel_id=2;

  COUNT(*) AVG(AMOUNT_SOLD)
---------- ----------------
  16000000             5000

Elapsed: 00:00:11.06
SQL> select sum(amount_sold) from sales_noexa where ship_date = to_date('01.01.2011','dd.mm.yyyy');

SUM(AMOUNT_SOLD)
----------------
        40000000

Elapsed: 00:00:06.07
SQL> select sum(amount_sold) from sales_noexa where order_date = to_date('01.01.2011','dd.mm.yyyy');

SUM(AMOUNT_SOLD)
----------------
        40000000

Elapsed: 00:00:00.15

There is no index, so each statement above was executed as a Full Table Scan. The last one was much faster because of Partition Pruning: only the single 2011 partition had to be scanned, which the plan sketch below confirms.
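
One way to verify the pruning is to look at the execution plan; the Pstart/Pstop columns should point at a single partition (a quick sketch, actual plan output omitted here):

explain plan for
select sum(amount_sold) from sales_noexa
where order_date = to_date('01.01.2011','dd.mm.yyyy');

-- the plan should show PARTITION RANGE SINGLE with matching Pstart/Pstop
select * from table(dbms_xplan.display);

Now the same on Exadata: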

SQL> desc sales_exa
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 PRODUCT                                            CHAR(25)
 CHANNEL_ID                                         NUMBER
 CUST_ID                                            NUMBER
 AMOUNT_SOLD                                        NUMBER
 ORDER_DATE                                         DATE
 SHIP_DATE                                          DATE

SQL> select count(*) from sales_exa;

  COUNT(*)
----------
  80000000

SQL> select sum(bytes)/1024/1024 as mb from user_segments where segment_name='SALES_EXA';

        MB
----------
      1574

The table is way smaller although it contains the very same rows. That’s because I used Hybrid Columnar Compression to create this table:

create table sales_exa (id number, product char(25), channel_id number, cust_id number, amount_sold number, order_date date, ship_date date)
partition by range (order_date)
(
partition p1990 values less than (to_date('01.01.1991','dd.mm.yyyy')) compress for archive high,
partition p1991 values less than (to_date('01.01.1992','dd.mm.yyyy')) compress for archive high,
partition p1992 values less than (to_date('01.01.1993','dd.mm.yyyy')) compress for archive high,
partition p1993 values less than (to_date('01.01.1994','dd.mm.yyyy')) compress for archive high,
partition p1994 values less than (to_date('01.01.1995','dd.mm.yyyy')) compress for archive high,
partition p1995 values less than (to_date('01.01.1996','dd.mm.yyyy')) compress for archive high,
partition p1996 values less than (to_date('01.01.1997','dd.mm.yyyy')) compress for archive low,
partition p1997 values less than (to_date('01.01.1998','dd.mm.yyyy')) compress for archive low,
partition p1998 values less than (to_date('01.01.1999','dd.mm.yyyy')) compress for archive low,
partition p1999 values less than (to_date('01.01.2000','dd.mm.yyyy')) compress for archive low,
partition p2000 values less than (to_date('01.01.2001','dd.mm.yyyy')) compress for archive low,
partition p2001 values less than (to_date('01.01.2002','dd.mm.yyyy')) compress for query high,
partition p2002 values less than (to_date('01.01.2003','dd.mm.yyyy')) compress for query high,
partition p2003 values less than (to_date('01.01.2004','dd.mm.yyyy')) compress for query high,
partition p2004 values less than (to_date('01.01.2005','dd.mm.yyyy')) compress for query high,
partition p2005 values less than (to_date('01.01.2006','dd.mm.yyyy')) compress for query high,
partition p2006 values less than (to_date('01.01.2007','dd.mm.yyyy')) compress for query low,
partition p2007 values less than (to_date('01.01.2008','dd.mm.yyyy')) compress for query low,
partition p2008 values less than (to_date('01.01.2009','dd.mm.yyyy')) compress for query low,
partition p2009 values less than (to_date('01.01.2010','dd.mm.yyyy')) compress for query low,
partition p2010 values less than (to_date('01.01.2011','dd.mm.yyyy')) compress for query low,
partition p2011 values less than (to_date('01.01.2012','dd.mm.yyyy')),
partition p2012 values less than (to_date('01.01.2013','dd.mm.yyyy')),
partition p2013 values less than (to_date('01.01.2014','dd.mm.yyyy')),
partition p2014 values less than (to_date('01.01.2015','dd.mm.yyyy')),
partition p2015 values less than (to_date('01.01.2016','dd.mm.yyyy')),
partition p2016 values less than (to_date('01.01.2017','dd.mm.yyyy')),
partition p2017 values less than (to_date('01.01.2018','dd.mm.yyyy'))
);
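
To double-check which compression type each partition actually got, the data dictionary can be queried like this (a small sanity-check sketch):

select partition_name, compress_for
from   user_tab_partitions
where  table_name = 'SALES_EXA'
order  by partition_position;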

The older the partition, the stronger the compression; that saved about 4 GB of storage space in this case (5520 MB vs. 1574 MB). How about the performance?

SQL> set timing on
SQL> select count(*),avg(amount_sold) from sales_exa where channel_id=2;

  COUNT(*) AVG(AMOUNT_SOLD)
---------- ----------------
  16000000             5000

Elapsed: 00:00:05.83
SQL> select sum(amount_sold) from sales_exa where order_date = to_date('01.01.2011','dd.mm.yyyy');

SUM(AMOUNT_SOLD)
----------------
        40000000

Elapsed: 00:00:00.14
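
To confirm that a Smart Scan actually kicked in, the session statistics can be checked right after the scan (a sketch, output omitted; exact statistic names can vary slightly between Exadata software versions):

-- run immediately after the full scan in the same session
select name, value/1024/1024 as mb
from   v$statname natural join v$mystat
where  name in ('cell physical IO bytes eligible for predicate offload',
                'cell physical IO interconnect bytes returned by smart scan');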

The first Full Table Scan is now roughly twice as fast because of Smart Scan, while Partition Pruning shows about the same effect as before. The real whopper is this one:

SQL> select sum(amount_sold) from sales_exa where ship_date = to_date('01.01.2011','dd.mm.yyyy');

SUM(AMOUNT_SOLD)
----------------
        40000000

Elapsed: 00:00:00.22

That took more than 6 seconds without Exadata! Why is it so much faster now, almost as fast as with Partition Pruning?

SQL> connect adam/adam
Connected.

SQL> select sum(amount_sold) from sales_exa where ship_date = to_date('01.01.2011','dd.mm.yyyy');

SUM(AMOUNT_SOLD)
----------------
        40000000

SQL> select name,value/1024/1024 as mb from v$statname
     natural join v$mystat where name='cell physical IO bytes saved by storage index';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1385.64844

I reconnected to reset v$mystat. The above shows that we not only did less I/O because compression shrank the table, and not only benefited from a Smart Scan, but that additionally a Storage Index was used to further limit the amount of data the cells had to scan through. Storage Indexes keep min/max values per storage region on the cells, so regions whose SHIP_DATE range cannot contain the filter value are skipped entirely. That works so well here because the partitioning on ORDER_DATE effectively sorts the data on disk not only by ORDER_DATE but also by SHIP_DATE, since SHIP_DATE is usually very close to ORDER_DATE. This physical ordering of the data is the foundation for the successful use of Storage Indexes.

Overall, space consumption went down while performance went up 🙂

The above demonstration was done on our Accenture Enkitec Exadata lab environment, using an X3-2 quarter rack. A more recent Exadata generation, or a half or full rack, would most likely deliver even better results. The tests with SALES_NOEXA were done with CELL_OFFLOAD_PROCESSING=FALSE. That way, I get the hardware-related performance of an Exadata machine without the secret sauce of the Exadata software on the cells that enables Smart Scan in particular.
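
For reference, that switch is an ordinary parameter and can be set per session:

-- disable Smart Scan offloading on the cells for this session only
alter session set cell_offload_processing = false;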

Now, does all of this apply only to Data Warehouse systems? At least to some degree, the answer is yes: Hybrid Columnar Compression does not cope well with frequent updates, so you wouldn't want to use it for a table with an Online Transaction Processing (OLTP) access pattern. You may have some rather static tables in an OLTP system that benefit from it, though. Smart Scan requires Full Table Scans or Full Index Scans done as Direct Reads. Again, that is not the typical access pattern of an OLTP system, although it can happen there in some situations. The same goes for Storage Indexes: they can only come into play on behalf of a Smart Scan.
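
If you do have such a static table in an OLTP system, HCC can be applied after the fact. A minimal sketch with hypothetical table and index names; note that a MOVE invalidates the table's indexes, which then need a rebuild:

-- compress an existing, rarely updated table with HCC
alter table static_lookup move compress for query high;

-- indexes become UNUSABLE after the move and must be rebuilt
alter index static_lookup_pk rebuild;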
