Exadata Part II: Storage Index

Another important feature of the Exadata Database Machine that makes it more than just a collection of High-End Hardware are Storage Indexes. Storage Indexes are constructed automatically inside of the memory of the Storage Servers when Database segments (like heap tables) are being scanned. The Storage Cells are divided into 1 MB chunks and inside of these chunks, we record the minimum and maximum values of the columns of the table that allocated space inside that chunk. It is very well possible that these Storage Indexes make many of your “conventional indexes” obsolete. Look at following drastic example:

SQL> connect adam/adam
Connected.
SQL> select bytes/1024/1024/1024 as gb from user_segments where segment_name='SALES';

 GB
----------
 1.5

This demonstration table was created with the following statement

SQL> create table sales as
select
rownum as id,
rownum + 1 as flag,
'Oracle Enterprise Edition' as product,
mod(rownum,5) as channel_id,
mod(rownum,1000) as cust_id ,
5000 as amount_sold,
trunc(sysdate - 10000 + mod(rownum,10000))
as order_date,
trunc(sysdate - 9999 + mod(rownum,10000))
as ship_date
from dual connect by level<=2e7;

I needed to put WORKAREA_SIZE_POLICY=MANUAL and I specified a SORT_AREA_SIZE=1048576000 to be able to do the connect by level<=2e7, by the way. That is because of the limitation of 200M pga memory for a single session when using PGA_AGGREGATE_TARGET. Anyway, please notice that id has unique values and there is no conventional index present.

SQL> set timing on
SQL> select * from sales where id=4711;

 ID       FLAG PRODUCT                   CHANNEL_ID    CUST_ID AMOUNT_SOLD ORDER_DAT SHIP_DATE
---------- ---------- ------------------------- ---------- ---------- ----------- --------- ---------
 4711       4712 Oracle Enterprise Edition          1        711        5000 28-JUL-96 29-JUL-96

Elapsed: 00:00:00.11

I have just scanned 1.5 Gigabyte of data in fractions of a second! [Previously said in 11 Milliseconds, but that was of course a mistake – apart from SQL*Plus not being able to produce that exact timing, I just got the unit wrong] That was possible, because the Storage Cells knew that they could exclude 1499 chunks from scanning. 4711 was only possibly present inside one of the chunks. Please notice that that was not a Smart Scan in the first place – we did not offload much I/O on the Storage Layer. Instead, we avoided I/O almost completely:

SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');   2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1483.67969
cell physical IO interconnect bytes returned by smart scan       .001525879

To show the difference, I will do a Smart Scan. I connect again to initialize v$mystat

SQL> connect adam/adam
Connected.
SQL> select count(*) from sales where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:02.04
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');  2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan       46.7903976

In this case, we scanned 1.5 Gig on the Storage Layer but delivered only 46 MB to the Database Layer. Now back to the Storage Index. Recently, I encountered the opinion that Storage Indexes would not work together with Bind Variables for some reason. That is not the case:

SQL> connect adam/adam
Connected.
SQL> vari n number
SQL> exec :n:=4711

PL/SQL procedure successfully completed.

SQL> set timing on
SQL> select * from sales where id=:n;

 ID       FLAG PRODUCT                   CHANNEL_ID    CUST_ID AMOUNT_SOLD ORDER_DAT SHIP_DATE
---------- ---------- ------------------------- ---------- ---------- ----------- --------- ---------
 4711       4712 Oracle Enterprise Edition          1        711        5000 28-JUL-96 29-JUL-96

Elapsed: 00:00:00.04
SQL>  select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');  2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1483.67969
cell physical IO interconnect bytes returned by smart scan       .001525879

The Storage Index clearly could get used. Same with implicit datatype conversion:

SQL> vari c char(4)
SQL> exec :c:='4711'
SQL>  select * from sales where id=:c;

 ID       FLAG PRODUCT                   CHANNEL_ID    CUST_ID AMOUNT_SOLD ORDER_DAT SHIP_DATE
---------- ---------- ------------------------- ---------- ---------- ----------- --------- ---------
 4711       4712 Oracle Enterprise Edition          1        711        5000 28-JUL-96 29-JUL-96

Elapsed: 00:00:00.03
SQL>  select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');  2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    2967.35938
cell physical IO interconnect bytes returned by smart scan       .003051758

It would have surprised me a lot if Bind Variables would introduce any issues with Storage Indexes, because they are processed on the Database Layer. In fact, from the Optimizers viewpoint all these statements lead to the same execution plan: Full Table Scan. There could be an issue (but not exclusively related to Bind Variables), if we would have conventional indexes present here as well, because the Optimizer might favor them over the Storage Index. The Optimizer is not aware of Storage Indexes. Before I get another comment from Kevin suggesting to show joins with the example, let’s just do that:

SQL> select * from channels;

CHANNEL_ID CHANNEL_DESC
---------- ----------------------------------------
 0 Telesales
 1 Web
 2 Field
 3 Retail
 4 Promotion
 99 Special
SQL> create table customers as select distinct cust_id, 'John Doe' as cust_name from sales;

Table created.

Channels and Customers are now two dimension tables.

SQL> set timing on
SQL> select product,channel_desc,cust_name
from sales natural join channels
natural join customers
where id=4711;
 2    3    4
PRODUCT                   CHANNEL_DESC                             CUST_NAM
------------------------- ---------------------------------------- --------
Oracle Enterprise Edition Web                                      John Doe

Elapsed: 00:00:00.06
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');  2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1483.67969
cell physical IO interconnect bytes returned by smart scan       .001495361

Storage Indexes can obviously be used in this case as well. When we select on columns with non unique values, it depends on how well clustered the values are, whether it is beneficial to use a Storage Index. The values of order_date column of the sales table are spread across the whole table, which makes the usage of a Storage Index impossible to speed up the query:

SQL> connect adam/adam
Connected.
SQL> set timing on
SQL> select count(*) from sales where order_date=to_date('11.03.2009','dd.mm.yyyy');

 COUNT(*)
----------
 2000

Elapsed: 00:00:02.30
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');
 2    3    4
NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan       .672538757

When we create the same table, but ordered by order_date, this changes:

SQL> create table sales_order as select * from sales where 1=2;
Table created.
SQL> alter table sales_order nologging;
Table altered.
SQL> insert /*+ append */ into sales_order select * from sales order by order_date;

20000000 rows created.
SQL> commit;

Commit complete.

Now we need to access the new table multiple times to get the Storage Index built:

SQL> connect adam/adam
Connected.
SQL>  select count(*) from sales_order where order_date=to_date('11.03.2009','dd.mm.yyyy');

 COUNT(*)
----------
 2000

Elapsed: 00:00:02.02
SQL>  select count(*) from sales_order where order_date=to_date('11.03.2009','dd.mm.yyyy');

 COUNT(*)
----------
 2000

Elapsed: 00:00:01.73
SQL>  select count(*) from sales_order where order_date=to_date('11.03.2009','dd.mm.yyyy');

 COUNT(*)
----------
 2000

Elapsed: 00:00:00.03
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');  2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1488.77344
cell physical IO interconnect bytes returned by smart scan       .506607056

Now the Storage has “learned” that it is beneficial to have a Storage Index on the sales_order table. This index will be kept as long as the Cells are up. The Storage Index is created over all the table_columns, apparently. Please notice that I now access the column ship_date (strongly correlated to order_date) for the first time:

SQL> select count(*) from sales_order where ship_date=to_date('11.03.2009','dd.mm.yyyy');

 COUNT(*)
----------
 2000

Elapsed: 00:00:00.37
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');   2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    2765.77344
cell physical IO interconnect bytes returned by smart scan       .562469482

Conclusion: Storage Indexes are another Smart Storage feature of Exadata Database Machine. They may make many conventional indexes obsolete. Although they are created and maintained automatically by the Storage Servers, it is important to know their behavior. Especially, it is relevant that (non unique) column values that we later on query for ocurr ordered to be able to benefit from Storage Indexes.

, ,

  1. #1 by Bhavik Desai on January 25, 2011 - 08:39

    Thanks for your valuable test results. It really puts lot of values at times when nothing concrete is available on Exa-Data. End user community (Like me) who has not started working on EXA-DATA are getting benefited a lot out of such posts.

    “Although they are created and maintained automatically by the Storage Servers”…

    May i ask you the overhead of maintaining such indexes dynamically ? Is there statistic that we can watch to keep an eye on cost of such index maintenance ? Have you any tested anything on how storage server maintains such storage indexes upon DELETE ? I am more curious to know how it updates min/max values in 1MB storage regions?

    Other question is : Why just 1MB? Do DBAs given any liberty to change size of such storage regions?

  2. #2 by Surachart Opun on March 1, 2011 - 20:14

    help me for Storage Index. I tested but no see “cell physical IO bytes saved by storage index” value

    NAME MB
    —————————————————————- ———-
    cell physical IO bytes saved by storage index 0
    cell physical IO interconnect bytes returned by smart scan 466.047287

    What should I do?

    Thank You

  3. #3 by Surachart Opun on March 2, 2011 - 06:41

    table on tablespace (griddisk type flash):
    NAME MB
    —————————————————————- ———-
    cell physical IO bytes saved by storage index 0
    cell physical IO interconnect bytes returned by smart scan 3.83892822

    table on tablespace (griddisk type hardisk):
    NAME MB
    —————————————————————- ———-
    cell physical IO bytes saved by storage index 16723.6563
    cell physical IO interconnect bytes returned by smart scan .453071594

  4. #4 by Surachart Opun on March 2, 2011 - 06:55

    After I waited and waited, then login again:
    On table at tablespace (griddisk type flash):

    NAME MB
    —————————————————————- ———-
    cell physical IO bytes saved by storage index 11920.8281
    cell physical IO interconnect bytes returned by smart scan .188720703

    I think… this feature support on all disk type… but on (griddisk type hardisk):
    we will see “cell physical IO bytes saved by storage index” immediately.
    and on (griddisk type flash) – just wait … and login again for testing.

  5. #5 by Uwe Hesse on March 14, 2011 - 16:38

    Surachart,
    thank you for this interesting bit of research🙂

  6. #6 by Surachart Opun on March 14, 2011 - 16:47

    just waited few minute, before I saw “cell physical IO bytes saved by storage index”

  7. #7 by Ram Santosh on August 28, 2013 - 01:24

    Uwe,

    Good Day..

    We had gone through an Emergency Preventive Maintainence (EPM) activity. We found that the loads were running very slow for few days… I know that when the exadata machine is down, the storage indexes will vanish.. Does this is the primary reason for performance issues or anything else… If it is so.. Is there any other way to come out of it or make the performance impact less… Do kindly suggest…

  8. #8 by Uwe Hesse on August 30, 2013 - 14:23

    Ram Santosh, it is true that Storage Indexes live in the memory of the Cells and get lost when the cells are restarted. That may have a performance impact, but I don’t think that loads (bulk inserts) should be affected by the absence of Storage Indexes.

  9. #9 by Ram Santosh on August 30, 2013 - 14:32

    Thanks for the post Uwe,

    When ever we go for EPM activity, we are getting serious performance issues and were unable to meet SLA’s for few days. After doing some googling, i found that whenever an EPM activity, the performance goes down. Is that so? If it goes down, are there any preventive measures we can take up to reduce these performance issues.

  10. #10 by Uwe Hesse on August 30, 2013 - 14:42

    When you power down a whole Rack on behalf of the EPM, it is quite natural to get bad performance after a restart since the caches and shared pools of all the DB servers are empty again. Same goes for the memory of the cells. I don’t see much you can do against it – except keep the Rack up🙂

  11. #11 by rafi on August 3, 2014 - 09:28

    thanks for Knowlege sharing. i am new to Exadata. can you please provide me the docs for reference.

  12. #12 by robin chatterjee on February 21, 2015 - 11:45

    Hi Uwe, I have an issue with the statement “Please notice that that was not a Smart Scan in the first place – we did not offload much I/O on the Storage Layer. Instead, we avoided I/O almost completely” A query that uses idb protocol and pushes down the predicate is attempting a smartscan. Whether the smartscan is successful in reducing i/o or not is another question. thus if predicate pushdown has occurred it is a smartscan no matter what magnitude of i/o is saved by the same. if you set cell_offload_capable to false the explain plan will change thus causeing a non smartscan explain plan. People are reading your post and assumingn that a smartscan is only a smartscan if storage extent pruning happens. My contention is that extent pruning requries smartscan but smartscan does not require extent pruning, it requires cell offload capability to be true and direct path reads.

  13. #13 by Chris Hammond on September 1, 2016 - 00:51

    Hi Uwe, Thanks for your great, detailed, AND clear explanations and examples. Tremendously helpful!
    I’ve been Googleing this, but have not gotten any hits about what happens to storage indexes after a flush of the library
    and buffer caches. Do storage indexes built as a result of the cursors and data accessed before the flush have to be recreated,
    or are they still available and could be used when a new version of a particular query is executed the first time?

    Thanks!

    Regards,
    Chris H.

  14. #14 by Chris Hammond on September 1, 2016 - 01:30

    Hi Uwe,

    Since Storage Indexes live on the Storage Server, they should still exist after a Flush of Cache on the DB Server – is that correct?
    What if the DB is shutdown and restarted? Logically it seems that Storage Indexes should still be in place on the data – logically to me at least. Is that true? Thank you!

    Regards,
    Chris

  15. #15 by Uwe Hesse on September 1, 2016 - 10:10

    Hi Chris, Storage Indexes are being created automatically in the memory of the cells after columns of tables have been accessed in the database layer. They stay in memory of the cells after instance restart or when objects age out there off the library cache. But they are gone after the cells have been restarted.

  16. #16 by Chris Hammond on September 3, 2016 - 00:51

    Hi Uwe,

    Thanks for the information, and for your quick response – both are much appreciated!

    What’s happening here is that a poor performing query is being run over and over again (multiple
    people are doing this) in the same environment, and at one point the execution time, worst case,
    was reported as being more than an hour), and then one engineer reported he got results in 29 seconds!
    Only plausible way that might conceivably happen would seem to be Storage Indexes kicking-in after the same
    data was referenced multiple times. Both engineers said that a FULL TS was shown in their
    Explain Plans on the one Huge table in the mix (110 million rows).

    It was suggested that the instance be shutdown and restarted to 1. clear the library cache
    of the cursors (I advised them that only the specific cursors you want to remove can be flushed
    out in 11g if you get the sqlids) and 2. clear out all the repeatedly accessed data, which would have to be in
    cache (no one else is executing anything else on that db during this tuning). That was done.

    So the buffer cache was cleared then, but the actual data would still be on the Storage Server –
    and WITH the Storage indexes still in place, and available for use by the new cursor — is that
    correct? And , if I understood correctly, those indexes will still be in place and available for use
    until the Storage Server is bounced? Is there another way to clear the cells?

    Sorry, one last question (I AM reading about this online).

    Let’s say I am running this one query several times, and then a slightly different query
    is executed (different cursor). If the results set is different for the second query, –
    maybe a couple predicates were added and more filtering took place – can the
    same Storage Index be used ? I guess what I ‘m not clear on is – with a traditional
    index – we create the index on some data definition, so when more data that matches
    that definition is added, it becomes part of that index. Not clear on how that fits (or doesn’t)
    with Storage Indexes…

    Thank you for your time and help!

    Regards,
    Chris

  17. #17 by Uwe Hesse on September 3, 2016 - 07:54

    Chris, the effect you mentioned may come from storage indexes, yes. They can be built after the columns have been accessed for the first time from the database layer, then they live in the memory of the cells. They are not bound to a particular query, so they may speed up any query that has those columns in the where condition subsequently. In order to turn them off, there is an underscore parameter: _kcfis_storageidx_disabled This way, you can confirm the effect without having to restart the cells. Storage Indexes are named confusingly: They are absolutely different from conventional indexes that you are used to, don’t try to transfer techniques and concepts you know from conventional indexes to storage indexes.

  18. #18 by Chris Hammond on September 5, 2016 - 00:48

    Hi Uwe,

    Thanks again for your quick responses and terrific insights. Much appreciated!
    I’ve been reading up on the this topic for the last few days, and most times when I get to
    an Expert’s website I find some input from you, with thanks from the website’s author, so
    clearly your knowledge on this is held in high esteem by the top people in the industry.

    Is there a way to determine what columns of a table Storage Indexes have been created
    on? There is an 8 column limit. If we suddenly start getting poor performance on a query that
    was performing beautifully in production, it would be nice to know if a column in the query
    was using had a Storage Index before, but that Storage Index was lost when another query came in on
    the same table using a different column (the 9th column) and Oracle bumped out that
    Storage Index to create a new one on the 9th column.

    From reading about this on the web for hours, it seems not to be a FIFO situation, such that the first
    column built with a Storage Index is not bumped out if 8 columns are indexed and a 9th on comes in with
    a new query used in a predicate against the same table. Apparently it’s not an LRU algorithm either.

    So it seems extremely important to have the ability to know exactly what you’ve got working for
    you at any given point, especially in production – particularly since, unlike B-Tree indexes,
    you can’t find out what Storage Indexes are being used just by reExplaining the query that
    suddenly went bad..

    Thanks again for your time and great insights.

    Regards,
    Chris Hammond

  19. #19 by robinsc on September 5, 2016 - 08:00

    Hi Chris, I think one of the most important things to bear in mind when thinking about the 8 column setting is that it is 8 columns per storgae extent , not 8 columns per table, so it is quite possible that if you have queries with filtering consitions different extends in the same table could have differnt sets of 8 columns. thisis also dynamic so if a particular column starts being more and more prevalently used at some point it may be chosen in favor of another column for that specific 1 mb extent. Now in light of this you will understand how storage indexes differ from normal indexes and how futuile it would be to want to have control over this because of course unlike a normal index where you would need to track 8 columns here you would have to track 8 columns* (size of table in mb) * redundancy factor( 2 or 3 depending on normal or high redundancy) which of course is impossible for a human being even for a single average table

  20. #20 by robinsc on September 5, 2016 - 08:17

    By extent I actually meant storage region which is the 1mb chunk into which Exadata storage divides each object.

  1. Tweets that mention Exadata Part II: Storage Index « The Oracle Instructor -- Topsy.com
  2. Exadata « Oracle Scratchpad
  3. c0t0d0s0.org
  4. Invisible Indexes & Exadata « The Oracle Instructor
  5. Important Statistics & Wait Events on Exadata « The Oracle Instructor

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: