Beiträge getaggt mit storage index

Invisible Indexes & Exadata

You may have noticed that we introduced Invisible Indexes as an 11g New Feature. Their main benefit is that we can test whether performance differs if we would drop an index without actually dropping it. This is particular useful after an Exadata Migration because we expect that some conventional indexes migrated are now obsolete and may be substituted by Storage Indexes. This is the scenario:

SQL> select segment_name,bytes/1024/1024 as mb from user_segments;

SEGMENT_NAME                           MB
------------------------------ ----------
SALES                                 304
SALES_ID_IDX                        79.75

This moderate sized table sales with a B*tree index on the column ID was moved to Exadata. We want to check whether it is a good idea to drop that index. Making it invisible therefore:

SQL> alter index SALES_ID_IDX invisible;

Index altered.

SQL> set timing on
SQL> select * from sales where id=4711;

        ID       FLAG PRODUCT                   CHANNEL_ID    CUST_ID AMOUNT_SOLD ORDER_DAT SHIP_DATE
---------- ---------- ------------------------- ---------- ---------- ----------- --------- ---------
      4711       4712 Oracle Enterprise Edition          1        711        5000 10-NOV-96 11-NOV-96

Elapsed: 00:00:00.19


Keep in mind that we may need multiple statements accessing the ID column in order to make the Cells aware that it is beneficial to maintain a Storage Index for that column. If after a while performance is fine for those statements, we may consider the index as obsolete. Should DML hit the table while the index is invisible, it is still maintained, so that we can make it visible again without any effort, should the index turn out to be necessary:

SQL> update sales set id=4711 where id=4710;

1 row updated.

SQL> commit;

Commit complete.

Let’s compare runtime of Smart Scan with Storage Index with Conventional Index Access:

SQL> select * from sales where id=4711;

        ID       FLAG PRODUCT                   CHANNEL_ID    CUST_ID AMOUNT_SOLD ORDER_DAT SHIP_DATE
---------- ---------- ------------------------- ---------- ---------- ----------- --------- ---------
      4711       4711 Oracle Enterprise Edition          0        710        5000 09-NOV-96 10-NOV-96
      4711       4712 Oracle Enterprise Edition          1        711        5000 10-NOV-96 11-NOV-96

Elapsed: 00:00:00.22
SQL> alter index sales_id_idx visible;

Index altered.

Elapsed: 00:00:00.02
SQL> select * from sales where id=4711;

        ID       FLAG PRODUCT                   CHANNEL_ID    CUST_ID AMOUNT_SOLD ORDER_DAT SHIP_DATE
---------- ---------- ------------------------- ---------- ---------- ----------- --------- ---------
      4711       4711 Oracle Enterprise Edition          0        710        5000 09-NOV-96 10-NOV-96
      4711       4712 Oracle Enterprise Edition          1        711        5000 10-NOV-96 11-NOV-96

Elapsed: 00:00:00.01

In this case, the Conventional Index is superior – so we keep it.

Conclusion: With making indexes invisible, we can easily check whether indexes are useful without having to drop (and in case recreate) them actually. While this may be of interest for „ordinary“ Oracle Databases already, it is particular a useful feature for Exadata where we expect some conventional indexes to become obsolete after a migration.

, ,

5 Kommentare

Exadata Part II: Storage Index

Another important feature of the Exadata Database Machine that makes it more than just a collection of High-End Hardware are Storage Indexes. Storage Indexes are constructed automatically inside of the memory of the Storage Servers when Database segments (like heap tables) are being scanned. The Storage Cells are divided into 1 MB chunks and inside of these chunks, we record the minimum and maximum values of the columns of the table that allocated space inside that chunk. It is very well possible that these Storage Indexes make many of your „conventional indexes“ obsolete. Look at following drastic example:

SQL> connect adam/adam
Connected.
SQL> select bytes/1024/1024/1024 as gb from user_segments where segment_name='SALES';

 GB
----------
 1.5

This demonstration table was created with the following statement

SQL> create table sales as
select
rownum as id,
rownum + 1 as flag,
'Oracle Enterprise Edition' as product,
mod(rownum,5) as channel_id,
mod(rownum,1000) as cust_id ,
5000 as amount_sold,
trunc(sysdate - 10000 + mod(rownum,10000))
as order_date,
trunc(sysdate - 9999 + mod(rownum,10000))
as ship_date
from dual connect by level<=2e7;

I needed to put WORKAREA_SIZE_POLICY=MANUAL and I specified a SORT_AREA_SIZE=1048576000 to be able to do the connect by level<=2e7, by the way. That is because of the limitation of 200M pga memory for a single session when using PGA_AGGREGATE_TARGET. Anyway, please notice that id has unique values and there is no conventional index present.

SQL> set timing on
SQL> select * from sales where id=4711;

 ID       FLAG PRODUCT                   CHANNEL_ID    CUST_ID AMOUNT_SOLD ORDER_DAT SHIP_DATE
---------- ---------- ------------------------- ---------- ---------- ----------- --------- ---------
 4711       4712 Oracle Enterprise Edition          1        711        5000 28-JUL-96 29-JUL-96

Elapsed: 00:00:00.11

I have just scanned 1.5 Gigabyte of data in fractions of a second! [Previously said in 11 Milliseconds, but that was of course a mistake – apart from SQL*Plus not being able to produce that exact timing, I just got the unit wrong] That was possible, because the Storage Cells knew that they could exclude 1499 chunks from scanning. 4711 was only possibly present inside one of the chunks. Please notice that that was not a Smart Scan in the first place – we did not offload much I/O on the Storage Layer. Instead, we avoided I/O almost completely:

SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');   2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1483.67969
cell physical IO interconnect bytes returned by smart scan       .001525879

To show the difference, I will do a Smart Scan. I connect again to initialize v$mystat

SQL> connect adam/adam
Connected.
SQL> select count(*) from sales where channel_id=1;

 COUNT(*)
----------
 4000000

Elapsed: 00:00:02.04
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');  2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan       46.7903976

In this case, we scanned 1.5 Gig on the Storage Layer but delivered only 46 MB to the Database Layer. Now back to the Storage Index. Recently, I encountered the opinion that Storage Indexes would not work together with Bind Variables for some reason. That is not the case:

SQL> connect adam/adam
Connected.
SQL> vari n number
SQL> exec :n:=4711

PL/SQL procedure successfully completed.

SQL> set timing on
SQL> select * from sales where id=:n;

 ID       FLAG PRODUCT                   CHANNEL_ID    CUST_ID AMOUNT_SOLD ORDER_DAT SHIP_DATE
---------- ---------- ------------------------- ---------- ---------- ----------- --------- ---------
 4711       4712 Oracle Enterprise Edition          1        711        5000 28-JUL-96 29-JUL-96

Elapsed: 00:00:00.04
SQL>  select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');  2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1483.67969
cell physical IO interconnect bytes returned by smart scan       .001525879

The Storage Index clearly could get used. Same with implicit datatype conversion:

SQL> vari c char(4)
SQL> exec :c:='4711'
SQL>  select * from sales where id=:c;

 ID       FLAG PRODUCT                   CHANNEL_ID    CUST_ID AMOUNT_SOLD ORDER_DAT SHIP_DATE
---------- ---------- ------------------------- ---------- ---------- ----------- --------- ---------
 4711       4712 Oracle Enterprise Edition          1        711        5000 28-JUL-96 29-JUL-96

Elapsed: 00:00:00.03
SQL>  select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');  2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    2967.35938
cell physical IO interconnect bytes returned by smart scan       .003051758

It would have surprised me a lot if Bind Variables would introduce any issues with Storage Indexes, because they are processed on the Database Layer. In fact, from the Optimizers viewpoint all these statements lead to the same execution plan: Full Table Scan. There could be an issue (but not exclusively related to Bind Variables), if we would have conventional indexes present here as well, because the Optimizer might favor them over the Storage Index. The Optimizer is not aware of Storage Indexes. Before I get another comment from Kevin suggesting to show joins with the example, let’s just do that:

SQL> select * from channels;

CHANNEL_ID CHANNEL_DESC
---------- ----------------------------------------
 0 Telesales
 1 Web
 2 Field
 3 Retail
 4 Promotion
 99 Special
SQL> create table customers as select distinct cust_id, 'John Doe' as cust_name from sales;

Table created.

Channels and Customers are now two dimension tables.

SQL> set timing on
SQL> select product,channel_desc,cust_name
from sales natural join channels
natural join customers
where id=4711;
 2    3    4
PRODUCT                   CHANNEL_DESC                             CUST_NAM
------------------------- ---------------------------------------- --------
Oracle Enterprise Edition Web                                      John Doe

Elapsed: 00:00:00.06
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');  2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1483.67969
cell physical IO interconnect bytes returned by smart scan       .001495361

Storage Indexes can obviously be used in this case as well. When we select on columns with non unique values, it depends on how well clustered the values are, whether it is beneficial to use a Storage Index. The values of order_date column of the sales table are spread across the whole table, which makes the usage of a Storage Index impossible to speed up the query:

SQL> connect adam/adam
Connected.
SQL> set timing on
SQL> select count(*) from sales where order_date=to_date('11.03.2009','dd.mm.yyyy');

 COUNT(*)
----------
 2000

Elapsed: 00:00:02.30
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');
 2    3    4
NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                             0
cell physical IO interconnect bytes returned by smart scan       .672538757

When we create the same table, but ordered by order_date, this changes:

SQL> create table sales_order as select * from sales where 1=2;
Table created.
SQL> alter table sales_order nologging;
Table altered.
SQL> insert /*+ append */ into sales_order select * from sales order by order_date;

20000000 rows created.
SQL> commit;

Commit complete.

Now we need to access the new table multiple times to get the Storage Index built:

SQL> connect adam/adam
Connected.
SQL>  select count(*) from sales_order where order_date=to_date('11.03.2009','dd.mm.yyyy');

 COUNT(*)
----------
 2000

Elapsed: 00:00:02.02
SQL>  select count(*) from sales_order where order_date=to_date('11.03.2009','dd.mm.yyyy');

 COUNT(*)
----------
 2000

Elapsed: 00:00:01.73
SQL>  select count(*) from sales_order where order_date=to_date('11.03.2009','dd.mm.yyyy');

 COUNT(*)
----------
 2000

Elapsed: 00:00:00.03
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');  2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1488.77344
cell physical IO interconnect bytes returned by smart scan       .506607056

Now the Storage has „learned“ that it is beneficial to have a Storage Index on the sales_order table. This index will be kept as long as the Cells are up. The Storage Index is created over all the table_columns, apparently. Please notice that I now access the column ship_date (strongly correlated to order_date) for the first time:

SQL> select count(*) from sales_order where ship_date=to_date('11.03.2009','dd.mm.yyyy');

 COUNT(*)
----------
 2000

Elapsed: 00:00:00.37
SQL> select name,value/1024/1024 as mb from v$statname natural join v$mystat
where name in
('cell physical IO interconnect bytes returned by smart scan',
'cell physical IO bytes saved by storage index');   2    3    4

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    2765.77344
cell physical IO interconnect bytes returned by smart scan       .562469482

Conclusion: Storage Indexes are another Smart Storage feature of Exadata Database Machine. They may make many conventional indexes obsolete. Although they are created and maintained automatically by the Storage Servers, it is important to know their behavior. Especially, it is relevant that (non unique) column values that we later on query for ocurr ordered to be able to benefit from Storage Indexes.

, ,

25 Kommentare