No Smart Scan with AUTO_SAMPLE_SIZE

During the Exadata Database Machine Admin Workshop that I deliver this week in Munich, I got the question whether the gathering of Optimizer Statistics could be done via Smart Scan on Exadata. I didn’t know that and that particular topic seems not to be covered much yet in the docs or other Blogs that I researched briefly. Therefore I came up with this little test case here:

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE    11.2.0.2.0      Production
TNS for Linux: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
SQL> alter session set workarea_size_policy=manual;

Session altered.

SQL> alter session set sort_area_size=1000000000;

Session altered.

SQL> create table sales as
select
'Oracle Enterprise Edition' as product,
mod(rownum,5) as channel_id,
mod(rownum,1000) as cust_id ,
5000 as amount_sold,
to_date
('01.' || lpad(to_char(mod(rownum,12)+1),2,'0') || '.2010' ,'dd.mm.yyyy')
as time_id
from dual connect by level<=1e7;

Table created.

We go on with the recommended method to gather statistics and compare that with 30 % and 100 % sample size:

SQL> connect adam/adam
Connected.
SQL> exec dbms_stats.gather_table_stats('ADAM','SALES',estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE)

PL/SQL procedure successfully completed.

SQL> select name,value/1024/1024 as mb from v$mystat natural join v$statname
    where name='cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

SQL> exec dbms_stats.delete_table_stats('ADAM','SALES')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('ADAM','SALES',estimate_percent=>30)

PL/SQL procedure successfully completed.

SQL> select name,value/1024/1024 as mb from v$mystat natural join v$statname
    where name='cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan                0

SQL> exec dbms_stats.delete_table_stats('ADAM','SALES')

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats('ADAM','SALES',estimate_percent=>100)

PL/SQL procedure successfully completed.

SQL> select name,value/1024/1024 as mb from v$mystat natural join v$statname
    where name='cell physical IO interconnect bytes returned by smart scan';

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO interconnect bytes returned by smart scan       477.690498

Conclusion: It is possible to gather Optimizer Statistics via Smart Scan, but only with a 100 % sample size. Now the question is, whether AUTO_SAMPLE_SIZE will gather statistics faster although running without Smart Scan. The quality of the resulting stats will be very similar in both cases. If you are using ESTIMATE_PERCENT < 100 presently, consider to change to AUTO_SAMPLE_SIZE or 100 % on Exadata, as this may complete faster there with Smart Scan than your non-AUTO_SAMPLE_SIZE value does. As always: Don’t believe it, test it 🙂

 

,

  1. #1 von alexandruneda am Juli 4, 2012 - 19:31

    Excellent! Very useful, thank you!

  2. #2 von mwidlake am Juli 4, 2012 - 23:07

    Hi Uwe,

    As I understand it, you only get the benefit of the faster one-pass NDV (number of distinct values) algorithm if you leave the sample size at auto. So going to 100% and gaining the benefit of Smart scan may well be cancelled out by losing the NDV functionality.

    As you say, something to test!

  3. #3 von Uwe Hesse am Juli 4, 2012 - 23:15

    Alexandru, you’re welcome 🙂

  4. #4 von Uwe Hesse am Juli 4, 2012 - 23:20

    Martin, I suppose that AUTO_SAMPLE_SIZE will also be (much) faster than 100 % sample size even with Smart Scan on most production systems that run on Exadata. So the question is probably kind of academic, but I couldn’t resist to check it and share the answer 🙂

  5. #5 von mwidlake am Juli 4, 2012 - 23:57

    I was having an extended discussion with Julian Dyke (during a user group meeting as we both presented!) about the benefits or not of smart scan on exadata. It’s jolly fast but of course nothing gets put in the buffer cache by it. So maybe it is not so great in some circumstances. The more we understand, the better choices we can make 🙂 I’ve only been using it for 4 months so I claim no expertise.

  6. #6 von Bosson am Juli 11, 2012 - 13:41

    slt moi j’aimerais avoir une bourse d’étude afin de pouvoir une bonne formation concernant les bases de données sur “ ORACLE“

  7. #7 von Yury Pudovchenko am Oktober 24, 2012 - 10:02

    Hello, Uwe,

    0. Thanks for interesting point !
    1. I use the DB Control/Performancs/SQL Monitor to analyze smart scan. In my experiments I see the smart scan in DB Control for 10% and for 30% and for 100% and not for AUTO_SAMPLE_SIZE .
    2. The times I obtained:
    10% = 148 sec, cell offloading = -5.26%
    100% = 1434 sec, cell offloading = -22%
    AUTO_SAMPLE_SIZE = 181 sec , No cel offloading

  8. #8 von Yury Pudovchenko am Oktober 24, 2012 - 10:47

    Hello,

    Add the degree=>DBMS_STATS.AUTO_DEGREE to the estimate_percent=>DBMS_STATS.AUTO_SAMPLE_SIZE
    and you will have the much more faster time + SMART SCAN !

  9. #9 von Yury Pudovchenko am Oktober 24, 2012 - 11:39

    But if I run estimate_percent=>100,degree=>DBMS_STATS.AUTO_DEGREE then I don’t see smart scan !
    I detect smart scan as the column in SQL Monitor.

  10. #10 von Yury Pudovchenko am Oktober 24, 2012 - 11:48

    And – important thing – I see the negative smart scan % (less than 0), while gathering stats run in 1 thread ( no parallel), because of many sorting on the disk.

    Thanks Uwe for interesting problem ! 🙂

  11. #11 von Uwe Hesse am Oktober 29, 2012 - 15:33

    Yury, thank YOU for taking the effort to share your findings here 🙂

  12. #12 von Rich Headrick am Februar 2, 2013 - 01:09

    I just logged a bug with development on this issue. It is not the expected behavior.

  13. #13 von Uwe Hesse am Februar 4, 2013 - 09:26

    Thank you for stopping by, Rich, and for logging that bug!

  1. Exadata « Oracle Scratchpad

Hinterlasse einen Kommentar

Diese Seite verwendet Akismet, um Spam zu reduzieren. Erfahre, wie deine Kommentardaten verarbeitet werden..