Partition-Pruning: Do & Don’t

do-dont

This is about how to write SQL in a way that supports Partition-Pruning – and what should be avoided. The playing field looks as follows:

SQL> select * from v$version;
BANNER
 --------------------------------------------------------------------------------
 Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
 PL/SQL Release 11.2.0.2.0 - Production
 CORE    11.2.0.2.0      Production
 TNS for Linux: Version 11.2.0.2.0 - Production
 NLSRTL Version 11.2.0.2.0 - Production

SQL> select table_name,partitioning_type,partition_count from user_part_tables where table_name='SALES_YEAR';
TABLE_NAME                     PARTITION PARTITION_COUNT
 ------------------------------ --------- ---------------
 SALES_YEAR                     RANGE             1048575

SQL> select segment_name,partition_name,sum(bytes)/1024/1024 as mb
 from user_segments where segment_name='SALES_YEAR'
 group by rollup (segment_name,partition_name)
 order by 1,2; 

SEGMENT_NAME                   PARTITION_NAME                         MB
 ------------------------------ ------------------------------ ----------
 SALES_YEAR                     P1                                     16
 SALES_YEAR                     SYS_P181                               32
 SALES_YEAR                     SYS_P182                               32
 SALES_YEAR                     SYS_P183                               32
 SALES_YEAR                     SYS_P184                               32
 SALES_YEAR                     SYS_P185                               32
 SALES_YEAR                     SYS_P186                               32
 SALES_YEAR                     SYS_P187                               32
 SALES_YEAR                     SYS_P188                               32
 SALES_YEAR                     SYS_P189                               32
 SALES_YEAR                     SYS_P190                               32
 SALES_YEAR                     SYS_P191                               32
 SALES_YEAR                     SYS_P192                               32
 SALES_YEAR                     SYS_P193                               32
 SALES_YEAR                     SYS_P194                               32
 SALES_YEAR                     SYS_P195                               32
 SALES_YEAR                     SYS_P196                               32
 SALES_YEAR                     SYS_P197                               32
 SALES_YEAR                     SYS_P198                               32
 SALES_YEAR                     SYS_P199                               32
 SALES_YEAR                     SYS_P200                               32
 SALES_YEAR                     SYS_P201                               32
 SALES_YEAR                     SYS_P202                               32
 SALES_YEAR                     SYS_P203                               32
 SALES_YEAR                     SYS_P204                               32
 SALES_YEAR                     SYS_P205                               32
 SALES_YEAR                     SYS_P206                               32
 SALES_YEAR                     SYS_P207                               24
 SALES_YEAR                                                           872
 872
30 rows selected.

SQL> select to_char(order_date,'yyyy'),count(*) from sales_year group by to_char(order_date,'yyyy') order by 1;


TO_C   COUNT(*)
 ---- ----------
 1985     158000
 1986     365000
 1987     365000
 1988     366000
 1989     365000
 1990     365000
 1991     365000
 1992     366000
 1993     365000
 1994     365000
 1995     365000
 1996     366000
 1997     365000
 1998     365000
 1999     365000
 2000     366000
 2001     365000
 2002     365000
 2003     365000
 2004     366000
 2005     365000
 2006     365000
 2007     365000
 2008     366000
 2009     365000
 2010     365000
 2011     365000
 2012     346000
28 rows selected.

My moderately sized table is Interval partitioned (therefore PARTITION_COUNT in USER_PART_TABLES shows the possible maximum number) by the year on ORDER_DATE with 28 partitions. Now imagine we want to have the summarized AMOUNT_SOLD of the year 2011. What about this statement?

SQL> set timing on
SQL> select sum(amount_sold) from sales_year where to_char(order_date,'yyyy')='2011';

SUM(AMOUNT_SOLD)
----------------
      1825000000

Elapsed: 00:00:05.15
SQL> select plan_table_output from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
SQL_ID  cv54q4mt7ajjr, child number 0
-------------------------------------
select sum(amount_sold) from sales_year where
to_char(order_date,'yyyy')='2011'

Plan hash value: 3345868052

---------------------------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |       |       | 24384 (100)|          |       |       |
|   1 |  SORT AGGREGATE      |            |     1 |    22 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|            |   287K|  6181K| 24384   (2)| 00:00:07 |     1 |1048575|
|*  3 |    TABLE ACCESS FULL | SALES_YEAR |   287K|  6181K| 24384   (2)| 00:00:07 |     1 |1048575|
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(TO_CHAR(INTERNAL_FUNCTION("ORDER_DATE"),'yyyy')='2011')

It produces the required result, but using a Full Table Scan across all partitions. Very much better instead:

SQL> select sum(amount_sold) from sales_year where order_date between to_date('01.01.2011','dd.mm.yyyy') and to_date('31.12.2011','dd.mm.yyyy');

SUM(AMOUNT_SOLD)
----------------
      1825000000

Elapsed: 00:00:00.11
SQL> select plan_table_output from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------
SQL_ID  6rwm3z7rhgmd6, child number 0
-------------------------------------
select sum(amount_sold) from sales_year where order_date between
to_date('01.01.2011','dd.mm.yyyy') and
to_date('31.12.2011','dd.mm.yyyy')

Plan hash value: 767904852

------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |       |       |  1033 (100)|          |       |       |
|   1 |  SORT AGGREGATE         |            |     1 |    22 |            |          |       |       |
|   2 |   PARTITION RANGE SINGLE|            |   378K|  8128K|  1033  (16)| 00:00:01 |    27 |    27 |
|*  3 |    TABLE ACCESS FULL    | SALES_YEAR |   378K|  8128K|  1033  (16)| 00:00:01 |    27 |    27 |
------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter(("ORDER_DATE">=TO_DATE(' 2011-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ORDER_DATE"<=TO_DATE(' 2011-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))

The same result but much faster, scanning only one partition!

Conclusion: It is quite important to have no functions around the partition key in the WHERE-clause here. Personally, the first SQL looks easier to me and has less coding, but it is obviously not as good as the second. Might be worth to spend some time thinking and adding some more characters to the code to make Partition-Pruning possible. Don’t believe it, test it! With some big enough tables, I mean 🙂

partitioning, Performance Tuning

Dieser Eintrag wurde erstellt am Dezember 13, 2012, 15:58 und wurde abgelegt unter TOI. Du kannst die Antworten auf diesen Beitrag über RSS 2.0 verfolgen. Du kannst eine Antwort schreiben oder einen Trackback von deiner eigenen Seite schicken.

#1 von Matthias Rogel am Dezember 13, 2012 - 18:15

Hi Uwe,

date literals are useful, why does nobody uses them ?

select sum(amount_sold) from sales_year where order_date between
date’2011-01-01′ and date’2011-12-31′

looks easier than your first *and* your second sql ( and results in partition prunning ), no need to use a type-conversion at all !

Regards
Matthias
#2 von Uwe Hesse am Dezember 13, 2012 - 18:43

I am sorry, Matthias, but your code above fails. There is no such function like date’2011-01-01′ apparently. Should you mean the mere literal ’2011-12-31′ ? This would require implicit type conversion and is very unreliable here, because how is the engine supposed to know what is month and what is day? Probably I just miss your point…
#3 von Christian Antognini am Dezember 13, 2012 - 19:59

Hi Uwe

Matthias is referring to Datetime Literals, i.e. http://docs.oracle.com/cd/E14072_01/server.112/e10592/sql_elements003.htm#BABGIGCJ .

Best,
Chris
#4 von Matthias Rogel am Dezember 13, 2012 - 23:07

Hi Uwe,

exactly,

sokrates@11.2 > select date’2012-12-13′ from dual;

DATE’2012
———
13-DEC-12

1 row selected.

sokrates@11.2 > select timestamp’2012-12-13 22:01:00′ „now“ from dual;

now
————————————————————————–
13-DEC-12 10.01.00.000000000 PM

1 row selected.

work quite good for me.

however, when it comes to partition pruning, you have to be very careful sometimes.
See

sokrates@11.2 > create table hourly_partitioned(d)
2 partition by range(d)
3 interval(interval ‚1‘ hour)
4 (partition p0 values less than (timestamp’2012-12-13 22:00:00′))
5 as
6 select sysdate+level/86400 from dual connect by level select count(*) from user_tab_partitions where table_name=’HOURLY_PARTITIONED‘;

COUNT(*)
———-
29

1 row selected.

sokrates@11.2 > exec dbms_stats.gather_table_stats(null, ‚hourly_partitioned‘)

PL/SQL procedure successfully completed.

sokrates@11.2 > set autotr traceonly explain
sokrates@11.2 > REM now compare the plans of the following 4 selects:
sokrates@11.2 > select * from hourly_partitioned
2 where d between timestamp’2012-12-14 01:00:00′ and timestamp’2012-12-14 01:30:00′;

Execution Plan
———————————————————-
Plan hash value: 3527284174

———————————————————————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
———————————————————————————————————-
| 0 | SELECT STATEMENT | | 250 | 2000 | 38 (8)| 00:00:01 | | |
| 1 | PARTITION RANGE ALL| | 250 | 2000 | 38 (8)| 00:00:01 | 1 |1048575|
|* 2 | TABLE ACCESS FULL | HOURLY_PARTITIONED | 250 | 2000 | 38 (8)| 00:00:01 | 1 |1048575|
———————————————————————————————————-

Predicate Information (identified by operation id):
—————————————————

2 – filter(INTERNAL_FUNCTION(„D“)>=TIMESTAMP‘ 2012-12-14 01:00:00.000000000′ AND
INTERNAL_FUNCTION(„D“) select * from hourly_partitioned
2 where d between cast(timestamp’2012-12-14 01:00:00′ as date) and cast(timestamp’2012-12-14 01:30:00′ as date);

Execution Plan
———————————————————-
Plan hash value: 2219087052

—————————————————————————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
—————————————————————————————————————-
| 0 | SELECT STATEMENT | | 1802 | 14416 | 40 (13)| 00:00:01 | | |
|* 1 | FILTER | | | | | | | |
| 2 | PARTITION RANGE ITERATOR| | 1802 | 14416 | 40 (13)| 00:00:01 | KEY | KEY |
|* 3 | TABLE ACCESS FULL | HOURLY_PARTITIONED | 1802 | 14416 | 40 (13)| 00:00:01 | KEY | KEY |
—————————————————————————————————————-

Predicate Information (identified by operation id):
—————————————————

1 – filter(CAST(TIMESTAMP‘ 2012-12-14 01:00:00.000000000′ AS date)<=CAST(TIMESTAMP' 2012-12-14
01:30:00.000000000' AS date))
3 – filter("D"=CAST(TIMESTAMP‘
2012-12-14 01:00:00.000000000′ AS date))

REM pruning at runtime

sokrates@11.2 > select * from hourly_partitioned
2 where d between date’2012-12-14′ + 1/24 and date’2012-12-14′ + (1+30/60)/24;

Execution Plan
———————————————————-
Plan hash value: 281903001

————————————————————————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
————————————————————————————————————-
| 0 | SELECT STATEMENT | | 1801 | 14408 | 3 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 1801 | 14408 | 3 (0)| 00:00:01 | 5 | 5 |
|* 2 | TABLE ACCESS FULL | HOURLY_PARTITIONED | 1801 | 14408 | 3 (0)| 00:00:01 | 5 | 5 |
————————————————————————————————————-

Predicate Information (identified by operation id):
—————————————————

2 – filter(„D“=TO_DATE(‚
2012-12-14 01:00:00′, ’syyyy-mm-dd hh24:mi:ss‘))

REM pruning at compiletime

sokrates@11.2 > select * from hourly_partitioned
2 where d between to_date(‚2012-12-14 01:00:00‘, ‚yyyy-mm-dd HH24:MI:SS‘) and to_date(‚2012-12-14 01:30:00‘, ‚yyyy-mm-dd HH24:MI:SS‘);

Execution Plan
———————————————————-
Plan hash value: 281903001

————————————————————————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
————————————————————————————————————-
| 0 | SELECT STATEMENT | | 1801 | 14408 | 3 (0)| 00:00:01 | | |
| 1 | PARTITION RANGE SINGLE| | 1801 | 14408 | 3 (0)| 00:00:01 | 5 | 5 |
|* 2 | TABLE ACCESS FULL | HOURLY_PARTITIONED | 1801 | 14408 | 3 (0)| 00:00:01 | 5 | 5 |
————————————————————————————————————-

Predicate Information (identified by operation id):
—————————————————

2 – filter(„D“=TO_DATE(‚
2012-12-14 01:00:00′, ’syyyy-mm-dd hh24:mi:ss‘))

REM pruning at compiletime
#5 von Uwe Hesse am Dezember 14, 2012 - 09:06

Chris, thank you for showing the right place in the documentation where date literals can be found. I was looking into the SQL Reference http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions.htm#i1482196 and did not see any function named ‚date‘ there
#6 von Uwe Hesse am Dezember 14, 2012 - 09:20

Matthias,
Thank you for your comment that brought Datetime Literals to my attention!

When I briefly checked your suggestion yesterday, I did something like

SQL> create table t (d date);

Table created.

SQL> insert into t values (to_date(‚13.12.2012′,’dd.mm.yyyy‘));

1 row created.

SQL> commit;

Commit complete.

SQL> select * from t where d=date’13.12.2012′;
select * from t where d=date’13.12.2012′
*
ERROR at line 1:
ORA-01861: literal does not match format string

That was of course my mistake, because according to the documentation: „The ANSI date literal contains no time portion, and must be specified in the format ‚YYYY-MM-DD‘.“

Which also explains how the engine can tell which is day and which is month: There is just no alternative.

Probably the answer to your question „date literals are useful, why does nobody uses them ?“
is: Almost nobody knows about them – at least I didn’t.

This also taught me another thing: It is not a good idea to answer (non-trivial) comments at the airport, waiting to get boarded 🙂
#7 von Matthias Rogel am Dezember 14, 2012 - 10:12

Uwe,

‚….Probably the answer to your question “date literals are useful, why does nobody uses them ?”
is: Almost nobody knows about them – at least I didn’t. …‘

True, I found them only recently, though they are available since Version 9 and part of Ansi92.
(I think)

I find them extremely useful, cause they can decrease the headaches about datatype conversions in very many cases.

However, it is a bit annoying, that a date-literal
date’2012-12-14′
does not support a time-portion, so you can express the date – take the example
to_date(‚2012-12-14 09:01:53‘, ‚yyyy-mm-dd HH24:MI:SS‘)
–
with literals only with either

(a) date’2012-12-14′ + (9 + ((1 + 53 / 60) / 60)) / 24 or
(b) cast(timestamp’2012-12-14 09:01:53′ as date)

(a) is not very handy when it comes to seconds
(b) is better, but must be explicitly casted to a DATE !! (don’t forget this, it will result in partitioning pruning not to occur for example !!! )

By the way, thanks for your great side, a great resource of cool stuff, one of my favourites concerning DataGuard !

Matthias
#8 von Uwe Hesse am Dezember 14, 2012 - 10:39

Hi Matthias, yes, they look useful. But I suppose I would not use something like a) and b) because in this case, to_date or to_timestamp are not more difficult to use and at the same time more familiar to me. And thank you for the nice words about my Blog! Very much appreciated 🙂
#9 von TIROUMALAI am Dezember 14, 2012 - 11:35

Uwe,

Through this article, I learnt on partitioning; your approach is really nice and well explained.
Thanking you.

Tirou
#10 von Matthias Rogel am Dezember 17, 2012 - 17:34

select date’2012-12-14′ + interval ‚9:1:53‘ hour to second a from dual;
#11 von Uwe Hesse am Dezember 19, 2012 - 09:33

TIROUMALAI, you’re welcome 🙂
Matthias, yes, I like that better than the other format. Thank you for sharing!
#12 von masdoell am September 3, 2014 - 10:41

thanks, i also discovered this after some trial research. I usually do this to replace query which search specific date:
select * from sales_year where order_date between to_date(:inpDate,’dd.mm.yyyy‘) and to_date(:inpDate,’dd.mm.yyyy‘)+1;
do you have any suggestion to replace this solution so i dont have to use two variables for bindvar?
#13 von ctripblog am Mai 9, 2016 - 15:00

HI!, in your SQL „select sum(amount_sold) from sales_year where order_date between to_date(‚01.01.2011′,’dd.mm.yyyy‘) and to_date(‚31.12.2011′,’dd.mm.yyyy‘);“

Are the records that have dates like these: 31/12/2011 14:00:00 included? or just ignored cos it evaluates on 31/12/2011 00:00:00 ?

thanks!