Posts tagged with Performance Tuning
Automatic DOP in 11gR2
Oracle Database 11g Release 2 introduces a very useful new feature related to parallel query: the automatically determined Degree of Parallelism (DOP). In earlier versions of the Oracle Database, we had to determine the DOP more or less manually, either with a parallel hint or by setting a parallel degree with ALTER TABLE:
select /*+ parallel (sales,2) */ * from sales;
or
alter table sales parallel 2;
There was an automatic computation of the DOP available, derived from the simple formula CPU_COUNT * PARALLEL_THREADS_PER_CPU. That is what is done internally if we say
select /*+ parallel (sales) */ * from sales;
or
alter table sales parallel;
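The two parameters that drive this default computation can be inspected in SQL*Plus; a minimal sketch (the values shown in the comment are just an assumed example, they will differ per machine):

```sql
-- Default DOP (single instance) = CPU_COUNT * PARALLEL_THREADS_PER_CPU
show parameter cpu_count
show parameter parallel_threads_per_cpu

-- e.g. assuming cpu_count = 2 and parallel_threads_per_cpu = 2,
-- "alter table sales parallel" would yield a default DOP of 4
```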
The drawback of these approaches was always that we could hardly be sure whether the DOP was appropriate for the table, the statement and the hardware we were running on. It was mostly a matter of trial and error. Especially problematic was the ALTER TABLE approach, because it leads to the parallelization of each and every subsequent select on those tables, even if that is totally inappropriate. A popular pitfall is the creation of tables with a parallel clause on OLTP systems: those tables inherit the parallel degree from their creation, which leads to parallel query for every statement accessing them afterwards, most likely not desirable for OLTP. In that respect, parallelization was quite dumb (from a system-internal perspective) in versions before 11gR2. Now to the new feature:
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
PL/SQL Release 11.2.0.1.0 - Production
CORE 11.2.0.1.0 Production
TNS for Linux: Version 11.2.0.1.0 - Production
NLSRTL Version 11.2.0.1.0 - Production
SQL> select bytes/1024/1024 as mb from user_segments where segment_name='SALES';
MB
----------
563
SQL> select degree from user_tables where table_name='SALES';
DEGREE
----------------------------------------
1
For my tiny machine, this table is huge. I allow automatic determination of the DOP with the following new dynamic parameter:
SQL> alter session set parallel_degree_policy=auto;

Session altered.

SQL> set autotrace on explain
SQL> select sum(amount_sold) from sales;

SUM(AMOUNT_SOLD)
----------------
      1571293299

Execution Plan
----------------------------------------------------------
Plan hash value: 3130505568

-----------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |     5 | 10798   (1)| 00:02:10 |
|   1 |  SORT AGGREGATE        |          |     1 |     5 |            |          |
|   2 |   PX COORDINATOR       |          |       |       |            |          |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     5 |            |          |
|   4 |     SORT AGGREGATE     |          |     1 |     5 |            |          |
|   5 |      PX BLOCK ITERATOR |          |    14M|    70M| 10798   (1)| 00:02:10 |
|   6 |       TABLE ACCESS FULL| SALES    |    14M|    70M| 10798   (1)| 00:02:10 |
-----------------------------------------------------------------------------------

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2 because of degree limit
I got a moderate DOP, most likely appropriate for my relatively weak hardware, but still speeding up the query on the relatively big table. I could always override the automatic DOP determination by specifying a parallel hint as in earlier versions. [See in my newer posting why this sentence was crossed out] Also, the parameter defaults to manual, so unless we change it, automatic parallelization will not take place. To demonstrate the quite intelligent computation of the DOP compared to earlier versions, I will access another, much smaller table in the same session:
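The current setting can be verified at any time; a sketch of what SQL*Plus shows after the change (output abbreviated):

```sql
-- parallel_degree_policy accepts MANUAL (the default), LIMITED and AUTO in 11gR2
SQL> show parameter parallel_degree_policy

NAME                     TYPE    VALUE
------------------------ ------- ------
parallel_degree_policy   string  AUTO
```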
SQL> select count(*) from customers;

  COUNT(*)
----------
     30501

Execution Plan
----------------------------------------------------------
Plan hash value: 296924608

------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Cost (%CPU)| Time     |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |   218   (0)| 00:00:03 |
|   1 |  SORT AGGREGATE    |           |     1 |            |          |
|   2 |   TABLE ACCESS FULL| CUSTOMERS | 30501 |   218   (0)| 00:00:03 |
------------------------------------------------------------------------

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
As you can see, in this case the system does not consider parallelization appropriate for the select. So it is much smarter than the old forced parallel query:
SQL> alter session set parallel_degree_policy=manual;
Session altered.
SQL> alter session force parallel query;
Session altered.
SQL> select count(*) from customers;
COUNT(*)
----------
30501
Execution Plan
----------------------------------------------------------
Plan hash value: 1221513835
----------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
----------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 121 (0)| 00:00:02 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | PX COORDINATOR | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | | |
| 4 | SORT AGGREGATE | | 1 | | |
| 5 | PX BLOCK ITERATOR | | 30501 | 121 (0)| 00:00:02 |
| 6 | TABLE ACCESS FULL| CUSTOMERS | 30501 | 121 (0)| 00:00:02 |
----------------------------------------------------------------------------
Also, we have a remedy now against inappropriate parallel degrees on tables:
SQL> alter session enable parallel query;  -- the default, no force
Session altered.
SQL> alter table customers parallel;  -- would cause parallel query before 11gR2
Table altered.
SQL> set autotrace on explain
SQL> select count(*) from customers;
COUNT(*)
----------
30501
Execution Plan
----------------------------------------------------------
Plan hash value: 296924608
------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 218 (0)| 00:00:03 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| CUSTOMERS | 30501 | 218 (0)| 00:00:03 |
------------------------------------------------------------------------
Note
-----
- automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
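Before 11gR2, the only remedy against such an inherited degree was to reset it explicitly on the table, for example:

```sql
-- Old-style fix: remove the table-level parallel degree again
SQL> alter table customers noparallel;   -- or: alter table customers parallel 1;

Table altered.
```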
Is count(col) better than count(*)?
One question that surfaces sometimes in my courses (especially in Performance Tuning) is whether there is a benefit in avoiding count(*) in favor of count(col). The short answer is: no.
Dealing with Oracle DBAs and developers has taught me, though, that they are usually not satisfied with the short answer; they require some kind of proof, which is a good thing in my view. So here we go:
SQL> select count(*) from sales;

  COUNT(*)
----------
  14701488

Elapsed: 00:00:33.30
SQL> select count(cust_id) from sales;

COUNT(CUST_ID)
--------------
      14701488

Elapsed: 00:00:04.06
SQL> select count(*) from sales;

  COUNT(*)
----------
  14701488

Elapsed: 00:00:34.49
SQL> select count(cust_id) from sales;

COUNT(CUST_ID)
--------------
      14701488

Elapsed: 00:00:04.20
I think the above observation is responsible for the emergence of the myth that count(col) is superior. In fact, right now it is faster (about 10 times!) than count(*). I ran the two selects twice each to show that caching does not play a big role here. However, the faster select with count(col) is not necessarily correct! Watch it:
SQL> update sales set cust_id=null where rownum<2;
1 row updated.
Elapsed: 00:00:00.23
SQL> commit;
Commit complete.
Elapsed: 00:00:00.01
SQL> select count(*) from sales;
COUNT(*)
----------
14701488
Elapsed: 00:00:34.84
SQL> select count(cust_id) from sales;
COUNT(CUST_ID)
--------------
14701487
Elapsed: 00:00:03.73
The count(cust_id) is still faster by far, but it shows a wrong result, should you be interested in the number of rows of the table. You probably see the point I am after by now: there is an index on the cust_id column that is used for count(cust_id), but not for count(*). If the counted column does not contain any NULL values, the result is identical, but the runtime is shorter. The origin of the myth! The point is: if you declare the indexed column as NOT NULL, the optimizer knows that it can use the index for count(*):
SQL> update sales set cust_id=1 where rownum<2;
1 row updated.
Elapsed: 00:00:00.10
SQL> commit;
Commit complete.
Elapsed: 00:00:00.10
SQL> alter table sales modify (cust_id NOT NULL);
Table altered.
Elapsed: 00:00:38.72
SQL> set autotrace on explain
SQL> select count(*) from sales;
COUNT(*)
----------
14701488
Elapsed: 00:00:03.81
Execution Plan
----------------------------------------------------------
Plan hash value: 2989616816
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8499 (1)| 00:01:42 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| SALES_CUST_ID_IDX | 14M| 8499 (1)| 00:01:42 |
-----------------------------------------------------------------------------------
SQL> alter table sales modify (cust_id NULL);
Table altered.
Elapsed: 00:00:00.27
SQL> select count(*) from sales;
COUNT(*)
----------
14701488
Elapsed: 00:00:36.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1047182207
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19398 (1)| 00:03:53 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| SALES | 14M| 19398 (1)| 00:03:53 |
--------------------------------------------------------------------
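For completeness, a sketch of why count(cust_id) could use the index all along, even while the column was nullable: a single-column B*-tree index contains no entries for NULL values, so it holds exactly the non-NULL cust_id values that count(cust_id) needs.

```sql
-- cust_id is nullable again at this point, yet count(cust_id) may still
-- be satisfied from SALES_CUST_ID_IDX: NULLs are not stored in a
-- single-column B*-tree index, so the index contains exactly the
-- non-NULL values that count(cust_id) counts.
set autotrace on explain
select count(cust_id) from sales;
```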
As a general rule of thumb, you should always declare columns NOT NULL if you know that NULL values cannot appear in those columns, in order to make the optimizer aware of that important information as well. By the way, if you do count(1) instead of count(*), the outcome is the same:
SQL> alter table sales modify (cust_id NOT NULL);
Table altered.
SQL> select count(1) from sales;
COUNT(1)
----------
14701488
Elapsed: 00:00:03.12
Execution Plan
----------------------------------------------------------
Plan hash value: 2989616816
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8499 (1)| 00:01:42 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| SALES_CUST_ID_IDX | 14M| 8499 (1)| 00:01:42 |
-----------------------------------------------------------------------------------
Addendum to SQL Profiles & Hints
In my posting Remedy for bad hints: SQL Profiles, and in the PDF with the same content on my Downloads Page, I mentioned the ability of SQL Profiles to speed up statements dramatically, in a special case even if statements are coded with a "bad hint", forcing the Optimizer to use a certain access path that is no longer good.
Although that is still true, I just became aware of an even simpler method to make the Optimizer ignore hints. An undocumented parameter can be used to achieve that, as Jonathan Lewis mentions in his article about handling hints. Of course, you should always be careful with undocumented parameters and set them only in cooperation with Oracle Support.
The following demonstrates the effect of the parameter _OPTIMIZER_IGNORE_HINTS, introduced in 10g:
SQL> set autotrace on explain
SQL> select sum(amount_sold) from sales where channel_id=1;

SUM(AMOUNT_SOLD)
----------------
         1232.16

Execution Plan
----------------------------------------------------------
Plan hash value: 479458069

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |    26 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |            |     1 |    26 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| SALES      |     1 |    26 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | SALES_CHID |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

SQL> select sum(amount_sold) from sales where channel_id=3;

SUM(AMOUNT_SOLD)
----------------
       463000853

Execution Plan
----------------------------------------------------------
Plan hash value: 1047182207

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    26 |  9833   (2)| 00:01:58 |
|   1 |  SORT AGGREGATE    |       |     1 |    26 |            |          |
|*  2 |   TABLE ACCESS FULL| SALES |  4541K|   112M|  9833   (2)| 00:01:58 |
----------------------------------------------------------------------------

SQL> select /*+ index (sales,SALES_CHID) */ sum(amount_sold) from sales where channel_id=3;

SUM(AMOUNT_SOLD)
----------------
       463000853

Execution Plan
----------------------------------------------------------
Plan hash value: 479458069

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |    26 | 70278   (1)| 00:14:04 |
|   1 |  SORT AGGREGATE              |            |     1 |    26 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| SALES      |  4541K|   112M| 70278   (1)| 00:14:04 |
|*  3 |    INDEX RANGE SCAN          | SALES_CHID |  4541K|       |  8999   (1)| 00:01:48 |
------------------------------------------------------------------------------------------
In the same way as explained in the posting mentioned above, I force the Optimizer with a hint to use an index, which in this case is suboptimal. The underscore parameter lets the Optimizer ignore any hint. Session as DBA:
SQL> alter system set "_optimizer_ignore_hints"=true;

System altered.
Same session as "application user" as before:
SQL> select /*+ index (sales,SALES_CHID) */ sum(amount_sold) from sales where channel_id=3;

SUM(AMOUNT_SOLD)
----------------
       463000853

Execution Plan
----------------------------------------------------------
Plan hash value: 1047182207

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |    26 |  9833   (2)| 00:01:58 |
|   1 |  SORT AGGREGATE    |       |     1 |    26 |            |          |
|*  2 |   TABLE ACCESS FULL| SALES |  4541K|   112M|  9833   (2)| 00:01:58 |
----------------------------------------------------------------------------
As you can see, exactly the same valid hint as before is now ignored. You can set the undocumented parameter session- or system-wide. But be aware that it will make the Optimizer ignore any hint, useful or not. That is why you may still prefer the SQL Profile method. Also, SQL Profiles can of course do more than just make the Optimizer ignore hints.
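If you have experimented with the parameter, remember to switch it back off afterwards; a sketch, again in the DBA session:

```sql
-- Re-enable normal hint processing system-wide
SQL> alter system set "_optimizer_ignore_hints"=false;

System altered.
```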
