Archive for the category TOI
Is count(col) better than count(*)?
One question that sometimes surfaces in my courses (especially in Performance Tuning) is whether there is a benefit in avoiding count(*) in favor of count(col). The short answer is: no.
Dealing with Oracle DBAs and developers has taught me, though, that they are usually not satisfied with the short answer; they require some kind of proof – which is a good thing, in my view. So here we go:
SQL> select count(*) from sales;
COUNT(*)
----------
14701488
Elapsed: 00:00:33.30
SQL> select count(cust_id) from sales;
COUNT(CUST_ID)
--------------
14701488
Elapsed: 00:00:04.06
SQL> select count(*) from sales;
COUNT(*)
----------
14701488
Elapsed: 00:00:34.49
SQL> select count(cust_id) from sales;
COUNT(CUST_ID)
--------------
14701488
Elapsed: 00:00:04.20
I think the above observation is responsible for the emergence of the myth that count(col) is superior. In fact, right now it is about ten times faster than count(*). I ran the two selects twice each to show that caching does not play a big role here. Unfortunately, the faster select with count(col) is not necessarily correct! Watch it:
SQL> update sales set cust_id=null where rownum<2;
1 row updated.
Elapsed: 00:00:00.23
SQL> commit;
Commit complete.
Elapsed: 00:00:00.01
SQL> select count(*) from sales;
COUNT(*)
----------
14701488
Elapsed: 00:00:34.84
SQL> select count(cust_id) from sales;
COUNT(CUST_ID)
--------------
14701487
Elapsed: 00:00:03.73
The count(cust_id) is still faster by far – but it shows a wrong result, should you be interested in the number of rows of the table. You probably see by now the point I am after: there is an index on the cust_id column that is used for count(cust_id), but not for count(*). If the counted column does not contain any NULL values, the result is identical, but the runtime is shorter – the origin of the myth! The point is: if you declare the indexed column as NOT NULL, the optimizer knows that it can use the index for count(*) as well:
SQL> update sales set cust_id=1 where rownum<2;
1 row updated.
Elapsed: 00:00:00.10
SQL> commit;
Commit complete.
Elapsed: 00:00:00.10
SQL> alter table sales modify (cust_id NOT NULL);
Table altered.
Elapsed: 00:00:38.72
SQL> set autotrace on explain
SQL> select count(*) from sales;
COUNT(*)
----------
14701488
Elapsed: 00:00:03.81
Execution Plan
----------------------------------------------------------
Plan hash value: 2989616816
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8499 (1)| 00:01:42 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| SALES_CUST_ID_IDX | 14M| 8499 (1)| 00:01:42 |
-----------------------------------------------------------------------------------
SQL> alter table sales modify (cust_id NULL);
Table altered.
Elapsed: 00:00:00.27
SQL> select count(*) from sales;
COUNT(*)
----------
14701488
Elapsed: 00:00:36.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1047182207
--------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
--------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 19398 (1)| 00:03:53 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | TABLE ACCESS FULL| SALES | 14M| 19398 (1)| 00:03:53 |
--------------------------------------------------------------------
As a general rule of thumb, you should always declare columns NOT NULL if you know that there can be no NULL values in those columns, in order to make the optimizer aware of that important information as well. By the way, if you do count(1) instead of count(*), the outcome is the same:
SQL> alter table sales modify (cust_id NOT NULL);
Table altered.
SQL> select count(1) from sales;
COUNT(1)
----------
14701488
Elapsed: 00:00:03.12
Execution Plan
----------------------------------------------------------
Plan hash value: 2989616816
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 8499 (1)| 00:01:42 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| SALES_CUST_ID_IDX | 14M| 8499 (1)| 00:01:42 |
-----------------------------------------------------------------------------------
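To sum up the semantics shown above: count(*) counts rows, while count(col) counts only the non-NULL values of col. A minimal sketch – the table t and its data are made up for this illustration:

```sql
-- Sketch: count(*) vs. count(col) on a table containing a NULL.
-- Table t and its rows are hypothetical, for illustration only.
create table t (col number);
insert into t values (1);
insert into t values (null);
commit;
select count(*), count(col) from t;
-- count(*) returns 2, count(col) returns 1 -- the NULL is not counted
```

If you really want the number of rows, use count(*) and let the optimizer pick the cheapest access path.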
Dropping a table during SELECT
In my last course, one student asked the question: „What happens if we do a select on a table and someone else drops it in the meantime?“ Of course, you cannot drop a table while an open transaction uses it. But what about a select: will it go through? Although this question seems a little academic – which I pointed out immediately – there was no obvious answer I was aware of. So I set up a little test case for it.
I like to answer questions with practical examples if possible. On small tables, the drop table has no effect at all on the select statement, of course, if done after the select started, because the select is simply too fast to notice the drop. So my test table is more than 4 GB in size and contains 117611904 rows. Surprisingly, there is a difference in the outcome, depending on whether you drop the table with purge or not.
SQL> select count(*) from sales;
COUNT(*)
----------
117611904
Elapsed: 00:00:55.74
I have two sessions connected. In the first session, I do the select. In the second session, I drop the table. Without the drop, the select needs about one minute to complete, as shown above. Now the drop table without purge (basically just a rename of the table in the data dictionary, enabling flashback to before drop since 10g):
SQL> select count(*) from sales;
COUNT(*)
----------
117611904
Elapsed: 00:01:01.25
SQL> drop table sales;
Table dropped.
The drop table statement completed much faster than the select above. Apparently, the select keeps running, fetching the rows from the „dropped“ table in the recyclebin.
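That the table merely moved to the recycle bin can be confirmed in the data dictionary – a sketch, assuming the USER_RECYCLEBIN view available since 10g:

```sql
-- Sketch: inspect the recycle bin after a drop without purge.
-- The dropped table appears under a system-generated BIN$... name.
select object_name, original_name, type, droptime
  from user_recyclebin;
```

In SQL*Plus, the command `show recyclebin` gives a similar overview.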
Now I really drop the table:
SQL> flashback table sales to before drop;
Flashback complete.
SQL> select count(*) from sales;
select count(*) from sales
*
ERROR at line 1:
ORA-08103: object no longer exists
Elapsed: 00:00:05.17
SQL> drop table sales purge;
Table dropped.
This interrupts the select statement in the first session. Well, I doubt that there is much practical implication of this, but I found it interesting nonetheless. Thanks to Mr. Heinz Schmitz for raising that question 🙂
„Total Recall“: Brief introduction into Flashback Data Archive
With Oracle Database 11g, we have a new option available, called Total Recall. This option extends Flashback Query, which was already introduced in 9i.
We can now designate particular tables so that all changes on them can be tracked back even after years. This option especially addresses legal requirements to keep the history of certain data for several years. With Total Recall, or Flashback Data Archive, this can be achieved quite comfortably and efficiently. In order to demonstrate that, I create a demo user, giving him the DBA role (kind of „quick & dirty“, of course, but it makes the demo easier):
SQL> grant dba to quaid identified by quaid;
Grant succeeded.
SQL> connect quaid/quaid
Connected.
SQL> create tablespace flatbs datafile
'/u01/app/oracle/oradata/orcl/flatbs01.dbf' size 50m;
Tablespace created.
This tablespace is just an ordinary tablespace, but I am going to use it to hold the Flashback Archive, created now:
SQL> create flashback archive fla_10y tablespace flatbs retention 10 year;
Flashback archive created.
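Whether the archive exists with the intended retention can be checked in the data dictionary – a sketch, assuming the DBA_FLASHBACK_ARCHIVE view introduced in 11g:

```sql
-- Sketch: confirm the Flashback Archive and its retention period.
-- 10 years should show up as 3650 days here.
select flashback_archive_name, retention_in_days
  from dba_flashback_archive;
```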
If I designate a table to fla_10y, I will be able to do Flashback Query on that table even after 10 years – provided I have the space to hold that history in the tablespace(s) associated with that Flashback Archive. The following creates two ordinary tables with ordinary rows in them:
SQL> create table norecall (id number, name varchar2(50));
Table created.
SQL> create table totalrecall (id number, name varchar2(50));
Table created.
SQL> insert into norecall values (1,'QUAID');
1 row created.
SQL> insert into totalrecall values (1,'QUAID');
1 row created.
SQL> commit;
Commit complete.
I want to be able to keep the history of the second table for 10 years:
SQL> alter table totalrecall flashback archive fla_10y;
Table altered.
SQL> select to_char(systimestamp,'yyyy-mm-dd:hh24:mi:ss') from dual;
TO_CHAR(SYSTIMESTAM
-------------------
2009-10-21:10:55:47
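Which tables are history-tracked, and by which archive, can also be queried – a sketch, assuming the USER_FLASHBACK_ARCHIVE_TABLES dictionary view:

```sql
-- Sketch: list the tables associated with a Flashback Archive.
-- Only totalrecall should show up here, not norecall.
select table_name, flashback_archive_name
  from user_flashback_archive_tables;
```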
I will do DML on the two tables now. As you probably know, this leads to the creation of before images in the undo tablespace, which we can use to look into the past of the tables – but most likely not after 10 years, because the before images in the undo tablespace get overwritten at some point.
SQL> delete from norecall;
1 row deleted.
SQL> commit;
Commit complete.
SQL> select * from norecall as of timestamp
to_timestamp('2009-10-21:10:55:47','yyyy-mm-dd:hh24:mi:ss');
ID NAME
---------- --------------------------------------------------
1 QUAID
SQL> delete from totalrecall;
1 row deleted.
SQL> commit;
Commit complete.
The select above is a Flashback Query, as has been possible since 9i. I will now make sure that there are no before images left in the undo tablespace by creating a new one and dropping the old one.
SQL> create undo tablespace undonew datafile
'/u01/app/oracle/oradata/orcl/undonew01.dbf' size 50m;
Tablespace created.
SQL> alter system set undo_tablespace='UNDONEW';
System altered.
SQL> select * from norecall as of timestamp
to_timestamp('2009-10-21:10:55:47','yyyy-mm-dd:hh24:mi:ss');
ID NAME
---------- --------------------------------------------------
1 QUAID
As long as the old undo tablespace is still there, it may still get used for a conventional Flashback Query – but no longer after the drop tablespace. The before images of the history-tracked table, though, are saved into the Flashback Archive (getting compressed during the transfer); the same would happen if the content of the old undo tablespace got overwritten by new before images. That’s why I can still do Flashback Query on totalrecall:
SQL> connect / as sysdba
Connected.
SQL> shutdown immediate
SQL> startup
SQL> drop tablespace undotbs1 including contents and datafiles;
Tablespace dropped.
SQL> connect quaid/quaid
Connected.
SQL> select * from norecall as of timestamp
to_timestamp('2009-10-21:10:55:47','yyyy-mm-dd:hh24:mi:ss');
select * from norecall as of timestamp
*
ERROR at line 1:
ORA-01555: snapshot too old: rollback segment number with name "" too small
SQL> select * from totalrecall as of timestamp
to_timestamp('2009-10-21:10:55:47','yyyy-mm-dd:hh24:mi:ss');
ID NAME
---------- --------------------------------------------------
1 QUAID
I will insert a new row into the ordinary table to demonstrate further differences between it and the history-tracked table:
SQL> insert into norecall values (2,'John Doe');
1 row created.
SQL> commit;
Commit complete.
SQL> select to_char(sysdate,'yyyy-mm-dd:hh24:mi:ss') from dual;
TO_CHAR(SYSDATE,'YY
-------------------
2009-10-23:02:14:28
Up to this point of the story, the features shown are already available in 11g Release 1. What follows are new features of 11g Release 2: we are now able to do DDL statements on the history-tracked tables and can still do Flashback Query!
SQL> alter table norecall drop column name;
Table altered.
SQL> select * from norecall as of timestamp
2 to_timestamp('2009-10-23:02:14:28','yyyy-mm-dd:hh24:mi:ss');
select * from norecall as of timestamp
*
ERROR at line 1:
ORA-01466: unable to read data - table definition has changed
SQL> select to_char(sysdate,'yyyy-mm-dd:hh24:mi:ss') from dual;
TO_CHAR(SYSDATE,'YY
-------------------
2009-10-23:02:17:06
SQL> truncate table norecall;
Table truncated.
SQL> select * from norecall as of timestamp
2 to_timestamp('2009-10-23:02:17:06','yyyy-mm-dd:hh24:mi:ss');
select * from norecall as of timestamp
*
ERROR at line 1:
ORA-01466: unable to read data - table definition has changed
The above demonstrates that DDL on an ordinary table makes Flashback Query behind the DDL impossible. That was nothing new. New is:
SQL> alter table totalrecall drop column name;
Table altered.
SQL> select * from totalrecall as of timestamp
to_timestamp('2009-10-21:10:55:47','yyyy-mm-dd:hh24:mi:ss');
ID NAME
---------- --------------------------------------------------
1 QUAID
SQL> truncate table totalrecall;
Table truncated.
SQL> select * from totalrecall as of timestamp
to_timestamp('2009-10-21:10:55:47','yyyy-mm-dd:hh24:mi:ss');
ID NAME
---------- --------------------------------------------------
1 QUAID
With 11g Release 2, we can do Flashback Query behind DDL on tables in the Flashback Archive! In 11g Release 1, DDL on history-tracked tables was prohibited, which was a show-stopper for many customers. One DDL statement that still cannot get through is, of course:
SQL> drop table totalrecall;
drop table totalrecall
*
ERROR at line 1:
ORA-55610: Invalid DDL statement on history-tracked table
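If the table really has to go, it can first be disassociated from the Flashback Archive, after which the drop works again – a sketch; note that this of course discards the tracked history:

```sql
-- Sketch: take the table out of the Flashback Archive, then drop it.
-- Caution: the recorded 10-year history is thrown away by this!
alter table totalrecall no flashback archive;
drop table totalrecall;
```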
