Is count(col) better than count(*)?

One question that surfaces sometimes in my courses (especially in Performance Tuning) is whether there is a benefit in avoiding count(*) in favor of count(col). The short answer is: no.

Dealing with Oracle DBAs and Developers has taught me, though, that these folks are usually not satisfied with the short answer; they require some kind of proof – which is a good thing, in my view. So here we go:

SQL> select count(*) from sales;

 COUNT(*)
----------
 14701488

Elapsed: 00:00:33.30
SQL> select count(cust_id) from sales;

COUNT(CUST_ID)
--------------
 14701488

Elapsed: 00:00:04.06
SQL> select count(*) from sales;

 COUNT(*)
----------
 14701488

Elapsed: 00:00:34.49
SQL> select count(cust_id) from sales;

COUNT(CUST_ID)
--------------
 14701488

Elapsed: 00:00:04.20

I think the above observation is responsible for the emergence of the myth that count(col) is superior. In fact, right now it is about 10 times faster than count(*). I ran both selects twice to show that caching does not play much of a role here. Unfortunately, the select with count(col) is faster, but not necessarily correct! Watch it:

SQL> update sales set cust_id=null where rownum<2;
1 row updated.
Elapsed: 00:00:00.23
SQL> commit;
Commit complete.
Elapsed: 00:00:00.01
SQL> select count(*) from sales;
 COUNT(*)
----------
 14701488
Elapsed: 00:00:34.84
SQL> select count(cust_id) from sales;
COUNT(CUST_ID)
--------------
 14701487
Elapsed: 00:00:03.73

The count(cust_id) is still faster by far – but it shows a wrong result, should you be interested in the number of rows of the table. You probably see by now the point I am after: there is an index on the cust_id column that is used for count(cust_id), but not for count(*). If the counted column does not contain any NULL values, the result is identical, but the runtime is shorter. That is the origin of the myth! The point is: if you declare the indexed column as NOT NULL, the optimizer knows that it can use the index for the count(*):

SQL> update sales set cust_id=1  where rownum<2;
1 row updated.
Elapsed: 00:00:00.10
SQL> commit;
Commit complete.
Elapsed: 00:00:00.10
SQL> alter table sales modify (cust_id NOT NULL);
Table altered.
Elapsed: 00:00:38.72
SQL> set autotrace on explain
SQL> select count(*) from sales;
 COUNT(*)
----------
 14701488
Elapsed: 00:00:03.81
Execution Plan
----------------------------------------------------------
Plan hash value: 2989616816
-----------------------------------------------------------------------------------
| Id  | Operation             | Name              | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                   |     1 |  8499   (1)| 00:01:42 |
|   1 |  SORT AGGREGATE       |                   |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| SALES_CUST_ID_IDX |    14M|  8499   (1)| 00:01:42 |
-----------------------------------------------------------------------------------
SQL> alter table sales modify (cust_id NULL);
Table altered.
Elapsed: 00:00:00.27
SQL> select count(*) from sales;
 COUNT(*)
----------
 14701488
Elapsed: 00:00:36.00
Execution Plan
----------------------------------------------------------
Plan hash value: 1047182207
--------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Cost (%CPU)| Time     |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 | 19398   (1)| 00:03:53 |
|   1 |  SORT AGGREGATE    |       |     1 |            |          |
|   2 |   TABLE ACCESS FULL| SALES |    14M| 19398   (1)| 00:03:53 |
--------------------------------------------------------------------

As a general rule of thumb, you should always declare columns as NOT NULL if you know that NULL values cannot occur in them, in order to make the optimizer aware of that important information as well. By the way, if you do count(1) instead of count(*), the outcome is the same:

SQL>  alter table sales modify (cust_id NOT NULL);
Table altered.
SQL> select count(1) from sales;
 COUNT(1)
----------
 14701488
Elapsed: 00:00:03.12
Execution Plan
----------------------------------------------------------
Plan hash value: 2989616816
-----------------------------------------------------------------------------------
| Id  | Operation             | Name              | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                   |     1 |  8499   (1)| 00:01:42 |
|   1 |  SORT AGGREGATE       |                   |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| SALES_CUST_ID_IDX |    14M|  8499   (1)| 00:01:42 |
-----------------------------------------------------------------------------------
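To isolate the semantics from the performance aspect: count(*) counts rows, count(col) counts non-NULL values of col, and count(1) counts a constant that is never NULL – which is why it behaves exactly like count(*). Here is a minimal sketch on a hypothetical toy table (the names t and col are made up for illustration, this is not the sales table from above):

```sql
-- Hypothetical demo table with one NULL value
create table t (col number);
insert into t values (1);
insert into t values (2);
insert into t values (null);
commit;

select count(*)   from t;  -- 3: every row is counted
select count(col) from t;  -- 2: the NULL value is skipped
select count(1)   from t;  -- 3: the constant 1 is never NULL, so same as count(*)

drop table t;
```

So count(col) only gives the number of rows of the table if the column happens to contain no NULLs – which is exactly what a NOT NULL constraint guarantees to the optimizer.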

  1. #1 by Laurent Schneider on October 29, 2009 - 18:03

    the optimizer is getting quite clever… even if you do a select count(id) it will do a full table scan if this is faster 😉

  2. #2 by Uwe Hesse on October 29, 2009 - 18:08

    Hi Laurent, you are right – although this will probably be a rare case: FTS causing less cost than INDEX FAST FULL SCAN 🙂

  3. #3 by dombrooks on October 29, 2009 - 18:41

    > The count(cust_id) is still faster by far – but it shows a wrong result

    Not wrong, surely. It’s just answering a different question than the one you’re suggesting that people think it’s answering.

  4. #4 by Uwe Hesse on October 29, 2009 - 20:39

    @dombrooks: If you look at the whole sentence, it goes:
    “The count(cust_id) is still faster by far – but it shows a wrong result, should you be interested in the number of rows of the table”
    So I made that quite clear, don’t you think? Of course the counted number of cust_ids is correct.

  5. #5 by Laurent Schneider on October 29, 2009 - 23:47

    well, sometimes the index is bigger than the table, but true it is fairly rare 🙂

  6. #6 by Noons on October 30, 2009 - 00:52

    These things keep changing from patch level to patch level; we always have to test the latest behaviour.

    It’s not too long ago that
    select count(*) from …
    was marginally faster (a couple less logical reads) than
    select count(1) from …

    Yet in 10.2.0.3 I get exactly the same stats (set autotrace on statistics) with both.
    It’s a RPITA but unfortunately Oracle keeps making small silent changes that can affect how things work.
    I guess it keeps their consultants employed, but it’s still a pain…

  7. #7 by Uwe Hesse on October 30, 2009 - 08:23

    @Noons: I agree with you regarding the necessity of tests when new releases come out – that is almost unavoidable, although we do in fact communicate a lot about the changes every time (every new release has a new features guide). But the people responsible for the documentation are only human beings too, so they cannot and will not spot every little changed aspect of a new release. In 11g, there are some enhancements to address these kinds of problems: SQL Performance Analyzer, SQL Plan Baselines and Database Replay all help to make the change to a new release less cumbersome.

  8. #8 by Laurent Schneider on October 30, 2009 - 08:34

    count(1) used to be very slow in ancient versions, because the expression 1 IS NOT NULL was evaluated for each row…

    a bit like count(f()) where f is not deterministic

    ====
    if you have a big primary key and a small non-unique index, Oracle may choose the small index even if you say COUNT(PK).

    Recently I had the following issue :

    update t set x=:0 where c1=:1 and c2=:2 and c3=:3

    with c1,c2,c3 being a unique primary key index.

    so far so good.

    but if the first query run is doing something like

    update t set x=100 where c1=200 and c2=300 and c3=400

    and there is a non-unique index on c3 and something like 20 values out of 1000000 for c3=400, the query was using the non-unique index instead of the pk.

    due to the bind peeking mechanism in 10g, the next time the query runs it will reuse the small index, and when the small index returns 50000 rows it takes a huge time.

    Annoying. My solution was to use the NO_INDEX hint to exclude this “smaller” index.

  9. #9 by Uwe Hesse on October 30, 2009 - 18:05

    Laurent, thanks for sharing that experience! In 11g those kinds of problems will probably be a little less annoying thanks to Adaptive Cursor Sharing, I suppose.

  10. #10 by Javid on June 10, 2010 - 06:00

    Very nice article. Thanks.

  11. #11 by Uwe Hesse on June 10, 2010 - 07:40

    Javid, you’re welcome 🙂
