Re: count(*) slow on large tables

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Christopher Browne <cbbrowne(at)libertyrms(dot)info>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: count(*) slow on large tables
Date: 2003-10-03 05:13:08
Message-ID: 87brsyrjiz.fsf@stark.dyndns.tv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance


Christopher Browne <cbbrowne(at)libertyrms(dot)info> writes:

> It would be very hairy to implement it correctly, and all this would
> cover is the single case of "SELECT COUNT(*) FROM SOME_TABLE;"
>
> If you had a single WHERE clause attached, you would have to revert to
> walking through the tuples looking for the ones that are live and
> committed, which is true for any DBMS.

Well it would be handy for a few other cases as well.

1 It would be useful for the case where you have a partial index with a
matching where clause. The optimizer already considers using such indexes
but it has to pay the cost of the tuple lookup, which is substantial.

2 It would be useful for the very common queries of the form
WHERE x IN (select id from foo where some_indexed_expression)

(Or the various equivalent forms including outer joins that test to see if
the matching record was found and don't retrieve any other columns in the
select list.)

3 It would be useful for many-many relationships where the intermediate table
has only the two primary key columns being joined. If you create a
multi-column index on the two columns it shouldn't need to look up the
tuple. This would be effectively be nearly equivalent to an "index organized
table".

4 It would be useful for just about all the referential integrity queries...

I don't mean to say this is definitely a good thing. The tradeoff in
complexity and time to maintain the index pages would be large. But don't
dismiss it as purely a count(*) optimization hack.

I know Oracle is capable of it and it can speed up your query a lot when you
remove that last unnecessary column from a join table allowing oracle to skip
the step of reading the table.

--
greg

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shridhar Daithankar 2003-10-03 06:29:02 Re: count(*) slow on large tables
Previous Message Greg Stark 2003-10-03 04:50:12 Re: minor view creation weirdness

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2003-10-03 05:27:08 Re: TPC-R benchmarks
Previous Message Dror Matalon 2003-10-03 04:27:54 Re: count(*) slow on large tables