From: david(at)lang(dot)hm
To: Joe Uhl <joeuhl(at)gmail(dot)com>
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 15:48:34
Message-ID: alpine.DEB.2.00.1010120846410.21889@asgard.lang.hm
Lists: pgsql-hackers pgsql-performance
On Tue, 12 Oct 2010, Joe Uhl wrote:
>>> The biggest single problem with "select count(*)" is that it is
>>> seriously overused. People use that idiom to establish existence, which
>>> usually leads to a performance disaster in the application using it,
>>> unless the table has no more than a few hundred records. SQL, of which
>>> PostgreSQL offers an excellent implementation, has offered the [NOT]
>>> EXISTS clause since its inception in the Jurassic era. The problem is
>>> with the sequential scan, not with counting. I'd even go as far as to
>>> suggest that 99% of instances of the "select count(*)" idiom are
>>> probably bad use of the SQL language.
>>
>> I agree, I have seen many very bad examples of using count(*). I will go
>> so far as to question the use of count(*) in my examples here. Is there a
>> better way to come up with a page list than using count(*)? What is the
>> best method to make a page of results and a list of links to other pages
>> of results? Am I barking up the wrong tree here?
> One way I have dealt with this on very large tables is to cache the count(*)
> at the application level (using memcached, terracotta, or something along
> those lines) and then increment that cache whenever you add a row to the
> relevant table. On application restart that cache is re-initialized with a
> regular old count(*). This approach works really well and all large systems
> in my experience need caching in front of the DB eventually. If you have a
> simpler system with say a single application/web server you can simply store
> the value in a variable, the specifics would depend on the language and
> framework you are using.
this works if you know ahead of time what the criteria of the search is
going to be.
so it will work for
select count(*) from table;
what this won't work for is cases where the criteria of the search are
unpredictable, i.e.
ask the user for input
select count(*) from table where field=$input;
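for the existence-check case mentioned earlier in the thread, the difference looks roughly like this (table and column names are made up purely for illustration):

```sql
-- overused form: counts every matching row just to test existence,
-- then the application checks count > 0
select count(*) from orders where customer_id = 42;

-- better: EXISTS lets the planner stop at the first matching row
select exists (select 1 from orders where customer_id = 42);
```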
David Lang
> Another more all-DB approach is to create a statistics table into which you
> place aggregated statistics rows (num deleted, num inserted, totals, etc) at
> an appropriate time interval in your code. So you have rows containing
> aggregated statistics information for the past and some tiny portion of the
> new data happening right now that hasn't yet been aggregated. Queries then
> look like a summation of the aggregated values in the statistics table plus a
> count(*) over just the newest portion of the data table and are generally
> very fast.
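a rough sketch of that approach, with table and column names invented purely for illustration:

```sql
-- hypothetical data table and its aggregate companion
create table events (
    id         bigserial primary key,
    created_at timestamptz not null default now()
);

create table event_stats (
    aggregated_through timestamptz not null,  -- rows before this are summarized
    row_count          bigint      not null
);

-- total = sum of the aggregated history + count(*) over only the newest slice
select (select coalesce(sum(row_count), 0) from event_stats)
     + (select count(*) from events e
        where e.created_at >= coalesce(
              (select max(aggregated_through) from event_stats),
              '-infinity'));
```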
>
> Overall I have found that once things get big the layers of your app stack
> start to blend together and have to be combined in clever ways to keep speed
> up. Postgres is a beast but when you run into things it can't do well just
> find a way to cache it or make it work together with some other persistence
> tech to handle those cases.
 | From | Date | Subject
---|---|---|---
Next Message | david | 2010-10-12 15:52:48 | Re: Slow count(*) again...
Previous Message | Dan Harris | 2010-10-12 15:39:19 | Re: Slow count(*) again...