From: david(at)lang(dot)hm
To: Joe Uhl <joeuhl(at)gmail(dot)com>
Cc: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow count(*) again...
Date: 2010-10-12 15:48:34
Message-ID: alpine.DEB.2.00.1010120846410.21889@asgard.lang.hm
Lists: pgsql-hackers pgsql-performance
On Tue, 12 Oct 2010, Joe Uhl wrote:
>>> The biggest single problem with "select count(*)" is that it is
>>> seriously overused. People use that idiom to establish existence, which
>>> usually leads to a performance disaster in the application using it,
>>> unless the table has no more than a few hundred records. SQL, of which
>>> PostgreSQL offers an excellent implementation, has offered the [NOT]
>>> EXISTS clause since its inception in the Jurassic era. The problem is
>>> with the sequential scan, not with counting. I'd even go as far as to
>>> suggest that 99% of instances of the "select count(*)" idiom are
>>> probably bad use of the SQL language.
>>
>> I agree, I have seen many very bad examples of using count(*). I will go
>> so far as to question the use of count(*) in my examples here. Is there a
>> better way to come up with a page list than using count(*)? What is the
>> best method to make a page of results and a list of links to other pages
>> of results? Am I barking up the wrong tree here?
> One way I have dealt with this on very large tables is to cache the count(*)
> at the application level (using memcached, terracotta, or something along
> those lines) and then increment that cache whenever you add a row to the
> relevant table. On application restart that cache is re-initialized with a
> regular old count(*). This approach works really well and all large systems
> in my experience need caching in front of the DB eventually. If you have a
> simpler system with say a single application/web server you can simply store
> the value in a variable, the specifics would depend on the language and
> framework you are using.
this works if you know ahead of time what the criteria of the search is
going to be.
so it will work for
select count(*) from table;
what this won't work for is cases where the criteria of the search are
unpredictable, i.e.
ask the user for input
select count(*) from table where field=$input;
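for the existence-check case mentioned earlier in the thread, the difference looks roughly like this (table and column names are made up purely for illustration):

```sql
-- overused form: counts every matching row just to test existence,
-- then the application checks count > 0
select count(*) from orders where customer_id = 42;

-- better: EXISTS lets the planner stop at the first matching row
select exists (select 1 from orders where customer_id = 42);
```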
David Lang
> Another more all-DB approach is to create a statistics table into which you
> place aggregated statistics rows (num deleted, num inserted, totals, etc) at
> an appropriate time interval in your code. So you have rows containing
> aggregated statistics information for the past and some tiny portion of the
> new data happening right now that hasn't yet been aggregated. Queries then
> look like a summation of the aggregated values in the statistics table plus a
> count(*) over just the newest portion of the data table and are generally
> very fast.
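a rough sketch of that approach, with table and column names invented purely for illustration:

```sql
-- hypothetical data table and its aggregate companion
create table events (
    id         bigserial primary key,
    created_at timestamptz not null default now()
);

create table event_stats (
    aggregated_through timestamptz not null,  -- rows before this are summarized
    row_count          bigint      not null
);

-- total = sum of the aggregated history + count(*) over only the newest slice
select (select coalesce(sum(row_count), 0) from event_stats)
     + (select count(*) from events e
        where e.created_at >= coalesce(
              (select max(aggregated_through) from event_stats),
              '-infinity'));
```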
>
> Overall I have found that once things get big the layers of your app stack
> start to blend together and have to be combined in clever ways to keep speed
> up. Postgres is a beast but when you run into things it can't do well just
> find a way to cache it or make it work together with some other persistence
> tech to handle those cases.
 | From | Date | Subject
---|---|---|---
Next Message | david | 2010-10-12 15:52:48 | Re: Slow count(*) again...
Previous Message | Dan Harris | 2010-10-12 15:39:19 | Re: Slow count(*) again...