Re: Performance of count(*)

From:	"Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Performance of count(*)
Date:	2007-03-22 15:16:51
Message-ID:	46029DE3.9030505@modgraph-usa.com
Lists:	pgsql-performance

Michael Stone wrote:
> On Thu, Mar 22, 2007 at 01:30:35PM +0200, ismo(dot)tuononen(at)solenovo(dot)fi wrote:
>> approximated count?????
>>
>> why? who would need it? where you can use it?
>
> Do a google query. Look at the top of the page, where it says "results N
> to M of about O". For user interfaces (which is where a lot of this
> count(*) stuff comes from) you quite likely don't care about the exact
> count...

Right on, Michael.

One of our biggest single problems is this very thing. It's not specifically a Postgres problem; it's built into the idea of a relational database. There are no "job status", "rough estimate of results", or "give me part of the answer" features, and those are critical to many real applications.

In our case (for a variety of reasons, but this one is critical), we actually can't use Postgres indexing at all -- we wrote an entirely separate indexing system for our data, one that has the following properties:

1. It can give out "pages" of information (e.g. "rows 50-60") without
rescanning the skipped pages the way "limit/offset" would.
2. It can give accurate estimates of the total rows that will be returned.
3. It can accurately estimate the time it will take.
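
To illustrate property 1 only: the sketch below is not our indexing system. It shows one common way to get a similar effect in plain SQL, often called keyset pagination, where each page starts from the last key already seen instead of counting past skipped rows. The `items` table, its columns, and the `fetch_page` helper are hypothetical, and SQLite is used only so the example runs on its own; it assumes a unique, indexed sort key.

```python
import sqlite3


def fetch_page(conn, last_id, page_size):
    """Return up to page_size rows whose id is greater than last_id, in id order.

    The WHERE clause lets the database seek straight to the start of the
    page through the primary-key index, rather than producing and
    discarding the skipped rows as OFFSET does.
    """
    cur = conn.execute(
        "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    )
    return cur.fetchall()


# Hypothetical demo data: 100 rows with ids 1..100.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 101)],
)

# Rows 50-59, reached by remembering that the previous page ended at id 49.
page = fetch_page(conn, last_id=49, page_size=10)

# The next page continues from the last id of the page just shown.
next_page = fetch_page(conn, last_id=page[-1][0], page_size=10)

# The limit/offset form returns the same rows here, but has to walk
# past the 49 skipped rows first.
offset_page = conn.execute(
    "SELECT id, name FROM items ORDER BY id LIMIT 10 OFFSET 49"
).fetchall()
```

The trade-off is that the caller must carry the last key from page to page, so jumping directly to an arbitrary page number is not possible without extra bookkeeping.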

For our primary business-critical data, Postgres is merely a storage system, not a search system, because we have to do the "heavy lifting" in our own code. (To be fair, there is no relational database that can handle our data.)

Many or most web-based search engines face these exact problems.

Craig

In response to

Re: Performance of count(*) at 2007-03-22 14:18:10 from Michael Stone

Responses

Re: Performance of count(*) at 2007-03-22 15:31:39 from Tino Wildenhain
Re: Performance of count(*) at 2007-03-22 16:10:51 from Brian Hurt
