Re: Slow count(*) again...

From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-11 22:03:38
Message-ID: AANLkTimRuS-z3-8x7RpgqsQ9-rxVxWB3VyHG806UUyvR@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Mon, Oct 11, 2010 at 12:54 PM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:

>
> 2. You need a slice of the data, which requires another scan of the table
> using the same WHERE clause as above. This seems like a total waste,
> because we just did that, with the exception of actually fetching the
> data.
>
> Why do it twice? If there were a way to get a slice using OFFSET and
> LIMIT and also get the number of rows that matched before the OFFSET and
> LIMIT were applied, you could do the scan once. I think that this would
> improve things and give Postgres an edge over other systems.
>
>
I'd go even farther with number 2 and suggest a form of offset/limit that
can either return the total count or accept a previously computed total and
pass it back the same way as if it were being computed. That would make the
api even easier to use, since you could keep re-using the number returned
the first time without changing the call depending upon context.

Of course, you could contrive to set that up via a stored proc relatively
easily: do the count(*) once, then append it to each row of the
offset/limit query by including it in the select list. Let the proc
optionally receive the total as an input parameter; when it is not null,
the count(*) block is skipped. You'd incur the full cost of the table scan
plus the offset/limit query once, but not for each and every page. Since
the modified offset/limit api you suggest would surely have to perform the
table scan once anyway, that solution really isn't giving much more value
than implementing it as a stored proc, other than the flexibility of
executing an arbitrary query.
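
Something like this minimal sketch, assuming a hypothetical assets table
with a deleted flag (the table, columns, and ordering are illustrative, not
from this thread):

CREATE OR REPLACE FUNCTION paged_assets(
    p_offset integer,
    p_limit  integer,
    p_total  bigint DEFAULT NULL  -- pass the first call's total back in
) RETURNS TABLE (id integer, name text, total bigint) AS $$
DECLARE
    v_total bigint := p_total;
BEGIN
    -- Pay for the count(*) scan only when no total was supplied.
    IF v_total IS NULL THEN
        SELECT count(*) INTO v_total FROM assets WHERE NOT deleted;
    END IF;

    RETURN QUERY
        SELECT a.id, a.name, v_total
          FROM assets a
         WHERE NOT a.deleted
         ORDER BY a.id
        OFFSET p_offset LIMIT p_limit;
END;
$$ LANGUAGE plpgsql;

-- First page computes the total; later pages just hand it back:
--   SELECT * FROM paged_assets(0, 25);
--   SELECT * FROM paged_assets(25, 25, 1048576);
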
Modified offset/limit combined with the count_estimate functionality would
be very useful in this circumstance, though - especially if it would just
do a full count whenever the estimate comes in under a certain threshold. A
25% discrepancy when counting millions of rows is a lot less of an issue
than a 25% discrepancy when counting 10 rows.
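
For reference, here's a rough sketch of such a count_estimate function. It
is a common recipe built on parsing EXPLAIN output, not a built-in, and the
1000-row fallback threshold is an arbitrary example:

CREATE OR REPLACE FUNCTION count_estimate(p_query text)
RETURNS bigint AS $$
DECLARE
    v_rows bigint;
    v_line text;
BEGIN
    -- Grab the planner's row estimate from the EXPLAIN output.
    FOR v_line IN EXECUTE 'EXPLAIN ' || p_query LOOP
        v_rows := substring(v_line FROM ' rows=([[:digit:]]+)');
        EXIT WHEN v_rows IS NOT NULL;
    END LOOP;

    -- Small estimates are where the relative error hurts, so fall
    -- back to an exact count below the threshold.
    IF v_rows < 1000 THEN
        EXECUTE 'SELECT count(*) FROM (' || p_query || ') AS sub'
            INTO v_rows;
    END IF;

    RETURN v_rows;
END;
$$ LANGUAGE plpgsql;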

One issue with an estimate is that you must be certain that the estimate is
>= the actual count, or else the app must always attempt to load the page
beyond the last page of the estimate in order to determine whether the
estimate must be revised upward. Otherwise, you risk leaving rows out
entirely. That's probably ok when returning search results. Not so much
when displaying a list of assets.
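
A sketch of that over-fetch check, reusing the hypothetical assets table
from above:

-- Page 40 of a 25-row page: ask for one extra row so that its presence
-- tells you the estimated last page wasn't actually the last one.
SELECT id, name
  FROM assets
 WHERE NOT deleted
 ORDER BY id
OFFSET 1000  -- 40 * 25
 LIMIT 26;   -- display 25; a 26th row means revise the estimate upward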
