Re: Slow count(*) again...

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 01:47:34
Message-ID: AANLkTimayTM__hHu9RoiLaNhzfpNY6Vo5y4OkbX_sV=o@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com> wrote:
> I know that there haven been many discussions on the slowness of count(*) even
> when an index is involved because the visibility of the rows has to be
> checked. In the past I have seen many suggestions about using triggers and
> tables to keep track of counts and while this works fine in a situation where
> you know what the report is going to be ahead of time, this is simply not an
> option when an unknown WHERE clause is to be used (dynamically generated).
> I ran into a fine example of this when I was searching this mailing list,
> "Searching in 856,646 pages took 13.48202 seconds. Site search powered by
> PostgreSQL 8.3." Obviously at some point count(*) came into play here because
> the site made a list of pages (1 2 3 4 5 6 > next). I very commonly make a
> list of pages from search results, and the biggest time killer here is the
> count(*) portion, even worse yet, I sometimes have to hit the database with
> two SELECT statements, one with OFFSET and LIMIT to get the page of results I
> need and another to get the amount of total rows so I can estimate how many
> pages of results are available. The point I am driving at here is that since
> building a list of pages of results is such a common thing to do, there need
> to be some specific high speed ways to do this in one query. Maybe an
> estimate(*) that works like count but gives an answer from the index without
> checking visibility? I am sure that this would be good enough to make a page
> list, it is really no big deal if it errors on the positive side, maybe the
> list of pages has an extra page off the end. I can live with that. What I
> can't live with is taking 13 seconds to get a page of results from 850,000
> rows in a table.

99% of the time in the situations you don't need an exact measure, and
assuming analyze has run recently, select rel_tuples from pg_class for
a given table is more than close enough. I'm sure wrapping that in a
simple estimated_rows() function would be easy enough to do.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mladen Gogala 2010-10-10 01:54:15 Re: Slow count(*) again...
Previous Message James Cloos 2010-10-10 00:27:22 Re: .gitignore files, take two

Browse pgsql-performance by date

  From Date Subject
Next Message Mladen Gogala 2010-10-10 01:54:15 Re: Slow count(*) again...
Previous Message Neil Whelchel 2010-10-09 23:26:18 Slow count(*) again...