Quick Links

Re: Slow count(*) again...

From:	Craig James <craig_james(at)emolecules(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Slow count(*) again...
Date:	2010-10-11 17:46:17
Message-ID:	4CB34D69.6060108@emolecules.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-performance

On 10/9/10 6:47 PM, Scott Marlowe wrote:
> On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel<neil(dot)whelchel(at)gmail(dot)com> wrote:
>> I know that there haven been many discussions on the slowness of count(*) even
>> when an index is involved because the visibility of the rows has to be
>> checked. In the past I have seen many suggestions about using triggers and
>> tables to keep track of counts and while this works fine in a situation where
>> you know what the report is going to be ahead of time, this is simply not an
>> option when an unknown WHERE clause is to be used (dynamically generated).
>> I ran into a fine example of this when I was searching this mailing list,
>> "Searching in 856,646 pages took 13.48202 seconds. Site search powered by
>> PostgreSQL 8.3." Obviously at some point count(*) came into play here because
>> the site made a list of pages (1 2 3 4 5 6> next). I very commonly make a
>> list of pages from search results, and the biggest time killer here is the
>> count(*) portion, even worse yet, I sometimes have to hit the database with
>> two SELECT statements, one with OFFSET and LIMIT to get the page of results I
>> need and another to get the amount of total rows so I can estimate how many
>> pages of results are available. The point I am driving at here is that since
>> building a list of pages of results is such a common thing to do, there need
>> to be some specific high speed ways to do this in one query. Maybe an
>> estimate(*) that works like count but gives an answer from the index without
>> checking visibility? I am sure that this would be good enough to make a page
>> list, it is really no big deal if it errors on the positive side, maybe the
>> list of pages has an extra page off the end. I can live with that. What I
>> can't live with is taking 13 seconds to get a page of results from 850,000
>> rows in a table.
>
> 99% of the time in the situations you don't need an exact measure, and
> assuming analyze has run recently, select rel_tuples from pg_class for
> a given table is more than close enough. I'm sure wrapping that in a
> simple estimated_rows() function would be easy enough to do.

First of all, it's not true. There are plenty of applications that need an exact answer. Second, even if it is only 1%, that means it's 1% of the queries, not 1% of people. Sooner or later a large fraction of developers will run into this. It's probably been the most-asked question I've seen on this forum in the four years I've been here. It's a real problem, and it needs a real solution.

I know it's a hard problem to solve, but can we stop hinting that those of us who have this problem are somehow being dense?

Thanks,
Craig

In response to

Re: Slow count(*) again... at 2010-10-10 01:47:34 from Scott Marlowe

Responses

Re: Slow count(*) again... at 2010-10-11 19:54:57 from Neil Whelchel
Re: Slow count(*) again... at 2010-10-12 07:56:13 from Vitalii Tymchyshyn

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2010-10-11 17:48:12	Re: wip: functions median and percentile
Previous Message	Dean Rasheed	2010-10-11 17:42:10	Re: wip: functions median and percentile

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Josh Berkus	2010-10-11 17:50:08	Re: XFS vs Ext3, and schedulers, for WAL
Previous Message	Pierre C	2010-10-11 10:09:02	Re: Slow count(*) again...