Re: Slow count(*) again...

From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 12:02:03
Message-ID: AANLkTimikExy7yCkB23R-Zmx+DaRh0uQhZE-jAE1B_xj@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

2010/10/10 Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>

> On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
> > On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
> wrote:
> > > I know that there haven been many discussions on the slowness of
> count(*)
> > > even when an index is involved because the visibility of the rows has
> to
> > > be checked. In the past I have seen many suggestions about using
> > > triggers and tables to keep track of counts and while this works fine
> in
> > > a situation where you know what the report is going to be ahead of
> time,
> > > this is simply not an option when an unknown WHERE clause is to be used
> > > (dynamically generated). I ran into a fine example of this when I was
> > > searching this mailing list, "Searching in 856,646 pages took 13.48202
> > > seconds. Site search powered by PostgreSQL 8.3." Obviously at some
> point
> > > count(*) came into play here because the site made a list of pages (1 2
> > > 3 4 5 6 > next). I very commonly make a list of pages from search
> > > results, and the biggest time killer here is the count(*) portion, even
> > > worse yet, I sometimes have to hit the database with two SELECT
> > > statements, one with OFFSET and LIMIT to get the page of results I need
> > > and another to get the amount of total rows so I can estimate how many
> > > pages of results are available. The point I am driving at here is that
> > > since building a list of pages of results is such a common thing to do,
> > > there need to be some specific high speed ways to do this in one query.
> > > Maybe an estimate(*) that works like count but gives an answer from the
> > > index without checking visibility? I am sure that this would be good
> > > enough to make a page list, it is really no big deal if it errors on
> the
> > > positive side, maybe the list of pages has an extra page off the end. I
> > > can live with that. What I can't live with is taking 13 seconds to get
> a
> > > page of results from 850,000 rows in a table.
> >
> > 99% of the time in the situations you don't need an exact measure, and
> > assuming analyze has run recently, select rel_tuples from pg_class for
> > a given table is more than close enough. I'm sure wrapping that in a
> > simple estimated_rows() function would be easy enough to do.
>
> This is a very good approach and it works very well when you are counting
> the
> entire table, but when you have no control over the WHERE clause, it
> doesn't
> help. IE: someone puts in a word to look for in a web form.
>
> From my perspective, this issue is the biggest problem there is when using
> Postgres to create web pages, and it is so commonly used, I think that
> there
> should be a specific way to deal with it so that you don't have to run the
> same WHERE clause twice.
> IE: SELECT count(*) FROM <table> WHERE <clause>; to get the total amount of
> items to make page navigation links, then:
> SELECT <columns> FROM table WHERE <clause> LIMIT <items_per_page> OFFSET
> <(page_no-1)*items_per_page>; to get the actual page contents.
>
> How about
select * from (select *, count(*) over () as total_count from <table> where
<clause) a LIMIT <items_per_page> OFFSET
<(page_no-1)*items_per_page>
It will return you total_count column with equal value in each row. You may
have problems if no rows are returned (e.g. page num is too high).
--
Best regards,
Vitalii Tymchyshyn

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hitoshi Harada 2010-10-10 12:46:18 Re: Which file does the SELECT?
Previous Message Marko Tiikkaja 2010-10-10 11:48:17 Re: Review: Fix snapshot taking inconsistencies

Browse pgsql-performance by date

  From Date Subject
Next Message Reid Thompson 2010-10-10 15:02:32 Re: Slow count(*) again...
Previous Message Florian Weimer 2010-10-10 11:45:01 Re: large dataset with write vs read clients