
Re: Slow count(*) again...

From: Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow count(*) again...
Date: 2010-10-10 21:59:48
Message-ID: 201010101459.49336.neil.whelchel@gmail.com
Lists: pgsql-hackers, pgsql-performance
On Sunday 10 October 2010 05:02:03 Віталій Тимчишин wrote:
> 2010/10/10 Neil Whelchel <neil(dot)whelchel(at)gmail(dot)com>
> 
> > On Saturday 09 October 2010 18:47:34 Scott Marlowe wrote:
> > > On Sat, Oct 9, 2010 at 5:26 PM, Neil Whelchel
> > > <neil(dot)whelchel(at)gmail(dot)com> wrote:
> > > > I know that there have been many discussions on the slowness of
> > > > count(*) even when an index is involved, because the visibility of
> > > > the rows has to be checked. In the past I have seen many suggestions
> > > > about using triggers and tables to keep track of counts, and while
> > > > this works fine in a situation where you know what the report is
> > > > going to be ahead of time, it is simply not an option when an
> > > > unknown WHERE clause is to be used (dynamically generated). I ran
> > > > into a fine example of this when I was searching this mailing list:
> > > > "Searching in 856,646 pages took 13.48202 seconds. Site search
> > > > powered by PostgreSQL 8.3." Obviously at some point count(*) came
> > > > into play here, because the site made a list of pages
> > > > (1 2 3 4 5 6 > next). I very commonly make a list of pages from
> > > > search results, and the biggest time killer here is the count(*)
> > > > portion. Even worse, I sometimes have to hit the database with two
> > > > SELECT statements: one with OFFSET and LIMIT to get the page of
> > > > results I need, and another to get the total number of rows so I can
> > > > estimate how many pages of results are available. The point I am
> > > > driving at here is that since building a list of pages of results is
> > > > such a common thing to do, there needs to be some specific high-speed
> > > > way to do this in one query. Maybe an estimate(*) that works like
> > > > count(*) but gives an answer from the index without checking
> > > > visibility? I am sure that this would be good enough to make a page
> > > > list; it is really no big deal if it errs on the positive side,
> > > > maybe the list of pages has an extra page off the end. I can live
> > > > with that. What I can't live with is taking 13 seconds to get a page
> > > > of results from 850,000 rows in a table.
> > > 
> > > 99% of the time, in situations like this, you don't need an exact
> > > measure; assuming ANALYZE has run recently, selecting reltuples from
> > > pg_class for a given table is more than close enough. I'm sure
> > > wrapping that in a simple estimated_rows() function would be easy
> > > enough to do.
> > 
> > This is a very good approach, and it works very well when you are
> > counting the entire table, but when you have no control over the WHERE
> > clause, it doesn't help. IE: someone puts in a word to look for in a
> > web form.
> > 
> > From my perspective, this issue is the biggest problem there is when
> > using Postgres to create web pages, and it is so commonly used, I think
> > that there
> > should be a specific way to deal with it so that you don't have to run
> > the same WHERE clause twice.
> > IE: SELECT count(*) FROM <table> WHERE <clause>; to get the total amount
> > of items to make page navigation links, then:
> > SELECT <columns> FROM table WHERE <clause> LIMIT <items_per_page> OFFSET
> > <(page_no-1)*items_per_page>; to get the actual page contents.
> > 
> > How about
> 
> select * from (select *, count(*) over () as total_count from <table>
> where <clause>) a
> LIMIT <items_per_page> OFFSET <(page_no-1)*items_per_page>
> It will return a total_count column with the same value in every row. You
> may have problems if no rows are returned (e.g. the page number is too
> high).

I have done this before, but the speedup over the two database hits that I
mentioned above is tiny, just a few ms. It seems to end up doing about the
same thing on the database end. The reason that I don't commonly do this is
what you said about not getting a count result if you run off the end.
-Neil-
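
For reference, the single-round-trip pattern discussed in this thread can be sketched as follows. The table name "posts", column "body", and the search term are hypothetical placeholders, not from the thread:

```sql
-- Hypothetical table "posts" with a text column "body".
-- One round trip: the page of rows plus the total match count,
-- computed as a window aggregate over the filtered set.
SELECT p.*, count(*) OVER () AS total_count
FROM posts p
WHERE p.body LIKE '%search term%'
ORDER BY p.id
LIMIT 50 OFFSET 0;   -- OFFSET (page_no - 1) * items_per_page

-- Caveat raised above: if OFFSET runs past the last matching row,
-- zero rows come back, so total_count is lost along with the page.
-- The planner's estimate (kept current by ANALYZE) is a cheap
-- whole-table fallback, though it cannot reflect a WHERE clause:
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'posts';
```

The window-aggregate query makes the database scan the full filtered set anyway, which is why, as noted above, it is rarely much faster than two separate queries.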

