Re: Indices for select count(*)?

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: Indices for select count(*)?
Date: 2005-12-23 22:16:44
Message-ID: 200512232216.jBNMGin28411@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Martijn van Oosterhout wrote:
-- Start of PGP signed section.
> On Fri, Dec 23, 2005 at 11:04:50AM -0500, Tom Lane wrote:
> > It's not that easy --- in the MVCC world there simply isn't a unique
> > count that is the right answer for every observer. But the idea of
> > packaging a count(*) mechanism as an index type seems like it might be
> > a good one. I don't think the planner objection need be taken too
> > seriously: we already have a good big wart in there for recognizing
> > MIN/MAX indexability, and this sort of transformation would fit pretty
> > naturally with what's already done in planagg.c.
>
> AFAICS two big problems with using an index type:
>
> 1. The index isn't told when the tuple is deleted.
> 2. The server expects to be able to lookup an index.
>
> Other than that...

I think our TODO has a good summary of the issues:

---------------------------------------------------------------------------

* Speed up COUNT(*)

We could use a fixed row count and a +/- count to follow MVCC
visibility rules, or a single cached value could be used and
invalidated if anyone modifies the table. Another idea is to
get a count directly from a unique index, but for this to be
faster than a sequential scan it must avoid access to the heap
to obtain tuple visibility information.

* Add estimated_count(*) to return an estimate of COUNT(*)

This would use the planner ANALYZE statistics to return an estimated
count.

* Allow data to be pulled directly from indexes

Currently indexes do not have enough tuple visibility information
to allow data to be pulled from the index without also accessing
the heap. One way to allow this is to set a bit on index tuples
to indicate if a tuple is currently visible to all transactions
when the first valid heap lookup happens. This bit would have to
be cleared when a heap tuple is expired.

Another idea is to maintain a bitmap of heap pages where all rows
are visible to all backends, and allow index lookups to reference
that bitmap to avoid heap lookups, perhaps the same bitmap we might
add someday to determine which heap pages need vacuuming. Frequently
accessed bitmaps would have to be stored in shared memory. One 8k
page of bitmaps could track 512MB of heap pages.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2005-12-23 22:45:48 Re: Indices for select count(*)?
Previous Message Peter Eisentraut 2005-12-23 21:36:12 Re: newbie : setting access for users in a web enviroment