Re: Much Ado About COUNT(*)

From: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: "Jonah H(dot) Harris" <jharris(at)tvi(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Much Ado About COUNT(*)
Date: 2005-01-12 20:41:38
Message-ID: 1105562498.2886.344.camel@jeff
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-announce pgsql-hackers pgsql-patches

> No, sequential scans require slightly more i/o than index scans. More
> importantly they require random access i/o instead of sequential i/o which is
> much slower.
>

Just to clear it up, I think what you meant was the index requires
random i/o, not the table. And the word "slightly" depends on the size
of the table I suppose. And of course it also depends on how many tuples
you actually need to retrieve (in this case we're talking about
retrieving all the tuples ragardless).

> Though this depends. If the tuple is very wide then the index might be faster
> to scan since it would only contain the data from the fields being indexed.
>

That, and it seems strange on the surface to visit every entry in an
index, since normally indexes are used to find only a small fraction of
the tuples.

> This brings to mind another approach. It might be handy to split the heap for
> a table into multiple heaps. The visibility information would only be in one
> of the heaps. This would be a big win if many of the fields were rarely used,
> especially if they're rarely used by sequential scans.

Except then the two heaps would have to be joined somehow for every
operation. It makes sense some times to (if you have a very wide table)
split off the rarely-accessed attributes into a seperate table to be
joined one-to-one when those attributes are needed. To have the system
do that automatically would create problems if the attributes that are
split off are frequently accessed, right?

Perhaps you could optionally create a seperate copy of the same tuple
visibility information linked in a way similar to an index. It still
seems like you gain very little, and only in some very rare situation
that I've never encountered (I've never had the need to do frequent
unqualified count()s at the expense of other operations).

Now, it seems like it might make a little more sense to use an index for
min()/max(), but that's a different story.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-announce by date

  From Date Subject
Next Message Jonah H. Harris 2005-01-12 20:42:58 Re: Much Ado About COUNT(*)
Previous Message Andrew Dunstan 2005-01-12 20:14:51 Re: Much Ado About COUNT(*)

Browse pgsql-hackers by date

  From Date Subject
Next Message Jonah H. Harris 2005-01-12 20:42:58 Re: Much Ado About COUNT(*)
Previous Message Andrew Dunstan 2005-01-12 20:14:51 Re: Much Ado About COUNT(*)

Browse pgsql-patches by date

  From Date Subject
Next Message Jonah H. Harris 2005-01-12 20:42:58 Re: Much Ado About COUNT(*)
Previous Message Andrew Dunstan 2005-01-12 20:14:51 Re: Much Ado About COUNT(*)