Skip site navigation (1) Skip section navigation (2)

Re: Much Ado About COUNT(*)

From: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: "Jonah H(dot) Harris" <jharris(at)tvi(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Much Ado About COUNT(*)
Date: 2005-01-12 20:41:38
Message-ID: 1105562498.2886.344.camel@jeff (view raw or flat)
Thread:
Lists: pgsql-announcepgsql-hackerspgsql-patches

> No, sequential scans require slightly more i/o than index scans. More
> importantly they require random access i/o instead of sequential i/o which is
> much slower.
> 

Just to clear it up, I think what you meant was the index requires
random i/o, not the table. And the word "slightly" depends on the size
of the table I suppose. And of course it also depends on how many tuples
you actually need to retrieve (in this case we're talking about
retrieving all the tuples ragardless).

> Though this depends. If the tuple is very wide then the index might be faster
> to scan since it would only contain the data from the fields being indexed.
> 

That, and it seems strange on the surface to visit every entry in an
index, since normally indexes are used to find only a small fraction of
the tuples.

> This brings to mind another approach. It might be handy to split the heap for
> a table into multiple heaps. The visibility information would only be in one
> of the heaps. This would be a big win if many of the fields were rarely used,
> especially if they're rarely used by sequential scans.

Except then the two heaps would have to be joined somehow for every
operation. It makes sense some times to (if you have a very wide table)
split off the rarely-accessed attributes into a seperate table to be
joined one-to-one when those attributes are needed. To have the system
do that automatically would create problems if the attributes that are
split off are frequently accessed, right?

Perhaps you could optionally create a seperate copy of the same tuple
visibility information linked in a way similar to an index. It still
seems like you gain very little, and only in some very rare situation
that I've never encountered (I've never had the need to do frequent
unqualified count()s at the expense of other operations). 

Now, it seems like it might make a little more sense to use an index for
min()/max(), but that's a different story.

Regards,
	Jeff Davis



In response to

Responses

pgsql-announce by date

Next:From: Jonah H. HarrisDate: 2005-01-12 20:42:58
Subject: Re: Much Ado About COUNT(*)
Previous:From: Andrew DunstanDate: 2005-01-12 20:14:51
Subject: Re: Much Ado About COUNT(*)

pgsql-hackers by date

Next:From: Jonah H. HarrisDate: 2005-01-12 20:42:58
Subject: Re: Much Ado About COUNT(*)
Previous:From: Andrew DunstanDate: 2005-01-12 20:14:51
Subject: Re: Much Ado About COUNT(*)

pgsql-patches by date

Next:From: Jonah H. HarrisDate: 2005-01-12 20:42:58
Subject: Re: Much Ado About COUNT(*)
Previous:From: Andrew DunstanDate: 2005-01-12 20:14:51
Subject: Re: Much Ado About COUNT(*)

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group