Skip site navigation (1) Skip section navigation (2)

Re: Much Ado About COUNT(*)

From: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To: "Jonah H(dot) Harris" <jharris(at)tvi(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Much Ado About COUNT(*)
Date: 2005-01-12 20:59:51
Message-ID: 1105563592.2886.361.camel@jeff (view raw or flat)
Thread:
Lists: pgsql-announcepgsql-hackerspgsql-patches
> >
> I recognize the added cost of implementing index only scans.  As storage 
> is relatively cheap these days, everyone I know is more concerned about 
> faster access to data.  Similarly, it would still be faster to scan the 
> indexes than to perform a sequential scan over the entire relation for 
> this case.  I also acknowledge that it would be a negative impact to 
> indexes where this type of acces isn't required, as you suggested and 
> which is more than likely not the case.  I just wonder what more people 
> would be happier with and whether the added 16-20 bytes would be 
> extremely noticable considering most 1-3 year old hardware.

I think perhaps you missed the point: it's not about price. If an index
takes up more space, it will also take more time to read that index.
16-20 bytes per index entry is way too much extra overhead for most
people, no matter what hardware they have. That overhead also tightens
the performace at what is already the bottleneck for almost every DB:
i/o bandwidth.

The cost to count the tuples is the cost to read that visibility
information for each tuple in the table. A seqscan is the most efficient
way to do that since it's sequential i/o, rather than random i/o. The
only reason the word "index" even comes up is because it is inefficient
to retrieve a lot of extra attributes you don't need from a table.

You might be able to pack that visibility information a little bit more
densely in an index than a table, assuming that the table has more than
a couple columns. But if you shoehorn the visibility information into an
index, you destroy much of the value of an index to most people, who
require the index to be compact to be efficient.

An index isn't really the place for something when all you really want
to do is a sequential scan over a smaller amount of data (so that the
visibility information is more dense). Make a narrow table, and seqscan
over that. Then, if you need more attributes in the table, just do a
one-to-one join with a seperate table.

Regards,
	Jeff Davis


In response to

pgsql-announce by date

Next:From: Jon JensenDate: 2005-01-12 21:02:23
Subject: Re: Much Ado About COUNT(*)
Previous:From: Jonah H. HarrisDate: 2005-01-12 20:59:07
Subject: Re: Much Ado About COUNT(*)

pgsql-hackers by date

Next:From: Jon JensenDate: 2005-01-12 21:02:23
Subject: Re: Much Ado About COUNT(*)
Previous:From: Jonah H. HarrisDate: 2005-01-12 20:59:07
Subject: Re: Much Ado About COUNT(*)

pgsql-patches by date

Next:From: Jon JensenDate: 2005-01-12 21:02:23
Subject: Re: Much Ado About COUNT(*)
Previous:From: Jonah H. HarrisDate: 2005-01-12 20:59:07
Subject: Re: Much Ado About COUNT(*)

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group