Re: Much Ado About COUNT(*)

From: Jeff Davis <jdavis-pgsql(at)empires(dot)org>
To: "Jonah H(dot) Harris" <jharris(at)tvi(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Much Ado About COUNT(*)
Date: 2005-01-12 20:59:51
Message-ID: 1105563592.2886.361.camel@jeff
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-announce pgsql-hackers pgsql-patches


> >
> I recognize the added cost of implementing index only scans. As storage
> is relatively cheap these days, everyone I know is more concerned about
> faster access to data. Similarly, it would still be faster to scan the
> indexes than to perform a sequential scan over the entire relation for
> this case. I also acknowledge that it would be a negative impact to
> indexes where this type of acces isn't required, as you suggested and
> which is more than likely not the case. I just wonder what more people
> would be happier with and whether the added 16-20 bytes would be
> extremely noticable considering most 1-3 year old hardware.

I think perhaps you missed the point: it's not about price. If an index
takes up more space, it will also take more time to read that index.
16-20 bytes per index entry is way too much extra overhead for most
people, no matter what hardware they have. That overhead also tightens
the performace at what is already the bottleneck for almost every DB:
i/o bandwidth.

The cost to count the tuples is the cost to read that visibility
information for each tuple in the table. A seqscan is the most efficient
way to do that since it's sequential i/o, rather than random i/o. The
only reason the word "index" even comes up is because it is inefficient
to retrieve a lot of extra attributes you don't need from a table.

You might be able to pack that visibility information a little bit more
densely in an index than a table, assuming that the table has more than
a couple columns. But if you shoehorn the visibility information into an
index, you destroy much of the value of an index to most people, who
require the index to be compact to be efficient.

An index isn't really the place for something when all you really want
to do is a sequential scan over a smaller amount of data (so that the
visibility information is more dense). Make a narrow table, and seqscan
over that. Then, if you need more attributes in the table, just do a
one-to-one join with a seperate table.

Regards,
Jeff Davis

In response to

Browse pgsql-announce by date

  From Date Subject
Next Message Jon Jensen 2005-01-12 21:02:23 Re: Much Ado About COUNT(*)
Previous Message Jonah H. Harris 2005-01-12 20:59:07 Re: Much Ado About COUNT(*)

Browse pgsql-hackers by date

  From Date Subject
Next Message Jon Jensen 2005-01-12 21:02:23 Re: Much Ado About COUNT(*)
Previous Message Jonah H. Harris 2005-01-12 20:59:07 Re: Much Ado About COUNT(*)

Browse pgsql-patches by date

  From Date Subject
Next Message Jon Jensen 2005-01-12 21:02:23 Re: Much Ado About COUNT(*)
Previous Message Jonah H. Harris 2005-01-12 20:59:07 Re: Much Ado About COUNT(*)