Re: Much Ado About COUNT(*)

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jonah H(dot) Harris" <jharris(at)tvi(dot)edu>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Much Ado About COUNT(*)
Date: 2005-01-12 19:57:11
Message-ID: D425483C2C5C9F49B5B7A41F894415470557D3@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

A notion for indices that are not unique... (won't help much on select
count(*) but might be helpful for other types of query optimization)

Put a count in the index for each distinct type.
In the worst case, the index is actually unique and you have 8 wasted
bytes per index entry and all the entries are in the leaves (perhaps it
could be an OPTION for some tables). I don't know enough about the
structure of PostgreSQL's indexes to know if my suggestion is pure
hogwash, so don't laugh to hard if it is pure stupidity.

The most practical value of SELECT COUNT(*) is for updating statistics
(and looking good in phony-baloney benchmarks). But the statistics only
need to be updated when you vacuum, so it hardly seems a crucial issue
to me.

-----Original Message-----
From: pgsql-hackers-owner(at)postgresql(dot)org
[mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
Sent: Wednesday, January 12, 2005 11:42 AM
To: Jonah H. Harris
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Much Ado About COUNT(*)

"Jonah H. Harris" <jharris(at)tvi(dot)edu> writes:
> My thinking is that we may be able to implement index usage for not
only
> unqualified counts, but also on any query that can be satisfied by the

> index itself.

The fundamental problem is that you can't do it without adding at least
16 bytes, probably 20, to the size of an index tuple header. That would
double the physical size of an index on a simple column (eg an integer
or timestamp). The extra I/O costs and extra maintenance costs are
unattractive to say the least. And it takes away some of the
justification for the whole thing, which is that reading an index is
much cheaper than reading the main table. That's only true if the index
is much smaller than the main table ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if
your
joining column's datatypes do not match

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-01-12 19:59:42 Re: [HACKERS] segfault caused by heimdal (was: SUSE port)
Previous Message Jonah H. Harris 2005-01-12 19:52:53 Re: Much Ado About COUNT(*)