Re: Aggregates and Indexes

From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: PgSQL Novice ML <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Aggregates and Indexes
Date: 2002-08-03 00:27:37
Message-ID: 1028334457.17801.92.camel@haggis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice pgsql-sql

On Fri, 2002-08-02 at 15:43, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > For Postgres custom aggregates, using a standard index is impossible, for
> > reasons I think are obvious.
>
> > That leaves MAX, MIN, and COUNT. All of these aggregates should, in an
> > ideal world, be index-responsive for large data sets.
>
> While it's fairly clear how one might use an index for MAX/MIN
> (basically, make the optimizer transform it into a SELECT ... ORDER BY
> ... LIMIT 1 operation, which can then be done with an indexscan),
> I really don't see how an index can help for COUNT.
>
> The real problem with COUNT is that any attempt to maintain such a value
> on-the-fly creates a single-point bottleneck for all insertions and
> deletions on the table. The perspective of most of the developers is
> that that cost outweighs any possible savings from having an
> instantaneous COUNT operation.

Blecch! Who wants to keep current counts????? You are right
that that is totally silly. The closed-source database I use
(Rdb/VMS on Alpha VMS) takes great advantage of indexes when
processing COUNT, SUM, MIN & MAX.

For example:
CREATE INDEX ndx1 on table1 (fld1, fld2);
SELECT fld1, COUNT(*)
FROM table1
WHERE fld1 = 'bar'
GROUP BY fld1;

Since the index stores key-values and oids that point to the
data, Rdb looks at the "wedge" of ndx1 that matches "fld1 = 'bar'"
and counts up the oids that meet the criteria.

In more complicated queries, where there is only partial index
support for the query, yes, you will have to hit the live table,
but the index should minimize the number of pages in the table
that the must read. Thus, the index still speeds up the query.

> When you add in the issue that under MVCC there isn't a unique COUNT
> that's the same for all transactions, it's just not worth thinking
> about. (And do I need to point out that with WHERE conditions,
> GROUP BY, or a variable COUNT argument, all hope of such optimization
> disappears anyway? A global rowcount doesn't help in those cases.)
>
> The MAX/MIN issue will probably be addressed someday, but since there
> is a good workaround it's not very high on anyone's TODO queue. We have
> many more-pressing problems.

--
+-----------------------------------------------------------------+
| Ron Johnson, Jr. Home: ron(dot)l(dot)johnson(at)cox(dot)net |
| Jefferson, LA USA |
| |
| "The greatest dangers to liberty lurk in insidious encroachment |
| by men of zeal, well-meaning, but without understanding." |
| Justice Louis Brandeis, dissenting, Olmstead v US (1928) |
+-----------------------------------------------------------------+

In response to

Browse pgsql-novice by date

  From Date Subject
Next Message Bruce Momjian 2002-08-03 01:36:12 Re: [SQL] Aggregates and Indexes
Previous Message Tom Lane 2002-08-02 20:43:51 Re: Aggregates and Indexes

Browse pgsql-sql by date

  From Date Subject
Next Message Bruce Momjian 2002-08-03 01:36:12 Re: [SQL] Aggregates and Indexes
Previous Message Tom Lane 2002-08-02 23:52:38 Re: What about this?