Skip site navigation (1) Skip section navigation (2)

Re: Aggregates and Indexes

From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: PgSQL Novice ML <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Aggregates and Indexes
Date: 2002-08-03 00:27:37
Message-ID: 1028334457.17801.92.camel@haggis (view raw, whole thread or download thread mbox)
Lists: pgsql-novicepgsql-sql
On Fri, 2002-08-02 at 15:43, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > For Postgres custom aggregates, using a standard index is impossible, for 
> > reasons I think are obvious.
> > That leaves MAX, MIN, and COUNT.    All of these aggregates should, in an 
> > ideal world, be index-responsive for large data sets.
> While it's fairly clear how one might use an index for MAX/MIN
> (basically, make the optimizer transform it into a SELECT ... ORDER BY
> ... LIMIT 1 operation, which can then be done with an indexscan),
> I really don't see how an index can help for COUNT.
> The real problem with COUNT is that any attempt to maintain such a value
> on-the-fly creates a single-point bottleneck for all insertions and
> deletions on the table.  The perspective of most of the developers is
> that that cost outweighs any possible savings from having an
> instantaneous COUNT operation.

Blecch!  Who wants to keep current counts?????  You are right 
that that is totally silly.  The closed-source database I use 
(Rdb/VMS on Alpha VMS) takes great advantage of indexes when 
processing COUNT, SUM, MIN & MAX.

For example:
CREATE INDEX ndx1 on table1 (fld1, fld2);
SELECT fld1, COUNT(*) 
FROM table1 
WHERE fld1 = 'bar'
GROUP BY fld1;

Since the index stores key-values and oids that point to the
data, Rdb looks at the "wedge" of ndx1 that matches "fld1 = 'bar'" 
and counts up the oids that meet the criteria.

In more complicated queries, where there is only partial index
support for the query, yes, you will have to hit the live table,
but the index should minimize the number of pages in the table
that the must read.  Thus, the index still speeds up the query.
> When you add in the issue that under MVCC there isn't a unique COUNT
> that's the same for all transactions, it's just not worth thinking
> about.  (And do I need to point out that with WHERE conditions,
> GROUP BY, or a variable COUNT argument, all hope of such optimization
> disappears anyway?  A global rowcount doesn't help in those cases.)
> The MAX/MIN issue will probably be addressed someday, but since there
> is a good workaround it's not very high on anyone's TODO queue.  We have
> many more-pressing problems.

| Ron Johnson, Jr.        Home: ron(dot)l(dot)johnson(at)cox(dot)net             |
| Jefferson, LA  USA                                              |
|                                                                 |
| "The greatest dangers to liberty lurk in insidious encroachment |
|  by men of zeal, well-meaning, but without understanding."      |
|   Justice Louis Brandeis, dissenting, Olmstead v US (1928)      |

In response to

pgsql-novice by date

Next:From: Bruce MomjianDate: 2002-08-03 01:36:12
Subject: Re: [SQL] Aggregates and Indexes
Previous:From: Tom LaneDate: 2002-08-02 20:43:51
Subject: Re: Aggregates and Indexes

pgsql-sql by date

Next:From: Bruce MomjianDate: 2002-08-03 01:36:12
Subject: Re: [SQL] Aggregates and Indexes
Previous:From: Tom LaneDate: 2002-08-02 23:52:38
Subject: Re: What about this?

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group