Quick Links

Re: no index-usage on aggregate-functions?

From:	"Scott Marlowe" <smarlowe(at)qwest(dot)net>
To:	"Harald Lau (Sector-X)" <harald(at)sector-x(dot)de>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: no index-usage on aggregate-functions?
Date:	2004-06-29 07:25:03
Message-ID:	1088493903.12350.5.camel@localhost.localdomain
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Tue, 2004-06-29 at 00:42, Harald Lau (Sector-X) wrote:
> Hi,
>
> I've experienced that PG up to current release does not make use of an index when aggregating. Which of course may result in unacceptable answering times
>
> This behaviour is reproducable on any table with any aggregat function in all of my databases on every machine (PostgreSQL 7.4.2 on i386-redhat-linux-gnu and PostgreSQL 7.2.1 on i686-pc-linux-gnu)
>
> f.e. querying against a 2.8-mio-records (2.800.000) table the_table
> SELECT count(*) FROM the_table
> => Seq scan -> takes about 12 sec
>
> SELECT Avg(num_found) AS NumFound FROM the_table --(index on num_found)
> => Seq scan -> takes about 10 sec
>
> SELECT Sum(num_found) AS TotalFound FROM the_table --(index on num_found)
> => Seq scan -> takes about 11 sec
>
> SELECT Max(date_) AS LatestDate FROM the_table --(index on date_)
> => Seq scan -> takes about 14 sec
>
> But
> SELECT date_ AS LatestDate FROM the_table ORDER BY date_ DESC LIMIT 1;
> => Index scan -> takes 0.18 msec
>
> MS SQLServer 2000: Use of an appropriate index _whenever_ aggregating.
>
> Am I doing something wrong?

Yes, you're expecting an MVCC database to behave like a row locking
database.

Due to the way PostgreSQL is put together, it can't count on an index
giving it values, only pointers to values, so to speak. This means it
can use an index, but it will still go to the table to get the right
value.

On the other hand, the trade off is that MVCC can handle much higher
parallel loads, usually.

Note that if you're aggregate is on sub subset of a table, then an index
scan can often be a big win, such as:

create table z(a int, b int);
insert into z values (1,1); (repeat a couple thousand times)
select avg(b) from z where a=3; <-- this can use the index

But note that in the above, the table's rows will still have to be
accessed to get the right values.

In response to

no index-usage on aggregate-functions? at 2004-06-29 06:42:14 from Harald Lau (Sector-X)

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Christopher Kings-Lynne	2004-06-29 07:36:16	Re: no index-usage on aggregate-functions?
Previous Message	Harald Lau (Sector-X)	2004-06-29 06:42:14	no index-usage on aggregate-functions?