Re: Why does a simple query not use an obvious index?

From: "Scott Marlowe" <smarlowe(at)qwest(dot)net>
To: "Greg Stark" <gsstark(at)mit(dot)edu>
Cc: "Jack Kerkhof" <jack(dot)kerkhof(at)guest-tek(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Why does a simple query not use an obvious index?
Date: 2004-08-29 21:38:00
Message-ID: 1093815480.5493.29.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, 2004-08-29 at 15:12, Greg Stark wrote:
> "Scott Marlowe" <smarlowe(at)qwest(dot)net> writes:
>
> > Also, count(*) is likely to always generate a seq scan due to the way
> > aggregates are implemented currently in pgsql. you might want to try:
>
> Huh? I'm curious to know what you're talking about here.

This has been discussed ad infinitum on the lists in the past. And
explained by better minds than mine, but I'll give it a go.

PostgreSQL has a "generic" aggregate method. Imagine instead doing a
select count(id1+id2-id3) from table where ... In that instance, it's
not a simple shortcut to just grab the number of rows anymore. Since
PostgreSQL uses a generic aggregate method that can be expanded by the
user with custom aggregates et. al., it has no optimizations to make
simple count(*) fast, like many other databases.

Add to that the fact that even when postgresql uses an index it still
has to hit the data store to get the actual value of the tuple, and
you've got very few instances in which an index scan of more than some
small percentage of the table is worth while. I.e. a sequential scan
tends to "win" over an index scan quicker in postgresql than in other
databases like Oracle, where the data store is serialized and the
indexes have the correct information along with the application of the
transaction / roll back segments.

> > select somefield from sometable where timestampfield > now()-'60
> > seconds'::interval
> >
> > and count the number of returned rows. If there's a lot, it won't be
> > any faster, if there's a few, it should be a win.
>
> Why would this ever be faster? And how could postgres ever calculate that
> without doing a sequential scan when count(*) would force it to do a
> sequential scan?

Because, count(*) CANNOT use an index. So, if you're hitting, say,
0.01% of the table (let's say 20 out of 20,000,000 rows or something
like that) then the second should be MUCH faster.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2004-08-29 21:44:15 Re: Why does a simple query not use an obvious index?
Previous Message Greg Stark 2004-08-29 21:12:05 Re: Why does a simple query not use an obvious index?