Re: Bad query optimisation

From: Greg Stark <gsstark(at)mit(dot)edu>
To: "Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au>
Cc: <pgsql-hackers(at)postgresql(dot)org>, "Greg Stark" <gsstark(at)mit(dot)edu>
Subject: Re: Bad query optimisation
Date: 2002-11-30 23:23:56
Message-ID: 87u1hywsj7.fsf@stark.dyndns.tv
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


"Christopher Kings-Lynne" <chriskl(at)familyhealth(dot)com(dot)au> writes:

> > This is weird, it seems like min and max aren't being optimised
> > symmetrically. It seems like both of these should result in similar plans
> > and run equally fast. Instead the first is actually really slow and the
> > second is perfectly quick.
>
> Without knowing anything about your data, if Postgres knows (from its stats
> tables) that 90% of the values in your column are above 'K0C1N2' then it
> will of course do a seq scan for the second query.

Oops, you're right that was a bad diagnosis. When I use the midpoint of the
data set they both get optimized into the same plan.

However I still think there's something wrong. It looks like postgres doesn't
know that it's possible to calculate min and max without scanning every
record.

When I run the queries below there's a big variance between the 2.3s required
to find the minimum for the whole dataset, and the .098s required to find the
maximum of the small subset below K0C1N2. Because there's an index on
postalcode it should be possible to find the minimum of any range in a single
index lookup.

It seems like this should be an important optimization given the number of
applications that request max(foo) in a broken attempt to implement sequences.
Occasionally it's not even a broken attempt too.

Incidentally, this is Postgres 7.2. Is this improved in 7.3?

bash-2.05b$ time psql -d salesoutlook -c "select min(postalcode) from postalcodes"
min
--------
K0A1A0
(1 row)

real 0m2.334s
user 0m0.040s
sys 0m0.010s

bash-2.05b$ time psql -d salesoutlook -c "select max(postalcode) from postalcodes where postalcode < 'K0C1N2'"
max
--------
K0C1N0
(1 row)

real 0m0.098s
user 0m0.030s
sys 0m0.020s

bash-2.05b$ time psql -d salesoutlook -c "select max(postalcode) from postalcodes where postalcode < 'L9J1J2'"
max
--------
L9J1J1
(1 row)

real 0m2.128s
user 0m0.030s
sys 0m0.010s

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message ow 2002-11-30 23:39:48 Re: 7.4 Wishlist
Previous Message Christopher Kings-Lynne 2002-11-30 22:57:05 Re: Bad query optimisation