Re: Index oddity

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ken <southerland(at)samsixedd(dot)com>
Cc: Rod Taylor <pg(at)rbt(dot)ca>, Postgresql Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index oddity
Date: 2004-06-10 05:01:58
Message-ID: 19759.1086843718@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

ken <southerland(at)samsixedd(dot)com> writes:
> ... and here is the plan with statistics set to 1000 ...

> Seq Scan on nrgfeature f (cost=0.00..31675.57 rows=18608 width=218)
> (actual time=63.544..1002.701 rows=225 loops=1)
> Filter: ((upperrightx > 321264.236977215::double precision) AND
> (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> 123286.261898636::double precision) AND (lowerlefty <
> 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> precision))

> ... so yeah, its obviously finding way, way less rows than it thinks it
> will.

Yup. I think your problem here is that the conditions on the different
columns are highly correlated, but the planner doesn't know anything
about that, because it has no cross-column statistics.

You could check that the individual conditions are accurately estimated
by looking at the estimated and actual row counts in

explain analyze
SELECT * FROM nrgfeature f WHERE upperRightX > 321264.23697721504;

explain analyze
SELECT * FROM nrgfeature f WHERE lowerLeftX < 324046.79981208267;

etc --- but I'll bet lunch that they are right on the money with the
higher statistics targets, and probably not too far off even at the
default. The trouble is that the planner is simply multiplying these
probabilities together to get its estimate for the combined query,
and when the columns are not independent that leads to a very bad
estimate.

In particular it sounds like you have a *whole lot* of rows that have
diagonalSize just under 50, and so changing the diagonalSize condition
to include those makes for a big change in the predicted number of rows,
even though for the specific values of upperRightX and friends in your
test query there isn't any change in the actual number of matching rows.

I don't have any advice to magically solve this problem. I would
suggest experimenting with alternative data representations -- for
example, maybe it would help to store "leftX" and "width" in place
of "leftX" and "rightX", etc. What you want to do is try to decorrelate
the column values. leftX and rightX are likely to have a strong
correlation, but maybe leftX and width don't.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Frank van Vugt 2004-06-10 14:24:21 *very* inefficient choice made by the planner (regarding IN(...))
Previous Message Joshua D. Drake 2004-06-10 03:32:10 Re: Index oddity