Re: Query optimization problem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Zotov <zotov(at)oe-it(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query optimization problem
Date: 2010-07-28 14:34:56
Message-ID: 13738.1280327696@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Yeb Havinga <yebhavinga(at)gmail(dot)com> writes:
> Robert Haas wrote:
>> On Wed, Jul 28, 2010 at 7:24 AM, Yeb Havinga <yebhavinga(at)gmail(dot)com> wrote:
>>> Wouldn't it be relatively easy, to rewrite the filter expression by adding
>>> expressions, instead of replacing constants, in the disjunctive case, so the
>>> example at hand would become:
>>>
>>> WHERE (d1.ID=234409763) or (d2.ID=234409763)
>>> AND (d2.BasedOnID=234409763) or (d2.ID=234409763)

>> Yeah, that could be done, but it's not necessarily a win from a
>> performance standpoint.

> Not necessarily a win, but on the test case no significant increase in
> planning time.

The problem is that it could cost you a lot in execution time, because
of the useless extra filter conditions that will be applied. The
planner isn't going to notice that those conditions are redundant.
An even worse problem is that because it doesn't know that, it's going
to underestimate the combined selectivity of the two WHERE conditions,
resulting in drastic underestimates of the numbers of rows emitted,
possibly resulting in horribly bad plan choices that kill whatever
performance improvement you got at the bottom level.

What the EquivalenceClass machinery actually buys us is the ability to
deal with a set of partially-redundant possible filter conditions and
apply only enough of them to get a correct plan. As an example, if the
query has A=B and B=C, we could deduce A=C, but we don't want to apply
all three equality conditions at runtime. Instead we put all three
variables into an EC, and then there is logic to determine which of the
equality clauses implied by the EC should actually get applied where.
This avoids both the useless-checks-at-runtime problem and the problem
of wrong selectivity estimates.

To do something like this without generating stupid plans, we'd need
some sort of generalized EC mechanism that could figure out which
variants of the clause made the most sense in different contexts.
Or maybe something else entirely --- but just generating a lot of
variants of a clause and throwing them all into the existing mechanism
is not workable.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2010-07-28 14:57:22 Re: Toward a column reorder solution
Previous Message Simon Riggs 2010-07-28 13:49:57 Re: ALTER TABLE SET STATISTICS requires AccessExclusiveLock