Re: Query optimization problem

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Zotov <zotov(at)oe-it(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Query optimization problem
Date: 2010-07-28 12:02:54
Message-ID: 4C501C6E.6030108@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
> On Wed, Jul 28, 2010 at 7:24 AM, Yeb Havinga <yebhavinga(at)gmail(dot)com> wrote:
>
>>>> Sorry? I though what Equivalence Class provides is the "proving" that
>>>> using this qualification or another will *not* affect the output.
>>>>
>>> In a query like...
>>>
>>> SELECT d1.ID, d2.ID
>>> FROM DocPrimary d1
>>> JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
>>> WHERE (d1.ID=234409763) or (d2.ID=234409763)
>>>
>>> ...you're going to scan d1, scan d2, and then join the results. The
>>> scan of d1 is going to produce different results depending on whether
>>> you evaluate or not d1.ID=234409763, and the scan of d2 is going to
>>> produce different results depending on whether or not you evaluate
>>> d2.BasedOn=234409763.
>>>
>> Wouldn't it be relatively easy, to rewrite the filter expression by adding
>> expressions, instead of replacing constants, in the disjunctive case, so the
>> example at hand would become:
>>
>> WHERE (d1.ID=234409763) or (d2.ID=234409763)
>> AND (d2.BasedOnID=234409763) or (d2.ID=234409763)
>>
>
> Yeah, that could be done, but it's not necessarily a win from a
> performance standpoint.
>
Not necessarily a win, but on the test case no significant increase in
planning time. It somehow feels like a good idea to give the planner as
much information as possible, i.e. for each rel as much baserestrictinfo's.

I earlier forgot parentheses, the correct query is

SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE ((d1.ID=234409763) or (d2.ID=234409763))
AND ((d2.BasedOn=234409763) or (d2.ID=234409763));

by doing this in the rewrite step, triple planning would be avoided. I
suspect that a copyObject of the expression + expression tree mutator
call time during rewrite is negligible compared to plan time, assuming
this is minimal, in this particulare case there doesn't seem to be much
planning time between the three variants.

I ran the script below a number of times, the third time is the one with
expanded expression:

Time: 0.820 ms
Time: 0.859 ms
Time: 0.877 ms
---
Time: 0.617 ms
Time: 0.662 ms
Time: 0.737 ms
---
Time: 0.817 ms
Time: 0.766 ms
Time: 0.826 ms
---
Time: 0.638 ms
Time: 0.700 ms
Time: 0.706 ms
---
Time: 0.463 ms
Time: 0.847 ms
Time: 0.793 ms
---
Time: 0.629 ms
Time: 0.671 ms
Time: 0.703 ms

this was the script (on the relation and index supplied by the OP)

-- warm catalog
explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE (d1.ID=234409763) or (d2.ID=234409763);

\timing

explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE (d1.ID=234409763) or (d2.ID=234409763);

explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE (d2.BasedOn=234409763) or (d2.ID=234409763);

explain SELECT d1.ID, d2.ID
FROM DocPrimary d1
JOIN DocPrimary d2 ON d2.BasedOn=d1.ID
WHERE ((d1.ID=234409763) or (d2.ID=234409763))
AND ((d2.BasedOn=234409763) or (d2.ID=234409763));

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zotov 2010-07-28 12:19:29 Re: **[SPAM]*(8.2)** Re: Query optimization problem
Previous Message Pavel Stehule 2010-07-28 11:43:14 patch saved in commitfest application isn't actual now