Quick Links

Re: Discussion on missing optimizations

From:	Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Adam Brusselback <adambrusselback(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Discussion on missing optimizations
Date:	2017-10-07 21:22:52
Message-ID:	6d8d660f-f278-1074-5c06-67ea250d1f08@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 07/10/17 19:59, Tom Lane wrote:
> Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com> writes:
>> On 07/10/17 18:15, Tom Lane wrote:
>>> No, I'm afraid you didn't read that comment closely enough. This will
>>> flat out fail for cases like "select ... where x=x order by x", because
>>> there will already be a single-element EC for x and so the clause will
>>> just disappear entirely. If that doesn't happen, then instead you're
>>> creating an EC with duplicate entries, which is an idea I seriously
>>> dislike; the semantics of that don't seem clear at all to me.
>
>> Hmm it did not disappear (and worked fine in SQL level tests).
>
> I may not be correctly remembering what the query would need to look
> like for there to be single-element ECs in existence at this point; but
> I surely would not like this code to assume that none will exist till
> later.
>
>> I don't
>> think I fully understand the "EC with duplicate entries" part and what's
>> the problem with it so I'll defer to your wisdom there.
>
> Well, as one example, assume that we use your patch and consider what
> happens with
> where x = y and x = x
> vs
> where x = x and x = y
>
> In the first case we end up with an EC that's just {x,y}, because the
> second process_equivalence() will find both sides in the same EC and
> conclude that the second clause is redundant. (Which it is, if the
> equality operators have the same notion of what to do with nulls.)
> In the second case we end up with an EC containing {x,x,y}, which
> at minimum will result in emitting redundant generated quals. I'm
> not sure if it could have any worse consequences than that, but I'm
> not sure it couldn't either. But this is bogus in any case, because
> those WHERE clauses surely should produce identical results.
>
> Looking further ahead, if ECs have to support being multisets rather
> than pure sets, that would put a crimp in ever improving this logic to
> use a smarter UNION-FIND algorithm. (I've not yet seen queries where the
> number of EC members is large enough to make that a serious issue, but
> I think it could happen, particularly with inheritance/partitioning.)
>

Okay, that makes sense, thanks for explanation. Your patch is the way to
go then.

>>> This passes the smell test for me in the sense of not adding any
>>> significant number of planner cycles except when the weird case occurs.
>>> It leaves something on the table in that there are some situations
>>> where X=X could be converted, but we won't do it because we don't see
>>> the clause as being a potential EC (because it's not at top level),
>>> as in the second new regression test case below. I think that's
>>> probably all right; I don't see any way to be more thorough without
>>> adding a lot of new cycles all the time, and I don't believe this is
>>> worth that.
>
>> My code had same issue. I think going deeper would require quite a bit
>> of new code (and cycles) for something that's even less likely to happen
>> than simple X=X while the current patch is quite cheap win.
>
> Yeah. I'm not really convinced it's a big win, but it's so cheap we
> might as well do it. The only case where it would expend cycles and
> fail to give an improvement is if we have X=X with a non-strict operator,
> which I think is a case that never arises in practice at present,
> because btree effectively assumes that all btree operators are strict.
> (Someday we might want to relax that, though, which is why I don't
> want to let the planner just assume it.)
>

In real-life probably not, but seems like these small bits can be useful
marketing wise, given articles like the one which started this. And it
should not hurt anything either.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: Discussion on missing optimizations at 2017-10-07 17:59:20 from Tom Lane

Responses

Re: Discussion on missing optimizations at 2017-10-08 15:17:01 from Adam Brusselback
Re: Discussion on missing optimizations at 2017-10-08 16:24:50 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jack Christensen	2017-10-07 21:56:26	Prepared statements assume text type in PG10
Previous Message	Tom Lane	2017-10-07 20:46:55	Re: Horrible CREATE DATABASE Performance in High Sierra