Re: convert EXSITS to inner join gotcha and bug

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: convert EXSITS to inner join gotcha and bug
Date: 2017-04-29 03:39:40
Message-ID: 14538.1493437180@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
>> (On 29 April 2017 at 02:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Seems related to the unconditional setting of extra.inner_unique to
>> true for JOIN_UNIQUE_INNER jointypes in add_paths_to_joinrel()
>> Setting this based on the return value of innerrel_is_unique() as done
>> with the other join types seems to fix the issue.

> Yes, I think that's correct.

Well, "make check-world" disabused me of that notion: there are several
test cases in postgres_fdw that lost perfectly-valid inner_unique
markings. The reason is that create_unique_path will create uniqueness
even when you couldn't prove it from the underlying rel itself. So my
previous thought about comparing the outerrel to sjinfo->min_lefthand is
really necessary to avoid regressions from what we had before.

However, while that seems to be enough to generate correct plans, it
doesn't address Teodor's performance complaint: he's wishing the planner
would notice that the semijoin inner rel is effectively unique, even when
the best plan involves initially joining the semijoin inner rel to just
a subset of the semijoin outer --- against which that inner rel is *not*
unique. Applying innerrel_is_unique() helps for some simpler cases, but
not this one.

Really, the way to fix Teodor's complaint is to recognize that the
semijoin inner rel is effectively unique against the whole outer rel,
and then strength-reduce the semijoin to a plain join. The infrastructure
we built for unique joins is capable of proving that, we just weren't
applying it in the right way.

Attached are two alternative patches. The first just does the minimum
necessary to fix the bug; the second adds some code to perform
strength-reduction of semijoins. The second patch is capable of finding
the plan Teodor wanted for his test case --- in fact, left to its own
devices, it finds a *better* plan, both by cost and actual runtime.

I'm kind of strongly tempted to apply the second patch; but it would
be fair to complain that reduce_unique_semijoins() is new development
and should wait for v11. Opinions?

regards, tom lane

Attachment Content-Type Size
minimum-inner-unique-fix.patch text/x-diff 4.1 KB
reduce-semijoins.patch text/x-diff 17.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-04-29 03:56:18 Re: Transition tables for triggers on foreign tables and views
Previous Message Robert Haas 2017-04-29 01:07:17 Re: [PROPOSAL] Use SnapshotAny in get_actual_variable_range