Re: <> join selectivity estimate question

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: <> join selectivity estimate question
Date: 2017-07-20 22:40:43
Message-ID: CAEepm=3=NHHko3oOzpik+ggLy17AO+px3rGYrg3x_x05+Br9-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 21, 2017 at 8:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes:
>> On Thu, Jul 20, 2017 at 5:30 PM, Thomas Munro
>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>>> Does anyone know how to test a situation where the join is reversed according to
>>> get_join_variables, or "complicated cases where we can't tell for sure"?
>
>> explain select * from pg_class c right join pg_type t on (c.reltype =
>> t.oid); would end up with *join_is_reversed = true; Is that what you
>> want? For a semi-join however I don't know how to induce that. AFAIU,
>> in a semi-join there is only one direction in which join can be
>> specified.
>
> You just have to flip the <> clause around, eg instead of
>
> explain analyze select * from tenk1 t
> where exists (select 1 from int4_tbl i where t.ten <> i.f1);
>
> do
>
> explain analyze select * from tenk1 t
> where exists (select 1 from int4_tbl i where i.f1 <> t.ten);
>
> No matter what the surrounding query is like exactly, one or the
> other of those should end up "join_is_reversed".

Ahh, I see. Thanks for the explanation.

> This would be a bit harder to trigger for equality clauses, where you'd
> have to somehow defeat the EquivalenceClass logic's tendency to rip the
> clauses apart and reassemble them according to its own whims. But for
> neqjoinsel that's not a problem.
>
>> I didn't get the part about "complicated cases where we can't tell for sure".
>
> You could force that with mixed relation membership on one or both sides
> of the <>, for instance "(a.b + b.y) <> a.c". I don't think it's
> especially interesting for the present purpose though, since we're going
> to end up with 1.0 selectivity in any case where examine_variable can't
> find stats.

Thanks. Bearing all that in mind, I ran through a series of test
scenarios and discovered that my handling for JOIN_ANTI was wrong: I
thought that I had to deal with inverting the result, but I now see
that that's handled elsewhere (calc_joinrel_size_estimate() I think).
So neqjoinsel should just treat JOIN_SEMI and JOIN_ANTI exactly the
same way.

That just leaves the question of whether we should try to handle the
empty RHS and single-value RHS cases using statistics. My intuition
is that we shouldn't, but I'll be happy to change my intuition and
code that up if that is the feedback from planner gurus.

Please find attached a new version, and a test script I used, which
shows a bunch of interesting cases. I'll add this to the commitfest.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
neqjoinsel-fix-v3.patch application/octet-stream 2.7 KB
test-neqjoinsel.sql application/octet-stream 3.1 KB
test-neqjoinsel.out application/octet-stream 14.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2017-07-20 23:09:14 Better error message for trying to drop a DB with open subscriptions?
Previous Message Alvaro Herrera 2017-07-20 22:23:05 Re: PgFDW connection invalidation by ALTER SERVER/ALTER USER MAPPING