Converting NOT IN to anti-joins during planning

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Converting NOT IN to anti-joins during planning
Date: 2019-03-05 23:54:49
Message-ID: CAKJS1f82pqjqe3WT9_xREmXyG20aOkHc-XqkKZG_yMA7JVJ3Tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Way back in [1] I proposed that we allow NOT IN subqueries to be
converted into an anti-join where the subquery cannot return any NULL
values. As Tom pointed out to me, I had neglected to consider that
the outer side producing NULLs can cause the anti-join plan to produce
incorrect results. The difference is that a NOT IN where the subquery
returns no results filters nothing, otherwise it filters the nulls,
plus the records that exist in the subquery.

More recently over on [2], Jim and Zheng have re-proposed making
improvements in this area. Their ideas are slightly different from
mine as they propose to add an OR .. IS NULL clause to the join
condition to handle the outer side being NULL with empty subquery
problem. Before Jim and Zheng's patch arrived I managed to fix the
known problems with my 4-year-old patch thinking it would have been
welcome, but it seems that's not the case, perhaps due to the
differing ideas we have on how this should work. At that time I didn't
think the other patch actually existed yet... oops

Anyway, I don't really want to drop my patch as I believe what it does
is correct and there's debate on the other thread about how good an
idea adding these OR clauses to the join quals is... (forces nested
loop plan (see [3])), but it appears Jim and Zheng are fairly set on
that idea. Hence...

I'm moving my patch here, so it can be debated without interfering
with the other work that's going on in this area. There has also been
some review of my patch in [4], and of course, originally in [1].

The background is really.

1. Seems fine to do this transformation when there are no nulls.
2. We don't want to cost anything to decide on to do the
transformation or not, i.e do it regardless, in all possible cases
where it's valid to do so. We already do that for NOT EXISTS, no
apparent reason to think this case is any different.
3. Need to consider what planner overhead there is from doing this and
failing to do the conversion due lack of evidence for no NULLs.

I've not done #3, at least not with the latest patch.

There's already a CF entry [5] for this patch, although its targeting PG13.

The latest patch is attached.

[1] https://www.postgresql.org/message-id/CAApHDvqRB-iFBy68%3DdCgqS46aRep7AuN2pou4KTwL8kX9YOcTQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/1550706289606-0.post@n3.nabble.com
[3] https://www.postgresql.org/message-id/CAKJS1f_ZwXtzPz6wDpBXgAVYuxforsqpc6hBw05Y6aPGcOONfA%40mail.gmail.com
[4] https://www.postgresql.org/message-id/18203.1551543939%40sss.pgh.pa.us
[5] https://commitfest.postgresql.org/22/2020/

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
not_in_anti_join_v1.3.patch application/octet-stream 50.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-05 23:56:11 Re: Allowing extensions to supply operator-/function-specific info
Previous Message Paul Ramsey 2019-03-05 23:38:12 Re: Allowing extensions to supply operator-/function-specific info