Re: NOT IN subquery optimization

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Jim Finnerty <jfinnert(at)amazon(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: NOT IN subquery optimization
Date: 2019-02-25 12:31:53
Message-ID: CAKJS1f_OA5VeZx8A8H8mkj3uqEgOtmHBGCUA6+xqgmUJ6JQURw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 22 Feb 2019 at 09:44, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> I've attached the rebased and still broken version.

I set about trying to make a less broken version of this.

A quick reminder of the semantics of NOT IN:

1. WHERE <nullable_column> NOT IN(SELECT <not null column> FROM table);

If table is non-empty:
will filter out rows where <nullable_column> is NULL
and only show values that are not in <not null column>

If table is empty:
Filters nothing.

2. WHERE <nonnullable_column> NOT IN(SELECT <null column> FROM table);

If table contains NULLs in the <null column> no records will match.

The previous patch handled #2 correctly but neglected to do anything
about #1. For #1 the only way we can implement this as a planner only
change is to insist that the outer side expressions also are not null.
If we could have somehow tested if "table" was non-empty then we could
have added a IS NOT NULL clause to the outer query and converted to an
anti-join, but ... can't know that during planning and can't add the
IS NOT NULL regardless as, if "table" is empty we will filter NULLs
when we shouldn't.

In the attached, I set about fixing #1 by determining if the outer
expressions could be NULL by checking

1. If expression is a Var from an inner joined relation it can't be
NULL if there's a NOT NULL constraint on the column; or
2. If expression is a Var from an inner joined relation and there is a
strict WHERE/ON clause, the expression can't be NULL; or
3. If expression is a Var from an outer joined relation check for
quals that were specified in the same syntactical level as the NOT IN
for proofs that NULL will be filtered.

An example of #3 is:

SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a WHERE t2.a IS NOT NULL
AND t2.a NOT IN(SELECT a FROM t3); -- t2 becomes INNER JOINed later in
planning, but...
or;
SELECT * FROM t1 LEFT JOIN t2 on t1.a = t2.a AND t2.a NOT IN(SELECT a FROM t3);

In the latter of the two, the t1.a = t2.a join conditions ensures that
NULLs can't exist where the NOT IN is evaluated.

I implemented #3 by passing the quals down to
pull_up_sublinks_qual_recurse(). At the top level call 'node' and
'notnull_proofs' are the same, but that changes in recursive calls
like the one we make inside the is_andclause() condition.

Comments welcome.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
not_in_anti_join_v1.1.patch application/octet-stream 47.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-02-25 12:47:41 Re: INSERT ... OVERRIDING USER VALUE vs GENERATED ALWAYS identity columns
Previous Message Ibrar Ahmed 2019-02-25 12:21:41 Re: Temporal Table Proposal