Re: Converting NOT IN to anti-joins during planning

From: Antonin Houska <ah(at)cybertec(dot)at>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Converting NOT IN to anti-joins during planning
Date: 2019-05-27 08:43:40
Message-ID: 28175.1558946620@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:

> On Wed, 6 Mar 2019 at 12:54, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> > The latest patch is attached.
>
> Rebased version after pgindent run.

I've spent some time looking into this.

One problem I see is that SubLink can be in the JOIN/ON clause and thus it's
not necessarily at the top of the join tree. Consider this example:

CREATE TABLE a(i int);
CREATE TABLE b(j int);
CREATE TABLE c(k int NOT NULL);
CREATE TABLE d(l int);

SELECT *
FROM
a
JOIN b ON b.j NOT IN
( SELECT
c.k
FROM
c)
JOIN d ON b.j = d.l;

Here the b.j=d.l condition makes the planner think that the "b.j NOT IN
(SELECT c.k FROM c)" sublink cannot receive NULL values of b.j, but that's not
true: it's possible that ((a JOIN b) ANTI JOIN c) is evaluated before "d" is
joined to the other tables, so the NULL values of b.j are not filtered out
early enough.

I thought it would help if find_innerjoined_rels(), when called from
expressions_are_not_nullable(), only collected rels (and quals) from the
subtree below the sublink, but that does not seem to help:

CREATE TABLE e(m int);

SELECT *
FROM
a
JOIN e ON a.i = e.m
JOIN b ON a.i NOT IN
( SELECT
c.k
FROM
c)
JOIN d ON COALESCE(a.i, 0) = COALESCE(d.l, 0);

Here it might seem that the a.i=e.m condition eliminates NULL values from the
ANTI JOIN input, but it's probably hard to prove at query preparation time
that

(((a JOIN e) JOIN b) ANTI JOIN c) JOIN d

won't eventually be optimized to

(((a JOIN d) JOIN b) ANTI JOIN c) JOIN e

Since the join condition between "a" and "d" is not strict in this case, the
ANTI JOIN will receive the NULL values of a.i.

It seems tricky, I've got no idea of an alternative approach right now.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2019-05-27 09:26:58 Pinned files at Windows
Previous Message Michael Paquier 2019-05-27 08:33:30 Re: Why does pg_checksums -r not have a long option?