Re: Allowing NOT IN to use ANTI joins

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allowing NOT IN to use ANTI joins
Date: 2014-06-24 10:32:21
Message-ID: CAApHDvqX1vB6qYx6GhFNf0W4ygATDWVJFom809OPL-AGrSeiuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 11, 2014 at 9:32 PM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:

> On Sun, Jun 8, 2014 at 3:36 PM, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > Currently pull_up_sublinks_qual_recurse only changes the plan for NOT
> EXISTS
> > queries and leaves NOT IN alone. The reason for this is because the
> values
> > returned by a subquery in the IN clause could have NULLs.
>
> There's a bug in targetListIsGuaranteedNotToHaveNulls, you have to
> drill deeper into the query to guarantee the nullability of a result
> column. If a table is OUTER JOINed, it can return NULLs even if the
> original column specification has NOT NULL.
>
> This test case produces incorrect results with your patch:
>
> create table a (x int not null);
> create table b (x int not null, y int not null);
> insert into a values(1);
> select * from a where x not in (select y from a left join b using (x));
>
> Unpatched version correctly returns 0 rows since "y" will be NULL.
> Your patch returns the value 1 from a.
>
>
I'm a bit stuck on fixing this and I can't quite figure out how I should
tell if the TargetEntry is coming from an outer join.

My first attempt does not work as it seems that I'm looking up the wrong
RangeTblEntry with the following:

rte = rt_fetch(tlevar->varno, query->rtable);

if (IS_OUTER_JOIN(rte->jointype))
return true; /* Var from an outer join */

The jointype returns JOIN_INNER when loooking up the RangeTblEntry from the
TargetEntry's varno. It seems that the RangeTblEntry that I need is stored
in query->rtable, but I've just no idea how to tell which item in the list
it is. So if anyone can point me in the right direction then that would be
really useful.

On a more positive or even slightly exciting note I think I've managed to
devise a way that ANTI JOINS can be used for NOT IN much more often. It
seems that find_nonnullable_vars will analyse a quals list to find
expressions that mean that the var cannot be NULL. This means we can
perform ANTI JOINS for NOT IN with queries like:

SELECT * FROM a WHERE id NOT IN(SELECT nullable_col FROM b WHERE
nullable_col = 1);
or
SELECT * FROM a WHERE id NOT IN(SELECT nullable_col FROM b WHERE
nullable_col IS NOT NULL);

(The attached patch implements this)

the nullable_col =1 will mean that nullable_col cannot be NULL, so the ANTI
JOIN can be performed safely. I think this combined with the NOT NULL check
will cover probably just about all valid uses of NOT IN with a subquery...
unless of course I've assumed something wrongly about
find_nonnullable_vars. I just need the correct RangeTblEntry in order to
determine if the TargetEntry is from an out join.

The attached patch is a broken implemention that still needs the lookup
code fixed to reference the correct RTE. The failing regression tests show
where the problems lie.

Any help on this would be really appreciated.

Regards

David Rowley

Attachment Content-Type Size
not_in_anti_join_v0.5_broken.patch application/octet-stream 21.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-06-24 10:32:45 Re: A question about code in DefineRelation()
Previous Message Kevin Grittner 2014-06-24 10:10:20 Re: Add a filed to PageHeaderData