Re: Converting NOT IN to anti-joins during planning

From: Antonin Houska <ah(at)cybertec(dot)at>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Converting NOT IN to anti-joins during planning
Date: 2019-08-13 09:49:24
Message-ID: 3793.1565689764@linux-edt6
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:

> On Mon, 27 May 2019 at 20:43, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > I've spent some time looking into this.
>
> Thank you for having a look at this.
>
> > One problem I see is that SubLink can be in the JOIN/ON clause and thus it's
> > not necessarily at the top of the join tree. Consider this example:
> >
> > CREATE TABLE a(i int);
> > CREATE TABLE b(j int);
> > CREATE TABLE c(k int NOT NULL);
> > CREATE TABLE d(l int);
> >
> > SELECT *
> > FROM
> > a
> > JOIN b ON b.j NOT IN
> > ( SELECT
> > c.k
> > FROM
> > c)
> > JOIN d ON b.j = d.l;
>
> hmm yeah. Since the proofs that are being used in
> expressions_are_not_nullable assume the join has already taken place,
> then we'll either need to not use the join conditions are proofs in
> that case, or just disable the optimisation instead. I think it's
> fine to just disable the optimisation since it seem rather unlikely
> that someone would write a join condition like that. Plus it seems
> quite a bit more complex to validate that the optimisation would even
> be correct if NULLs were not possible.
>
> I've attached a patch which restricts the pullups to FromExpr quals.
> Anything below a JoinExpr disables the optimisation now.

ok. The planner pulls-up other sublinks located in the ON clause, but it'd be
quite tricky to do the same for the NOT IN case.

Now that we only consider the WHERE clause, I wonder if the code can be
simplified a bit more. In particular, pull_up_sublinks_jointree_recurse()
passes valid pointer for notnull_proofs to pull_up_sublinks_qual_recurse(),
while it also passes under_joinexpr=true. The latter should imply that NOT IN
won't be converted to ANTI JOIN anyway, so no notnull_proofs should be needed.

BTW, I'm not sure if notnull_proofs=j->quals is correct in cases like this:

case JOIN_LEFT:
j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
&j->rarg,
rightrelids,
NULL, NULL, j->quals,
true);

Even if j->quals evaluates to NULL or FALSE (due to NULL value on its input),
it does not remove any rows (possibly containing NULL values) from the input
of the SubLink's expression.

I'm not even sure that expressions_are_not_nullable() needs the notnull_proofs
argument. Now that we only consider SubLinks in the WHERE clause, it seems to
me that nonnullable_vars is always a subset of nonnullable_inner_vars, isn't
it?

A few more minor findings:

@@ -225,10 +227,13 @@ pull_up_sublinks(PlannerInfo *root)
*
* In addition to returning the possibly-modified jointree node, we return
* a relids set of the contained rels into *relids.
+ *
+ * under_joinexpr must be passed as true if 'jtnode' is or is under a
+ * JoinExpr.
*/
static Node *
pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
- Relids *relids)
+ Relids *relids, bool under_joinexpr)
{
if (jtnode == NULL)
{

The comment "if 'jtnode' is or is under ..." is unclear.

* is_NOTANY_compatible_with_antijoin()

** "'outerquery' is the parse of the query" -> "'outerquery' is the parse tree of the query"

** "2. We assume that each join qual is an OpExpr" -> "2. We assume that
each sublink expression is an OpExpr" ?

** (OpExpr *) lfirst(lc) -> lfirst_node(OpExpr, lc)

** The kind of bool expression (AND_EXPR) might need to be tested here too:

+ /* Extract exprs from multiple expressions ANDed together */
+ else if (IsA(testexpr, BoolExpr))

* find_innerjoined_rels()

"we locate all WHERE and JOIN/ON quals that constrain these rels add them to"
->
" ... and add them ..."

* get_attnotnull()

The comment says that FALSE is returned if the attribute is dropped, however
the function it does not test att_tup->attisdropped. (This patch should not
call the function for a dropped attribute, so I'm only saying that the
function code is not consistent with the comment.)

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Feike Steenbergen 2019-08-13 09:50:18 Feature: Use DNS SRV records for connecting
Previous Message amul sul 2019-08-13 09:38:34 Re: Some memory not freed at the exit of RelationBuildPartitionDesc()