Re: pull-up subquery if JOIN-ON contains refs to upper-query

From: Peter Petrov <pspetrov91(at)gmail(dot)com>
To: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
Cc: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: pull-up subquery if JOIN-ON contains refs to upper-query
Date: 2026-05-08 17:45:31
Message-ID: CAJ-KsAw0zS_OEcEraRdgD0wJF5-BHW9nWy286cyQryzo_HTD2Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Alena!

Sorry for my long silence, if you are interested in my thoughts about your
patch then here they are:

1) First of all, let's read the comment below.

/*
* Separate out the WHERE clause. (We could theoretically also remove
* top-level plain JOIN/ON clauses, but it's probably not worth the
* trouble.)
*/

We need to separate two things: the jointree and the WHERE clause, so it's
possible to do something like this

whereClause = subselect->jointree->quals;
subselect->jointree->quals = NULL;

jointree = subselect->jointree;
subselect->jointree = NULL;

if (contain_vars_of_level((Node *) subselect, 1))
return NULL;

/* Do our checks in the jointree and stop if we can't do pullup */

/* Return the jointree back */
subselect->jointree = jointree;

I think it's more clear than in the patch right now.

2) We don't need to use get_relids_in_jointree() and nullable_above since
the tree's traversing is from the top to the bottom and we know that

in LEFT JOIN and FULL JOIN LHS is nullable
in RIGHT JOIN and FULL JOIN RHS is nullable

So we can use a boolean variables like this

rarg_is_nullable = (is_nullable_side ||
j->jointype == JOIN_FULL ||
j->jointype == JOIN_LEFT);

larg_is_nullable = (is_nullable_side ||
j->jointype == JOIN_FULL ||
j->jointype == JOIN_RIGHT);

And then work with them. I suspect it will be much easier to follow that in
the patch right now.

I fear that you don't check FULL JOINS here.

3) To be honest, we just work with the top jointree, we don't descend to
subqueries, therefore, I am not sure that the mutator is a good name here.

AFAICS, mutators were designed to modify something including some parts
of the subqueries but it's not the case here.

It's a simple jointree traversal, we don't need HoistJoinQualsContext
as well.

I think, we need three things here:

Node *node - the node in the jointree we are working with

bool is_nullable_side - are we on the nullable side of some outer join

List **exprs - the list in which we collect JoinExpr and FromExpr with
outer references.

I propose just traverse the jointree, make our checks and then collect
JoinExpr and FromExpr for the further processing if everything is good.

4) After checking the WHERE clause and the jointree we can traverse our
list, make a new whereClause by appending quals with outer references and
then replace quals in JoinExprs to constant true.

Something like make_and_qual()

5) I have also noticed that you are using canonicalize_qual()

I suspect we don't need it either since it will be called later. And
here is the stack

subquery_planner
pull_up_sublinks(root)
preprocess_qual_conditions(root, (Node *) parse->jointree)
j->quals = preprocess_expression(root, j->quals, EXPRKIND_QUAL)
expr = (Node *) canonicalize_qual((Expr *) expr, false)
find_duplicate_ors(qual, is_check)

So we will flatten all nested BoolExpr with the type BOOL_AND.

6) I have noticed the new output from one regression test. The previous
output was

Merge Anti Join
Merge Cond: (t1.c1 = t2.c2)
-> Sort
Sort Key: t1.c1
-> Seq Scan on tt4x t1
-> Sort

and the new output is the Hash Anti Join

Hash Anti Join
Hash Cond: (t1.c1 = t2.c2)
-> Seq Scan on tt4x t1
-> Hash

This is very suspicious, why does this patch cause such changes?

7) There is a SubLink which won't be pulled up, you can see it below

explain (COSTS OFF)
SELECT *
FROM tenk1 A
WHERE EXISTS (SELECT 1
FROM tenk2 B, (SELECT f.hundred
FROM (SELECT
A.hundred) AS f
) AS v
WHERE B.hundred = v.hundred
);
And the reason is the following. (SELECT A.hundred) is the subquery and
won't be processed until the function pull_up_sublinks() finishes its job.

But it leads us to an interesting observation. Maybe we could process it
somewhere in the SS_process_sublinks().

Perhaps, f and v subqueries could be pulled up and then the EXISTS sublink
could be replaced with a SEMI JOIN.

I don't know whether we should solve this problem right now. I have never
seen such queries in real workloads but it doesn't mean that they don't
exist.

In short, I think your patch can be simplified.

пт, 8 мая 2026 г. в 20:37, Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>:

>
> Feel free to review my patch!
>
> -----------
> Best regards,
> Yandex Cloud
> Alena Rybakina
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Previous Message Petr Petrov 2026-05-08 17:27:48 Re: pull-up subquery if JOIN-ON contains refs to upper-query