Re: Pushing down a subquery relation's ppi_clauses, and more ...

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Pushing down a subquery relation's ppi_clauses, and more ...
Date: 2025-07-28 13:29:06
Message-ID: 6d607488-0442-452b-b536-7f16a172e05f@gmail.com
Lists: pgsql-hackers

On 26/7/2025 05:09, Richard Guo wrote:
> Here, I'd like to discuss whether it's worthwhile to also consider
> pushing down a subquery relation's ppi_clauses if the subquery is
> LATERAL.
In my opinion, this direction makes sense. Moreover, I have seen
sophisticated cases where SQL Server pushes parameterisation through
GROUP BY down into a subquery, significantly speeding up execution.

> First, it's important to note that pushing down ppi_clauses doesn't
> always result in a better execution plan. While doing so can reduce
> the amount of data processed in each aggregation invocation within the
> subquery, it also means that the aggregation needs to be re-evaluated
> for every outer tuple. If t1 is very small and t2 is large, pushing
> down ppi_clauses can be a win. As t1 gets larger, this gets less
> attractive, and eventually it will have a higher cost than the current
> plan, where the aggregation is evaluated only once.
Heh, let me propose a way to mitigate this issue, which I implemented in a
Postgres fork. Instead of implementing numerous 'subplan flattening'
transformations, I found that we can smooth the performance cliff by
inserting a Memoize node at the top of the subplan. It reduces the number
of subplan evaluations when the outer side supplies duplicated parameter
values.
The design is close to the subplan hashing feature, but, of course,
logically distinct: it requires a top-down step after the bottom-up
planning. It has some limitations, but if you have the resources to
restructure the planning procedure slightly, it may be feasible in
Postgres core as well.
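To illustrate the idea (this is a toy sketch, not PostgreSQL code): a
Memoize-style cache on top of a parameterized subplan turns "one evaluation
per outer tuple" into "one evaluation per distinct parameter value". The
`evaluate_subplan` callback and all names here are illustrative assumptions.

```python
def make_memoized_subplan(evaluate_subplan):
    """Wrap a parameterized subplan with a cache keyed on the parameter,
    analogous to placing a Memoize node on top of the subplan."""
    cache = {}
    stats = {"evaluations": 0, "hits": 0}

    def run(param):
        if param in cache:
            stats["hits"] += 1          # duplicated outer value: reuse result
        else:
            stats["evaluations"] += 1   # first time we see this value
            cache[param] = evaluate_subplan(param)
        return cache[param]

    return run, stats
```

With outer parameter values like [1, 2, 1, 3, 2, 1], the subplan is
evaluated only three times instead of six, which is exactly the smoothing
of the performance cliff described above.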

>
> Therefore, if we decide to pursue this approach, we would need to
> generate two paths: one with the ppi_clauses pushed down, and one
> without, and then compare their costs. A potential concern is that
> this might require re-planning the subquery twice, which could
> increase planning overhead.
Here, we also designed an approach that may provide some insights for
further core development.
Correlated subquery pull-up techniques always have bad corner cases (like
the one you describe). We added an extension list field to PlannerGlobal
and PlannedStmt, enabling such features to report back to the upper-level
caller. The caller may then build a series of plans with and without these
contradictory features applied and compare their costs.
I implemented this 'plan probing' technique in GetCachedPlan, which
obviously has the best chance of being profitable, because a cached plan is
reused multiple times and already has the infrastructure to track previous
planning efforts. At a high architectural level, it seems close to the
current plan cache auto mode logic: try options, compare costs, and
remember the decision.
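The probing loop above can be sketched roughly as follows (again a toy
sketch, not PostgreSQL code; `plan_with_feature`, `plan_without_feature`,
and the `memory` dict are hypothetical stand-ins for the planner callbacks
and the per-cached-plan decision storage):

```python
def probe_and_remember(plan_with_feature, plan_without_feature, memory):
    """Build both plan variants once, pick the cheaper one, and remember
    the decision so later replans skip the extra planning pass."""
    if "use_feature" in memory:
        # Decision already made on a previous planning cycle: reuse it.
        chosen = plan_with_feature if memory["use_feature"] else plan_without_feature
        plan, _cost = chosen()
        return plan

    plan_a, cost_a = plan_with_feature()     # e.g. ppi_clauses pushed down
    plan_b, cost_b = plan_without_feature()  # e.g. aggregation evaluated once
    memory["use_feature"] = cost_a <= cost_b
    return plan_a if memory["use_feature"] else plan_b
```

The one-time cost of planning twice is amortized over the many executions
of a cached plan, which is why GetCachedPlan is an attractive place for it.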

I'm not sure it provides any answers - just existing techniques to ponder.

--
regards, Andrei Lepikhov
