Re: CTE push down

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE push down
Date: 2021-04-14 13:01:46
Message-ID: CAExHW5vBx3RuEO2cAG0hhHY5MayrMcvdeGrapZUc5rGsS6sSTQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 13, 2021 at 6:58 PM Alexander Pyhalov
<a(dot)pyhalov(at)postgrespro(dot)ru> wrote:
>
> Hi.
>
> Currently PostgreSQL supports CTE push down for SELECT statements, but
> it is implemented by turning each CTE reference into a subquery.
>
> When a CTE is referenced multiple times, we have a choice: materialize
> the CTE (and disable quals distribution to the CTE query) or inline it
> (and so run the CTE query multiple times, which can be inefficient, for
> example, when the CTE references foreign tables).
>
> I was looking at whether it is possible to collect the quals
> referencing a CTE, combine them into an OR qual and add it to the CTE
> query.
>
> So far I consider the following changes.
>
> 1) Modify SS_process_ctes() to add a list of RestrictInfo* to
> PlannerInfo - one NULL RestrictInfo pointer per CTE (let's call this
> list cte_restrictinfos for now).
> 2) In distribute_restrictinfo_to_rels(), when we get a rel of RTE_CTE
> rtekind and are sure that we can safely push down the restrictinfo,
> preserve the restrictinfo in cte_restrictinfos, converting multiple
> restrictions to "OR" RestrictInfos.
> 3) At the end of subquery_planner() (after inheritance_planner() or
> grouping_planner()) we can check whether cte_restrictinfos contains any
> non-null RestrictInfo pointers and recreate the plan for the
> corresponding CTEs, distributing quals to relations inside CTE queries.
>
> For now I'm not sure how to handle vars mapping when we push
> restrictinfos to the level of the cte root or when we push them down to
> the cte plan, but properly mapping vars seems to be doable.

I think a similar mapping happens when we push quals that reference a
named JOIN down to join rels. I haven't looked at it, but I think
it happens before planning time. Some similar machinery might help
in this case.

I believe step 2 is needed to avoid materializing rows which will never
be selected. That would be a good improvement. However, care needs to
be taken with volatile quals. I think the quals on the CTE will be
evaluated twice: once when materializing the CTE result and a second
time when scanning the materialized result. Volatile quals may produce
different results when run multiple times.
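
The double-evaluation concern can be shown with a small Python model. This is a sketch under stated assumptions, not PostgreSQL behaviour: make_volatile_qual() is a made-up stand-in for a volatile qual, built so that its answer depends only on how many times it has been called, which keeps the example deterministic.

```python
import itertools
from typing import Callable, List


def make_volatile_qual() -> Callable[[int], bool]:
    """Hypothetical volatile qual: true on odd-numbered calls only."""
    calls = itertools.count(1)

    def qual(row: int) -> bool:
        # The row is ignored on purpose; only the call count matters.
        return next(calls) % 2 == 1

    return qual


def evaluate_once(rows: List[int], qual: Callable[[int], bool]) -> List[int]:
    """Apply the qual a single time, as a plain scan would."""
    return [r for r in rows if qual(r)]


def evaluate_twice(rows: List[int], qual: Callable[[int], bool]) -> List[int]:
    """Apply the qual while materializing, then again while scanning."""
    materialized = [r for r in rows if qual(r)]
    return [r for r in materialized if qual(r)]


rows = list(range(6))


def stable_qual(r: int) -> bool:
    """A qual whose result depends only on the row."""
    return r % 2 == 0


once = evaluate_once(rows, make_volatile_qual())
twice = evaluate_twice(rows, make_volatile_qual())

stable_once = evaluate_once(rows, stable_qual)
stable_twice = evaluate_twice(rows, stable_qual)

print("volatile:", once, twice)
print("stable:  ", stable_once, stable_twice)
```

With the stable qual both paths agree, while the volatile one loses rows on the second pass. So the "can safely push down" test in step 2 would presumably need to reject such quals, for example with a check like contain_volatile_functions().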

>
> Is there something else I am missing?
> Is somebody working on an alternative solution, or does anyone see
> issues with this approach?

IMO, a POC patch would help us understand your idea.

--
Best Wishes,
Ashutosh Bapat
