Re: CTE inlining

From: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>
To: Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: CTE inlining
Date: 2017-05-09 18:26:24
Message-ID: 8550391494354384@web15m.yandex.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Ok, it seems that most people in discussion are agree that removing optimization
fence is a right thing to do. But so far the main topic was whether it worth to
make "inlining" by default, and how we should enable it.

Nonetheless I still hoping to discuss the algorithm and its implementation.

I suppose, in case of a single reference we can validate CTE subquery and inline it
just before SS_process_ctes() in subquery_planner() and then process remaining
CTEs as before.

The case of multiple reference is more interesting.
Ideally, we would decide whether to inline just before pull_up_sublinks(), so all
the optimizations can be applied to inlined subquery. But It is impossible as we
have no information to build subquery paths and estimate they costs at this point.
All necessary initialization is performed in query_planner(), that invoked far
later in grouping_planner(). (As far as I understand.)

The most straighforward way is to compare CTE scan cost with subquery execution
and result scan cost in set_rel_size(), just after set_cte_pathlist(), and alter
RelOptInfo, if we choose to inline.
(e.g (CTE scan) < (cheapest_path(subquery) + subquery scan))
This way we still can push down predicates as it is performed in
set_subquery_pathlist(), but we missed pull_up_subquery().
Besides, it seems like a dirty quick solution.

Maybe it possible to add subquery scan to RTE_CTE RelOptInfo, but I'm not sure.

So what is a right way to conduct comparison between CTE scan and subquery
execution with subsequent scan?

I am new to PostgreSQL development, so I need a guidance from someone who
familiar with optimizer infrastructure to ensure that I moving in a right
direction and not making something weird.

P.S. There is a paper [1] describing implementation of CTE optimization in Orca
optimizer. It may be useful, though architecture is completely different.

[1] Optimization of Common Table Expressions in MPP Database Systems
(http://www.vldb.org/pvldb/vol8/p1704-elhelw.pdf)

Ilya Shkuratov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Tiikkaja 2017-05-09 18:31:19 Re: COMPRESS VALUES feature request
Previous Message Erez Segal 2017-05-09 18:18:02 COMPRESS VALUES feature request