Re: CTE inlining

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>
Cc: Vik Fearing <vik(dot)fearing(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CTE inlining
Date: 2017-05-09 21:14:19
Message-ID: 21113.1494364459@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru> writes:
> Ok, it seems that most people in discussion are agree that removing optimization
> fence is a right thing to do.
> Nonetheless I still hoping to discuss the algorithm and its implementation.

Yeah, so far we've mainly discussed whether to do that and how to control
it, not what the actual results would be.

> I suppose, in case of a single reference we can validate CTE subquery and inline it
> just before SS_process_ctes() in subquery_planner() and then process remaining
> CTEs as before.

Yeah, something like that ought to be possible. I'd prefer to think of it
as related to the work prepjointree.c does, ie pull_up_sublinks, which
raises the question why we do SS_process_ctes before that not after it.

You might want to start by postponing planning of CTE subqueries as late
as possible, without any other changes. In particular, I think it'd be
a good thing to postpone Path-to-Plan conversion of the subqueries
until much later, ideally not till the final toplevel create_plan() call.
This is tied up with early Plan creation for regular subqueries too, so
that might have to be an all-or-nothing conversion. But we aren't really
going to have much flexibility of planning for subqueries until we do
that.

> The case of multiple reference is more interesting.
> Ideally, we would decide whether to inline just before pull_up_sublinks(), so all
> the optimizations can be applied to inlined subquery. But It is impossible as we
> have no information to build subquery paths and estimate they costs at this point.

TBH, I would just ignore that completely, at least until the simpler
case is done and committed. Trying to bite that off as part of the
initial patch is likely to lead to never getting anything done at all.
And I'm not exactly convinced either that it will win often enough to
be worth the trouble, or that users would thank you for rewriting their
queries that way.

When and if we get to this, you could imagine tackling it a bit like
preprocess_minmax_aggregates, or like what I did with OR clauses in
<22002(dot)1487099934(at)sss(dot)pgh(dot)pa(dot)us>, ie just repeat a lot of the work
over again to get several complete Path trees, and then pick one.
This is kind of expensive and ugly but it's hard to see how to do it
sensibly in a bottom-up fashion.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-05-09 21:15:30 Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions
Previous Message Daniel Verite 2017-05-09 21:00:31 Re: export import bytea from psql