Re: CTE inlining

From: Serge Rielau <serge(at)rielau(dot)com>
To: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Mario Becroft <mb(at)true(dot)group>
Subject: Re: CTE inlining
Date: 2017-05-04 15:04:09
Message-ID: 651A1657-3645-40B2-8202-3BE0B2A35FD1@rielau.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On May 4, 2017, at 3:02 AM, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz> wrote:
>
> On 30/04/17 16:28, Tom Lane wrote:
>> Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com> writes:
>>> - as you noted, it is hard to decide when it's worth inlining vs
>>> materializing for CTE terms referenced more than once.
>> [ raised eyebrow... ] Please explain why the answer isn't trivially
>> "never".
>>
>> There's already a pretty large hill to climb here in the way of
>> breaking peoples' expectations about CTEs being optimization
>> fences. Breaking the documented semantics about CTEs being
>> single-evaluation seems to me to be an absolute non-starter.
>>
>> regards, tom lane
>>
>>
> Could not each CTE be only evaluated once, but restricted (as far as is practicable) to the rows actually needed by the body of the SELECT?
>
Tom,

Are you worried about semantics or performance?
With proper detection of mutating functions and snapshot isolation I do not see how a user would detect “lack of” single evaluation.
As for performance we’d be talking about what? An uncorrelated inner of a nested loop join?

Anyway it seems to me that there a multiple properties at play here which are quite orthogonal.

1. Full materialization/Slow materialization/pipelining
I cannot come up with any scenario where full materialization would be beneficial from a performance point of view (which speaks to Gavin’s view).
I can see it from a semantic point of view when order of execution may matter (for example with embedded DML and triggers present).
As soon as semantics are at play having syntax is absolutely the right thing: +1 for MATERIALIZE
2.Pushing predicates (or other operators) into the CTE.
All this can ever do is reduce the number of rows being looked at.
As long as the optimizer is careful, not to do what it isn’t supposed to do in a nested query (push past a mutating function) I don’t see the harm
3. Replicating the CTE to push distinct operators from different consumers.
Again this can only be done if there are no mutators or non deterministic operators.

To summarize: big +1 to preserve the existing behavior with MATERIALIZE and to set CTE’s free by default with the onus on the optimizer to prove semantic equivalence.

Cheers
Serge Rielau
Salesforce.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2017-05-04 15:36:39 Re: Should pg_current_wal_location() become pg_current_wal_lsn()
Previous Message Chapman Flack 2017-05-04 14:59:12 idea: custom log_line_prefix components besides application_name