Re: CTE inlining

From: Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mario Becroft <mb(at)true(dot)group>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ilya Shkuratov <motr(dot)ilya(at)ya(dot)ru>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Serge Rielau <serge(at)rielau(dot)com>
Subject: Re: CTE inlining
Date: 2017-05-05 01:30:47
Message-ID: CAMsr+YG-c+Y4NYWvPtknSpzm13o0zwxhDPkw=dZm7JBtZwg40A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 5 May 2017 at 08:17, Joe Conway <mail(at)joeconway(dot)com> wrote:
> On 05/04/2017 05:03 PM, Craig Ringer wrote:
>> On 5 May 2017 02:52, "Tom Lane" wrote:
>> I haven't been keeping close tabs either, but surely we still have
>> to have
>> the optimization fence in (at least) all these cases:
>>
>> * CTE contains INSERT/UPDATE/DELETE
>> * CTE contains SELECT FOR UPDATE/SHARE (else the set of rows that get
>> locked might change)
>> * CTE contains volatile functions
>>
>> I'm willing to write off cases where, eg, a function should have been
>> marked volatile and was not. That's user error and there are plenty
>> of hazards of that kind already. But if the optimizer has reason
>> to know that discarding the fence might change any query side-effects,
>> it mustn't.
>>
>> I think everyone is in total agreement there.
>
> That's great, but if so, why do we need any change in syntax at all?

Because a lot of people use it as a query hint, and don't want to let
go of the one precious query hint they rely on in PostgreSQL to solve
urgent production issues or work around planner limitations. You can
currently (ab)use a CTE with fencing behaviour to force join order,
force evaluation of expensive functions, etc, if the planner would
otherwise pick a semantically identical plan that lands up running
slower.

Many people don't want to give up their hint because it's an important
tool in their "solve user/customer problems fast" toolbox.

Project policy says we don't want query hints, don't have them, and
won't add them. I understand the reasoning for this (and mostly
agree). But we kind of added one by accident anyway, and it's proving
rather hard to take it away!

We added it by documenting an implementation detail/limitation - that
there's limited or no inlining/pullup/pushdown across CTE terms.
People took that as blessing to rely on that behaviour in future
releases, treating it as an optimisation barrier query hint. Since
then some people have been promoting it as a workaround for
performance issues faced by queries with sub-optimal plans.

People "in the know" used to use OFFSET 0 for that instead, but it was
entirely undocumented and could go away in a future release without
any warning, plus it was uglier. The CTE limitation-as-feature got a
lot more press in blogs, etc since it's documented-ish.

So now we're collectively trying to get to a compromise position where
people who rely on it can keep the functionality by declaring the
no-inline requirement explicitly in their SQL where they depend on it.
That way it won't penalise everyone else who just wants the optimiser
to do its best the rest of the time, so we can just write SQL that
says "I want this outcome" without having to worry about a
semi-documented performance booby-trap. So surprise and confuse users
who're (reasonably) expecting the optimiser to treat a CTE like a
subquery, view or SQL function, those who're writing portable SQL and
those who're porting from other DBMSes.

The sticking point is that this seems to require admitting that it
serves as a hint, of sorts, where it forces the optimiser to choose a
different plan than its cost based estimates would suggest if all
candidate plans are semantically identical.

We're carefully maintaining this bizarre cognitive dissonance where we
justify the need for using this as a planner hint at the same time as
denying that we have a hint. That makes it hard to make progress here.
I think there's fear that we're setting some kind of precedent by
admitting what we already have.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-05-05 01:37:10 Re: modeling parallel contention (was: Parallel Append implementation)
Previous Message Andres Freund 2017-05-05 01:26:36 Re: PG 10 release notes