Re: Early WIP/PoC for inlining CTEs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Fetter <david(at)fetter(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Early WIP/PoC for inlining CTEs
Date: 2019-02-09 20:52:52
Message-ID: 1445.1549745572@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Tom> After further reflection I really don't like Andrew's suggestion
> Tom> that we not document the rule that multiply-referenced CTEs won't
> Tom> be inlined by default. That would be giving up the principle that
> Tom> WITH calculations are not done multiple times by default, and I
> Tom> draw the line at that. It's an often-useful behavior as well as
> Tom> one that's been documented from day one, so I do not accept the
> Tom> argument that we might someday override it on the basis of nothing
> Tom> but planner cost estimates.

> The case that springs to mind is when a CTE with grouping is then joined
> multiple times in the main query with different conditions. If the
> planner is able to deduce (e.g. via ECs, i.e. equivalence classes) that restrictions on grouped
> columns can be pushed into the CTE, then inlining the CTE multiple times
> might be a significant win. But if that isn't possible, then inlining
> multiple times might be a significant loss.

Sure, but this is exactly the sort of situation where we should offer
a way for the user to force either decision to be made. I think it's
very unlikely that we'll ever be in a position to make a realistic
cost-based decision for that. Actually planning it out both ways would
be horrendously expensive (and probably none too reliable anyway, given
how shaky ndistinct estimates tend to be); and we certainly don't have
enough info to make a smart choice without doing that.

regards, tom lane
