Re: Early WIP/PoC for inlining CTEs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Fetter <david(at)fetter(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Early WIP/PoC for inlining CTEs
Date: 2019-02-26 14:51:38
Message-ID: 6696.1551192698@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> Here, uncommenting that NOT actually changes the result, from 22 rows to
> 4 rows, because we end up generating multiple worktable scans and the
> recursion logic is not set up to handle that.

Ugh.

> So what I think we need to do here is to forbid inlining if (a) the
> refcount is greater than 1 and (b) the CTE in question contains,
> recursively anywhere inside its rtable or the rtables of any of its
> nested CTEs, a "self_reference" RTE.

That's kind of "ugh" too: it sounds expensive, and doing it in a way
that doesn't produce false positives would be even more complicated.

Idle uncaffeinated speculation: is it practical to fix the restriction
about multiple worktable scans?

Also, I thought of a somewhat-related scenario that the code isn't
accounting for: you can break the restrictions about single evaluation
with nested WITHs, like

with x as not materialized (with y as materialized (select random() r) select * from y)
select * from x, x x1;

In this particular example, we're saved from computing random() twice
by the checks for volatile functions. But without that, y is inlined
and computed twice, e.g.

explain verbose with x as not materialized (with y as (select now() r) select * from y)
select * from x, x x1;
QUERY PLAN
------------------------------------------------
Nested Loop (cost=0.00..0.06 rows=1 width=16)
Output: (now()), (now())
-> Result (cost=0.00..0.01 rows=1 width=8)
Output: now()
-> Result (cost=0.00..0.01 rows=1 width=8)
Output: now()
(6 rows)

As a user I think I'd find that surprising, and bad if y were expensive.

Is it practical to inline the outer "x" level and still compute "y"
only once? If not, I think we need to disallow inlining anything
that contains a "with".

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2019-02-26 15:20:04 Re: WIP: Avoid creation of the free space map for small tables
Previous Message Kuntal Ghosh 2019-02-26 14:48:42 Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits