Re: Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implement targetlist SRFs using ROWS FROM() (was Changed SRF in targetlist handling)
Date: 2016-09-14 23:28:25
Message-ID: 557.1473895705@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2016-09-12 19:35:22 -0400, Tom Lane wrote:
>> Anyway I'll draft a prototype and then we can compare.

> Ok, cool.

Here's a draft patch that is just meant to investigate what the planner
changes might look like if we do it in the pipelined-result way.
Accordingly, I didn't touch the executor, but just had it emit regular
Result nodes for SRF-execution steps. However, the SRFs are all
guaranteed to appear at top level of their respective tlists, so that
those Results could be replaced with something that works like
nodeFunctionscan.

A difficulty with this restriction is that if you have a query like
"select f1, generate_series(1,2) / 10 from tab" then you end up with both
a SRF-executing Result and a separate scalar-projection Result above it,
because the division-by-ten has to happen in a separate plan level.
The planner's notions about the cost of Result make it think that this is
quite expensive --- mainly because the upper Result will be iterated once
per SRF output row, so that you get hit with cpu_tuple_cost per output row.
And that in turn leads it to change plans in one or two cases in the
regression tests. Maybe that's fine. I'm worried though that it's right
that this will be unduly expensive. So I'm kind of tempted to define the
SRF-executing node as acting more like, say, Agg or WindowFunc, in that
it has a list of SRFs to execute and then it has the ability to project a
scalar tlist on top of those results. That would likely save some cycles
at execution, and it would also eliminate a few of the planner warts seen
below, like the rule about not pushing a new scalar tlist down onto a
SRF-executing Result. I'd have to rewrite split_pathtarget_at_srfs(),
because it'd be implementing quite different rules about how to refactor
targetlists, but that's not a big problem.

On the whole I'm pretty pleased with this approach, at least from the
point of view of the planner. The net addition of planner code is
smaller than what you had, and though I'm no doubt biased, I think this
version is much cleaner. Also, though this patch doesn't address exactly
how we might do it, it's fairly clear that it'd be possible to allow
FDWs and CustomScans to implement SRF execution, eg pushing a SRF down to
a foreign server, in a reasonably straightforward extension of the
existing upper-pathification hooks. If we go with the lateral function
RTE approach, that's going to be somewhere between hard and impossible.

So I think we should continue investigating this way of doing things.
I'll try to take a look at the executor end of it tomorrow. However
I'm leaving Friday for a week's vacation, and may not have anything to
show before that.

regards, tom lane

Attachment Content-Type Size
put-srfs-in-separate-result-nodes-1.patch text/x-diff 39.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-09-14 23:37:08 Re: Tracking timezone abbreviation removals in the IANA tz database
Previous Message Robert Haas 2016-09-14 23:20:58 Re: Tracking timezone abbreviation removals in the IANA tz database