Re: Re: Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Query fails when SRFs are part of FROM clause (Commit id: 69f4b9c85f)
Date: 2017-04-12 22:58:55
Message-ID: 20170412225855.oklyfkbwj2dmbrpu@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017-04-05 09:39:37 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2017-04-05 02:47:55 -0400, Noah Misch wrote:
> >> [Action required within three days. This is a generic notification.]
>
> > I've a very preliminary patch. I'd like to only start polishing it up
> > once the code freeze is over, so I can work on getting some patches in -
> > note that I myself have no pending patches. Once frozen I'll polish it
> > up and send that within a few days.
>
> FWIW, I'm willing to help out on this.

Help would be appreciated. I've pondered, and partially implemented,
several approaches so far, and my conclusion is that we should just do
nothing. The amount of corner cases is just too big, and the utility of
the feature too small.

To recap, the issue is that in previous versions it was, by accident
(there's no test, code comments "supporting" the feature talk about a
different corner case, and the behaviour isn't correct in some cases)
allowed to do something like:
SELECT * FROM CAST(generate_series(1,3) * 5 AS int);
while
SELECT * FROM generate_series(1,3) * 5;
is not allowed.

The reason that that works from the gram.y perspective is that CAST etc
types of func_expr's. The reason that it worked from a code perspective
is that execQual.c:ExecMakeTableFunctionResult() has the following:
/*
* Normally the passed expression tree will be a FuncExprState, since the
* grammar only allows a function call at the top level of a table
* function reference. However, if the function doesn't return set then
* the planner might have replaced the function call via constant-folding
* or inlining. So if we see any other kind of expression node, execute
* it via the general ExecEvalExpr() code; the only difference is that we
* don't get a chance to pass a special ReturnSetInfo to any functions
* buried in the expression.
*/
if (funcexpr && IsA(funcexpr, FuncExprState) &&
IsA(funcexpr->expr, FuncExpr))
and back then ExecEvalExpr() was allowed to return sets. It also
depends on some default initializations (e.g. rsinfo.returnMode =
SFRM_ValuePerCall).

This yields plenty weird behaviour in < v10. E.g. the following is
disallowed:
SELECT * FROM int4mul(generate_series(1,3), 1);
ERROR: 0A000: set-valued function called in context that cannot accept a set
as is
SELECT * FROM CAST(int4mul(generate_series(1,3), 1) AS bigint);
ERROR: 0A000: set-valued function called in context that cannot accept a set
because the cast is implemented as int8(expr) which avoids the fallback
path as it's a FuncExpr, but
SELECT * FROM CAST(int4mul(generate_series(1,3), 1) AS text);
works because the cast is implemented as a io coercion, which is not a
funcexpr. Therefore it uses the fallback ExecEvalExpr().

The mismatch between ExecEvalExpr() and nodeFunctionscan.c SRF behaviour
back then also yields odd behaviour, e.g.:
SELECT * FROM generate_series(1,0);
returns zero rows, but
SELECT * FROM CAST(generate_series(1,0) * 5 AS INT);
returns one NULL row.

In v10, as it stands, these all error out, because the SRFs are now only
to be evaluated via either nodeFunctionscan.c (FROM) or via
nodeProjectSet.c (targetlist), not ExecEvalExpr() anymore.

I've basically pondered three different methods of implementing
something akin to the old behaviour:

1) Move the non-SRF part into nodeFunctionscan.c's targetlist, and let
it be evaluated there. E.g. if the expression is
CAST(generate_series(1,5) AS text), evaluate the generate_series(1,5)
using nodeFunctionscan's FunctionScanPerFuncState machinery, but
implement the CAST as CAST(Var(whatever, 1) AS Text).

That doesn't work well for composite/record returning rows, because
RangeTblFunction's returning composites are expanded into multiple
columns. E.g.
SELECT * FROM CAST(CAST(twocol_noset_outline(generate_series(1,3), 'text') AS text) AS twocol);
returns all the columns of twocol, not a single wholerow datum.

There's also some issues around what to do with cases involving
volatile functions when the output is not referenced, or referenced
multiple times e.g.
SELECT f, f FROM CAST(generate_series(1,3) * nextval(...)) AS f;
would evaluate nextval() twice with such an approach...

2) During planning, split of the SRF bit from the !SRF bit and have
nodeFunctionscan.c evaluate both separately, like 1). That allows to
avoid the volatility issue, because the targetlist is still projected
separately.

I've prototyped this to a reasonable point, by having
create_functionscan_plan() process each RangeTblFunction and split
the expression into SRF and non-SRF parts, and then have
FunctionNext() evaluate the non-SRF part.

At the current state of my prototype this happens to allow simple SRF
in arguments cases like SELECT * FROM int4mul(generate_series(1,3), 1)
which are disallowed in < v10.

This is reasonably-ish in complexity for scalar values, but gets
quite complicated for composite/record datums. Suddenly there's two
different types of values that we need to deal with, the type of
datum returned by the SRF, and the type of datum returned by the
final expression - they might be the same, they might not.
Especially for record-returning functions, where ROWS FROM( AS...)
determines the column types, and ExecMakeTableFunctionResult() does a
tupledesc_match() to verify the SRF returned something reasonable,
it gets wild: What would we even be matching the types against?

3) Implement nested FROM SRFs as pipelined executor nodes, similar to
the way targetlist SRFs are now handled. E.g. something like
SELECT * FROM CAST(generate_series(1,10) * 5 AS int);

would be implemented as one nodeFunctionscan.c for the
generate_series(), and then something like nodeProjectSet.c (or a
nodeFunctionscan.c branch that'd make a sub-node available) would
evaluate the CAST(Var() * 5 AS int).

This approach has the advantage that it'd allow us, potentially in
the future, to get rid of the restriction that normal ROWS FROM
doesn't allow SRFs in arguments.

I think this again runs into trouble with row-expansion in the return
value from SRFs, which isn't actually desired till the outermost
"level" of the expression. I guess we could add a mode to
nodeFunctionscan.c that forces composite/record types to be returned
as wholerow vars instead of being split up.

My conclusion here is that it's way too complicated to properly
implement a feature that only seems to exist by accident and has plenty
of weird behaviour. Currently we continue to accept all the !SRF
expressions that were previously accepted, but I'd even consider
insisting that the expression needs to be a proper FuncExpr at
parse-analysis time (before inlining/const evaluation did its thing).

Unless somebody has a radically better idea how to implement this?

- Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-04-12 22:59:32 Re: pg_upgrade vs extension upgrades
Previous Message Peter Eisentraut 2017-04-12 22:32:55 Re: [pgsql-www] Small issue in online devel documentation build