Re: PostgreSQL 9.6 behavior change with set returning (funct).*

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Regina Obe" <lr(at)pcorp(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PostgreSQL 9.6 behavior change with set returning (funct).*
Date: 2016-03-23 17:37:28
Message-ID: 9987.1458754648@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Regina Obe" <lr(at)pcorp(dot)us> writes:
> In the past couple of weeks our PostGIS tests against PostgreSQL 9.6 dev
> started failing. I traced the issue down to a behavior change in 9.6 when
> dealing with output of set returning functions when used with (func).*
> syntax.

> CREATE OR REPLACE FUNCTION dumpset(param_num integer, param_text text)
> RETURNS TABLE(id integer, junk1 text, junk2 text)
> ...
> -- Get 16 rows in 9.6, Get 8 rows in 9.5
> SELECT (dumpset(f.test, 'hello world' || f.test)).*
> FROM generate_series(1,4) As f(test)
> ORDER BY junk2;

I think this is a side-effect of 9118d03a8 ("When appropriate, postpone
SELECT output expressions till after ORDER BY"). Previously, although you
got N evaluations of the SRF which is pretty horrid, they were all in the
same targetlist and hence ran in sync and produced only the expected
number of rows (thanks to the otherwise-indefensible ExecTargetList
behavior by which multiple SRF tlist expressions produce a number of rows
equal to the least common multiple of their periods, not the product).
That commit causes the evaluation of dumpset(...).junk1 to be postponed
till after the Sort step, but the evaluation of dumpset(...).junk2
necessarily can't be. So now you get dumpset(...).junk2 inflating
the original rowcount 2X, and then dumpset(...).junk1 inflating it
another 2X after the Sort.

We could remain bug-compatible with the old behavior by adding some
restriction to keep all the tlist SRFs getting evaluated at the same
plan step, at least to the extent that we can. I think you could get
similar strange behaviors in prior releases if you used GROUP BY or
another syntax that might result in early evaluation of the sort column,
and we're not going to be able to fix that. But we could prevent this
particular optimization from introducing new strangeness.

But I'm not really sure that we should. The way that you *should*
write this query is

SELECT ds.*
FROM generate_series(1,4) AS f(test),
dumpset(f.test, 'hello world' || f.test) AS ds
ORDER BY junk2;

which is both SQL-standard semantics and much more efficient than
SRF-in-tlist. We've more or less deprecated SRF-in-tlist since we
introduced LATERAL in 9.3. How much are we willing to do to stay
bug-compatible with old behaviors here?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-03-23 17:41:52 Re: [PATCH] fix DROP OPERATOR to reset links to itself on commutator and negator
Previous Message Petr Jelinek 2016-03-23 17:36:48 Re: Reworks of CustomScan serialization/deserialization