Re: PostgreSQL 9.6 behavior change with set returning (funct).*

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Regina Obe <lr(at)pcorp(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL 9.6 behavior change with set returning (funct).*
Date: 2016-03-23 18:27:23
Message-ID: CAKFQuwY3t-p0KiZ3qWxrj4KonJiEVBVcoBc_0rh2EV-5zAuKEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, March 23, 2016, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Regina Obe" <lr(at)pcorp(dot)us <javascript:;>> writes:
> > In the past couple of weeks our PostGIS tests against PostgreSQL 9.6 dev
> > started failing. I traced the issue down to a behavior change in 9.6
> when
> > dealing with output of set returning functions when used with (func).*
> > syntax.
>
> > CREATE OR REPLACE FUNCTION dumpset(param_num integer, param_text text)
> > RETURNS TABLE(id integer, junk1 text, junk2 text)
> > ...
> > -- Get 16 rows in 9.6, Get 8 rows in 9.5
> > SELECT (dumpset(f.test, 'hello world' || f.test)).*
> > FROM generate_series(1,4) As f(test)
> > ORDER BY junk2;
>
> I think this is a side-effect of 9118d03a8 ("When appropriate, postpone
> SELECT output expressions till after ORDER BY"). Previously, although you
> got N evaluations of the SRF which is pretty horrid, they were all in the
> same targetlist and hence ran in sync and produced only the expected
> number of rows (thanks to the otherwise-indefensible ExecTargetList
> behavior by which multiple SRF tlist expressions produce a number of rows
> equal to the least common multiple of their periods, not the product).
> That commit causes the evaluation of dumpset(...).junk1 to be postponed
> till after the Sort step, but the evaluation of dumpset(...).junk2
> necessarily can't be. So now you get dumpset(...).junk2 inflating
> the original rowcount 2X, and then dumpset(...).junk1 inflating it
> another 2X after the Sort.
>
> We could remain bug-compatible with the old behavior by adding some
> restriction to keep all the tlist SRFs getting evaluated at the same
> plan step, at least to the extent that we can. I think you could get
> similar strange behaviors in prior releases if you used GROUP BY or
> another syntax that might result in early evaluation of the sort column,
> and we're not going to be able to fix that. But we could prevent this
> particular optimization from introducing new strangeness.
>
> But I'm not really sure that we should. The way that you *should*
> write this query is
>
> SELECT ds.*
> FROM generate_series(1,4) AS f(test),
> dumpset(f.test, 'hello world' || f.test) AS ds
> ORDER BY junk2;
>
> which is both SQL-standard semantics and much more efficient than
> SRF-in-tlist. We've more or less deprecated SRF-in-tlist since we
> introduced LATERAL in 9.3. How much are we willing to do to stay
> bug-compatible with old behaviors here?
>
>
My gut reaction is that this is an unnecessary regression for the sake of a
performance optimization that is likely drowned out in the usage presented
anyway.

The pivot for me would be how hard would it be to maintain the old behavior
in this "more or less deprecated" scenario. I have no way to judge that.

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-03-23 18:32:26 Re: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Previous Message Petr Jelinek 2016-03-23 18:23:40 Re: multivariate statistics v14