Re: inconsistent behaviour of set-returning functions in sub-query with random()

From: Tom van Tilburg <tom(dot)van(dot)tilburg(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: inconsistent behaviour of set-returning functions in sub-query with random()
Date: 2016-09-27 09:19:17
Message-ID: CAP3PPDj3_a1fOM8EPteDj++BYivB9GS_NB8UY6Y_XBEhbP5odQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Good to know and I agree that it is not an urgent case.
I think this practice might be more common in the POSTGIS community where
there are plenty of set-returning-functions used in this way. My use was
taking a random sample of a pointcloud distrubution.

I took the liberty to post your answer at stackexchange.

thanks,
Tom

On Mon, 26 Sep 2016 at 21:38 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Tom van Tilburg <tom(dot)van(dot)tilburg(at)gmail(dot)com> writes:
> > I'm often using the WHERE clause random() > 0.5 to pick a random subset
> of
> > my data. Now I noticed that when using a set-returning function in a
> > sub-query, I either get the whole set or none (meaning that the WHERE
> > random() > 0.5 clause is interpreted *before* the set is being
> generated).
> > e.g.:
> >
> > SELECT num FROM (
> > SELECT unnest(Array[1,2,3,4,5,6,7,8,9,10]) num) AS foo WHERE
> random() > 0.5;
>
> Hmm, I think this is an optimizer bug. There are two legitimate behaviors
> here:
>
> SELECT * FROM unnest(ARRAY[1,2,3,4,5,6,7,8,9,10]) WHERE random() > 0.5;
>
> should (and does) re-evaluate the WHERE for every row output by unnest().
>
> SELECT unnest(ARRAY[1,2,3,4,5,6,7,8,9,10]) WHERE random() > 0.5;
>
> should evaluate WHERE only once, since that happens before expansion of the
> set-returning function in the targetlist. (If you're an Oracle user and
> you imagine this query as having an implicit "FROM dual", the WHERE should
> be evaluated for the single row coming out of the FROM clause.)
>
> In the case you've got here, given the placement of the WHERE in the outer
> query, you'd certainly expect it to be evaluated for each row coming out
> of the inner query. But the optimizer is deciding it can push the WHERE
> clause down to become a WHERE of the sub-select. That is legitimate in a
> lot of cases, but not when there are SRF(s) in the sub-select's
> targetlist, because that pushes the WHERE to occur before the SRF(s),
> analogously to the change between the two queries I wrote.
>
> I'm a bit hesitant to change this in existing releases. Given the lack
> of previous complaints, it seems more likely to break queries that were
> behaving as-expected than to make people happy. But we could change it
> in v10 and up, especially since some other corner-case changes in
> SRF-in-tlist behavior are afoot.
>
> In the meantime, you could force it to work as you wish by inserting the
> all-purpose optimization fence "OFFSET 0" in the sub-select:
>
> =# SELECT num FROM (
> SELECT unnest(Array[1,2,3,4,5,6,7,8,9,10]) num OFFSET 0) AS foo WHERE
> random() > 0.5;
> num
> -----
> 1
> 4
> 7
> 9
> (4 rows)
>
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message PHANIKUMAR G 2016-09-27 11:12:18 Re: need approval to join forums/community
Previous Message Michael Paquier 2016-09-27 05:44:40 Re: Frequent "pg_ctl status" removing(?) semaphores (unlikely)

Browse pgsql-hackers by date

  From Date Subject
Next Message Victor Wagner 2016-09-27 09:19:56 Re: Patch: Implement failover on libpq connect level.
Previous Message Ashutosh Bapat 2016-09-27 09:18:21 Re: Declarative partitioning - another take