Re: Volatile Functions in Parallel Plans

From: Jesse Zhang <sbjesse(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Zhenghua Lyu <zlyu(at)vmware(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Volatile Functions in Parallel Plans
Date: 2020-07-16 15:40:12
Message-ID: CAGf+fX4sEmB3XwNdvgc2xDJu4PSa2zG7gb2vtoXnKSayoO004g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Tom and Zhenghua,

On Thu, Jul 16, 2020 at 8:18 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Jesse Zhang <sbjesse(at)gmail(dot)com> writes:
> > You are right, no they are not consistent. But Neither plans is
> > incorrect:
>
> > 1. In the first query, it's semantically permissible to evaluate
> > timeofday() for each pair of (c1, c2), and the plan reflects that.
> > (Notice that the parallel nature of the plan is just noise here, the
> > planner could have gone with a Nested Loop of which the inner side is
> > _not_ materialized).
>
> Yeah, exactly. The short answer here is that refusing to parallelize
> the plan would not make things any more consistent.
>
> In general, using a volatile function in a WHERE clause is problematic
> because we make no guarantees about how often it will be evaluated.
> It could be anywhere between "never" and "once per row of the
> cross-product of the FROM tables". AFAIR, the only concession we've made
> to make that less unpredictable is to avoid using volatile functions in
> index quals. But even that will only make things noticeably more
> predictable for single-table queries. As soon as you get into join cases,
> you don't have much control over when the function will get evaluated.
>
> regards, tom lane

For more kicks, I don't even think this is restricted to volatile
functions only. To stir the pot, it's conceivable that planner might
produce the following plan

Seq Scan on pg_temp_3.foo
Output: foo.a
Filter: (SubPlan 1)
SubPlan 1
-> WindowAgg
Output: sum(bar.d) OVER (?)
-> Seq Scan on pg_temp_3.bar
Output: bar.d

For the following query

SELECT a FROM foo WHERE b = ALL (
SELECT sum(d) OVER (ROWS UNBOUNDED PRECEDING) FROM bar
);

N.B. that the WindowAgg might produce a different set of numbers each
time depending on the scan order of bar, which means that for two
different "foo" tuples of equal b value, one might be rejected by the
filter whereas another survives.

I think the crux of the discussion should be whether we can reasonably
expect a subquery (subquery-like structure, for example the inner side
of nest loops upthread) to be evaluated only once. IMHO, no. The SQL
standard only broadly mandates that each "execution" of a subquery to be
"atomic".

Zhenghua and Tom, would you suggest the above plan is wrong (not
suboptimal, but wrong) just because we don't materialize the WindowAgg
under the subplan?

Cheers,
Jesse

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-07-16 15:43:59 Re: renaming configure.in to configure.ac
Previous Message Andrew Dunstan 2020-07-16 15:36:29 Re: renaming configure.in to configure.ac