Re: Invalid optimization of VOLATILE function in WHERE clause?

From: Florian(dot)Schoppmann(at)emc(dot)com (Florian Schoppmann)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Invalid optimization of VOLATILE function in WHERE clause?
Date: 2012-09-20 01:26:10
Message-ID: 1kqojon.1l30msx12a32miN%Florian.Schoppmann@emc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Florian(dot)Schoppmann(at)emc(dot)com (Florian Schoppmann) writes:
> > [VOLATILE function in WHERE clause *does* get optimized]
>
> I can't get excited about this. Any time you put a volatile function
> into WHERE, you're playing with fire. The docs warn against it:
> http://www.postgresql.org/docs/9.2/static/sql-expressions.html#SYNTAX-
> EXPRESS-EVAL

All this section tells me is that one cannot rely on the evaluation
order in expressions, and that side effects are dangerous in WHERE and
HAVING clauses.

I do not read in this section that VOLATILE functions are unsafe per se.
After all, a VOLATILE function is not required to have side effects
(suppose, e.g., somebody implemented a true random nubmer generator).

However,
<http://www.postgresql.org/docs/9.2/interactive/sql-createfunction.html>
is actually very clear in its wording:

| VOLATILE indicates that the function value can change even within a
| single table scan, so no optimizations can be made.

I therefore tend to see the behavior as a bug. I concede though that
there is also good reason to keep the current behavior (VOLATILE is the
default for UDFs, etc.). But then I think the documentation needs to be
changed, and it has to be made explicit where the optimizer may change
semantics and what, on the other hand, is defined behavior.

E.g., users should need to know if the following rewrite causes defined
or undefined behavior. Does it give correct results by accident, or are
correct results guaranteed?

--8<--
WITH source AS (
SELECT i FROM generate_series(1,10) AS i
)
SELECT
i
FROM
source AS _stats
WHERE
random() < 5::DOUBLE PRECISION
/ (SELECT count(*) FROM source);
-->8--

> To do what you want, I'd suggest wrapping the join into a sub-select
> with an "OFFSET 0" clause, which will serve as an optimization fence
> that prevents the random() call from being pushed down.

My interpretation so far is that VOLATILE functions in a WHERE clause
can always be avoided: E.g., move the condition to the SELECT list and
embed in an outer query that then filters on the condition column. Or is
my assumption wrong, and the optimizer could theoretically interfere
even here?

Florian

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2012-09-20 02:56:03 newline conversion in SQL command strings
Previous Message Andres Freund 2012-09-19 23:07:51 Re: [COMMITTERS] pgsql: Fix bufmgr so CHECKPOINT_END_OF_RECOVERY behaves as a shutdown c