Re: Invalid optimization of VOLATILE function in WHERE clause?

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
Cc: "Florian Schoppmann" <Florian(dot)Schoppmann(at)emc(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>,<pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Invalid optimization of VOLATILE function in WHERE clause?
Date: 2012-09-19 20:26:03
Message-ID: 5059E40B020000250004A521@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu> wrote:
> On Wed, Sep 19, 2012 at 02:39:12PM -0500, Kevin Grittner wrote:
>> In another thread, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>>> 2. Apply the WHERE condition to each row from 1, and drop rows
>>> that don't pass it.
>>
>> People expect that the results will be consistent with this
>> model, even if the implementation is optimized "under the
>> covers". I think correct semantics should trump performance
>> here.

> It seems like this is what happens here except that the function
> is evaluated once for the WHERE and not once per ROW. Both of
> these meet the criterion for 2 above and Tom's earlier comments
> both hold.

There really needs to be some way to specify that when an expression
is evaluated for each row in a set, a function used within that
expression is not optimized away for some rows. Fortunately we have
a way:

http://www.postgresql.org/docs/9.2/interactive/sql-createfunction.html

| VOLATILE indicates that the function value can change even within
| a single table scan, so no optimizations can be made. Relatively
| few database functions are volatile in this sense; some examples
| are random(), [...]

The behavior in the OP's query would certainly be sane if the
function were not VOLATILE; as it is, I have a hard time seeing this
as anything but a bug.

There is a workaround, if you don't mind ugly:

CREATE FUNCTION random_really_i_mean_it(dummy int)
RETURNS double precision
LANGUAGE plpgsql
VOLATILE
AS $$
BEGIN
-- no need to reference dummy parameter
RETURN random();
END;
$$;

WITH source AS (
SELECT i FROM generate_series(1,10) AS i
)
SELECT
i
FROM
source, (
SELECT
count(*) AS _n
FROM source
) AS _stats
WHERE
random_really_i_mean_it(i) < 5::DOUBLE PRECISION/_n;

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2012-09-19 20:30:34 Re: Invalid optimization of VOLATILE function in WHERE clause?
Previous Message Simon Riggs 2012-09-19 20:13:51 Re: [v9.3] Extra Daemons (Re: elegant and effective way for running jobs inside a database)