Re: Bug with STABLE function using the wrong snapshot (probably during planning)

From: Noah Misch <noah(at)leadboat(dot)com>
To: Matthijs Bomhoff <matthijs(at)quarantainenet(dot)nl>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Bug with STABLE function using the wrong snapshot (probably during planning)
Date: 2011-05-10 09:31:24
Message-ID: 20110510093124.GE5617@tornado.gateway.2wire.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi Matthijs,

Thanks for the report.

On Tue, Mar 22, 2011 at 04:31:47PM +0100, Matthijs Bomhoff wrote:
> The bit of SQL below does not behave the way it should on postgres 8.4.4 (tested by me) and 9.0.3 (verified independently on #postgresql).

On git master, too.

> The third statement in the quux() function calls the a_bar() function that should find a single row in the 'bar' table and return its value. This single row is INSERTed into the 'bar' table on the previous line. However, the SELECT statement in the a_bar() function throws the following error: "ERROR: query returned no rows". It thus appears not to see the INSERTed value in the 'bar' table. (The expected behavior is that the a_bar() function returns the value 500 instead of throwing an error.)
>
> Removing the STABLE attribute from a_bar() works around the problem, as does moving the "INSERT INTO bar ..." statement out of the quux() function and executing it before calling the quux() function itself.
>
> Some initial debugging by RhodiumToad on #postgresql led to the following observation: The error occurs only when the "SELECT ... WHERE i = a_bar();" is being planned, not when it is being executed, with the snapshot being used to plan the query apparently being too old to see the result of the preceding insert.

Quite so. All the core procedural languages have _SPI_execute_plan() manage
CommandCounterIncrement() and PushActiveSnapshot()/PopActiveSnapshot() for the
SQL statements they execute. Many statements use a snapshot during planning,
but _SPI_prepare_plan() never pushes one. Therefore, in this example, planning
uses the snapshot pushed in PortalRunSelect(). Expressions evaluated at plan
time miss any changes from earlier in the volatile function. This is fine when
they merely give "wrong" answers: we might get an inferior selectivity estimate.
In your example, a function that actually needs to see the latest data to avoid
throwing an error, we do have a problem.

The simplest fix I can see is to have _SPI_prepare_plan() push/pop a new
snapshot when analyze_requires_snapshot() returns true on the raw parse tree.
That strategy can break down in the other direction if the caller is STABLE;
consider this example:

CREATE TABLE foo(i INTEGER);
CREATE TABLE bar(i INTEGER);
INSERT INTO foo(i) SELECT s.a FROM generate_series(1,2) s(a);
INSERT INTO bar(i) VALUES(500);

BEGIN;
CREATE OR REPLACE FUNCTION a_bar() RETURNS INTEGER AS $EOF$
DECLARE
result INTEGER;
BEGIN
EXECUTE 'SELECT i FROM bar' INTO STRICT result;
RETURN result;
END
$EOF$ LANGUAGE plpgsql STABLE;

CREATE OR REPLACE FUNCTION quux() RETURNS INTEGER AS $EOF$
BEGIN
LOOP
RAISE NOTICE 'iteration';
EXECUTE 'SELECT COUNT(*) FROM foo WHERE i = a_bar()';
PERFORM pg_sleep(3);
END LOOP;
END
$EOF$ LANGUAGE plpgsql STABLE;

SELECT quux();
-- concurrently:
-- INSERT INTO bar VALUES (501);

ROLLBACK;

With the current code, the function call runs indefinitely. With the
_SPI_prepare_plan() change, it fails during planning on the next iteration after
the concurrent change. This seems less severe than the current bug, but it's
still not great. We could preserve the behavior of that example by instead
adding a "read_only" parameter to SPI_prepare* (or defining new functions with
the parameter) and having that parameter control snapshot acquisition as it does
for SPI_execute*. Opinions? Better ideas?

> BEGIN;
>
> CREATE TABLE foo(i INTEGER);
> CREATE TABLE bar(i INTEGER);
>
> CREATE OR REPLACE FUNCTION a_bar() RETURNS INTEGER AS $EOF$
> DECLARE
> result INTEGER;
> BEGIN
> EXECUTE 'SELECT i FROM bar' INTO STRICT result;
> RETURN result;
> END
> $EOF$ LANGUAGE plpgsql STABLE;
>
> CREATE OR REPLACE FUNCTION quux() RETURNS INTEGER AS $EOF$
> DECLARE
> result INTEGER;
> BEGIN
> INSERT INTO foo(i) SELECT s.a FROM generate_series(1,1000,1) s(a);
> INSERT INTO bar(i) VALUES(500);
> SELECT INTO STRICT result COUNT(*) FROM foo WHERE i = a_bar();
> RETURN result;
> END
> $EOF$ LANGUAGE plpgsql;
>
> SELECT quux();
>
> ROLLBACK;

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Fujii Masao 2011-05-10 12:02:49 Re: BUG #6018: ctrl +C cause data inconsistent for sync standby
Previous Message Robert Haas 2011-05-10 02:40:56 Re: BUG #5994: Can't excute DBI->connect to oracle by client site