Re: BUG #4925: "select ... for update" doesn't affect rows from sub-query

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Steve Caligo <steve(dot)caligo(at)ctie(dot)etat(dot)lu>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4925: "select ... for update" doesn't affect rows from sub-query
Date: 2009-07-30 02:54:36
Message-ID: 603c8f070907291954ydc9d467m21499dfbf824ee17@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Jul 16, 2009 at 12:34 PM, Steve Caligo<steve(dot)caligo(at)ctie(dot)etat(dot)lu> wrote:
>
> The following bug has been logged online:
>
> Bug reference:      4925
> Logged by:          Steve Caligo
> Email address:      steve(dot)caligo(at)ctie(dot)etat(dot)lu
> PostgreSQL version: 8.3.7 and 8.4.0
> Operating system:   Archlinux and Gentoo 8.3.7, Gentoo 8.4.0
> Description:        "select ... for update" doesn't affect rows from
> sub-query
> Details:
>
> While trying to guarantee data consistency when doing concurrent processing,
> I stumbled upon your cautions mentionned in your documentation (especially
> "limit" ... "for update"):
> http://www.postgresql.org/docs/8.4/static/sql-select.html
>
> I tried working around this limitation and the statement on the same page
> seemed promising to me:
> "If FOR UPDATE or FOR SHARE is applied to a view or sub-query, it affects
> all tables used in the view or sub-query."
>
> But unfortunately the latter statement doesn't seem to be true and the
> subquery isn't protected by row locks, as the following examples show.
>
> 1) INITIAL SITUATION
>
> Create a simple table with some data. No constraints, no indexes, just the
> bare minimum:
>
> create table test (
>        id integer,
>        name varchar(10),
>        c integer
> );
>
> insert into test values
>        (1, 'test1', 0),
>        (2, 'test2', 0),
>        (3, 'test3', 0),
>        (4, 'test4', 0),
>        (5, 'test5', 0)
> ;
>
>
> 2) FIRST TRY, USING "UPDATE WHERE ID = ( SELECT )"
> The goal is to have two clients set their unique ID to a single/different
> row from the table. First, using "limit" in a slightly different way:
>
> 1=> begin transaction;
> 2=> begin transaction;
>
> 1=> update test set c = 1 where id = (
>        select id from test where c = 0 order by id limit 1
> )
> ; -- updates row id=1
> 2=> update test set c = 2 where id = (
>        select id from test where c = 0 order by id limit 1
> )
> ; -- forced to wait on lock
>
> 1=> commit; -- client #2 continues
> 1=> select * from test where id = 1;
>  id | name  | c
> ----+-------+---
>  1 | test1 | 1
> (1 row)
>
> 2=> commit;
> 2=> select * from test where id = 1;
>  id | name  | c
> ----+-------+---
>  1 | test1 | 2
> (1 row)
>
> 1=> select * from test where id = 1;
>  id | name  | c
> ----+-------+---
>  1 | test1 | 2
> (1 row)
>
>
> Conclusion: didn't work.
> Probably because the subquery is executed before the update and not affected
> by row locking.
>
> Expected behaviour: one client update one row to c=1 and the other client
> updates a different row to c=2.

This is pretty clearly NOT the situation described in the
documentation. There is no FOR UPDATE or FOR SHARE anywhere in this
query. You could argue that we should treat an UPDATE statement as
applying an implicit FOR UPDATE to any subqueries found therein, but
that has nothing to do with whether the current behavior matches the
documentation; it's a discussion of whether the current behavior is
good or bad.

> 3) SECOND TRY, PROTECTING THE UPDATE BY AN ADDITIONAL "SELECT ... FOR
> UPDATE" AND AVOIDING THE "LIMIT"
>
> 1=> begin transaction;
> 2=> begin transaction;
> 1=> select id from test where id = (
>        select min(id) from test where c = 0
> ) for update;
>  id
> ----
>  2
> (1 row)
>
> 2=> select id from test where id = (
>        select min(id) from test where c = 0
> ) for update; -- forced to wait on lock
>
> 1=> update test set c = 1 where id = 2;
> 1=> commit; -- client #2 continues:
> 2=> -- client #2 outputs:
>  id
> ----
>  2
> (1 row)
>
> 2=> select * from test where id = 2;
>  id | name  | c
> ----+-------+---
>  2 | test2 | 1
> (1 row)
> 2=> -- now this isn't what we initially asked for, let's just repeat the
> query once more:
>
> 2=> select id from test where id = ( select min(id) from test where c = 0 )
> for update;
>  id
> ----
>  3
> (1 row)
>
>
> Conclusion: didn't work.
> The situation one ends up in is one that contradicts your above statement,
> but also seemingly violates the "I" in ACID. In 3), client #2 is clearly
> affected by the actions of client #1. While serialized transactions or full
> table locks would avoid this race condition, it either requires large
> changes in the application or impacts performance during contention.

This is pretty weird behavior, and I am among those who think it
sucks. But it is documented.

http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY

An interesting fact is that if you stick another "for update" into the
subquery here, the command will fail utterly, with the following error
message:

ERROR: SELECT FOR UPDATE/SHARE is not allowed with aggregate functions

> Adding an additional "c = 0" to the main query of 3) of course suppresses
> the row from the concurrent update, but it shouldn't have been returned with
> a value of "c = 1" because the transaction #2 started prior to the update
> statement of #1.

...Robert

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Steve Caligo 2009-07-30 05:40:01 Re: BUG #4925: "select ... for update" doesn't affect rows from sub-query
Previous Message Scott Mead 2009-07-30 02:25:17 Re: BUG #4950: Problem in Job scheduling in Advanced postgre sql