Re: Row locks, SKIP LOCKED, and transactions

From: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
To: Steven Winfield <Steven(dot)Winfield(at)cantabcapital(dot)com>, "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Row locks, SKIP LOCKED, and transactions
Date: 2019-12-18 00:09:45
Message-ID: e4fb1db7-6952-927d-8f86-6e77dee706a2@aklaver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 12/17/19 8:12 AM, Steven Winfield wrote:
> Hi all,
>
> I'm seeing some unexpected behaviour with SELECT ... FOR UPDATE SKIP LOCKED, and having finding it tricky to boil it down to a simple repro case as there's almost certainly a race condition somewhere (more later). So I thought I would ask if what I'm doing is unsupported (or just plain wrong!), before expending more effort in reproducing it.
>
> I'm running v11.5, RHEL 7.7.
>
> I have two tables jobs and results:
> CREATE TABLE job (id integer PRIMARY KEY);
> CREATE TABLE result (id integer PRIMARY KEY);
> (obviously the real tables have more columns, but that's not too important here)
>
> Something populates the job table with IDs.
> A job is done if its id appears in the result table.
> I would like to have multiple worker processes working on jobs.
>
>
> I thought I could achieve this with each working doing the following:
>
> BEGIN;
>
> SELECT id
> FROM job
> WHERE NOT EXISTS (SELECT 1 FROM result WHERE result.id = job.id)
> LIMIT 1
> FOR UPDATE SKIP LOCKED;
>
> -- worker process does some work for the selected ID here
>
> INSERT INTO result (id) VALUES (the_id_from_above);
>
> COMMIT;
>
>
> However, even with just two worker processes, I quickly found that one worker process would be assigned a job id that had *very* recently been completed by the other worker.
>
> Some more potentially useful information:
> * The LockRows node of the plan for the SELECT query above doesn't receive any tuples until about a second after the query begins executing
> * If worker 2 begins querying for a new job id half a second before worker 1 commits then worker 2 will pick up the job id that worker 1 has just finished with.
> * I observe this even if I crank up the transaction isolation level to repeatable read and serializable.
>
>
> I'm wondering if row locks are not obeying the same transactional semantics as row data, as a potential explanation for the above behaviour is as follows (W1/2 = Worker 1/2):
>
> W1: BEGIN;
> W1: SELECT ...
> W1: (SELECT returns id=1. W1 now has job(id=1) locked.)
> W1: INSERT INTO result (id) VALUES (1)
>
> W2: BEGIN;
> W2: SELECT ...
>
> W1: COMMIT; job(id=1) is now unlocked.
>
> W2: (SELECT returns id=1: W1 had not committed when the SELECT started, so result(id=1) is not visible, but LockRows found that job(id=1) was not locked. W2 now has job(id=1) locked.)
>
>
> ...i.e. W2's SELECT could not see the row that W1 INSERTed (because W2's BEGIN occurs and W2's SELECT begins before W1's commit), but W2's SELECT *could* see the removal of W1's row lock.

Gotta believe it is this:

https://www.postgresql.org/docs/11/transaction-iso.html#XACT-READ-COMMITTED

"UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands ..."

If I read correctly, transactions can see the effects of other
transactions that commit during their lifetime.

>
>
> Perhaps this is a misuse of the locking system, since I'm locking a row "FOR UPDATE" but not actually updating it, but as row locks are released at the end of a transaction (according to the docs) then my expectation was for the unlocking and the visibility of newly committed rows to be atomic.
> I've tried FOR NO KEY UPDATE too, without luck.
>
> If I'm doing something forbidden (and the docs say so) then I'd be grateful if someone could point that out!
>
> Best,
> Steven.
>
>
>
>

--
Adrian Klaver
adrian(dot)klaver(at)aklaver(dot)com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Munro 2019-12-18 02:10:27 Re: Row locks, SKIP LOCKED, and transactions
Previous Message George Neuner 2019-12-17 20:06:58 Re: Fast, stable, portable hash function producing 4-byte or 8-byte values?