Re: SELECT FOR UPDATE on rows that don't exist

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Joe <symphony(dot)red+pg(at)gmail(dot)com>
Cc: pgsql-novice <pgsql-novice(at)lists(dot)postgresql(dot)org>
Subject: Re: SELECT FOR UPDATE on rows that don't exist
Date: 2022-03-23 22:18:56
Message-ID: CAKFQuwZ4=c7J-g5=t2veA2-w5tTi3oER34HJvYQ3QsaqSNqEig@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Wed, Mar 23, 2022 at 12:52 PM Joe <symphony(dot)red+pg(at)gmail(dot)com> wrote:

> 1) What are the semantics of SELECT FOR UPDATE when the row doesn't exist
> yet?
>

You've informed the system you are going to be updating rows on the table
but as yet have not given it specific rows to protect.

>
> Without #*1, a simple experiment shows that two processes can be in the
> critical section at the same time. Add #*1 seems to achieve the desired
> behavior, but is it really?
>

The index will not allow duplicates to be inserted and the first one to try
forces all other potential insertions to wait until the first one commits;

https://www.postgresql.org/docs/current/locking-indexes.html
"Short-term share/exclusive page-level locks are used for read/write
access. Locks are released immediately after each index row is fetched or
inserted. These index types provide the highest concurrency without
deadlock conditions."

But, once the insertion happens the second transaction sees the potentially
conflicting record and so continues waiting to see whether it commits with
the conflict in place (at which point it returns the duplicate key error)
or not (in which case one of them then gets to proceed with their
insertion). This manifests as a lock on the transactionid of the first
session as is most easily seen in in pg_stat_activity
wait_event/wait_event_type fields.

I picked up the last point through my own experimentation, if there is
documentation someone else will hopefully point it out.

> 2) The DELETE @ #2 is so that the row is never present when not executing
> in the critical section mainly so that #1 can be a simple insert rather
> than an upsert. Is there a more standard pattern for this?
>

Not sure. But while definitely not standard, and somewhat limited, there
is the advisory locks feature.

https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS

> 3) Using the DB as a distributed mutex seems like a common application
> but nothing came up in various DB and PostgreSQL books I consulted or on
> the web. Is this a bad idea, or are there gotchas I'm missing?
>
>
For a job queue typically you'd fire off an update to claim the work -
ensuring only one session gets the assignment - and another to indicate
completion. The "critical section" is usually so long as to be harmful to
the database to be performed while holding a transaction open specifically
for locking purposes. I have to imagine there are better tools, especially
in-memory ones, designed to handle distributed process coordination. That
said, the goodness or badness of doing it in the database heavily depends
on knowing more than just "distributed mutex". The critical section
details, not just existence, factors into such a subjective evaluation.
The fact you have a functioning database does make things appealing.

David J.

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Joe 2022-03-24 05:31:47 Re: SELECT FOR UPDATE on rows that don't exist
Previous Message Joe 2022-03-23 19:52:05 SELECT FOR UPDATE on rows that don't exist