Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Date: 2022-04-09 21:39:16
Message-ID: 20220409213916.4ydygnkf37dd4sw2@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-04-09 17:00:41 -0400, Tom Lane wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
> > doesn't just set a sig_atomic_t flag and poke the latch. Is the extra
> > stuff it does safe? For example, is this call stack OK (to pick one
> > that jumps out, but not the only one)?
>
> > procsignal_sigusr1_handler
> > -> RecoveryConflictInterrupt
> > -> HoldingBufferPinThatDelaysRecovery
> > -> GetPrivateRefCount
> > -> GetPrivateRefCountEntry
> > -> hash_search(...hash table that might be in the middle of an update...)
>
> Ugh. That one was safe before somebody decided we needed a hash table
> for buffer refcounts, but it's surely not safe now.

Mea culpa. This is 4b4b680c3d6d - from 2014.

> Which, of course, demonstrates the folly of allowing signal handlers to call
> much of anything; but especially doing so without clearly marking the called
> functions as needing to be signal safe.

Yea. Particularly when just going through bufmgr and updating places that look
at pin counts, it's not immediately obvious that
HoldingBufferPinThatDelaysRecovery() runs in a signal handler. Partially
because RecoveryConflictInterrupt() - which is mentioned in the comment above
HoldingBufferPinThatDelaysRecovery() - sounds a lot like it's called from
ProcessInterrupts(), which doesn't run in a signal handler...

RecoveryConflictInterrupt() calls a lot of functions, some of which quite
plausibly could be changed to not be signal safe, even if they currently are.

Is there really a reason for RecoveryConflictInterrupt() to run in a signal
handler? Given that we only react to conflicts in ProcessInterrupts(), it's
not immediately obvious that we need to do anything in
RecoveryConflictInterrupt() but set some flags. There's probably some minor
efficiency gains, but that seems unconvincing.

The comments really need a rewrite - it sounds like
RecoveryConflictInterrupt() will error out itself:

/*
* If we can abort just the current subtransaction then we are
* OK to throw an ERROR to resolve the conflict. Otherwise
* drop through to the FATAL case.
*
* XXX other times that we can throw just an ERROR *may* be
* PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in
* parent transactions
*
* PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held
* by parent transactions and the transaction is not
* transaction-snapshot mode
*
* PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
* cursors open in parent transactions
*/

it's technically not *wrong* because it's setting up state that then leads to
ERROR / FATAL being thrown, but ...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-04-09 22:00:54 Re: failures in t/031_recovery_conflict.pl on CI
Previous Message Tom Lane 2022-04-09 21:00:41 Re: Is RecoveryConflictInterrupt() entirely safe in a signal handler?