Re: Missing calls to UnlockBuffers() - unify error handling?

From: Rahila Syed <rahilasyed90(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Missing calls to UnlockBuffers() - unify error handling?
Date: 2025-11-14 11:40:35
Message-ID: CAH2L28vEqBa7oVFr0TzcKE0xX1yFj75nkC_-ddHixub42BbG+Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres,

> Widening the perspective a bit, our current approach for such
> error-handling /
> shutdown functions seems ... not maintainable. In particular we have a
> substantial number of top-level sigsetjmp() handlers that have slightly
> different recovery codepaths. When adding a new process type, how is one
> supposed to even know what function one needs to call?
>
> PostgresMain() has this comment:
> /*
> * NOTE: if you are tempted to add more code in this
> if-block,
> * consider the high probability that it should be in
> * AbortTransaction() instead. The only stuff done
> directly here
> * should be stuff that is guaranteed to apply *only* for
> outer-level
> * error recovery, such as adjusting the FE/BE protocol
> status.
> */
>
> But a) that clearly hasn't worked, given how long the following block is,
> and
> b) can't really work that well, because we have plenty processes that don't
> use transaction, therefore putting cleanup in AbortTransaction() doesn't go
> that far.
>

I don't quite know how it should look like, but it seems pretty obvious that
> it should be more unified than it is. My gut feeling is that we should
> have a
> single error recovery function that has a flags argument guiding which
> subsystems should be reset and which shouldn't.
>
>
+1 for the idea. I am interested in writing a patch for the same if you
would like.

As you mentioned, these code blocks under the if
(sigsetjmp(local_sigjmp_buf, 1) != 0)
statement have a very similar set of function calls, depending on the
process type—whether
it's an auxiliary process, background process, or client backend.
There are also similarities across these types; for instance, all of them
register a ProcKill
callback as an on_shmem_exit callback..

Currently, we perform LWLockReleaseAll() at the before_shmem_exit stage,
such as in the ShutdownAuxiliaryProcess() callback for auxiliary processes
and
ShutdownPostgres() for client backends.
However, there are some inconsistencies:
1. The client backend does not call LWLockReleaseAll() until ProcKill(),
if it is not
in a transaction.
2. The background processes and AutovacuumLauncher do not register a
before_shmem_callback
for calling LWLockReleaseAll() but may invoke it directly within the
sigsetjmp() blocks.
3. Some auxiliary processes also call LWLockReleaseAll() early in the
sigsetjmp() block.

> Seems we should just reorder the sequence in ProcKill() to first call
> LWLockReleaseAll()

It would be too late to call it in ProcKill, since dsm_backend_shutdown()
would
detach segments containing the LWLocks before this. Using a
before_shmem_exit callback or
handling it in the initial steps of sigsetjmp might be more suitable.

Thank you,
Rahila Syed

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2025-11-14 11:53:21 Re: Changing shared_buffers without restart
Previous Message Masahiko Sawada 2025-11-14 11:39:34 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart