Re: Segmentation fault on proc exit after dshash_find_or_insert

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Segmentation fault on proc exit after dshash_find_or_insert
Date: 2025-12-04 16:03:44
Message-ID: CA+HiwqHk=7PfqhQ57i75iirV=gmSqR_Td=ki4L_wV1fhW0R2KA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Fri, Dec 5, 2025 at 12:33 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2025-12-04 11:06:20 +0900, Amit Langote wrote:
> > On Thu, Dec 4, 2025 at 12:33 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > I don't agree. You *cannot* rely on lwlocks working after an error before you
> > > have called LWLockReleaseAll(). I.e. currently *any* use of lwlocks in
> > > before_shmem_exit is unsafe. The only reason we haven't really noticed that is
> > > that most of the top-level error handlers (i.e. sigsetjmp()s) do an
> > > AbortCurrentTransaction(), which does an LWLockReleaseAll() if in a tx, and
> > > most lwlock acquisitions happen within a transaction. But if you ever do stuff
> > > outside of a transaction, the AbortCurrentTransaction() won't do
> > > LWLockReleaseAll(), and you're in trouble, as the case here.
> > >
> > > IOW, I think we need to do LWLockReleaseAll() at the top of proc_exit(), at
> > > least in case of FATAL errors.
> >
> > Oh, so not at the top of not shmem_exit() as Rahila has proposed?
>
> Oh, ontop of shmem_exit() seems fine, what I was trying to express was that it
> should happen unconditionally at the start of exit processing, not just at the
> tail.

Ah, ok. I was talking about this with Rahila today and she pointed
out to me that whether we add it to the top of proc_exit() or
shmem_exit() doesn't really make any material difference to the fact
that it will get done at the start of exit processing as you say, at
least today. So I think we can keep it like Rahila originally
proposed.

> > > We probably should add a note to LWLockReleaseAll() indicating that we rely on
> > > LWLockReleaseAll() working even if CreateLWLocks()/InitializeLWLocks() etc
> > > hasn't yet been called...
> >
> > Makes sense. AFAICS, num_held_lwlocks would be 0 in that case, so
> > LWLockReleaseAll() would be a no-op.
>
> Right. I just meant we should add a comment noting that we rely on that
> fact...

Ok, got it. Maybe like this:

@@ -1940,6 +1940,10 @@ LWLockReleaseClearVar(LWLock *lock,
pg_atomic_uint64 *valptr, uint64 val)
* unchanged by this operation. This is necessary since InterruptHoldoffCount
* has been set to an appropriate level earlier in error recovery. We could
* decrement it below zero if we allow it to drop for each released lock!
+ *
+ * Note that this function must be safe to call even if the LWLock subsystem
+ * hasn't been initialized (e.g., during early startup error recovery).
+ * In that case, num_held_lwlocks will be 0, and we'll do nothing.
*/
void
LWLockReleaseAll(void)

--
Thanks, Amit Langote

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2025-12-04 16:29:24 Re: Fix PrivateRefCount hash table key size
Previous Message Nathan Bossart 2025-12-04 16:03:22 Re: pgsql: Add pg_atomic_unlocked_write_u64