Re: Unexpected behavior after OOM errors

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unexpected behavior after OOM errors
Date: 2026-06-18 15:27:57
Message-ID: CAEze2WiAbGykWpFTXbq6C9ZtXbSApbzLewpBkFjwhqrVuPy2gw@mail.gmail.com
Lists: pgsql-hackers

On Thu, 18 Jun 2026 at 06:37, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> >> 3) An issue in StandbyAcquireAccessExclusiveLock()
> > <snip>
> >
> > I'm not sure how to solve this correctly; I think ideally the
> > StandbyAcquireAccessExclusiveLock() hash code would be wrapped by a
> > critical section, but I'm not 100% sure if that will be a sufficient
> > approach; and it'd definitely need some code to allow the various
> > hashmaps' memctxs to alloc during critical sections.
>
> Not checked this one yet.

I found that the attached patch v3 solves that issue. The assertion
fires because we link the lock into the transaction's list of
exclusive locks before actually acquiring it. When the lock
acquisition then fails, error handling goes through
StartupProcExit->ShutdownRecoveryTransactionEnvironment->StandbyReleaseAllLocks,
which tries to release a lock that this backend never took, and that
trips the assertion.

The patch moves the LockAcquire call in
StandbyAcquireAccessExclusiveLock ahead of the code that links the
lock to the transaction. The local data structure then only learns
about the lock after it has been acquired, so a failure during
acquisition no longer leaves behind an entry that error cleanup would
try to release.
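
To illustrate the ordering problem, here is a minimal toy model in C.
It is not the patch and not PostgreSQL code: every toy_* name, the
fixed-size arrays, and the simulate_oom flag are made up for the
example. It also reports failure through a return value, whereas the
real code raises an error. The sketch only shows why tracking a lock
before acquiring it can leave a stale entry for cleanup to trip over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOCKS 8

/* Toy stand-in for the lock table: true means "held by this backend". */
static bool lock_held[MAX_LOCKS];

/* Toy stand-in for the per-transaction list of locks to release. */
static int tracked[MAX_LOCKS];
static size_t ntracked = 0;

/* Hypothetical acquire that can fail, standing in for an OOM during
 * lock acquisition. */
static bool toy_lock_acquire(int lockid, bool simulate_oom)
{
    if (simulate_oom)
        return false;
    lock_held[lockid] = true;
    return true;
}

/* Remember that cleanup must release this lock later. */
static void toy_track(int lockid)
{
    assert(ntracked < MAX_LOCKS);
    tracked[ntracked++] = lockid;
}

/* Old order: track first, then acquire. A failed acquire leaves a
 * tracked entry for a lock that was never taken. */
static bool acquire_track_first(int lockid, bool simulate_oom)
{
    toy_track(lockid);
    return toy_lock_acquire(lockid, simulate_oom);
}

/* New order: acquire first, track only once the lock is held. */
static bool acquire_lock_first(int lockid, bool simulate_oom)
{
    if (!toy_lock_acquire(lockid, simulate_oom))
        return false;
    toy_track(lockid);
    return true;
}

/* Cleanup: release every tracked lock and empty the list. Returns the
 * number of tracked entries that were not actually held; in this model
 * each such entry is a case where the assertion would have fired. */
static size_t toy_release_all(void)
{
    size_t stale = 0;

    for (size_t i = 0; i < ntracked; i++)
    {
        if (lock_held[tracked[i]])
            lock_held[tracked[i]] = false;
        else
            stale++;
    }
    ntracked = 0;
    return stale;
}
```

In this model, a failed acquire_track_first() followed by
toy_release_all() reports one stale entry, while a failed
acquire_lock_first() followed by toy_release_all() reports none.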

Kind regards,

Matthias van de Meent
Databricks (https://www.databricks.com)

Attachment: v3-0001-IPC-standby-keep-better-track-of-taken-locks.patch (application/octet-stream, 1.7 KB)
