Re: Unexpected behavior after OOM errors

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unexpected behavior after OOM errors
Date: 2026-06-18 23:29:03
Message-ID: ajR_Py8uB9Vn56W6@paquier.xyz
Lists: pgsql-hackers

On Thu, Jun 18, 2026 at 05:27:57PM +0200, Matthias van de Meent wrote:
> By moving StandbyAcquireAccessExclusiveLock's LockAcquire ahead of
> when it links the lock to the transaction, the local data structure
> doesn't know to clean up the lock until after it's acquired, so
> failure in that process won't make error cleanup try to clean up the
> lock.

Yep, reordering these two actions would take care of the list
inconsistency where the startup process goes down after the ERROR is
promoted to a FATAL.
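
To make the ordering argument concrete, here is a toy model in C. It is
only a sketch and not PostgreSQL code: every name in it (toy_lock_acquire,
toy_track, toy_cleanup, run_scenario, simulate_oom) is hypothetical, and
the "lock manager" is a plain array. It assumes that error cleanup walks
the tracking list and complains about every entry that the lock manager
does not actually hold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOCKS 8

/* Toy state: what the "lock manager" holds vs. what the list tracks. */
static int held[MAX_LOCKS];
static size_t n_held;
static int tracked[MAX_LOCKS];
static size_t n_tracked;
static bool simulate_oom;       /* when true, acquisition fails */
static int untracked_warnings;  /* tracked entries that were never held */

/* Hypothetical stand-in for the lock acquisition; may fail like an OOM. */
static bool toy_lock_acquire(int relid)
{
    if (simulate_oom)
        return false;
    assert(n_held < MAX_LOCKS);
    held[n_held++] = relid;
    return true;
}

/* Hypothetical stand-in for linking the lock to the transaction. */
static void toy_track(int relid)
{
    assert(n_tracked < MAX_LOCKS);
    tracked[n_tracked++] = relid;
}

/* Release a held lock; returns false if the lock was never held. */
static bool toy_release(int relid)
{
    for (size_t i = 0; i < n_held; i++)
    {
        if (held[i] == relid)
        {
            held[i] = held[--n_held];
            return true;
        }
    }
    return false;
}

/* Error cleanup: release everything the list knows about. */
static void toy_cleanup(void)
{
    for (size_t i = 0; i < n_tracked; i++)
    {
        if (!toy_release(tracked[i]))
            untracked_warnings++;   /* list and lock manager disagree */
    }
    n_tracked = 0;
}

/* Ordering 1: track first, then acquire. */
static bool acquire_track_first(int relid)
{
    toy_track(relid);
    return toy_lock_acquire(relid);
}

/* Ordering 2: acquire first, track only on success. */
static bool acquire_lock_first(int relid)
{
    if (!toy_lock_acquire(relid))
        return false;
    toy_track(relid);
    return true;
}

/* Fail one acquisition, run cleanup, and report the inconsistencies. */
int run_scenario(bool lock_first)
{
    n_held = n_tracked = 0;
    untracked_warnings = 0;
    simulate_oom = true;

    bool ok = lock_first ? acquire_lock_first(42) : acquire_track_first(42);
    if (!ok)
        toy_cleanup();
    return untracked_warnings;
}
```

In this model, run_scenario(false) leaves one tracked entry that was never
held, while run_scenario(true) leaves none, which is the effect the
reordering is after.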

I toyed with the idea of backpatching this fix for a few minutes, but
discarded it in the end. The failure does not depend on a random
pattern: it can be triggered through a combination of GUCs, as
Alexander has shown. Still, in non-assert builds the only consequence
is an extra LOG entry saying that the lock is not being tracked.
Confusing, OK, but not really critical.

Comments?
--
Michael
