Re: Unexpected behavior after OOM errors

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unexpected behavior after OOM errors
Date: 2026-06-18 04:37:34
Message-ID: ajN2DkXPOzzC1GSj@paquier.xyz
Lists: pgsql-hackers

On Wed, Jun 17, 2026 at 02:27:25PM +0200, Matthias van de Meent wrote:
> On Wed, 17 Jun 2026 at 08:00, Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>> 1) An issue in lookup_type_cache()
>
> I believe this is caused by partial subsystem initialization. Attached
> patch 0001 should address this failure without causing the server to
> restart on OOM.

Hmm. I think that this is an ordering problem. We could register the
callbacks last, once we are sure that the two hash tables and the
in-progress list have been initialized. I am not sure that this
requires a new facility; there is also an advantage in keeping the
initialization sequence in a single code path, without an abstraction.

TypeCacheHash and RelIdToTypeIdCacheHash are in the
TopMemoryContext, static to the process, so we could just check them
for NULL-ness to make the initialization repeatable. That gives me
the attached v2. Reusing Alexander's randomness trick, that looks
stable here.
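
As a rough illustration of the ordering described above, here is a
standalone sketch. It is not the attached patch: every name in it is a
hypothetical stand-in, a plain malloc() wrapper with failure injection
replaces the real hash table creation, and a false return value stands
in for the ERROR that an out-of-memory condition would raise in the
backend. The point is only that each piece of static state is checked
for NULL before being built, and that the callbacks are registered
last, exactly once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for process-static state; not the real typcache.c. */
static void *type_cache_hash = NULL;
static void *relid_to_typeid_hash = NULL;
static void *in_progress_list = NULL;
static bool callbacks_registered = false;
static int  callback_registrations = 0;

/* Failure injection: the Nth allocation fails (0 means never fail). */
static int fail_at_alloc = 0;
static int alloc_count = 0;

static void *
alloc_or_fail(size_t size)
{
    alloc_count++;
    if (fail_at_alloc != 0 && alloc_count == fail_at_alloc)
        return NULL;            /* simulated out-of-memory */
    return malloc(size);
}

/*
 * Repeatable initialization.  Returns false on a simulated OOM, and can be
 * called again later: pieces that already exist are kept, missing pieces
 * are built, and callbacks are registered only once everything is in place.
 */
static bool
init_type_cache(void)
{
    if (type_cache_hash == NULL)
    {
        type_cache_hash = alloc_or_fail(64);
        if (type_cache_hash == NULL)
            return false;
    }

    if (relid_to_typeid_hash == NULL)
    {
        relid_to_typeid_hash = alloc_or_fail(64);
        if (relid_to_typeid_hash == NULL)
            return false;
    }

    if (in_progress_list == NULL)
    {
        in_progress_list = alloc_or_fail(64);
        if (in_progress_list == NULL)
            return false;
    }

    /* Registered last, so a callback never sees half-built state. */
    if (!callbacks_registered)
    {
        assert(type_cache_hash != NULL &&
               relid_to_typeid_hash != NULL &&
               in_progress_list != NULL);
        callback_registrations++;
        callbacks_registered = true;
    }

    return true;
}
```

With fail_at_alloc set to 2, the first call returns false after
building only the first table and registers nothing. A later call with
the injection disabled fills in the rest and registers the callbacks
once; further calls change nothing.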

>> 2) An issue in GetSnapshotData()
>
> Again, caused by partial initialization, though in this case it's of a
> SnapshotData* which is later checked again. Attached patch 0002 should
> address this failure.

Yeah, that seems like the right way to make repeated calls of
GetSnapshotData() work. LGTM.

>> 3) An issue in StandbyAcquireAccessExclusiveLock()
> <snip>
>
> I'm not sure how to solve this correctly; I think ideally the
> StandbyAcquireAccessExclusiveLock() hash code would be wrapped by a
> critical section, but I'm not 100% sure if that will be a sufficient
> approach; and it'd definitely need some code to allow the various
> hashmaps' memctxs to alloc during critical sections.

I have not checked this one yet.

Thoughts about the first part?
--
Michael

Attachment Content-Type Size
v2-0001-typcache-Make-initialization-more-resilient-on-OO.patch text/plain 3.6 KB
