Re: VM corruption on standby

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: VM corruption on standby
Date: 2025-08-19 19:50:19
Message-ID: 872064.1755633019@sss.pgh.pa.us
Lists: pgsql-hackers

Kirill Reshke <reshkekirill(at)gmail(dot)com> writes:
> On Tue, 19 Aug 2025 at 10:32, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> Any idea involving deferring the handling of PM death from here
>> doesn't seem right: you'd keep waiting for the CV, but the backend
>> that would wake you might have exited.

Yeah. Taking the check for PM death out of here seems just about
as hazardous as leaving it in :-(. Not something I want to mess
with so late in the v18 cycle.

> I reverted this commit (there were conflicts, but I resolved them) and
> added an assert for critical sections in WaitEventSetWait.

Your patch still contains some conflict markers :-(. Attached is
a corrected version, just to save other people the effort of fixing
the diffs themselves.

> make check passes (without v2-0001 it fails)

While 'make check' is okay with this assertion, 'make check-world'
still falls over if you have injection points enabled, because
src/test/modules/test_slru/t/001_multixact.pl also had the
cute idea of putting an injection-point wait inside a critical
section. I did not find any other failures though.

I'm inclined to think that we do want to prohibit WaitEventSetWait
inside a critical section --- it just seems like a bad idea all
around, even without considering this specific failure mode.
Therefore, I vote for reverting bc22dc0e0. Hopefully only
temporarily, but it's too late to figure out another way for v18,
and I don't think that bc22dc0e0 is such an essential improvement
that we can't afford to give it up for v18.
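For readers following along, here is a minimal, self-contained sketch of the kind of check being discussed. It is not the attached patch. CritSectionCount, START_CRIT_SECTION() and END_CRIT_SECTION() are re-declared here as local stand-ins modeled on the PostgreSQL names, while wait_is_permitted() and toy_wait_event_set_wait() are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the backend's critical-section nesting counter. */
static volatile uint32_t CritSectionCount = 0;

/* Local stand-ins for the macros that enter and leave a critical section. */
#define START_CRIT_SECTION()  (CritSectionCount++)
#define END_CRIT_SECTION() \
	do { \
		assert(CritSectionCount > 0); \
		CritSectionCount--; \
	} while (0)

/* Hypothetical helper: waiting is allowed only outside critical sections. */
static bool
wait_is_permitted(void)
{
	return CritSectionCount == 0;
}

/*
 * Hypothetical toy replacement for a wait primitive.  The only point is the
 * assertion at the top; a real wait would block on latches, sockets, etc.
 */
static int
toy_wait_event_set_wait(void)
{
	assert(wait_is_permitted());
	return 0;					/* pretend no events occurred */
}
```

With a check of this shape in place, any caller that reaches the wait between START_CRIT_SECTION() and END_CRIT_SECTION() trips the assertion in an assert-enabled build, which is how the test_slru case would show up.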

However, we can't install the proposed assertion until we do
something about that test_slru test. It seems like in general
it would be sad if we can't put injection points inside
critical sections, so I'm wondering if there's a way to
re-implement the injection point "wait" functionality without
depending on WaitEventSetWait. I would be willing to accept
weaker safety guarantees in this context, since we only anticipate
such cases being used in test scaffolding.
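One possible shape for such a weaker wait, purely as an illustration, is a bounded sleep-and-poll loop on a flag. The sketch below is self-contained C11 and is not PostgreSQL code: toy_wakeup_requested, toy_injection_wakeup() and toy_injection_wait() are hypothetical names, the flag is process-local where a real version would need shared memory, and the loop deliberately performs no postmaster-death or interrupt checks.

```c
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() */

#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical wakeup flag; a real version would live in shared memory. */
static atomic_bool toy_wakeup_requested = false;

/* Hypothetical: called by the test driver to release the waiter. */
static void
toy_injection_wakeup(void)
{
	atomic_store(&toy_wakeup_requested, true);
}

/*
 * Hypothetical: poll the flag, sleeping sleep_ns nanoseconds (must be less
 * than one second) between checks, for at most max_polls checks.  Consumes
 * the wakeup and returns true if one arrived, otherwise returns false.
 */
static bool
toy_injection_wait(int max_polls, long sleep_ns)
{
	struct timespec ts = {.tv_sec = 0, .tv_nsec = sleep_ns};

	for (int i = 0; i < max_polls; i++)
	{
		if (atomic_exchange(&toy_wakeup_requested, false))
			return true;
		nanosleep(&ts, NULL);
	}

	/* One last check so a wakeup during the final sleep is not missed. */
	return atomic_exchange(&toy_wakeup_requested, false);
}
```

The trade-off is the one described above: a loop like this never calls WaitEventSetWait, so it could run inside a critical section, but it gives up prompt wakeup and postmaster-death detection, and it relies on the poll limit to avoid hanging forever.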

regards, tom lane

Attachment: v3-0001-Revert-Get-rid-of-WALBufMappingLock.patch (text/x-diff, 15.2 KB)
