Re: VM corruption on standby

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: VM corruption on standby
Date: 2025-08-19 18:08:19
Message-ID: 599759.1755626899@sss.pgh.pa.us
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Kirill Reshke <reshkekirill(at)gmail(dot)com> writes:
> On Tue, 19 Aug 2025 at 21:16, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
>> `if (CritSectionCount != 0) _exit(2) else proc_exit(1)` in
>> WaitEventSetWaitBlock () solves the issue of inconsistency IF POSTMASTER IS
>> SIGKILLED, and doesn't lead to any problem, if postmaster is not SIGKILL-ed
>> (since postmaster will SIGKILL its children).

> This fix was proposed in this thread. It fixes inconsistency but it
> replaces one set of problems with another set, namely systems that
> fail to shut down.

I think a bigger objection is that it'd result in two separate
shutdown behaviors in what's already an extremely under-tested
(and hard to test) scenario. I don't want to have to deal with
the ensuing state-space explosion.

I still think that proc_exit(1) is fundamentally the wrong thing
to do if the postmaster is gone: that code path assumes that
the cluster is still functional, which is at best shaky.
I concur though that we'd have to do some more engineering work
before _exit(2) would be a practical solution.

In the meantime, it seems like this discussion point arises
only because the presented test case is doing something that
seems pretty unsafe, namely invoking WaitEventSet inside a
critical section.

We'd probably be best off to get back to the actual bug the
thread started with, namely whether we aren't doing the wrong
thing with VM-update order of operations.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 章晨曦 2025-08-19 18:13:01 Re: Performance issue on temporary relations
Previous Message Andres Freund 2025-08-19 18:06:50 Re: Improve LWLock tranche name visibility across backends