From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Kirill Reshke <reshkekirill(at)gmail(dot)com> |
Cc: | Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Subject: | Re: VM corruption on standby |
Date: | 2025-08-19 18:08:19 |
Message-ID: | 599759.1755626899@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Kirill Reshke <reshkekirill(at)gmail(dot)com> writes:
> On Tue, 19 Aug 2025 at 21:16, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
>> `if (CritSectionCount != 0) _exit(2) else proc_exit(1)` in
>> WaitEventSetWaitBlock () solves the issue of inconsistency IF POSTMASTER IS
>> SIGKILLED, and doesn't lead to any problem, if postmaster is not SIGKILL-ed
>> (since postmaster will SIGKILL its children).
> This fix was proposed in this thread. It fixes inconsistency but it
> replaces one set of problems with another set, namely systems that
> fail to shut down.
I think a bigger objection is that it'd result in two separate
shutdown behaviors in what's already an extremely under-tested
(and hard to test) scenario. I don't want to have to deal with
the ensuing state-space explosion.
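
For illustration only, the two policies under discussion can be sketched as pure decision functions. This is not PostgreSQL source: the enum, the function names, and the boolean "postmaster_alive" parameter are hypothetical stand-ins, and the real check in WaitEventSetWaitBlock() is more involved. The sketch only shows how the proposal makes the exit path depend on CritSectionCount, which is the second shutdown behavior objected to above.

```c
#include <stdbool.h>

/* Hypothetical labels for what a backend does after a wait returns. */
typedef enum
{
    ACTION_KEEP_WAITING,     /* postmaster still alive, nothing to do */
    ACTION_PROC_EXIT_1,      /* orderly exit via proc_exit(1) */
    ACTION_IMMEDIATE_EXIT_2  /* skip cleanup, exit via _exit(2) */
} ExitAction;

/* Behavior described in the thread today: always proc_exit(1) on postmaster death. */
ExitAction
current_policy(bool postmaster_alive, int crit_section_count)
{
    (void) crit_section_count;  /* not consulted in this policy */
    return postmaster_alive ? ACTION_KEEP_WAITING : ACTION_PROC_EXIT_1;
}

/* Proposed variant: inside a critical section, bail out with _exit(2) instead. */
ExitAction
proposed_policy(bool postmaster_alive, int crit_section_count)
{
    if (postmaster_alive)
        return ACTION_KEEP_WAITING;
    return (crit_section_count != 0) ? ACTION_IMMEDIATE_EXIT_2
                                     : ACTION_PROC_EXIT_1;
}
```

With these stand-ins, the two policies differ only when the postmaster is gone and crit_section_count is nonzero, so every postmaster-death test would need to cover both branches.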
I still think that proc_exit(1) is fundamentally the wrong thing
to do if the postmaster is gone: that code path assumes that
the cluster is still functional, which is at best shaky.
I concur though that we'd have to do some more engineering work
before _exit(2) would be a practical solution.
In the meantime, it seems like this discussion point arises
only because the presented test case is doing something that
seems pretty unsafe, namely invoking WaitEventSet inside a
critical section.
We'd probably be best off to get back to the actual bug the
thread started with, namely whether we aren't doing the wrong
thing with VM-update order of operations.
regards, tom lane