Re: VM corruption on standby

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: VM corruption on standby
Date: 2025-08-17 14:33:46
Message-ID: 168715.1755441226@sss.pgh.pa.us
Lists: pgsql-hackers

Kirill Reshke <reshkekirill(at)gmail(dot)com> writes:
> [ v1-0001-Do-not-exit-on-postmaster-death-ever-inside-CRIT-.patch ]

I do not like this patch one bit: it will replace one set of problems
with another set, namely systems that fail to shut down.

I think the actual bug here is the use of proc_exit(1) after
observing postmaster death. That is what creates the hazard,
because it releases the locks that are preventing other processes
from observing the inconsistent state in shared memory.
Compare this to what we do, for example, on receipt of SIGQUIT:

    /*
     * We DO NOT want to run proc_exit() or atexit() callbacks -- we're here
     * because shared memory may be corrupted, so we don't want to try to
     * clean up our transaction.  Just nail the windows shut and get out of
     * town.  The callbacks wouldn't be safe to run from a signal handler,
     * anyway.
     *
     * Note we do _exit(2) not _exit(0).  This is to force the postmaster into
     * a system reset cycle if someone sends a manual SIGQUIT to a random
     * backend.  This is necessary precisely because we don't clean up our
     * shared memory state.  (The "dead man switch" mechanism in pmsignal.c
     * should ensure the postmaster sees this as a crash, too, but no harm in
     * being doubly sure.)
     */
    _exit(2);

So I think the correct fix here is s/proc_exit(1)/_exit(2)/ in the
places that are responding to postmaster death. There might be
more than just WaitEventSetWaitBlock; I didn't look.
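
To illustrate the distinction outside PostgreSQL, here is a minimal
standalone sketch. It is not PostgreSQL code: plain exit(1) with an
atexit() callback stands in for proc_exit(1) and its cleanup callbacks,
and the names run_child, cleanup_callback and ChildResult are
hypothetical. The child reports through a pipe whether its cleanup ran,
so the parent can see that exit(1) runs the callback while _exit(2)
skips it and leaves a different exit status.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: write end of the pipe the child uses to report cleanup. */
static int cleanup_fd = -1;

/* Stand-in for cleanup work such as releasing locks at process exit. */
static void
cleanup_callback(void)
{
    ssize_t n = write(cleanup_fd, "C", 1);

    (void) n;                   /* nothing useful to do on failure */
}

typedef struct ChildResult
{
    int         exit_code;      /* exit status seen by the parent, or -1 */
    int         cleanup_ran;    /* 1 if the atexit() callback ran */
} ChildResult;

/*
 * Fork a child that registers a cleanup callback and then terminates
 * with either exit(1) (callbacks run) or _exit(2) (callbacks skipped).
 */
static ChildResult
run_child(int skip_cleanup)
{
    ChildResult result = {-1, 0};
    int         fds[2];
    int         status;
    char        buf;
    pid_t       pid;

    if (pipe(fds) != 0)
        return result;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return result;
    }

    if (pid == 0)
    {
        /* Child: register the callback, then leave one way or the other. */
        close(fds[0]);
        cleanup_fd = fds[1];
        atexit(cleanup_callback);
        if (skip_cleanup)
            _exit(2);           /* immediate exit, no callbacks */
        exit(1);                /* orderly exit, callbacks run */
    }

    /* Parent: read() returns 1 if the callback wrote, 0 on EOF otherwise. */
    close(fds[1]);
    result.cleanup_ran = (read(fds[0], &buf, 1) == 1);
    close(fds[0]);

    if (waitpid(pid, &status, 0) == pid)
    {
        assert(WIFEXITED(status));  /* the child always exits normally here */
        result.exit_code = WEXITSTATUS(status);
    }
    return result;
}
```

Calling run_child(0) models the current behavior: the cleanup runs
before the process goes away. Calling run_child(1) models the proposed
one: no cleanup, and a status the parent can treat as a crash.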

regards, tom lane
