Re: VM corruption on standby

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject:	Re: VM corruption on standby
Date:	2025-08-19 14:54:09
Message-ID:	CA+hUKGKBUaWrJWSJCLtWqVvP4_aDPnPJ8LFv17wj-ViQQi4ouw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Aug 20, 2025 at 1:56 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2025-08-19 02:13:43 -0400, Tom Lane wrote:
> > > Then wouldn't backends blocked in LWLockAcquire(x) hang forever, after
> > > someone who holds x calls _exit()?
> >
> > If someone who holds x is killed by (say) the OOM killer, how do
> > we get out of that?

If a backend is killed by the OOM killer, the postmaster will of
course send SIGQUIT/SIGKILL to all backends. If the postmaster itself
is killed, then surviving backends will notice at their next
WaitEventSetWait() and exit, but if any are blocked in sem_wait(),
they will only make progress because other exiting backends release
their LWLocks on their way out. So if we change that to _exit(), I
assume such backends would linger forever in sem_wait() after the
postmaster dies. I do agree that it seems quite weird to release all
locks as if this is a "normal" exit though, which is why Kirill and I
both wondered about other ways to boot them out of sem_wait()...

> On linux - the primary OS with OOM killer troubles - I'm pretty sure lwlock
> waiters would get killed due to the postmaster death signal we've configured
> (c.f. PostmasterDeathSignalInit()).

No, that has a handler that just sets a global variable. That was
done because recovery used to try to read() from the postmaster pipe
after replaying every record. Also we currently have some places that
don't want to be summarily killed (off the top of my head, syncrep
wants to send a special error message, and the logger wants to survive
longer than everyone else to catch as much output as possible, things
I've been thinking about in the context of threads).
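
To illustrate the point about the handler, here is a minimal sketch of
a signal handler that only records the event in a flag instead of
terminating the process. It is not the PostgreSQL code: the names
parent_possibly_dead, parent_death_handler,
install_parent_death_handler and parent_death_noticed are hypothetical,
and the caller is assumed to arrange separately (on Linux, for example
with prctl(PR_SET_PDEATHSIG, ...)) for the chosen signal to be
delivered when the parent exits.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag; the handler does nothing except set it. */
static volatile sig_atomic_t parent_possibly_dead = 0;

/* Async-signal-safe handler: record the event and return. */
static void
parent_death_handler(int signo)
{
    (void) signo;
    parent_possibly_dead = 1;
}

/*
 * Install the flag-setting handler for the given signal.
 * Returns 0 on success, -1 on failure (as sigaction() does).
 */
int
install_parent_death_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = parent_death_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    return sigaction(signo, &sa, NULL);
}

/* The process only learns about the event when it polls the flag. */
bool
parent_death_noticed(void)
{
    return parent_possibly_dead != 0;
}
```

With a handler of this shape, receiving the signal does not end the
process. Code has to call something like parent_death_noticed() at a
point of its own choosing, which is what lets a process decide to
finish other work first, and also why a process that never reaches
such a check would not react.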

In response to

Re: VM corruption on standby at 2025-08-19 13:56:27 from Andres Freund

Responses

Re: VM corruption on standby at 2025-08-19 14:57:43 from Andres Freund
