From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Subject: | Re: VM corruption on standby |
Date: | 2025-08-19 14:54:09 |
Message-ID: | CA+hUKGKBUaWrJWSJCLtWqVvP4_aDPnPJ8LFv17wj-ViQQi4ouw@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Aug 20, 2025 at 1:56 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2025-08-19 02:13:43 -0400, Tom Lane wrote:
> > > Then wouldn't backends blocked in LWLockAcquire(x) hang forever, after
> > > someone who holds x calls _exit()?
> >
> > If someone who holds x is killed by (say) the OOM killer, how do
> > we get out of that?
If a backend is killed by the OOM killer, the postmaster will of
course send SIGQUIT/SIGKILL to all backends. If the postmaster itself
is killed, then surviving backends will notice at their next
WaitEventSetWait() and exit, but if any are blocked in sem_wait(),
they will only make progress because other exiting backends release
their LWLocks on their way out. So if we change that to _exit(), I
assume such backends would linger forever in sem_wait() after the
postmaster dies. I do agree that it seems quite weird to release all
locks as if this is a "normal" exit though, which is why Kirill and I
both wondered about other ways to boot them out of sem_wait()...
> On linux - the primary OS with OOM killer troubles - I'm pretty sure lwlock
> waiters would get killed due to the postmaster death signal we've configured
> (c.f. PostmasterDeathSignalInit()).
No, that has a handler that just sets a global variable. That was
done because recovery used to try to read() from the postmaster pipe
after replaying every record. Also we currently have some places that
don't want to be summarily killed (off the top of my head, syncrep
wants to send a special error message, and the logger wants to survive
longer than everyone else to catch as much output as possible, things
I've been thinking about in the context of threads).
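
To illustrate the "handler that just sets a global variable" point, here is a minimal standalone sketch, not the actual PostgreSQL code. All names in it (parent_death_seen, parent_death_handler, parent_death_signal_init, parent_is_alive) are hypothetical, and it assumes the Linux mechanism is prctl(PR_SET_PDEATHSIG). The signal only flips a flag; a process learns of the parent's death only when it next checks that flag, so a waiter parked in sem_wait() is not booted out by it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <signal.h>
#include <stdbool.h>
#include <string.h>
#ifdef __linux__
#include <sys/prctl.h>
#endif

/* Hypothetical flag; set asynchronously by the signal handler below. */
static volatile sig_atomic_t parent_death_seen = 0;

/* The handler does nothing except record that the signal arrived. */
static void
parent_death_handler(int signo)
{
    (void) signo;
    parent_death_seen = 1;
}

/*
 * Install the handler for signo and, on Linux, ask the kernel to deliver
 * signo to this process when its parent exits.  Returns 0 on success,
 * -1 on failure.
 */
static int
parent_death_signal_init(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = parent_death_handler;
    sigemptyset(&sa.sa_mask);

    /*
     * With SA_RESTART, interrupted system calls are generally restarted
     * after the handler returns, so a blocked waiter simply goes back to
     * sleep instead of noticing the flag.
     */
    sa.sa_flags = SA_RESTART;
    if (sigaction(signo, &sa, NULL) != 0)
        return -1;

#ifdef __linux__
    if (prctl(PR_SET_PDEATHSIG, (unsigned long) signo) != 0)
        return -1;
#endif
    return 0;
}

/* Cheap check that code must call explicitly, e.g. from a wait loop. */
static bool
parent_is_alive(void)
{
    return parent_death_seen == 0;
}
```

A caller would run parent_death_signal_init(SIGUSR1) once at startup and then poll parent_is_alive() wherever it is prepared to exit. Code that never polls, such as a loop around sem_wait(), never reacts.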