Re: VM corruption on standby

From: Kirill Reshke <reshkekirill(at)gmail(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: VM corruption on standby
Date: 2025-08-19 17:00:53
Message-ID: CALdSSPisWpkL+-_vS7B7vonX1XTC8aVkPhj3BBc2wtmuZ_a7cQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, 19 Aug 2025 at 21:16, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:

>
> That is not true.
> elog(PANIC) doesn't clear LWLocks. And XLogWrite, which could be called
> from AdvanceXLInsertBuffer, may call elog(PANIC) from several places.
>
> It doesn't lead to any error, because usually the postmaster is alive and it
> will kill -9 all its children if any one of them dies in a critical section.
>
> So the problem is that the postmaster has already been killed with SIGKILL,
> by definition of the issue.
>
> Documentation says [0]:
> > If at all possible, do not use SIGKILL to kill the main postgres server.
> > Doing so will prevent postgres from freeing the system resources (e.g.,
> > shared memory and semaphores) that it holds before terminating.
>
> Therefore, if the postmaster was SIGKILL-ed, the administrator already has
> to take some action.
>

There are surely many cases where a system reaches a state that can
only be fixed by admin action.
An elog(PANIC) in a critical section is very rare (and very probably
means corruption has already happened).
A simpler example is to kill -9 the postmaster and then immediately
kill -9 a process that holds an LWLock.
The problem in PostgreSQL 18 is that the probability of reaching this
state is much higher due to the aforementioned commit. It can happen
with almost any OOM on highly loaded systems.

--
Best regards,
Kirill Reshke
