Re: VM corruption on standby

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: VM corruption on standby
Date: 2025-08-19 16:16:39
Message-ID: 26a0a0c0-fc61-4499-be81-d872bccf6625@postgrespro.ru
Lists: pgsql-hackers

On 19.08.2025 16:43, Kirill Reshke wrote:
> On Tue, 19 Aug 2025 at 18:29, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> wrote:
>
>
>> Latch and ConditionVariable (that uses Latch) are among basic
>> synchronization primitives in PostgreSQL.
>
> Sure
>
>> Therefore they have to work correctly in any place: in critical section, in
>> wal logging, etc.
>
> No. Before bc22dc0e0ddc2dcb6043a732415019cc6b6bf683 ConditionVariable
> code path was never exercised in critical sections. After
> bc22dc0e0ddc2dcb6043a732415019cc6b6bf683 it is exercised in almost
> every one (if the system is highly loaded). This is a crucial change
> with corruption as a drawback (until we fix this).
>
> To replace proc_exit(1) with _exit(2) is not a cure too: if we exit
> inside CRIT section without any message to LWLock contenders, they
> will never do the same (never exit), because they are wait the
> semaphore and do not respond to signals (so, only way to stop them in
> to kill-9). Before bc22dc0e0ddc2dcb6043a732415019cc6b6bf683 lwlock
> holders did not exit inside crit sections (unless kill9)

That is not true.
elog(PANIC) doesn't release LWLocks. And XLogWrite, which can be called
from AdvanceXLInsertBuffer, may call elog(PANIC) from several places.

This doesn't cause any problem in practice, because the postmaster is
usually alive and will kill -9 all its children if any of them dies in a
critical section.
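
A toy model of that supervision, only to illustrate the argument: a parent
reaps a child that died abnormally and then SIGKILLs a sibling that would
otherwise wait forever. The function name supervise_demo and the two child
roles are hypothetical; this is plain POSIX, not PostgreSQL's actual crash
handling, which is more involved.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo, not PostgreSQL code.
 * Returns 1 if the stuck child was terminated by SIGKILL from the parent,
 * 0 if the first child exited cleanly, -1 on a system call failure.
 */
int
supervise_demo(void)
{
    int     status;
    pid_t   stuck;
    pid_t   crasher;

    stuck = fork();
    if (stuck < 0)
        return -1;
    if (stuck == 0)
    {
        /* Child: stands in for a backend that never exits on its own. */
        for (;;)
            pause();
    }

    crasher = fork();
    if (crasher < 0)
    {
        kill(stuck, SIGKILL);
        waitpid(stuck, NULL, 0);
        return -1;
    }
    if (crasher == 0)
        _exit(2);       /* Child: stands in for dying in a critical section. */

    /* Parent: notice how the first child died. */
    if (waitpid(crasher, &status, 0) != crasher)
    {
        kill(stuck, SIGKILL);
        waitpid(stuck, NULL, 0);
        return -1;
    }

    if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
    {
        /* Clean exit: no crash handling needed, just stop the sibling. */
        kill(stuck, SIGTERM);
        waitpid(stuck, NULL, 0);
        return 0;
    }

    /* Abnormal exit: forcibly terminate the sibling, then reap it. */
    kill(stuck, SIGKILL);
    if (waitpid(stuck, &status, 0) != stuck)
        return -1;

    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 1 : 0;
}
```

If the parent itself is SIGKILL-ed first, nobody performs the last step,
which is exactly the situation discussed in this thread.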

So the problem is that, by the definition of the issue, the postmaster has
already been killed with SIGKILL.

Documentation says [0]:
> If at all possible, do not use SIGKILL to kill the main postgres server.
> Doing so will prevent postgres from freeing the system resources (e.g.,
> shared memory and semaphores) that it holds before terminating.

Therefore, if the postmaster was SIGKILL-ed, the administrator already has
to take some action.

So the issue Andrey Borodin raised is that, without any fix, Pg18 allows
inconsistency between the data and the WAL: data could be written to the
main fork and the vm fork although the WAL is not written yet IF POSTMASTER
IS SIGKILL-ed.

`if (CritSectionCount != 0) _exit(2); else proc_exit(1);` in
WaitEventSetWaitBlock() solves the inconsistency issue IF POSTMASTER IS
SIGKILL-ed, and doesn't lead to any problem if the postmaster is not
SIGKILL-ed (since the postmaster will SIGKILL its children).
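
A minimal standalone sketch of that branch, not the actual patch: the
counter crit_section_count stands in for CritSectionCount, exit(1) stands
in for proc_exit(1), and the names choose_exit_action and
exit_on_postmaster_death are hypothetical. The only property relied on
here is that _exit() terminates the process without running registered
exit handlers, while exit() runs them.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for PostgreSQL's CritSectionCount. */
volatile unsigned int crit_section_count = 0;

typedef enum
{
    ACTION_CLEAN_EXIT,      /* run cleanup callbacks, then exit */
    ACTION_IMMEDIATE_EXIT   /* terminate at once, skipping cleanup */
} ExitAction;

/* Pure decision: how to leave once postmaster death has been detected. */
ExitAction
choose_exit_action(unsigned int crit_count)
{
    return (crit_count != 0) ? ACTION_IMMEDIATE_EXIT : ACTION_CLEAN_EXIT;
}

/* Applies the decision; this function never returns. */
void
exit_on_postmaster_death(void)
{
    if (choose_exit_action(crit_section_count) == ACTION_IMMEDIATE_EXIT)
        _exit(2);           /* inside a critical section: no cleanup at all */

    exit(1);                /* stand-in for proc_exit(1): cleanup runs */
}
```

Keeping the decision in a separate pure function is only a convenience for
checking the branch in isolation.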

> I had one suggestion about what can be done [0]. However there is
> little no time until the pg18 release for a change that scary and big
> (my own understanding), so the safest option is to revert.
>
> [0] https://www.postgresql.org/message-id/CALdSSPgDAyqt%3DORyLMWMpotb9V4Jk1Am%2Bhe39mNtBA8%2Ba8TQDw%40mail.gmail.com

[0] https://www.postgresql.org/docs/17/app-postgres.html

--
regards
Yura Sokolov aka funny-falcon
