Re: VM corruption on standby

From: Kirill Reshke <reshkekirill(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: VM corruption on standby
Date: 2025-08-18 10:32:19
Message-ID: CALdSSPhu8U9JVe5dOJySk6bzwHVBmgrEGkcsduYjrY=TqeG5bQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, 18 Aug 2025 at 13:15, I wrote:
> > I do not like this patch one bit: it will replace one set of problems
> > with another set, namely systems that fail to shut down.
>
> I did not observe this during my by-hand testing.

I am sorry, I was wrong: this is exactly what happens in this test
(a modified 001_multixact.pl).
To be precise, after the INSERT process exits, the CHECKPOINT process
waits indefinitely in LWLockAcquire.

It looks like the reason proc_exit(1) releases all held lwlocks is
that the release is how we notify lwlock contenders, through shared
memory, about the state change. They would not be notified otherwise,
since we do not check for signals inside LWLockAcquire.

It looks like we need something like ConditionVariableBroadcast(),
but without releasing the lwlock, to notify all lwlock contenders
and then exit(2).
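
To make the "broadcast without release" idea concrete, here is a toy
model, not PostgreSQL code. ToyLock, ToyWaiter and both functions are
hypothetical names I made up for illustration; the real wait queue in
lwlock.c looks different. The only point is that every queued waiter
gets woken and told why, while the lock itself stays marked as held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_WAITERS 8

/* Hypothetical toy types; these are NOT PostgreSQL's LWLock or PGPROC. */
typedef struct ToyWaiter
{
    bool woken;       /* set once somebody wakes this waiter */
    bool owner_died;  /* tells the waiter why it was woken */
} ToyWaiter;

typedef struct ToyLock
{
    bool       held;                      /* lock is still owned */
    size_t     nwaiters;                  /* number of queued waiters */
    ToyWaiter *waiters[TOY_MAX_WAITERS];  /* queued waiters, in order */
} ToyLock;

/* Queue a waiter behind the lock; returns false if the queue is full. */
static bool
toy_lock_enqueue(ToyLock *lock, ToyWaiter *waiter)
{
    assert(lock != NULL && waiter != NULL);

    if (lock->nwaiters >= TOY_MAX_WAITERS)
        return false;
    lock->waiters[lock->nwaiters++] = waiter;
    return true;
}

/*
 * Wake every queued waiter and tell it the owner is going away.
 * Deliberately leaves lock->held untouched: the protected data may be
 * half-modified, so nobody should be granted the lock.
 * Returns the number of waiters that were woken.
 */
static size_t
toy_lock_broadcast_owner_death(ToyLock *lock)
{
    size_t woken;

    assert(lock != NULL);

    woken = lock->nwaiters;
    for (size_t i = 0; i < woken; i++)
    {
        lock->waiters[i]->owner_died = true;
        lock->waiters[i]->woken = true;
    }
    lock->nwaiters = 0;     /* queue is drained, lock is still held */
    return woken;
}
```

In this model a woken waiter would look at owner_died and bail out
instead of retrying the acquire. How a real waiter should react is
exactly the open question.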

===
As for the fix, I am now trying to make the attached patch work. The
idea of "escalating" proc_exit to an immediate exit via syscall comes
from how elog(ERROR) behaves in critical sections (every elog(ERROR)
effectively becomes elog(PANIC)).
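
A rough sketch of that escalation rule, only to show the shape of the
decision. This is not the content of the attached patch: ToyExitAction,
toy_choose_exit_action and toy_exit are hypothetical names, and the
counter argument stands in for the backend's critical-section counter.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical names for illustration; not taken from the patch. */
typedef enum ToyExitAction
{
    TOY_EXIT_NORMAL,    /* run exit callbacks, release locks, then exit */
    TOY_EXIT_IMMEDIATE  /* skip all cleanup and leave via _exit() */
} ToyExitAction;

/*
 * Mirror the elog(ERROR) -> elog(PANIC) promotion: inside a critical
 * section (counter > 0) a normal exit is escalated to an immediate one.
 */
static ToyExitAction
toy_choose_exit_action(unsigned int crit_section_count)
{
    return (crit_section_count > 0) ? TOY_EXIT_IMMEDIATE : TOY_EXIT_NORMAL;
}

/* Apply the decision. This function never returns. */
static void
toy_exit(unsigned int crit_section_count, int code)
{
    assert(code >= 0);

    if (toy_choose_exit_action(crit_section_count) == TOY_EXIT_IMMEDIATE)
        _exit(2);   /* no atexit handlers, so no lock release */
    exit(code);     /* ordinary path: cleanup handlers run */
}
```

With this rule, toy_exit(0, 1) behaves like an ordinary exit, while
toy_exit(1, 1) leaves immediately with status 2 and runs no cleanup.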

--
Best regards,
Kirill Reshke

Attachment Content-Type Size
v2-0001-Do-not-exit-on-postmaster-death-even-inside-CRIT-.patch application/octet-stream 1.2 KB
