From: | Kirill Reshke <reshkekirill(at)gmail(dot)com> |
---|---|
To: | Andrey Borodin <x4mmm(at)yandex-team(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Subject: | Re: VM corruption on standby |
Date: | 2025-08-14 05:41:50 |
Message-ID: | CALdSSPhO4zJAPEKu6wuxg362d0uHv1Qr8f83q-o4T54c0J4GgA@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, 13 Aug 2025 at 16:15, I wrote:
> I did not find any doc or other piece of information indicating
> whether calling WaitEventSetWait inside a critical section is allowed.
> But I do think this is bad, because we do not process interrupts
> during critical sections, so it is unclear to me why we should handle
> postmaster death any differently.
Maybe I'm very wrong about this, but I currently suspect there is a
corruption scenario involving CHECKPOINT, a process in a critical
section, and kill -9 of the postmaster.
The scenario I am trying to reproduce is the following:
1) Some process p1 locks a buffer (call it buf1), enters a critical
section, calls MarkBufferDirty, and hangs inside XLogInsert on a
condition variable (GetXLogBuffer -> AdvanceXLInsertBuffer).
2) CHECKPOINT (p2) starts and tries to flush dirty buffers; it waits for the lock on buf1.
3) The postmaster is killed with kill -9.
4) The postmaster-death signal is delivered to p1; it wakes up in
WaitLatch/WaitEventSetWaitBlock, checks whether the postmaster is
alive, and exits, releasing all its locks.
5) p2 acquires the lock on buf1 and flushes the buffer to disk.
6) The postmaster-death signal is delivered to p2, and p2 exits.
And we now have a case where the buffer is flushed to disk, while the
xlog record that describes this change never makes it to disk. This is
very bad.
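To make the ordering in steps 1-6 concrete, here is a toy model in plain C. It is only a sketch and not PostgreSQL code: ToyBuffer, ToyWal, wal_rule_violated and run_scenario are made-up names, and the "lock", "flush" and "WAL" are reduced to a few struct fields. The only thing it tries to show is that if p1 releases the buffer lock before its xlog record exists, p2 can write the dirty page with no durable record describing the change.

```c
#include <stdbool.h>

/* Toy model only: every name below is hypothetical, none is a PostgreSQL API. */
typedef struct
{
    int  changes_in_memory; /* modifications applied to the in-memory page */
    int  changes_on_disk;   /* modifications present in the on-disk page image */
    bool dirty;             /* page differs from its on-disk image */
    bool locked;            /* content lock held by p1 */
} ToyBuffer;

typedef struct
{
    int records_durable;    /* xlog records that reached disk */
} ToyWal;

/* True if the on-disk page holds a change that no durable xlog record describes. */
static bool
wal_rule_violated(const ToyBuffer *buf, const ToyWal *wal)
{
    return buf->changes_on_disk > wal->records_durable;
}

/* Replays steps 1-6; the flag selects what p1 does when it sees postmaster death. */
static bool
run_scenario(bool p1_exits_in_crit_section)
{
    ToyBuffer buf = {0, 0, false, false};
    ToyWal    wal = {0};

    /* 1) p1 locks buf1, dirties the page inside the critical section, then
     *    blocks during xlog insertion, so no record is ever made durable. */
    buf.locked = true;
    buf.changes_in_memory++;
    buf.dirty = true;

    /* 2) p2 (checkpoint) wants to flush buf1 and waits for the lock. */
    /* 3) The postmaster is killed. */

    /* 4) p1 wakes up and either exits, dropping its locks, or stays blocked. */
    if (p1_exits_in_crit_section)
        buf.locked = false;

    /* 5) p2 writes the page only if it can take the lock. */
    if (!buf.locked && buf.dirty)
    {
        buf.changes_on_disk = buf.changes_in_memory;
        buf.dirty = false;
    }

    /* 6) p2 exits; check what ended up on disk. */
    return wal_rule_violated(&buf, &wal);
}
```

In this model run_scenario(true) returns true (the page is on disk, the record is not), while run_scenario(false) returns false. The second call is only a contrast case in which p1 never lets go of the lock; it is not meant as a proposed fix.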
To be clear, I am trying to avoid using injection points to reproduce
the corruption. I have not been successful so far, though.
--
Best regards,
Kirill Reshke