From: | Tomas Vondra <tomas(at)vondra(dot)me> |
---|---|
To: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
Cc: | Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Michael Banck <mbanck(at)gmx(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Changing the state of data checksums in a running cluster |
Date: | 2025-08-27 12:42:05 |
Message-ID: | dfe57980-f594-46c5-af39-852ff30d34fa@vondra.me |
Lists: | pgsql-hackers |
On 8/27/25 14:39, Tomas Vondra wrote:
> ...
>
> And this happened on Friday:
>
> commit c13070a27b63d9ce4850d88a63bf889a6fde26f0
> Author: Alexander Korotkov <akorotkov(at)postgresql(dot)org>
> Date: Fri Aug 22 18:44:39 2025 +0300
>
> Revert "Get rid of WALBufMappingLock"
>
> This reverts commit bc22dc0e0ddc2dcb6043a732415019cc6b6bf683.
> It appears that conditional variables are not suitable for use
> inside critical sections. If WaitLatch()/WaitEventSetWaitBlock()
> face postmaster death, they exit, releasing all locks instead of
> PANIC. In certain situations, this leads to data corruption.
>
> ...
>
> I think it's very likely the checksums were broken by this. After all,
> that linked thread has subject "VM corruption on standby" and I've only
> ever seen checksum failures on standby on the _vm fork.
>
Forgot to mention - I did try with c13070a27b reverted, and with that I
can reproduce the checksum failures again (using the fixed TAP test).
It's not definitive proof, but it's a hint that bc22dc0e0ddc (the commit
c13070a27b63 reverts) was causing the checksum failures.
regards
--
Tomas Vondra