Re: VM corruption on standby

From:	Aleksander Alekseev <aleksander(at)tigerdata(dot)com>
To:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc:	Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject:	Re: VM corruption on standby
Date:	2025-08-07 15:17:17
Message-ID:	CAJ7c6TOtYagmAm+f4B3JEWoahG3bocoBNe1Gvdrjejo5MMMC1g@mail.gmail.com
Lists:	pgsql-hackers

Hi,

> If my understanding is correct, we should make a WAL record with the
> XLH_LOCK_ALL_FROZEN_CLEARED flag *before* we modify the VM but within
> the same critical section [...]
>
> A draft patch is attached. It makes the test pass and doesn't seem to
> break any other tests.
>
> Thoughts?

So that we don't forget: assuming I'm right about the cause of the
issue, we might also want to recheck the order of visibilitymap_* and
XLog* calls in the following functions:

- heap_multi_insert
- heap_delete
- heap_update
- heap_lock_tuple
- heap_lock_updated_tuple_rec

At a quick look, all of these functions modify the VM before writing
the corresponding WAL record. This can cause a similar issue:

1. The VM is modified
2. The VM page is evicted asynchronously, before the WAL record is written
3. kill -9
4. The VM state differs between primary and standby
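
To make the sequence above concrete, here is a toy model in C. It is
only a sketch and not PostgreSQL code: ToyCluster, Ordering and
vm_consistent_after_crash are hypothetical names, the VM is reduced to
a single all-frozen bit, and it assumes that a WAL record, once
written, survives the crash and reaches the standby. It compares the
"VM first" order with the "WAL first" order when the process is killed
after a given number of steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not a PostgreSQL API. */
typedef struct
{
    bool vm_in_buffer;  /* all-frozen bit in the primary's shared buffer */
    bool vm_on_disk;    /* all-frozen bit in the primary's on-disk VM */
    bool wal_logged;    /* WAL record clearing the bit has been written */
    bool standby_vm;    /* all-frozen bit as seen by the standby */
} ToyCluster;

typedef enum { VM_FIRST, WAL_FIRST } Ordering;

typedef enum { STEP_MODIFY_VM, STEP_EVICT_VM, STEP_LOG_WAL } Step;

static void
run_step(ToyCluster *c, Step s)
{
    switch (s)
    {
        case STEP_MODIFY_VM:
            c->vm_in_buffer = false;            /* clear the bit in memory */
            break;
        case STEP_EVICT_VM:
            c->vm_on_disk = c->vm_in_buffer;    /* buffer written to disk */
            break;
        case STEP_LOG_WAL:
            c->wal_logged = true;   /* assumed durable and shipped to standby */
            break;
    }
}

/*
 * Run the first `steps_before_crash` steps (0..3) in the given order,
 * then simulate kill -9 plus recovery. Returns true if primary and
 * standby end up with the same VM bit.
 */
bool
vm_consistent_after_crash(Ordering order, int steps_before_crash)
{
    static const Step vm_first[] = {STEP_MODIFY_VM, STEP_EVICT_VM, STEP_LOG_WAL};
    static const Step wal_first[] = {STEP_LOG_WAL, STEP_MODIFY_VM, STEP_EVICT_VM};
    const Step *steps = (order == VM_FIRST) ? vm_first : wal_first;
    ToyCluster  c = {true, true, false, true};
    bool        primary_vm;

    assert(steps_before_crash >= 0 && steps_before_crash <= 3);

    for (int i = 0; i < steps_before_crash; i++)
        run_step(&c, steps[i]);

    /* Crash: the shared buffer is lost, the primary restarts from disk. */
    primary_vm = c.vm_on_disk;

    /* If the record exists, replay clears the bit on both nodes. */
    if (c.wal_logged)
    {
        primary_vm = false;
        c.standby_vm = false;
    }

    return primary_vm == c.standby_vm;
}
```

In this model, vm_consistent_after_crash(VM_FIRST, 2) is the scenario
from the list above (modified, evicted, killed before logging) and
reports a mismatch, while WAL_FIRST stays consistent for every crash
point. The real code has more moving parts (page LSNs, WAL flush rules
before eviction), so this only illustrates the ordering argument.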

In response to

Re: VM corruption on standby at 2025-08-07 14:09:39 from Aleksander Alekseev
