Re: VM corruption on standby

From: Aleksander Alekseev <aleksander(at)tigerdata(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: VM corruption on standby
Date: 2025-08-07 15:17:17
Message-ID: CAJ7c6TOtYagmAm+f4B3JEWoahG3bocoBNe1Gvdrjejo5MMMC1g@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> If my understanding is correct, we should make a WAL record with the
> XLH_LOCK_ALL_FROZEN_CLEARED flag *before* we modify the VM but within
> the same critical section [...]
>
> A draft patch is attached. It makes the test pass and doesn't seem to
> break any other tests.
>
> Thoughts?

In order not to forget - assuming I'm not wrong about the cause of the
issue, we might want to recheck the order of visibilitymap_* and XLog*
calls in the following functions too:

- heap_multi_insert
- heap_delete
- heap_update
- heap_lock_tuple
- heap_lock_updated_tuple_rec

By a quick look all named functions modify the VM before making a
corresponding WAL record. This can cause a similar issue:

1. VM modified
2. evicted asynchronously before logging
3. kill 9
4. different state of VM on primary and standby

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ilia Evdokimov 2025-08-07 15:23:15 Re: stylesheet-html-common: only apply Bootstrap container classes in website build
Previous Message Xuneng Zhou 2025-08-07 15:00:50 Re: Implement waiting for wal lsn replay: reloaded