VM corruption on standby

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: VM corruption on standby
Date: 2025-08-06 14:59:58
Message-ID: B3C69B86-7F82-4111-B97F-0005497BB745@yandex-team.ru
Lists: pgsql-hackers

Hi hackers!

I was reviewing the patch about removing xl_heap_visible and found the VM\WAL machinery very interesting.
At Yandex we have had several incidents with a corrupted VM, and at pgconf.dev colleagues from AWS confirmed that they had seen something similar too.
So I toyed around and accidentally wrote a test that reproduces $subj.

I think the corruption happens as follows:
0. We create a table with one frozen tuple.
1. The next heap_insert() clears the VM bit and hangs immediately; nothing has been WAL-logged yet.
2. The VM buffer is flushed to disk by the checkpointer or bgwriter.
3. The primary is killed with -9.
Now we have a page that is ALL_VISIBLE\ALL_FROZEN on the standby, but has clear VM bits on the primary.
4. A subsequent insert does not set XLH_LOCK_ALL_FROZEN_CLEARED in its WAL record.
5. pg_visibility detects the corruption.

Interestingly, in an off-list conversation Melanie explained to me how ALL_VISIBLE is protected from this: WAL-logging depends on the PD_ALL_VISIBLE heap page bit, not on the state of the VM. But for ALL_FROZEN this is not the case:

/* Clear only the all-frozen bit on visibility map if needed */
if (PageIsAllVisible(page) &&
    visibilitymap_clear(relation, block, vmbuffer,
                        VISIBILITYMAP_ALL_FROZEN))
    cleared_all_frozen = true; // this won't happen because the VM buffer was flushed before the crash
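
To make the reasoning above easier to follow, here is a minimal toy model in C. It is not PostgreSQL code: ToyNode, toy_vm_clear(), toy_lock_on_primary(), toy_replay_on_standby() and toy_run_scenario() are hypothetical names invented for illustration. The only thing borrowed from the real code is the shape of the check quoted above, where the WAL flag is derived from whether the VM clear actually found a set bit. The model covers only the ALL_FROZEN path and assumes the heap page itself never reached disk before the crash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the two VM bits (hypothetical names). */
#define TOY_VM_ALL_VISIBLE 0x01
#define TOY_VM_ALL_FROZEN  0x02

/* One heap page plus its VM bits, as seen by one node (hypothetical type). */
typedef struct ToyNode
{
    bool    page_all_visible;   /* models PD_ALL_VISIBLE on the heap page */
    uint8_t vm_bits;            /* models the VM bits for that heap page */
} ToyNode;

/* Clear the given bits and report whether any of them was previously set. */
static bool
toy_vm_clear(ToyNode *node, uint8_t flags)
{
    bool was_set;

    assert(node != NULL);
    was_set = (node->vm_bits & flags) != 0;
    node->vm_bits &= (uint8_t) ~flags;
    return was_set;
}

/* Primary: the "all-frozen cleared" WAL flag follows the VM clear result. */
static bool
toy_lock_on_primary(ToyNode *primary)
{
    bool cleared_all_frozen = false;

    if (primary->page_all_visible &&
        toy_vm_clear(primary, TOY_VM_ALL_FROZEN))
        cleared_all_frozen = true;
    return cleared_all_frozen;
}

/* Standby: replay clears the bit only if the WAL record says so. */
static void
toy_replay_on_standby(ToyNode *standby, bool cleared_all_frozen)
{
    if (cleared_all_frozen)
        toy_vm_clear(standby, TOY_VM_ALL_FROZEN);
}

/* Run steps 0-4 and return the standby's VM bits afterwards. */
static uint8_t
toy_run_scenario(bool vm_flushed_before_crash)
{
    /* Step 0: frozen page; primary's disk and the standby agree. */
    ToyNode disk = { true, TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN };
    ToyNode standby = disk;
    ToyNode memory = disk;
    ToyNode primary;
    bool    flag;

    /* Step 1: VM bits cleared in memory only, nothing WAL-logged. */
    toy_vm_clear(&memory, TOY_VM_ALL_VISIBLE | TOY_VM_ALL_FROZEN);

    /* Step 2: optionally the VM buffer (and only it) reaches disk. */
    if (vm_flushed_before_crash)
        disk.vm_bits = memory.vm_bits;

    /* Step 3: crash; the primary restarts from its on-disk state. */
    primary = disk;

    /* Step 4: the next operation is WAL-logged and replayed. */
    flag = toy_lock_on_primary(&primary);
    toy_replay_on_standby(&standby, flag);
    return standby.vm_bits;
}
```

In this model, toy_run_scenario(true) leaves the ALL_FROZEN bit set on the standby because the primary's clear finds nothing to clear and so never sets the flag, while toy_run_scenario(false) clears it on both sides. Deriving the flag from the heap page bit instead, as is done for ALL_VISIBLE, would not depend on whether the VM buffer happened to be flushed.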

Anyway, the test reproduces corruption of both bits. It also reproduces selecting deleted data on the standby.

The test is not intended to be committed when we fix the problem, so some waits are simulated with sleep(1) and the test is placed in modules/test_slru, where it was easier to write. But if we ever want something like this, I can design a less hacky and probably more generic version.

Thanks!

Best regards, Andrey Borodin.

Attachment Content-Type Size
v1-0001-Corrupt-VM-on-standby.patch application/octet-stream 10.6 KB
unknown_filename text/plain 2 bytes
