From: | Konstantin Knizhnik <knizhnik(at)garret(dot)ru> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Lack of PageSetLSN in heap_xlog_visible |
Date: | 2022-10-13 09:50:37 |
Message-ID: | fed17dac-8cb8-4f5b-d462-1bb4908c029e@garret.ru |
Lists: | pgsql-hackers |
Hi hackers!
heap_xlog_visible does not bump the heap page LSN when it sets the
all-visible flag on the page.
There is a long comment explaining this:
/*
 * We don't bump the LSN of the heap page when setting the visibility
 * map bit (unless checksums or wal_hint_bits is enabled, in which
 * case we must), because that would generate an unworkable volume of
 * full-page writes. This exposes us to torn page hazards, but since
 * we're not inspecting the existing page contents in any way, we
 * don't care.
 *
 * However, all operations that clear the visibility map bit *do* bump
 * the LSN, and those operations will only be replayed if the XLOG LSN
 * follows the page LSN. Thus, if the page LSN has advanced past our
 * XLOG record's LSN, we mustn't mark the page all-visible, because
 * the subsequent update won't be replayed to clear the flag.
 */
But it is still not clear to me that not bumping the LSN here is
correct if wal_log_hints is set.
In that case we will have a VM page with a larger LSN than the heap page,
because visibilitymap_set bumps the LSN of the VM page. This means that,
in theory, after recovery we may have a page that is marked all-visible
in the VM but does not have PD_ALL_VISIBLE set in its page header.
That would violate the VM constraint:
 * When we *set* a visibility map during VACUUM, we must write WAL. This may
 * seem counterintuitive, since the bit is basically a hint: if it is clear,
 * it may still be the case that every tuple on the page is visible to all
 * transactions; we just don't know that for certain. The difficulty is that
 * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
 * on the page itself, and the visibility map bit. If a crash occurs after the
 * visibility map page makes it to disk and before the updated heap page makes
 * it to disk, redo must set the bit on the heap page. Otherwise, the next
 * insert, update, or delete on the heap page will fail to realize that the
 * visibility map bit must be cleared, possibly causing index-only scans to
 * return wrong answers.