Lack of PageSetLSN in heap_xlog_visible

From: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Lack of PageSetLSN in heap_xlog_visible
Date: 2022-10-13 09:50:37
Message-ID: fed17dac-8cb8-4f5b-d462-1bb4908c029e@garret.ru
Lists: pgsql-hackers

Hi hackers!

heap_xlog_visible does not bump the heap page LSN when it sets the
all-visible flag on the page.
There is a long comment explaining why:

        /*
         * We don't bump the LSN of the heap page when setting the visibility
         * map bit (unless checksums or wal_hint_bits is enabled, in which
         * case we must), because that would generate an unworkable volume of
         * full-page writes.  This exposes us to torn page hazards, but since
         * we're not inspecting the existing page contents in any way, we
         * don't care.
         *
         * However, all operations that clear the visibility map bit *do* bump
         * the LSN, and those operations will only be replayed if the XLOG LSN
         * follows the page LSN.  Thus, if the page LSN has advanced past our
         * XLOG record's LSN, we mustn't mark the page all-visible, because
         * the subsequent update won't be replayed to clear the flag.
         */
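
The replay rule described in that comment can be sketched with a toy model.
This is only an illustration under stated assumptions, not the PostgreSQL
code: ToyLSN, ToyHeapPage and the toy_* functions are hypothetical names, and
the page is reduced to an LSN plus one flag standing in for PD_ALL_VISIBLE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not PostgreSQL's real types. */
typedef uint64_t ToyLSN;

typedef struct ToyHeapPage
{
    ToyLSN lsn;         /* LSN of the last record that bumped the page */
    bool   all_visible; /* models the PD_ALL_VISIBLE flag */
} ToyHeapPage;

/* A record is replayed only if its LSN follows the page LSN. */
static bool
toy_needs_redo(const ToyHeapPage *page, ToyLSN record_lsn)
{
    return record_lsn > page->lsn;
}

/* Replay of "set all-visible": sets the flag but leaves the LSN alone. */
static void
toy_redo_set_visible(ToyHeapPage *page, ToyLSN record_lsn)
{
    if (toy_needs_redo(page, record_lsn))
        page->all_visible = true;   /* deliberately no LSN bump */
}

/* Replay of an operation that clears the flag: this one bumps the LSN. */
static void
toy_redo_clear_visible(ToyHeapPage *page, ToyLSN record_lsn)
{
    if (toy_needs_redo(page, record_lsn))
    {
        page->all_visible = false;
        page->lsn = record_lsn;
    }
}

/* Walks through the case from the second paragraph of the comment. */
static void
toy_demo(void)
{
    /* The on-disk page already contains a later update at LSN 200. */
    ToyHeapPage page = { .lsn = 200, .all_visible = false };

    toy_redo_set_visible(&page, 150);   /* older record: must be skipped */
    assert(!page.all_visible);

    toy_redo_clear_visible(&page, 200); /* not replayed, page is current */
    assert(!page.all_visible && page.lsn == 200);
}
```

Calling toy_demo() from a main function exercises the case where the page
LSN has advanced past the record's LSN: the flag stays clear, because the
record that would have cleared it again is never replayed.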

But it is still not clear to me that not bumping the LSN here is correct
if wal_log_hints is set.
In this case the VM page will have a larger LSN than the heap page, because
visibilitymap_set bumps the LSN of the VM page. This means that, in theory,
after recovery we may have a page that is marked all-visible in the VM but
does not have PD_ALL_VISIBLE set in its page header. That violates the VM
constraint:

 * When we *set* a visibility map during VACUUM, we must write WAL.  This may
 * seem counterintuitive, since the bit is basically a hint: if it is clear,
 * it may still be the case that every tuple on the page is visible to all
 * transactions; we just don't know that for certain.  The difficulty is that
 * there are two bits which are typically set together: the PD_ALL_VISIBLE bit
 * on the page itself, and the visibility map bit.  If a crash occurs after the
 * visibility map page makes it to disk and before the updated heap page makes
 * it to disk, redo must set the bit on the heap page.  Otherwise, the next
 * insert, update, or delete on the heap page will fail to realize that the
 * visibility map bit must be cleared, possibly causing index-only scans to
 * return wrong answers.
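
The constraint and the crash case from that comment can be written down as a
small model. This is a sketch under assumptions, not PostgreSQL code:
SimState and the sim_* functions are hypothetical, the LSN values are made
up, and only the redo side is modelled. It covers the case the quoted
comment describes and does not by itself settle the wal_log_hints question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an LSN; not PostgreSQL's XLogRecPtr. */
typedef uint64_t SimLSN;

/* On-disk state of one heap page and its visibility map page. */
typedef struct SimState
{
    SimLSN vm_lsn;    /* LSN stored on the visibility map page */
    bool   vm_bit;    /* all-visible bit in the map */
    SimLSN heap_lsn;  /* LSN stored on the heap page */
    bool   heap_flag; /* models PD_ALL_VISIBLE */
} SimState;

/* The constraint: a set VM bit implies PD_ALL_VISIBLE on the heap page. */
static bool
sim_invariant_holds(const SimState *s)
{
    return !s->vm_bit || s->heap_flag;
}

/* Replay of one "set all-visible" record against both pages. */
static void
sim_redo_visible(SimState *s, SimLSN record_lsn)
{
    /* Heap page: set the flag if the record is newer; no LSN bump. */
    if (record_lsn > s->heap_lsn)
        s->heap_flag = true;

    /* VM page: set the bit and bump the page LSN. */
    if (record_lsn > s->vm_lsn)
    {
        s->vm_bit = true;
        s->vm_lsn = record_lsn;
    }
}

/* Crash after the VM page reached disk but before the heap page did. */
static SimState
sim_crash_after_vm_flush(void)
{
    SimState s = {
        .vm_lsn = 150, .vm_bit = true,      /* made-up LSN values */
        .heap_lsn = 100, .heap_flag = false
    };
    return s;
}
```

Starting from sim_crash_after_vm_flush(), the invariant is broken on disk;
replaying the record at LSN 150 with sim_redo_visible() skips the VM page,
which is already current, and sets the flag on the heap page, restoring the
invariant in this model.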
