Re: Thoughts on "killed tuples" index hint bits support on standby

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Thoughts on "killed tuples" index hint bits support on standby
Date: 2021-02-01 22:45:52
Message-ID: CAH2-Wzmii7Sx7wpkaMmts-eHxRnB5Kx8s+PHsCvuEmG8VrvDmw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 1, 2021 at 1:19 PM Michail Nikolaev
<michail(dot)nikolaev(at)gmail(dot)com> wrote:
> It is fine to receive a page to the standby from any source: `btpo_flags` should have some kind of “LP_DEAD safe for standby” bit set to allow new bits to be set and old ones to be read.
>
> > We can't really mask LP_DEAD bits from
> > the primary in recovery anyway, because of stuff like page-level
> > checksums. I suspect that it just isn't useful to fight against that.
>
> As far as I can see, there is no problem here. Checksums already differ for both heap and index pages on standby and primary.

AFAICT that's not true, at least not in any practical sense. See the
comment in the middle of MarkBufferDirtyHint() that begins with "If we
must not write WAL, due to a relfilenode-specific...", and see the
"Checksums" section at the end of src/backend/storage/page/README. The
last paragraph in the README is particularly relevant:

New WAL records cannot be written during recovery, so hint bits set during
recovery must not dirty the page if the buffer is not already dirty, when
checksums are enabled. Systems in Hot-Standby mode may benefit from hint bits
being set, but with checksums enabled, a page cannot be dirtied after setting a
hint bit (due to the torn page risk). So, it must wait for full-page images
containing the hint bit updates to arrive from the primary.

IIUC the intention is that MarkBufferDirtyHint() is a no-op during hot
standby when we successfully set a hint bit, though only in the
XLogHintBitIsNeeded() case. So we don't really dirty the page within
SetHintBits() in this specific scenario. That is, the buffer header
won't actually get marked BM_DIRTY or BM_JUST_DIRTIED within
MarkBufferDirtyHint() when in Hot Standby + XLogHintBitIsNeeded().
What else could work at all? The only "alternative" is to write an
FPI, just like on the primary -- but writing new WAL records is not
possible during Hot Standby!
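To make that control flow concrete, here is a rough C sketch of the decision described above. It is a hypothetical, simplified model (the SketchBuffer type and sketch_mark_dirty_hint() are invented for illustration), not the real MarkBufferDirtyHint() from bufmgr.c. The hint_bits_need_wal parameter stands in for XLogHintBitIsNeeded(), and the FPI that the primary may write is only indicated by a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared buffer; not PostgreSQL's BufferDesc. */
typedef struct SketchBuffer
{
    bool hint_set;   /* hint bit set in the in-memory page */
    bool dirty;      /* stands in for BM_DIRTY */
} SketchBuffer;

/*
 * Simplified model of the hint-bit path. hint_bits_need_wal plays the role
 * of XLogHintBitIsNeeded() (checksums or wal_log_hints). Returns true if the
 * buffer is dirty afterwards, i.e. the hint can eventually reach disk.
 */
static bool
sketch_mark_dirty_hint(SketchBuffer *buf, bool in_recovery,
                       bool hint_bits_need_wal)
{
    assert(buf);
    buf->hint_set = true;       /* the hint is always applied in memory */

    if (buf->dirty)
        return true;            /* written out later with the other changes */

    if (hint_bits_need_wal && in_recovery)
        return false;           /* no WAL during recovery, so don't dirty */

    /*
     * On a primary that needs WAL for hints, the real code may first log a
     * full-page image here; this sketch omits that step.
     */
    buf->dirty = true;
    return true;
}
```

With hint_bits_need_wal and in_recovery both true, a clean buffer ends up with hint_set but stays clean; without checksums or wal_log_hints, the same call in recovery dirties the buffer normally.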

A comment within MarkBufferDirtyHint() spells it out directly -- we
can have hint bits set in hot standby independently of the primary,
but it works in a way that makes sure that the hint bits never make it
out to disk:

"We can set the hint, just not dirty the page as a result so the hint
is lost when we evict the page or shutdown"
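A toy model of what that comment means, assuming a single-block "disk" and a buffer that is written back only when dirty: the hint lives only in the shared buffer, so evicting a clean buffer drops it, and the next read brings back the old on-disk version. All names here (ToyBuffer, toy_disk, toy_read(), toy_set_hint_no_dirty(), toy_evict()) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 16            /* toy size, not the real BLCKSZ */

typedef struct ToyBuffer
{
    unsigned char page[TOY_PAGE_SIZE];  /* in-memory copy of the block */
    bool          dirty;                /* stands in for BM_DIRTY */
} ToyBuffer;

static unsigned char toy_disk[TOY_PAGE_SIZE];   /* the "on-disk" block */

/* Load the block from "disk" into a clean buffer. */
static void
toy_read(ToyBuffer *buf)
{
    memcpy(buf->page, toy_disk, TOY_PAGE_SIZE);
    buf->dirty = false;
}

/* Set a hint bit in memory only (Hot Standby with checksums): not dirtied. */
static void
toy_set_hint_no_dirty(ToyBuffer *buf, int off, unsigned char bit)
{
    assert(off >= 0 && off < TOY_PAGE_SIZE);
    buf->page[off] |= bit;
}

/* Eviction writes the block back only if the buffer was dirtied. */
static void
toy_evict(ToyBuffer *buf)
{
    if (buf->dirty)
        memcpy(toy_disk, buf->page, TOY_PAGE_SIZE);
}
```

Reading the block, setting a hint, evicting, and reading again yields a page without the hint, which is exactly the "lost when we evict the page" behavior.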

You may be right in some narrow sense -- checksums can differ on a
standby. But that's like saying that it's sometimes okay to have a
torn page on disk. Yes, it's okay, but only because we expect the
problem during crash recovery, and can reliably repair it.
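A toy illustration of the torn-page point, assuming a page whose checksum is stored in its first bytes and a disk that only writes smaller sectors atomically. FNV-1a stands in for PostgreSQL's real page checksum, and every name is hypothetical. A torn write leaves a stored checksum that no longer matches the data; replaying a full-page image from WAL overwrites the whole block and repairs it. A hint bit set independently on a standby would have no such FPI behind it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE   32   /* toy page size in bytes */
#define TOY_SECTOR 8    /* toy atomic write unit */

/* FNV-1a, a stand-in for the page checksum; NOT PostgreSQL's algorithm. */
static uint32_t
toy_checksum(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++)
    {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Bytes 0..3 hold the checksum of bytes 4..TOY_PAGE-1. */
static void
toy_stamp(unsigned char *page)
{
    uint32_t c = toy_checksum(page + 4, TOY_PAGE - 4);
    memcpy(page, &c, sizeof c);
}

static bool
toy_verify(const unsigned char *page)
{
    uint32_t stored;
    memcpy(&stored, page, sizeof stored);
    return stored == toy_checksum(page + 4, TOY_PAGE - 4);
}

/* Crash mid-write: only the first `sectors` sectors of the new image land. */
static void
toy_torn_write(unsigned char *disk, const unsigned char *image, size_t sectors)
{
    assert(sectors * TOY_SECTOR <= TOY_PAGE);
    memcpy(disk, image, sectors * TOY_SECTOR);
}

/* Crash recovery: replaying a full-page image overwrites the whole block. */
static void
toy_replay_fpi(unsigned char *disk, const unsigned char *fpi)
{
    memcpy(disk, fpi, TOY_PAGE);
}
```

For example, stamping a zeroed page, setting a "hint" in its last byte, re-stamping, and then writing only the first sector produces a page that fails toy_verify() until the full image is replayed.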

> Checksums are calculated before the page is written to the disk (not after applying the FPI). So, masking the page while *applying* the FPI is semantically the same as setting a bit in it 1 nanosecond later.
>
> And `btree_mask` (and other mask functions) are already used for consistency checks to exclude LP_DEAD.

I don't see how that is relevant. btree_mask() is only used by
wal_consistency_checking, which is mostly just for Postgres hackers.
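For reference, the masking idea can be sketched as follows, assuming that wal_consistency_checking masks both the replayed page and the primary's image before comparing them. This is a toy model (ToyLeafPage, the TOY_LP_* values and the toy_* functions are invented), loosely modeled on mask_lp_flags(). It works on copies made for the comparison and never touches the page kept in shared buffers or written to disk, which is why it doesn't help with the on-disk checksum problem.

```c
#include <stdbool.h>

/* Toy line pointer states; not guaranteed to match PostgreSQL's values. */
enum { TOY_LP_UNUSED = 0, TOY_LP_NORMAL = 1, TOY_LP_REDIRECT = 2, TOY_LP_DEAD = 3 };

#define TOY_NITEMS 4

typedef struct ToyLeafPage
{
    unsigned char lp_flags[TOY_NITEMS];  /* per-item line pointer flags */
    int           keys[TOY_NITEMS];      /* stand-in for tuple contents */
} ToyLeafPage;

/* Hide line pointer flags, so LP_DEAD vs. LP_NORMAL no longer matters. */
static void
toy_mask_lp_flags(ToyLeafPage *p)
{
    for (int i = 0; i < TOY_NITEMS; i++)
        if (p->lp_flags[i] != TOY_LP_UNUSED)
            p->lp_flags[i] = TOY_LP_UNUSED;
}

/* Compare masked copies; the callers' pages are passed by value. */
static bool
toy_pages_consistent(ToyLeafPage a, ToyLeafPage b)
{
    toy_mask_lp_flags(&a);
    toy_mask_lp_flags(&b);
    for (int i = 0; i < TOY_NITEMS; i++)
        if (a.lp_flags[i] != b.lp_flags[i] || a.keys[i] != b.keys[i])
            return false;
    return true;
}
```

Two pages that differ only in an LP_DEAD bit compare as consistent, while a difference in tuple contents is still reported.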

--
Peter Geoghegan
