Re: Thoughts on "killed tuples" index hint bits support on standby

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Thoughts on "killed tuples" index hint bits support on standby
Date: 2021-02-02 20:31:00
Message-ID: CANtu0oiAtteJ+MpPonBg6WfEsJCKrxuLK15P6GsaGDcYGjefVQ@mail.gmail.com
Lists: pgsql-hackers

Hello, Peter.

> AFAICT that's not true, at least not in any practical sense. See the
> comment in the middle of MarkBufferDirtyHint() that begins with "If we
> must not write WAL, due to a relfilenode-specific...", and see the
> "Checksums" section at the end of src/backend/storage/page/README. The
> last paragraph in the README is particularly relevant:

I have attached a TAP test that demonstrates how easily checksums on the
standby and the primary start to differ. The test covers two scenarios, one
for heap and one for index (with the bit set on both standby and primary).

Yes, MarkBufferDirtyHint does not mark the page as dirty on a standby, so
hint bits set there can easily be lost. But it leaves the page dirty if it
already is (or the page could be marked dirty by WAL replay later). So, hint
bits can easily be flushed and taken into account during checksum
calculation on both the standby and the primary.
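
To illustrate the point, here is a minimal toy model in Python. It is not
PostgreSQL code: the Buffer class, set_hint(), flush() and the use of
zlib.crc32 as a checksum are all assumptions made up for this sketch. It only
shows that a hint set on a clean page is lost, while a hint set on a page
already dirtied by WAL replay reaches disk and changes the checksum.

```python
import zlib


class Buffer:
    """Toy stand-in for a shared buffer (hypothetical, not PostgreSQL's)."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.dirty = False


def checksum(data) -> int:
    # zlib.crc32 is only a stand-in for the real page checksum algorithm.
    return zlib.crc32(bytes(data))


def set_hint(buf: Buffer, offset: int, in_recovery: bool) -> None:
    """Set a hint bit. During recovery the hint alone never dirties the page."""
    buf.data[offset] |= 0x01
    if not in_recovery:
        buf.dirty = True


def flush(buf: Buffer, disk: dict) -> bool:
    """Write the buffer out only if it is dirty. Returns True if written."""
    if not buf.dirty:
        return False
    disk["data"] = bytes(buf.data)
    disk["checksum"] = checksum(buf.data)
    buf.dirty = False
    return True


original = bytes(8)

# Case 1: clean page on standby, only a hint bit is set.
clean = Buffer(original)
disk_clean = {"data": original, "checksum": checksum(original)}
set_hint(clean, 0, in_recovery=True)
written_clean = flush(clean, disk_clean)  # nothing written, hint is lost

# Case 2: the page was already dirtied by WAL replay before the hint was set.
replayed = Buffer(original)
disk_replayed = {"data": original, "checksum": checksum(original)}
replayed.data[4] = 0xAB  # change applied by WAL replay
replayed.dirty = True
set_hint(replayed, 0, in_recovery=True)
written_replayed = flush(replayed, disk_replayed)  # hint goes to disk too

# The primary applied the same WAL change but never set this hint.
primary_image = bytearray(original)
primary_image[4] = 0xAB
checksums_differ = disk_replayed["checksum"] != checksum(primary_image)
```

In this model written_clean is False and checksums_differ is True, which is
the same effect the attached test is meant to show on a real cluster.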

> "We can set the hint, just not dirty the page as a result so the hint
> is lost when we evict the page or shutdown"

Yes, it is not allowed to mark a page as dirty because of hints on standby,
because otherwise we could end up with this:

CHECKPOINT
SET HINT BIT
TORN FLUSH + CRASH = BROKEN CHECKSUM, SERVER FAULT

But this scenario is totally fine:

CHECKPOINT
FPI (page is still dirty)
SET HINT BIT
TORN FLUSH + CRASH = PAGE IS RECOVERED, EVERYTHING IS OK

And, as a result, this is fine too:

CHECKPOINT
FPI WITH MASKED LP_DEAD (page is still dirty)
SET HINT BIT
TORN FLUSH + CRASH = PAGE IS RECOVERED + LP_DEAD MASKED AGAIN IF STANDBY
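
The first two scenarios above can be sketched with a toy model. Everything
here is an assumption for illustration only: a 16-byte "page" whose first 4
bytes hold a CRC of the rest, a torn write that persists only the first half
of the new image, and a crash_recover() helper that simply installs the FPI
if one was logged after the checkpoint. None of these names exist in
PostgreSQL.

```python
import zlib


def make_page(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC (stand-in for a page checksum)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def is_valid(page: bytes) -> bool:
    """Check that the stored CRC matches the payload."""
    return zlib.crc32(page[4:]).to_bytes(4, "big") == page[:4]


def torn_write(old: bytes, new: bytes) -> bytes:
    """Only the first half of the new image reaches disk before the crash."""
    half = len(new) // 2
    return new[:half] + old[half:]


def crash_recover(on_disk: bytes, fpi):
    """Replay from the last checkpoint: an FPI overwrites the whole page."""
    return fpi if fpi is not None else on_disk


checkpoint_page = make_page(bytes(12))

# Scenario 1: CHECKPOINT, SET HINT BIT, TORN FLUSH + CRASH (no FPI in WAL).
hinted = bytearray(12)
hinted[11] |= 0x01  # hint bit lives in the second half of the page
torn = torn_write(checkpoint_page, make_page(bytes(hinted)))
recovered_no_fpi = crash_recover(torn, None)

# Scenario 2: CHECKPOINT, FPI, SET HINT BIT, TORN FLUSH + CRASH.
wal_payload = bytearray(12)
wal_payload[0] = 0xAB  # the change carried by the FPI
fpi = make_page(bytes(wal_payload))
hinted_after_fpi = bytearray(wal_payload)
hinted_after_fpi[11] |= 0x01
torn_after_fpi = torn_write(checkpoint_page, make_page(bytes(hinted_after_fpi)))
recovered_with_fpi = crash_recover(torn_after_fpi, fpi)
```

Here is_valid(recovered_no_fpi) is False (broken checksum), while
recovered_with_fpi equals the FPI and passes the check; the hint bit itself
is lost, which is harmless.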

So, my point here: it is fine to mask LP_DEAD bits during replay because
they already differ between standby and primary. And it is fine to set
and flush hint bits (and LP_DEADs) on standby because they can already be
easily flushed (we just need to consider minRecoveryPoint and, probably,
OldestXmin from the primary in the case of LP_DEAD, to make promotion easy).

>> And `btree_mask` (and other mask functions) already used for consistency
>> checks to exclude LP_DEAD.
> I don't see how that is relevant. btree_mask() is only used by
> wal_consistency_checking, which is mostly just for Postgres hackers.

I was thinking about the possibility of reusing these functions for masking
during replay.
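
A rough sketch of what masking during replay could look like, as a toy model
only. The functions mask_lp_dead() and replay_fpi() are hypothetical and do
not reproduce what btree_mask() does today; a page is reduced to a plain list
of line pointer flags, with flag values chosen to mirror LP_NORMAL and
LP_DEAD.

```python
LP_NORMAL = 1  # toy flag values for this sketch
LP_DEAD = 3


def mask_lp_dead(line_pointer_flags):
    """Return a copy with every LP_DEAD flag reset to LP_NORMAL."""
    return [LP_NORMAL if f == LP_DEAD else f for f in line_pointer_flags]


def replay_fpi(fpi_flags, on_standby):
    """Hypothetical replay step: a standby masks LP_DEAD before installing
    the full-page image, the primary's own crash recovery does not."""
    return mask_lp_dead(fpi_flags) if on_standby else list(fpi_flags)


# FPI produced on the primary, carrying the primary's LP_DEAD bits.
fpi_from_primary = [LP_NORMAL, LP_DEAD, LP_NORMAL, LP_DEAD]

standby_page = replay_fpi(fpi_from_primary, on_standby=True)
primary_page = replay_fpi(fpi_from_primary, on_standby=False)

# Later the standby sets its own LP_DEAD based on its own snapshot horizon.
standby_page[2] = LP_DEAD

# After a crash on the standby, the FPI is replayed and masked again.
standby_after_crash = replay_fpi(fpi_from_primary, on_standby=True)
```

In this model the standby never sees LP_DEAD bits that came from the primary,
and its own LP_DEAD bits are simply lost (not corrupted) after a crash.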

Thanks,
Michail.

Attachment: 022_checksum_tests.pl (application/x-perl, 5.4 KB)
