Re: [PATCH] Full support for index LP_DEAD hint bits on standby

From: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Full support for index LP_DEAD hint bits on standby
Date: 2021-01-27 19:27:21
Message-ID: CANtu0oh28mX5gy5jburH+n1mcczK5_dCQnhbBnCM=Pfqh-A26Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, hackers.

I think I was able to fix the issue related to minRecoveryPoint and crash
recovery. To make sure standby will be consistent after crash recovery, we
need to take the current value of minRecoveryPoint into account while
setting LP_DEAD hints (almost the same way as it is done for *heap* hint
bits already).

I have introduced new structure IndexHintBitsData:
-------
/* guaranteed not visible for all backends */
bool all_dead;

/* latest removed xid if known */
TransactionId latest_removed_xid;

/* lsn of page where dead tuple located */
XLogRecPtr page_lsn;
-------

This structure is filled by the `heap_hot_search_buffer` function. After,
we decide to set or not `kill_prior_tuple` depending on its content
(calling `IsMarkBufferDirtyIndexHintAllowed`).

For primary - it is always safe to set LP_DEAD in index if `all_dead` ==
true.

In the case of standby, we need to check `latest_removed_xid` (if
available) first. If commit LSN of the latest removed xid is already lower
than minRecoveryPoint (`XLogNeedsFlush`) - it is safe to set
`kill_prior_tuple`.

Sometimes we are not sure about the latest removed xid - heap record could
be marked dead by the XLOG_HEAP2_CLEAN record, for example. In such a case
we check the LSN of the *heap* page containing the tuple (LSN could be
updated by other transactions already - but it does not matter in that
situation). If page LSN is lower than minRecoveryPoint - it is safe to set
LP_DEAD in the index too. Otherwise - just leave the index tuple alive.

So, to bring it all together:

* Normal operation, proc->indexIgnoreKilledTuples is true:
It is safe for standby to use hint bits from the primary FPI because
of XLOG_INDEX_HINT_BITS_HORIZON conflict resolution.
It is safe for standby to set its index hint bits because
`ComputeXidHorizons` honors other read-only procs xmin and lowest xid on
primary (`KnownAssignedXidsGetOldestXmin`).

* Normal operation, proc->indexIgnoreKilledTuples is false:
Index hint bits are never set or taken into account.

* Crash recovery, proc->indexIgnoreKilledTuples is true:
It is safe for standby to use hint bits from the primary FPW because
XLOG_INDEX_HINT_BITS_HORIZON is always logged before FPI, and commit record
of transaction removed the tuple is logged before
XLOG_INDEX_HINT_BITS_HORIZON. So, if FPI with hints was flushed (and taken
into account by minRecoveryPoint) - both transaction-remover and horizon
records are replayed before reading queries.
It is safe for standby to use its hint bits because they can be set
only if the commit record of transaction-remover is lower than
minRecoveryPoint or LSN of heap page with removed tuples is lower than
minRecoveryPoint.

* Crash recovery, proc->indexIgnoreKilledTuples is false:
Index hint bits are never set or taken into account.

So, now it seems correct to me.

Another interesting point here - now position of minRecoveryPoint affects
performance a lot. It is happening already (because of *heap* hint bits)
but after the patch, it is noticeable even more. Is there any sense to keep
minRecoveryPoint at a low value?

Rebased and updated patch in attachment.

Will be happy if someone could recheck my ideas or even the code :)

Thanks a lot,
Michail.

Attachment Content-Type Size
test.patch text/x-patch 10.6 KB
docs.patch text/x-patch 5.9 KB
code.patch text/x-patch 53.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michail Nikolaev 2021-01-27 19:30:33 Re: Thoughts on "killed tuples" index hint bits support on standby
Previous Message Justin Pryzby 2021-01-27 19:06:40 Re: Add MAIN_RELATION_CLEANUP and SECONDARY_RELATION_CLEANUP options to VACUUM