Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Btree BackwardScan race condition on Standby during VACUUM
Date: 2020-03-28 03:17:54
Message-ID: CAH2-WznoARFpGv-RnFo+e8PE-kpCUzHcGOGkieyQYRz-wS6Lqg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 16, 2020 at 7:08 AM Michail Nikolaev
<michail(dot)nikolaev(at)gmail(dot)com> wrote:
> I was sure I had broken something in btree and spent a lot of time
> trying to figure out what.
> And later... I realized that it is a bug in btree from very old times...
> Because scans are much faster with LP_DEAD support on a standby, it
> happens much more frequently in my case.

On second thought, I wonder how commit 558a9165 could possibly be
relevant here. nbtree VACUUM doesn't care about the LP_DEAD bit at
all. Sure, btree_xlog_delete_get_latestRemovedXid() is not going to
have to run on the standby on Postgres 12, but that only ever happened
at the point where we might have to split the page on the primary
(i.e. when _bt_delitems_delete() is called on the primary) anyway.
_bt_delitems_delete()/btree_xlog_delete_get_latestRemovedXid() are not
related to page deletion by VACUUM.

It's true that VACUUM will routinely kill tuples that happen to have
their LP_DEAD bit set, but it isn't actually influenced by the fact
that somebody set (or didn't set) any tuple's LP_DEAD bit. VACUUM has
its own strategy for generating recovery conflicts (it relies on
conflicts generated during the pruning phase of heap VACUUMing).
VACUUM is not willing to generate ad-hoc conflicts (in the style of
_bt_delitems_delete()) just to kill a few more tuples in relatively
uncommon cases -- cases where some LP_DEAD bits were set after a
VACUUM process started, but before the VACUUM process reached an
affected (LP_DEAD bits set) leaf page.
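
To make the distinction above concrete, here is a toy model of the two
deletion paths. It is only a sketch under stated assumptions, not
PostgreSQL source: every name in it (ToyXid, ToyIndexTuple,
toy_simple_delete, toy_vacuum_delete) is hypothetical, the heap_dead
flag stands in for "this heap TID is in VACUUM's dead-TID list", and
XIDs are compared as plain integers, ignoring wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a transaction ID; 0 means "no conflict". */
typedef uint32_t ToyXid;

/* Hypothetical, heavily simplified view of one leaf-page index tuple. */
typedef struct ToyIndexTuple
{
    bool   lp_dead;    /* hint bit set by some index scan */
    bool   heap_dead;  /* heap TID was collected by VACUUM's heap pass */
    ToyXid heap_xmax;  /* XID that deleted the pointed-to heap tuple */
} ToyIndexTuple;

/*
 * Models the _bt_delitems_delete() style of deletion: the LP_DEAD items
 * are the targets, and an ad-hoc conflict horizon is derived from the
 * heap tuples they point to.  Returns the number of items killed.
 */
static int
toy_simple_delete(const ToyIndexTuple *items, int nitems, ToyXid *conflict_xid)
{
    int killed = 0;

    *conflict_xid = 0;
    for (int i = 0; i < nitems; i++)
    {
        if (!items[i].lp_dead)
            continue;
        killed++;
        /* Plain comparison; real XID comparisons are wraparound-aware. */
        if (items[i].heap_xmax > *conflict_xid)
            *conflict_xid = items[i].heap_xmax;
    }
    return killed;
}

/*
 * Models VACUUM's index pass: a tuple is killed only if its heap TID was
 * collected earlier.  The LP_DEAD bit is never consulted, and no new
 * conflict is computed here, because the toy assumes the conflict was
 * already generated while the heap was pruned.
 */
static int
toy_vacuum_delete(const ToyIndexTuple *items, int nitems)
{
    int killed = 0;

    for (int i = 0; i < nitems; i++)
    {
        if (items[i].heap_dead)
            killed++;
    }
    return killed;
}
```

In this model, a tuple whose LP_DEAD bit was set after VACUUM started
(lp_dead true, heap_dead false) is killed by toy_simple_delete() but
skipped by toy_vacuum_delete(), which is the uncommon case described
above.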

Again, I suspect that the problem is more likely to occur on Postgres
12 in practice because page deletion is more likely to occur on that
version. In other words, it is due to my B-Tree work for Postgres 12:
commit dd299df8 and related commits. That's probably all that there
is to it.

--
Peter Geoghegan
