Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
Date: 2025-11-18 06:32:19
Message-ID: 6E91F356-B8F6-411F-99B8-56F0AB1B1CFD@yandex-team.ru
Lists: pgsql-hackers

> On 24 Dec 2024, at 01:46, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Nov 20, 2024 at 4:41 AM Andrey M. Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>>> On 15 Nov 2024, at 21:33, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>> I propose this for the master branch only.
>>
>> The change seems correct to me: anyway cycle must be less than cycle of any future vacuum after promotion.
>
> The cycles set in the page special area during page splits that happen
> to run while a VACUUM also runs must use that same VACUUM's cycle ID
> (which is stored in shared memory for the currently running VACUUM).
> That way the VACUUM will know when it must backtrack later on, to
> avoid missing index tuples that it is expected to remove.
>
> It doesn't matter if the cycle_id that VACUUM sees is less than or
> greater than its own one -- only that it matches its own one when it
> needs to match to get correct behavior from VACUUM. (Though it's also
> possible to get a false positive, in rare cases where we get unlucky
> and there's a collision. This might waste cycles within VACUUM, but
> it shouldn't lead to truly incorrect behavior.)

I've been thinking about this some more. We always reset btpo_cycleid, even in the redo of a split.
This "btpo_cycleid = 0;" reset can break two scenarios that we do not support today, but might support in the future.
The reset relies on the assumption that crash recovery always interrupts VACUUM. That assumption does not hold in these cases.

1. A system with compute-storage separation (e.g. Aurora and Neon). There is no local filesystem: when we need to read a page, we get it from a storage service that rebuilds pages from WAL. If we split a page during vacuum, evict it, and then read it back from the service, we will miss the needed backtrack to the left page...
2. A page repair tool that fixes pages with checksum violations. AFAIK it can request a page from a standby, and if it does so in the middle of a vacuum, the vacuum can get a false negative from the backtracking logic.

Thanks!

Best regards, Andrey Borodin.
