Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: nbtree VACUUM's REDO routine doesn't clear page's VACUUM cycle ID
Date: 2025-11-18 18:54:11
Message-ID: CAH2-WznCv0XkTYnAhemSaN=P=PVCk1oWoEBOm-9WQcmB4KoSDw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 18, 2025 at 1:32 AM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> I'm thinking more about it. We always reset btpo_cycleid even in redo of a split.
> This "btpo_cycleid = 0;" reset can break two scenarios that are not currently supported by us, but might be supported in future.

I don't follow.

> This reset is based on the idea that crash recovery will interrupt vacuum. It is not true in these cases.

It's also based on the idea that only one VACUUM operation can be
running at a time.

> 1. We are dealing with a compute-storage separation system. We do not have a filesystem, and when we need to read a page we get it from some storage service that rebuilds pages from WAL (e.g. Aurora and Neon). If we split a page during vacuum, evict it, and read it back from the service, we will miss the needed backtrack to the left page...

Are you arguing that the xl_btree_split record should include the cycleid?

I see that systems that are built on this architecture do something
along these lines:
https://github.com/neondatabase/postgres/commit/a9b92820c5d14dbff8f59ab65ffdaae92ab9c3c8

However, that seems well out of scope for core Postgres, at least for
the foreseeable future.

--
Peter Geoghegan
