Re: [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions

From: Salma El-Sayed <salmasayed182003(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions
Date: 2026-06-11 17:25:10
Message-ID: CANBEAPGZgcQX41zUYfPr0e1+G6SpEFHq2Ym2HVzxy5VA=nBhLw@mail.gmail.com
Lists: pgsql-hackers

Hi Matthias,

Thanks for your email and for the detailed answers and questions.
Apologies for the delayed response; I am in the middle of my
university final exams.

I am actively using your feedback to shape the design plan, but I
wanted to go ahead and address a couple of your specific questions
regarding BTP_MERGED_AWAY pages:

> b. Deleting a BTP_MERGED_AWAY page
> BTP_MERGED_AWAY pages keep a copy of their old tuples around,
> which you mention are used by backward scans. This means its contents
> must also be cleaned up by subsequent VACUUM runs, as the backwards
> IOS may otherwise return TIDs that have been recycled and received new
> indexed values. This cleanup can result in an empty page - which can
> happen earlier than the XID horizon in the MergeID. Does this design
> allow those pages to be reclaimed?

VACUUM will be taught to ignore the contents of BTP_MERGED_AWAY pages.
The entries inside are not live data; they are ghost copies kept for
exactly one case: a backward scan that was positioned between R and L
at the moment of the merge.
No other reader ever sees them. Forward scans see the BTP_MERGED_AWAY
flag and skip L via its right link.
New backward scans already read R (which now holds L's data) before
arriving at L, so they skip L too.
TID safety is guaranteed by MergeXID. The only scanner that can reach
L's "ghost copies" is one whose snapshot predates MergeXID. MVCC
guarantees that the heap rows those TIDs point to cannot be recycled
while any transaction predating MergeXID is still active, because
those rows are still visible to that transaction. Once no active
snapshot predates MergeXID (i.e. MergeXID is older than the oldest
active xmin), the page transitions to HALF_DEAD as in a normal
deletion. So VACUUM never needs to touch L's entries; the page header
does all the work.
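To make the rule concrete, here is a minimal C sketch of the two checks
described above: VACUUM skipping the tuples of a BTP_MERGED_AWAY page,
and the horizon test that lets L move on to HALF_DEAD. It is a
simplification, not nbtree code: the flag bit value, the
MergedPageState struct and the function names are all hypothetical,
and XIDs are modelled as 64-bit counters that never wrap around (real
code would have to use PostgreSQL's own XID comparison machinery). A
snapshot is treated as predating the merge whenever its xmin is not
past MergeXID, which is the conservative reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; illustrative value only. */
#define BTP_MERGED_AWAY 0x8000u

/* Simplification: 64-bit XIDs that only ever increase (no wraparound). */
typedef uint64_t FullXid;

/* Hypothetical per-page state relevant to a merged-away page. */
typedef struct MergedPageState
{
    uint16_t flags;     /* page flags, possibly including BTP_MERGED_AWAY */
    FullXid  merge_xid; /* MergeXID: XID of the merge that moved L's tuples into R */
} MergedPageState;

/* VACUUM processes tuples on normal pages but ignores ghost copies. */
static bool
vacuum_scans_tuples(const MergedPageState *page)
{
    return (page->flags & BTP_MERGED_AWAY) == 0;
}

/*
 * Conservative test: a snapshot whose xmin is not past MergeXID may have
 * been taken before the merge completed, so it counts as predating it.
 */
static bool
snapshot_predates_merge(FullXid snapshot_xmin, FullXid merge_xid)
{
    return snapshot_xmin <= merge_xid;
}

/* L may become HALF_DEAD once even the oldest active snapshot is past MergeXID. */
static bool
merged_page_can_become_half_dead(const MergedPageState *page,
                                 FullXid oldest_active_xmin)
{
    assert(page != NULL);
    if ((page->flags & BTP_MERGED_AWAY) == 0)
        return false;   /* not a merged-away page; normal deletion rules apply */
    return !snapshot_predates_merge(oldest_active_xmin, page->merge_xid);
}
```

For example, with MergeXID = 1000, an oldest active xmin of 1000 still
blocks the transition, while an oldest active xmin of 1001 allows it.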

> d. Are BTP_MERGED_AWAY pages still part of the data structure?
> So, is L still pointed to by both L's left sibling and R, or is it
> immediately removed from the structure (or at least as immediate as a
> HALF_DEAD page would)?
> If it's kept in the structure for extended periods: Why?

L is unlinked from the parent as soon as it becomes BTP_MERGED_AWAY,
and its key space is assigned to R.
However, it does remain part of the leaf-level data structure (still
pointed to by L's left sibling and R).
This is necessary because backward scans that were positioned between
R and L during the merge still need to traverse left into L to read
the original data.
As soon as it is safe for L to become HALF_DEAD (once the MergeXID
horizon has passed), it is treated as a normal page deletion.
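The traversal behaviour can be checked with a toy model of the leaf
level: A <-> L <-> R, where L is BTP_MERGED_AWAY and holds ghost copies
of keys 3 and 4 that now also live in R. This is a hypothetical,
in-memory C sketch (the LeafPage struct, flag value and scan functions
are illustrative only; there is no locking, no parent level and no
concurrency). The ghost_page parameter stands in for however a real
scan would know that it was positioned between R and L at merge time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BTP_MERGED_AWAY 0x8000u /* hypothetical flag bit */
#define MAX_KEYS 8
#define NO_PAGE (-1)

/* Toy leaf page: sibling links plus a sorted key array. */
typedef struct LeafPage
{
    uint16_t flags;          /* 0 or BTP_MERGED_AWAY */
    int      prev;           /* left sibling index, or NO_PAGE */
    int      next;           /* right sibling index, or NO_PAGE */
    int      nkeys;
    int      keys[MAX_KEYS]; /* live keys, or ghost copies if merged away */
} LeafPage;

/* Forward scan: a merged-away page is skipped via its right link.
 * The caller must size 'out' for every key reachable from 'start'. */
static int
forward_scan(const LeafPage *pages, int start, int *out)
{
    int n = 0;

    for (int p = start; p != NO_PAGE; p = pages[p].next)
    {
        assert(pages[p].nkeys <= MAX_KEYS);
        if (pages[p].flags & BTP_MERGED_AWAY)
            continue;           /* its keys now live in the right sibling */
        for (int i = 0; i < pages[p].nkeys; i++)
            out[n++] = pages[p].keys[i];
    }
    return n;
}

/*
 * Backward scan. ghost_page is the merged-away page this scan was
 * positioned in front of (between R and L) when the merge happened, or
 * NO_PAGE for scans that started later. Only that page's ghost copies
 * are read; any other merged-away page is skipped.
 */
static int
backward_scan(const LeafPage *pages, int start, int ghost_page, int *out)
{
    int n = 0;

    for (int p = start; p != NO_PAGE; p = pages[p].prev)
    {
        assert(pages[p].nkeys <= MAX_KEYS);
        if ((pages[p].flags & BTP_MERGED_AWAY) && p != ghost_page)
            continue;           /* these keys were already read on the right sibling */
        for (int i = pages[p].nkeys - 1; i >= 0; i--)
            out[n++] = pages[p].keys[i];
    }
    return n;
}
```

With pages A = {1, 2}, L = ghost {3, 4} and R = {3, 4, 5, 6}, a forward
scan from A returns 1 through 6, a new backward scan from R returns 6
down to 1, and an older backward scan resuming at L (after having read
6 and 5 from R before the merge) returns 4, 3, 2, 1. No single scan
returns a key twice.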

Best regards,
Salma El-Sayed
