[GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions

From: Salma El-Sayed <salmasayed182003(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions
Date: 2026-05-08 14:40:27
Message-ID: CANBEAPFq3YAOydjUS3xwcUG9L6e3WE5Z4nGPk_Q3RsjSFWTJNA@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Hackers,

My name is Salma El-Sayed, and I am working on B-tree Index Bloat Reduction
as part of GSoC 2026. This is my introduction email[1]. I would like to
share my current approach for page merging and get your feedback.

*The Core Idea*
Two sibling B-tree pages L (left) and R (right) are candidates for merging
when both pages are below a size threshold (less than 10% full for example)
and their combined size fits within the fill factor.

The merge copies all data from L into R, then:
1- R is marked BTP_MERGED --> it now contains L's data.
2- L is marked BTP_MERGED_AWAY --> it is logically gone but kept around
temporarily for any active scan that still holds a reference to it.
3- L is unlinked from its parent. VACUUM will handle updating the
right-link of L's left sibling as part of normal cleanup.

Once all transactions that were active at the time of the merge have
committed, L can transition to HALF_DEAD and be reclaimed by VACUUM.

*Handling Concurrent Scans*
The tricky part is ensuring correctness for scans that were already in
progress when the merge happened.

-Forward Scans-
A forward scanner that arrives at L after the merge sees BTP_MERGED_AWAY
and follows through to R.

If the scanner had already read L before the merge, it would have saved L's
high key. It can then binary-search R to skip the tuples it already saw and
continue from where it left off.

-Backward Scans-
A backward scanner that arrives at R after the merge sees BTP_MERGED, reads
R (which now contains L's data), and skips L entirely.

If the scanner had already read R before the merge, when it later arrives
at L it sees BTP_MERGED_AWAY. Here we need a small piece of state on the
scanner:
- If the scanner has not previously seen a BTP_MERGED flag, it knows it
visited R before the merge and has not yet seen L's data, so it must read L.
- If the scanner has previously seen BTP_MERGED, it knows it read R after
the merge, meaning R already contained L's data, so it should skip L and
continue left.

*Questions for the Community*

1- Triggering the merge: how aggressive should we be?

How should the merge process be triggered? I see two broad options:
a- Full-index scan: aggressively find all merge candidates in one pass.
b- Bounded scan: a separate dedicated scan that stops after a controlled
number of pages, giving the user control over how much of the index to
process at a time.

A possible interface for the candidate scanner might look something like:
bt_merge_pages(index regclass, min_fillfactor float8, max_fillfactor
float8, max_pages int4)

** This would allow the process to run over a controlled subset of the
index rather than always scanning the whole thing, which seems important
for large tables where a full scan could run for a very long time. What is
the community's preferred approach here?

2- Flag naming and free bits

I am proposing two new flags in btpo_flags:
- BTP_MERGED: set on R to indicate it absorbed L's data.
- BTP_MERGED_AWAY: set on L to indicate it has been merged away.

** Are there free bits available in btpo_flags suitable for this purpose?
And does the community see any objection to consuming two bits in this
field for the merge mechanism?

Thank you for your time. I look forward to your feedback!

[1]
https://www.postgresql.org/message-id/CANBEAPFW64QUGmMYoPekV6jGrn5Yx8kTNfeEtcTG9CCBUUBoLQ%40mail.gmail.com

Best regards,
Salma El-Sayed

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2026-05-08 15:03:22 Re: Disallow whole-row index references with virtual generated columns?
Previous Message Ayush Tiwari 2026-05-08 14:40:10 Re: Disallow whole-row index references with virtual generated columns?