| From: | Salma El-Sayed <salmasayed182003(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions |
| Date: | 2026-05-08 14:40:27 |
| Message-ID: | CANBEAPFq3YAOydjUS3xwcUG9L6e3WE5Z4nGPk_Q3RsjSFWTJNA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hello Hackers,
My name is Salma El-Sayed, and I am working on B-tree Index Bloat Reduction
as part of GSoC 2026. This is my introduction email[1]. I would like to
share my current approach for page merging and get your feedback.
*The Core Idea*
Two sibling B-tree pages L (left) and R (right) are candidates for merging
when both pages are below a size threshold (less than 10% full for example)
and their combined size fits within the fill factor.
The merge copies all data from L into R, then:
1- R is marked BTP_MERGED --> it now contains L's data.
2- L is marked BTP_MERGED_AWAY --> it is logically gone but kept around
temporarily for any active scan that still holds a reference to it.
3- L is unlinked from its parent. VACUUM will handle updating the
right-link of L's left sibling as part of normal cleanup.
Once all transactions that were active at the time of the merge have
committed, L can transition to HALF_DEAD and be reclaimed by VACUUM.
*Handling Concurrent Scans*
The tricky part is ensuring correctness for scans that were already in
progress when the merge happened.
-Forward Scans-
A forward scanner that arrives at L after the merge sees BTP_MERGED_AWAY
and follows through to R.
If the scanner had already read L before the merge, it would have saved L's
high key. It can then binary-search R to skip the tuples it already saw and
continue from where it left off.
-Backward Scans-
A backward scanner that arrives at R after the merge sees BTP_MERGED, reads
R (which now contains L's data), and skips L entirely.
If the scanner had already read R before the merge, when it later arrives
at L it sees BTP_MERGED_AWAY. Here we need a small piece of state on the
scanner:
- If the scanner has not previously seen a BTP_MERGED flag, it knows it
visited R before the merge and has not yet seen L's data, so it must read L.
- If the scanner has previously seen BTP_MERGED, it knows it read R after
the merge, meaning R already contained L's data, so it should skip L and
continue left.
*Questions for the Community*
1- Triggering the merge: how aggressive should we be?
How should the merge process be triggered? I see two broad options:
a- Full-index scan: aggressively find all merge candidates in one pass.
b- Bounded scan: a separate dedicated scan that stops after a controlled
number of pages, giving the user control over how much of the index to
process at a time.
A possible interface for the candidate scanner might look something like:
bt_merge_pages(index regclass, min_fillfactor float8, max_fillfactor
float8, max_pages int4)
** This would allow the process to run over a controlled subset of the
index rather than always scanning the whole thing, which seems important
for large tables where a full scan could run for a very long time. What is
the community's preferred approach here?
2- Flag naming and free bits
I am proposing two new flags in btpo_flags:
- BTP_MERGED: set on R to indicate it absorbed L's data.
- BTP_MERGED_AWAY: set on L to indicate it has been merged away.
** Are there free bits available in btpo_flags suitable for this purpose?
And does the community see any objection to consuming two bits in this
field for the merge mechanism?
Thank you for your time. I look forward to your feedback!
Best regards,
Salma El-Sayed
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Matthias van de Meent | 2026-05-08 15:03:22 | Re: Disallow whole-row index references with virtual generated columns? |
| Previous Message | Ayush Tiwari | 2026-05-08 14:40:10 | Re: Disallow whole-row index references with virtual generated columns? |