| From: | Salma El-Sayed <salmasayed182003(at)gmail(dot)com> |
|---|---|
| To: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions |
| Date: | 2026-06-23 16:03:53 |
| Message-ID: | CANBEAPFzA9=fnyfBjZQCmp7Y4T4wGRxrS_s_3iox=of+1etjzA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Matthias,
Thanks for the response and detailed explanation.
> Could you expand on this "treated as" a bit more? Do you mean that
> once the horizon has passed, the next time maintenance comes around
> this page will be deleted like a normal empty page would during
> vacuum? Or is it immediately considered dead?
Regarding your question about how the page is "treated as" a normal deletion:
Because a BTP_MERGED_AWAY page is already unlinked from its parent, it
has essentially already completed the first stage of deletion. Once
the MergeXID horizon has safely passed, the page transitions to
HALF_DEAD and is handled exactly as described in the nbtree README for
the second stage of deletion:
"In the second-stage, the half-dead leaf page is unlinked from its
siblings. We first lock the left sibling (if any) of the target, the
target page itself, and its right sibling (there must be one) in that
order. Then we update the side-links in the siblings, and mark the
target page deleted."
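As a minimal sketch of the gating condition described above: the page stays BTP_MERGED_AWAY until MergeXID is older than the horizon, and only then becomes eligible for the second stage. The enum, the function names, and the use of an "oldest xmin" value as the horizon are assumptions made for illustration; the comparison is modelled on the wraparound-aware logic that TransactionIdPrecedes() applies to normal XIDs.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical page states for this sketch; not actual nbtree flag bits. */
typedef enum
{
    SKETCH_MERGED_AWAY,
    SKETCH_HALF_DEAD
} SketchPageState;

/*
 * Wraparound-aware "a is older than b" for normal XIDs, using a signed
 * 32-bit difference.
 */
static bool
sketch_xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Decide the next state of a merged-away page. The page may move on to
 * the half-dead state only once merge_xid is strictly older than the
 * horizon (assumed here to be an oldest-xmin style value).
 */
static SketchPageState
sketch_merged_away_next_state(TransactionId merge_xid,
                              TransactionId oldest_xmin)
{
    if (sketch_xid_precedes(merge_xid, oldest_xmin))
        return SKETCH_HALF_DEAD;
    return SKETCH_MERGED_AWAY;
}
```

With this shape, a MergeXID equal to the horizon keeps the page in the merged-away state, which errs on the side of keeping the page around.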
To safely track this state transition, I need to store the MergeXID
and the blkno of the BTP_MERGED_AWAY page. As you pointed out
previously, adding these to the B-tree page header reduces available
space and risks backward incompatibility with max tuple sizes.
Given the constraints you mentioned, is modifying the header
completely off the table, or could we safely introduce this through a
new index version?
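To make the space cost concrete, here is a small standalone sketch comparing the current special-area layout with a hypothetical extended one. The first struct mirrors the fields of BTPageOpaqueData as I read them in nbtree.h; the second struct and the two btpo_merge_* field names are assumptions for illustration only, not a proposed on-disk format.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
typedef uint32_t TransactionId;

/* Mirrors the field layout of the existing BTPageOpaqueData. */
typedef struct
{
    BlockNumber btpo_prev;
    BlockNumber btpo_next;
    uint32_t    btpo_level;
    uint16_t    btpo_flags;
    uint16_t    btpo_cycleid;
} SketchOpaqueCurrent;

/* Hypothetical layout with the two extra fields discussed above. */
typedef struct
{
    BlockNumber   btpo_prev;
    BlockNumber   btpo_next;
    uint32_t      btpo_level;
    uint16_t      btpo_flags;
    uint16_t      btpo_cycleid;
    TransactionId btpo_merge_xid;    /* hypothetical: MergeXID */
    BlockNumber   btpo_merge_blkno;  /* hypothetical: merged-away block */
} SketchOpaqueWithMerge;

/* Extra bytes the hypothetical layout would take from every page. */
static size_t
sketch_opaque_growth(void)
{
    return sizeof(SketchOpaqueWithMerge) - sizeof(SketchOpaqueCurrent);
}
```

On common platforms this grows the special area from 16 to 24 bytes on every page, including pages that never take part in a merge, and that space comes out of what is available for tuples.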
Also, I wanted to share my current implementation for forward scans
that were positioned between L and R before the merge. Since the
forward scan already read L, here is how I handle it:
When the scan encounters the BTP_MERGED page (R), it calls
_bt_readpage. After unlocking R, but before returning, it steps back
to read L (BTP_MERGED_AWAY). It saves L's tuples in a list inside
BTScanOpaqueData, compares them against the data just read from R
(so->currPos.items), and removes any duplicates.
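The duplicate-removal step could look roughly like the sketch below. It assumes that tuples are matched by heap TID, which the description above does not specify, and it uses a simplified SketchTid type in place of ItemPointerData and a plain array in place of so->currPos.items. All names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for a heap TID: block number plus line offset. */
typedef struct
{
    uint32_t block;
    uint16_t offset;
} SketchTid;

/* Total order on TIDs, block first and then offset. */
static int
sketch_tid_cmp(const void *a, const void *b)
{
    const SketchTid *x = a;
    const SketchTid *y = b;

    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    return 0;
}

/*
 * Remove from r_items every TID that also appears in l_items, i.e. the
 * tuples the scan already returned from L. r_items is compacted in place
 * with its order preserved; l_items is sorted as a side effect. Returns
 * the number of items left in r_items.
 */
static size_t
sketch_drop_already_returned(SketchTid *r_items, size_t nr,
                             SketchTid *l_items, size_t nl)
{
    size_t kept = 0;

    qsort(l_items, nl, sizeof(SketchTid), sketch_tid_cmp);

    for (size_t i = 0; i < nr; i++)
    {
        if (bsearch(&r_items[i], l_items, nl,
                    sizeof(SketchTid), sketch_tid_cmp) == NULL)
            r_items[kept++] = r_items[i];
    }
    return kept;
}
```

Sorting L's saved TIDs once and probing with a binary search keeps the cost at O((nl + nr) log nl) per page, and leaving R's surviving items in their original order preserves the scan order.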
Best regards,
Salma El-Sayed