| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Kirk Wolak <wolakk(at)gmail(dot)com> |
| Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Salma El-Sayed <salmasayed182003(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions |
| Date: | 2026-06-23 16:24:52 |
| Message-ID: | CAH2-Wz=w9Pd4BPMJ6zFtzMFf85Dmy8GqJ2cFEz_rUX_R2BNr6g@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, Jun 23, 2026 at 11:57 AM Kirk Wolak <wolakk(at)gmail(dot)com> wrote:
> Back to the question of VACUUM, which is where your insight may be helping us avoid a terrible mistake.
> My understanding is that there can be MANY "extra" references in any index to records that have been LONG deleted.
> When the table is read, those are filtered out of the result set (I believe this is why index-only scans are limited to
> mostly clean, freshly vacuumed tables, etc.).
It's never okay for VACUUM to fail to remove any of the TIDs in its
deadItems[] from the target index. If it does, the index is already
irretrievably broken; you cannot account for that at the scan level.
Matthias explained why, but it boils down to this: it is a core
invariant that all index AMs (with the possible exception of BRIN)
must uphold at all times. It has nothing to do with what concurrent
scans might be doing when VACUUM runs; it's about maintaining basic
agreement between the index structure and the heap. You can never
allow the heap to recycle a TID for a wholly unrelated row/HOT chain
while any index still holds a reference to it. Otherwise, some time
after VACUUM runs, the same TID ends up standing for two different
logical rows.
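To make the failure mode concrete, here is a toy Python model (not PostgreSQL code; `ToyHeap`, `ToyIndex`, and `vacuum` are hypothetical names) in which an index fails to remove one dead TID. Once the heap recycles that TID for an unrelated row, a scan for the old key returns the new row, and nothing the scan can see distinguishes the two.

```python
from dataclasses import dataclass, field

TID = tuple[int, int]  # (block number, offset number)


@dataclass
class ToyHeap:
    """Hypothetical heap: line pointers (TIDs) map to row payloads."""
    rows: dict[TID, str] = field(default_factory=dict)
    unused: list[TID] = field(default_factory=list)  # TIDs VACUUM has freed
    next_off: int = 1

    def insert(self, row: str) -> TID:
        # Prefer recycling a freed line pointer, like reusing an LP_UNUSED slot.
        if self.unused:
            tid = self.unused.pop()
        else:
            tid = (0, self.next_off)
            self.next_off += 1
        self.rows[tid] = row
        return tid

    def delete(self, tid: TID) -> None:
        # The row becomes dead (invisible); its TID is not yet recyclable.
        del self.rows[tid]


@dataclass
class ToyIndex:
    """Hypothetical index: a flat list of (key, TID) entries."""
    entries: list[tuple[str, TID]] = field(default_factory=list)

    def insert(self, key: str, tid: TID) -> None:
        self.entries.append((key, tid))

    def bulk_delete(self, dead: set, skip: frozenset = frozenset()) -> None:
        # A correct index AM removes every dead TID; `skip` simulates a bug.
        self.entries = [(k, t) for k, t in self.entries
                        if t not in dead or t in skip]

    def scan(self, key: str, heap: ToyHeap) -> list[str]:
        # Follow each matching TID into the heap. The scan has no way to
        # know that a recycled TID now holds an unrelated row.
        return [heap.rows[t] for k, t in self.entries
                if k == key and t in heap.rows]


def vacuum(heap: ToyHeap, index: ToyIndex, dead: set,
           skip: frozenset = frozenset()) -> None:
    index.bulk_delete(dead, skip)
    heap.unused.extend(sorted(dead))  # heap now treats these TIDs as free


# Buggy run: the index keeps a stale entry for a dead TID.
heap, index = ToyHeap(), ToyIndex()
old = heap.insert("alice")
index.insert("alice", old)
heap.delete(old)
vacuum(heap, index, {old}, skip=frozenset({old}))
new = heap.insert("mallory")        # heap recycles the same TID
index.insert("mallory", new)
print(new == old)                   # True
print(index.scan("alice", heap))    # ['mallory']: an unrelated row

# Correct run: every dead TID is removed before recycling.
heap2, index2 = ToyHeap(), ToyIndex()
t = heap2.insert("alice")
index2.insert("alice", t)
heap2.delete(t)
vacuum(heap2, index2, {t})
index2.insert("mallory", heap2.insert("mallory"))
print(index2.scan("alice", heap2))  # []
```

The visibility check in `scan` passes in the buggy run because the recycled tuple really is live; that is why the damage cannot be undone at the scan level.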
> Now, it's at this point that I wonder whether VACUUM can actually just DELETE the BTP_MERGED_AWAY page.
> But what we prefer is that it gets marked half-dead, and the TXNID_Block_No is used to flip the BTP_MERGED page(s)
> that follow (because there could have been page splits). All of those pages that are past that horizon WILL BE
> flipped back to NORMAL pages (from BTP_MERGED).
>
> For now, for testing and validation, we want to get it working first, and have it take a few steps so we
> can watch the table "self-recover" from the merge via autovacuum.
>
> This is why you will hear Salma defend VACUUM "not looking" at the old "L" records, and certainly not changing them.
> They are REQUIRED to be the snapshot of what was merged, but only for the 2 edge cases.
Any design that lacks a completely airtight argument for why VACUUM
will reliably remove all of the TIDs from its deadItems[] in the
target index has zero chance of being accepted. Again, whether and how
that affects concurrent scans is irrelevant. The downstream problems
only start after VACUUM runs, when the heap starts to recycle TID
addresses that were supposed to be safe to recycle (safe because there
couldn't possibly be any references to the same TID left behind in
indexes).
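The ordering described above can be sketched as a toy VACUUM pass, assuming simplified in-memory structures (`vacuum_pass`, `index_bulk_delete`, and `InvariantViolation` are hypothetical names, not PostgreSQL APIs). Dead TIDs become recyclable only after every index has dropped them, roughly the point where PostgreSQL turns LP_DEAD line pointers into LP_UNUSED. The explicit check in step 2 is purely illustrative: it makes the invariant visible, whereas a real design needs an argument that such a check could never fail.

```python
from typing import Callable

TID = tuple[int, int]            # (block number, offset number)
Entries = list[tuple[str, TID]]  # hypothetical index: (key, TID) pairs


class InvariantViolation(Exception):
    """Raised when an index still references a TID VACUUM meant to free."""


def index_bulk_delete(entries: Entries, dead: set[TID]) -> Entries:
    # A correct index AM drops every entry whose TID is in `dead`.
    return [(k, t) for k, t in entries if t not in dead]


def vacuum_pass(indexes: dict[str, Entries], dead: set[TID], unused: set[TID],
                bulk_delete: Callable[[Entries, set[TID]], Entries] = index_bulk_delete
                ) -> None:
    # Step 1: remove the dead TIDs from every index.
    for name in indexes:
        indexes[name] = bulk_delete(indexes[name], dead)
    # Step 2 (illustrative only): confirm that no index kept a reference.
    for name, entries in indexes.items():
        leftover = {t for _, t in entries if t in dead}
        if leftover:
            raise InvariantViolation(f"{name} still references {sorted(leftover)}")
    # Step 3: only now may the heap hand these TIDs out to new rows.
    unused.update(dead)


indexes = {"idx_a": [("x", (0, 1)), ("y", (0, 2))], "idx_b": [("p", (0, 1))]}
unused: set[TID] = set()
vacuum_pass(indexes, {(0, 1)}, unused)
print(sorted(unused))  # [(0, 1)]

leaky_unused: set[TID] = set()
try:
    # Simulate an index AM that silently skips cleanup.
    vacuum_pass({"idx_c": [("z", (0, 3))]}, {(0, 3)}, leaky_unused,
                bulk_delete=lambda entries, dead: entries)
except InvariantViolation as err:
    print(err)  # idx_c still references [(0, 3)]
print(leaky_unused)  # set()
```

Raising before step 3 keeps the leaked TID out of the recyclable set, which is exactly the condition the heap relies on when it later reuses a line pointer.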
--
Peter Geoghegan