| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
| Cc: | Kirk Wolak <wolakk(at)gmail(dot)com>, Salma El-Sayed <salmasayed182003(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions |
| Date: | 2026-06-23 18:58:53 |
| Message-ID: | CAH2-Wz=rugC=iZTDCKDo2gv-4u7Mkih=akqq07NVz-nai6=vKg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, Jun 23, 2026 at 2:30 PM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> > I think that you meant that it can see different TIDs originating from
> > the same updated logical row.
>
> It could access and return the same TID twice, on multiple pages, if
> the TID is recycled between the page accesses. But, as mentioned, at
> most one of the rows indicated by the index' scan mechanism will be
> MVCC-visible for the IndexScan executor node.

That might be true with SnapshotAny, but bringing visibility concerns
into this discussion doesn't seem useful.

The relevant invariant is that the same physical TID cannot appear
twice within the same index. It is useful to think of it as an
invariant that the index AM is directly concerned with (and to ignore
visibility stuff, which happens at a higher level, and shouldn't be of
concern to the index AM at all).

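
As a rough illustration of why this invariant is purely physical, here is a toy sketch in C. It is not nbtree or amcheck code: `ToyTid`, `toy_tid_cmp`, and `toy_index_has_duplicate_tid` are hypothetical names, and the sketch assumes the TIDs of one index have already been collected into an array. The check never consults a snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a heap TID: (block number, offset number). */
typedef struct ToyTid
{
    uint32_t block;
    uint16_t offset;
} ToyTid;

/* Total order on TIDs: by block first, then by offset. */
static int
toy_tid_cmp(const void *a, const void *b)
{
    const ToyTid *x = a;
    const ToyTid *y = b;

    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    if (x->offset != y->offset)
        return (x->offset < y->offset) ? -1 : 1;
    return 0;
}

/*
 * Returns true if any TID appears more than once among the n TIDs taken
 * from a single index.  Visibility plays no part in the check.
 */
bool
toy_index_has_duplicate_tid(const ToyTid *tids, size_t n)
{
    ToyTid *sorted;
    bool    found = false;

    assert(tids != NULL || n == 0);
    if (n < 2)
        return false;

    /* Sort a private copy so that equal TIDs become adjacent. */
    sorted = malloc(n * sizeof(ToyTid));
    if (sorted == NULL)
        abort();
    memcpy(sorted, tids, n * sizeof(ToyTid));
    qsort(sorted, n, sizeof(ToyTid), toy_tid_cmp);

    for (size_t i = 1; i < n; i++)
    {
        if (toy_tid_cmp(&sorted[i - 1], &sorted[i]) == 0)
        {
            found = true;
            break;
        }
    }
    free(sorted);
    return found;
}
```

Two versions of one logical row left behind by a non-HOT update have two distinct TIDs, so they pass this check. Only a literal repeat of the same (block, offset) pair would fail it.
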
In general, nbtree page deletion (and merging underfull pages)
modifies a physical data structure to improve space efficiency. It
isn't relevant why VACUUM deleted some index tuples (making that free
space available and indirectly triggering deletion/merging).

Using XIDs for BTPageGetDeleteXid is just a convenient (though very
conservative) way to implement "the drain technique". It would still
be correct to implement it differently, provided this alternative
approach also ensures that no backend follows a downlink and ends up
on a wholly unrelated page due to concurrent deletion.

We must always ensure that such a backend at least lands on a page
marked deleted/a tombstone page and then recovers by moving right --
no scan can ever have an irredeemably bad picture of the tree
structure. But we don't fundamentally need to care about XIDs to make
that work -- this is a physical modification that's orthogonal to
logical/transaction considerations.

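
The recovery step can be sketched with a toy model in C. This is not the real nbtree code path: `ToyPage`, `TOY_NO_PAGE`, and `toy_land_and_move_right` are hypothetical names, pages are plain array slots, and locking is ignored. The sketch also assumes that the tombstone still exists with its right link intact when the scan arrives, which is exactly the guarantee that the drain technique has to provide, however it is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel meaning "no right sibling". */
#define TOY_NO_PAGE ((size_t) -1)

/* Toy page: a tombstone flag plus a right-sibling link. */
typedef struct ToyPage
{
    bool   deleted;   /* true if this slot is a tombstone */
    size_t right;     /* index of the right sibling, or TOY_NO_PAGE */
} ToyPage;

/*
 * A scan followed a possibly stale downlink to 'target'.  If it landed on
 * a tombstone, it recovers by following right links until it reaches a
 * live page.  Returns the live page's index, or TOY_NO_PAGE if none is
 * reachable.  Note that no XID appears anywhere in this logic.
 */
size_t
toy_land_and_move_right(const ToyPage *pages, size_t npages, size_t target)
{
    size_t cur = target;

    assert(pages != NULL && target < npages);

    /* Bound the walk so that a corrupt (cyclic) toy chain cannot loop. */
    for (size_t steps = 0; steps < npages; steps++)
    {
        if (!pages[cur].deleted)
            return cur;
        cur = pages[cur].right;
        if (cur == TOY_NO_PAGE)
            break;
        assert(cur < npages);
    }
    return TOY_NO_PAGE;
}
```

The only thing the toy needs from the outside world is that a slot is not reused for unrelated data while some scan might still arrive there through an old downlink. Whether that is enforced with XIDs or by some other means does not change the recovery logic.
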
> > A non-hot update can create 2 separate TIDs that point to different
> > versions of the same logical row. In that case, both TIDs must be
> > returned to the scan (assuming both have index tuple values that
> > satisfy the scan keys). This doesn't really matter to the index AM; it
> > doesn't know about updates at all.
>
> Except the indexUnchanged flag for aminsert() -which index AMs should
> only use as a hint- but more generally, yes that's right.
That's just a hint used to trigger bottom-up index deletion; it really
isn't relevant.
--
Peter Geoghegan