| From: | Kirk Wolak <wolakk(at)gmail(dot)com> |
|---|---|
| To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
| Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Salma El-Sayed <salmasayed182003(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions |
| Date: | 2026-07-01 15:42:14 |
| Message-ID: | CACLU5mR6MrAedo3JB-t5Qae_RYGh3DMrXV1pQap=suYaecRQdA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Fri, Jun 26, 2026 at 11:50 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Fri, Jun 26, 2026 at 10:54 AM Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> ...
>
> Page deletion is easily the most complicated part of the nbtree
> design. That is a natural consequence of the absurdly permissive rules
> for searchers under the Lehman & Yao design. Searchers never need to
> lock more than one page at a time, on any level. But they can never
> have an irredeemably bad picture of the tree structure, even with
> concurrent page deletions.
>
> In short, the eye-watering complexity of page deletion is the exact
> cost you pay to completely avoid lock coupling with the Lehman & Yao
> design. If you completely ignore page deletion (as in the original L&Y
> paper), the concurrency rules are simple and intuitive. Most
> individual nbtree code paths effectively do just that: they pretend
> page deletion does not exist. Searchers can just move right (as with a
> concurrent page split) when they land on a deleted page. Everything
> works out because the real complexity is confined to the page
> deletion/page recycling code.
>
Yes, and our solution may seem Naive. Our approach has the drawback that
it SLOWS DOWN page deletion,
and keeps it where it currently is. We are asking vacuum to NOT delete
LEAF pages that we have "MERGED".
That MERGED pages are in a repair state until nobody LEFT can see that
transition. Then vacuum flips them
back to normal pages. (BTP_MERGED_AWAY would become BTP_HALF_DEAD). And
the NEXT vacuum would
delete that page as normal. We chose this approach for PURE Simplicity of
proving correctness. By making sure
our MERGED pages do not get shredded until the "merge event" is over... We
keep things simple. While not as
bad of an idea as "eventual consistency" (IMO), it draws a line in the
sand. We also will NOT attempt to MERGE
pages that are currently in MERGED status... As that would break things.
This will cause vacuum to take more passes to clean everything up. But we
believe it is proveable this way.
Our goal, of course, is that this is all "naturally" built into autovacuum
at some point. Where it deletes 99%
of the entries on a page, and it can apply a proven process to "merge" the
1% into the next page and flag everything.
It's at this point, that doing this in SMALL increments makes the most
sense. And only merging 2 pages prevents
taking new lock chains that could cause issues. Furthermore those 2 pages
must have the same parent. Another
limitation, but we don't believe it is a major hurdle more than it is a
great safety feature.
We are truly trying to limit "extra" work to the edge cases, and normal
scans should not concern themselves.
> ...
> I suspect everyone here would love to see a solution to this problem,
> > but I would personally be beyond impressed if someone managed to come
> > up with even a working prototype for this in one summer. Even a
> > respectable amount of progress on figuring out an algorithm would be a
> > good result, IMHO.
> >
> > I don't know what the rules for GSoC are, but if it's still possible
> > to switch to a different project, that might be worth considering.
>
> +1
>
LOL. It's hard to say if you did the +1 for the GSoC comment or the
"combination" which I assume.
It was addressed previously. We march forward! (And we continue to learn
about the 3rd rails in this process!)
Again, we appreciate the feedback. Keep it coming. I know seeing the code
will help.
Also, as we do this, it needs to be well documented... We will have to
settle on some nomenclature...
(BTP_MERGED_AWAY -> BTP_MERGED ... : Can we call this a MERGE SET? While
we only create 1 BTP_MERGED page.
we know page splits can and will happen, so there can be many. No other
NON-MERGE pages are allowed between)
Kirk
PS: I will let Salma speak to the code details.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Nathan Bossart | 2026-07-01 15:49:49 | Re: doc: fix pg_stat_autovacuum_scores threshold wording |
| Previous Message | Tatsuya Kawata | 2026-07-01 15:36:35 | Re: [PATCH] doc: clarify pg_stat_lock.fastpath_exceeded scope |