Re: [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kirk Wolak <wolakk(at)gmail(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Salma El-Sayed <salmasayed182003(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GSoC 2026] - B-tree Index Bloat Reduction - Approach & Questions
Date: 2026-06-26 14:53:48
Message-ID: CA+TgmoaXh3Q=_k+UtpuHeE1ndch846DSUZ5ALASziBDCicvePg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 23, 2026 at 4:46 PM Kirk Wolak <wolakk(at)gmail(dot)com> wrote:
>> (Today's lesson for her, is how to defend your position, while being prepared for the public beat down (lol) where you might be wrong. Be wrong out loud! You learn faster!)
>
> Everyone. My apologies. Some people may not have detected the sarcasm/humor in that last line.
> Obviously this is NOT how this group works, and the moderators would not tolerate it.

Yeah, I don't think anybody's getting a beat down here. This is just
people discussing whether the algorithm as proposed is correct or
incorrect. Nobody's mad at Salma or at you, but a number of people (me
included, but on this topic my opinion isn't the one you should be
worrying about) think that the proposed algorithm is probably
incorrect -- and the vigor of their responses suggests, more than
anything, that they don't want anyone to waste a bunch of time on
something that will ultimately be a dead end. The fact that they care
enough to respond is a good sign, not a bad one.

From a certain point of view, the suspicion that the algorithm may be
incorrect is not at all surprising. The fact that we never merge
nbtree pages to improve space utilization is a limitation we've had
since I first got involved with PostgreSQL which was, at the risk of
dating myself, a long time ago. The fact that nobody's fixed yet means
it's probably really hard. The extremely intense discussion of page
deletion in the nbtree README also suggests the complexity of the
task: any concern that applies to page deletion probably also implies
in some way to page merging, and there are probably additional
complications that apply only to page merging, because page deletion
only involves empty pages, so there's no danger of seeing values
twice.

I suspect everyone here would love to see a solution to this problem,
but I would personally be beyond impressed if someone managed to come
up with even a working prototype for this in one summer. Even a
respectable amount of progress on figuring out an algorithm would be a
good result, IMHO.

I don't know what the rules for GSoC are, but if it's still possible
to switch to a different project, that might be worth considering. On
the other hand, there's certainly interesting work to be done on this.
I just don't see how anyone, no matter how smart or talented, could
show up and expect to get the algorithm completely right on the first
try. I'd be absolutely shocked if that happened. I've been shocked
before, of course: some time people do amazing things. But often
getting something like this right takes a lot of trial and error, or
just doesn't work out. That doesn't mean it isn't worth trying, but
having realistic expectations is good.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wenhui qiu 2026-06-26 14:54:44 Re: Report oldest xmin source when autovacuum cannot remove tuples
Previous Message Jonathan S. Katz 2026-06-26 14:35:04 PostgreSQL 19 Beta 2 release date