Re: Randomize B-Tree page split location to avoid oscillating patterns

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Randomize B-Tree page split location to avoid oscillating patterns
Date: 2026-05-06 17:49:10
Message-ID: CAH2-WznzR_q9Q+5kZ8xbjaqt0c0LmXnv_wqQfzOX-g5cVgZYbg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 6, 2026 at 1:10 PM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> > On Mon, Apr 27, 2026 at 02:07:14PM -0400, Peter Geoghegan wrote:
> > Your patch should initially look for several split points that are
> > approximately equal in quality according to the current criteria -- a
> > separate, initial pass. You'd then randomly pick a final split point
> > from those gathered during this initial pass -- not from the original
> > fillfactormult-sorted list of split points. What you have in v1 will
> > make suffix truncation much less effective, which seems unacceptable.
>
> The paper suggest combining randomization and suffix truncation via
> randomly shifting the interval for truncation. I'm not sure a separate
> pass is needed for that, looks like it should be enough to add random
> shift to lowsplit / highsplit in _bt_bestsplitloc, when the penalty is
> calculated.

I can't see why that would make much difference. Increasing the split
interval usually won't affect the final chosen split point at all.
Decreasing is more likely to change the split point, but that's
*precisely* because it makes suffix truncation less effective.

In general it's quite unlikely that an expanded interval (e.g.,
doubling LEAF_SPLIT_DISTANCE) will include a split point that
truncates additional index attributes compared to the best split point
available within the current calculated split interval/current
LEAF_SPLIT_DISTANCE. For example, with the TPC-C indexes, I'd expect
only a tiny minority of all page splits to be affected by doubling
LEAF_SPLIT_DISTANCE to make nbtsplitloc.c consider ~20% of all
possible split points around the middle of the page. Whereas *halving*
LEAF_SPLIT_DISTANCE will indeed make suffix truncation worse.

This asymmetry matters. I believe that it'll make it impossible for
space utilization to average out to the target leaffillfactor% over
time. After all, this approach doesn't actually target space
utilization. This is also why it will make literally no difference at
all with a single column unique index.

> The interesting part here is that it seems the current split interval
> might be too small for such randomization to make a significant impact
> -- few experiments show that with the current value the result looks
> more like changing the fillfactor, waves are just shifting to the right.
> Increasing the split interval helps in this case.

The point of gathering a list of "equally good" split points in an
initial pass is that it removes the danger of making the split
interval less effective in terms of final split penalty. The random
choice becomes a choice among split points that all truncate away the
same number of suffix attributes (all of which must still be
sufficiently close to the space-optimal split location to ensure that
no split is ever wildly unbalanced). This prevents the variable/random
choice from affecting the suffix truncation/split penalty.

With a single column unique index, all split points will have an equal
penalty under this scheme (implying that suffix truncation will be
equally effective). The initial list gathered in the first pass is
exactly all of the split points within the split interval in that
case. Whereas with other types of indexes the initial list gathered is
some subset of all the split points within the interval, without
regard for how balanced the post-split space utilization will be (all
splits within the interval are assumed to be "good enough" from a
space utilization perspective). Over time, space utilization should
reach fillfactor% -- without it negatively affecting suffix truncation.

It's possible that increasing the split interval would also make
sense. That seems like a follow-up question to me.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2026-05-06 18:56:35 Re: Why is_admin_of_role() use ROLERECURSE_MEMBERS rather than ROLERECURSE_PRIVS?
Previous Message Paul A Jungwirth 2026-05-06 17:13:39 Re: FOR PORTION OF does not recompute GENERATED STORED columns that depend on the range column