From 4eb6062d5a5d5b0b79363bc986f14d595f9cc826 Mon Sep 17 00:00:00 2001
From: Dmitrii Dolgov <9erthalion6@gmail.com>
Date: Mon, 27 Apr 2026 16:54:36 +0200
Subject: [PATCH v1 2/2] Randomize nbtree split location to avoid oscillating patterns
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The way nbtree page splits work can lead to the same split location
being chosen over and over under certain workloads. Simplifying
somewhat, this happens whenever the data being ingested follows the
same distribution as the data already in the index, which is in
particular the case for an initially empty tree. According to [1] (and
some one-off experiments), this can make the number of page splits
follow an oscillating pattern, which translates into inherent
variability in performance.

The easiest workaround is to define a range around the best split
location and pick the actual split location at random from that range.
Introduce such randomization, based on the list of candidate split
points kept in the split state. The paper cited above recommends a
range of 20%, so we use the same value.

The list of candidate split locations is sorted by delta, so this is
not exactly equivalent to a "range around the best split location",
but it looks close enough.

[1]: Glombiewski N., Seeger B., Graefe G. (2019). Waves of Misery After
Index Creation. BTW 2019. Gesellschaft für Informatik.
doi:10.18420/btw2019-06
---
 src/backend/access/nbtree/nbtsplitloc.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/nbtree/nbtsplitloc.c b/src/backend/access/nbtree/nbtsplitloc.c
index de9eca3c8b2..71becf0257e 100644
--- a/src/backend/access/nbtree/nbtsplitloc.c
+++ b/src/backend/access/nbtree/nbtsplitloc.c
@@ -17,6 +17,7 @@
 #include "access/nbtree.h"
 #include "access/tableam.h"
 #include "common/int.h"
+#include "common/pg_prng.h"
 
 typedef enum
 {
@@ -792,6 +793,7 @@ _bt_bestsplitloc(FindSplitData *state, int perfectpenalty,
 	int			bestpenalty,
 				lowsplit;
 	int			highsplit = Min(state->interval, state->nsplits);
+	int			rand_offset = 0;
 	SplitPoint *final;
 
 	bestpenalty = INT_MAX;
@@ -812,7 +814,24 @@ _bt_bestsplitloc(FindSplitData *state, int perfectpenalty,
 			break;
 	}
 
-	final = &state->splits[lowsplit];
+	/*
+	 * Some workloads find the same best split location over and over, even
+	 * though suffix truncation introduces some variability. According to [1]
+	 * this makes the number of page splits follow an oscillating pattern, and
+	 * the easiest workaround is to add some randomness to the split choice.
+	 *
+	 * To achieve that, add a random offset of up to 20% of all candidate
+	 * split points to lowsplit, clamping the result to the last candidate so
+	 * that it never runs past nsplits. Since the splits are sorted by delta
+	 * (see _bt_deltasortsplits), this should be close enough to a range
+	 * around the best split point.
+	 *
+	 * [1]: Glombiewski N., Seeger B., Graefe G. (2019). Waves of Misery After
+	 * Index Creation. BTW 2019. Gesellschaft für Informatik. doi:10.18420/btw2019-06
+	 */
+	rand_offset = pg_prng_uint64_range(
+		&pg_global_prng_state, 0, state->nsplits * 0.2);
+	final = &state->splits[Min(lowsplit + rand_offset, state->nsplits - 1)];
 
 	/*
 	 * There is a risk that the "many duplicates" strategy will repeatedly do
-- 
2.52.0
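The randomized selection rule described in the commit message can be sketched in isolation. This is a minimal, hedged illustration under stated assumptions, not PostgreSQL code: `sketch_prng`, `sketch_prng_next`, `sketch_prng_range` and `pick_split_index` are hypothetical names, and a toy xorshift64 generator stands in for `pg_global_prng_state`. It assumes the candidates are already sorted by delta and that `lowsplit` is the index of the lowest-penalty candidate, as in `_bt_bestsplitloc`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for PostgreSQL's pg_prng state: a tiny xorshift64
 * generator, used only so that this sketch compiles on its own.
 * The seed must be non-zero.
 */
typedef struct
{
	uint64_t	s;
} sketch_prng;

static uint64_t
sketch_prng_next(sketch_prng *p)
{
	uint64_t	x = p->s;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	p->s = x;
	return x;
}

/* Value in [rmin, rmax]; returns rmin if the range is empty (modulo bias ignored). */
static uint64_t
sketch_prng_range(sketch_prng *p, uint64_t rmin, uint64_t rmax)
{
	if (rmax <= rmin)
		return rmin;
	return rmin + sketch_prng_next(p) % (rmax - rmin + 1);
}

/*
 * Given nsplits candidate split points sorted by delta and the index of the
 * lowest-penalty candidate (lowsplit), pick an index at random from
 * [lowsplit, lowsplit + 20% of nsplits].  The range is capped at the last
 * candidate, so the result is always a valid index into the array.
 */
static int
pick_split_index(sketch_prng *p, int lowsplit, int nsplits)
{
	int			maxoff = nsplits / 5;	/* 20% of candidates, rounded down */

	if (maxoff > nsplits - 1 - lowsplit)
		maxoff = nsplits - 1 - lowsplit;

	return lowsplit + (int) sketch_prng_range(p, 0, (uint64_t) maxoff);
}
```

Capping the range, rather than clamping the final index, keeps every reachable candidate equally likely (up to the modulo bias of the toy generator); with fewer than five candidates the offset is always zero and the best split is used unchanged.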