Re: btree: downlink right separator/HIKEY optimization

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Christensen <david(at)pgguru(dot)net>
Subject: Re: btree: downlink right separator/HIKEY optimization
Date: 2024-02-22 13:34:00
Message-ID: CAEze2Wgxed_arnf-7cpO=4RV2V5mUCQ7Y+JoBcY013xRSvBU=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 5 Dec 2023 at 08:43, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 01/11/2023 00:08, Matthias van de Meent wrote:
> > By caching the right separator index tuple in _bt_search, we can
> > compare the downlink's right separator and the HIKEY, and when they
> > are equal (memcmp() == 0) we don't have to call _bt_compare - the
> > HIKEY is known to be larger than the scan key, because our key is
> > smaller than the right separator, and thus transitively also smaller
> > than the HIKEY because it contains the same data. As _bt_compare can
> > call expensive user-provided functions, this can be a large
> > performance boon, especially when there are only a small number of
> > column getting compared on each page (e.g. index tuples of many 100s
> > of bytes, or dynamic prefix truncation is enabled).
>
> What would be the worst case scenario for this? One situation where the
> memcmp() would not match is when there is a concurrent page split. I
> think it's OK to pessimize that case. Are there any other situations?

There is also concurrent page deletion which can cause downlinked
pages to get removed from the set of accessible pages, but that's
quite rare, too: arguably even more rare than page splits.

> When the memcmp() matches, I think this is almost certainly not slower
> than calling the datatype's comparison function.
>
> > if (offnum < PageGetMaxOffsetNumber(page))
> > [...]
> > else if (!P_RIGHTMOST(opaque))
> > [...]
> > }
>
> This could use a one-line comment above this, something like "Remember
> the right separator of the downlink we follow, to speed up the next
> _bt_moveright call".

Done.

> Should there be an "else rightsep = NULL;" here? Is it possible that we
> follow the non-rightmost downlink on a higher level and rightmost
> downlink on next level? Concurrent page deletion?

While possible, the worst this could do is be less efficient in those
fringe cases: The cached right separator is a key that is known to
compare larger than the search key and thus always correct to use as
an optimization for "is this HIKEY larger than my search key", as long
as we don't clobber the data in that cache (which we don't).
Null-ing the argument, while not incorrect, could be argued to be
worse than useless here, as the only case where NULL may match an
actual highkey is on the rightmost page, which we already special-case
in _bt_moveright before hitting the 'compare the highkey' code.
Removal of the value would thus remove any chance of using the
optimization after hitting the rightmost page in a layer below.

I've added a comment to explain this in an empty else block in the
attached version 2 of the patch.

> Please update the comment above _bt_moveright to describe the new
> argument. Perhaps the text from README should go there, this feels like
> a detail specific to _bt_search and _bt_moveright.

Done.

Thank you for the review.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Attachment Content-Type Size
v2-0001-btree-optimize-_bt_moveright-using-downlink-s-rig.patch application/octet-stream 9.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-02-22 13:34:42 Re: Documentation to upgrade logical replication cluster
Previous Message Давыдов Виталий 2024-02-22 13:29:43 Slow catchup of 2PC (twophase) transactions on replica in LR