Re: btree: downlink right separator/HIKEY optimization

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Christensen <david(at)pgguru(dot)net>
Subject: Re: btree: downlink right separator/HIKEY optimization
Date: 2023-12-05 07:43:20
Message-ID: ed1ad315-1a23-4f85-aa4b-c708e501df19@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/11/2023 00:08, Matthias van de Meent wrote:
> calling _bt_compare in _bt_moveright in many common cases. This patch,
> when stacked on top of the prefix truncation patch, improves INSERT
> performance by an additional 2-9%pt, with an extreme case of 45% in
> the worscase index tests at [0].
>
> The optimization is that we now recognze that our page split algorithm
> all but guarantees that the HIKEY matches this page's downlink's right
> separator key bytewise, excluding the data stored in the
> IndexTupleData struct.

Good observation.

> By caching the right separator index tuple in _bt_search, we can
> compare the downlink's right separator and the HIKEY, and when they
> are equal (memcmp() == 0) we don't have to call _bt_compare - the
> HIKEY is known to be larger than the scan key, because our key is
> smaller than the right separator, and thus transitively also smaller
> than the HIKEY because it contains the same data. As _bt_compare can
> call expensive user-provided functions, this can be a large
> performance boon, especially when there are only a small number of
> column getting compared on each page (e.g. index tuples of many 100s
> of bytes, or dynamic prefix truncation is enabled).

What would be the worst case scenario for this? One situation where the
memcmp() would not match is when there is a concurrent page split. I
think it's OK to pessimize that case. Are there any other situations?
When the memcmp() matches, I think this is almost certainly not slower
than calling the datatype's comparison function.

> if (offnum < PageGetMaxOffsetNumber(page))
> {
> ItemId rightsepitem = PageGetItemId(page, offnum + 1);
> IndexTuple pagerightsep = (IndexTuple) PageGetItem(page, rightsepitem);
> memcpy(rsepbuf.data, pagerightsep, ItemIdGetLength(rightsepitem));
> rightsep = &rsepbuf.tuple;
> }
> else if (!P_RIGHTMOST(opaque))
> {
> /*
> * The rightmost data tuple on inner page has P_HIKEY as its right
> * separator.
> */
> ItemId rightsepitem = PageGetItemId(page, P_HIKEY);
> IndexTuple pagerightsep = (IndexTuple) PageGetItem(page, rightsepitem);
> memcpy(rsepbuf.data, pagerightsep, ItemIdGetLength(rightsepitem));
> rightsep = &rsepbuf.tuple;
> }

This could use a one-line comment above this, something like "Remember
the right separator of the downlink we follow, to speed up the next
_bt_moveright call".

Should there be an "else rightsep = NULL;" here? Is it possible that we
follow the non-rightmost downlink on a higher level and rightmost
downlink on next level? Concurrent page deletion?

Please update the comment above _bt_moveright to describe the new
argument. Perhaps the text from README should go there, this feels like
a detail specific to _bt_search and _bt_moveright.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-12-05 07:56:04 Re: Test 002_pg_upgrade fails with olddump on Windows
Previous Message Shlok Kyal 2023-12-05 07:14:23 Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION