Re: Improving btree performance through specializing by key shape, take 2

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, David Christensen <david(at)pgguru(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improving btree performance through specializing by key shape, take 2
Date: 2023-06-23 14:46:02
Message-ID: CAEze2WgJM2yxJhFQ9LidmefGTO_BbgXESMdF1oXO9PFoEWHvxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 23 Jun 2023 at 11:26, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Jun 23, 2023 at 2:21 AM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> >
>
> > == Dynamic prefix truncation (0001)
> > The code now tracks how many prefix attributes of the scan key are
> > already considered equal based on earlier binsrch results, and ignores
> > those prefix colums in further binsrch operations (sorted list; if
> > both the high and low value of your range have the same prefix, the
> > middle value will have that prefix, too). This reduces the number of
> > calls into opclass-supplied (dynamic) compare functions, and thus
> > increase performance for multi-key-attribute indexes where shared
> > prefixes are common (e.g. index on (customer, order_id)).
>
> I think the idea looks good to me.
>
> I was looking into the 0001 patches,

Thanks for reviewing.

> and I have one confusion in the
> below hunk in the _bt_moveright function, basically, if the parent
> page's right key is exactly matching the HIGH key of the child key
> then I suppose while doing the "_bt_compare" with the HIGH_KEY we can
> use the optimization right, i.e. column number from where we need to
> start the comparison should be used what is passed by the caller. But
> in the below hunk, you are always passing that as 'cmpcol' which is 1.
> I think this should be '*comparecol' because '*comparecol' will either
> hold the value passed by the parent if high key data exactly match
> with the parent's right tuple or it will hold 1 in case it doesn't
> match. Am I missing something?

We can't carry _bt_compare prefix results across pages, because the
key range of a page may shift while we're not holding a lock on that
page. That's also why the code resets the prefix to 1 every time it
accesses a new page ^1: it cannot guarantee correct results otherwise.
See also [0] and [1] for why that is important.

^1: When following downlinks, the code above your quoted code tries to
reuse the _bt_compare result of the parent page in the common case of
a child page's high key that is bytewise equal to the right separator
tuple of the parent page's downlink to this page. However, if it
detects that this child high key has changed (i.e. not 100% bytewise
equal), we can't reuse that result, and we'll have to re-establish all
prefix info on that page from scratch.
In any case, this only establishes the prefix for the right half of
the page's keyspace, the prefix of the left half of the data still
needs to be established separetely.

I hope this explains the reasons for why we can't reuse comparecol as
_bt_compare argument.

Kind regards,

Matthias van de Meent
Neon, Inc.

[0] https://www.postgresql.org/message-id/CAH2-Wzn_NAyK4pR0HRWO0StwHmxjP5qyu+X8vppt030XpqrO6w@mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dagfinn Ilmari Mannsåker 2023-06-23 15:14:34 Re: Bytea PL/Perl transform
Previous Message James Coleman 2023-06-23 14:27:57 Stampede of the JIT compilers