Re: Improving btree performance through specializing by key shape, take 2

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, David Christensen <david(at)pgguru(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improving btree performance through specializing by key shape, take 2
Date: 2023-06-27 04:57:12
Message-ID: CAFiTN-t1jfttH7MEZ8JsqfE7mguq8gg07hS0o3EHA2zAFCWBsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 27, 2023 at 9:42 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Jun 23, 2023 at 8:16 PM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> >
> > On Fri, 23 Jun 2023 at 11:26, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> >
> > > and I have one confusion in the
> > > below hunk in the _bt_moveright function, basically, if the parent
> > > page's right key is exactly matching the HIGH key of the child key
> > > then I suppose while doing the "_bt_compare" with the HIGH_KEY we can
> > > use the optimization right, i.e. column number from where we need to
> > > start the comparison should be used what is passed by the caller. But
> > > in the below hunk, you are always passing that as 'cmpcol' which is 1.
> > > I think this should be '*comparecol' because '*comparecol' will either
> > > hold the value passed by the parent if high key data exactly match
> > > with the parent's right tuple or it will hold 1 in case it doesn't
> > > match. Am I missing something?
> >
> > We can't carry _bt_compare prefix results across pages, because the
> > key range of a page may shift while we're not holding a lock on that
> > page. That's also why the code resets the prefix to 1 every time it
> > accesses a new page ^1: it cannot guarantee correct results otherwise.
> > See also [0] and [1] for why that is important.
>
> Yeah that makes sense
>
> > ^1: When following downlinks, the code above your quoted code tries to
> > reuse the _bt_compare result of the parent page in the common case of
> > a child page's high key that is bytewise equal to the right separator
> > tuple of the parent page's downlink to this page. However, if it
> > detects that this child high key has changed (i.e. not 100% bytewise
> > equal), we can't reuse that result, and we'll have to re-establish all
> > prefix info on that page from scratch.
> > In any case, this only establishes the prefix for the right half of
> > the page's keyspace, the prefix of the left half of the data still
> > needs to be established separetely.
> >
> > I hope this explains the reasons for why we can't reuse comparecol as
> > _bt_compare argument.
>
> Yeah got it, thanks for explaining this. Now I see you have explained
> this in comments as well above the memcmp() statement.

At high level 0001 looks fine to me, just some suggestions

1.
+Notes about dynamic prefix truncation
+-------------------------------------

I feel instead of calling it "dynamic prefix truncation" should we can
call it "dynamic prefix skipping", I mean we are not
really truncating anything right, we are just skipping those
attributes in comparison?

2.
I think we should add some comments in the _bt_binsrch() function
where we are having main logic around maintaining highcmpcol and
lowcmpcol.
I think the notes section explains that very clearly but adding some
comments here would be good and then reference to that section in the
README.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-06-27 05:23:44 Re: ReadRecentBuffer() doesn't scale well
Previous Message Peter Geoghegan 2023-06-27 04:53:12 Re: ReadRecentBuffer() doesn't scale well