Re: Improving btree performance through specializing by key shape, take 2

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, David Christensen <david(at)pgguru(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improving btree performance through specializing by key shape, take 2
Date: 2023-09-19 01:29:03
Message-ID: CAH2-WzmZLPUpejYBewd4Oz1JAPoJvrCTPmoumtqqEphUs86fcw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 18, 2023 at 8:57 AM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> > Rebased again to v13 to account for API changes in 9f060253 "Remove
> > some more "snapshot too old" vestiges."
>
> ... and now attached.

I see that this revised version is approximately as invasive as
earlier versions were - it has specializations for almost everything.
Do you really need to specialize basically all of nbtdedup.c, for
example?

In order for this patch series to have any hope of getting committed,
there needs to be significant work on limiting the amount of code
churn and the resulting object code size. There are various problems
that come from wholesale specializing all of this code. There are
distributed costs -- costs that won't necessarily be in evidence from
microbenchmarks.

It might be worth familiarizing yourself with bloaty, a tool for
profiling the size of binaries:

https://github.com/google/bloaty
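
For example, something along these lines (invocation details from
memory, so double-check against bloaty's docs; the binary paths are
just placeholders) will show where the object code growth comes from,
per compilation unit and per symbol, relative to an unpatched build:

    bloaty src/backend/postgres -- /path/to/unpatched/postgres
    bloaty -d compileunits,symbols src/backend/postgres -- /path/to/unpatched/postgres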

Is it actually sensible to tie dynamic prefix compression to
everything else here? Apparently there is a regression for certain
cases caused by that patch (the first one), which necessitates making
up the difference in later patches. But...isn't it also conceivable
that some completely different optimization could do that for us
instead? Why is there a regression from v13-0001-*? Can we just fix
the regression directly? And if not, why not?

I also have significant doubts about your scheme for avoiding
invalidation of a page's bounds by checking that its high key matches
the parent's separator. The subtle dynamic prefix compression race
condition that I was worried about was one caused by page deletion.
But page deletion doesn't change the high key at all (it does change
the deleted page's own high key, but that's hardly relevant). So how
could checking the high key possibly help?

Page deletion changes the pivot tuple in the parent page whose
downlink originally pointed to the concurrently deleted page, so that
it now points to the deleted page's original right sibling (the
sibling being the page that you need to worry about). This means that
the lower bound for the not-deleted right sibling page has changed
from under us, and that we lack any simple way of detecting that this
might have happened.
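
To make that concrete with made-up keys: suppose the parent originally
has pivot tuples (15, ->A), (30, ->B), (45, ->C), so that B covers
keys in [30, 45) and C covers keys >= 45. If B is concurrently
deleted, the pivot whose downlink pointed to B is changed to point to
C, and C's old pivot goes away:

    before:  ... (15, ->A) (30, ->B) (45, ->C) ...   C covers keys >= 45
    after:   ... (15, ->A) (30, ->C) ...             C covers keys >= 30

A concurrent insert of (say) 35 now lands on C, in key space that C
never covered when we last looked at the parent -- which is exactly
the case that a cached lower bound for C cannot account for.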

The race that I'm worried about is extremely narrow, because it
involves a page deletion and a concurrent insert into the key space
that was originally covered by the deleted page. It's extremely
unlikely to happen in the real world, but it's still a bug.

It's possible that it'd make sense to do a memcmp() of the high key
using a copy of a separator from the parent page. That at least seems
like it could be made safe. But I don't see what it has to do with
dynamic prefix compression. In any case there is a simpler way to
avoid the high key check for internal pages: do the _bt_binsrch
first, and only call _bt_moveright when the answer that _bt_binsrch
gave suggests that we might actually need to move right.
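
A minimal sketch of what I mean (simplified signatures, and
maybe_move_right() is a hypothetical stand-in for the usual moveright
handling, not an existing function):

static OffsetNumber
search_internal_page(Relation rel, BTScanInsert key, Buffer *bufp)
{
    /* binary search the internal page first */
    OffsetNumber offnum = _bt_binsrch(rel, key, *bufp);

    /*
     * Only when the search lands on the page's last pivot could the
     * key lie beyond this page's high key, so only then do we need to
     * compare against the high key and possibly move right.
     */
    if (offnum >= PageGetMaxOffsetNumber(BufferGetPage(*bufp)))
    {
        *bufp = maybe_move_right(rel, key, *bufp);
        offnum = _bt_binsrch(rel, key, *bufp);
    }

    return offnum;
}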

--
Peter Geoghegan
