Re: Faster inserts with mostly-monotonically increasing values

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>
Subject: Re: Faster inserts with mostly-monotonically increasing values
Date: 2018-03-06 01:59:52
Message-ID: CAH2-Wz=5eZr0+h8wZozpHEJkoYS-T05qW_Z00JWRh3md3ofGCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 5, 2018 at 5:48 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> From what I read, both phase 1 & 2 are served by the !P_IGNORE check.

True.

> For the third phase:
>
>> A deleted page can only be reclaimed once there is no scan or search that
>> has a reference to it; until then, it must stay in place with its
>> right-link undisturbed.
>
> That's because:
>
>> A deleted page cannot be reclaimed immediately, since there may be other
>> processes waiting to reference it (ie, search processes that just left the
>> parent, or scans moving right or left from one of the siblings). These
>> processes must observe that the page is marked dead and recover
>> accordingly. Searches and forward scans simply follow the right-link
>> until they find a non-dead page --- this will be where the deleted page's
>> key-space moved to.
>
> But in thise case there's no right link to follow, so it's a non-issue.
>
> BTree doesn't truncate AFAIK, so the cached block number can't be
> nonexisting (beyond the end of the relation). It's probably fragile to
> assume BTree will never truncate, so maybe it would be wise to check
> for that. But if the block exists, I believe it contains a page that
> can be safely interpreted by that condition. As long as there can be
> no other pages with the same flags on the index while the cached page
> is write-locked, that code will be safe.

Introducing any case that allows us to land on a recycled page, and
reason that it must at least not be the page we were looking for based
on *any* criteria about the page itself seems very brittle. Yeah, it
probably won't break in practice, but it's a bad design.

> Yes, if the scankey is equal to the leftmost key, the first occurrence
> of that key could be on the left, so the check would have to be for
> strictly greater. But that's about as complex as it would get.

That's pretty complicated.

>> I didn't say that the patch is necessarily wrong here. Just that it
>> makes me nervous.
>
> Any change to btree would make anyone nervous ;-)

True. Anyway, this isn't the thing that I'm most concerned about right now.

>> I didn't get what the point of checking the first item on the page
>> (your proposal) was.
>
> Right now, if you've got concurrent transactions, there's absolutely
> no chance this optimization would kick in. For it to happen,
> insertions on the index need to happen in order, each insertion a key
> greater than any other key (visible or not) on the index.
>
> Concurrent inserts on PKs are unlikely to arrive in order, a slight
> disorder is to be expected as serial sequence numbers get generated a
> significant amount of time before the insert actually takes place.
> Different backends can get to the index insertion at different times,
> causing unordered inserts.
>
> I believe PKs are a prime candidate for this optimization, and
> expecting it to apply only when no concurrency is involved is severely
> dumbing down the optimization.

Pavan justified the patch using a benchmark that only involved a
single client -- hardly typical for a patch that changes the B-Tree
code. If the benefits with many clients can be shown to matter, that
will make this much more interesting to me.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-03-06 02:08:53 Re: pgsql: Clone extended stats in CREATE TABLE (LIKE INCLUDING ALL)
Previous Message Andres Freund 2018-03-06 01:58:54 Re: BUG #14941: Vacuum crashes