Re: Faster inserts with mostly-monotonically increasing values

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>
Subject: Re: Faster inserts with mostly-monotonically increasing values
Date: 2018-03-06 00:37:46
Message-ID: CAGTBQpYFeJRYMO7VyB30yD=CTA3yBKWKCFBQeiC4BM_dYXuvog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 5, 2018 at 9:12 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Thu, Mar 1, 2018 at 3:18 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> The key there is strictly greater than the rightmost key. As such, it
>> must precede the first page with an equal key, and since it's the
>> rightmost page... there's no such key. But if there was, the check for
>> uniqueness only needs the insert point to point to the first such
>> page, and this guarantees it.
>
> This code assumes that it can use block number as a stable way of
> finding the same point in the B-Tree over time, without any interlock.
> That may actually work out in practice, since even if the page is
> recycled there can only ever be one rightmost leaf page, but it seems
> like a brittle approach to me. The page image may be recycled by the
> FSM already, and we really shouldn't be depending on the page not
> looking a particular way once that has happened. I guess that you can
> say the same thing about using page LSN to determine cached block
> staleness instead, but that still seems a lot safer.
>
> Basically, I'm worried that we could do something entirely
> unpredictable when a page has actually been recycled, since we're
> entirely opting out of the RecentGlobalXmin interlock business on the
> assumption that we can be sure that recycling hasn't occurred in some
> other way. Imagine what could happen if we ever teach the FSM to share
> freespace among small relations, which seems quite possible to me.

Correct me if I'm wrong, but there's lots of code in nbtree already
that risks reading recycled pages for various reasons. Presumably,
checking P_ISDELETED and P_ISHALFDEAD should be enough, which is what
!P_IGNORE implies.

>> You could allow for some slack, by comparing against the leftmost key
>> instead. You just need to know whether the key fits in the page. Then,
>> the insert can proceed as usual, doing the binsearch to find the
>> proper insertion point, etc.
>
> The whole way that this patch opts out of using an exclusive buffer
> lock on the "first page the value could be on" (the _bt_check_unique()
> + _bt_findinsertloc() stuff) makes me uneasy. I know that that isn't
> terribly concrete.

Assuming the rightmost page is the first page the value could be on,
it already does get an exclusive buffer lock.

And it seems that assumption is correct in the patch as-is. In fact,
the current patch checks for a much stronger situation than needed to
apply this optimization, so it can even skip checking for conflicting
concurrent transactions. With an exclusive lock on the buffer, and
being the rightmost page, it means there can be no conflicting key
since the check is that the scan key is strictly greater than the
rightmost key (lets forget for a moment empty rightmost pages). Unless
there can be a new rightmost page in construction somehow (which I
don't believe it can happen, new rightmost pages would be created by
splitting the rightmost page, and that would conflict with the
exclusive lock), I don't see how this can fail.

If one wanted to relax the condition as I proposed, that uniqueness
check is necessary, so that's why I propose shortcircuiting the search
only, and not the actual insertion on the page. I believe IMO it's
more important to try and maximize the conditions under which the
optimization can be applied.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-03-06 00:47:47 Re: Cache lookup errors with functions manipulation object addresses
Previous Message Andres Freund 2018-03-06 00:34:09 Re: [COMMITTERS] pgsql: Fix inadequate locking during get_rel_oids().