Re: Faster inserts with mostly-monotonically increasing values

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(dot)riggs(at)2ndquadrant(dot)com>
Subject: Re: Faster inserts with mostly-monotonically increasing values
Date: 2018-03-06 00:52:11
Message-ID: CAH2-WznvyfskFXYP8+kgW9q9p4XvQDkeoup+=J69AviiqOWfag@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 5, 2018 at 4:37 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> Correct me if I'm wrong, but there's lots of code in nbtree already
> that risks reading recycled pages for various reasons. Presumably,
> checking P_ISDELETED and P_ISHALFDEAD should be enough, which is what
> !P_IGNORE implies.

You're mistaken. Read the nbtree README on page deletion, and the
RecentGlobalXmin interlock. Note also that there are 3 distinct page
states in play as far as deletion/recycling is concerned:

1. The first phase of deletion.
2. The second phase of deletion.
3. Post-recycling, when in general the page could look like just about anything.

It's page deletion's job to make sure that an index scan cannot land
on a page after it was recycled. The corollary is that it is not
everyone else's job to make sure that that didn't happen -- how could
that possibly work?

Quite a few places know a bit about and must reason about the first
phase of deletion, and a few know about the second, but that's it.
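
To make that concrete, here's a small standalone C sketch (my own toy
model, not PostgreSQL code -- only the flag names and the
P_IGNORE-style test are modeled on nbtree.h, everything else is made
up for illustration). It shows why a flag test only covers the two
deletion phases: a recycled page has had its special space rewritten
by whoever reused it, so the old flags are simply gone and there is
nothing left to test.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Flag bits named after BTP_HALF_DEAD / BTP_DELETED in nbtree.h
 * (the bit values here are just for this sketch) */
#define BTP_DELETED   (1 << 2)   /* phase 2: page unlinked from the tree */
#define BTP_HALF_DEAD (1 << 4)   /* phase 1: empty, but still linked in */

typedef struct BTOpaqueSketch
{
    uint16_t    btpo_flags;
} BTOpaqueSketch;

/* Same shape of test as !P_IGNORE(opaque): rejects pages that are in
 * deletion phase 1 or phase 2 -- and nothing more */
static bool
page_is_live(const BTOpaqueSketch *opaque)
{
    return (opaque->btpo_flags & (BTP_HALF_DEAD | BTP_DELETED)) == 0;
}

int
main(void)
{
    BTOpaqueSketch half_dead = {.btpo_flags = BTP_HALF_DEAD};  /* state 1 */
    BTOpaqueSketch deleted   = {.btpo_flags = BTP_DELETED};    /* state 2 */

    /*
     * State 3 (recycled): the page has been reused, so its special area
     * was rewritten by whoever reused it.  The old flags no longer exist,
     * so the flag test can return anything -- which is why page deletion
     * itself (via the RecentGlobalXmin interlock) has to guarantee that
     * no scan can land here in the first place.
     */
    BTOpaqueSketch recycled = {.btpo_flags = 0};

    printf("half-dead live? %d\n", page_is_live(&half_dead));  /* 0 */
    printf("deleted   live? %d\n", page_is_live(&deleted));    /* 0 */
    printf("recycled  live? %d\n", page_is_live(&recycled));   /* 1 -- oops */
    return 0;
}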

>>> You could allow for some slack, by comparing against the leftmost key
>>> instead. You just need to know whether the key fits in the page. Then,
>>> the insert can proceed as usual, doing the binsearch to find the
>>> proper insertion point, etc.
>>
>> The whole way that this patch opts out of using an exclusive buffer
>> lock on the "first page the value could be on" (the _bt_check_unique()
>> + _bt_findinsertloc() stuff) makes me uneasy. I know that that isn't
>> terribly concrete.
>
> Assuming the rightmost page is the first page the value could be on,
> it already does get an exclusive buffer lock.

This can be quite fiddly. The downlink in the parent can be equal to
the value in the target page, meaning that we actually lock the
target's left sibling (see _bt_binsrch() special case for internal
pages).
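
For anyone who hasn't looked at that special case recently, here's a
toy standalone sketch of the convention it implements (my own
simplification, not the real _bt_binsrch()): on an internal page the
search picks the last downlink whose key is strictly less than the
scan key, so a downlink key equal to the value being inserted routes
the descent one child to the left.

#include <stdio.h>

/*
 * Toy version of the internal-page convention: pick the last downlink
 * whose key is < the scan key (slot 0 acts as "minus infinity", so
 * there is always such a downlink).  Leaf-level search, by contrast,
 * looks for the first key >= the scan key.
 */
static int
choose_downlink(const int *keys, int nkeys, int scankey)
{
    int     result = 0;     /* slot 0 is the "minus infinity" downlink */

    for (int i = 1; i < nkeys; i++)
    {
        if (keys[i] < scankey)  /* strictly less: equal keys do NOT qualify */
            result = i;
    }
    return result;
}

int
main(void)
{
    /* Downlink separator keys of a toy internal page */
    int     keys[] = {0 /* -inf */, 100, 200, 300};

    /*
     * Inserting 300: the downlink keyed 300 does not satisfy "< 300",
     * so we descend via the 200 downlink -- i.e. to the left sibling of
     * the page whose downlink equals the value, and that left sibling
     * is the page the insert locks first.
     */
    printf("descend via downlink %d\n", choose_downlink(keys, 4, 300)); /* 2 */
    return 0;
}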

I didn't say that the patch is necessarily wrong here. Just that it
makes me nervous.

> If one wanted to relax the condition as I proposed, that uniqueness
> check is necessary, so that's why I propose shortcircuiting the search
> only, and not the actual insertion on the page. I believe IMO it's
> more important to try and maximize the conditions under which the
> optimization can be applied.

I didn't get the point of checking the first item on the page (your
proposal).

--
Peter Geoghegan
