Re: pgsql: Optimize btree insertions for common case of increasing values

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-committers <pgsql-committers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Optimize btree insertions for common case of increasing values
Date: 2018-03-28 23:09:41
Message-ID: CAH2-Wzmh_sxhXYfqhpkD6t+C3K57TRDOMXKdG-NQ6qvjnuW37g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

On Wed, Mar 28, 2018 at 11:56 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> Previously, we agreed that P_IGNORE() is required. So I assume no issues
>> there. The other tests seem required too for us to conclusively decide to
>> use the cached block.
>
> Actually, you're right that P_INCOMPLETE_SPLIT() is the only redundant
> one.

Another issue that I have with the main test of the
RelationSetTargetBlock() page is that nobody explains *why* we check
that we have enough space on the page to proceed. Why would it not be
okay if there was a page split?

Although it's subtle, I'm pretty confident that it actually would be
okay from a correctness point of view to allow an insertion to go
ahead that would result in a page split -- it's possible to pass a
NULL stack to _bt_findinsertloc() and _bt_insertonpg() while still
getting correct behavior.

* In the case of _bt_findinsertloc(), it's okay to have a NULL stack
because you'll never take the P_INCOMPLETE_SPLIT() path for a
rightmost page, as already explained, so the stack cannot possibly be
used (besides, even _bt_finish_split() can deal with a NULL stack). It
is not just an accident that it works, because _bt_search() will
sometimes return a NULL stack when writing -- it does so when there is
only one non-meta page, which is both a root and a leaf page. Your
optimization is a bit similar to that, in the sense that the
cached/target block contains a leaf page that is rightmost (root pages
are always rightmost, and are leaf pages in the case described).

* In the case of _bt_insertonpg(), it's okay to have a NULL stack
because there is a codepath that allows _bt_insert_parent() (which is
the only thing in _bt_insertonpg() that touches the stack) to work
without a stack during crash recovery. That path is quite a bit
slower, though.

Suggested next steps to deal with this:

* A minor point: I don't think you should call
RelationSetTargetBlock() when the page P_ISROOT(), which, as I
mentioned, is a condition that can coexist with P_ISLEAF() with very
small B-Trees. There can be no point in doing so. No need to add
P_ISROOT() to the main "Is cached page stale?" test within
_bt_doinsert(), though.

* An assertion would make me feel a lot better about depending on not
having a page split from a significant distance.

Your optimization should continue to not be used when it would result
in a page split, but only because that would be slow. The comments
should say so, IMV. Also, _bt_insertonpg() should have an assertion
against a page split actually occurring when the optimization was
used, just in case. When !BufferIsValid(cbuf), we know that we're
being called from _bt_doinsert() (see existing assertion at the top of
_bt_insertonpg() as an example of this), so at the point where it's
clear a page split is needed, we should assert that there is no target
block that we must have been passed as the target page.

* The indentation of the main _bt_doinsert() test does not follow
project guidelines. Please tweak that, too.

That's all I have for now. I might think of something else tomorrow,
since I haven't spent that much time on this. It would be nice to get
a second opinion on some of this, if at all possible.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Eisentraut 2018-03-28 23:15:28 pgsql: Allow committing inside cursor loop
Previous Message Simon Riggs 2018-03-28 22:50:57 Re: pgsql: Add documentation for the JIT feature.