Re: pgsql: Optimize btree insertions for common case of increasing values

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-committers <pgsql-committers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Optimize btree insertions for common case of increasing values
Date: 2018-03-29 14:59:45
Message-ID: CANP8+jLv-yNmnPeMQ-zuvQzbMcR3sLVyDQWvbzjjZM4nZ3YH+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers

On 29 March 2018 at 00:09, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Wed, Mar 28, 2018 at 11:56 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>> Previously, we agreed that P_IGNORE() is required. So I assume no issues
>>> there. The other tests seem required too for us to conclusively decide to
>>> use the cached block.
>>
>> Actually, you're right that P_INCOMPLETE_SPLIT() is the only redundant
>> one.
>
> Another issue that I have with the main test of the
> RelationSetTargetBlock() page is that nobody explains *why* we check
> that we have enough space on the page to proceed. Why would it not be
> okay if there was a page split?

...because we need the Stack to bubble up the parent insert.

> Although it's subtle, I'm pretty confident that it actually would be
> okay from a correctness point of view to allow an insertion to go
> ahead that would result in a page split -- it's possible to pass a
> NULL stack to _bt_findinsertloc() and _bt_insertonpg() while still
> getting correct behavior.
>
> * In the case of _bt_findinsertloc(), it's okay to have a NULL stack
> because you'll never take the P_INCOMPLETE_SPLIT() path for a
> rightmost page, as already explained, so the stack cannot possibly be
> used (besides, even _bt_finish_split() can deal with a NULL stack). It
> is not just an accident that it works, because _bt_search() will
> sometimes return a NULL stack when writing -- it does so when there is
> only one non-meta page, which is both a root and a leaf page. Your
> optimization is a bit similar to that, in the sense that the
> cached/target block contains a leaf page that is rightmost (root pages
> are always rightmost, and are leaf pages in the case described).
>
> * In the case of _bt_insertonpg(), it's okay to have a NULL stack
> because there is a codepath that allows _bt_insert_parent() (which is
> the only thing in _bt_insertonpg() that touches the stack) to work
> without a stack during crash recovery. That path is quite a bit
> slower, though.

It's a linear scan, so quite clearly not going to perform well on big indexes.

Why would that be worth pursuing?

> Suggested next steps to deal with this:
>
> * A minor point: I don't think you should call
> RelationSetTargetBlock() when the page P_ISROOT(), which, as I
> mentioned, is a condition that can coexist with P_ISLEAF() with very
> small B-Trees. There can be no point in doing so. No need to add
> P_ISROOT() to the main "Is cached page stale?" test within
> _bt_doinsert(), though.

True. A better test would be to not use the optimization at all on
smaller btrees by checking the level is >= 3.

> * An assertion would make me feel a lot better about depending on not
> having a page split from a significant distance.
>
> Your optimization should continue to not be used when it would result
> in a page split, but only because that would be slow. The comments
> should say so, IMV. Also, _bt_insertonpg() should have an assertion
> against a page split actually occurring when the optimization was
> used, just in case. When !BufferIsValid(cbuf), we know that we're
> being called from _bt_doinsert() (see existing assertion at the top of
> _bt_insertonpg() as an example of this), so at the point where it's
> clear a page split is needed, we should assert that there is no target
> block that we must have been passed as the target page.

This is the same issues as the NULL stack. The page split would need
to insert into parent.

> * The indentation of the main _bt_doinsert() test does not follow
> project guidelines. Please tweak that, too.
>
> That's all I have for now. I might think of something else tomorrow,
> since I haven't spent that much time on this. It would be nice to get
> a second opinion on some of this, if at all possible.

I don't see any need to change any of these things. This is a tuning
patch and none of the above affects correctness of what has been
committed.

If you can find non-performant cases that require further tuning, we
can submit another patch at a later time.

And then we can cross the bridge of checking whether what is said
above is correct or not.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message David Steele 2018-03-29 15:02:32 Re: pgsql: Add documentation for the JIT feature.
Previous Message Teodor Sigaev 2018-03-29 14:57:25 Re: pgsql: Add casts from jsonb