Quick Links

Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From:	Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc:	Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Subject:	Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date:	2018-09-20 04:56:37
Message-ID:	f6ccee8c-8f23-a3a6-16be-76349801472b@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

I use the v5 version in quick vacuum strategy and in the heap&index
cleaner (see new patches at corresponding thread a little bit later). It
works fine and give quick vacuum 2-3% performance growup in comparison
with version v3 at my 24-core test server.
Note, that the interface of _bt_moveright() and _bt_binsrch() functions
with combination of scankey, scantid and nextkey parameters is too
semantic loaded.
Everytime of code reading i spend time to remember, what this functions
do exactly.
May be it needed to rewrite comments. For example, _bt_moveright()
comments may include phrase:
nextkey=false: Traverse to the next suitable index page if the current
page does not contain the value (scan key; scan tid).

What do you think about submitting the patch to the next CF?

12.09.2018 23:11, Peter Geoghegan пишет:
> Attached is v4. I have two goals in mind for this revision, goals that
> are of great significance to the project as a whole:
>
> * Making better choices around leaf page split points, in order to
> maximize suffix truncation and thereby maximize fan-out. This is
> important when there are mostly-distinct index tuples on each leaf
> page (i.e. most of the time). Maximizing the effectiveness of suffix
> truncation needs to be weighed against the existing/main
> consideration: evenly distributing space among each half of a page
> split. This is tricky.
>
> * Not regressing the logic that lets us pack leaf pages full when
> there are a great many logical duplicates. That is, I still want to
> get the behavior I described on the '"Write amplification" is made
> worse by "getting tired" while inserting into nbtree secondary
> indexes' thread [1]. This is not something that happens as a
> consequence of thinking about suffix truncation specifically, and
> seems like a fairly distinct thing to me. It's actually a bit similar
> to the rightmost 90/10 page split case.
>
> v4 adds significant new logic to make us do better on the first goal,
> without hurting the second goal. It's easy to regress one while
> focussing on the other, so I've leaned on a custom test suite
> throughout development. Previous versions mostly got the first goal
> wrong, but got the second goal right. For the time being, I'm
> focussing on index size, on the assumption that I'll be able to
> demonstrate a nice improvement in throughput or latency later. I can
> get the main TPC-C order_line pkey about 7% smaller after an initial
> bulk load with the new logic added to get the first goal (note that
> the benefits with a fresh CREATE INDEX are close to zero). The index
> is significantly smaller, even though the internal page index tuples
> can themselves never be any smaller due to alignment -- this is all
> about not restricting what can go on each leaf page by too much. 7% is
> not as dramatic as the "get tired" case, which saw something like a
> 50% decrease in bloat for one pathological case, but it's still
> clearly well worth having. The order_line primary key is the largest
> TPC-C index, and I'm merely doing a standard bulk load to get this 7%
> shrinkage. The TPC-C order_line primary key happens to be kind of
> adversarial or pathological to B-Tree space management in general, but
> it's still fairly realistic.
>
> For the first goal, page splits now weigh what I've called the
> "distance" between tuples, with a view to getting the most
> discriminating split point -- the leaf split point that maximizes the
> effectiveness of suffix truncation, within a range of acceptable split
> points (acceptable from the point of view of not implying a lopsided
> page split). This is based on probing IndexTuple contents naively when
> deciding on a split point, without regard for the underlying
> opclass/types. We mostly just use char integer comparisons to probe,
> on the assumption that that's a good enough proxy for using real
> insertion scankey comparisons (only actual truncation goes to those
> lengths, since that's a strict matter of correctness). This distance
> business might be considered a bit iffy by some, so I want to get
> early feedback. This new "distance" code clearly needs more work, but
> I felt that I'd gone too long without posting a new version.
>
> For the second goal, I've added a new macro that can be enabled for
> debugging purposes. This has the implementation sort heap TIDs in ASC
> order, rather than DESC order. This nicely demonstrates how my two
> goals for v4 are fairly independent; uncommenting "#define
> BTREE_ASC_HEAP_TID" will cause a huge regression with cases where many
> duplicates are inserted, but won't regress things like the TPC-C
> indexes. (Note that BTREE_ASC_HEAP_TID will break the regression
> tests, though in a benign way can safely be ignored.)
>
> Open items:
>
> * Do more traditional benchmarking.
>
> * Add pg_upgrade support.
>
> * Simplify _bt_findsplitloc() logic.
>
> [1] https://postgr.es/m/CAH2-Wzmf0fvVhU+SSZpGW4Qe9t--j_DmXdX3it5JcdB8FF2EsA@mail.gmail.com
>

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company

In response to

Re: Making all nbtree entries unique by having heap TIDs participate in comparisons at 2018-09-12 18:11:18 from Peter Geoghegan

Responses

Re: Making all nbtree entries unique by having heap TIDs participate in comparisons at 2018-09-20 21:58:02 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Chris Travers	2018-09-20 05:12:22	Re: Code of Conduct plan
Previous Message	Tom Lane	2018-09-20 04:44:30	Re: vary read_only in SPI calls? or poke at the on-entry snapshot?