Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2019-01-28 15:31:56
Message-ID: 7010f1ac-fdb0-52b1-7c53-8771d1dc4673@iki.fi
Lists: pgsql-hackers

On 09/01/2019 02:47, Peter Geoghegan wrote:
> On Fri, Dec 28, 2018 at 3:32 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> On Fri, Dec 28, 2018 at 3:20 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>>> I'm envisioning that you have an array, with one element for each item
>>> on the page (including the tuple we're inserting, which isn't really on
>>> the page yet). In the first pass, you count up from left to right,
>>> filling the array. Next, you compute the complete penalties, starting
>>> from the middle, walking outwards.
>
>> Ah, right. I think I see what you mean now.
>
>> Leave it with me. I'll need to think about this some more.
>
> Attached is v10 of the patch series, which has many changes based on
> your feedback. However, I didn't end up refactoring _bt_findsplitloc()
> in the way you described, because it seemed hard to balance all of the
> concerns there. I still have an open mind on this question, and
> recognize the merit in what you suggested. Perhaps it's possible to
> reach a compromise here.

I spent some time first trying to understand the current algorithm, and
then rewriting it in a way that I find easier to understand. I came up
with the attached. I think it optimizes for the same goals as your
patch, but the approach is quite different. At a very high level, I
believe the goals can be described as:

1. Find out how much suffix truncation is possible, i.e. how many key
columns can be truncated away, in the best case, among all possible ways
to split the page.

2. Among all the splits that achieve that optimum suffix truncation,
find the one with smallest "delta".
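In grossly simplified form, optimizing for those two goals directly looks something like the brute-force sketch below. Fixed-size tuples with two integer key columns stand in for real index tuples, and every name here is hypothetical; this is not the actual nbtree code, just a toy model of the (penalty, delta) lexicographic comparison:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an index tuple: two integer key columns. */
typedef struct
{
	int			key[2];
} Tuple;

/*
 * Number of key columns that must be kept in the new high key to
 * distinguish lastleft from firstright.  Smaller is better: more
 * trailing columns can be suffix-truncated away.  Fully equal keys
 * would need the heap TID as a tiebreaker, modeled as penalty 3.
 */
static int
split_penalty(const Tuple *lastleft, const Tuple *firstright)
{
	for (int i = 0; i < 2; i++)
	{
		if (lastleft->key[i] != firstright->key[i])
			return i + 1;
	}
	return 3;
}

/*
 * Free-space imbalance if we split before item 'split'.  Simplified:
 * with fixed-size tuples this is just |left count - right count|.
 */
static int
split_delta(int split, int nitems)
{
	int			d = (2 * split) - nitems;

	return d < 0 ? -d : d;
}

/*
 * Goal 1, then goal 2: among all splits achieving the minimal
 * penalty (best truncation), pick the one with the smallest delta.
 */
static int
find_best_split(const Tuple *items, int nitems)
{
	int			bestsplit = 1;
	int			bestpenalty = 4;	/* worse than any real penalty */
	int			bestdelta = nitems + 1;

	for (int split = 1; split < nitems; split++)
	{
		int			p = split_penalty(&items[split - 1], &items[split]);
		int			d = split_delta(split, nitems);

		if (p < bestpenalty || (p == bestpenalty && d < bestdelta))
		{
			bestsplit = split;
			bestpenalty = p;
			bestdelta = d;
		}
	}
	return bestsplit;
}
```

With items (1,1) (1,2) (1,3) (2,1) (2,2) (2,3), the only split that truncates down to one key column is between (1,3) and (2,1), and it also happens to have delta 0, so it wins on both counts.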

For performance reasons, it doesn't actually do it in that order. It's
more like this:

1. First, scan all split positions, recording the 'leftfree' and
'rightfree' at every valid split position. The array of possible splits
is kept in order by offset number. (This scans through all items, but
the math is simple, so it's pretty fast)

2. Compute the optimum suffix truncation, by comparing the leftmost and
rightmost keys, among all the possible split positions.

3. Split the array of possible splits in half, and process both halves
recursively. The recursive process "zooms in" to the place where we'd
expect to find the best candidate, but will ultimately scan through all
split candidates, if no "good enough" match is found.
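The recursive search in step 3 might be sketched like this (again a toy model: each candidate carries a precomputed penalty and delta, "good enough" is an explicit threshold, and all names are hypothetical rather than taken from the patch). Because the candidate array is ordered by offset number, checking the middle element first naturally zooms in on the low-delta region, while the recursion into both halves guarantees that every candidate is eventually visited if no early exit fires:

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
	int			offset;		/* split position, for identification */
	int			penalty;	/* key columns needed in new high key */
	int			delta;		/* free-space imbalance */
} SplitCand;

/*
 * Examine the middle candidate first; accept it immediately if it is
 * "good enough" (achieves the optimum truncation penalty and a small
 * enough delta).  Otherwise recurse into both halves, keeping the
 * best candidate seen so far, so the worst case degenerates to a
 * full scan of all candidates.
 */
static const SplitCand *
search_splits(const SplitCand *cands, int n,
			  int optpenalty, int maxdelta,
			  const SplitCand *best)
{
	const SplitCand *c;
	int			mid;

	if (n <= 0)
		return best;

	mid = n / 2;
	c = &cands[mid];
	if (best == NULL || c->penalty < best->penalty ||
		(c->penalty == best->penalty && c->delta < best->delta))
		best = c;

	/* "good enough" match: stop early */
	if (best->penalty == optpenalty && best->delta <= maxdelta)
		return best;

	best = search_splits(cands, mid, optpenalty, maxdelta, best);
	if (best->penalty == optpenalty && best->delta <= maxdelta)
		return best;

	return search_splits(cands + mid + 1, n - mid - 1,
						 optpenalty, maxdelta, best);
}
```

When the middle candidate already achieves the optimum penalty with a tolerable delta, only one comparison is made; when nothing qualifies, the recursion still returns the global best by exhaustion.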

I've only been testing this on leaf splits. I didn't understand how the
penalty worked for internal pages in your patch. In this version, the
same algorithm is used for leaf and internal pages. I'm sure this still
has bugs in it, and could use some polishing, but I think this will be
a more readable way of doing it.

What have you been using to test this? I wrote the attached little test
extension, to explore what _bt_findsplitloc() decides in different
scenarios. It's pretty rough, but that's what I've been using while
hacking on this. It prints output like this:

postgres=# select test_split();
NOTICE: test 1:
left 2/358: 1 0
left 358/358: 1 356
right 1/ 51: 1 357
right 51/ 51: 1 407 SPLIT TUPLE
split ratio: 12/87

NOTICE: test 2:
left 2/358: 0 0
left 358/358: 356 356
right 1/ 51: 357 357
right 51/ 51: 407 407 SPLIT TUPLE
split ratio: 12/87

NOTICE: test 3:
left 2/358: 0 0
left 320/358: 10 10 SPLIT TUPLE
left 358/358: 48 48
right 1/ 51: 49 49
right 51/ 51: 99 99
split ratio: 12/87

NOTICE: test 4:
left 2/309: 1 100
left 309/309: 1 407 SPLIT TUPLE
right 1/100: 2 0
right 100/100: 2 99
split ratio: 24/75

Each test consists of creating a temp table with one index, and
inserting rows in a certain pattern, until the root page splits. It then
prints the first and last tuples on both pages, after the split, as well
as the tuple that caused the split. I don't know if this is useful to
anyone but myself, but I thought I'd share it.

- Heikki

Attachment Content-Type Size
nbtsplitloc.c text/x-csrc 19.4 KB
btree-split-test.tar.gz application/gzip 3.0 KB
