Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2019-03-16 21:01:48
Message-ID: CAH2-WzkN5eT1g8GOfejEf5V5ZvrTX5s9ZJ7hjLVTgjRTWqn7wQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 16, 2019 at 1:47 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I agree that it's always true that the high key is also in the parent,
> and we could cross-check that within the child. Actually, I should
> probably figure out a way of arranging for the Bloom filter used
> within bt_relocate_from_root() (which has been around since PG v11) to
> include the key itself when fingerprinting. That would probably
> necessitate that we don't truncate "negative infinity" items (it was
> actually that way about 18 years ago), just for the benefit of
> verification.

Clarification: You'd fingerprint an entire pivot tuple -- key, block
number, everything. Then, you'd probe the Bloom filter for the high
key one level down, with the downlink block in the high key set to
point to the current sibling on the same level (the child level). As I
said, I think that the only reason that that cannot be done at present
is because of the micro-optimization of truncating the first item on
the right page to zero attributes during an internal page split. (We
can retain the key without getting rid of the hard-coded logic for
negative infinity within _bt_compare()).

bt_relocate_from_root() already has smarts around interrupted page
splits (with the incomplete split bit set).

Finally, you'd also make bt_downlink_check follow every downlink, not
all-but-one downlink (no more excuse for leaving out the first one if
we don't truncate during internal page splits).

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Li, Zheng 2019-03-16 21:07:04 Re: NOT IN subquery optimization
Previous Message Chapman Flack 2019-03-16 20:55:44 Re: Fix XML handling with DOCTYPE