Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2019-03-16 08:43:00
Message-ID: c248f8a3-7054-3338-73ec-5cd5df403c8a@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16/03/2019 06:16, Peter Geoghegan wrote:
> On Thu, Mar 14, 2019 at 2:21 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> It doesn't matter how often it happens, the code still needs to deal
>> with it. So let's try to make it as readable as possible.
>
>> Well, IMHO holding the buffer and the bounds in the new struct is more
>> clean than the savebinsrc/restorebinsrch flags. That's exactly why I
>> suggested it. I don't know what else to suggest. I haven't done any
>> benchmarking, but I doubt there's any measurable difference.
>
> Fair enough. Attached is v17, which does it using the approach taken
> in your earlier prototype. I even came around to your view on
> _bt_binsrch_insert() -- I kept that part, too. Note, however, that I
> still pass checkingunique to _bt_findinsertloc(), because that's a
> distinct condition to whether or not bounds were cached (they happen
> to be the same thing right now, but I don't want to assume that).
>
> This revision also integrates your approach to the "continuescan"
> optimization patch, with the small tweak I mentioned yesterday (we
> also pass ntupatts). I also prefer this approach.

Great, thank you!

> It would be nice if you could take a look at the amcheck "relocate"
> patch
When I started looking at this, I thought that "relocate" means "move".
So I thought that the new mode would actually move tuples, i.e. that it
would modify the index. That sounded horrible. Of course, it doesn't
actually do that. It just checks that each tuple can be re-found, or
"relocated", by descending the tree from the root. I'd suggest changing
the language to avoid that confusion.

It seems like a nice way to catch all kinds of index corruption issues.
Although, we already check that the tuples are in order within the page.
Is it really necessary to traverse the tree for every tuple, as well?
Maybe do it just for the first and last item?

> + * This routine can detect very subtle transitive consistency issues across
> + * more than one level of the tree. Leaf pages all have a high key (even the
> + * rightmost page has a conceptual positive infinity high key), but not a low
> + * key. Their downlink in parent is a lower bound, which along with the high
> + * key is almost enough to detect every possible inconsistency. A downlink
> + * separator key value won't always be available from parent, though, because
> + * the first items of internal pages are negative infinity items, truncated
> + * down to zero attributes during internal page splits. While it's true that
> + * bt_downlink_check() and the high key check can detect most imaginable key
> + * space problems, there are remaining problems it won't detect with non-pivot
> + * tuples in cousin leaf pages. Starting a search from the root for every
> + * existing leaf tuple detects small inconsistencies in upper levels of the
> + * tree that cannot be detected any other way. (Besides all this, it's
> + * probably a useful testing strategy to exhaustively verify that all
> + * non-pivot tuples can be relocated in the index using the same code paths as
> + * those used by index scans.)

I don't understand this. Can you give an example of this kind of
inconsistency?

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2019-03-16 08:51:39 Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Previous Message Fabien COELHO 2019-03-16 08:40:26 RE: Timeout parameters