Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2019-03-16 08:51:39
Message-ID: CAH2-Wz=AcYC5hNChERUQLsXHz4Fgf8fAqjUH1_JP68UEcrcn4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 16, 2019 at 1:44 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > It would be nice if you could take a look at the amcheck "relocate"
> > patch
> When I started looking at this, I thought that "relocate" means "move".
> So I thought that the new mode would actually move tuples, i.e. that it
> would modify the index. That sounded horrible. Of course, it doesn't
> actually do that. It just checks that each tuple can be re-found, or
> "relocated", by descending the tree from the root. I'd suggest changing
> the language to avoid that confusion.

Okay. What do you suggest? :-)

> It seems like a nice way to catch all kinds of index corruption issues.
> Although, we already check that the tuples are in order within the page.
> Is it really necessary to traverse the tree for every tuple, as well?
> Maybe do it just for the first and last item?

It's mainly intended as a developer option. I want it to be possible
to detect any form of corruption, however unlikely. It's an
adversarial mindset that will at least make me less nervous about the
patch.

> I don't understand this. Can you give an example of this kind of
> inconsistency?

The commit message gives an example, but I suggest trying it out for
yourself. Corrupt the least significant key byte of a root page of a
B-Tree using pg_hexedit. Say it's an index on a text column, then
you'd corrupt the last byte to be something slightly wrong. Then, the
only way to catch it is with "relocate" verification. You'll only miss
a few tuples on a cousin page at the leaf level (those on either side
of the high key that the corrupted separator key in the root was
originally copied from).

The regular checks won't catch this, because the keys are similar
enough one level down. The "minus infinity" item is a kind of a blind
spot, because we cannot do a parent check of its children, because we
don't have the key (it's truncated when the item becomes a right page
minus infinity item, during an internal page split).

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2019-03-16 09:35:52 Re: jsonpath
Previous Message Heikki Linnakangas 2019-03-16 08:43:00 Re: Making all nbtree entries unique by having heap TIDs participate in comparisons