Re: Making all nbtree entries unique by having heap TIDs participate in comparisons

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date: 2019-03-16 20:33:29
Message-ID: a7e95343-ec6b-d204-63ec-27966a853a3f@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16/03/2019 18:55, Peter Geoghegan wrote:
> On Sat, Mar 16, 2019 at 9:17 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> Then, a cosmic ray changes the 20 on the root page to 18. That causes
>> the the leaf tuple 19 to become non-re-findable; if you descend the for
>> 19, you'll incorrectly land on page 6. But it also causes the high key
>> on page 2 to be different from the downlink on the root page. Wouldn't
>> the existing checks catch this?
>
> No, the existing checks will not check that. I suppose something
> closer to the existing approach *could* detect this issue, by making
> sure that the "seam of identical high keys" from the root to the leaf
> are a match, but we don't use the high key outside of its own page.
> Besides, there is something useful about having the code actually rely
> on _bt_search().
>
> There are other similar cases, where it's not obvious how you can do
> verification without either 1) crossing multiple levels, or 2)
> retaining a "low key" as well as a high key (this is what Goetz Graefe
> calls "retaining fence keys to solve the cousin verification problem"
> in Modern B-Tree Techniques). What if the corruption was in the leaf
> page 6 from your example, which had a 20 at the start? We wouldn't
> have compared the downlink from the parent to the child, because leaf
> page 6 is the leftmost child, and so we only have "-inf". The lower
> bound actually comes from the root page, because we truncate "-inf"
> attributes during page splits, even though we don't have to. Most of
> the time they're not "absolute minus infinity" -- they're "minus
> infinity in this subtree".

AFAICS, there is a copy of every page's high key in its immediate
parent. Either in the downlink of the right sibling, or as the high key
of the parent page itself. Cross-checking those would catch any
corruption in high keys.

Note that this would potentially catch some corruption that the
descend-from-root would not. If you have a mismatch between the high key
of a leaf page and its parent or grandparent, all the current tuples
might be pass the rootdescend check. But a tuple might get inserted to
wrong location later.

> Maybe you could actually do something with the high key from leaf page
> 5 to detect the stray value "20" in leaf page 6, but again, we don't
> do anything like that right now.

Hmm, yeah, to check for stray values, you could follow the left-link,
get the high key of the left sibling, and compare against that.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-16 20:42:45 Re: Fix XML handling with DOCTYPE
Previous Message Chapman Flack 2019-03-16 20:31:18 Re: Fix XML handling with DOCTYPE