Re: Corrupted btree index on HEAD because of covering indexes

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Corrupted btree index on HEAD because of covering indexes
Date: 2018-04-20 20:06:15
Message-ID: CAH2-Wzncb9OZ8BSMEGPVwi3m9oYF=g1Yhy-M4f0DyxkQTLoWCg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 19, 2018 at 11:41 PM, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
>> heard of people using bt_index_parent_check() in production, but only
>> when they already knew that their database was corrupt, and wanted to
>> isolate the problem. I imagine that people that use
>> bt_index_parent_check() in production do so because they want as much
>> information as possible, and are willing to do whatever it takes to
>> get more information.
>
> That why I think we need improve amcheck - ideally, it should not miss any
> mistakes in tree structure.

I have finished writing a prototype that works, and passes all tests.
Without the fix, this is what the complaint from VACUUM looks like on
my machine:

ERROR: right sibling 265 of block 134 is not next child 395 of block
135 in index "delete_test_table_pkey"

After the final VACUUM in your test case runs (the one that throws
this error), my working copy of amcheck can detect the same issue with
low overhead:

pg(at)~[20995]=# select bt_index_parent_check('delete_test_table_pkey', true);
ERROR: index block lacks downlink in index "delete_test_table_pkey"
DETAIL: Block=265 level=1 page lsn=0/909F4B18.

There is no complaint from amcheck if called before the final VACUUM
in your test case, since that's when the trouble starts (and ends).
It's easy to account for concurrent page splits within amcheck by
checking for BTP_SPLIT_END on the child page that could lack a
downlink. Don't confuse this with checking the BTP_INCOMPLETE_SPLIT
flag, which is what we set on the left half/sibling of a split -- not
the "new" right half/sibling. Of course, the right sibling is the type
of block that can lack a downlink until after the second phase of a
pagesplit (i.e. until it's fully finished), or after the first phase
of page deletion. (As you know, page deletion is like a page split "in
reverse".)

I said concurrent page split, but I really meant incomplete page
split, since a concurrent insert/split is impossible given that
amcheck holds a ShareLock here. (See commit 40dae7ec5.)

Of course, we don't see details about the next level up in the amcheck
error, unlike the VACUUM error. That shouldn't matter, because we're
already verifying a lot about the relationship between blocks at the
next level up, and even between the two levels. This enhancement adds
the one piece we were missing. It could in theory detect other
problems that VACUUM cannot detect, too.

Expect me to post a patch during the work day on Monday or Tuesday
(Pacific time). I need to polish this patch some more. I particularly
need to think about the use of memory for the Bloom filter (right now
I just use work_mem).

> Could you write it? I'm afraid I can't give good explanation why we believe
> that nobody update this page ant it's high key while page is unlocked but
> pinned.

Okay. I'll come up with something soon.

> Hm, it seems to me, that 350ms is short enough to place it in both core and
> amcheck test. I think, basic functionality should be covered by core tests
> as we test insert/create.

I think you're right. There is really no excuse for not having
thorough code coverage for a module like nbtree.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2018-04-20 20:18:10 Re: Add read-only param to set_config(...) / SET that effects (at least) customized runtime options
Previous Message Andrew Gierth 2018-04-20 20:03:17 Re: Add read-only param to set_config(...) / SET that effects (at least) customized runtime options