Re: room for improvement in amcheck btree checking?

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: room for improvement in amcheck btree checking?
Date: 2020-12-02 02:03:48
Message-ID: 049FF680-2498-4D12-8C63-934E532E9BE7@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Dec 1, 2020, at 4:38 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Tue, Dec 1, 2020 at 12:39 PM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>> I found it surprising that even when precisely zero of the tids in the index exist in the table the index checks all come back clean. The heapallindexed check is technically running as advertised, checking that all of the zero tuples in the heap are present in the index. That is a pretty useless check under this condition, though. Is a "indexallheaped" option (by some less crazy name) needed?
>>
>> Users might also run into this problem when a heap relation file gets erroneously shortened by some number of blocks but not fully truncated, or perhaps with torn page writes.
>
> It would probably be possible to detect this exact condition (though
> not other variants of the more general problem) relatively easily.
> Perhaps by doing something with RelationOpenSmgr() with the heap
> relation, while applying a little knowledge of what must be true about
> the heap relation given what is known about the index relation. (I'm
> not 100% sure that that's possible, but offhand it seems like it
> probably is.)
>
> See also: commit 6754fe65a4c, which tightened things up in this area
> for the index relation itself.

Yes, some of the test code I've been playing with already hit the error messages introduced in that commit.

>> Have you already considered and rejected a "indexallheaped" type check?
>
> Actually, I pointed out that we should do something along these lines
> very recently:
>
> https://postgr.es/m/CAH2-Wz=dy--FG5iJ0kPcQumS0W5g+xQED3t-7HE+UqAK_hmLTw@mail.gmail.com

Right.

> I'd be happy to see you pursue it.

I'm not sure how much longer I'll be pursuing corruption checkers before switching to another task. Certainly, I'm interested in pursuing this if time permits.

>> Background
>> -------
>>
>> I have up until recently been focused on corruption caused by twiddling the bits within heap and index relation pages, but real-world user error, file system error, and perhaps race conditions in the core postgresql code seem at least as likely to result in missing or incorrect versions of blocks of relation files rather than individual bytes within those blocks being wrong. Per our discussions in [3], not all corruptions that can be created under laboratory conditions are equally likely to occur in the wild, and it may be reasonable to only harden the amcheck code against corruptions that are more likely to happen in actual practice.
>
> Right. I also like to harden amcheck in ways that happen to be easy,
> especially when it seems like it might kind of work as a broad
> defense that doesn't consider any particular scenario. For example,
> hardening to make sure the an index tuple's lp_len matches
> IndexTupleSize() for the tuple proper (this also kind of works as
> storage format documentation). I am concerned about both costs and
> benefits.
>
> FWIW, this is a perfect example of the kind of hardening that makes
> sense to me. Clearly this kind of failure is a reasonably plausible
> scenario (probably even a known real world scenario with catalog
> corruption), and clearly we could do something about it that's pretty
> simple and obviously low risk. It easily meets the standard that I
> might apply here.

Ok, it seems we're in general agreement about that.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2020-12-02 02:21:52 RE: Disable WAL logging to speed up data loading
Previous Message Fujii Masao 2020-12-02 02:01:52 Re: Allow some recovery parameters to be changed with reload