Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-06-12 21:06:18
Message-ID: 885D2EC9-7D06-4903-A17D-72C184A1E1C5@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Jun 11, 2020, at 11:35 PM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Fri, Jun 12, 2020 at 12:40 AM Mark Dilger
> <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>>
>>
>>
>>> On Jun 11, 2020, at 9:14 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>>>
>>> I have just browsed through the patch and the idea is quite
>>> interesting. I think we can expand it to check that whether the flags
>>> set in the infomask are sane or not w.r.t other flags and xid status.
>>> Some examples are
>>>
>>> - If HEAP_XMAX_LOCK_ONLY is set in infomask then HEAP_KEYS_UPDATED
>>> should not be set in new_infomask2.
>>> - If HEAP_XMIN(XMAX)_COMMITTED is set in the infomask then can we
>>> actually cross verify the transaction status from the CLOG and check
>>> whether is matching the hint bit or not.
>>>
>>> While browsing through the code I could not find that we are doing
>>> this kind of check, ignore if we are already checking this.
>>
>> Thanks for taking a look!
>>
>> Having both of those bits set simultaneously appears to fall into a different category than what I wrote verify_heapam.c to detect.
>
> Ok
>
>
>> It doesn't violate any assertion in the backend, nor does it cause
>> the code to crash. (At least, I don't immediately see how it does
>> either of those things.) At first glance it appears invalid to have
>> those bits both set simultaneously, but I'm hesitant to enforce that
>> without good reason. If it is a good thing to enforce, should we also
>> change the backend code to Assert?
>
> Yeah, it may not hit assert or crash but it could lead to a wrong
> result. But I agree that it could be an assertion in the backend
> code.

For v7, I've added an assertion for this. Per heap/README.tuplock, "We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit is set." I added an assertion for that, too. Both new assertions are in RelationPutHeapTuple(). I'm not sure if that is the best place to put the assertion, but I am confident that the assertion needs to only check tuples destined for disk, as in memory tuples can and do violate the assertion.

Also for v7, I've updated contrib/amcheck to report these two conditions as corruption.

> What about the other check, like hint bit is saying the
> transaction is committed but actually as per the clog the status is
> something else. I think in general processing it is hard to check
> such things in backend no? because if the hint bit is set saying that
> the transaction is committed then we will directly check its
> visibility with the snapshot. I think a corruption checker may be a
> good tool for catching such anomalies.

I already made some design changes to this patch to avoid taking the CLogTruncationLock too often. I'm happy to incorporate this idea, but perhaps you could provide a design on how to do it without all the extra locking? If not, I can try to get this into v8 as an optional check, so users can turn it on at their discretion. Having the check enabled by default is probably a non-starter.

Attachment Content-Type Size
v7-0001-Adding-verify_heapam-and-pg_amcheck.patch application/octet-stream 155.6 KB
v7-0002-Adding-checks-of-invalid-combinations-of-hint-bit.patch application/octet-stream 5.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2020-06-12 21:26:42 Re: Internal key management system
Previous Message Fabien COELHO 2020-06-12 20:59:37 Re: Internal key management system