Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-10-23 04:21:11
Message-ID: E9D7ECC3-5538-4B14-AD4C-75931B6BEE22@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Oct 22, 2020, at 7:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> writes:
>> Ahh, crud. It's because
>> syswrite($fh, '\x77\x77\x77\x77', 500)
>> is wrong twice. The 500 was wrong, but the string there isn't the bit pattern we want -- it's just a string literal with backslashes and such. It should have been double-quoted.
>
> Argh. So we really have, using same test except
>
> memcpy(&lp, "\\x77", sizeof(lp));
>
> little endian: off = 785c, flags = 2, len = 1b9b
> big endian: off = 2e3c, flags = 0, len = 3737
>
> which explains the apparent LP_DEAD result.
>
> I'm not particularly on board with your suggestion of "well, if it works
> sometimes then it's okay". Then we have no idea of what we really tested.
>
> regards, tom lane

Ok, I've pruned it down to something you may like better. Instead of just checking that *some* corruption occurs, it checks the returned corruption against an expected regex, and if it fails to match, you should see in the logs what you got vs. what you expected.

It only corrupts the first two line pointers, the first one with 0x77777777 and the second one with 0xAAAAAAAA, which are consciously chosen to be bitwise reverses of each other and just strings of alternating bits rather than anything that could have a more complicated interpretation.

On my little-endian mac, the 0x77777777 value creates a line pointer which redirects to an invalid offset 0x7777, which gets reported as decimal 30583 in the corruption report, "line pointer redirection to item at offset 30583 exceeds maximum offset 38". The test is indifferent to whether the corruption it is looking for is reported relative to the first line pointer or the second one, so if endian-ness matters, it may be the 0xAAAAAAAA that results in that corruption message. I don't have a machine handy to test that. It would be nice to determine the minimum amount of paranoia necessary to make this portable and not commit the rest.

Attachment Content-Type Size
regress.patch.WIP application/octet-stream 5.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-10-23 04:36:17 Re: "unix_socket_directories" should be GUC_LIST_INPUT?
Previous Message Ian Lawrence Barwick 2020-10-23 04:13:56 Re: "unix_socket_directories" should be GUC_LIST_INPUT?