Re: new heapcheck contrib module

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new heapcheck contrib module
Date: 2020-10-22 19:15:53
Message-ID: 2A7DA1A8-C4AA-43DF-A985-3CA52F4DC775@enterprisedb.com
Lists: pgsql-hackers

> On Oct 22, 2020, at 9:01 AM, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
>
>
>> On Oct 22, 2020, at 7:06 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> On Thu, Oct 22, 2020 at 8:51 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> Committed. Let's see what the buildfarm thinks.
>>
>> It is mostly happy, but thorntail is not:
>>
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2020-10-22%2012%3A58%3A11
>>
>> I thought that the problem might be related to the fact that thorntail
>> is using force_parallel_mode, but I tried that here and it did not
>> cause a failure. So my next guess is that it is related to the fact
>> that this is a sparc64 machine, but it's hard to tell, since none of
>> the other sparc64 critters have run yet. In any case I don't know why
>> that would cause a failure. The messages in the log aren't very
>> illuminating, unfortunately. :-(
>>
>> Mark, any ideas what might cause specifically that set of tests to fail?
>
> The code is correctly handling an uncorrupted table, but then more or less randomly failing some of the time when processing a corrupt table.
>
> Tom identified a problem with an uninitialized variable. I'm putting together a new patch set to address it.

The attached 0001 patch addresses the -Werror=maybe-uninitialized problem.

The attached 0002 patch addresses the test failures:

The failing test is designed to stop the server, inflict blunt force trauma on the heap and toast files by overwriting parts of them with garbage bytes, restart the server, and verify that amcheck's verify_heapam() detects the corruption. The exact trauma is intended to be the same on all platforms, both in the number of bytes written and in where in the file they are written. Owing to differences between platforms, however, the test by design does not expect a particular corruption message.

The test was overwriting far fewer bytes than I had intended, but since it was still sufficient to create corruption on the platforms where I tested, I failed to notice. It should do a more thorough job now.
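
For readers who want to picture the file-level step, here is a minimal sketch in Python. It is not the code from the patch or from the actual regression test. The helper name, the offset (32), the length (500), and the fill byte (0x77) are all hypothetical values chosen for illustration, and a temporary file stands in for a relation file. Stopping and restarting the server and calling verify_heapam() are left out. The point it illustrates is checking that the number of bytes actually written matches the number intended.

```python
import os
import tempfile


def overwrite_with_garbage(path, offset, length, fill=0x77):
    """Overwrite `length` bytes of `path` starting at `offset`.

    Hypothetical helper for illustration only; returns the number of
    bytes written and raises if it differs from the requested length.
    """
    garbage = bytes([fill]) * length
    with open(path, "r+b") as fh:      # r+b: modify in place, do not truncate
        fh.seek(offset)
        written = fh.write(garbage)
        fh.flush()
        os.fsync(fh.fileno())          # make sure the bytes reach the disk
    if written != length:
        raise IOError(f"wrote {written} bytes, expected {length}")
    return written


def demo():
    # Stand-in for a relation file: one 8192-byte block of zeros.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(8192))
        path = tmp.name
    try:
        # Offset and length are assumptions, not the values used by the test.
        written = overwrite_with_garbage(path, offset=32, length=500)
        with open(path, "rb") as fh:
            data = fh.read()
    finally:
        os.unlink(path)
    return written, data
```

Reading the file back and comparing the damaged range against the intended garbage, as demo() allows, is one way a test could notice that it had written fewer bytes than planned.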

Attachments:
v21-0001-Fixing-unitialized-variable-bug.patch (application/octet-stream, 3.8 KB)
v21-0002-Fixing-sloppy-regression-test-coding.patch (application/octet-stream, 2.0 KB)
