Re: [pgsql-hackers] Daily digest v1.9418 (15 messages)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [pgsql-hackers] Daily digest v1.9418 (15 messages)
Date: 2009-08-27 17:12:20
Message-ID: 603c8f070908271012g26ce3778t9a83539a696dccb8@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 27, 2009 at 12:47 PM, Jeff Janes<jeff(dot)janes(at)gmail(dot)com> wrote:
>> ---------- Forwarded message ----------
>> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>> To: Robert Haas <robertmhaas(at)gmail(dot)com>
>> Date: Thu, 27 Aug 2009 10:11:24 -0400
>> Subject: Re: 8.5 release timetable, again
>>
>> What I'd like to see is some sort of test mechanism for WAL recovery.
>> What I've done sometimes in the past (and recently had to fix the tests
>> to re-enable) is to kill -9 a backend immediately after running the
>> regression tests, let the system replay the WAL for the tests, and then
>> take a pg_dump and compare that to the dump gotten after a conventional
>> run.  However this is quite haphazard since (a) the regression tests
>> aren't especially designed to exercise all of the WAL logic, and (b)
>> pg_dump might not show the effects of some problems, particularly not
>> corruption in non-system indexes.  It would be worth the trouble to
>> create a more specific test methodology.
>
> I hacked mdwrite so that it had a static int counter.  When the counter hit
> 400 and if the guc_of_death was set, it would write out a partial block (to
> simulate a partial page write) and then PANIC.  I have some Perl code that
> runs against the database doing a bunch of updates until the database dies.
> Then when it can reconnect again it makes sure the data reflects what Perl
> thinks it should.  This is how I (belatedly) found and traced down the bug
> in the visibility bit.  (What I was trying to do is determine if my toying
> around with XLogInsert was breaking anything.  Since the regression suite
> wouldn't show me a problem if one existed, I came up with this.  Then I
> found things were broken even before I started toying with it...)
>
> I don't know how lucky I was to hit upon a test that found an already
> existing bug.  I have to assume I was somewhat lucky, simply because it took
> a run of many hours or overnight (with a simulated crash every 2 minutes or
> so) to reliably detect the problem.  But how do you turn something like this
> into a regression test?  Scattering the code with intentional crash-inducing
> code that is there to exercise the error recovery parts seems like it would
> be quite a mess.

This is pretty cool, IMO. Admittedly, it does seem hard to bottle it,
but you managed it, so it's not completely impossible. What you could
do for this kind of thing is a series of patches and driver scripts, so
you build PostgreSQL with the patch, then run the driver script
against it. Probably we'd want to standardize some kind of framework
for the driver scripts, once we had a list of ideas for testing and
some idea what it should look like.
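
As a rough illustration of the driver-script half of that idea, here is a
minimal Python sketch. It does not touch PostgreSQL at all: ToyStore,
SimulatedCrash and run_driver are hypothetical names invented for this
example, and the toy store only imitates the behavior Jeff describes (a
write counter, a partial write, then a crash). The point is just the
shape of the loop: apply updates, remember what the data should be,
recover after the crash, and compare.

```python
import random


class SimulatedCrash(Exception):
    """Stands in for the PANIC raised by the patched write path."""


class ToyStore:
    """Hypothetical toy key/value store with a write-ahead log (not PostgreSQL)."""

    def __init__(self, crash_after=None):
        self.wal = []        # log records, assumed to survive the crash
        self.pages = {}      # "data files"; may hold a torn value after a crash
        self.writes = 0      # plays the role of the static counter in mdwrite
        self.crash_after = crash_after

    def update(self, key, value):
        self.wal.append((key, value))   # log the change before writing the page
        self.writes += 1
        if self.crash_after is not None and self.writes == self.crash_after:
            self.pages[key] = None      # simulate a partial (torn) page write
            raise SimulatedCrash()
        self.pages[key] = value

    def recover(self):
        """Replay the whole log, which overwrites any torn page."""
        self.crash_after = None
        for key, value in self.wal:
            self.pages[key] = value


def run_driver(store, n_updates, seed=0):
    """Apply updates, survive one crash, then check the store against our model."""
    rng = random.Random(seed)
    expected = {}
    crashed = False
    for i in range(n_updates):
        key = rng.randrange(10)
        try:
            store.update(key, i)
        except SimulatedCrash:
            crashed = True
            store.recover()
        # In this toy the record is logged before the crash, so the update
        # should be visible after replay whether or not the write crashed.
        expected[key] = i
    mismatches = {
        k: (v, store.pages.get(k))
        for k, v in expected.items()
        if store.pages.get(k) != v
    }
    return crashed, mismatches


if __name__ == "__main__":
    crashed, mismatches = run_driver(ToyStore(crash_after=400), n_updates=1000)
    print("crashed:", crashed, "mismatches:", mismatches)
```

A real driver would talk to a server built with the crash-injection
patch, and would have to treat the update that was in flight at the
moment of the crash as ambiguous (either outcome is acceptable), which
this toy sidesteps by logging before it crashes.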

...Robert

P.S. The subject line of this thread is not ideal.
