Re: [pgsql-hackers] Daily digest v1.9418 (15 messages)

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [pgsql-hackers] Daily digest v1.9418 (15 messages)
Date: 2009-08-27 16:47:30
Message-ID: f67928030908270947h10862c70h9e3a4c59ab21f337@mail.gmail.com

>
> ---------- Forwarded message ----------
> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> To: Robert Haas <robertmhaas(at)gmail(dot)com>
> Date: Thu, 27 Aug 2009 10:11:24 -0400
> Subject: Re: 8.5 release timetable, again
>
> What I'd like to see is some sort of test mechanism for WAL recovery.
> What I've done sometimes in the past (and recently had to fix the tests
> to re-enable) is to kill -9 a backend immediately after running the
> regression tests, let the system replay the WAL for the tests, and then
> take a pg_dump and compare that to the dump gotten after a conventional
> run. However this is quite haphazard since (a) the regression tests
> aren't especially designed to exercise all of the WAL logic, and (b)
> pg_dump might not show the effects of some problems, particularly not
> corruption in non-system indexes. It would be worth the trouble to
> create a more specific test methodology.

I hacked mdwrite to keep a static int counter. When the counter hit 400 and
the guc_of_death was set, it would write out a partial block (to simulate a
partial page write) and then PANIC. I have some Perl code that runs a bunch
of updates against the database until the database dies. Once it can
reconnect, it checks that the data reflects what Perl thinks it should. This
is how I (belatedly) found and tracked down the bug in the visibility bit.
(What I was trying to do was determine whether my toying around with
XLogInsert was breaking anything. Since the regression suite wouldn't show
me a problem if one existed, I came up with this. Then I found things were
broken even before I started toying with it...)
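
To make the mechanism concrete, here is a minimal sketch of the same idea in
Python. It is not the actual mdwrite patch (which is C and not shown in this
message). The names FaultyBlockWriter, SimulatedPanic, fault_enabled and
crash_after are hypothetical stand-ins for the hacked mdwrite, the PANIC, the
guc_of_death and the threshold of 400. It also assumes the torn write covers
exactly the first half of the block, which the description above does not
specify.

```python
import os
import tempfile

BLOCK_SIZE = 8192          # PostgreSQL's default page size
CRASH_AFTER_WRITES = 400   # threshold mentioned in the message


class SimulatedPanic(Exception):
    """Hypothetical stand-in for a PANIC that takes the server down."""


class FaultyBlockWriter:
    """Hypothetical stand-in for a block-write routine with fault injection."""

    def __init__(self, path, fault_enabled=False, crash_after=CRASH_AFTER_WRITES):
        self.path = path
        self.fault_enabled = fault_enabled  # plays the role of guc_of_death
        self.crash_after = crash_after
        self.write_count = 0                # plays the role of the static counter

    def write_block(self, block_no, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block")
        self.write_count += 1
        torn = self.fault_enabled and self.write_count == self.crash_after
        # Assumption: a torn write persists only the first half of the block.
        payload = data[:BLOCK_SIZE // 2] if torn else data
        mode = "r+b" if os.path.exists(self.path) else "w+b"
        with open(self.path, mode) as f:
            f.seek(block_no * BLOCK_SIZE)
            f.write(payload)
        if torn:
            raise SimulatedPanic(f"torn write on block {block_no}")


def demo():
    """Overwrite block 0 with a torn write and return what is left on disk."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "relation")
        writer = FaultyBlockWriter(path, fault_enabled=True, crash_after=3)
        old = b"A" * BLOCK_SIZE
        new = b"B" * BLOCK_SIZE
        writer.write_block(0, old)
        writer.write_block(1, old)
        try:
            writer.write_block(0, new)  # third write: torn, then "PANIC"
        except SimulatedPanic:
            pass
        with open(path, "rb") as f:
            return f.read(BLOCK_SIZE)
```

After demo() runs, block 0 holds half new and half old contents, which is the
on-disk state WAL replay has to repair. In the real setup, the external
driver's check after reconnecting plays the role of inspecting that page.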

I don't know how lucky I was to hit upon a test that found an already
existing bug. I have to assume I was somewhat lucky, simply because it took
a run of many hours or overnight (with a simulated crash every 2 minutes or
so) to reliably detect the problem. But how do you turn something like this
into a regression test? Scattering intentional crash-inducing code through
the source just to exercise the error recovery parts seems like it would be
quite a mess.

> In short: merely making the tests bigger doesn't impress me in the
> least. Focused testing on areas we aren't covering at all could be
> worth the trouble.

Do you have suggestions on what other areas need it?

Jeff
