Re: Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>, "Harold A(dot) Giménez" <harold(dot)gimenez(at)gmail(dot)com>
Subject: Re: Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
Date: 2012-07-18 00:31:52
Message-ID: 11099.1342571512@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-performance

Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> writes:
> On 07/18/2012 06:56 AM, Tom Lane wrote:
>> This implies that nobody has done pull-the-plug testing on either HEAD
>> or 9.2 since the checkpointer split went in (2011-11-01)

> That makes me wonder if on top of the buildfarm, extending some
> buildfarm machines into a "crashfarm" is needed:

Not sure if we need a whole "farm", but certainly having at least one
machine testing this sort of stuff on a regular basis would make me feel
a lot better.

> The main challenge would be coming up with suitable tests to run, ones
> that could then be checked to make sure nothing was broken.

One fairly simple test scenario could go like this:

* run the regression tests
* pg_dump the regression database
* run the regression tests again
* hard-kill immediately upon completion
* restart database, allow it to perform recovery
* pg_dump the regression database
* diff previous and new dumps; should be the same
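
The seven steps above can be scripted. What follows is a minimal sketch that makes a few assumptions. It assumes a configured PostgreSQL source tree where `make installcheck` runs the regression suite against an already-running server and leaves behind the `regression` database. It also assumes that `pg_ctl stop -m immediate` is an acceptable stand-in for "hard-kill". The source path, data directory, dump file names and log file name are hypothetical placeholders. An immediate shutdown forces WAL replay on restart, but anything already handed to the kernel's page cache survives. So it does not reproduce the OS-crash/pull-the-plug case discussed below; for that you would need something like a VM reset.

```python
import difflib
import subprocess
from pathlib import Path

# Assumption: `make installcheck` creates and populates this database.
REGRESS_DB = "regression"


def build_plan(src_dir, pgdata, before="before.sql", after="after.sql", db=REGRESS_DB):
    """Return the scenario as (cwd, argv) steps without running anything.

    src_dir, pgdata and the dump/log file names are hypothetical placeholders.
    """
    regress = (str(src_dir), ["make", "installcheck"])
    return [
        regress,                                   # 1. run the regression tests
        (None, ["pg_dump", "-f", before, db]),     # 2. dump the regression database
        regress,                                   # 3. run the regression tests again
        # 4. "hard-kill": immediate shutdown skips the shutdown checkpoint,
        #    so the next start has to perform crash recovery (WAL replay).
        (None, ["pg_ctl", "stop", "-D", str(pgdata), "-m", "immediate"]),
        # 5. restart; -w waits until the server accepts connections,
        #    which happens only after recovery has completed.
        (None, ["pg_ctl", "start", "-D", str(pgdata), "-w", "-l", "postmaster.log"]),
        (None, ["pg_dump", "-f", after, db]),      # 6. dump the regression database again
    ]


def dump_diff(before_text, after_text):
    """Step 7: unified diff of the two dumps; an empty list means they match."""
    return list(difflib.unified_diff(
        before_text.splitlines(keepends=True),
        after_text.splitlines(keepends=True),
        fromfile="before", tofile="after"))


def run_crash_test(src_dir, pgdata, before="before.sql", after="after.sql"):
    """Run every step, aborting on the first command that exits non-zero."""
    for cwd, argv in build_plan(src_dir, pgdata, before, after):
        subprocess.run(argv, cwd=cwd, check=True)
    diff = dump_diff(Path(before).read_text(), Path(after).read_text())
    if diff:
        print("".join(diff))
    return not diff
```

Keeping `build_plan` separate from `run_crash_test` lets the step list be inspected or logged without touching a live server. Because `make installcheck` exits non-zero when any regression test fails, `check=True` stops the run before the crash step in that case.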

The main thing this wouldn't cover is discrepancies in user indexes,
since pg_dump doesn't do anything that's likely to result in indexscans
on user tables. It ought to be enough to detect the sort of system-wide
problem we're talking about here, though.

In general I think the hard part is automated reproduction of an
OS-crash scenario, but your ideas about how to do that sound promising.
Once we have that going, it shouldn't be hard to come up with tests
of the form "do X, hard-crash, recover, check X still looks sane".
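
The "do X, hard-crash, recover, check X still looks sane" pattern could be factored into a small harness, so each new test only supplies its own X. This is a hypothetical sketch. The scenario callables and the `hard_crash`/`recover` hooks are all placeholders. On a test machine, `hard_crash` might be an immediate shutdown or, for a real OS-crash reproduction, a VM or power reset. The harness deliberately leaves that choice to the caller.

```python
from typing import Callable, Dict, List, Tuple

# A scenario is a pair of hypothetical callables:
#   do_x()         -> performs the work and returns whatever should survive
#   check_x(state) -> True if that state still looks sane after recovery
Scenario = Tuple[Callable[[], object], Callable[[object], bool]]


def run_scenarios(scenarios: Dict[str, Scenario],
                  hard_crash: Callable[[], None],
                  recover: Callable[[], None]) -> List[str]:
    """Run "do X, hard-crash, recover, check X" for each scenario.

    Returns the names of scenarios whose check failed, in run order.
    """
    failures = []
    for name, (do_x, check_x) in scenarios.items():
        expected = do_x()
        hard_crash()  # e.g. immediate shutdown, or a VM/power reset for an OS crash
        recover()     # restart and wait for crash recovery to finish
        if not check_x(expected):
            failures.append(name)
    return failures
```

Running one crash per scenario keeps failures attributable to a single X, at the cost of one recovery cycle per test.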

> What else should be checked? The main thing that comes to mind for me is
> something I've worried about for a while: that Pg might not always
> handle out-of-disk-space anywhere near as gracefully as it's often
> claimed to.

+1

regards, tom lane
