Re: silent data loss with ext4 / all current versions

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2015-11-29 13:38:01
Message-ID: CAMsr+YHvML=eFiPPVF1Vcc+QPxojACo4gfbA_vkD=jXw_1yYAg@mail.gmail.com
Lists: pgsql-hackers

On 27 November 2015 at 21:28, Greg Stark <stark(at)mit(dot)edu> wrote:

> On Fri, Nov 27, 2015 at 11:17 AM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> > I plan to do more power failure testing soon, with more complex test
> > scenarios. I suspect there might be other similar issues (e.g. when we
> > rename a file before a checkpoint and don't fsync the directory - then the
> > rename won't be replayed and will be lost).
>
> I'm curious how you're doing this testing. The easiest way I can think
> of would be to run a database on an LVM volume and take a large number
> of LVM snapshots very rapidly and then see if the database can start
> up from each snapshot. Bonus points for keeping track of the committed
> transactions before each snapshot and ensuring they're still there I
> guess.
>

I've had a few tries at implementing a qemu-based crash tester that
hard-kills the qemu instance at a random point, then starts it back up.
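
Roughly, the kill loop looks like the sketch below. This is only an
illustration, not the harness I actually wrote: crash_cycle() and the delay
bounds are made-up names and values, and the child process here is a
stand-in that just sleeps; a real run would substitute the qemu command
line for the guest.

```python
import random
import subprocess
import sys
import time


def crash_cycle(cmd, min_delay, max_delay, rng):
    """Start cmd, hard-kill it after a random delay, return (delay, exit status)."""
    proc = subprocess.Popen(cmd)
    delay = rng.uniform(min_delay, max_delay)
    time.sleep(delay)
    # Popen.kill() sends SIGKILL on POSIX, so the child gets no chance to
    # flush buffers or shut down cleanly.
    proc.kill()
    returncode = proc.wait()
    return delay, returncode


if __name__ == "__main__":
    # Stand-in for the guest; replace with the real qemu invocation (assumption).
    guest_cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    rng = random.Random(0)  # fixed seed so a failing run can be repeated
    for i in range(3):
        delay, rc = crash_cycle(guest_cmd, 0.1, 0.5, rng)
        print(f"cycle {i}: killed after {delay:.2f}s, exit status {rc}")
```

Seeding the random generator makes the sequence of kill delays repeatable,
though the guest's state at the moment of the kill will still vary from
run to run.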

I always got stuck on the validation part - actually ensuring that the DB
state is what we expect. I think I could probably get that right now; it's
been a while.
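
One way the validation could work, as a sketch only: a client outside the
VM records a transaction's ID once COMMIT has returned, and after recovery
those IDs are compared against what the database actually contains. The
check_recovery() helper and the idea of integer transaction IDs are
assumptions for illustration; how the IDs are collected and queried is
left out.

```python
def check_recovery(acknowledged, recovered):
    """Compare commits acknowledged before the crash with rows found after it.

    acknowledged: IDs the client logged after COMMIT returned.
    recovered:    IDs present in the database after crash recovery.
    Returns (lost, unacknowledged), both as sorted lists.
    """
    acknowledged = set(acknowledged)
    recovered = set(recovered)
    # Acknowledged but missing after recovery: this is real data loss.
    lost = acknowledged - recovered
    # Present but never acknowledged: acceptable, since the crash may have
    # hit after the commit became durable but before the client saw the reply.
    unacknowledged = recovered - acknowledged
    return sorted(lost), sorted(unacknowledged)


if __name__ == "__main__":
    lost, extra = check_recovery([1, 2, 3], [1, 2, 4])
    print("lost:", lost)
    print("committed but unacknowledged:", extra)
```

Only the first list indicates a durability bug; the second is expected to
be non-empty now and then.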

The VM can be started back up and killed again over and over quite quickly.

It's not as good as physical plug-pull, but it's a lot more practical.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
