Re: fsync reliability

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: fsync reliability
Date: 2011-04-25 16:00:13
Message-ID: 4DB59A8D.9060004@2ndQuadrant.com
Lists: pgsql-hackers

On 04/23/2011 09:58 AM, Matthew Woodcraft wrote:
> As far as I can make out, the current situation is that this fix (the
> auto_da_alloc mount option) doesn't work as advertised, and the ext4
> maintainers are not treating this as a bug.
>
> See https://bugzilla.kernel.org/show_bug.cgi?id=15910
>

I agree with the resolution that this isn't a bug. As pointed out
there, XFS does the same thing, and this behavior isn't going away any
time soon. Zero-length files get left behind whenever developers try
to optimize away a necessary fsync.

Here's the part where the submitter goes wrong:

"We first added a fsync() call for each extracted file. But scattered
fsyncs resulted in a massive performance degradation during package
installation (factor 10 or more, some reported that it took over an hour
to unpack a linux-headers-* package!) In order to reduce the I/O
performance degradation, fsync calls were deferred..."

Stop right there; the slow path was the only one that had any hope of
being correct. Calling fsync on every file can actually slow things
down by a factor of 100X or more, worst-case. "So, we currently have the
choice between filesystem corruption or major performance loss": yes,
you do. Writing files is tricky, and it can be either slow or safe. If
you're going to avoid even trying to enforce the right thing here,
you're going to get badly burned.

It's unfortunate that so many people have gotten used to the speed that
has been common for a while now with ext3 and cheap hard drives: all
writes are cached unsafely, but the filesystem resists a few bad
behaviors. Much of the struggle where people say "this is so much
slower, I won't put up with it" and try to code around it is futile, and
it's hard to separate those attempts at optimization from the
legitimate complaints.

Anyway, you're right to point out that the filesystem is not necessarily
going to save anyone from some of the tricky rename situations even with
the improvements made to delayed allocation. They've fixed some of the
worst behavior of the earlier implementation, but it seems there are
still potential issues in that area.
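
For reference, here is a minimal sketch of the slow-but-safe rename
pattern this discussion is about. It is not code from dpkg or
PostgreSQL; the function name durable_replace is hypothetical, and it
assumes a POSIX system where a directory can be opened and fsync'd. The
idea is to fsync the new file's contents before the rename, then fsync
the containing directory so the rename itself reaches disk.

```python
import os
import tempfile


def durable_replace(path, data):
    """Replace `path` with `data` (bytes), syncing at each step.

    Hypothetical helper for illustration; assumes POSIX semantics.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # Write to a temporary file in the same directory so the final
    # rename stays within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # The step people try to optimize away: force the file
            # contents to stable storage before the rename.
            os.fsync(tmp.fileno())
        # Atomically put the new file in place of the old one.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before or
        # during the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Sync the directory so the rename is durable as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping the first fsync is exactly the shortcut that can leave a
zero-length file behind after a crash. Whether the fsync calls actually
reach the platters still depends on the drive's write cache settings.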

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
