Re: fsync reliability

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: fsync reliability
Date: 2011-05-09 18:22:24
Message-ID: 201105091822.p49IMOP21362@momjian.us
Lists: pgsql-hackers


FYI, does wal.c need updated comments to explain the file system
semantics we expect, and how our code triggers it?

---------------------------------------------------------------------------

Greg Smith wrote:
> On 04/23/2011 09:58 AM, Matthew Woodcraft wrote:
> > As far as I can make out, the current situation is that this fix (the
> > auto_da_alloc mount option) doesn't work as advertised, and the ext4
> > maintainers are not treating this as a bug.
> >
> > See https://bugzilla.kernel.org/show_bug.cgi?id=15910
> >
>
> I agree with the resolution that this isn't a bug. As pointed out
> there, XFS does the same thing, and this behavior isn't going away any
> time soon. Zero-length files get left behind when developers try to
> optimize away a necessary fsync.
>
> Here's the part where the submitter goes wrong:
>
> "We first added a fsync() call for each extracted file. But scattered
> fsyncs resulted in a massive performance degradation during package
> installation (factor 10 or more, some reported that it took over an hour
> to unpack a linux-headers-* package!) In order to reduce the I/O
> performance degradation, fsync calls were deferred..."
>
> Stop right there; the slow path was the only one that had any hope of
> being correct. That path can actually slow things by a factor of 100
> or more, worst-case. "So, we currently have the choice between
> filesystem corruption or major performance loss": yes, you do. Writing
> files is tricky, and it can be either slow or safe. If you're going to
> avoid even trying to enforce the right thing here, you're going to get
> badly burned.
>
> It's unfortunate that so many people have gotten used to the speed that
> has been common for a while now with ext3 and cheap hard drives: all
> writes are cached unsafely, but the filesystem resists a few bad
> behaviors. Much of the struggle where people say "this is so much
> slower, I won't put up with it" and try to code around it is futile, and
> it's hard to separate the attempts to find such optimizations from
> the legitimate complaints.
>
> Anyway, you're right to point out that the filesystem is not necessarily
> going to save anyone from some of the tricky rename situations even with
> the improvements made to delayed allocation. They've fixed some of the
> worst behavior of the earlier implementation, but it seems there are
> still potential issues in that area.
>
> --
> Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
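
For readers following the thread, here is a minimal sketch of the "slow but safe" replace-a-file path that Greg Smith's quoted message argues for: write a temporary file, fsync it, rename it over the target, then fsync the containing directory. It is an illustration under stated assumptions, not PostgreSQL's code and not the package installer's code. The function name `durable_replace` and the `.tmp-` prefix are hypothetical. The sketch assumes a POSIX system where a directory can be opened read-only and passed to fsync, and it assumes the storage stack honors fsync (a drive that caches writes unsafely can still lose data).

```python
import os
import tempfile


def durable_replace(path: str, data: bytes) -> None:
    """Replace `path` with `data`, aiming to leave old or new contents after a crash.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target directory so the rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to flush the file's data
        # Rename only after the data has been fsynced; skipping the fsync
        # above is the optimization that can leave zero-length files.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is flushed (assumes POSIX).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `durable_replace("settings.conf", b"...")` once per file is the per-file fsync cost the quoted bug report complains about. Deferring or dropping the fsync calls is faster, but it gives up the ordering guarantee that makes the rename safe.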
