Re: fsync reliability

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fsync reliability
Date: 2011-04-25 01:14:55
Message-ID: BANLkTimahmfL+Hefeti_Do0Kv0CMh+dCiw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 21, 2011 at 1:26 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Daniel Farina points out to me that the Linux man page for fsync() says
> "Calling fsync() does not necessarily ensure that the entry in the
> directory containing the file has also reached disk.  For that an
> explicit fsync() on a file descriptor for the directory is also needed."
> http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html

I'd also like to point out that even on ext(2|3) there is a special
mount option, 'dirsync', and a matching directory attribute (see
'chattr'), that exist mostly for the benefit of the authors of MTAs,
which do a lot of metadata manipulation. They make all directory
metadata updates synchronous, to get around non-durable metadata
manipulations. Even if you use fsync(), a crash between the rename()
and the fsync() will leave you in either the pre-move or the post-move
state: the rename is atomic but not durable. Synchronous directory
modification ensures that the return of rename() coincides with the
durability of the rename itself, or so I would think.
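
For illustration, the explicit approach from the man page (fsync() on a
file descriptor for the directory) can be sketched roughly as below.
This is only a sketch in Python, assuming Linux semantics where a
directory can be opened read-only and passed to fsync(). The name
durable_rename is made up for this example and is not a PostgreSQL or
kernel function.

```python
import os


def durable_rename(src: str, dst: str) -> None:
    """Rename src to dst, then fsync the affected directories.

    Hypothetical helper for illustration only; assumes Linux behaviour.
    """
    # Flush the file's own contents and inode first.
    fd = os.open(src, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # The rename is atomic, but not yet durable.
    os.rename(src, dst)

    # A crash at this point leaves either the pre-move or the post-move
    # state on disk. Only after the directory fsync() below returns is
    # the new directory entry expected to have reached disk.
    parents = {
        os.path.dirname(os.path.abspath(src)),
        os.path.dirname(os.path.abspath(dst)),
    }
    for parent in parents:
        dir_fd = os.open(parent, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

With 'dirsync' in effect, the final directory fsync() should in
principle be redundant, by the reasoning above, since rename() itself
would not return until the directory update is on disk.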

I only found this while doing some research about how to perform a
two-phase commit between postgres and the file system, and while
reading the kernel source. I admit, it's a dusty and obscure corner,
but it still seems to be in use by said MTAs.

Would a reading and exploration of the kernel code at hand perhaps
help resolve this discussion, one way or another?

--
fdr
