Re: fsync reliability

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: fsync reliability
Date: 2011-04-25 20:21:57
Message-ID: BANLkTimE3znct+OacMpGUwa_zffXGeu6Ng@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 25, 2011 at 8:26 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> On 04/24/2011 10:06 PM, Daniel Farina wrote:
>>
>> On Thu, Apr 21, 2011 at 8:51 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>>
>>>
>>> There's still the "fsync'd a data block but not the directory entry yet"
>>> issue as fall-out from this too.  Why doesn't PostgreSQL run into this
>>> problem?  Because the exact code sequence used is this one:
>>>
>>> open
>>> write
>>> fsync
>>> close
>>>
>>> And Linux shouldn't ever screw that up, or the similar rename path.  Here's
>>> what the close man page says, from http://linux.die.net/man/2/close :
>>>
>>
>> Theodore Ts'o addresses this *exact* sequence of events, and suggests
>> if you want that rename to definitely stick that you must fsync the
>> directory:
>>
>>
>> http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don%E2%80%99t-fear-fsync
>>
>
> Not exactly.  That's talking about the sequence used for creating a file,
> plus a rename.  When new WAL files are being created, I believe the ugly
> part of this is avoided.  The path when WAL files are recycled using rename
> does seem to be the one with the most likely edge case.

Hmm, how do we avoid this in the creation case? My current
understanding is that there are cases where you can do open(afile),
write(), fsync(), then crash, and the file will not be linked to its
parent directory, or at the very least is *entitled* not to be.
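
A minimal sketch of the creation sequence in question, extended with a
directory fsync so the new entry itself is flushed. This assumes a
POSIX-like system where a directory can be opened read-only and passed to
fsync(); the name durable_create is hypothetical, and this is not
PostgreSQL's actual code.

```python
import os


def durable_create(path: str, data: bytes) -> None:
    """Create `path`, write `data`, then fsync the file and its parent directory."""
    # open + write + fsync + close: the sequence discussed in the thread.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flushes the file's data and metadata
    finally:
        os.close(fd)

    # Extra step: fsync the parent directory so the directory entry
    # that links the file is flushed as well.
    parent = os.path.dirname(os.path.abspath(path))
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Whether the second fsync is strictly required depends on the filesystem
and its mount options; the sketch only shows where it would go.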

The recycling case also sucks.
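
For the rename path, the advice in the linked Ts'o post amounts to
fsyncing the directory after the rename. A rough sketch under the same
POSIX assumptions follows; durable_rename is a hypothetical name, not
something taken from the PostgreSQL source.

```python
import os


def _fsync_dir(directory: str) -> None:
    """Open a directory read-only and fsync it to flush its entries."""
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def durable_rename(old_path: str, new_path: str) -> None:
    """Rename a file, then fsync every directory whose entries changed."""
    os.rename(old_path, new_path)
    parents = {
        os.path.dirname(os.path.abspath(old_path)),
        os.path.dirname(os.path.abspath(new_path)),
    }
    # One fsync when both names live in the same directory, two otherwise.
    for parent in parents:
        _fsync_dir(parent)
```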

Would it be insane to use the MTA approach and just use chattr +D? That also
models the behavior of other systems with synchronous directory
modifications, which (maybe? I could very well be wrong) include
BSD.

--
fdr
