Re: fsync reliability

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: fsync reliability
Date: 2011-04-25 15:26:33
Message-ID: 4DB592A9.3080307@2ndQuadrant.com
Lists: pgsql-hackers

On 04/24/2011 10:06 PM, Daniel Farina wrote:
> On Thu, Apr 21, 2011 at 8:51 PM, Greg Smith<greg(at)2ndquadrant(dot)com> wrote:
>
>> There's still the "fsync'd a data block but not the directory entry yet"
>> issue as fall-out from this too. Why doesn't PostgreSQL run into this
>> problem? Because the exact code sequence used is this one:
>>
>> open
>> write
>> fsync
>> close
>>
>> And Linux shouldn't ever screw that up, or the similar rename path. Here's
>> what the close man page says, from http://linux.die.net/man/2/close :
>>
> Theodore Ts'o addresses this *exact* sequence of events, and suggests
> if you want that rename to definitely stick that you must fsync the
> directory:
>
> http://www.linuxfoundation.org/news-media/blogs/browse/2009/03/don%E2%80%99t-fear-fsync
>

Not exactly. That's talking about the sequence used for creating a
file, plus a rename. When new WAL files are being created, I believe
the ugly part of this is avoided. The path when WAL files are recycled
using rename does seem to be the one with the most likely edge case.
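
For reference, the directory fsync suggested for the rename case can be
sketched as below. This is only a minimal Python illustration, not
PostgreSQL's code: the helper name `durable_rename` is hypothetical, and
it assumes a POSIX system where a directory can be opened read-only and
fsync'd, with both names living in the same directory.

```python
import os
import tempfile


def durable_rename(old_path, new_path):
    """Rename a file, then fsync its directory so the new entry is flushed.

    Hypothetical helper: assumes old_path and new_path share one directory.
    """
    os.rename(old_path, new_path)
    parent = os.path.dirname(os.path.abspath(new_path))
    # Open the directory itself; fsync on this descriptor asks the kernel
    # to flush the directory entry change made by the rename.
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "segment.old")
        new = os.path.join(d, "segment.new")
        with open(old, "wb") as f:
            f.write(b"\0" * 16)
        durable_rename(old, new)
        print(os.listdir(d))
```

If the two names were in different directories, both directories would
need the same treatment.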

The difficult case Ts'o's discussion is trying to satisfy involves
creating a new file and then swapping it for an old one atomically.
PostgreSQL never does exactly that. It creates new files, pads them
with zeros, and then starts writing to them; it also renames old files
that are already of the correct length. Combined with the fact that
there are always fsyncs after writes to the files, this case really
isn't exactly the same as any of the others people are complaining about.
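
To make the "create, pad with zeros, fsync" path concrete, here is a
rough sketch of the open/write/fsync/close sequence in Python. It is an
illustration under assumptions, not the actual server code: the function
name `create_zero_filled`, the block size, and the sizes used in the demo
are all made up for the example.

```python
import os
import tempfile


def create_zero_filled(path, size, block_size=8192):
    """Create a new file of `size` zero bytes and fsync it before closing.

    Hypothetical helper; block_size is an arbitrary choice for the sketch.
    """
    # O_EXCL makes the call fail if the file already exists, so this
    # path only ever handles brand-new files.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        zeros = bytes(block_size)
        remaining = size
        while remaining > 0:
            chunk = zeros[:min(block_size, remaining)]
            # os.write may write fewer bytes than requested, so count
            # what was actually written.
            remaining -= os.write(fd, chunk)
        # Flush the zero-filled blocks to stable storage before close.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "newfile")
        create_zero_filled(path, 20000)
        print(os.path.getsize(path))
```

Because the file is already at its full length once this returns, later
writes followed by fsync only overwrite existing blocks and do not
change the file's size.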

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
