Re: silent data loss with ext4 / all current versions

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2015-12-01 22:00:10
Message-ID: 565E186A.1070608@2ndquadrant.com
Lists: pgsql-hackers

On 12/01/2015 10:44 PM, Peter Eisentraut wrote:
> On 11/27/15 8:18 AM, Michael Paquier wrote:
>> On Fri, Nov 27, 2015 at 8:17 PM, Tomas Vondra
>> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>>>> So, what's going on? The problem is that while the rename() is atomic, it's
>>>> not guaranteed to be durable without an explicit fsync on the parent
>>>> directory. And by default we only do fdatasync on the recycled segments,
>>>> which may not force fsync on the directory (and ext4 does not do that,
>>>> apparently).
>> Yeah, that seems to be the way the POSIX spec clears things.
>> "If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall
>> force all currently queued I/O operations associated with the file
>> indicated by file descriptor fildes to the synchronized I/O completion
>> state. All I/O operations shall be completed as defined for
>> synchronized I/O file integrity completion."
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
>> If I understand that right, it is guaranteed that the rename() will be
>> atomic, meaning that there will be only one file even if there is a
>> crash, but that we need to fsync() the parent directory as mentioned.
>
> I don't see anywhere in the spec that a rename needs an fsync of the
> directory to be durable. I can see why that would be needed in
> practice, though. File system developers would probably be able to
> give a more definite answer.

Yeah, POSIX is the lowest common denominator. In this case the spec
does not seem to require this durability guarantee (rename without
fsync on the directory), which allows a POSIX-compliant filesystem to
lose the rename after a crash.

At least that's my conclusion from reading https://lwn.net/Articles/322823/
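
To make the sequence under discussion concrete, here is a minimal sketch
of a rename followed by an explicit fsync on the parent directory. The
names fsync_path() and durable_rename_sketch() are hypothetical and are
not existing PostgreSQL functions. The sketch assumes that both paths
live in the same directory and that the platform allows fsync() on a
directory opened read-only, as Linux does.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: open a file or directory and fsync it. */
static int fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;
    int saved_errno;

    if (fd < 0)
        return -1;

    rc = fsync(fd);
    saved_errno = errno;    /* close() may overwrite errno */
    close(fd);
    errno = saved_errno;
    return rc;
}

/*
 * Hypothetical helper: rename oldpath to newpath, then fsync the parent
 * directory so the new directory entry is flushed to disk.
 * Assumption: oldpath and newpath are in the same directory.
 */
int durable_rename_sketch(const char *oldpath, const char *newpath)
{
    char dirbuf[4096];

    /* Flush the file contents before the rename makes them visible. */
    if (fsync_path(oldpath) != 0)
        return -1;

    if (rename(oldpath, newpath) != 0)
        return -1;

    /* dirname() may modify its argument, so work on a copy. */
    if (strlen(newpath) >= sizeof(dirbuf))
    {
        errno = ENAMETOOLONG;
        return -1;
    }
    strcpy(dirbuf, newpath);

    /* Flush the directory entry itself. */
    return fsync_path(dirname(dirbuf));
}
```

A caller would use durable_rename_sketch("seg.tmp", "seg") in place of a
bare rename(). If the two paths were in different directories, the old
parent directory would need an fsync as well.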

However, as I explained in the original post, it's more complicated, as
this only seems to be a problem with fdatasync. I've been unable to
reproduce the issue with wal_sync_method=fsync.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
