Re: silent data loss with ext4 / all current versions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2016-03-05 13:25:36
Message-ID: CAB7nPqSO84Hnb29zAzsaRZAv1O6pgFd3c8LFwr63Tywa=XO0mw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 5, 2016 at 7:47 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-03-05 07:43:00 +0900, Michael Paquier wrote:
>> On Sat, Mar 5, 2016 at 7:35 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > On 2016-03-04 14:51:50 +0900, Michael Paquier wrote:
>> >> On Fri, Mar 4, 2016 at 4:06 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> >> Hm. OK. I don't see any reason why switching to link() even in code
>> >> paths like KeepFileRestoredFromArchive() or pgarch_archiveDone() would
>> >> be a problem thinking about it. Should HAVE_WORKING_LINK be available
>> >> on a platform we can combine it with unlink. Is that in line with what
>> >> you think?
>> >
>> > I wasn't trying to suggest we should replace all rename codepaths with
>> > the link wrapper, just the ones that already have a HAVE_WORKING_LINK
>> > check. The name of the routine I suggested is bad though...
>>
>> So we'd introduce a first routine rename_or_link_safe(), say replace_safe().
>
> Or actually maybe just link_safe(), which falls back to access() &&
> rename() if !HAVE_WORKING_LINK.
>
>> > That's one approach, yes. Combined with the fact that you can't actually
>> > reliably rename across directories, the two could be on different
>> > filesystems after all, that'd be a suitable defense. It just needs to be
>> > properly documented in the function header, not at the bottom.
>>
>> OK. Got it. Or the two could be on the same filesystem.
>
>> Still, link() and rename() do not support doing their stuff on
>> different filesystems (EXDEV).
>
> That's my point ...

OK, I hacked a v7:
- Move the link()/rename() group with HAVE_WORKING_LINK into a single
routine, making the previous link_safe renamed to replace_safe. This
is sharing a lot of things with rename_safe. I am not sure it is worth
complicating the code more this way by having a common single routine
for whole. Thoughts welcome. Honestly, I kind of liked the separation
with link_safe/rename_safe of previous patches because link_safe could
have been directly used by extensions and plugins btw.
- Remove the call of stat() in rename_safe() and implement a logic
depending on OpenTransientFile()/pg_fsync() to flush any existing
target file before performing the rename.
Andres, feel free to use this patch as a base, perhaps that will help.
--
Michael

Attachment Content-Type Size
xlog-fsync-v7.patch application/x-patch 14.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2016-03-05 14:25:20 Re: Freeze avoidance of very large table.
Previous Message Greg Stark 2016-03-05 13:03:26 Re: Static code checker research worth investigating (Communications of the ACM, 03/2016, Vol. 59, No. 03, p. 99)