Re: silent data loss with ext4 / all current versions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2016-03-04 22:43:00
Message-ID: CAB7nPqRFz_g3YxJX=oWzqiW-tky+yhVPkaCy9GL1pHVPb8ZLKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 5, 2016 at 7:35 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-03-04 14:51:50 +0900, Michael Paquier wrote:
>> On Fri, Mar 4, 2016 at 4:06 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > I don't think we want any stat()s here. I'd much, much rather check open
>> > for ENOENT.
>>
>> OK. So you mean more or less that, right?
>> int fd;
>> fd = OpenTransientFile(newfile, PG_BINARY | O_RDONLY, 0);
>> if (fd < 0)
>> {
>> if (errno != ENOENT)
>> return -1;
>> }
>> else
>> {
>> pg_fsync(fd);
>> CloseTransientFile(fd);
>> }
>
> Yes. Otherwise the check is racy: The file could be gone by the time you
> do the fsync; leading to a spurious ERROR (which often would get
> promoted to a PANIC).

Yeah, that makes sense.

>> >> +/*
>> >> + * link_safe -- make a file hard link, making it on-disk persistent
>> >> + *
>> >> + * This routine ensures that a hard link created on a file persists on system
>> >> + * in case of a crash by using fsync where on the link generated as well as on
>> >> + * its parent directory.
>> >> + */
>> >> +int
>> >> +link_safe(const char *oldfile, const char *newfile)
>> >> +{
>> >
>> > If we go for a new abstraction here, I'd rather make it
>> > 'replace_file_safe' or something, and move the link/rename code #ifdef
>> > into it.
>>
>> Hm. OK. I don't see any reason why switching to link() even in code
>> paths like KeepFileRestoredFromArchive() or pgarch_archiveDone() would
>> be a problem thinking about it. Should HAVE_WORKING_LINK be available
>> on a platform we can combine it with unlink. Is that in line with what
>> you think?
>
> I wasn't trying to suggest we should replace all rename codepaths with
> the link wrapper, just the ones that already have a HAVE_WORKING_LINK
> check. The name of the routine I suggested is bad though...

So we'd introduce a first routine rename_or_link_safe(), say replace_safe().

>> Do you suggest to correct this comment to remove the mention to the
>> old file's parent directory because we just care about having the new
>> file as being persistent?
>
> That's one approach, yes. Combined with the fact that you can't actually
> reliably rename across directories, the two could be on different
> filesystems after all, that'd be a suitable defense. It just needs to be
> properly documented in the function header, not at the bottom.

OK. Got it. Or the two could be on the same filesystem. Still, link()
and rename() do not support doing their stuff on different filesystems
(EXDEV).
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-03-04 22:46:41 Re: silent data loss with ext4 / all current versions
Previous Message Andres Freund 2016-03-04 22:37:48 Re: silent data loss with ext4 / all current versions