Re: silent data loss with ext4 / all current versions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2016-03-04 05:51:50
Message-ID: CAB7nPqQW_xPw==hk+Ei_PxNsKX9PX1vjoxjr8auK9EwPn4FgtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 4, 2016 at 4:06 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,

Thanks for the review.

>> +/*
>> + * rename_safe -- rename of a file, making it on-disk persistent
>> + *
>> + * This routine ensures that a rename file persists in case of a crash by using
>> + * fsync on the old and new files before and after performing the rename so as
>> + * this categorizes as an all-or-nothing operation.
>> + */
>> +int
>> +rename_safe(const char *oldfile, const char *newfile)
>> +{
>> + struct stat filestats;
>> +
>> + /*
>> + * First fsync the old entry and new entry, it this one exists, to ensure
>> + * that they are properly persistent on disk. Calling this routine with
>> + * an existing new target file is fine, rename() will first remove it
>> + * before performing its operation.
>> + */
>> + fsync_fname(oldfile, false);
>> + if (stat(newfile, &filestats) == 0)
>> + fsync_fname(newfile, false);
>
> I don't think we want any stat()s here. I'd much, much rather check open
> for ENOENT.

OK. So you mean more or less that, right?
int fd;
fd = OpenTransientFile(newfile, PG_BINARY | O_RDONLY, 0);
if (fd < 0)
{
if (errno != ENOENT)
return -1;
}
else
{
pg_fsync(fd);
CloseTransientFile(fd);
}

>> +/*
>> + * link_safe -- make a file hard link, making it on-disk persistent
>> + *
>> + * This routine ensures that a hard link created on a file persists on system
>> + * in case of a crash by using fsync where on the link generated as well as on
>> + * its parent directory.
>> + */
>> +int
>> +link_safe(const char *oldfile, const char *newfile)
>> +{
>
> If we go for a new abstraction here, I'd rather make it
> 'replace_file_safe' or something, and move the link/rename code #ifdef
> into it.

Hm. OK. I don't see any reason why switching to link() even in code
paths like KeepFileRestoredFromArchive() or pgarch_archiveDone() would
be a problem thinking about it. Should HAVE_WORKING_LINK be available
on a platform we can combine it with unlink. Is that in line with what
you think?

>> + if (link(oldfile, newfile) < 0)
>> + return -1;
>> +
>> + /*
>> + * Make the link persistent in case of an OS crash, the new entry
>> + * generated as well as its parent directory need to be flushed.
>> + */
>> + fsync_fname(newfile, false);
>> +
>> + /*
>> + * Same for parent directory. This routine is never called to rename
>> + * files across directories, but if this proves to become the case,
>> + * flushing the parent directory if the old file would be necessary.
>> + */
>> + fsync_parent_path(newfile);
>> + return 0;
>
> I think it's a seriously bad idea to encode that knowledge in such a
> general sounding routine. We could however argue that this is about
> safely replacing the *target* file; not about safely removing the old
> file.

Not sure I am following here. Are you referring to the fact that if
the new file and old file are on different directories would make this
routine unreliable? Because yes that's the case if we want to make
both of them persistent, and I think we want to do so. Do you suggest
to correct this comment to remove the mention to the old file's parent
directory because we just care about having the new file as being
persistent? Or do you suggest that we should actually extend this
routine so as we fsync both the new and old file's parent directory if
they differ?

> Currently I'm inclined to apply this to master soon. But I think we
> might want to wait a while with backpatching. The recent fsync upgrade
> disaster kinda makes me a bit careful...

There have not been actual field complaints about that yet. That's
fine for me to wait a bit.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2016-03-04 05:59:49 Re: Patch: fix lock contention for HASHHDR.mutex
Previous Message Tom Lane 2016-03-04 05:46:04 Re: postgres_fdw vs. force_parallel_mode on ppc