Re: silent data loss with ext4 / all current versions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2016-03-15 17:46:16
Message-ID: 20160315174616.ju5ybldinilz6rxv@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-03-15 15:39:50 +0100, Michael Paquier wrote:
> I have finally been able to spend some time reviewing what you pushed
> on back-branches, and things are in correct shape I think. One small
> issue that I have is that for EXEC_BACKEND builds, in
> write_nondefault_variables we still use one instance of rename().

Correctly so afaics, because write_nondefault_variables is by definition
non-durable. We write that stuff at every start / SIGHUP. Adding an
fsync there would be an unnecessary slowdown. I don't think it's good
policy adding fsync for stuff that definitely doesn't need it.

> Yeah, true. We definitely need to do something for that, even for HEAD
> it seems like an overkill to have something in for example src/common
> to allow frontends to have something if the fix is localized
> (pg_rewind may use something else), and it would be nice to finish
> wrapping that for the next minor release, so I propose the attached
> patches. At the same time, I think that adminpack had better be fixed
> as well, so there are actually three patches in this series, things
> that I shaped thinking about a backpatch btw, particularly for 0002.

I'm doubtful about "fixing" adminpack. We don't know how it's used, and
this could *seriously* increase its overhead for something potentially
used at a high rate.

> /*
> + * Wrapper of rename() similar to the backend version with the same function
> + * name aimed at making the renaming durable on disk. Note that this version
> + * does not fsync the old file before the rename as all the code paths leading
> + * to this function are already doing this operation. The new file is also
> + * normally not present on disk before the renaming so there is no need to
> + * bother about it.
> + */
> +static int
> +durable_rename(const char *oldfile, const char *newfile)
> +{
> + int fd;
> + char parentpath[MAXPGPATH];
> +
> + if (rename(oldfile, newfile) != 0)
> + {
> + /* durable_rename produced a log entry */
> + fprintf(stderr, _("%s: could not rename file \"%s\": %s\n"),
> + progname, current_walfile_name, strerror(errno));
> + return -1;
> + }
> +
> + /*
> + * To guarantee renaming of the file is persistent, fsync the file with its
> + * new name, and its containing directory.
> + */
> + fd = open(newfile, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);

Why is S_IRUSR | S_IWUSR specified here?

Are you working on a fix for pg_rewind? Let's go with initdb -S in a
first iteration, then we can, if somebody is interest enough, work on
making this nicer in master.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2016-03-15 17:50:12 Re: silent data loss with ext4 / all current versions
Previous Message Corey Huinker 2016-03-15 17:46:06 Re: Soliciting Feedback on Improving Server-Side Programming Documentation