Re: silent data loss with ext4 / all current versions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, dpage(at)pgadmin(dot)org, Guillaume Lelarge <guillaume(dot)lelarge(at)dalibo(dot)com>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2016-03-17 14:05:42
Message-ID: CAB7nPqRnOw30gOXe2_SPLjh37bgm4V+txbYAPwoXb97nGQ297w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 16, 2016 at 2:46 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-03-15 15:39:50 +0100, Michael Paquier wrote:
>> Yeah, true. We definitely need to do something for that, even for HEAD
>> it seems like an overkill to have something in for example src/common
>> to allow frontends to have something if the fix is localized
>> (pg_rewind may use something else), and it would be nice to finish
>> wrapping that for the next minor release, so I propose the attached
>> patches. At the same time, I think that adminpack had better be fixed
>> as well, so there are actually three patches in this series, things
>> that I shaped thinking about a backpatch btw, particularly for 0002.
>
> I'm doubtful about "fixing" adminpack. We don't know how it's used, and
> this could *seriously* increase its overhead for something potentially
> used at a high rate.

I think that Dave or Guillaume added here in CC could bring some light
on the matter. Let's see if that's a problem for them. I would tend to
think that it is not that critical, still I would imagine that this
function is not called at a high frequency.

>> /*
>> + * Wrapper of rename() similar to the backend version with the same function
>> + * name aimed at making the renaming durable on disk. Note that this version
>> + * does not fsync the old file before the rename as all the code paths leading
>> + * to this function are already doing this operation. The new file is also
>> + * normally not present on disk before the renaming so there is no need to
>> + * bother about it.
>> + */
>> +static int
>> +durable_rename(const char *oldfile, const char *newfile)
>> +{
>> + int fd;
>> + char parentpath[MAXPGPATH];
>> +
>> + if (rename(oldfile, newfile) != 0)
>> + {
>> + /* durable_rename produced a log entry */
>> + fprintf(stderr, _("%s: could not rename file \"%s\": %s\n"),
>> + progname, current_walfile_name, strerror(errno));
>> + return -1;
>> + }
>> +
>> + /*
>> + * To guarantee renaming of the file is persistent, fsync the file with its
>> + * new name, and its containing directory.
>> + */
>> + fd = open(newfile, O_RDWR | PG_BINARY, S_IRUSR | S_IWUSR);
>
> Why is S_IRUSR | S_IWUSR specified here?

Oops. I have removed it and updated that as attached.

> Are you working on a fix for pg_rewind? Let's go with initdb -S in a
> first iteration, then we can, if somebody is interest enough, work on
> making this nicer in master.

I am really -1 for this approach. Wrapping initdb -S with
find_other_exec is intrusive in back-branches knowing that all the I/O
write operations manipulating file descriptors go through file_ops.c,
and we actually just need to fsync the target file in
close_target_file(), making the fix being a 7-line patch, and there is
no need to depend on another binary at all. The backup label file, as
well as the control file are using the same code path in file_ops.c...
And I like simple things :)

At the same time, I found a legit bug when the modified backup_label
file is created in createBackupLabel: the file is opened, written, but
not closed with close_target_file(), and it should be.
--
Michael

Attachment Content-Type Size
0001-Close-file-descriptor-associated-to-backup_label-cor.patch binary/octet-stream 924 bytes
0002-fsync-properly-files-modified-by-pg_rewind.patch binary/octet-stream 1.0 KB
0003-Avoid-potential-data-loss-in-pg_receivexlog.patch binary/octet-stream 4.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2016-03-17 14:10:25 Re: RFC: replace pg_stat_activity.waiting with something more descriptive
Previous Message Robert Haas 2016-03-17 14:01:49 Re: WIP: Upper planner pathification