Re: silent data loss with ext4 / all current versions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2016-02-01 16:07:06
Message-ID: 20160201160706.GJ8743@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016-01-25 16:30:47 +0900, Michael Paquier wrote:
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index a2846c4..b124f90 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -3278,6 +3278,14 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
> tmppath, path)));
> return false;
> }
> +
> + /*
> + * Make sure the rename is permanent by fsyncing the parent
> + * directory.
> + */
> + START_CRIT_SECTION();
> + fsync_fname(XLOGDIR, true);
> + END_CRIT_SECTION();
> #endif

Hm. I'm seriously doubtful that using critical sections for this is a
good idea. What's the scenario you're protecting against by declaring
this one? We intentionally don't error out in the isdir cases in
fsync_fname() cases anyway.

Afaik we need to fsync tmppath before and after the rename, and only
then the directory, to actually be safe.

> @@ -5297,6 +5313,9 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
> errmsg("could not rename file \"%s\" to \"%s\": %m",
> RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE)));
>
> + /* Make sure the rename is permanent by fsyncing the data directory. */
> + fsync_fname(".", true);
> +

Shouldn't RECOVERY_COMMAND_DONE be fsynced first here?

> ereport(LOG,
> (errmsg("archive recovery complete")));
> }
> @@ -6150,6 +6169,12 @@ StartupXLOG(void)
> TABLESPACE_MAP, BACKUP_LABEL_FILE),
> errdetail("Could not rename file \"%s\" to \"%s\": %m.",
> TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
> +
> + /*
> + * Make sure the rename is permanent by fsyncing the data
> + * directory.
> + */
> + fsync_fname(".", true);
> }

Is it just me, or are the repeated four line comments a bit grating?

> /*
> @@ -6525,6 +6550,13 @@ StartupXLOG(void)
> TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
> }
>
> + /*
> + * Make sure the rename is permanent by fsyncing the parent
> + * directory.
> + */
> + if (haveBackupLabel || haveTblspcMap)
> + fsync_fname(".", true);
> +

Isn't that redundant with the haveTblspcMap case above?

> /* Check that the GUCs used to generate the WAL allow recovery */
> CheckRequiredParameterValues();
>
> @@ -7305,6 +7337,12 @@ StartupXLOG(void)
> errmsg("could not rename file \"%s\" to \"%s\": %m",
> origpath, partialpath)));
> XLogArchiveNotify(partialfname);
> +
> + /*
> + * Make sure the rename is permanent by fsyncing the parent
> + * directory.
> + */
> + fsync_fname(XLOGDIR, true);

.partial should be fsynced first.

> }
> }
> }
> @@ -10905,6 +10943,9 @@ CancelBackup(void)
> BACKUP_LABEL_FILE, BACKUP_LABEL_OLD,
> TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
> }
> +
> + /* fsync the data directory to persist the renames */
> + fsync_fname(".", true);
> }

Same.

> /*
> diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
> index 277c14a..8dda80b 100644
> --- a/src/backend/access/transam/xlogarchive.c
> +++ b/src/backend/access/transam/xlogarchive.c
> @@ -477,6 +477,12 @@ KeepFileRestoredFromArchive(char *path, char *xlogfname)
> path, xlogfpath)));
>
> /*
> + * Make sure the renames are permanent by fsyncing the parent
> + * directory.
> + */
> + fsync_fname(XLOGDIR, true);

Afaics the file under the temporary filename has not been fsynced up to
here.

> + /*
> * Create .done file forcibly to prevent the restored segment from being
> * archived again later.
> */
> @@ -586,6 +592,11 @@ XLogArchiveForceDone(const char *xlog)
> errmsg("could not rename file \"%s\" to \"%s\": %m",
> archiveReady, archiveDone)));
>
> + /*
> + * Make sure the rename is permanent by fsyncing the parent
> + * directory.
> + */
> + fsync_fname(XLOGDIR "/archive_status", true);
> return;
> }

Afaics the AllocateFile() call below is not protected at all, no?

How about introducing a 'safe_rename()' that does something roughly akin
to:
fsync(oldname);
fsync(fname) || true;
rename(oldfname, fname);
fsync(fname);
fsync(basename(fname));

I'd rather have that kind of logic somewhere once, instead of repeated a
dozen times...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-02-01 16:08:39 Re: silent data loss with ext4 / all current versions
Previous Message Alvaro Herrera 2016-02-01 15:49:46 Re: silent data loss with ext4 / all current versions