Re: silent data loss with ext4 / all current versions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2016-03-03 19:06:14
Message-ID: 20160303190614.7pvivlxyttgd3xcr@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> +/*
> + * rename_safe -- rename of a file, making it on-disk persistent
> + *
> + * This routine ensures that a rename file persists in case of a crash by using
> + * fsync on the old and new files before and after performing the rename so as
> + * this categorizes as an all-or-nothing operation.
> + */
> +int
> +rename_safe(const char *oldfile, const char *newfile)
> +{
> + struct stat filestats;
> +
> + /*
> + * First fsync the old entry and new entry, it this one exists, to ensure
> + * that they are properly persistent on disk. Calling this routine with
> + * an existing new target file is fine, rename() will first remove it
> + * before performing its operation.
> + */
> + fsync_fname(oldfile, false);
> + if (stat(newfile, &filestats) == 0)
> + fsync_fname(newfile, false);

I don't think we want any stat()s here. I'd much, much rather check open
for ENOENT.

> +/*
> + * link_safe -- make a file hard link, making it on-disk persistent
> + *
> + * This routine ensures that a hard link created on a file persists on system
> + * in case of a crash by using fsync where on the link generated as well as on
> + * its parent directory.
> + */
> +int
> +link_safe(const char *oldfile, const char *newfile)
> +{

If we go for a new abstraction here, I'd rather make it
'replace_file_safe' or something, and move the link/rename code #ifdef
into it.

> + if (link(oldfile, newfile) < 0)
> + return -1;
> +
> + /*
> + * Make the link persistent in case of an OS crash, the new entry
> + * generated as well as its parent directory need to be flushed.
> + */
> + fsync_fname(newfile, false);
> +
> + /*
> + * Same for parent directory. This routine is never called to rename
> + * files across directories, but if this proves to become the case,
> + * flushing the parent directory if the old file would be necessary.
> + */
> + fsync_parent_path(newfile);
> + return 0;

I think it's a seriously bad idea to encode that knowledge in such a
general sounding routine. We could however argue that this is about
safely replacing the *target* file; not about safely removing the old
file.

Currently I'm inclined to apply this to master soon. But I think we
might want to wait a while with backpatching. The recent fsync upgrade
disaster kinda makes me a bit careful...

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2016-03-03 19:16:35 Re: improving GROUP BY estimation
Previous Message Alexander Korotkov 2016-03-03 19:05:09 Re: improving GROUP BY estimation