Re: silent data loss with ext4 / all current versions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2016-03-08 03:01:18
Message-ID: CAB7nPqRj2Dh9xdi=4BcG=0iQ+OmerDCZBuhUW5QmYmOkx4h4Uw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 7, 2016 at 3:38 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-03-05 19:54:05 -0800, Andres Freund wrote:
>> I started working on this; delayed by taking longer than planned on the
>> logical decoding stuff (quite a bit complicated by
>> e1a11d93111ff3fba7a91f3f2ac0b0aca16909a8). I'm not very happy with the
>> error handling as it is right now. For one, you have rename_safe return
>> error codes, and act on them in the callers, but on the other hand you
>> call fsync_fname which always errors out in case of failure. I also
>> don't like the new messages much.
>>
>> Will continue working on this tomorrow.
>
> So, here's my current version of this. I've not performed any testing
> yet, and it's hot of the press. There's some comment smithing
> needed. But otherwise I'm starting to like this.
>
> Changes:
> * renamed rename_safe to durable_rename
> * renamed replace_safe to durable_link_or_rename (there was never any
> replacing going on)
> * pass through elevel to the underlying routines, otherwise we could
> error out with ERROR when we don't want to. That's particularly
> important in case of things like InstallXLogFileSegment().
> * made fsync_fname use fsync_fname_ext, add 'accept permission errors'
> param
> * have walkdir call a wrapper, to add ignore_perms param
>
> What do you think?

I have spent a couple of hours looking at that in details, and the
patch is neat.

+ * This routine ensures that, after returning, the effect of renaming file
+ * persists in case of a crash. A crash while this routine is running will
+ * leave you with either the old, or the new file.
"you" is not really Postgres-like, "the server" or "the backend" perhaps?

+ /* XXX: perform close() before? might be outside a
transaction. Consider errno! */
ereport(elevel,
(errcode_for_file_access(),
errmsg("could not fsync file \"%s\": %m", fname)));
+ (void) CloseTransientFile(fd);
+ return -1;
close() should be called before. slot.c for example does so and we
don't want to link an fd here in case of elevel >= ERROR.

+ * It does so by using fsync on the sourcefile before the rename, and the
+ * target file and directory after.
fsync is issued as well on the target file if it exists. I think
that's worth mentioning in the header.

+ /* XXX: Add racy file existence check? */
+ if (rename(oldfile, newfile) < 0)
I am not sure we should worry about that, what do you think could
cause the old file from going missing all of a sudden. Other backend
processes are not playing with it in the code paths where this routine
is called. Perhaps adding a comment in the header to let users know
that would help?

Instead of "durable" I think that "persistent" makes more sense. We
want to make those renames persistent on disk on case of a crash. So I
would suggest the following routine names:
- rename_persistent
- rename_or_link_persistent
Having the verb first also helps identifying that this is a
system-level, rename()-like, routine.

> I sure wish we had the recovery testing et al. in all the back
> branches...

Sure, what we have now is something that should really be backpatched,
I was just waiting to have all the existing stability issues
addressed, the last one on my agenda being the failure of hamster for
test 005 I mentioned in another thread before sending patches for
other branches. I proposed a couple of potential regarding that
actually, see here:
http://www.postgresql.org/message-id/CAB7nPqSAZ9HnUcMoUa30JO2wJ8MnREm18p2a7McRA-ZrJxj3Vw@mail.gmail.com
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-03-08 03:06:45 Re: Badly designed error reporting code in controldata_utils.c
Previous Message Tom Lane 2016-03-08 02:53:18 Re: Typo in logicaldecoding docs