Re: silent data loss with ext4 / all current versions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2016-03-08 07:21:45
Message-ID: CAB7nPqRDwya5Qj4w0b5mu9Eac4AS1ykUU+WD_7z0zip0HsOhQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 8, 2016 at 2:55 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-03-08 12:26:34 +0900, Michael Paquier wrote:
>> On Tue, Mar 8, 2016 at 12:18 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > On 2016-03-08 12:01:18 +0900, Michael Paquier wrote:
>> >> I have spent a couple of hours looking at that in details, and the
>> >> patch is neat.
>> >
>> > Cool. Doing some more polishing right now. Will be back with an updated
>> > version soonish.
>> >
>> > Did you do some testing?
>>
>> Not much in details yet, I just ran a check-world with fsync enabled
>> for the recovery tests, plus quick manual tests with a cluster
>> manually set up. I'll do more with your new version now that I know
>> there will be one.
>
> Here's my updated version.
>
> Note that I've split the patch into two. One for the infrastructure, and
> one for the callsites.

Thanks for the updated patches and the split, this makes things easier
to look at. I have been doing some testing as well mainly manually
using with pgbench and nothing looks broken.

+ durable_link_or_rename(tmppath, path, ERROR);
+ durable_rename(path, xlogfpath, ERROR);
You may want to add a (void) cast in front of those calls for correctness.

- ereport(LOG,
- (errcode_for_file_access(),
- errmsg("could not link file \"%s\" to \"%s\"
(initialization of log file): %m",
- tmppath, path)));
We lose a portion of the error message here, but with the file name
that's easy to guess where that is happening. I am not complaining
(that's fine to me as-is), just mentioning for the archive's sake.

>> >> + /* XXX: Add racy file existence check? */
>> >> + if (rename(oldfile, newfile) < 0)
>> >
>> >> I am not sure we should worry about that, what do you think could
>> >> cause the old file from going missing all of a sudden. Other backend
>> >> processes are not playing with it in the code paths where this routine
>> >> is called. Perhaps adding a comment in the header to let users know
>> >> that would help?
>> >
>> > What I'm thinking of is adding a check whether the *target* file already
>> > exists, and error out in that case. Just like the link() based path
>> > normally does.
>>
>> Ah, OK. Well, why not. I'd rather have an assertion instead of an error though.
>
> I think it should definitely be an error if anything. But I'd rather
> only add it in master...

I guess I know why :) That's also why I was thinking about an assertion.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-03-08 07:24:54 Re: silent data loss with ext4 / all current versions
Previous Message Peter Geoghegan 2016-03-08 06:57:38 Re: Using quicksort for every external sort run