Skip site navigation (1) Skip section navigation (2)

Re: silent data loss with ext4 / all current versions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, dpage(at)pgadmin(dot)org, Guillaume Lelarge <guillaume(dot)lelarge(at)dalibo(dot)com>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2016-03-28 04:49:46
Message-ID: CAB7nPqSkTuFcCQ38PissdkeoEqaqZQM4zM8WUOLXRA8VsTSvZg@mail.gmail.com (view raw, whole thread or download thread mbox)
Thread:
Lists: pgsql-hackers
On Mon, Mar 28, 2016 at 8:25 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2016-03-18 15:08:32 +0900, Michael Paquier wrote:
>> +             fprintf(stderr, _("%s: could not rename file \"%s\": %s\n"),
>> +                             progname, current_walfile_name, strerror(errno));
>
> current_walfile_name doesn't look right, that's a largely independent
> global variable.

Stupid mistake.

>> +     if (fsync(fd) != 0)
>> +     {
>> +             close(fd);
>> +             fprintf(stderr, _("%s: could not fsync file \"%s\": %s\n"),
>> +                             progname, newfile, strerror(errno));
>> +             return -1;
>
> Close should be after the strerror() call (yes, that mistake already
> existed once in receivelog.c).

Right.

> It'd be easier to apply these if the rate of trivial issues were lower
> :(.

Sorry about that. I'll be more careful.

> Attached is an heavily revised version of the patch. Besides the above,
> I've:
> * copied fsync_fname_ext from initdb, I think it's easier to understand
>   code this way, and it'll make unifying the code easier

OK.

> * added fsyncing of the directory in mark_file_as_archived()
> * The WAL files also need to be fsynced when created in open_walfile():
>   - otherwise the directory entry itself isn't safely persistent, as we
>     don't fsync the directory in the stream->synchronous fsync() cases.
>   - we refuse to resume in open_walfile(), if a file isn't 16MB when
>     restarting. Without an fsync that's actually not unlikely after a
>     crash. Even with an fsync that's not guaranteed not to happen, but
>     the chance of it is much lower.
> I'm too tired to push this at the eleventh hour. Besides a heavily
> revised patch, backpatching will likely include a number of conflicts.
> If somebody in the US has the energy to take care of this...

Close enough to the US. Attached are backpatchable versions based on
the corrected version you sent. 9.3 and 9.4 share the same patch, more
work has been necessary for 9.2 but that's not huge either.

> So we're going to have another round of fsync stuff in the next set of
> releases anyway...

Yes, seeing how 9.5.2 is close by, I think that it would be wiser to
push this stuff after the upcoming minor release.
-- 
Michael

Attachment: pg_receivexlog-sync-94-93.patch
Description: invalid/octet-stream (7.0 KB)
Attachment: pg_receivexlog-sync-92.patch
Description: invalid/octet-stream (6.2 KB)
Attachment: pg_receivexlog-sync-95.patch
Description: invalid/octet-stream (7.2 KB)

In response to

pgsql-hackers by date

Next:From: Amit KapilaDate: 2016-03-28 05:30:43
Subject: Re: Relation extension scalability
Previous:From: David G. JohnstonDate: 2016-03-28 04:28:41
Subject: Re: Nested funtion

Privacy Policy | About PostgreSQL
Copyright © 1996-2018 The PostgreSQL Global Development Group