Re: pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
Date: 2016-09-23 15:15:22
Message-ID: CAB7nPqQL0fCp0eDcVD6+3+Je24xeApU14vKz_pBpNA0sTPwLgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 23, 2016 at 10:54 AM, Peter Eisentraut
<peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> This is looking pretty good. More comments on this patch set:

Thanks for the review.

> 0001:
>
> Keep the file order alphabetical in Mkvcbuild.pm.
>
> Needs nls.mk updates for file move (in initdb and pg_basebackup
> directories).

Fixed.

> 0002:
>
> durable_rename needs close(fd) in non-error path (compare backend code).

Oops, leak.

> Changing from fsync() to fsync_name() in some cases means that
> inaccessible files are now ignored. I guess this would only happen in
> very obscure circumstances, but it's worth considering if that is OK.

In writeTimeLineHistoryFile() that's fine, the user should have
ownership of it as it has been created by receivelog.c. Similar remark
for .
In mark_file_as_archived, the previous open() call would have just failed.

> You added this comment:
> * XXX: This means that we might not restart if a crash occurs
> before the
> * fsync below. We probably should create the file in a temporary path
> * like the backend does...
> pg_receivexlog uses the .partial suffix for this reason. Why doesn't
> pg_basebackup do that?

Because it matters for pg_receivexlog as it has a retry logic and it
is able to rework on a partial segment. Thinking more about that this
comment does not make much sense, because for pg_basebackup a crash
would just cause the backup to be incomplete, so I have removed it.

> In open_walfile, could we move the fsync calls before the fstat or
> somewhere around there so we don't have to make the same calls in two
> different branches?

We could refactor a bit the code so as follows:
if (size == 16MB)
{
walfile = f;
}
else if (size == 0)
{
//do padding and lseek
}
else
error();
fsync();
I am not sure if this gains in clarity though, perhaps I looked too
much the current code.

> 0003:
>
> There was a discussion about renaming the --nosync option in initdb to
> --no-sync, but until that is done, the option in pg_basebackup should
> stay what initdb has right now.

Changed.

> There is a whitespace alignment error in usage().

Not sure I follow here.

> The -N option should be listed after -n.

In the switch()? Fixed.

> Some fsync calls are not covered by a do_sync conditional. I see some
> in close_walfile(), HandleCopyStream(), ProcessKeepaliveMsg().

Right. I looked at the rest and did not notice any extra mistakes.
--
Michael

Attachment Content-Type Size
0001-Relocation-fsync-routines-of-initdb-into-src-common.patch application/x-patch 21.1 KB
0002-Issue-fsync-more-carefully-in-pg_receivexlog-and-pg_.patch application/x-patch 10.1 KB
0003-Add-nosync-option-to-pg_basebackup.patch application/x-patch 10.7 KB
0004-Switch-pg_basebackup-commands-in-Postgres.pm-to-use-.patch application/x-patch 845 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-09-23 15:55:58 Re: pg_upgrade vs user created range type extension
Previous Message Tomas Vondra 2016-09-23 14:59:44 Re: Speed up Clog Access by increasing CLOG buffers