Re: pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

From: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
Date: 2016-05-13 14:49:24
Message-ID: 5735E974.3070903@postgrespro.ru
Lists: pgsql-hackers


On 13.05.2016 9:39, Michael Paquier wrote:
> Hi all,
>
> Beginning a new thread because the ext4 issues are closed, and because
> pg_basebackup data durability merits its own thread. In short, the
> problem is that pg_basebackup makes no effort to ensure that the data
> it backs up is on disk, which is bad... One possible recommendation is
> to run initdb -S after pg_basebackup, but making sure that the data is
> on disk should be done before pg_basebackup ends.
>
> On Thu, May 12, 2016 at 8:09 PM, I wrote:
>> And actually this won't fly high if there is no equivalent of
>> walkdir() or if the fsync()'s are not applied recursively. On master
>> at least the refactoring had better be done cleanly first... For the
>> back branches, we could just have some recursive call like
>> fsync_recursively and keep that in src/bin/pg_basebackup. Andres, do
> you think that this should be part of fe_utils or src/common/? I'd
> tend to think the latter is more appropriate as there is an equivalent in
> the backend. On back-branches, we could just have something like
> fsync_recursively that walks through the paths. An even simpler
> approach would be to fsync() individually the things that have been
> written, but performance would suck.
>
> So, attached are two patches that apply on HEAD to address the problem
> that pg_basebackup does not sync the data it writes. pg_basebackup
> cannot directly use initdb -S because, as a client-side utility, it may
> be installed while initdb is not (see Fedora and RHEL), so I have
> refactored the code so that the routines in initdb.c that fsync PGDATA,
> along with the other fsync helpers, are in src/fe_utils/; this is 0001.
>
> Patch 0002 is a set of fixes for pg_basebackup:
> - In plain mode, fsync_pgdata is used so that all the tablespaces are
> fsync'd at once. This also takes care of the case where pg_xlog is
> a symlink.
> - In tar mode (no stdout), each tar file is synced individually, and
> the base directory is synced once at the end.
> In both cases, failures are not considered fatal.
>
> With pg_basebackup -X and pg_receivexlog, the manipulation of WAL
> files is made durable by using fsync and durable_rename where needed
> (credits to Andres mainly for this part).
>
> This set of patches is aimed only at HEAD. Back-patchable versions of
> this patch would need to copy fsync_pgdata and friends into
> streamutil.c for example.
>
> I am adding that to the next CF for review as a bug fix.
> Regards,
>
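As a rough illustration of the two techniques discussed in the quoted message, here is a minimal POSIX C sketch. The first is a recursive fsync walk in the spirit of the proposed fsync_recursively/walkdir(). The second is a durable rename: fsync the file, rename it, then fsync the file and its parent directory. This is not the code from the patches. The names sketch_fsync_path, sketch_fsync_recursively and sketch_durable_rename are hypothetical. The sketch assumes a system providing nftw(). It does not follow symlinks, so a symlinked pg_xlog or tablespace directory would need its own walk (the quoted description says fsync_pgdata handles that case). It also assumes that the old and new names passed to the rename helper live in the same directory.

```c
#define _XOPEN_SOURCE 700 /* nftw(), mkdtemp(), strdup() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <ftw.h>
#include <libgen.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Number of paths the current walk failed to sync (hypothetical helper state). */
static int sketch_failures;

/* fsync() a single file or directory. Files are opened read-write and
 * directories read-only; EBADF/EINVAL from fsync() on a directory are
 * ignored because some platforms do not support syncing directories. */
static int sketch_fsync_path(const char *path, int is_dir)
{
    int fd = open(path, is_dir ? O_RDONLY : O_RDWR);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    int saved_errno = errno;
    if (rc != 0 && is_dir && (saved_errno == EBADF || saved_errno == EINVAL))
        rc = 0;
    close(fd);
    errno = saved_errno;
    return rc;
}

/* nftw() callback: sync regular files, and each directory after its contents. */
static int sketch_walk_cb(const char *path, const struct stat *sb,
                          int type, struct FTW *ftw)
{
    (void) ftw;
    int rc = 0;
    if (type == FTW_F && S_ISREG(sb->st_mode))
        rc = sketch_fsync_path(path, 0);
    else if (type == FTW_DP)
        rc = sketch_fsync_path(path, 1);
    else if (type == FTW_DNR || type == FTW_NS)
        rc = -1; /* unreadable directory or failed stat() */
    /* Symlinks (FTW_SL) are skipped, not followed, because of FTW_PHYS. */
    if (rc != 0) {
        fprintf(stderr, "warning: could not fsync \"%s\"\n", path);
        sketch_failures++;
    }
    return 0; /* keep walking: failures are reported, not fatal */
}

/* Recursively fsync everything under root; returns -1 if anything failed. */
int sketch_fsync_recursively(const char *root)
{
    assert(root != NULL);
    sketch_failures = 0;
    /* FTW_DEPTH reports each directory (as FTW_DP) after its entries. */
    if (nftw(root, sketch_walk_cb, 16, FTW_DEPTH | FTW_PHYS) != 0)
        return -1;
    return sketch_failures == 0 ? 0 : -1;
}

/* Rename so that both the file's contents and the new name survive a
 * crash. Assumes oldpath and newpath are in the same directory. */
int sketch_durable_rename(const char *oldpath, const char *newpath)
{
    if (sketch_fsync_path(oldpath, 0) != 0) /* contents first */
        return -1;
    if (rename(oldpath, newpath) != 0)
        return -1;
    if (sketch_fsync_path(newpath, 0) != 0)
        return -1;
    char *copy = strdup(newpath); /* dirname() may modify its argument */
    if (copy == NULL)
        return -1;
    int rc = sketch_fsync_path(dirname(copy), 1); /* persist the new entry */
    free(copy);
    return rc;
}
```

The walk reports a failing path and keeps going rather than aborting. This matches the quoted statement that sync failures are not considered fatal, so one unreadable file does not prevent the rest of the backup from being flushed.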
Hi!
Do we have any confidence that a data file is not corrupted, i.e. that
it does not contain some corrupted page? Can pg_basebackup verify page
checksums (for a cluster initialized with initdb -k) while backing up files?

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
