pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)
Date: 2016-05-13 06:39:35
Message-ID: CAB7nPqQ_B0j3n1t=8c1ZLHXF1b8Tf4XsXoUC9bP9t5Hab--SMg@mail.gmail.com
Lists: pgsql-hackers

Hi all,

Beginning a new thread because the ext4 issues are closed, and because
pg_basebackup data durability merits a thread of its own. In short,
the problem is that pg_basebackup makes no effort to ensure that
the data it backs up is on disk, which is bad... One possible
recommendation is to run initdb -S after pg_basebackup, but
making sure that data is on disk should be done before pg_basebackup
ends.

On Thu, May 12, 2016 at 8:09 PM, I wrote:
> And actually this won't fly high if there is no equivalent of
> walkdir() or if the fsync()'s are not applied recursively. On master
> at least the refactoring had better be done cleanly first... For the
> back branches, we could just have some recursive call like
> fsync_recursively and keep that in src/bin/pg_basebackup. Andres, do
> you think that this should be part of fe_utils or src/common/? I'd
> tend to think the latter is more adapted as there is an equivalent in
> the backend. On back-branches, we could just have something like
> fsync_recursively that walks through the paths. An even simpler
> approach would be to fsync() individually things that have been
> written, but that would suck in performance.

So, attached are two patches that apply on HEAD to address the problem
of pg_basebackup not syncing the data it writes. pg_basebackup cannot
use initdb -S directly because, as a client-side utility, it may be
installed while initdb is not (see Fedora and RHEL). So I have
refactored the code so that the routines in initdb.c doing the fsync
of PGDATA and other fsync stuff live in src/fe_utils/; this is 0001.

Patch 0002 is a set of fixes for pg_basebackup:
- In plain mode, fsync_pgdata is used so that all the tablespaces are
fsync'd at once. This also takes care of the case where pg_xlog is
a symlink.
- In tar mode (when not writing to stdout), each tar file is synced
individually, and the base directory is synced once at the end.
In both cases, failures are not considered fatal.
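
To illustrate the plain-mode idea, here is a minimal sketch in C of a
recursive sync pass over a directory tree. It is not the code from the
patches: fsync_path and fsync_recursively are hypothetical names,
symlinks are skipped rather than followed (so the case where pg_xlog
is a symlink is not covered), and failures are counted and reported
rather than treated as fatal, in line with the behavior described
above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a path read-only and fsync it.
 * Returns 0 on success, -1 on failure. Some platforms refuse to fsync
 * a directory; this sketch simply reports that as a failure.
 */
static int
fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
    {
        perror(path);
        return -1;
    }
    rc = fsync(fd);
    if (rc != 0)
        perror(path);
    close(fd);
    return rc;
}

/*
 * Hypothetical helper: sync every regular file and directory at or
 * below "path". Children are synced before their parent directory.
 * Returns the number of failures; the caller decides what to do.
 */
static int
fsync_recursively(const char *path)
{
    struct stat st;
    int         failures = 0;

    assert(path != NULL);
    if (lstat(path, &st) != 0)
    {
        perror(path);
        return 1;
    }

    if (S_ISDIR(st.st_mode))
    {
        DIR           *dir = opendir(path);
        struct dirent *de;

        if (dir == NULL)
        {
            perror(path);
            return 1;
        }
        while ((de = readdir(dir)) != NULL)
        {
            char child[PATH_MAX];
            int  n;

            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;
            n = snprintf(child, sizeof(child), "%s/%s", path, de->d_name);
            if (n < 0 || (size_t) n >= sizeof(child))
            {
                failures++;     /* path too long for the buffer */
                continue;
            }
            failures += fsync_recursively(child);
        }
        closedir(dir);
    }
    else if (!S_ISREG(st.st_mode))
        return 0;               /* symlinks, sockets, etc. are skipped */

    if (fsync_path(path) != 0)
        failures++;
    return failures;
}
```

A caller in the spirit of "failures are not considered fatal" would
print a warning when fsync_recursively() returns non-zero and carry
on. For tar mode the same fsync_path() idea would apply to each tar
file, followed by one call on the base directory.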

With pg_basebackup -X and pg_receivexlog, the manipulation of WAL
files is made durable by using fsync and durable_rename where needed
(credits to Andres mainly for this part).
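
For readers unfamiliar with the pattern, a durable rename can be
sketched roughly as follows: flush the source file, rename it, flush
the renamed file, then flush the containing directory so that the new
directory entry itself survives a crash. This is only an illustration
under those assumptions; sync_one_path and durable_rename_sketch are
hypothetical names and this is not the durable_rename implementation
used by the patch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: open a file or directory read-only and fsync it. */
static int
sync_one_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
    {
        perror(path);
        return -1;
    }
    rc = fsync(fd);
    if (rc != 0)
        perror(path);
    close(fd);
    return rc;
}

/*
 * Hypothetical sketch of a durable rename.
 * Returns 0 on success, -1 on the first failure.
 */
static int
durable_rename_sketch(const char *oldpath, const char *newpath)
{
    char parent[4096];
    int  n;

    assert(oldpath != NULL && newpath != NULL);

    /* 1. Make sure the file contents are on disk before renaming. */
    if (sync_one_path(oldpath) != 0)
        return -1;

    /* 2. Atomically switch the name. */
    if (rename(oldpath, newpath) != 0)
    {
        perror("rename");
        return -1;
    }

    /* 3. Flush the file under its new name. */
    if (sync_one_path(newpath) != 0)
        return -1;

    /* 4. Flush the parent directory so the new entry is persistent.
     *    dirname() may modify its argument, so work on a copy. */
    n = snprintf(parent, sizeof(parent), "%s", newpath);
    if (n < 0 || (size_t) n >= sizeof(parent))
        return -1;
    return sync_one_path(dirname(parent));
}
```

In the WAL streaming case this would apply, for example, when a
finished segment is renamed from its temporary name to its final one.
Unlike the base backup sync above, a failure here is returned to the
caller instead of being silently counted.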

This set of patches is aimed only at HEAD. Back-patchable versions of
this patch would need to copy fsync_pgdata and friends into
streamutil.c for example.

I am adding that to the next CF for review as a bug fix.
Regards,
--
Michael

Attachment Content-Type Size
0001-Relocation-fsync-routines-of-initdb-into-fe_utils.patch application/x-download 20.5 KB
0002-Issue-fsync-more-carefully-in-pg_receivexlog-and-pg_.patch application/x-download 10.9 KB
