Re: silent data loss with ext4 / all current versions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, dpage(at)pgadmin(dot)org, Guillaume Lelarge <guillaume(dot)lelarge(at)dalibo(dot)com>
Subject: Re: silent data loss with ext4 / all current versions
Date: 2016-05-12 11:09:06
Message-ID: CAB7nPqSTkEHbOVC4aDp8zqCPdWyZs_b6P5G7VCPVeuXw5fWpDw@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 12, 2016 at 2:58 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Mon, Mar 28, 2016 at 8:25 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> I've also noticed that
>
> Coming back to this issue because...
>
>> a) pg_basebackup doesn't do anything about durability (it probably needs
>> a very similar patch to the one pg_rewind just received).
>
> I think that one of the QE tests running here just got bitten by that.
> A base backup was taken with pg_basebackup, and more or less right
> after that the VM was unplugged. The trick is that for pg_basebackup we
> cannot rely on initdb: pg_basebackup is a client-side utility. In most
> of the PG packages (Fedora, RHEL), it is shipped in the client-side
> package, whereas initdb is not. So it seems to me that the correct fix
> is not to use initdb -S but to have copies of fsync_parent_path,
> durable_rename and fsync_fname_ext in streamutil.c, and then reuse them
> for both pg_receivexlog and pg_basebackup. At least that's less risky
> for back-branches this way.
>
>> b) nor does pg_dump[all]
>
> I have not hacked that up yet, but I would think that we would need
> one extra copy of some of those fsync_* routines in src/bin/pg_dump/.
> There is another thread for that already... On master I guess we'd end
> up with something centralized in src/common/, but let's close the
> existing holes first.
>
>> So we're going to have another round of fsync stuff in the next set of
>> releases anyway...
>
> The sooner the better I think. Anyone caring about this problem is
> currently limited to running initdb -S after calling pg_basebackup or
> pg_dump. That's a solution, though the flushes should be contained
> inside each utility.

And actually this won't fly if there is no equivalent of walkdir(),
that is, if the fsync()'s are not applied recursively. On master at
least, the refactoring had better be done cleanly first... For the back
branches, we could just have a recursive routine like fsync_recursively
that walks through the paths, and keep it in src/bin/pg_basebackup.
Andres, do you think that this should be part of fe_utils or
src/common/? I'd tend to think the latter is better suited, as there is
an equivalent in the backend. An even simpler approach would be to
fsync() each thing individually as it is written, but that would suck
in performance.
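
To make the back-branch idea a bit more concrete, here is a rough sketch of what a frontend-only recursive flush could look like. This is not taken from any patch: the names fsync_path and fsync_recursively, the fixed-size path buffer, and the choice to skip symlinks and special files are all assumptions made for illustration. A real version would also report errors the way the frontend tools do instead of just returning -1.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: open one file or directory and fsync() it.
 * Directories are opened read-only, regular files read-write, since
 * platforms differ on which mode fsync() accepts.  Returns 0 or -1.
 */
static int
fsync_path(const char *path, int is_dir)
{
    int     fd = open(path, is_dir ? O_RDONLY : O_RDWR);
    int     rc;

    if (fd < 0)
        return -1;

    rc = fsync(fd);

    /* Some systems cannot fsync a directory at all; tolerate that. */
    if (rc != 0 && is_dir && (errno == EBADF || errno == EINVAL))
        rc = 0;

    close(fd);
    return rc;
}

/*
 * Hypothetical helper: fsync everything below "path", then "path" itself.
 * Symlinks and special files are skipped in this sketch.  Returns 0 on
 * success, -1 on the first failure.
 */
int
fsync_recursively(const char *path)
{
    struct stat st;

    assert(path != NULL);

    if (lstat(path, &st) != 0)
        return -1;

    if (S_ISDIR(st.st_mode))
    {
        DIR    *dir = opendir(path);
        struct dirent *de;
        int     rc = 0;

        if (dir == NULL)
            return -1;

        while (rc == 0 && (de = readdir(dir)) != NULL)
        {
            char    child[4096];
            int     n;

            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;

            n = snprintf(child, sizeof(child), "%s/%s", path, de->d_name);
            if (n < 0 || n >= (int) sizeof(child))
                rc = -1;        /* path too long for this sketch */
            else
                rc = fsync_recursively(child);
        }
        closedir(dir);

        /* Flush the directory entry list only after its contents. */
        return (rc == 0) ? fsync_path(path, 1) : -1;
    }

    if (S_ISREG(st.st_mode))
        return fsync_path(path, 0);

    return 0;                   /* symlink, FIFO, socket, device: skipped */
}
```

pg_basebackup would call something like this once on the target directory after the last file is written, which keeps the cost to a single pass instead of a flush per file. The parent of the target directory would still need its own fsync for the new directory entry to be durable.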

Thoughts from others?
--
Michael
