Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-18 10:19:28
Message-ID: CAMsr+YGHyJ61J6NithMFum63N_BVWAfWUhSEONuu6C_awiSvQg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 10 April 2018 at 20:15, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 10 April 2018 at 14:10, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
>> Well, I think that there is place for improving reporting of failure
>> in file_utils.c for frontends, or at worst have an exit() for any kind
>> of critical failures equivalent to a PANIC.
>
> Yup.
>
> In the mean time, speaking of PANIC, here's the first cut patch to
> make Pg panic on fsync() failures. I need to do some closer review and
> testing, but it's presented here for anyone interested.
>
> I intentionally left some failures as ERROR not PANIC, where the
> entire operation is done as a unit, and an ERROR will cause us to
> retry the whole thing.
>
> For example, when we fsync() a temp file before we move it into place,
> there's no point panicing on failure, because we'll discard the temp
> file on ERROR and retry the whole thing.
>
> I've verified that it works as expected with some modifications to the
> test tool I've been using (pushed).
>
> The main downside is that if we panic in redo, we don't try again. We
> throw our toys and shut down. But arguably if we get the same I/O
> error again in redo, that's the right thing to do anyway, and quite
> likely safer than continuing to ERROR on checkpoints indefinitely.
>
> Patch attached.
>
> To be clear, this patch only deals with the issue of us retrying
> fsyncs when it turns out to be unsafe. This does NOT address any of
> the issues where we won't find out about writeback errors at all.

Thinking about this some more, it'll definitely need a GUC to force it
to continue despite a potential hazard. Otherwise we go backwards from
the status quo if we're in a position where uptime is vital and
correctness problems can be tolerated or repaired later. Kind of like
zero_damaged_pages, we'll need some sort of
continue_after_fsync_errors .

Without that, we'll panic once, enter redo, and if the problem
persists we'll panic in redo and exit the startup process. That's not
going to help users.

I'll amend the patch accordingly as time permits.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-04-18 10:27:16 Re: Boolean partitions syntax
Previous Message Yuriy Zhuravlev 2018-04-18 10:16:15 Re: Setting rpath on llvmjit.so?