Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Anthony Iliopoulos <ailiop(at)altatus(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, ailiop(at)altatus(dot)com
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-02 20:38:06
Message-ID: 20180402203806.GN11627@technoir
Lists: pgsql-hackers

On Mon, Apr 02, 2018 at 12:32:45PM -0700, Andres Freund wrote:
> On 2018-04-02 20:53:20 +0200, Anthony Iliopoulos wrote:
> > On Mon, Apr 02, 2018 at 11:13:46AM -0700, Andres Freund wrote:
> > > Throwing away the dirty pages *and* persisting the error seems a lot
> > > more reasonable. Then provide a fcntl (or whatever) extension that can
> > > clear the error status in the few cases where the application wants
> > > to deal with the failure gracefully.
> >
> > Precisely because the dirty pages that cannot be written out are
> > effectively thrown away, the semantics of fsync() (after the 4.13 fixes)
> > are essentially correct: the first call indicates that a writeback error
> > indeed occurred, while subsequent calls have no reason to indicate an error
> > (assuming no other errors occurred in the meantime).
>
> Meh^2.
>
> "no reason" - except that there's absolutely no way to know what state
> the data is in. And that your application needs explicit handling of
> such failures. And that one FD might be used in lots of different
> parts of the application, and that an fsync failure might be ok in one
> part of the application and not in another. Requiring explicit actions
> to acknowledge "we've thrown away your data for unknown reason" seems
> entirely reasonable.

As long as fsync() indicates an error on its first invocation after a
writeback failure, the application is fully aware that data written since
the previous call to fsync() has been lost. Persisting this error any
further does not change this or add any new information. On the contrary,
it adds confusion: subsequent write()s and fsync()s on other pages can
succeed, yet would be reported as failures.

The application will need to deal with that first error irrespective of
subsequent return codes from fsync(). Conceptually every fsync() invocation
demarcates an epoch for which it reports potential errors, so the caller
needs to take responsibility for that particular epoch.
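
To make the epoch idea concrete, here is a minimal sketch in Python. It is
purely illustrative and not taken from PostgreSQL or the kernel: the
EpochWriter class, its method names, and the injectable fsync parameter
(used only to simulate a writeback failure) are all hypothetical. It assumes
a POSIX system where os.pwrite() is available.

```python
import errno
import os
import tempfile


class EpochWriter:
    """Hypothetical helper: tracks writes issued since the last fsync() call.

    Everything written between two sync() calls forms one "epoch".
    """

    def __init__(self, fd):
        self.fd = fd
        self.pending = []  # (offset, data) pairs written in the current epoch

    def write_at(self, offset, data):
        # Short writes are ignored here to keep the sketch small.
        os.pwrite(self.fd, data, offset)
        self.pending.append((offset, data))

    def sync(self, fsync=os.fsync):
        """Close the current epoch and report its outcome.

        Returns [] if fsync() succeeded. On failure, returns the writes of
        this epoch, all of which the caller must treat as possibly lost.
        The fsync argument exists only so a failure can be simulated.
        """
        epoch, self.pending = self.pending, []
        try:
            fsync(self.fd)
        except OSError:
            return epoch
        return []


def failing_fsync(fd):
    # Stand-in for a real writeback error reported by the kernel.
    raise OSError(errno.EIO, "simulated writeback failure")


with tempfile.TemporaryFile() as f:
    w = EpochWriter(f.fileno())
    w.write_at(0, b"page-1")
    lost = w.sync(fsync=failing_fsync)  # epoch 1: error reported once
    w.write_at(8192, b"page-2")
    ok = w.sync()                       # epoch 2: succeeds independently
```

In this sketch the caller has to act on the list returned by the failing
sync() (rewrite the data from another source, or give up), because the
later successful call says nothing about the earlier epoch.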

Callers that are not affected by the outcome of fsync() and do not react
to errors have no reason to call it in the first place (and by doing so
they mask the failure from subsequent callers that may indeed care).

Best regards,
Anthony
