Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Andres Freund <andres(at)anarazel(dot)de>
To: Anthony Iliopoulos <ailiop(at)altatus(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-02 19:32:45
Message-ID: 20180402193245.urpdavk3wtaycxfz@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-04-02 20:53:20 +0200, Anthony Iliopoulos wrote:
> On Mon, Apr 02, 2018 at 11:13:46AM -0700, Andres Freund wrote:
> > Throwing away the dirty pages *and* persisting the error seems a lot
> > more reasonable. Then provide a fcntl (or whatever) extension that can
> > clear the error status in the few cases where the application wants
> > to gracefully deal with the case.
>
> Given precisely that the dirty pages which cannot be written out are
> practically thrown away, the semantics of fsync() (after the 4.13 fixes)
> are essentially correct: the first call indicates that a writeback error
> indeed occurred, while subsequent calls have no reason to indicate an error
> (assuming no other errors occurred in the meantime).
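To make the two positions concrete, here is a toy user-space model in C. It is a simplification written for illustration, not the kernel's errseq_t code. Every name in it (toy_file, toy_fd, toy_fsync, toy_ack_error) is hypothetical, and descriptors opened after a failure simply never see it. The report-once policy follows the semantics described above: the first fsync after a failed writeback returns an error, and later calls return 0 even though the pages were dropped. The sticky policy models the suggestion quoted above of persisting the error until the application explicitly acknowledges it. No such fcntl exists; toy_ack_error only stands in for one.

```c
#include <stdbool.h>

/* Toy model of one file's writeback error state (not the kernel's code). */
typedef struct {
    unsigned err_seq;   /* bumped each time writeback of dirty pages fails */
} toy_file;

/* Toy model of an open file descriptor on that file. */
typedef struct {
    toy_file *file;
    unsigned seen_seq;  /* last error sequence this descriptor has consumed */
    bool sticky;        /* hypothetical policy: keep failing until acknowledged */
} toy_fd;

/* Writeback failed: in this model the dirty pages are gone, only the error remains. */
static void toy_writeback_failed(toy_file *f) { f->err_seq++; }

/* Opening samples the current error state, so earlier errors are not reported. */
static toy_fd toy_open(toy_file *f, bool sticky) {
    toy_fd fd = { f, f->err_seq, sticky };
    return fd;
}

/* Returns -1 (think EIO) while an unconsumed error exists, else 0. */
static int toy_fsync(toy_fd *fd) {
    if (fd->file->err_seq == fd->seen_seq)
        return 0;
    if (!fd->sticky)
        fd->seen_seq = fd->file->err_seq;  /* report-once: later calls succeed */
    return -1;
}

/* Hypothetical explicit acknowledgement, standing in for the proposed fcntl. */
static void toy_ack_error(toy_fd *fd) { fd->seen_seq = fd->file->err_seq; }
```

Note what happens if two parts of an application share one report-once descriptor in this model: whichever part calls toy_fsync first consumes the error, and the other part sees success.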

Meh^2.

"no reason" - except that there's absolutely no way to know what state
the data is in. And that your application needs explicit handling of
such failures. And that one FD might be used in lots of different
parts of the application, and an fsync failure might be acceptable in
one part of the application but not in another. Requiring explicit
actions to acknowledge "we've thrown away your data for unknown reason"
seems entirely reasonable.

> The error reporting is thus consistent with the intended semantics (which
> are sadly not properly documented). Repeated calls to fsync() simply do not
> imply that the kernel will retry to writeback the previously-failed pages,
> so the application needs to be aware of that.
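A minimal sketch of what "being aware of that" could mean for an application that still holds its own copy of the data: never treat a later successful fsync() on the same descriptor as proof of durability, and instead rewrite the data from the application's copy through a fresh descriptor. The helper names write_all and persist_from_own_copy and the bounded retry count are assumptions for illustration. An application without its own copy would have to treat the failure as fatal. For a newly created file, durability of the directory entry also needs an fsync() of the parent directory, which this sketch omits.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf to fd, retrying short writes and EINTR. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Persist buf at path (hypothetical helper). If fsync() fails, do NOT call
 * fsync() again and trust a 0 return: the failed pages may already have been
 * dropped. Rewrite the whole file from the caller's copy instead, at most
 * max_attempts times. Parent-directory fsync is omitted here.
 */
static int persist_from_own_copy(const char *path, const char *buf, size_t len,
                                 int max_attempts) {
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (fd < 0)
            return -1;
        int ok = write_all(fd, buf, len) == 0 && fsync(fd) == 0;
        int close_rc = close(fd);
        if (ok && close_rc == 0)
            return 0;
        /* Any failure means "contents unknown": start over from our copy. */
    }
    return -1;
}
```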

Which isn't what I've suggested.

> Persisting the error at the fsync() level would essentially mean
> moving application policy into the kernel.

Meh.

Greetings,

Andres Freund
