Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-03-29 05:25:51
Message-ID: CAMsr+YEa4tv1UCBRQHzA1ycfdvryHFYJ1LhaJJNbjStO3=M9Hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29 March 2018 at 13:06, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
wrote:

> On Thu, Mar 29, 2018 at 6:00 PM, Justin Pryzby <pryzby(at)telsasoft(dot)com>
> wrote:
> > The retries are the source of the problem ; the first fsync() can return
> EIO,
> > and also *clears the error* causing a 2nd fsync (of the same data) to
> return
> > success.
>
> What I'm failing to grok here is how that error flag even matters,
> whether it's a single bit or a counter as described in that patch. If
> write back failed, *the page is still dirty*. So all future calls to
> fsync() need to try to try to flush it again, and (presumably) fail
> again (unless it happens to succeed this time around).
> <http://www.enterprisedb.com>
>

You'd think so. But it doesn't appear to work that way. You can see
yourself with the error device-mapper destination mapped over part of a
volume.

I wrote a test case here.

https://github.com/ringerc/scrapcode/blob/master/testcases/fsync-error-clear.c

I don't pretend the kernel behaviour is sane. And it's possible I've made
an error in my analysis. But since I've observed this in the wild, and seen
it in a test case, I strongly suspect that's what I've described is just
what's happening, brain-dead or no.

Presumably the kernel marks the page clean when it dispatches it to the I/O
subsystem and doesn't dirty it again on I/O error? I haven't dug that deep
on the kernel side. See the stackoverflow post for details on what I found
in kernel code analysis.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2018-03-29 05:32:43 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Previous Message Tom Lane 2018-03-29 05:10:46 Re: pgsql: Add documentation for the JIT feature.