Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Anthony Iliopoulos <ailiop(at)altatus(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Greg Stark <stark(at)mit(dot)edu>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-09 22:33:16
Message-ID: CAEepm=1ihcRQEk5xKN3L6G04dVZijs8rrn7BC8F5FpfCSn0F6g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 10, 2018 at 2:22 AM, Anthony Iliopoulos <ailiop(at)altatus(dot)com> wrote:
> On Mon, Apr 09, 2018 at 03:33:18PM +0200, Tomas Vondra wrote:
>> Well, there seem to be kernels that seem to do exactly that already. At
>> least that's how I understand what this thread says about FreeBSD and
>> Illumos, for example. So it's not an entirely insane design, apparently.
>
> It is reasonable, but even FreeBSD has a big fat comment right
> there (since 2017), mentioning that there can be no recovery from
> EIO at the block layer and this needs to be done differently. No
> idea how an application running on top of either FreeBSD or Illumos
> would actually recover from this error (and clear it out), other
> than remounting the fs in order to force dropping of relevant pages.
> It does provide though indeed a persistent error indication that
> would allow Pg to simply reliably panic. But again this does not
> necessarily play well with other applications that may be using
> the filesystem reliably at the same time, and are now faced with
> EIO while their own writes succeed to be persisted.

Right. For anyone interested, here is the change you mentioned, and
an interesting one that came a bit earlier last year:

https://reviews.freebsd.org/rS316941 -- drop buffers after device goes away
https://reviews.freebsd.org/rS326029 -- update comment about EIO contract

Retrying may well be futile, but at least future fsync() calls won't
falsely report success. There may of course be more space-efficient
ways to represent that state, as the comment implies, while never lying
to the user -- perhaps involving filesystem level or (pinned) inode
level errors that stop all writes until unmounted. Something tells me
they won't resort to flaky fsync() error reporting.
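
To make the application side of this concrete, here is a minimal sketch of treating the first fsync() failure as final instead of retrying. It is written in Python for brevity and is not PostgreSQL code; the names DurabilityError, fsync_once and write_durably are hypothetical, and the injectable fsync parameter exists only so that a failure can be simulated.

```python
import os


class DurabilityError(RuntimeError):
    """Hypothetical error type: fsync() failed and must not be retried."""


def fsync_once(fd, fsync=os.fsync):
    """Flush fd to stable storage, treating any failure as final.

    On kernels that drop or mark clean the failed pages, a second
    fsync() could report success even though the data never reached
    disk, so this sketch never retries.
    """
    try:
        fsync(fd)
    except OSError as exc:
        raise DurabilityError(f"fsync failed on fd {fd}: {exc}") from exc


def write_durably(path, data, fsync=os.fsync):
    """Write data to path and flush it, or raise DurabilityError."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # os.write() may write fewer bytes than requested, so loop.
        view = memoryview(data)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        fsync_once(fd, fsync)
    finally:
        os.close(fd)
```

A caller would let DurabilityError propagate and redo the work from a known-good state (for a database, crash recovery from the log) rather than calling fsync() again on the same descriptor.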

I wonder if anyone can tell us what Windows, AIX and HPUX do here.

> [1] https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf

Very interesting, thanks.

--
Thomas Munro
http://www.enterprisedb.com
