Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Greg Stark <stark(at)mit(dot)edu>
To: Anthony Iliopoulos <ailiop(at)altatus(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-03 11:26:05
Message-ID: CAM-w4HN5zTdb+kUVDqdvJ0neY54oZaUv2fnmS8w8yEtinECNYg@mail.gmail.com
Lists: pgsql-hackers

On 3 April 2018 at 11:35, Anthony Iliopoulos <ailiop(at)altatus(dot)com> wrote:
> Hi Robert,
>
> Fully agree, and the errseq_t fixes have dealt exactly with the issue
> of making sure that the error is reported to all file descriptors that
> *happen to be open at the time of error*. But I think one would have a
> hard time defending a modification to the kernel where this is further
> extended to cover cases where:
>
> process A does write() on some file offset which fails writeback,
> fsync() gets EIO and exit()s.
>
> process B does write() on some other offset which succeeds writeback,
> but fsync() gets EIO due to (uncleared) failures of earlier process.

Surely that's exactly what process B would want? If it calls fsync and
gets a success and later finds out that the file is corrupt and didn't
match what was in memory it's not going to be happy.

This seems like an attempt to co-opt fsync for a new and different
purpose for which it's poorly designed. It's not an async error
reporting mechanism for writes. It would be useless as one, since any
process could come along and open your file and eat the errors for
writes you performed. An async error reporting mechanism would have to
document which writes it was giving errors for and give you ways to
control that.

The semantics described here are useless for everyone. For a program
needing to know the error status of the writes it executed, it doesn't
know which writes are included in which fsync call. For a program
using fsync for its original intended purpose of guaranteeing that
all writes are synced to disk, it no longer has any guarantee at all.
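
To make that concrete, here is a minimal sketch in C of the pattern such a
program relies on. The helper name write_and_sync is hypothetical and only
for illustration; the assumption baked into it is that a zero return from
fsync() means everything written to the file is on stable storage, and that
any failure must be treated as "the data may not be on disk".

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from PostgreSQL or the kernel): write len bytes
 * to path, then fsync the file. Returns 0 only if every step succeeded,
 * otherwise a negative errno value.
 */
int write_and_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing; retry */
            int err = errno;
            close(fd);
            return -err;
        }
        p += n;                     /* handle short writes */
        left -= (size_t)n;
    }

    /*
     * The caller treats success here as a durability guarantee. If an
     * earlier writeback failure on this file can be consumed by another
     * process, this check can pass even though data was lost.
     */
    if (fsync(fd) != 0) {
        int err = errno;
        close(fd);
        return -err;
    }

    if (close(fd) != 0)
        return -errno;
    return 0;
}
```

A caller would do something like `if (write_and_sync(path, buf, len) != 0)`
and then refuse to consider the data durable. Nothing in this sketch can
tell which writes a given fsync result covers, which is the problem
described above.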

> This would be a highly user-visible change of semantics from edge-
> triggered to level-triggered behavior.

It was always documented as level-triggered. This edge-triggered
concept is a complete surprise to application writers.

--
greg
