Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-09 19:13:14
Message-ID: 20180409191314.4thzq6ptwvgyel6c@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2018-04-09 15:02:11 -0400, Robert Haas wrote:
> I think the simplest technological solution to this problem is to
> rewrite the entire backend and all supporting processes to use
> O_DIRECT everywhere. To maintain adequate performance, we'll have to
> write a complete I/O scheduling system inside PostgreSQL. Also, since
> we'll now have to make shared_buffers much larger -- since we'll no
> longer be benefiting from the OS cache -- we'll need to replace the
> use of malloc() with an allocator that pulls from shared_buffers.
> Plus, as noted, we'll need to totally rearchitect several of our
> critical frontend tools. Let's freeze all other development for the
> next year while we work on that, and put out a notice that Linux is no
> longer a supported platform for any existing release. Before we do
> that, we might want to check whether fsync() actually writes the data
> to disk in a usable way even with O_DIRECT. If not, we should just
> de-support Linux entirely as a hopelessly broken and unsupportable
> platform.

Let's lower the pitchforks a bit here. Obviously a grand rewrite is
absurd, as is some of the proposed ways this is all supposed to
work. But I think the case we're discussing is much closer to a near
irresolvable corner case than anything else.

We're talking about the storage layer returning an irresolvable
error. You're hosed even if we report it properly. Yes, it'd be nice if
we could report it reliably. But that doesn't change the fact that what
we're doing is ensuring that data is safely fsynced unless storage
fails, in which case it's not safely fsynced anyway.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sergei Kornilov 2018-04-09 19:19:47 Re: using index or check in ALTER TABLE SET NOT NULL
Previous Message Tom Lane 2018-04-09 19:02:58 Re: Verbosity of genbki.pl