Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-09 20:25:54
Message-ID: FA3F2D78-91E3-4F6D-9E48-9C7A6334569B@gmail.com
Lists: pgsql-hackers


> On Apr 9, 2018, at 12:13 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2018-04-09 15:02:11 -0400, Robert Haas wrote:
>> I think the simplest technological solution to this problem is to
>> rewrite the entire backend and all supporting processes to use
>> O_DIRECT everywhere. To maintain adequate performance, we'll have to
>> write a complete I/O scheduling system inside PostgreSQL. Also, since
>> we'll now have to make shared_buffers much larger -- since we'll no
>> longer be benefiting from the OS cache -- we'll need to replace the
>> use of malloc() with an allocator that pulls from shared_buffers.
>> Plus, as noted, we'll need to totally rearchitect several of our
>> critical frontend tools. Let's freeze all other development for the
>> next year while we work on that, and put out a notice that Linux is no
>> longer a supported platform for any existing release. Before we do
>> that, we might want to check whether fsync() actually writes the data
>> to disk in a usable way even with O_DIRECT. If not, we should just
>> de-support Linux entirely as a hopelessly broken and unsupportable
>> platform.
>
> Let's lower the pitchforks a bit here. Obviously a grand rewrite is
> absurd, as is some of the proposed ways this is all supposed to
> work. But I think the case we're discussing is much closer to a near
> irresolvable corner case than anything else.
>
> We're talking about the storage layer returning an irresolvable
> error. You're hosed even if we report it properly. Yes, it'd be nice if
> we could report it reliably. But that doesn't change the fact that what
> we're doing is ensuring that data is safely fsynced unless storage
> fails, in which case it's not safely fsynced anyway.

I was reading this thread up until now as meaning that the standby could
receive corrupt WAL data and become corrupted. That seems a much bigger
problem than merely having the master become corrupted in some unrecoverable
way. It is a long-standing expectation that serious hardware problems on
the master can result in the master needing to be replaced. But there has
not been an expectation that the standby servers would be taken down
along with the master, leaving all copies of the database unusable.
If this bug corrupts the standby servers, too, then it is a whole different
class of problem than the one folks have come to expect.

Your comment reads as if this problem is isolated to whichever server has
the storage failure, and will not get propagated to other servers. Am I
reading that right?

Can anybody clarify this for non-core-hacker folks following along at home?

mark
