Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-04 00:56:37
Message-ID: 20180404005637.GA27931@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 3, 2018 at 05:47:01PM -0400, Robert Haas wrote:
> Well, in PostgreSQL, we have a background process called the
> checkpointer which is the process that normally does all of the
> fsync() calls but only a subset of the write() calls. The
> checkpointer does not, however, necessarily have every file open all
> the time, so these fixes aren't sufficient to make sure that the
> checkpointer ever sees an fsync() failure.

There has been a lot of focus in this thread on the workflow:

write() -> blocks remain in kernel memory -> fsync() -> panic?

But what happens in this workflow:

write() -> kernel syncs blocks to storage -> fsync()

Is fsync() going to see a "kernel syncs blocks to storage" failure?

There was already discussion that if the fsync() causes the "syncs
blocks to storage", fsync() will only report the failure once, but will
it see any failure in the second workflow? There is indication that a
failed write to storage reports back an error once and clears the dirty
flag, but do we know it keeps things around long enough to report an
error to a future fsync()?

You would think it does, but I have to ask since our fsync() assumptions
have been wrong for so long.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2018-04-04 01:13:57 Re: [HACKERS] path toward faster partition pruning
Previous Message Michael Paquier 2018-04-04 00:52:25 Re: Running Installcheck remotely