Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-17 21:49:42
Message-ID: 20180417214941.GD13097@momjian.us
Lists: pgsql-hackers

On Mon, Apr 9, 2018 at 12:25:33PM -0700, Peter Geoghegan wrote:
> On Mon, Apr 9, 2018 at 12:13 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Let's lower the pitchforks a bit here. Obviously a grand rewrite is
> > absurd, as is some of the proposed ways this is all supposed to
> > work. But I think the case we're discussing is much closer to a near
> > irresolvable corner case than anything else.
>
> +1
>
> > We're talking about the storage layer returning an irresolvable
> > error. You're hosed even if we report it properly. Yes, it'd be nice if
> > we could report it reliably. But that doesn't change the fact that what
> > we're doing is ensuring that data is safely fsynced unless storage
> > fails, in which case it's not safely fsynced anyway.
>
> Right. We seem to be implicitly assuming that there is a big
> difference between a problem in the storage layer that we could in
> principle detect, but don't, and any other problem in the storage
> layer. I've read articles claiming that technologies like SMART are
> not really reliable in a practical sense [1], so it seems to me that
> there is reason to doubt that this gap is all that big.
>
> That said, I suspect that the problems with running out of disk space
> are serious practical problems. I have personally scoffed at stories
> involving Postgres database corruption that gets attributed to
> running out of disk space. Looks like I was dead wrong.

Yes, I think we need to look at user expectations here.

If the device has a hardware write error, it is certainly good to detect
it. The error might be permanent or temporary, e.g. on NAS/NFS. The
longer the error persists, the more likely the user is to expect
corruption. However, right now an outage of any length could cause
corruption, and it will not be reported in all cases.

Running out of disk space is also something you don't expect to corrupt
your database --- you expect it to only prevent future writes. It seems
NAS/NFS and any thin-provisioned storage will have this problem, and
again, it is not always reported.

So, our initial action might just be to educate users that write errors
can cause silent corruption, and that out-of-space errors on NAS/NFS and
any thin-provisioned storage can also cause corruption.

Kernel logs (not just Postgres logs) should be monitored for these
issues, and fail-over or recovery might be necessary.
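
As a rough illustration of the kernel-log monitoring suggested above,
here is a minimal Python sketch, not a vetted monitoring tool. The
function name find_suspect_lines and the pattern list are assumptions
made up for this example; actual kernel message wording varies by
kernel version, filesystem, and driver, so the patterns would need
tuning for any real system.

```python
import re

# Illustrative patterns only (assumption: real kernel messages differ
# across kernel versions, filesystems, and storage drivers).
SUSPECT_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"lost (async )?page write", re.IGNORECASE),
    re.compile(r"out-of-data-space", re.IGNORECASE),
    re.compile(r"No space left on device", re.IGNORECASE),
]


def find_suspect_lines(lines):
    """Return the log lines that match at least one suspect pattern."""
    hits = []
    for line in lines:
        if any(pattern.search(line) for pattern in SUSPECT_PATTERNS):
            hits.append(line.rstrip("\n"))
    return hits
```

Something like this could be fed lines from dmesg or journalctl -k by
a periodic job, with any hits raising an alert so that an operator can
decide whether fail-over or recovery is needed.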

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
