Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Christophe Pettus <xof(at)thebuild(dot)com>, Craig Ringer <craig(at)2ndQuadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date:	2018-04-08 23:16:25
Message-ID:	20180408231625.bsfo3ic7owpzub75@alap3.anarazel.de
Lists:	pgsql-hackers

On 2018-04-08 18:29:16 -0400, Bruce Momjian wrote:
> On Sun, Apr 8, 2018 at 09:38:03AM -0700, Christophe Pettus wrote:
> >
> > > On Apr 8, 2018, at 03:30, Craig Ringer <craig(at)2ndQuadrant(dot)com>
> > > wrote:
> > >
> > > These are way more likely than bit flips or other storage level
> > > corruption, and things that we previously expected to detect and
> > > fail gracefully for.
> >
> > This is definitely bad, and it explains a few otherwise-inexplicable
> > corruption issues we've seen. (And great work tracking it down!) I
> > think it's important not to panic, though; PostgreSQL doesn't have a
> > reputation for horrible data integrity. I'm not sure it makes sense
> > to do a major rearchitecting of the storage layer (especially with
> > pluggable storage coming along) to address this. While the failure
> > modes are more common, the solution (a PITR backup) is one that an
> > installation should have anyway against media failures.
>
> I think the big problem is that we don't have any way of stopping
> Postgres at the time the kernel reports the errors to the kernel log, so
> we are then returning potentially incorrect results and committing
> transactions that might be wrong or lost. If we could stop Postgres
> when such errors happen, at least the administrator could fix the
> problem or fail over to a standby.
>
> A crazy idea would be to have a daemon that checks the logs and stops
> Postgres when it sees something wrong.

I think the danger presented here is far smaller than some of the
statements in this thread might make one think. In all likelihood, once
you've got an IO error that kernel level retries don't fix, your
database is screwed. Whether fsync reports that or not is really
somewhat beside the point. We don't panic that way when getting IO
errors during reads either, and they're more likely to be persistent
than errors during writes (because remapping at the storage layer can
fix issues during writes, but not during reads).

There are a lot of not-so-great things here, but I don't think there's
any need to panic.

We should fix things so that reported errors are handled by going
through crash recovery, and for the rest I think there are very fair
arguments to be made that it's far outside Postgres's remit.
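
A minimal sketch of the first point, assuming two hypothetical helpers,
fsync_checked() and fsync_or_abort(), which are illustrative names and not
PostgreSQL's actual code. The idea is only that a reported fsync() failure
is never retried. The process aborts so that the next start replays the
write-ahead log instead of trusting the kernel's page cache.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush fd to stable storage and report whether the
 * kernel accepted the flush. On failure the error is logged once.
 */
bool fsync_checked(int fd, const char *path)
{
    if (fsync(fd) != 0) {
        fprintf(stderr, "could not fsync \"%s\": %s\n", path, strerror(errno));
        return false;
    }
    return true;
}

/*
 * Hypothetical helper: treat a reported fsync() failure as fatal.
 * As discussed in this thread, calling fsync() again may report success
 * even though the dirty data was dropped, so there is no retry here.
 * abort() stands in for "crash and go through recovery on restart".
 */
void fsync_or_abort(int fd, const char *path)
{
    if (!fsync_checked(fd, path)) {
        fprintf(stderr, "PANIC: aborting to force crash recovery\n");
        abort();
    }
}
```

A caller would use fsync_or_abort(fd, path) wherever it previously logged
the error and retried later. This covers only errors that fsync() actually
reports. Errors the kernel never surfaces are the part argued above to be
outside the database's remit.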

I think there are pretty good reasons to go to direct IO where
supported, but error handling doesn't strike me as a particularly good
reason for the move.

Greetings,

Andres Freund

In response to

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS at 2018-04-08 22:29:16 from Bruce Momjian

Responses

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS at 2018-04-08 23:27:57 from Christophe Pettus
Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS at 2018-04-09 02:00:41 from Craig Ringer
