Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Andres Freund <andres(at)anarazel(dot)de>
To: Christophe Pettus <xof(at)thebuild(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Craig Ringer <craig(at)2ndQuadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-09 01:55:10
Message-ID: 20180409015510.y5eh2bx4duu7hul5@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2018-04-08 16:27:57 -0700, Christophe Pettus wrote:
> > On Apr 8, 2018, at 16:16, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > We don't panic that way when getting IO
> > errors during reads either, and they're more likely to be persistent
> > than errors during writes (because remapping on storage layer can fix
> > issues, but not during reads).
>
> There is a distinction to be drawn there, though, because we
> immediately pass an error back to the client on a read, but a write
> problem in this situation can be masked for an extended period of
> time.

That only holds if you're "lucky" enough that your clients actually read
that data, and you're then somehow able to figure out, across the whole
stack, that the 0.001% of transactions that fail do so because of IO
errors. Failing that, you also need to do log analysis.

If you want to solve things like that, you need regular reads of all your
data, including verification etc.
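
To make "regular reads of all your data, including verification" concrete,
here is a minimal sketch of a scrub pass. It is not anything PostgreSQL
ships: the function name scrub(), the manifest format (relative path ->
SHA-256 hex digest) and the chunk size are all assumptions made for
illustration, and whole-file digests only make sense for files that are not
expected to change between passes.

```python
import hashlib
import os


def scrub(root, manifest, chunk_size=1 << 20):
    """Read every file under root and return (io_errors, mismatches).

    manifest is a hypothetical dict mapping a path relative to root to the
    SHA-256 hex digest recorded when the file was known to be good.
    """
    io_errors, mismatches = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            digest = hashlib.sha256()
            try:
                with open(path, "rb") as f:
                    # Read the file end to end so that a read failure
                    # surfaces as OSError instead of staying hidden.
                    for chunk in iter(lambda: f.read(chunk_size), b""):
                        digest.update(chunk)
            except OSError as exc:
                io_errors.append((rel, exc.errno))
                continue
            expected = manifest.get(rel)
            # Files without a manifest entry are read but not verified.
            if expected is not None and digest.hexdigest() != expected:
                mismatches.append(rel)
    return io_errors, mismatches
```

Note that reads served from the OS page cache never touch the storage
device, so a pass like this would have to run against cold data (or bypass
the cache) to say anything about the state of the disk itself.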

Greetings,

Andres Freund
