Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Christophe Pettus <xof(at)thebuild(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-09 13:54:19
Message-ID: 375f0e89-90fe-173d-fdd1-eb37d60a42ca@2ndquadrant.com
Lists: pgsql-hackers

On 04/09/2018 04:00 AM, Craig Ringer wrote:
> On 9 April 2018 at 07:16, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> I think the danger presented here is far smaller than some of the
>> statements in this thread might make one think.
>
> Clearly it's not happening a huge amount or we'd have a lot of noise
> about Pg eating people's data, people shouting about how unreliable it
> is, etc. We don't. So it's not some earth shattering imminent threat to
> everyone's data. It's gone unnoticed, or the root cause unidentified,
> for a long time.
>

Yeah, it clearly isn't the case that everything we do suddenly got
pointless. It's fairly annoying, though.

> I suspect we've written off a fair few issues in the past as "it's
> bad hardware" when actually, the hardware fault was the trigger for
> a Pg/kernel interaction bug. And blamed containers for things that
> weren't really the container's fault. But even so, if it were
> happening tons, we'd hear more noise.
>

Right. Write errors are fairly rare, and we've probably ignored a fair
number of cases demonstrating this issue. It kinda reminds me of the
wisdom that not seeing planes with bullet holes in the engine does not
mean engines don't need armor [1].

[1]
https://medium.com/@penguinpress/an-excerpt-from-how-not-to-be-wrong-by-jordan-ellenberg-664e708cfc3d

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
