Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Christophe Pettus <xof(at)thebuild(dot)com>
To: Craig Ringer <craig(at)2ndQuadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-08 16:38:03
Message-ID: 80B1A286-A073-417A-8968-605FE4A11B7B@thebuild.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> On Apr 8, 2018, at 03:30, Craig Ringer <craig(at)2ndQuadrant(dot)com> wrote:
>
> These are way more likely than bit flips or other storage level corruption, and things that we previously expected to detect and fail gracefully for.

This is definitely bad, and it explains a few otherwise-inexplicable corruption issues we've seen. (And great work tracking it down!) I think it's important not to panic, though; PostgreSQL doesn't have a reputation for horrible data integrity. I'm not sure it makes sense to do a major rearchitecting of the storage layer (especially with pluggable storage coming along) to address this. While the failure modes are more common, the solution (a PITR backup) is one that an installation should have anyway against media failures.

--
-- Christophe Pettus
xof(at)thebuild(dot)com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-04-08 17:19:59 Re: WIP: a way forward on bootstrap data
Previous Message Teodor Sigaev 2018-04-08 16:31:16 Re: WIP: Covering + unique indexes.