Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Mark Dilger <hornschnorter(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-09 19:22:58
Message-ID: 15c2bbe5-efaa-8f46-87ab-e4cd9650d0fe@2ndquadrant.com
Lists: pgsql-hackers

On 04/09/2018 08:29 PM, Mark Dilger wrote:
>
>> On Apr 9, 2018, at 10:26 AM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
>
>> We have plenty of YEARS of people not noticing this issue
>
> I disagree. I have noticed this problem, but blamed it on other things.
> For over five years now, I have had to tell customers not to use thin
> provisioning, and I have had to add code to postgres to refuse to perform
> inserts or updates if the disk volume is more than 80% full. I have lost
> count of the number of customers who are running an older version of the
> product (because they refuse to upgrade) and come back with complaints that
> they ran out of disk and now their database is corrupt. All this time, I
> have been blaming this on virtualization and thin provisioning.
>

Yeah. There's a big difference between not noticing an issue because it
does not happen very often and attributing it to something else. If we
had the ability to revisit past data corruption cases, we would probably
discover a fair number of them were caused by this issue.

The other thing we probably need to acknowledge is that the environment
is changing significantly: things like thin provisioning are likely to
become even more common, increasing the incidence of these issues.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
