Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Geoff Winkless <pgsqladmin(at)geoff(dot)dj>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-10 01:59:03
Message-ID: CAMsr+YHXQyRAf6YVDUN_MbHvjLFVPpzh83c1HYKnUw+UM5=RRA@mail.gmail.com
Lists: pgsql-hackers

On 10 April 2018 at 04:37, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2018-04-09 22:30:00 +0200, Tomas Vondra wrote:
>> Maybe. I'd certainly prefer automated recovery from temporary I/O
>> issues (like full disk on thin-provisioning) without the database
>> crashing and restarting. But I'm not sure it's worth the effort.
>
> Oh, I agree on that one. But that's more a question of how we force the
> kernel's hand on allocating disk space. In most cases the kernel
> allocates the disk space immediately, even if delayed allocation is in
> effect. For the cases where that's not the case (if there are current
> ones, rather than just past bugs), we should be able to make sure that's
> not an issue by pre-zeroing the data and/or using fallocate.
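To make the pre-zeroing/fallocate suggestion concrete, here is a minimal sketch, assuming Linux and Python's `os` wrappers around the POSIX calls. The helper name `reserve_file_space`, the `zero_fill` flag and the 8192-byte chunk size are hypothetical, not anything PostgreSQL provides. `posix_fallocate()` asks the filesystem to reserve the extent up front, so ENOSPC is reported at reservation time. Pre-zeroing writes real blocks, which is more likely to make a thinly provisioned device commit space as well. Either way, the trailing `fsync()` pushes the work to storage now rather than during some later writeback.

```python
import errno
import os

CHUNK_SIZE = 8192  # hypothetical zero-fill chunk size


def reserve_file_space(fd: int, length: int, zero_fill: bool = False) -> None:
    """Hypothetical helper: back bytes [0, length) of fd with disk space.

    Raises OSError (e.g. ENOSPC, EDQUOT, EIO) if that fails.
    """
    if not zero_fill:
        try:
            # Ask the filesystem to allocate the whole extent now.
            os.posix_fallocate(fd, 0, length)
        except OSError as e:
            if e.errno not in (errno.EINVAL, errno.EOPNOTSUPP):
                raise  # a genuine allocation failure, such as ENOSPC
            zero_fill = True  # preallocation unsupported: fall back to zeros
    if zero_fill:
        # Pre-zero: write real blocks so the storage has to back them.
        zeros = bytes(CHUNK_SIZE)
        offset = 0
        while offset < length:
            chunk = min(CHUNK_SIZE, length - offset)
            offset += os.pwrite(fd, zeros[:chunk], offset)
    # Flush now so space problems surface here, not at a later writeback.
    os.fsync(fd)
```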

Nitpick: In most cases the kernel reserves disk space immediately,
before returning from write(). NFS seems to be the main exception
here.

EXT4 and XFS don't allocate until later, when they perform the actual
writes to FS metadata, initialize disk blocks, etc. So we won't
notice errors that are only detectable at the actual time of
allocation, like thin provisioning problems, until after write()
returns, and then we face the same writeback issues.
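The practical consequence is that a successful write() says little about allocation: errors such as ENOSPC or EIO from delayed allocation or thin provisioning may only be reported by fsync(). Below is a minimal sketch of the caller-side discipline discussed in this thread, again assuming Linux and Python's `os` wrappers: check fsync() and escalate instead of retrying. `durable_write` and `DurabilityError` are hypothetical names, and "escalate" here just means raising; a real server would need to crash and recover from its WAL.

```python
import os


class DurabilityError(RuntimeError):
    """Hypothetical: raised when data may not have reached storage."""


def durable_write(fd: int, data: bytes, offset: int) -> None:
    # write() success only means the data sits in the page cache; with
    # delayed allocation the blocks may not even be allocated yet.
    view = memoryview(data)
    while len(view) > 0:
        n = os.pwrite(fd, view, offset)
        view = view[n:]
        offset += n
    try:
        # Allocation and I/O errors from writeback (ENOSPC, EIO, ...)
        # are reported here, and possibly only once.
        os.fsync(fd)
    except OSError as e:
        # Do not retry: a second fsync() may "succeed" after the kernel
        # has already discarded the dirty pages. Escalate instead.
        raise DurabilityError(f"fsync failed: {e}") from e
```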

So I reckon you're safe from space-related issues if you're not on NFS
(and whyyy would you do that?) and not thinly provisioned. I'm sure
there are other corner cases, but I don't see any reason to expect
space-exhaustion-related corruption problems on a sensible FS backed
by a sensible block device. I haven't yet tested things like quotas,
or verified how reliable space reservation is under concurrency, etc.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
