Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-10 06:10:21
Message-ID: 20180410061021.GC26769@paquier.xyz
Lists: pgsql-hackers

On Tue, Apr 10, 2018 at 01:37:19PM +0800, Craig Ringer wrote:
> On 10 April 2018 at 13:04, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> And pg_basebackup. And pg_dump. And pg_dumpall. Anything using initdb
>> -S or fsync_pgdata would enter in those waters.
>
> ... but *only if they hit an I/O error* or they're on a FS that
> doesn't reserve space and hit ENOSPC.

Sure.

> It still does 99% of the job. It still flushes all buffers to
> persistent storage and maintains write ordering. It may not detect and
> report failures to the user how we'd expect it to, yes, and that's not
> great. But it's hardly throw up our hands and give up territory
> either. Also, at least for initdb, we can make initdb fsync() its own
> files before close(). Annoying but hardly the end of the world.

Well, I think there is room to improve failure reporting in
file_utils.c for frontends, or at worst to call exit() on any
critical failure, as the equivalent of a PANIC.
--
Michael
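
To illustrate the two ideas discussed above (fsync() a file before close(), and have a frontend exit on a critical failure), here is a minimal sketch. It is not PostgreSQL code and is not taken from file_utils.c; the names write_file_durably and write_file_or_die are hypothetical, and it assumes a POSIX system.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of PostgreSQL): write len bytes to path and
 * fsync() the file before close(). Returns 0 on success, or -1 with errno
 * set by the call that failed.
 */
int
write_file_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    size_t      remaining = len;
    int         fd;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    while (remaining > 0)
    {
        ssize_t     n = write(fd, p, remaining);

        if (n < 0)
        {
            int         save_errno;

            if (errno == EINTR)
                continue;       /* interrupted before writing; try again */
            save_errno = errno;
            close(fd);
            errno = save_errno;
            return -1;
        }
        p += n;
        remaining -= (size_t) n;
    }

    /* Flush while we still hold the descriptor, and check the result. */
    if (fsync(fd) != 0)
    {
        int         save_errno = errno;

        close(fd);
        errno = save_errno;
        return -1;
    }

    if (close(fd) != 0)
        return -1;
    return 0;
}

/*
 * Hypothetical frontend wrapper: report the failure and exit instead of
 * carrying on, roughly what a PANIC would do in the backend.
 */
void
write_file_or_die(const char *path, const void *buf, size_t len)
{
    if (write_file_durably(path, buf, len) != 0)
    {
        fprintf(stderr, "fatal: could not durably write \"%s\": %s\n",
                path, strerror(errno));
        exit(EXIT_FAILURE);
    }
}
```

A caller would use write_file_or_die("some/file", data, datalen) and never see a failure return. Exiting on the first fsync() error, rather than retrying, is a design choice of this sketch, made because a retry cannot be trusted to report the same failure again.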
