Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Christophe Pettus <xof(at)thebuild(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-10 01:52:21
Message-ID: CAEepm=2Lwxmzd8Ajwq-O2sy5V7N7st040j3gycBQ2K8F_pf26w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 10, 2018 at 1:44 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 10 April 2018 at 03:59, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> I don't think that's as hard as some people argued in this thread. We
>> could very well open a pipe in postmaster with the write end open in
>> each subprocess, and the read end open only in checkpointer (and
>> postmaster, but unused there). Whenever closing a file descriptor that
>> was dirtied in the current process, send it over the pipe to the
>> checkpointer. The checkpointer then can receive all those file
>> descriptors (making sure it's not above the limit, fsync(), close()ing
>> to make room if necessary). The biggest complication would presumably
>> be to deduplicate the received file descriptors for the same file,
>> without losing track of any errors.
>
> Yep. That'd be a cheaper way to do it, though it wouldn't work on
> Windows. Though we don't know how Windows behaves here at all yet.
>
> Prior discussion upthread had the checkpointer open()ing a file at the
> same time as a backend, before the backend writes to it. But passing
> the fd when the backend is done with it would be better.
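
For reference, a rough sketch of the hand-off quoted above. It is written in
Python for brevity and is not PostgreSQL code. A plain pipe cannot carry file
descriptors, so the sketch assumes an AF_UNIX socketpair with SCM_RIGHTS
(which is what Python's socket.send_fds() wraps) in place of the pipe. The
names send_fd_and_close and FdCollector are hypothetical, and the
deduplication policy shown (fsync a duplicate before closing it, keyed by
device and inode) is only one possible way to avoid dropping an error.

```python
import os
import socket
import tempfile


def send_fd_and_close(sock, fd):
    """Backend side (hypothetical): pass a dirtied fd on, then close our copy."""
    socket.send_fds(sock, [b"F"], [fd])  # one-byte payload carries the fd
    os.close(fd)


class FdCollector:
    """Checkpointer side (hypothetical): hold one fd per file until checkpoint."""

    def __init__(self):
        self.held = {}       # (st_dev, st_ino) -> fd kept open for later fsync
        self.failed = set()  # files for which some fsync() reported an error

    def receive(self, sock):
        _data, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
        for fd in fds:
            st = os.fstat(fd)
            key = (st.st_dev, st.st_ino)
            if key in self.held:
                # Duplicate for a file we already track: fsync it before
                # closing, so an error seen through it is recorded first.
                self._sync_and_close(key, fd)
            else:
                self.held[key] = fd

    def _sync_and_close(self, key, fd):
        try:
            os.fsync(fd)
        except OSError:
            self.failed.add(key)
        finally:
            os.close(fd)

    def checkpoint(self):
        """fsync and close everything held; return the files that failed."""
        for key, fd in list(self.held.items()):
            self._sync_and_close(key, fd)
        self.held.clear()
        failed, self.failed = self.failed, set()
        return failed
```

A real checkpointer would also have to enforce its own descriptor limit and
decide what to do with the failed set; neither is modelled here.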

How would that interlock with concurrent checkpoints?

I can see how to make that work if the share-fd-or-fsync-now logic
happens in smgrwrite() when called by FlushBuffer() while you hold
io_in_progress, but not if you defer it to some random time later.
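
To make the ordering concrete, here is a toy model of "share the fd or fsync
now" done before the I/O lock is released. It is Python rather than C and it
is not PostgreSQL code: io_lock merely stands in for io_in_progress, the
bounded queue stands in for whatever channel leads to the checkpointer, and
flush_buffer is a hypothetical name.

```python
import os
import queue
import tempfile
import threading


def flush_buffer(fd, data, offset, io_lock, handoff):
    """Write one block; before releasing io_lock, share the fd or fsync it."""
    with io_lock:  # stand-in for holding io_in_progress
        os.pwrite(fd, data, offset)
        dup_fd = os.dup(fd)  # the checkpointer gets its own descriptor
        try:
            handoff.put_nowait(dup_fd)
            return "shared"
        except queue.Full:
            # Could not hand the fd over: sync right here, still under the lock.
            os.close(dup_fd)
            os.fsync(fd)
            return "fsynced"
```

In this model the write is either queued for the checkpointer or already
fsynced by the time the lock is dropped. If the hand-off were deferred until
the descriptor is eventually closed, nothing in the model would stop a
checkpoint from completing in between.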

--
Thomas Munro
http://www.enterprisedb.com
