Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Anthony Iliopoulos <ailiop(at)altatus(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-09 12:54:16
Message-ID: 20180409125415.GC4233@ai-wks
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Apr 09, 2018 at 08:16:38PM +0800, Craig Ringer wrote:
>
> I'd like a middle ground where the kernel lets us register our interest and
> tells us if it lost something, without us having to keep eight million FDs
> open for some long period. "Tell us about anything that happens under
> pgdata/" or an inotify-style per-directory-registration option. I'd even
> say that's ideal.

I see what you are saying. So basically you'd always maintain the
notification descriptor open, where the kernel would inject events
related to writeback failures of files under watch (potentially
enriched to contain info regarding the exact failed pages and
the file offset they map to). The kernel wouldn't even have to
maintain per-page bits to trace the errors, since they will be
consumed by the process that reads the events (or discarded,
when the notification fd is closed).

Assuming this would be possible, wouldn't Pg still need to deal
with synchronizing writers and related issues (since this would
be merely a notification mechanism - not prevent any process
from continuing), which I understand would be rather intrusive
for the current Pg multi-process design.

But other than that, similarly this interface could in principle
be similarly implemented in the BSDs via kqueue(), I suppose,
to provide what you need.

Best regards,
Anthony

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?= 2018-04-09 13:00:55 Re: Transform for pl/perl
Previous Message Anthony Iliopoulos 2018-04-09 12:31:27 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS