Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-04 07:51:53
Message-ID: CAMsr+YF7P68Pv3C-_GN9k3-RfS9Scj+Vbe00OQTNdtVGTV7jBw@mail.gmail.com
Lists: pgsql-hackers

On 4 April 2018 at 14:00, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On 4 April 2018 at 13:29, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
> wrote:
>
>> On Wed, Apr 4, 2018 at 2:44 PM, Thomas Munro
>> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> > On Wed, Apr 4, 2018 at 2:14 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> >> Uh, are you sure it fixes our use-case? From the email description it
>> >> sounded like it only reported fsync errors for every open file
>> >> descriptor at the time of the failure, but the checkpoint process might
>> >> open the file _after_ the failure and try to fsync a write that happened
>> >> _before_ the failure.
>> >
>> > I'm not sure of anything. I can see that it's designed to report
>> > errors since the last fsync() of the *file* (presumably via any fd),
>> > which sounds like the desired behaviour:
>> >
>> > [..]
>>
>> Scratch that. Whenever you open a file descriptor you can't see any
>> preceding errors at all, because:
>>
>> /* Ensure that we skip any errors that predate opening of the file */
>> f->f_wb_err = filemap_sample_wb_err(f->f_mapping);
>>
>> https://github.com/torvalds/linux/blob/master/fs/open.c#L752
>>
>> Our whole design is based on being able to open, close and reopen
>> files at will from any process, and in particular to fsync() from a
>> different process that didn't inherit the fd but instead opened it
>> later. But it looks like that might be able to eat errors that
>> occurred during asynchronous writeback (when there was nobody to
>> report them to), before you opened the file?
>>
>
> Holy hell. So even PANICing on fsync() isn't sufficient, because the
> kernel will deliberately hide writeback errors that predate our fsync()
> call from us?
>
> I'll see if I can expand my testcase for that. I'm presently dockerizing
> it to make it easier for others to use, but that turns out to be a major
> pain when using devmapper etc. Docker in privileged mode doesn't seem to
> play nice with device-mapper.
>
>
Done, you can find it in
https://github.com/ringerc/scrapcode/tree/master/testcases/fsync-error-clear
now.

Warning: this runs a Docker container in privileged mode on your system,
and it uses device-mapper. Read it before you run it. I've tried to keep it
safe, but beware that it might eat your system.

For now it tests only xfs and EIO. Other filesystems should be easy enough
to add.

I haven't added coverage for multi-processing yet, but given what you found
above, I should. I'll probably just system() a copy of the same proc with
instructions to only fsync(). I'll do that next.
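
A minimal sketch of what such a multi-process check could look like. It is
not the code from the linked testcase: the function names are hypothetical,
and fork() stands in for the system() call mentioned above so that the
example stays in one file. One process writes and closes the file without
fsync(); a second process then opens the file afresh and is the only one to
call fsync().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write buf to path and close the fd WITHOUT fsync().
 * Returns 0 on success or a negative errno value. */
static int write_without_fsync(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -errno;

    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry */
            int e = errno;
            close(fd);
            return -e;
        }
        off += (size_t) n;
    }
    return close(fd) < 0 ? -errno : 0;
}

/* Hypothetical helper: fork a child that opens path with a brand-new file
 * descriptor and calls fsync() on it. Returns 0 if the child's open() and
 * fsync() succeeded, the errno the child saw (passed back through its exit
 * status, so this assumes errno < 256), or -1 if fork/wait itself failed. */
static int fsync_in_fresh_process(const char *path)
{
    assert(path != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: this fd did not exist when the data was written. */
        int fd = open(path, O_RDWR);
        if (fd < 0)
            _exit(errno);
        int e = fsync(fd) < 0 ? errno : 0;
        close(fd);
        _exit(e);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

On a healthy filesystem both helpers should return 0. The interesting case
is running them against the faulting device-mapper target: if writeback
fails between the two calls and the second one still returns 0 rather than
EIO, the error was lost to the process doing the fsync().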

I haven't yet worked out a reliable way to trigger ENOSPC on fsync() when
mapping without the error hole. It happens sometimes, but I don't know why;
it almost always happens on write() instead. I know it can happen on NFS,
but I'm hoping for a saner example than that to test with. ext4 and xfs do
delayed allocation but eager reservation, so it shouldn't happen to them.
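
To tell the two cases apart in a test, it helps to record which call
reported the error. The following is a small sketch under that assumption;
the type and function names are hypothetical and not taken from the linked
testcase.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Which step reported the failure, if any. */
enum io_stage { STAGE_NONE = 0, STAGE_WRITE, STAGE_FSYNC };

struct io_failure {
    enum io_stage stage;   /* STAGE_NONE when everything succeeded */
    int err;               /* errno from the failing call, else 0 */
};

/* Hypothetical helper: write the whole buffer to fd, then fsync() it, and
 * report the first failing step together with its errno. */
static struct io_failure write_then_fsync(int fd, const char *buf, size_t len)
{
    struct io_failure f = { STAGE_NONE, 0 };
    size_t off = 0;

    assert(buf != NULL);

    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry */
            f.stage = STAGE_WRITE;      /* e.g. ENOSPC caught at reservation */
            f.err = errno;
            return f;
        }
        off += (size_t) n;
    }

    if (fsync(fd) < 0) {
        f.stage = STAGE_FSYNC;          /* error only surfaced at flush time */
        f.err = errno;
    }
    return f;
}
```

A test loop could call this repeatedly on a nearly full filesystem and count
how often ENOSPC arrives with STAGE_WRITE versus STAGE_FSYNC.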

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
