Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-04 14:09:09
Message-ID: CAMsr+YG-aP_LQ8muweGEyroOggEx5wVD2sFup9VmQGrkD0ZAAA@mail.gmail.com
Lists: pgsql-hackers

On 4 April 2018 at 22:00, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> It's the error reporting issues around closing and reopening files with
> outstanding buffered I/O that's really going to hurt us here. I'll be
> expanding my test case to cover that shortly.
>
>
Also, just to be clear, this is not in any way confined to xfs and/or lvm
as I originally thought it might be.

Nor is ext3/ext4's errors=remount-ro protective. data_err=abort doesn't
help either (so what does it do?).

What bewilders me is that running with data=journal doesn't seem to be safe
either. WTF?

[26438.846111] EXT4-fs (dm-0): mounted filesystem with journalled data mode. Opts: errors=remount-ro,data_err=abort,data=journal
[26454.125319] EXT4-fs warning (device dm-0): ext4_end_bio:323: I/O error 10 writing to inode 12 (offset 0 size 0 starting block 59393)
[26454.125326] Buffer I/O error on device dm-0, logical block 59393
[26454.125337] Buffer I/O error on device dm-0, logical block 59394
[26454.125343] Buffer I/O error on device dm-0, logical block 59395
[26454.125350] Buffer I/O error on device dm-0, logical block 59396

and splat, there goes your data anyway.

It's possible that this is in some way related to using the device-mapper
"error" target and a loopback device in testing. But I don't really see how.
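For anyone following along, the close-and-reopen case from the quoted text comes down to roughly the following C sketch. The helper name write_close_reopen_fsync() is hypothetical, and this is not the actual test case. It only shows the order of operations. Data is written through one descriptor, that descriptor is closed without fsync(), and fsync() is later called through a freshly opened descriptor. On a healthy disk every step returns success. The concern raised above is that if writeback of those dirty pages fails in between (for instance on a device-mapper "error" target), the later fsync() may not report the failure, so a 0 return would not mean the data reached disk.

```c
#define _XOPEN_SOURCE 700   /* expose fsync() and friends under strict -std modes */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf to path, close without fsync(), then reopen
 * and fsync() through a new descriptor. Returns 0 if every step reported
 * success, or -1 with errno set by the failing step. It can only report what
 * the kernel tells it; it cannot detect a writeback error the kernel dropped.
 */
static int write_close_reopen_fsync(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() only copies into the page cache; the disk write happens later. */
    ssize_t n = write(fd, buf, len);
    if (n < 0 || (size_t) n != len) {
        int saved = (n < 0) ? errno : EIO;
        close(fd);
        errno = saved;
        return -1;
    }

    /* Close without fsync(): the dirty pages are still outstanding. */
    if (close(fd) != 0)
        return -1;

    /* ... asynchronous writeback may fail here, with no descriptor open ... */

    /* Reopen and flush through a different descriptor. */
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0) {
        int saved = errno;
        fprintf(stderr, "fsync failed: %s\n", strerror(saved));
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

Run against a loop device that is later switched to a dm "error" target, the interesting question is whether the final fsync() returns -1 with EIO or quietly returns 0.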

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
