Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Catalin Iacob <iacobcatalin(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-03-29 21:18:14
Message-ID: CAEepm=1KFaVPdOxYkP6bmtevOZHfdHTNf8bjZWSkJxoxy0X+7A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 30, 2018 at 5:20 AM, Catalin Iacob <iacobcatalin(at)gmail(dot)com> wrote:
> Jeff's comments in the pull request that merged errseq_t are worth
> reading as well:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=088737f44bbf6378745f5b57b035e57ee3dc4750

Wow. That raises a separate question: when did each filesystem adopt
this new infrastructure?

>> Yeah, I see why you want to PANIC.
>
> Indeed. Even doing that leaves question marks about all the kernel
> versions before v4.13, which at this point is pretty much everything
> out there, not even detecting this reliably. This is messy.

The pre-errseq_t problems are beyond our control. There's nothing we
can do about them in userspace (except perhaps abandon OS-buffered I/O,
a big project). We just need to be aware that the problem exists in
kernels before v4.13 and be grateful to Jeff Layton for fixing it.

The dropped dirty flag problem is one we can, and in my view should,
do something about, whatever we might think of that design
choice. As Andrew Gierth pointed out to me in an off-list chat about
this, by the time you've reached this state, both PostgreSQL's buffer
and the kernel's buffer are clean and might be reused for another
block at any time, so your data might be gone from the known universe
-- we don't even have the option to rewrite our buffers in general.
Recovery is the only option.
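
To make the "no retry" point concrete, here is a minimal sketch, not
PostgreSQL's actual code. The names sync_result, sync_once() and
sync_or_panic() are hypothetical, and abort() merely stands in for
PANIC. It assumes a POSIX system where fsync() is declared in <unistd.h>.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result type for illustration; not a PostgreSQL identifier. */
typedef enum
{
    SYNC_OK,            /* data reached stable storage */
    SYNC_MUST_RECOVER   /* fsync() failed; only log replay can be trusted */
} sync_result;

/*
 * Call fsync() once and report the outcome.  There is deliberately no retry
 * loop: after a failure the kernel may already have marked the pages clean,
 * so a second fsync() could return 0 without the data ever being written.
 */
sync_result
sync_once(int fd)
{
    if (fsync(fd) == 0)
        return SYNC_OK;

    fprintf(stderr, "fsync failed: %s\n", strerror(errno));
    return SYNC_MUST_RECOVER;
}

/*
 * Hypothetical checkpoint-side caller.  abort() stands in for PANIC: the
 * process dies and crash recovery replays the write-ahead log on restart.
 */
void
sync_or_panic(int fd)
{
    assert(fd >= 0);    /* a bad descriptor is a caller bug, not an I/O error */

    if (sync_once(fd) == SYNC_MUST_RECOVER)
        abort();
}
```

The design choice is that a failed fsync() is never answered with another
fsync() on the same data: the caller either gets SYNC_OK or stops and
relies on recovery.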

Thank you to Craig for chasing this down and +1 for his proposal, on Linux only.

--
Thomas Munro
http://www.enterprisedb.com
