Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date:	2018-04-23 20:14:48
Message-ID:	20180423201448.nxe6jc5tu63kzum7@alap3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2018-03-28 10:23:46 +0800, Craig Ringer wrote:
> TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at
> least on Linux. When fsync() returns success it means "all writes since the
> last fsync have hit disk" but we assume it means "all writes since the last
> SUCCESSFUL fsync have hit disk".

> But then we retried the checkpoint, which retried the fsync(). The retry
> succeeded, because the prior fsync() *cleared the AS_EIO bad page flag*.
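The point quoted above can be sketched in C. This is a minimal illustration, not PostgreSQL code: the names sync_once, SYNC_DONE and SYNC_MUST_CRASH_RECOVER are made up for this example. The only assumption it encodes is the one Craig describes: a failed fsync() may clear the kernel's error state, so a later successful fsync() says nothing about the writes that failed earlier.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical verdicts for a checkpointer-like caller; not PostgreSQL names. */
typedef enum {
    SYNC_DONE,               /* fsync() reported success */
    SYNC_MUST_CRASH_RECOVER  /* fsync() failed; a retry cannot be trusted */
} sync_verdict;

/*
 * Call fsync() exactly once and never retry on failure.
 *
 * Assumption (from the thread): after a failed fsync() the kernel may have
 * dropped the error flag, so a second fsync() can return 0 even though the
 * earlier writes never reached disk. The caller is expected to treat
 * SYNC_MUST_CRASH_RECOVER as fatal (for example, PANIC and replay the WAL).
 */
sync_verdict sync_once(int fd)
{
    if (fsync(fd) == 0)
        return SYNC_DONE;

    fprintf(stderr, "fsync failed: %s; not retrying\n", strerror(errno));
    return SYNC_MUST_CRASH_RECOVER;
}
```

A caller would check the verdict and stop normal operation on SYNC_MUST_CRASH_RECOVER instead of looping back into another checkpoint attempt.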

Random other thing we should look at: Some filesystems (NFS yes; XFS and
ext4 no) flush writes at close(2). We check close()'s return code, but
just log it... So close() counts as an fsync for such filesystems.
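A minimal sketch of what "treat close() like an fsync" could look like, assuming a hypothetical helper named close_reporting_writeback (not a PostgreSQL function). It only shows the shape of the check: the close() result is handed back to the caller instead of being logged and forgotten. close() is not retried on failure, since on Linux the descriptor is released even when close() reports an error.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: close fd and surface any error to the caller.
 *
 * Returns 0 on success. Returns -1 with errno preserved on failure, which on
 * filesystems that flush at close(2) (NFS, per the message above) may mean
 * that buffered writes were lost. The caller decides how severe that is;
 * this sketch deliberately does not retry close().
 */
int close_reporting_writeback(int fd)
{
    if (close(fd) == 0)
        return 0;

    int saved_errno = errno;  /* fprintf() may overwrite errno */
    fprintf(stderr, "close failed: %s\n", strerror(saved_errno));
    errno = saved_errno;
    return -1;
}
```

A caller that has written data through the descriptor would treat a -1 result the same way as a failed fsync(), rather than continuing as if the data were durable.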

I'm at LSF/MM to discuss the future behaviour of Linux here, but that's how
it is right now.

Greetings,

Andres Freund

In response to

PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS at 2018-03-28 02:23:46 from Craig Ringer

Responses

Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS at 2018-04-24 00:09:23 from Bruce Momjian
Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS at 2018-04-26 02:16:52 from Craig Ringer
