Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Christophe Pettus <xof(at)thebuild(dot)com>, Craig Ringer <craig(at)2ndQuadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-17 21:29:17
Message-ID: 20180417212917.GB13097@momjian.us
Lists: pgsql-hackers

On Mon, Apr 9, 2018 at 03:42:35PM +0200, Tomas Vondra wrote:
> On 04/09/2018 12:29 AM, Bruce Momjian wrote:
> >
> > A crazy idea would be to have a daemon that checks the logs and
> > stops Postgres when something seems wrong.
> >
>
> That doesn't seem like a very practical way. It's better than nothing,
> of course, but I wonder how would that work with containers (where I
> think you may not have access to the kernel log at all). Also, I'm
> pretty sure the messages do change based on kernel version (and possibly
> filesystem) so parsing it reliably seems rather difficult. And we
> probably don't want to PANIC after I/O error on an unrelated device, so
> we'd need to understand which devices are related to PostgreSQL.

My more-considered crazy idea is to have a postgresql.conf setting like
archive_command that allows the administrator to specify a command that
will be run _after_ fsync but before the checkpoint is marked as
complete. While we can have write flush errors before fsync and never
see the errors during fsync, we will not have write flush errors _after_
fsync that are associated with previous writes.

The script should check for I/O or space-exhaustion errors and return
false if it finds any, in which case we can either stop, or stop and
run crash recovery. We could have an exit status of 1 do the former,
and an exit status of 2 do the latter.
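
To make the exit-status idea concrete, here is a rough sketch of the
decision logic such a script might contain. Everything in it is an
assumption for illustration: no such postgresql.conf setting exists,
the name classify() and the regular expressions are made up, real
kernel messages vary by kernel version and filesystem, and the choice
of which error maps to which status is arbitrary.

```python
import re

# Hypothetical patterns only; real kernel log messages differ between
# kernel versions and filesystems, so these would need local tuning.
IO_ERROR = re.compile(r"I/O error|lost async page write", re.IGNORECASE)
NO_SPACE = re.compile(r"No space left on device|ENOSPC", re.IGNORECASE)

EXIT_OK = 0             # nothing suspicious; checkpoint may complete
EXIT_STOP = 1           # "stop"
EXIT_CRASH_RECOVER = 2  # "stop and crash recover"


def classify(log_lines):
    """Return the exit status a check script could report for these lines.

    Assumed mapping (not from any PostgreSQL design): an I/O error asks
    for crash recovery, space exhaustion only asks for a stop.
    """
    status = EXIT_OK
    for line in log_lines:
        if IO_ERROR.search(line):
            # The most severe outcome wins, so return immediately.
            return EXIT_CRASH_RECOVER
        if NO_SPACE.search(line):
            status = EXIT_STOP
    return status
```

A wrapper would feed it only the log lines that appeared since the
previous checkpoint and pass the result to sys.exit(); deciding which
lines and which devices are relevant is the hard part and is not
addressed here.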

Also, if we are relying on WAL, we have to make sure WAL is actually
safe with fsync, and I am betting only the O_DIRECT methods actually
are safe:

    #wal_sync_method = fsync        # the default is the first option
                                    # supported by the operating system:
                                    #   open_datasync
                            -->     #   fdatasync (default on Linux)
                            -->     #   fsync
                            -->     #   fsync_writethrough
                                    #   open_sync

I am betting the marked wal_sync_method settings are not safe, since
there is a window of time between the write and the fsync.
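
The window can be illustrated with a small sketch at the system-call
level. This is not PostgreSQL code, and the function names are made up.
The second function uses O_DSYNC rather than O_DIRECT (which needs
aligned buffers and is left out), so it only approximates the open_*
methods; whether any of these variants is truly safe is the open
question here, not something the sketch proves.

```python
import os


def write_then_fsync(path, data):
    """Write first, sync later (the pattern of the marked methods)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)  # may only reach the kernel page cache
        # Window: the kernel can attempt writeback, and fail, between
        # the write above and the fsync below.
        os.fsync(fd)
    finally:
        os.close(fd)


def write_with_o_dsync(path, data):
    """Open with O_DSYNC so each write is asked to be synchronous."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DSYNC
    fd = os.open(path, flags, 0o600)
    try:
        # No separate sync call: a failure to store the data is
        # expected to surface as OSError from this write itself.
        os.write(fd, data)
    finally:
        os.close(fd)
```

Both functions ignore short writes for brevity, which is acceptable
only for small buffers like those in a demonstration.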

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
