Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Andres Freund <andres(at)anarazel(dot)de>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Christophe Pettus <xof(at)thebuild(dot)com>, Craig Ringer <craig(at)2ndQuadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Robert Haas <robertmhaas(at)gmail(dot)com>, Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-17 21:41:42
Message-ID: 20180417214142.mjx3stcxrqrqenna@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-04-17 17:32:45 -0400, Bruce Momjian wrote:
> On Mon, Apr 9, 2018 at 03:42:35PM +0200, Tomas Vondra wrote:
> > That doesn't seem like a very practical way. It's better than nothing,
> > of course, but I wonder how would that work with containers (where I
> > think you may not have access to the kernel log at all). Also, I'm
> > pretty sure the messages do change based on kernel version (and possibly
> > filesystem) so parsing it reliably seems rather difficult. And we
> > probably don't want to PANIC after I/O error on an unrelated device, so
> > we'd need to understand which devices are related to PostgreSQL.

You can certainly have access to the kernel log in containers. I'd
assume such a script wouldn't check various system logs but instead tail
/dev/kmsg or such. Otherwise the variance between installations would be
too big.

There aren't *that* many different types of error messages, and they
don't change that often. If we just detected errors for the most common
filesystems we'd probably be good. Detecting a few general storage layer
messages wouldn't be that hard either; most things have been unified
over the last ~8-10 years.
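
A minimal sketch of what the matching step of such a script could look
like, assuming records are read from /dev/kmsg, where each record has the
form "<prefix>,<seq>,<timestamp>,<flags>;<message>". The function name and
the pattern list are hypothetical and only illustrative; the real set of
messages depends on kernel version and filesystem.

```c
#include <string.h>

/*
 * Illustrative substrings only (an assumption, not an exhaustive list).
 * Actual kernel messages differ between kernel versions and filesystems.
 */
static const char *const io_error_patterns[] = {
    "I/O error",
    "EXT4-fs error",
    "lost async page write",
};

/*
 * Inspect one /dev/kmsg record. Everything up to the first ';' is
 * metadata; the human-readable message follows it.
 * Returns 1 if the message contains one of the patterns, 0 otherwise.
 */
int kmsg_record_is_io_error(const char *record)
{
    const char *msg = strchr(record, ';');
    size_t i;

    if (msg == NULL)
        return 0;               /* not a well-formed record */
    msg++;                      /* skip the ';' separator */

    for (i = 0; i < sizeof(io_error_patterns) / sizeof(io_error_patterns[0]); i++)
    {
        if (strstr(msg, io_error_patterns[i]) != NULL)
            return 1;
    }
    return 0;
}
```

A monitor built on this would open /dev/kmsg read-only and pass each
record it reads to the function; deciding which devices belong to
PostgreSQL would still need separate logic.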

> Replying to your specific case, I am not sure how we would use a script
> to check for I/O errors/space-exhaustion if the postgres user doesn't
> have access to it.

Not sure what you mean?

Space exhaustion can be checked when allocating space, FWIW. We'd just
need to use posix_fallocate et al.
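
As a rough sketch of that idea (the helper name reserve_space is
hypothetical, and this is not how PostgreSQL itself is structured):
reserving the space up front makes a full filesystem show up as ENOSPC at
allocation time rather than later during writeback.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reserve the first `len` bytes of `path`, creating the file if needed.
 * Returns 0 on success or an errno value (e.g. ENOSPC) on failure.
 * Note that posix_fallocate() returns the error number directly and
 * does not set errno.
 */
int reserve_space(const char *path, off_t len)
{
    int fd;
    int rc;

    fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return errno;

    rc = posix_fallocate(fd, 0, len);

    /* Report a close() failure only if the allocation itself succeeded. */
    if (close(fd) != 0 && rc == 0)
        rc = errno;

    return rc;
}
```

A caller would treat a return value of ENOSPC as "out of space" before
any data has been written into the reserved range.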

> Does O_DIRECT work in such container cases?

Yes.

Greetings,

Andres Freund
