Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Anthony Iliopoulos <ailiop(at)altatus(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date: 2018-04-04 02:40:16
Message-ID: CAMsr+YFdyknZQhV4Xj5hpAjBqtWw5p-hnJuSbvJ+eB4GF8L8MQ@mail.gmail.com
Lists: pgsql-hackers

On 4 April 2018 at 05:47, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
> Now, I hear the DIRECT_IO thing and I assume we're eventually going to
> have to go that way: Linux kernel developers seem to think that "real
> men use O_DIRECT" and so if other forms of I/O don't provide useful
> guarantees, well that's our fault for not using O_DIRECT. That's a
> political reason, not a technical reason, but it's a reason all the
> same.
>

I looked into buffered AIO a while ago, by the way, and just ... hell no.
Run, run as fast as you can.

The trouble with direct I/O is that it pushes a _lot_ of work back on
PostgreSQL regarding knowledge of the storage subsystem, I/O scheduling,
etc. It's absurd that the kernel handles all of this for you only as long
as you don't need it to be reliable; if you do, you're expected to bypass
it and drive the hardware directly.

We'd need pools of writer threads to deal with all the blocking I/O. It'd
be such a nightmare. Hey, why bother having a kernel at all, except for
drivers?
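
To make the "pools of writer threads" point concrete, here is a minimal sketch of what that shape looks like, assuming nothing about PostgreSQL's actual code. The names `write_block`, `flush_blocks` and `BLOCK_SIZE` are hypothetical, and it uses ordinary buffered writes rather than O_DIRECT, which would additionally require aligned buffers and filesystem support. The only thing it illustrates is that each blocking write ties up a thread, so something has to own and schedule those threads.

```python
import os
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 8192  # assumption: 8 kB blocks, chosen only for illustration


def write_block(fd, block_no, payload):
    """Blocking positional write of one block; returns bytes written."""
    # os.pwrite blocks the calling thread until the kernel accepts the data.
    # With direct I/O the wait would extend to the device itself, which is
    # why the caller needs a pool of threads to keep several writes in flight.
    return os.pwrite(fd, payload, block_no * BLOCK_SIZE)


def flush_blocks(path, blocks, workers=4):
    """Write {block_no: bytes} via a pool of writer threads, then fsync."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            futures = [
                pool.submit(write_block, fd, block_no, payload)
                for block_no, payload in blocks.items()
            ]
            # result() re-raises any OSError raised inside a worker thread.
            written = sum(f.result() for f in futures)
        # One fsync after all writes; an error here must not be ignored.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

Even this toy version has to decide how many workers to run, in what order to submit blocks, and what to do when a write or the fsync fails. A real implementation would also have to handle short writes and tune itself to the storage underneath, which is the work the kernel's page cache and I/O scheduler currently do for us.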

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
