Re: Some thoughts on NFS

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
Subject: Re: Some thoughts on NFS
Date: 2019-02-19 18:45:28
Message-ID: CA+Tgmoa4V=nwXo4C8Pkni-PE0DMmkPJavWBu8mLLdGzkjdUkyg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 19, 2019 at 1:29 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Is that a new thing? I ran across PostgreSQL-over-iSCSI a number of
> > years ago and the evidence strongly suggested that it did not reliably
> > report disk errors back to PostgreSQL, leading to corruption.
>
> How many years ago are we talking? I think it's been mostly robust in
> the last 6-10 years, maybe?

I think it was ~9 years ago.

> But note that the postgres + linux fsync
> issues would have plagued that use case just as well as it did local
> storage, at a likely higher incidence of failures (i.e. us forgetting to
> retry fsyncs in checkpoints, and linux throwing away dirty data after
> fsync failure would both have caused problems that aren't dependent on
> iSCSI).

IIRC, and obviously that's difficult to do after so long, there were
clearly disk errors in the kernel logs, but no hint of a problem in
the PostgreSQL logs. So it wasn't just a case of us failing to respond
to errors with sufficient vigor -- either they weren't being reported at
all, or only to system calls we weren't checking, e.g. close or
something.
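
To illustrate the "system calls we weren't checking" point, here is a minimal sketch of a write path that checks the result of every call, including close. The helper name write_durably is hypothetical and is not PostgreSQL code; it only shows where an I/O error could surface, and it says nothing about what the kernel actually reported in the case described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): write len bytes to path and
 * report a failure from any of open, write, fsync, or close.
 * Returns 0 on success, -1 on error with errno set by the failing call.
 */
int
write_durably(const char *path, const void *buf, size_t len)
{
    const char *p = buf;
    int         saved_errno;
    int         fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    while (len > 0)
    {
        ssize_t n = write(fd, p, len);

        if (n < 0 && errno == EINTR)
            continue;           /* interrupted before writing; retry */
        if (n <= 0)
        {
            if (n == 0)
                errno = EIO;    /* no progress; treat as an I/O error */
            goto fail;
        }
        p += n;
        len -= (size_t) n;
    }

    /* Ask the kernel to flush the data; a write-back error may show up here. */
    if (fsync(fd) != 0)
        goto fail;

    /* close() can also return a deferred write error, notably on NFS. */
    if (close(fd) != 0)
        return -1;
    return 0;

fail:
    saved_errno = errno;
    close(fd);
    errno = saved_errno;        /* preserve the original failure reason */
    return -1;
}
```

A caller would log errno whenever this returns -1, so that an error seen by any of the four calls ends up in the application's own logs rather than only in the kernel's.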

> And I think it's not that likely that we'd not screw up a
> number of times implementing iSCSI ourselves - not to speak of the fact
> that that seems like an odd place to focus development on, given that
> it'd basically require all the infrastructure also needed for local DIO,
> which'd likely gain us much more.

I don't really disagree with you here, but I also think it's important
to be honest about what size hammer is likely to be sufficient to fix
the problem. Project policy for many years has been essentially
"let's assume the kernel guys know what they are doing," but, I don't
know, color me a little skeptical at this point. We've certainly made
lots of mistakes all of our own, and it's certainly true that
reimplementing large parts of what the kernel does in user space is
not very appealing ... but on the other hand it looks like filesystem
error reporting isn't even really reliable for local operation (unless
we do an incredibly complicated fd-passing thing that has deadlock
problems we don't know how to solve and likely performance problems
too, or convert the whole backend to use threads) or for NFS operation
(though maybe your suggestion will fix that) so the idea that iSCSI is
just going to be all right seems a bit questionable to me. Go ahead,
call me a pessimist...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
