Re: Some thoughts on NFS

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org, Craig Ringer <craig(dot)ringer(at)2ndquadrant(dot)com>
Subject:	Re: Some thoughts on NFS
Date:	2019-02-19 22:25:22
Message-ID:	CA+hUKGJ3J_ZYKpOFM9EF2BOA8y71MfP5_ipLPsSwpB+dTt+GBQ@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Feb 20, 2019 at 5:52 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > 1. Figure out how to get the ALLOCATE command all the way through the
> > stack from PostgreSQL to the remote NFS server, and know for sure that
> > it really happened. On the Debian buster Linux 4.18 system I checked,
> > fallocate() reports EOPNOTSUPP, and posix_fallocate()
> > appears to succeed but it doesn't really do anything at all (though I
> > understand that some versions sometimes write zeros to simulate
> > allocation, which in this case would be equally useless as it doesn't
> > reserve anything on an NFS server). We need the server and NFS client
> > and libc to be of the right version and cooperate and tell us that
> > they have really truly reserved space, but there isn't currently a way
> > as far as I can tell. How can we achieve that, without writing our
> > own NFS client?
> >
> > 2. Deal with the resulting performance suckage. Extending 8kB at a
> > time with synchronous network round trips won't fly.
>
> I think I'd just go for fsync();pwrite();fsync(); as the extension
> mechanism, iff we're detecting a tablespace is on NFS. The first fsync()
> to make sure there's no previous errors that we could mistake for
> ENOSPC, the pwrite to extend, the second fsync to make sure there's
> actually space. Then we can detect ENOSPC properly. That possibly does
> leave some errors where we could mistake ENOSPC for something more benign
> than it is, but the cases seem pretty narrow, due to the previous
> fsync() (maybe the other side could be thinly provisioned and get an
> ENOSPC there - but in that case we didn't actually lose any data. The
> only dangerous scenario I can come up with is that the remote side is on
> a thinly provisioned CoW system, and a concurrent write to an earlier
> block runs out of space - but seriously, good riddance to you).

This seems to make sense, and has the advantage that it uses
interfaces that exist right now. But it seems a bit like we'll have
to wait for the Linux kernel developers to finish building out
errseq_t support for NFS to avoid various races around the mapping's
AS_EIO flag (A: fsync() -> EIO; B: fsync() -> SUCCESS, log checkpoint;
A: panic), and then maybe we'd have to get at least one of
{ fd-passing, direct IO, threads } working on our side ...
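
For concreteness, the fsync(); pwrite(); fsync() sequence quoted above
might look roughly like this in C. It is only a sketch under stated
assumptions: extend_with_fsync() is a made-up name, not anything in
PostgreSQL; treating a zero-byte write as ENOSPC is my own choice; and
it does nothing about NFS detection, the AS_EIO race, or the question
of which process holds the file descriptor.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): extend the file behind fd by
 * nbytes of zeros starting at offset, bracketed by two fsync() calls.
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int
extend_with_fsync(int fd, off_t offset, size_t nbytes)
{
    char   *zeros;
    size_t  done = 0;

    assert(nbytes > 0);

    /* Step 1: flush first, so an older write-back error is reported here
     * and cannot be mistaken for ENOSPC from the extension below. */
    if (fsync(fd) != 0)
        return -1;

    zeros = calloc(1, nbytes);
    if (zeros == NULL)
    {
        errno = ENOMEM;
        return -1;
    }

    /* Step 2: write the new blocks, retrying on EINTR and short writes. */
    while (done < nbytes)
    {
        ssize_t n = pwrite(fd, zeros + done, nbytes - done,
                           offset + (off_t) done);

        if (n < 0)
        {
            int save_errno = errno;

            if (save_errno == EINTR)
                continue;
            free(zeros);
            errno = save_errno;
            return -1;
        }
        if (n == 0)
        {
            /* Assumption: no progress is reported as out of space. */
            free(zeros);
            errno = ENOSPC;
            return -1;
        }
        done += (size_t) n;
    }
    free(zeros);

    /* Step 3: flush again; on NFS this is where ENOSPC would be expected
     * to show up if the server could not store the new blocks. */
    if (fsync(fd) != 0)
        return -1;

    return 0;
}
```

A caller would check for -1 and look at errno, e.g. treat ENOSPC from
extend_with_fsync(fd, end_of_file, 8192) as an ordinary out-of-space
error. Doing this 8kB at a time is exactly the round-trip cost
complained about in point 2, so a real version would presumably extend
in larger chunks.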

--
Thomas Munro
https://enterprisedb.com

In response to

Re: Some thoughts on NFS at 2019-02-19 16:52:11 from Andres Freund

Responses

Re: Some thoughts on NFS at 2019-02-19 22:29:19 from Andres Freund
