Re: Patch to improve reliability of postgresql on linux nfs

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: George Barnett <gbarnett(at)atlassian(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch to improve reliability of postgresql on linux nfs
Date: 2011-09-12 13:27:00
Message-ID: 77114E33-65AC-413C-93FF-8A4556382C8F@phlo.org
Lists: pgsql-hackers

On Sep12, 2011, at 14:54 , Peter Eisentraut wrote:
> On mån, 2011-09-12 at 16:46 +1000, George Barnett wrote:
>> On 12/09/2011, at 3:59 PM, Florian Pflug wrote:
>>> Still, I agree with Noah and Kevin that we ought to deal more gracefully with this, i.e. resubmit after a partial read() or write(). AFAICS there's nothing to be gained by not doing that, and the increase in code complexity should be negligible. If we do that, however, I believe we might as well handle EINTR correctly, even if SA_RESTART should prevent us from ever seeing that.
>>
>> It does still concern me that pgsql did not deal with this as gracefully as other software. I hope the list will consider a patch to resolve that.
>
> We have signal handling configured so that system calls are not
> interrupted. So there is ordinarily no reason to do anything more
> graceful. The problem is that NFS is in this case not observing that
> setting. It's debatable whether it's worth supporting that; just saying
> that the code is correct as it stands.

SA_RESTART doesn't protect against partial reads/writes due to signal delivery,
it only removes the need to check for EINTR. In other words, it retries until
at least one byte has been written, not until all bytes have been written.
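
A minimal sketch of what resubmitting after a partial write() could look like.
The helper name write_all is hypothetical; it is not taken from the PostgreSQL
source or from the patch under discussion, and treating a zero-byte write as
ENOSPC is an assumption made only for this illustration:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all: hypothetical helper for illustration only.
 *
 * Calls write() repeatedly until all len bytes have been written.
 * Returns 0 on success, -1 on failure with errno set.
 */
int
write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0)
    {
        ssize_t n = write(fd, p, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before any byte moved: retry */
            return -1;          /* genuine error, errno set by write() */
        }
        if (n == 0)
        {
            /* No progress and no error; assumption: report it as ENOSPC. */
            errno = ENOSPC;
            return -1;
        }

        /* Partial write: advance past what was accepted and resubmit. */
        p += n;
        len -= (size_t) n;
    }
    return 0;
}
```

The EINTR branch should never be taken when SA_RESTART is in effect, but it
costs one comparison and covers the case where it is not.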

The GNU LibC documentation has this to say on the subject:

"There is one situation where resumption never happens no matter which
choice you make: when a data-transfer function such as read or write is
interrupted by a signal after transferring part of the data. In this case,
the function returns the number of bytes already transferred, indicating
partial success."[1]
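
On the read side, "partial success" means a short count has to be told apart
from end-of-file. A rough sketch, assuming a hypothetical helper named read_all
(again not PostgreSQL code) that keeps reading until the buffer is full, EOF is
reached, or a real error occurs:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_all: hypothetical helper for illustration only.
 *
 * Reads until len bytes have been collected or EOF is hit.
 * Returns the number of bytes read (less than len only at EOF),
 * or -1 on failure with errno set.
 */
ssize_t
read_all(int fd, void *buf, size_t len)
{
    char   *p = buf;
    size_t  done = 0;

    while (done < len)
    {
        ssize_t n = read(fd, p + done, len - done);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before any byte moved: retry */
            return -1;          /* genuine error, errno set by read() */
        }
        if (n == 0)
            break;              /* end of file: return what we have */

        /* Partial read: remember the progress and ask for the rest. */
        done += (size_t) n;
    }
    return (ssize_t) done;
}
```

A caller that expects a full block can then treat any return value other than
len as either an error or a truncated file, rather than as a mysterious short
read.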

While it's true that reads and writes are by tradition non-interruptible, I
personally wouldn't bet that they'll stay that way forever. It all depends on
whether the timeouts involved in the communication with a disk are short enough
to mask the difference - once they get too long (or even infinite like in the
case of "hard" NFS mounts) you pay for non-interruptible primitives with
un-killable stuck processes. Since the current trend is to move storage further
away from processing, and to put non-deterministic networks like ethernet between
the two, I expect situations in which read/write primitives are interruptible
to increase, not decrease.

And BTW, the GNU LibC documentation doesn't seem to mention anything about
local reads and writes being non-interruptible. In fact, it even says:

"A signal can arrive and be handled while an I/O primitive such as open or read
is waiting for an I/O device. If the signal handler returns, the system faces
the question: what should happen next?"[1]

Had the GNU people faith in local reads and writes being non-interruptible, they'd
probably have said "network device" or "remote device", not "I/O device".

best regards,
Florian Pflug

[1] http://www.gnu.org/s/hello/manual/libc/Interrupted-Primitives.html#Interrupted-Primitives
