Re: EINTR error in SunOS

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: EINTR error in SunOS
Date: 2006-01-02 23:52:08
Message-ID: 87mzie5gt3.fsf@stark.xeocode.com
Lists: pgsql-hackers


Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:

> The kernel is trying to be helpful by returning EINTR to say "ok, it
> didn't complete. There's no error yet but it may yet work".

Well it only returns EINTR if a signal was received.

> With local hard drives if they don't respond, you assume they're broken.
> When NFS servers don't respond you assume someone has temporarily pulled a
> cable and it will come back soon. Huh?

Well, firstly, with local hard drives you never get EINTR. Signals won't be
delivered until after the syscall returns. You don't get EINTR because in the
original BSD implementation it was more efficient to implement it that way,
and since disk i/o was always extremely fast it didn't threaten to delay your
signals.

You're mixing up operations timing out with signals being received. The reason
you don't want NFS filesystem operations timing out (and you really don't) is
that it's *possible* it will come back later.

If you're the sysadmin and you're told your NFS server is down, so you fix it
and it comes back up properly, you should be able to expect that the world
returns to normal.

If you have the "soft" option enabled then you now have to run around
restarting every other service in your data center because you don't know
which ones might have received an error and crashed.

Worse, if any of those programs failed to notice the error (and they're not
wrong to; traditionally certain operations never signaled errors) then your
data is now corrupt. Some updates have been made but not others, and later
updates may be based on the incorrect data.
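As a rough sketch of what "noticing the error" means in code, the C function below checks every call on the write path, including fsync() and close(). The name save_checked and the choice of those particular calls are assumptions for illustration; the message above does not say which operations it has in mind.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path and report any failure.
 * Returns 0 on success, -1 with errno set otherwise.
 */
static int save_checked(const char *path, const char *data, size_t len)
{
    int saved;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, data, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: try again */
            goto fail;             /* e.g. EIO from a "soft" NFS mount */
        }
        data += n;                 /* short write: keep going with the rest */
        len -= (size_t) n;
    }

    if (fsync(fd) < 0)             /* deferred write errors can show up here */
        goto fail;

    return close(fd);              /* ... or here, so check this too */

fail:
    saved = errno;                 /* close() may clobber errno */
    close(fd);
    errno = saved;
    return -1;
}
```

A program that ignores any one of these return values can carry on as if the data had reached the server.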

Now on the other hand the "intr" option is entirely reasonable to enable as
long as you know you don't have software that doesn't expect it. It only kicks
in if an actual signal is received, such as the user hitting C-c. Even if the
server comes back 20m later the user isn't going to be upset that his C-c got
handled. The only problem is that some software doesn't expect to get EINTR
and handles it poorly.
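For software that does want to cope with EINTR, the usual idiom is simply to retry the call. A minimal sketch in C follows; read_retry is a hypothetical name, not an existing PostgreSQL or libc function.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: behaves like read(), except that a call cut short
 * by a signal before any data arrived (EINTR) is retried instead of being
 * reported to the caller as an error.
 */
static ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    do {
        n = read(fd, buf, count);
    } while (n < 0 && errno == EINTR);

    return n;   /* bytes read, 0 at end of file, or -1 on a real error */
}
```

Retrying unconditionally suits a server process. A program that wants C-c to abort the operation would check a flag set by its signal handler before going around the loop again.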

> I would vote for the kernel, if the server didn't respond within 5
> seconds, to simply return EIO. At least we know how to handle that...

How do you handle it? By having Postgres shut down? And then the NFS server
comes back and then what?

--
greg
