Re: Patch to improve reliability of postgresql on linux nfs

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Aidan Van Dyk <aidan(at)highrise(dot)ca>
Cc: George Barnett <gbarnett(at)atlassian(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch to improve reliability of postgresql on linux nfs
Date: 2011-09-13 14:14:23
Message-ID: 8CB04209-C5B3-48B1-A2E5-F41FAA6BC367@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sep13, 2011, at 15:05 , Aidan Van Dyk wrote:
> On Tue, Sep 13, 2011 at 7:30 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> Sorry for the self-reply. I realized only after hitting send that I
>> got the ENOSPC handling wrong again - we probably ought to check for
>> ENOSPC as well as ret == 0. Also, it seems preferable to return the
>> number of bytes actually written instead of -1 if we hit an error during
>> retry.
>>
>> With this version, any return value other than <amount> signals an
>> error, the number of actually written bytes is reported even in the
>> case of an error (to the best of pg_write_nointr's knowledge), and
>> errno always indicates the kind of error.
>
> Personally, I'ld think that's ripe for bugs. If the contract is that
> ret != amount is the "error" case, then don't return -1 for an error
> *sometimes*.

Hm, but isn't that how write() works also? AFAIK (non-interruptible) write()
will return the number of bytes written, which may be less than the requested
number if there's not enough free space, or -1 in case of an error like
an invalid fd being passed.

> If you sometimes return -1 for an error, even though ret != amount is
> the *real* test, I'm going to guess there will be lots of chance for
> code to do:
> if (pg_write_no_intr(...) < 0)
> ...
>
> which will only catch some of the errors, and happily continue with the rest...

Yeah, but that's equally wrong for plain write(), so I'm not sure I share
your concern there. Also, I'm not sure how to improve that. We could always
return -1 in case of an error, and "amount" in case of success, but that makes
it impossible to determine how many bytes where actually written (and also feel
awkward). Or we could return 0 instead of -1 if there was an error and zero
bytes where written. But that feels awkward also...

One additional possibility would be to make the signature

boolean pg_write_nointr(int fd, const void *bytes, int len, int *written)

and simply return true on success and false on error. Callers who're interested
in the number of bytes actually written (in the case of an error) would need to
pass some non-NULL pointer for <written>, while all others would just pass NULL.

Thoughts?

best regards,
Florian Pflug

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-09-13 14:17:20 Re: SSL key with passphrase
Previous Message Robert Haas 2011-09-13 14:02:51 Re: augmenting MultiXacts to improve foreign keys