Expansion of our checks for connection-loss errors

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Subject: Expansion of our checks for connection-loss errors
Date: 2020-10-08 19:15:54
Message-ID: 2621622.1602184554@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Over in the thread at [1], we've tentatively determined that the
reason buildfarm member lorikeet is currently failing is that its
network stack returns ECONNABORTED for (some?) connection failures,
whereas our code is only expecting ECONNRESET. Fujii Masao therefore
proposes that we treat ECONNABORTED the same as ECONNRESET. I think
this is a good idea, but after a bit of research I feel it does not
go far enough. I find these POSIX-standard errnos that also seem
likely candidates to be returned for a hard loss of connection:

ECONNABORTED
EHOSTUNREACH
ENETDOWN
ENETUNREACH

All of these have been in POSIX since SUSv2, so it seems unlikely
that we need to #ifdef any of them. (It is in any case pretty silly
that we have #ifdefs around a very small minority of our references
to ECONNRESET :-(.)

There are some other related errnos, such as ECONNREFUSED, that
don't seem like they'd be returned for a failure of a pre-existing
connection, so we don't need to include them in such tests.

Accordingly, I propose the attached patch (an expansion of
Fujii-san's) that causes us to test for all five errnos anyplace
we had been checking for ECONNRESET. I felt that this was getting to
the point where we'd better centralize the knowledge of what to check,
so the patch does that, via an inline function and an admittedly hacky
macro. I also upgraded some places such as strerror.c to have full
support for these symbols.

All of the machines I have (even as far back as HPUX 10.20) also
define ENETRESET and EHOSTDOWN. However, those symbols do not appear
in SUSv2. ENETRESET was added at some later point, but EHOSTDOWN is
still not in POSIX. For the moment I've left these second-tier
symbols out of the patch, but there's a case for adding them. I'm
not sure whether there'd be any point in trying to #ifdef them.

BTW, I took out the conditional defines of some of these errnos in
libpq's win32.h; AFAICS that's been dead code ever since we added
#define's for them to win32_port.h. Am I missing something?

This seems like a bug fix to me, so I'm inclined to back-patch.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/E1kPc9v-0005L4-2l%40gemulon.postgresql.org

Attachment Content-Type Size
recognize-more-connection-loss-errnos-1.patch text/x-diff 7.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-10-08 19:43:32 Re: Wrong example in the bloom documentation
Previous Message Bruce Momjian 2020-10-08 18:32:00 Re: Wrong example in the bloom documentation