Re: libpq connection timeout mismanagement

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: libpq connection timeout mismanagement
Date: 2018-08-11 14:42:16
Message-ID: alpine.DEB.2.21.1808111611040.30788@lancre
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Hello Tom,

> The patch that taught libpq about allowing multiple target hosts
> modified connectDBComplete() with the intent of making the
> connect_timeout (if specified) apply per-host, not to the complete
> connection attempt. It did not do a very good job though, because
> the timeout only gets reset when connectDBComplete() itself detects
> a timeout. If PQconnectPoll advances to a new host due to some
> other cause, the previous host's timeout continues to run, possibly
> causing a premature timeout failure for the new one.
>
> Another thing that I find pretty strange is that it is coded so that,
> in event of a timeout detection by connectDBComplete, we give up on the
> current connhost entry and advance to the next host, ignoring any
> additional addresses we might have for the current hostname. This seems
> at best poorly thought through. There's no good reason for libpq to
> assume that all the addresses returned by DNS point at the same machine,
> or share the same network failure points in between.
>
> Attached is an attempt to improve this. I'll park it in the Sept fest.

Patch does not "git apply", but is ok with "patch -p1". Compiles.
Global "make check" is okay.

Feature works for me, i.e. each target host seems to get at least its
timeout.

Code looks straightforward enough.

In passing:

The code suggests that timeout is always 2 or more

if (timeout < 2) timeout = 2;

but doc says "It is not recommended to use a timeout of less than 2
seconds", which is inconsistent. It should read "Timeout is always set to
at least 2 whatever the user asks.".

connect_timeout=2.9 is accepted and considered as meaning 2.
connect_timeout=-10 or connect_timeout=two are also accepted and mean
forever. Probably thanks to "atoi".

--
Fabien.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Seltenreich 2018-08-11 15:45:43 [sqlsmith] ERROR: plan should not reference subplan's variable
Previous Message Tomas Vondra 2018-08-11 14:18:20 Re: logical decoding / rewrite map vs. maxAllocatedDescs