Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [doc fix] PG10: wroing description on connect_timeout when multiple hosts are specified
Date: 2017-05-15 17:32:15
Message-ID: CA+Tgmob1CHff46feC5LeOAVHDON=QzBoq-apmCm7KwG6urDGMg@mail.gmail.com
Lists: pgsql-hackers

On Sun, May 14, 2017 at 11:45 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> I'll add this item in the PostgreSQL 10 Open Items.
>
> [Action required within three days. This is a generic notification.]

I think there is a good argument that the existing behavior is as per
the documentation, but I think we may want to change it anyway. What
the documentation is saying - or at least what I believe I intended
for it to say - is that connect_timeout is restarted for each new
host, so you could end up waiting longer than connect_timeout - but
not forever - if you specify multiple hosts. And I believe that
statement to be correct. Takayuki Tsunakawa is saying something
different. He's saying that when connect_timeout expires, we should
try the next host instead of giving up. That may or may not be a good
idea, but it doesn't contradict the passage from the documentation
which he quoted. That passage from the documentation doesn't say
anything at all about what happens when connect_timeout expires. It
only talks about how much time might pass before that happens.
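
As a rough illustration of that reading of the documentation (a sketch, not libpq code): if the timer restarts for each host, the worst-case wait is bounded by the number of hosts times connect_timeout. The function name and host names below are hypothetical, and the sketch assumes each host name resolves to a single address.

```python
def worst_case_wait(hosts, connect_timeout):
    """Upper bound in seconds on the total wait, assuming the
    connect_timeout timer is restarted for each host in the list."""
    if connect_timeout <= 0:
        # Assumption: zero (or unset) means wait indefinitely, so no bound.
        return None
    return len(hosts) * connect_timeout


# Hypothetical hosts, as in "host=db1.example.com,db2.example.com,db3.example.com connect_timeout=5"
hosts = ["db1.example.com", "db2.example.com", "db3.example.com"]
print(worst_case_wait(hosts, 5))
```

The bound says nothing about what happens once the timer expires for a given host, which is the separate question discussed in this thread.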

Takayuki Tsunakawa raised a very similar issue in another thread
related to another open item, namely
https://www.postgresql.org/message-id/flat/0A3221C70F24FB45833433255569204D1F6F5659%40G01JPEXMBYT05
in which he argued that libpq ought to try the next host after a
connection failure regardless of the reason for the connection
failure. Tom, Michael Paquier, and I all disagreed; none of us
believe that this feature was intended to retry the connection to a
different host after an arbitrary error reported by the remote server.
This thread is essentially the same issue, except here the question
isn't what should happen after we connect to a server and it returns
an error, but rather what happens when we time out waiting to connect
to a server. When that happens, should we give up, or try the next
server?

Despite the chorus of support for the opposite conclusion on the other
thread, I'm inclined to think that it would be best to change the
behavior here as per the proposed patch. The point of being able to
specify multiple hosts is to be able to have multiple database servers
(or perhaps, multiple ways to access the same database server) and use
whichever one of those servers is currently up. I think that when the
server fails with a complaint like "I've never heard of the database
to which you want to connect" that's not a case of the server being
down, but some other kind of trouble that the administrator really
ought to fix; thus it's best to stop and report the error. But if
connect_timeout expires, that sounds a whole lot like the server being
down. It sounds morally equivalent to socket() or connect() failing
outright, which *would* trigger advancing to the next host.
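
The distinction drawn above can be sketched as a small simulation. This is not libpq's implementation; the Outcome names, the pick_host function, and the advance_on_timeout flag are all hypothetical, chosen only to contrast the existing behavior (give up on timeout) with the proposed one (try the next host on timeout).

```python
from enum import Enum


class Outcome(Enum):
    CONNECTED = "connected"
    CONNECT_FAILED = "connect_failed"  # socket() or connect() failed outright
    TIMED_OUT = "timed_out"            # connect_timeout expired
    SERVER_ERROR = "server_error"      # server answered, but with an error


def pick_host(attempts, advance_on_timeout):
    """Walk (host, Outcome) pairs in order and return the pair at which
    the connection attempt loop stops.

    advance_on_timeout=False models giving up when the timeout expires;
    advance_on_timeout=True models moving on to the next host instead.
    """
    last = (None, None)
    for host, outcome in attempts:
        last = (host, outcome)
        if outcome is Outcome.CONNECTED:
            return last  # success: use this host
        if outcome is Outcome.SERVER_ERROR:
            return last  # the server is up; stop and report its error
        if outcome is Outcome.TIMED_OUT and not advance_on_timeout:
            return last  # give up on timeout
        # CONNECT_FAILED, or TIMED_OUT with advance_on_timeout: try next host
    return last  # every host was tried without success


attempts = [("db1", Outcome.TIMED_OUT), ("db2", Outcome.CONNECTED)]
print(pick_host(attempts, advance_on_timeout=False))
print(pick_host(attempts, advance_on_timeout=True))
```

In this model a server-reported error stops the loop under either setting, while a timeout is treated like an outright connect() failure only when advance_on_timeout is set.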

So I'm inclined to accept the patch, but as a definitional change
rather than a bug fix. However, I'd like to hear some other opinions.
I'll wait until Friday for such opinions to arrive, and then update on
next steps.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
