Re: Timeout parameters

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: "Nagaura, Ryohei" <nagaura(dot)ryohei(at)jp(dot)fujitsu(dot)com>, "Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "AYahorau(at)ibagroup(dot)eu" <AYahorau(at)ibagroup(dot)eu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, "MikalaiKeida(at)ibagroup(dot)eu" <MikalaiKeida(at)ibagroup(dot)eu>
Subject: Re: Timeout parameters
Date: 2019-03-13 17:57:38
Message-ID: CA+TgmobSF23f72fhSGq0ek=xGhjic_o6EZHxx_75HVCMqK19SA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 13, 2019 at 1:25 PM Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
> Hmmm... ISTM that we are not talking about the same patch...

You are correct! I was talking about the patches that allow user
control of TCP_USER_TIMEOUT, which is apparently totally different
from the socket_timeout patch that you're talking about.

The first thing I notice about the socket_timeout patch is that the
documentation is definitely wrong:

+ Number of seconds to wait for socket read/write operation before
+ closing the connection, as a decimal integer. This can be
used both as a
+ force global query timeout and network problems detector. A value of

This can't be used to force a global query timeout, because any kind
of protocol message - e.g. a NOTICE or WARNING - would reset the
timeout, allowing the continue past the supposedly-global query
timeout. There may be some other ways that can happen, too, either
currently or in the future.

> My point is about the "socket_timeout" patch which timeout on not
> receiving an answer, but is not related to the network connection.

The above-quoted documentation disagrees with you about that, claiming
that this can be used as a network problem detector. However, I think
that the documentation is wrong about that, too. The mere fact that
the connection is idle does not show that there is a network problem;
there could, for example, be TCP keepalives being sent and received
behind the scenes, and if so, this patch would cause a connection that
is known to be valid to be disconnected for no reason.

Actually, I think that socket_timeout_v8.patch is a bad idea and
should be rejected, not because it doesn't send a cancel to the server
but just because it's not the right way of solving either of the
problems that it purports to solve. If you want a timeout for
queries, you need that to be administered by the server, which knows
when it starts and finishes executing a query; you cannot substitute a
client-side judgement based on the last time we received a byte of
data. Maybe a client-side judgement would work if the timer were
armed when we send a Query or Execute message and disarmed when we
receive ReadyForQuery. And, if you want a network problem detector,
then you should use keepalives and perhaps the TCP_USER_TIMEOUT stuff,
so that you don't kill perfectly good connections that just happen to
be idle.

The only argument I can really see for socket_timeout_v8.patch is that
there might be platforms where setting keepalives and/or
TCP_USER_TIMEOUT doesn't work. However, if that is the issue, then I
think we should look for equivalent facilities that do work on those
platforms and consider supporting those facilities, or else just
decide that if certain features are not supported on obscure platforms
then users of those platforms are going to have to live without those
features. If we add something like this, people are likely to try to
use it and then get frustrated when it doesn't work right.

It's also worth noting that it's generally a bad idea to try to make
one facilities do two different things with conflicting requirements.
It seems not unlikely that if we add this, somebody will come back and
ask us to add keepalives to the protocol itself (rather than relying
on TCP) to keep this from killing the connection in vain -- and then
somebody else will object to that because they're using it to kill off
queries that run too long (and have raised client_min_messages to
ERROR in an attempt to avoid getting any other messages meanwhile).
By munging together two different requirements into a single facility,
we make it impossible to decide which side of that argument is
correct. It will just come down to who yells the loudest.

Note that none of this is an argument against the TCP_USER_TIMEOUT
portion of this patch set.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-03-13 18:03:02 Re: hyrax vs. RelationBuildPartitionDesc
Previous Message Alvaro Herrera 2019-03-13 17:38:02 Re: hyrax vs. RelationBuildPartitionDesc