Re: Timeout parameters

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: nagaura(dot)ryohei(at)jp(dot)fujitsu(dot)com
Cc: coelho(at)cri(dot)ensmp(dot)fr, k(dot)jamison(at)jp(dot)fujitsu(dot)com, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, robertmhaas(at)gmail(dot)com, MikalaiKeida(at)ibagroup(dot)eu, AYahorau(at)ibagroup(dot)eu, michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Timeout parameters
Date: 2019-03-28 04:43:11
Message-ID: 20190328.134311.91428951.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello.

At Thu, 28 Mar 2019 01:11:23 +0000, "Nagaura, Ryohei" <nagaura(dot)ryohei(at)jp(dot)fujitsu(dot)com> wrote in <EDA4195584F5064680D8130B1CA91C4540EEAD(at)G01JPEXMBYT04>
> Hello, Fabien-san.
>
> > From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
> > About the backend v11 patch.
> > No space or newline before ";". Same comment about the libpq_ timeout.
> Fixed.
>
> > There is an error in the code, I think it should be < 0 to detect errors.
> Yes, thanks!
>
> > If the parameter has no effect on Windows, I do not see why its value should be
> > constrained to zero, it should just have no effect. Idem libpq_ timeout.
> I had a misunderstanding.
> Indeed, it doesn't need to be zero. Removed.
>
> > Documentation:
> > ISTM this is not about libpq connections but about TCP connections. There can be
> > non libpq implementations client side.
> Then, where do you think the correct place?
> I thought that this parameter should be explained here because the communication
> will be made through the library libpq same as keepalive features.

In TCP_backend patch:

+ <para>
+ Define a wrapper for <literal>TCP_USER_TIMEOUT</literal>
+ socket option of libpq connection.
+ </para>
+ <para>
+ Specifies the number of milliseconds after which a TCP connection can be
+ aborted by the operation system due to network problems when sending or
+ receiving the data through this connection. A value of zero uses the
+ system default. In sessions connected via a Unix-domain socket, this
+ parameter is ignored and always reads as zero. This parameter is
+ supported only on systems that support
+ <literal>TCP_USER_TIMEOUT</literal>; on other systems, it has no effect.
+ </para>

I think this is not mentioning backend. Why don't you copy'n
paste then modify the description of tcp_keepalives_idle? Perhaps
it needs a similar caveat related to Windows.

+static const char*
+show_tcp_user_timeout(void)
+{
+ /* See comments in assign_tcp_keepalives_idle */
+ static char nbuf[16];
+
+ snprintf(nbuf, sizeof(nbuf), "%d", tcp_user_timeout);
+ return nbuf;
+}

The reason for, say, tcp_keepalive_idle uses the hook is that it
doesn't show the GUC variable, but the actual value read by
getsockopt. This is just showing the GUC value. I think this
should behave the same way with tcp_keepalive* options. If not,
we should have an explanation of the reason there.

In TCP_interface patch:

+ <para>
+ Define a wrapper for <literal>TCP_USER_TIMEOUT</literal>
+ socket option of libpq connection.

What does the "wrapper" mean?

+ </para>
+ <para>
+ Specifies the number of milliseconds after which a TCP connection can be
+ aborted by the operation system due to network problems when sending or
+ receiving the data through this connection. A value of zero uses the
+ system default. In sessions connected via a Unix-domain socket, this
+ parameter is ignored and always reads as zero. This parameter is
+ supported only on systems that support
+ <literal>TCP_USER_TIMEOUT</literal>; on other systems, it has no effect.
+ </para>

I would suggest the same thing as the backend
part. Copy'n-paste-modify keepalives_idle would be better.

+ if (setsockopt(conn->sock, IPPROTO_TCP, TCP_USER_TIMEOUT,
+ (char *) &timeout, sizeof(timeout)) < 0 && errno != ENOPROTOOPT)
+ {
+ char sebuf[256];
+
+ appendPQExpBuffer(&conn->errorMessage,
+ libpq_gettext("setsockopt(TCP_USER_TIMEOUT) failed: %s\n"),

I suppose that the reason ENOPROTOOPT is excluded from error
condition is that the system call may fail with that errno on
older kernels, but I don't think that that justifies ignoring the
failure.

+ if (!IS_AF_UNIX(addr_cur->ai_family))
+ {
+ if (!setTCPUserTimeout(conn))
+ {
+ closesocket(conn->sock);
+ conn->sock = -1;
+ conn->addr_cur = addr_cur->ai_next;
+ goto keep_going;
+ }
+ }

I don't see a point in the added part having own "if
(!IS_AF_UNIX" block separately from tcp_keepalive options.

+ /* TCP USER TIMEOUT */
+ {"tcp_user_timeout", NULL, NULL, NULL,
+ "TCP_user_timeout", "", 10, /* strlen(INT32_MAX) == 10 */
+ offsetof(struct pg_conn, pgtcp_user_timeout)},
+

Similary, why isn't this placed just after tcp_keepalives
options?

+ char *pgtcp_user_timeout; /* tcp user timeout (numeric string) */

+ char *tcp_user_timeout; /* tcp user timeout */

The latter doesn't seem to be used.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-03-28 04:53:24 Re: Ordered Partitioned Table Scans
Previous Message Thomas Munro 2019-03-28 04:34:52 Re: pgsql: Compute XID horizon for page level index vacuum on primary.