Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [EXTERNAL] Re: PQcancel does not use tcp_user_timeout, connect_timeout and keepalive settings
Date: 2022-01-05 18:42:25
Message-ID: 20220105184225.5xfekapzg5o3uq7a@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-12-28 15:49:00 +0000, Jelte Fennema wrote:
> The first patch is a cleaned up version of my previous patch. I think I addressed
> all feedback on the previous version in that patch (e.g. removed windows code,
> fixed formatting).

To me it seems a bit problematic to introduce a divergence between windows /
everything else here. Isn't that just going to lead to other complaints just
like this thread, where somebody discovered the hard way that there's platform
dependent behaviour here?

> The second patch is a new one, it implements honouring of the connect_timeout
> connection option in PQcancel. This patch requires the first patch to also be applied,
> but since it seemed fairly separate and the code is not trivial I didn't want the first
> patch to be blocked on this.
>
> Finally, I would love it if once these fixes are merged the would also be backpatched to
> previous versions of libpq. Does that seem possible? As far as I can tell it would be fine,
> since it doesn't really change any of the public APIs. The only change is that the pg_cancel
> struct now has a few additional fields. But since that struct is defined in libpq-int.h, so that
> struct should not be used by users of libpq directly, right?.

I'm not really convinced this is a good patch to backpatch. There does seem to
be some potential for subtle breakage - code in signal handlers is notoriously
finnicky, it's a rarely exercised code path, etc. It's also not fixing
something that previously worked.

> + * NOTE: These socket options are currently not set for Windows. The
> + * reason is that signal safety in this function is very important, and it
> + * was not clear to if the functions required to set the socket options on
> + * Windows were signal-safe.
> + */
> +#ifndef WIN32
> + if (!IS_AF_UNIX(cancel->raddr.addr.ss_family))
> + {
> +#ifdef TCP_USER_TIMEOUT
> + if (cancel->pgtcp_user_timeout >= 0)
> + {
> + if (setsockopt(tmpsock, IPPROTO_TCP, TCP_USER_TIMEOUT,
> + (char *) &cancel->pgtcp_user_timeout,
> + sizeof(cancel->pgtcp_user_timeout)) < 0)
> + {
> + strlcpy(errbuf, "PQcancel() -- setsockopt(TCP_USER_TIMEOUT) failed: ", errbufsize);
> + goto cancel_errReturn;
> + }
> + }
> +#endif
> +
> + if (cancel->keepalives != 0)
> + {
> + int on = 1;
> +
> + if (setsockopt(tmpsock,
> + SOL_SOCKET, SO_KEEPALIVE,
> + (char *) &on, sizeof(on)) < 0)
> + {
> + strlcpy(errbuf, "PQcancel() -- setsockopt(SO_KEEPALIVE) failed: ", errbufsize);
> + goto cancel_errReturn;
> + }
> + }

This is very repetitive - how about introducing a helper function for this?

> @@ -4467,8 +4601,8 @@ retry3:
>
> crp.packetlen = pg_hton32((uint32) sizeof(crp));
> crp.cp.cancelRequestCode = (MsgType) pg_hton32(CANCEL_REQUEST_CODE);
> - crp.cp.backendPID = pg_hton32(be_pid);
> - crp.cp.cancelAuthCode = pg_hton32(be_key);
> + crp.cp.backendPID = pg_hton32(cancel->be_pid);
> + crp.cp.cancelAuthCode = pg_hton32(cancel->be_key);

Others might differ, but I'd separate changing the type passed to
internal_cancel() into its own commit.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Westermann (DWE) 2022-01-05 18:48:57 Are we missing a dot in initdb's output?
Previous Message Tom Lane 2022-01-05 18:38:42 Re: [PATCH] Run UTF8-dependent tests for citext [Re: daitch_mokotoff module]