RE: [Proposal] Add foreign-server health checks infrastructure

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Peter Smith' <smithpb2250(at)gmail(dot)com>
Cc: Katsuragi Yuta <katsuragiy(at)oss(dot)nttdata(dot)com>, Ted Yu <yuzhihong(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, vignesh C <vignesh21(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com" <Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: RE: [Proposal] Add foreign-server health checks infrastructure
Date: 2023-02-20 06:43:29
Message-ID: TYAPR01MB58667F1E17C0B79BDFFB48F9F5A49@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Peter,

Thank you for reviewing! Latest version can be seen in [1].

> 1.
> PQcanConnCheck seemed like a strange API name. Maybe it can have the
> same prefix as the other?
>
> e.g.
>
> - PQconnCheck()
> - PGconnCheckSupported()
>
> or
>
> - PQconnCheck()
> - PGconnCheckable()
>

I choose PQconnCheckable().

> 2.
> PqconnCheck() function allows to check the status of socket by polling
> the socket. This function is currently available only on systems that
> support the non-standard POLLRDHUP extension to the poll system call,
> including Linux.
>
> PqcanConnCheck() checks whether above function is available or not.
>
> ~
>
> 2a.
> "status of socket" --> "status of the connection"
>
> ~
>
> 2b.
> "above function" --> "the above function"

Fixed.

> doc/src/sgml/libpq.sgml
>
> 3. PQconnCheck
>
> Returns the health of the socket.
>
> int PQconnCheck(PGconn *conn);
>
> Unlike PQstatus, this function checks socket health. This check is
> performed by polling the socket. This function is currently available
> only on systems that support the non-standard POLLRDHUP extension to
> the poll system call, including Linux. PQconnCheck returns greater
> than zero if the remote peer seems to be closed, returns 0 if the
> socket is valid, and returns -1 if the connection has been already
> invalid or an error is error occurred.
>
> ~
>
> 3a.
> Should these descriptions be referring to the health of the
> *connection* rather than the health of the socket?

Reworded.

> 3b.
> "has been already invalid" ?? wording

I checked codes and found that the socket becomes PGINVALID_SOCKET
after being closed. So I clarified that.

> 4. PQcanConnCheck
>
> Returns whether PQconnCheck is available on this platform.
> PQcanConnCheck returns 1 if the function is supported, otherwise
> returns 0.
>
> ~
>
> I thought this should be worded using "true" and "false" same as other
> boolean functions on this page.
>
> SUGGESTION
> Returns true (1) or false (0) to indicate if the PQconnCheck function
> is supported on this platform.

Fixed.

> ======
> src/interfaces/libpq/fe-misc.c
>
> 5.
> -static int pqSocketCheck(PGconn *conn, int forRead, int forWrite,
> - time_t end_time);
> +static int pqSocketIsReadableOrWritableOrValid(PGconn *conn, int forRead,
> + int forWrite, time_t end_time);
>
> I was not 100% sure overloading this API is the right thing to do.
> Doesn't this introduce a subtle side-effect on some of the existing
> callers? e.g. Previously pqWaitTimed would ALWAYS return 0 if
> forRead/forWrite were both false. But now other return values like
> errors will be possible. Is that OK?

I checked pqWaitTimed() and seems not to affect other parts.

As the first place, it is an internal function and will be never called from clients.
There are 2 functions that call that - connectDBComplete() and pqWait(), and both of
them are not set finish_time to zero. In connectDBComplete(), basically finish_time
is set to -1, and set to time(NULL) + connect_timeout if the timeout is enabled.

> 6. pqSocketPoll
>
> /*
> * Check a file descriptor for read and/or write data, possibly waiting.
> * If neither forRead nor forWrite are set, immediately return a timeout
> * condition (without waiting). Return >0 if condition is met, 0
> * if a timeout occurred, -1 if an error or interrupt occurred.
> *
> * Timeout is infinite if end_time is -1. Timeout is immediate (no blocking)
> * if end_time is 0 (or indeed, any time before now).
> *
> * Moreover, when neither forRead nor forWrite is requested and timeout is
> * disabled, try to check the health of socket.
> */
>
>
> The new comment "Moreover..." is contrary to the earlier part of the
> same comment which already said, "If neither forRead nor forWrite are
> set, immediately return a timeout condition (without waiting)."
>
> There might be side-effects to previous/existing callers of this
> function (e.g. pqWaitTimed via pqSocketCheck)

Comments were fixed. About the side-effect, please see previous discussion.

> 7.
> if (!forRead && !forWrite)
> - return 0;
> + {
> + /* Try to check the health if requested */
> + if (end_time == 0)
> +#if defined(POLLRDHUP)
> + input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL;
> +#else
> + return 0;
> +#endif /* defined(POLLRDHUP) */
> + else
> + return 0;
> + }
>
> FYI - I think the new code can be simpler without needing #else by
> calling your other new function.
>
> SUGGESTION
> if (!forRead && !forWrite)
> {
> if (!PQcanConnCheck() || end_time != 0)
> return 0;
>
> /* Check the connection health when end_time is 0 */
> Assert(PQcanConnCheck() && end_time == 0);
> #if defined(POLLRDHUP)
> input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL;
> #endif
> }

Fixed.

> 8. PQconnCheck
> +/*
> + * Check whether PQconnCheck() can work well on this platform.
> + *
> + * Returns 1 if this can use PQconnCheck(), otherwise 0.
> + */
> +int
> +PQcanConnCheck(void)
> +{
> +#if (defined(HAVE_POLL) && defined(POLLRDHUP))
> + return true;
> +#else
> + return false;
> +#endif
> +}
>
> ~
>
> 8a.
> "can work well" --> "works"
>
> ~
>
> 8b.
> Maybe better to say "true (1)" and "otherwise false (0)"

Fixed.

[1]: https://www.postgresql.org/message-id/TYAPR01MB58669C95604A0EB7BCC1B58CF5A49%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-02-20 06:44:15 Add support for unit "B" to pg_size_pretty()
Previous Message Hayato Kuroda (Fujitsu) 2023-02-20 06:42:32 RE: [Proposal] Add foreign-server health checks infrastructure