Re: [Proposal] Add foreign-server health checks infrastructure

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Katsuragi Yuta <katsuragiy(at)oss(dot)nttdata(dot)com>, Ted Yu <yuzhihong(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, vignesh C <vignesh21(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com" <Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: Re: [Proposal] Add foreign-server health checks infrastructure
Date: 2023-02-20 01:23:32
Message-ID: CAHut+Puqtm1tYJNtggNKANDy7foJD2zuJog4Vb6u9YBE3vLgKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here is a code review only for patch v31-0001.

======
General Comment

1.
PQcanConnCheck seemed like a strange API name. Maybe it can have the
same prefix as the other?

e.g.

- PQconnCheck()
- PGconnCheckSupported()

or

- PQconnCheck()
- PGconnCheckable()

======
Commit Message

2.
PqconnCheck() function allows to check the status of socket by polling
the socket. This function is currently available only on systems that
support the non-standard POLLRDHUP extension to the poll system call,
including Linux.

PqcanConnCheck() checks whether above function is available or not.

~

2a.
"status of socket" --> "status of the connection"

~

2b.
"above function" --> "the above function"

======
doc/src/sgml/libpq.sgml

3. PQconnCheck

Returns the health of the socket.

int PQconnCheck(PGconn *conn);

Unlike PQstatus, this function checks socket health. This check is
performed by polling the socket. This function is currently available
only on systems that support the non-standard POLLRDHUP extension to
the poll system call, including Linux. PQconnCheck returns greater
than zero if the remote peer seems to be closed, returns 0 if the
socket is valid, and returns -1 if the connection has been already
invalid or an error is error occurred.

~

3a.
Should these descriptions be referring to the health of the
*connection* rather than the health of the socket?

~

3b.
"has been already invalid" ?? wording

~~~

4. PQcanConnCheck

Returns whether PQconnCheck is available on this platform.
PQcanConnCheck returns 1 if the function is supported, otherwise
returns 0.

~

I thought this should be worded using "true" and "false" same as other
boolean functions on this page.

SUGGESTION
Returns true (1) or false (0) to indicate if the PQconnCheck function
is supported on this platform.

======
src/interfaces/libpq/fe-misc.c

5.
-static int pqSocketCheck(PGconn *conn, int forRead, int forWrite,
- time_t end_time);
+static int pqSocketIsReadableOrWritableOrValid(PGconn *conn, int forRead,
+ int forWrite, time_t end_time);

I was not 100% sure overloading this API is the right thing to do.
Doesn't this introduce a subtle side-effect on some of the existing
callers? e.g. Previously pqWaitTimed would ALWAYS return 0 if
forRead/forWrite were both false. But now other return values like
errors will be possible. Is that OK?

~~~

6. pqSocketPoll

/*
* Check a file descriptor for read and/or write data, possibly waiting.
* If neither forRead nor forWrite are set, immediately return a timeout
* condition (without waiting). Return >0 if condition is met, 0
* if a timeout occurred, -1 if an error or interrupt occurred.
*
* Timeout is infinite if end_time is -1. Timeout is immediate (no blocking)
* if end_time is 0 (or indeed, any time before now).
*
* Moreover, when neither forRead nor forWrite is requested and timeout is
* disabled, try to check the health of socket.
*/

The new comment "Moreover..." is contrary to the earlier part of the
same comment which already said, "If neither forRead nor forWrite are
set, immediately return a timeout condition (without waiting)."

There might be side-effects to previous/existing callers of this
function (e.g. pqWaitTimed via pqSocketCheck)

~~~

7.
if (!forRead && !forWrite)
- return 0;
+ {
+ /* Try to check the health if requested */
+ if (end_time == 0)
+#if defined(POLLRDHUP)
+ input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL;
+#else
+ return 0;
+#endif /* defined(POLLRDHUP) */
+ else
+ return 0;
+ }

FYI - I think the new code can be simpler without needing #else by
calling your other new function.

SUGGESTION
if (!forRead && !forWrite)
{
if (!PQcanConnCheck() || end_time != 0)
return 0;

/* Check the connection health when end_time is 0 */
Assert(PQcanConnCheck() && end_time == 0);
#if defined(POLLRDHUP)
input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL;
#endif
}

~~~

8. PQconnCheck
+/*
+ * Check whether PQconnCheck() can work well on this platform.
+ *
+ * Returns 1 if this can use PQconnCheck(), otherwise 0.
+ */
+int
+PQcanConnCheck(void)
+{
+#if (defined(HAVE_POLL) && defined(POLLRDHUP))
+ return true;
+#else
+ return false;
+#endif
+}

~

8a.
"can work well" --> "works"

~

8b.
Maybe better to say "true (1)" and "otherwise false (0)"

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-02-20 02:27:26 RE: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Andres Freund 2023-02-20 01:18:28 Re: cfbot failures