RE: [Proposal] Add foreign-server health checks infrastructure

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Katsuragi Yuta' <katsuragiy(at)oss(dot)nttdata(dot)com>
Cc: 'Ted Yu' <yuzhihong(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, vignesh C <vignesh21(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com" <Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: RE: [Proposal] Add foreign-server health checks infrastructure
Date: 2023-03-03 12:24:52
Message-ID: TYAPR01MB586674A3C56AECB9B150BC84F5B39@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Katsuragi-san,

Thank you for reviewing! PSA new version.

> >> I rethought the pqSocketPoll part. Current interpretation of
> >> arguments seems a little bit confusing because a specific pattern
> >> of arguments has a different meaning. What do you think about
> >> introducing a new argument like `int forConnCheck`? This seems
> >> straightforward and readable.
> >
> > I think it may be better, so fixed.
> > But now we must consider another thing - will we support combination
> > of conncheck
> > and {read|write} or timeout? Especially about timeout, if someone
> > calls pqSocketPoll()
> > with forConnCheck = 1 and end_time = -1, the process may be stuck
> > because it waits
> > till socket peer closed connection.
> > Currently the forConnCheck can be specified with other request, but
> > timeout must be zero.
>
> Yes, we need to consider these.
> I'm wondering whether we need a special care for the combination
> of event and timeout. Surely, if forConnCheck is set and end_time = -1,
> pqSocketPoll blocks until the connection close. However, the
> behavior matches the meaning of the arguments and does not seem
> confusing (also not an error state). Do we need to restrict this
> kind of usage in the pqSocketPoll side? I think like the following
> might be fine.
>
> ```
> if (!forRead && !forWrite)
> {
> if (!(forConnCheck && PQconnCheckable()))
> return 0;
> }
> ```

Seems right, I was too pessimistic. Fixed.

> > While making the patch, I come up with idea that
> > postgres_fdw_verify_connection_states*
> > returns NULL if PQconnCheck() cannot work on this platform. This can be
> > done by
> > adding PQconnCheckable(). It may be reasonable because the checking is
> > not really
> > done on this environment. How do you think?
>
> I agree with you.

Changed.

> Followings are comments for v33. Please check.
> 0001:
> 1. the comment of PQconnCheck
> +/*
> + * Check whether the socket peer closed connection or not.
> + *
>
> Check whether the socket peer closed 'the' connection or not?

Changed.

> 2. the comment of pqSocketCheck
> - * or both. Returns >0 if one or more conditions are met, 0 if it
> timed
> - * out, -1 if an error occurred.
> + * or both. Moreover, this function can check the health of socket on
> some
> + * limited platforms if end_time is 0.
>
> the health of socket -> the health of the connection?
> if end_time is 0 -> if forConnCehck is specified?

Fixed.

> 3. the comment of pqSocketPoll
> - * If neither forRead nor forWrite are set, immediately return a
> timeout
> - * condition (without waiting). Return >0 if condition is met, 0
> - * if a timeout occurred, -1 if an error or interrupt occurred.
> + * Moreover, this function can check the health of socket on some
> limited
> + * platforms if end_time is 0.
>
> the health of socket -> the health of the connection?
> if end_time is 0 -> if forConnCehck is specified?

Fixed.

> 4. the code of pqSocketPoll
> +#if defined(POLLRDHUP)
> + if (forConnCheck)
> + input_fd.events |= POLLRDHUP;
> +#endif
>
> I think it is better to use PQconnCheckable() to remove the macro.

IIUC the macro is needed. In FreeBSD, macOS and other platforms do not have the
macro POLLRDHUP so they cannot compile. I checked by my CI and got following error.

```
...
FAILED: src/interfaces/libpq/libpq.a.p/fe-misc.c.o
...
../src/interfaces/libpq/fe-misc.c:1149:22: error: use of undeclared identifier 'POLLRDHUP'
input_fd.events |= POLLRDHUP;
````

It must be invisible from them.

> 5. the code of pqSocketCheck and the definition of pqSocketPoll
> - result = pqSocketPoll(conn->sock, forRead, forWrite,
> end_time);
> + result = pqSocketPoll(conn->sock, forRead, forWrite,
> forConnCheck,
> end_time);
>
> -pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
> +pqSocketPoll(int sock, int forRead, int forWrite, int forConnCheck,
> time_t end_time)
>
> Should these be divided into two lines?

pgindent did not say anything about them, but I divided.

> 6. the comment of verify_cached_connections
> + * This function emits warnings if a disconnection is found, and this
> returns
> + * false only when the lastly verified server seems to be disconnected.

Updated. I think comments were not correct, so these were also fixed.

> It seems better to write the case where this function returns
> true.
> 7. the comment of postgres_fdw_verify_connection_states
> + * This function emits a warning if a disconnection is found. This
> returns
> + * false only when the verified server seems to be disconnected, and
> reutrns
> + * NULL if the connection check had not been done.
>
> It seems better to write the case where this function returns
> true.

> 8. the code of
> + (errcode(ERRCODE_CONNECTION_FAILURE),
> + errmsg("%s", str.data),
> + errdetail("Socket close is detected."),
> + errhint("Plsease check the health of
> server.")));
>
> Is it better to use "Connection close is detected" rather than
> "Socket close is detected"?

Changed.

> 9. the document of postgres_fdw
> The document of postgres_fdw_verify_connection_states_* is a little
> bit old. Could you update it?

Updated. IIUC postgres_fdw_verify_connection_states returns

* true, if the connection is verified.
* false, if the connection seems to be disconnected.
* NULL, if this is not the supported platform or connection has not been established.

And postgres_fdw_verify_connection_states_all returns

* true if all the connections are verified.
* false, if one of connections seems to be disconnected.
* NULL, if this is not the supported platform or this backend has never established connections

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v34-0001-Add-PQconnCheck-and-PQconnCheckable-to-libpq.patch application/octet-stream 8.7 KB
v34-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch application/octet-stream 12.3 KB
v34-0003-add-test.patch application/octet-stream 5.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jacktby@gmail.com 2023-03-03 12:37:48 What's MultiXactId?
Previous Message Dean Rasheed 2023-03-03 12:22:38 Re: Add support for unit "B" to pg_size_pretty()