Re: [Proposal] Add foreign-server health checks infrastructure

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Katsuragi Yuta <katsuragiy(at)oss(dot)nttdata(dot)com>, Ted Yu <yuzhihong(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com" <Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: Re: [Proposal] Add foreign-server health checks infrastructure
Date: 2023-03-08 03:46:38
Message-ID: CALDaNm2H9ZyiPAOJCKHo5dHYv-u6u+8bv_T9PkdrvqVHnxyrmQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 7 Mar 2023 at 09:53, Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Katsuragi-san,
>
> Thank you for reviewing! PSA new version patches.
>
> > I think we can update the status to ready for committer after
> > this fix, if there is no objection.
>
> That's a very good news for me! How about other people?
>
> > >> 7. the document of postgres_fdw_verify_connection_states_all
> > >> <literal>NULL</literal>
> > >> + is returned if the local session does not have connection
> > >> caches,
> > >> or this
> > >> + function is not available on this platform.
> > >>
> > >> I think there is a case where a connection cache exists but valid
> > >> connections do not exist and NULL is returned (disconnection case).
> > >> Almost the same document as the postgres_fdw_verify_connection_states
> > >> case (comment no.5) seems better.
> > >
> > > Yes, but completely same statement cannot be used because these is not
> > > specified foreign server. How about:
> > > NULL is returned if there are no established connections or this
> > > function ...
> >
> > Yes, to align with the postgres_fdw_verify_connection_states()
> > case, how about writing the connection is not valid. Like the
> > following?
> > 'NULL is returned if no valid connections are established or
> > this function...'
>
> Prefer yours, fixed.
>
> > This is my comment for v35. Please check.
> > 0002:
> > 1. the comment of verify_cached_connections (I missed one minor point.)
> > + * This function emits warnings if a disconnection is found. This
> > returns true
> > + * if disconnections cannot be found, otherwise returns false.
> >
> > I think false is returned only if disconnections are found and
> > true is returned in all other cases. So, modifying the description
> > like the following seems better.
> > 'This returns false if disconnections are found, otherwise
> > returns true.'
>
> Fixed.

Few comments:
1) There is no handling of forConnCheck in #else HAVE_POLL, if this is
intentional we could add some comments for the same:
static int
-pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
+pqSocketPoll(int sock, int forRead,
+ int forWrite, int forConnCheck, time_t end_time)
{
/* We use poll(2) if available, otherwise select(2) */
#ifdef HAVE_POLL
@@ -1092,7 +1133,11 @@ pqSocketPoll(int sock, int forRead, int
forWrite, time_t end_time)
int timeout_ms;

if (!forRead && !forWrite)
- return 0;

2) Can this condition be added to the parent if condition:
if (!forRead && !forWrite)
- return 0;
+ {
+ /* Connection check can be available on some limted platforms */
+ if (!(forConnCheck && PQconnCheckable()))
+ return 0;
+ }

3) Can we add a comment for PQconnCheckable:
+/* Check whether the postgres server is still alive or not */
+extern int PQconnCheck(PGconn *conn);
+extern int PQconnCheckable(void);
+

4) "Note that if 0 is returned and forConnCheck is requested" doesn't
sound right, it can be changed to "Note that if 0 is returned when
forConnCheck is requested"
+ * If neither forRead, forWrite nor forConnCheck are set, immediately return a
+ * timeout condition (without waiting). Return >0 if condition is met, 0 if a
+ * timeout occurred, -1 if an error or interrupt occurred. Note that if 0 is
+ * returned and forConnCheck is requested, it means that the socket has not
+ * matched POLLRDHUP event and the connection is not closed by the socket peer.

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2023-03-08 03:49:42 Re: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Anton A. Melnikov 2023-03-08 03:43:47 Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"