Re: [Proposal] Add foreign-server health checks infrastructure

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com" <Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com>, "zyu(at)yugabyte(dot)com" <zyu(at)yugabyte(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: Re: [Proposal] Add foreign-server health checks infrastructure
Date: 2022-10-03 14:19:10
Message-ID: CACawEhXAy4gjJ4G3-59Xv_yb8Xbtn1VMYP8SOj-0C9dG2zbAcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Hayato Kuroda,

> If the checking function is called not periodically but GetConnection(),
> it means that the health of foreign servers will be check only when remote
> connections are used.
> So following workload described in [1] cannot handle the issue.
>
> BEGIN --- remote operations--- local operations --- COMMIT
>
>
As far as I can see this patch is mostly useful for detecting the failures
on the initial remote command. This is especially common when the remote
server does a failover/switchover and postgres_fdw uses a cached connection
to access to the remote server.

The remote server failures within a transaction block sounds like a much
less common problem. Still, you can give a good error message before COMMIT
is sent to the remote server.

I agree that this doesn't solve the issue you described, but I'm not sure
if it is worthwhile to fix such a problem.

> > Can we have this function/logic on Postgres core, so that other
> extensions
> > can also use?
>
> I was not sure about any use-case, but I think it can because it does
> quite general things.
> Is there any good motivation to do that?
>

I think any extension that deals with multiple Postgres nodes can benefit
from such logic. In fact, the reason I realized this patch is that on the
Citus extension, we are trying to solve a similar problem [1], [2].

Thinking even more, I think any extension that uses libpq and WaitEventSets
can benefit from such a function.

> > Do you see any performance implications of creating/freeing waitEventSets
> > per check? I wonder if we can somehow re-use the same waitEventSet by
> > modifyWaitEvent? I guess no, but still, if this check causes a
> performance
> > implication, can we somehow cache 1 waitEventSet per connection?
>
> I have not tested yet, but I agreed this will be caused performance
> decrease.
> In next version first I will re-use the event set anyway, and it must be
> considered later.
> Actually I'm not sure your suggestion,
> but you mean to say that we can add a hash table that associates PGconn
> and WaitEventSet, right?
>
>
I think it also depends on where you decide to put
pgfdw_connection_check_internal(). If you prefer the postgres_fdw side,
could we maybe use ConnCacheEntry in contrib/postgres_fdw/connection.c?

But if you decide to put it into the Postgres side, the API
for pgfdw_connection_check_internal() -- or equivalent function -- could be
discussed. Do we pass a WaitEventSet and if it is NULL create a new one,
else use what is passed to the function? Not sure, maybe you can come up
with a better API.

Thanks,
Onder KALACI

[1]: Check WL_SOCKET_CLOSED before using any connection by onderkalaci ·
Pull Request #6259 · citusdata/citus (github.com)
<https://github.com/citusdata/citus/pull/6259>
[2]: Add a single connection retry in MULTI_CONNECTION_LOST state by
marcocitus · Pull Request #6283 · citusdata/citus (github.com)
<https://github.com/citusdata/citus/pull/6283>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2022-10-03 15:11:07 Miscellaneous tab completion issue fixes
Previous Message Jonathan S. Katz 2022-10-03 14:07:33 PostgreSQL 15 RC2 + GA release dates