Re: [Proposal] Add foreign-server health checks infrastructure

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com" <Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com>, "zyu(at)yugabyte(dot)com" <zyu(at)yugabyte(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: Re: [Proposal] Add foreign-server health checks infrastructure
Date: 2022-10-19 15:18:14
Message-ID: CACawEhUzpqYJ8mQmSjYgX0ePtPpvb2u9Onjf6pCjUGkoZ=-xSg@mail.gmail.com
Lists: pgsql-hackers

Hi,

> > As far as I can think of, it should probably be a single background task
> > checking whether the server is down. If so, sending an invalidation message
> > to all the backends such that related backends could act on the
> > invalidation and throw an error. This is to cover the use-case you
> > described on [1].
>
> Indeed your approach covers the use case I said, but I'm not sure whether
> it is really good.
> In your approach, once the background worker process will manage all
> foreign servers.
> It may be OK if there are a few servers, but if there are hundreds of servers,
> the time interval during checks will be longer.
>

I expect users will typically have many more backends than foreign servers.
We could add a threshold for spinning up a new bg worker (e.g., every 10
servers get their own bg worker). Still, I think that'd be an optimization
that is probably not necessary for the majority of users?
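
As a rough illustration of such a threshold, the sketch below computes how
many workers a given number of servers would need and which worker would own
a given server. It is standalone C, not PostgreSQL code; the function names
and the contiguous assignment of servers to workers are assumptions made for
this example and do not come from any patch on this thread.

```c
#include <assert.h>

/*
 * Hypothetical helper: number of health-check workers needed when each
 * worker handles at most servers_per_worker servers (ceiling division).
 */
int
health_check_worker_count(int nservers, int servers_per_worker)
{
    assert(nservers >= 0 && servers_per_worker > 0);
    return (nservers + servers_per_worker - 1) / servers_per_worker;
}

/*
 * Hypothetical helper: 0-based index of the worker that owns the server
 * with 0-based index server_idx, assuming servers are assigned to workers
 * in contiguous groups of servers_per_worker.
 */
int
health_check_worker_for_server(int server_idx, int servers_per_worker)
{
    assert(server_idx >= 0 && servers_per_worker > 0);
    return server_idx / servers_per_worker;
}
```

With a threshold of 10, up to 10 servers share one worker and the 11th
server triggers a second one.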

> Currently, each FDW can decide whether we do health checks or not per the
> backend.
> For example, we can skip health checks if the foreign server is not used
> now.
> The background worker cannot control such a way.
> Based on the above, I do not agree that we introduce a new background
> worker and make it to do a health check.
>

Again, my definition of "health check" is probably different. I'd
expect the health check to happen continuously, ideally keeping track of
how many consecutive times it succeeded and/or the last time it
failed/succeeded, etc.
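
To make that kind of bookkeeping concrete, here is a minimal sketch of the
per-server state a continuous health check might keep. It is standalone C;
the struct, its fields, and the function name are hypothetical and chosen
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-server health state; a timestamp of 0 means "never". */
typedef struct ServerHealth
{
    int     consecutive_successes;
    int     consecutive_failures;
    time_t  last_success;
    time_t  last_failure;
} ServerHealth;

/*
 * Record the outcome of one check performed at time "now". A success
 * resets the failure streak and vice versa, so each counter always holds
 * the length of the current streak.
 */
void
server_health_record(ServerHealth *h, bool ok, time_t now)
{
    assert(h != NULL);

    if (ok)
    {
        h->consecutive_successes++;
        h->consecutive_failures = 0;
        h->last_success = now;
    }
    else
    {
        h->consecutive_failures++;
        h->consecutive_successes = 0;
        h->last_failure = now;
    }
}
```

A checker would call server_health_record() after every probe; whether and
when a failure streak should trigger an invalidation is a separate policy
decision that this sketch leaves open.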

Preventing a transaction from failing with a bad error message (or from
holding some resources locally until it is committed) doesn't sound
essential to me. Is there a specific workload you have in mind that would
benefit from rolling back a transaction earlier when a remote server dies?
Maybe there is, but it is not clear to me, and I haven't seen it discussed
on the thread (sorry if I missed it).

I'm trying to understand whether we are trying to solve a problem that does
not really exist. I'm bringing this up because I often deal with
architectures where there is a local node and remote transactions on
different Postgres servers, and I have not encountered many (or any)
patterns that would benefit much from this change. In fact, I think, on the
contrary, this might add significant overhead for OLTP-type systems with
high query throughput.

> Moreover, methods to connect to foreign servers and check health are
> different per FDW.
> In terms of mysql_fdw [1], we must do mysql_init() and
> mysql_real_connect().
> About file_fdw, we do not have to connect, but developers may want to
> calculate checksum and compare.
> Therefore, we must provide callback functions anyway.
>
>
I think providing callback functions is useful in any case. Each FDW (or,
more generally, each extension) should be able to provide its own
"health check" function.
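
One way such per-FDW callbacks could be wired up is a small registry keyed
by FDW name, sketched below in standalone C. Everything here is
hypothetical: the type and function names, the fixed-size table, the two
stand-in callbacks, and the choice to treat an FDW without a registered
callback as healthy. None of it reflects an existing PostgreSQL API or the
patch under discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: returns true if the server looks healthy. */
typedef bool (*HealthCheckCallback) (const char *server_name);

typedef struct HealthCheckEntry
{
    const char         *fdw_name;
    HealthCheckCallback check;
} HealthCheckEntry;

#define MAX_HEALTH_CHECKS 16

static HealthCheckEntry health_checks[MAX_HEALTH_CHECKS];
static int  n_health_checks = 0;

/* Register a callback for an FDW; returns false if the table is full. */
bool
register_health_check(const char *fdw_name, HealthCheckCallback check)
{
    assert(fdw_name != NULL && check != NULL);

    if (n_health_checks >= MAX_HEALTH_CHECKS)
        return false;
    health_checks[n_health_checks].fdw_name = fdw_name;
    health_checks[n_health_checks].check = check;
    n_health_checks++;
    return true;
}

/*
 * Run the callback registered for fdw_name against server_name.
 * Assumption: an FDW with no registered callback is reported as healthy.
 */
bool
run_health_check(const char *fdw_name, const char *server_name)
{
    for (int i = 0; i < n_health_checks; i++)
    {
        if (strcmp(health_checks[i].fdw_name, fdw_name) == 0)
            return health_checks[i].check(server_name);
    }
    return true;
}

/* Stand-in callbacks; a real FDW would probe its connection here. */
bool
example_check_ok(const char *server_name)
{
    (void) server_name;
    return true;
}

bool
example_check_down(const char *server_name)
{
    (void) server_name;
    return false;
}
```

The caller (a backend or a background worker) would not need to know how
each FDW connects; it only invokes run_health_check() and acts on the
result.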

Thanks,
Onder KALACI
