Re: [Proposal] Add foreign-server health checks infrastructure

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: kuroda(dot)hayato(at)fujitsu(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com, zyu(at)yugabyte(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com
Subject: Re: [Proposal] Add foreign-server health checks infrastructure
Date: 2022-02-17 08:08:32
Message-ID: 20220217.170832.627054018291482509.horikyota.ntt@gmail.com
Lists: pgsql-hackers

Hi, Kuroda-san.

At Thu, 17 Feb 2022 04:11:09 +0000, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com> wrote in
> Dear Horiguchi-san,
>
> Thank you for giving your suggestions. I want to confirm your saying.
>
> > FWIW, I'm not sure this feature necessarily requires core support
> > dedicated to FDWs. The core already has the USER_TIMEOUT feature, and
> > FDWs are not necessarily connection based. It seems better if FDWs
> > can implement the health check feature without core support, and that
> > seems possible. Or at least the core feature should be more generic
> > and simpler. Why don't we just expose InTransactionHealthCheckCallbacks
> > or something, and functions operating on it?
>
> I understand that the core part is too complicated and the FDW side is too stupid, right?

I don't think the FDW side is stupid, but it seems too complex for the
benefit. I just think that maybe we don't need the core part.
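To make the "just expose a callback list" alternative a bit more concrete, here is a minimal sketch of what such an interface could look like. Everything in it is hypothetical: HealthCheckCallback, RegisterHealthCheckCallback(), CallHealthCheckCallbacks() and the fixed-size table are not existing PostgreSQL APIs. It only assumes that each FDW registers a function that returns false when it finds one of its connections dead, loosely following the style of RegisterXactCallback().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: returns false if the FDW found one of its connections dead. */
typedef bool (*HealthCheckCallback) (void *arg);

#define MAX_HEALTH_CHECK_CALLBACKS 16

typedef struct HealthCheckCallbackItem
{
	HealthCheckCallback callback;
	void	   *arg;			/* FDW-private state passed back to the callback */
} HealthCheckCallbackItem;

static HealthCheckCallbackItem health_check_callbacks[MAX_HEALTH_CHECK_CALLBACKS];
static int	num_health_check_callbacks = 0;

/* Register a check; returns false if the (fixed-size) table is full. */
static bool
RegisterHealthCheckCallback(HealthCheckCallback callback, void *arg)
{
	assert(callback != NULL);
	if (num_health_check_callbacks >= MAX_HEALTH_CHECK_CALLBACKS)
		return false;
	health_check_callbacks[num_health_check_callbacks].callback = callback;
	health_check_callbacks[num_health_check_callbacks].arg = arg;
	num_health_check_callbacks++;
	return true;
}

/* Run every registered check and return how many reported a failure. */
static int
CallHealthCheckCallbacks(void)
{
	int			failed = 0;

	for (int i = 0; i < num_health_check_callbacks; i++)
	{
		HealthCheckCallbackItem *item = &health_check_callbacks[i];

		if (!item->callback(item->arg))
			failed++;
	}
	return failed;
}

/* Example FDW-side check (hypothetical): arg points to a "connection alive" flag. */
static bool
example_fdw_check(void *arg)
{
	return *(const bool *) arg;
}
```

With this shape the core side only decides when to call CallHealthCheckCallbacks(); everything connection-specific stays inside the FDW.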

> > Mmm. AFAICS the running command will stop with "canceling statement
> > due to user request", which is misleading. We need a more appropriate
> > message there.
>
> +1 about better messages.
>
> > I understand that the motive of this patch is "to avoid wasted long
> > local work when fdw-connection dies".
>
> Yeah your understanding is right.
>
> > In regard to the workload in
> > your first mail, it is easily avoided by ending the transaction as soon
> > as remote access ends. This feature doesn't work for the case "begin;
> > <long local query>; <fdw access>", but the same measure also works in
> > that case. So the only case where this feature is useful is "begin;
> > <fdw-access>; <some long work>; <fdw-access>; end;". But in the first
> > place, how frequently do you expect remote connections to be closed?
> > If that happens so frequently, you might need to recheck the system
> > health before implementing this feature. Since the failure is already
> > detected correctly when something really goes wrong, I feel that this
> > is a bit too complex for its usefulness, especially the core part.
>
> Thanks for analyzing motivation.
> Indeed, some cases may be resolved by separating transactions, and this event rarely happens.
>
> > In conclusion, in my humble opinion, I would like to propose reducing
> > this feature to:
> >
> > - Just periodically check health (in any aspect) of all live
> > connections regardless of the session state.
>
> I understand this as removing the following mechanisms from the core:
>
> * disabling the timeout at the end of a transaction.
> * skipping the check if interrupts are held off or while reading commands.

I think we're on the same page. Anyway, the query cancel interrupt is
ignored while reading input.
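For the reduced proposal, the periodic check could be as simple as the following minimal sketch. It assumes each connection exposes a socket (for libpq that would be PQsocket()), that the timer handler only sets a flag as PostgreSQL's timeout handlers do, and that the actual check runs later at a safe point without looking at the transaction or command-read state. HealthCheckTimeoutPending, health_check_timeout_handler(), socket_is_alive() and check_all_connections() are hypothetical names; in core the timer would presumably be armed through the existing timeout facility (RegisterTimeout() and enable_timeout_after()), which this standalone sketch does not call.

```c
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Set by the timer handler; the handler does nothing else (async-signal-safe). */
static volatile sig_atomic_t HealthCheckTimeoutPending = 0;

/* Hypothetical handler, e.g. registered for a USER_TIMEOUT slot. */
static void
health_check_timeout_handler(void)
{
	HealthCheckTimeoutPending = 1;
}

/*
 * Returns false if the peer has closed the socket.  poll() with a zero
 * timeout never blocks, and MSG_PEEK leaves any pending data for the real
 * reader (e.g. libpq), so the check does not disturb the session.
 */
static bool
socket_is_alive(int sock)
{
	struct pollfd pfd = {.fd = sock, .events = POLLIN};
	char		c;

	if (poll(&pfd, 1, 0) <= 0)
		return true;			/* nothing readable (or EINTR): assume alive */
	if (pfd.revents & (POLLERR | POLLNVAL))
		return false;
	return recv(sock, &c, 1, MSG_PEEK) > 0;	/* 0 means EOF from the peer */
}

/*
 * Called at a safe point (e.g. from interrupt processing), with no check of
 * transaction or command-read state.  Returns the number of dead sockets.
 */
static int
check_all_connections(const int *socks, int nsocks)
{
	int			dead = 0;

	if (!HealthCheckTimeoutPending)
		return 0;
	HealthCheckTimeoutPending = 0;
	for (int i = 0; i < nsocks; i++)
	{
		if (!socket_is_alive(socks[i]))
			dead++;
	}
	/* A real implementation would re-arm the timer here. */
	return dead;
}
```

Note that poll() plus a peeking recv() only detects connections that the peer has actually closed (EOF or a socket error); a silently vanished peer still needs TCP keepalives or tcp_user_timeout.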

> > - If an existing connection is found to be dead, just try canceling
> > the query (or sending a query cancel).
> > One issue with it is how to show an appropriate message for the query
> > cancel, but maybe we can have a global variable that indicates the
> > reason for the cancel.
>
> Currently I have no good idea for that, but I'll try.
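The "global variable that indicates the reason" idea could look roughly like this minimal sketch, similar in spirit to how ProcessInterrupts() already reports statement and lock timeouts with their own messages. CancelReason, QueryCancelReason, request_query_cancel() and process_query_cancel() are hypothetical, the message text for a lost foreign server is made up for illustration, and QueryCancelPending here is only a local stand-in for the backend flag of the same name. The reason is stored before the cancel flag is raised and reset when the cancel is processed, so a later plain user cancel still gets the usual message.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

typedef enum CancelReason
{
	CANCEL_REASON_USER_REQUEST = 0,	/* default: the client sent a cancel */
	CANCEL_REASON_FOREIGN_SERVER_LOST	/* hypothetical: a health check failed */
} CancelReason;

/* Stand-ins for backend globals; both may be set from a signal handler. */
static volatile sig_atomic_t QueryCancelPending = 0;
static volatile sig_atomic_t QueryCancelReason = CANCEL_REASON_USER_REQUEST;

/* Record why the query is being canceled, then raise the cancel flag. */
static void
request_query_cancel(CancelReason reason)
{
	QueryCancelReason = reason;
	QueryCancelPending = 1;
}

/*
 * Called where the cancel is actually processed.  Returns the error message
 * to report, or NULL if no cancel is pending.
 */
static const char *
process_query_cancel(void)
{
	CancelReason reason;

	if (!QueryCancelPending)
		return NULL;
	reason = (CancelReason) QueryCancelReason;
	QueryCancelPending = 0;
	QueryCancelReason = CANCEL_REASON_USER_REQUEST;	/* reset for next time */

	switch (reason)
	{
		case CANCEL_REASON_FOREIGN_SERVER_LOST:
			return "canceling statement due to lost connection to foreign server";
		case CANCEL_REASON_USER_REQUEST:
		default:
			return "canceling statement due to user request";
	}
}
```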

However, I would like to hear others' opinions about the direction, of
course.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
