RE: [Proposal] Add foreign-server health checks infrastructure

From: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Fujii Masao' <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, 'Shinya Kato' <Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com>, "zyu(at)yugabyte(dot)com" <zyu(at)yugabyte(dot)com>
Subject: RE: [Proposal] Add foreign-server health checks infrastructure
Date: 2022-02-01 04:37:07
Message-ID: TYAPR01MB5866D5297140DA5908ECFD27F5269@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Fujii-san,

Thank you for reviewing! I attached the latest version.

> When more than one FDWs are used, even if one FDW calls this function to
> disable the timeout, its remote-server-check-callback can still be called. Is this
> OK?
> Please imagine the case where two FDWs are used and they registered their
> callbacks. Even when one FDW calls TryDisableRemoteServerCheckingTimeout(),
> if another FDW has not called that yet, the timeout is still being enabled. If the
> timeout is triggered during that period, even the callback registered by the FDW
> that has already called TryDisableRemoteServerCheckingTimeout() would be
> called.

Indeed and it should be avoided. I added a counter to CheckingRemoteServersCallbackItem.
The register function returns the registered item, and it must be set as the argument for
enable and disable functions.
Callback functions will be called when item->counter is larger than zero.

> + if (remote_servers_connection_check_interval > 0)
> + {
> + CallCheckingRemoteServersCallbacks();
> +
> enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
> +
> remote_servers_connection_check_interval);
>
> LockErrorCleanup() needs to be called before reporting the error, doesn't it?

You are right. I think this suggests that error-reporting should be done in the core layer.
For meaningful error reporting, I changed a callback specification
that it should return ForeignServer* which points to downed remote server.

> This can cause an error even while DoingCommandRead == true. Is that safe?

I read codes again and I think it is not safe. It is OK when whereToSendOutput is DestRemote,
but not good in InteractiveBackend().

I changed that if-statement for CheckingRemoteServersTimeoutPending is moved just after
ClientConnectionLost, because the that may throw a FATAL error and disconnect from client.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
pachset.zip application/x-zip-compressed 7.7 KB
v07_0001_add_checking_infrastracture.patch application/octet-stream 13.4 KB
v07_0002_add_doc.patch application/octet-stream 3.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2022-02-01 04:38:04 Re: Add checkpoint and redo LSN to LogCheckpointEnd log message
Previous Message Bharath Rupireddy 2022-02-01 04:22:26 Add DBState to pg_control_system function