RE: [Proposal] Add foreign-server health checks infrastructure

From: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Fujii Masao' <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com" <Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com>, "zyu(at)yugabyte(dot)com" <zyu(at)yugabyte(dot)com>
Subject: RE: [Proposal] Add foreign-server health checks infrastructure
Date: 2022-02-24 02:34:55
Message-ID: TYAPR01MB5866FC683843ED8BD09505FEF53D9@TYAPR01MB5866.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Fujii-san,

Thank you for the quick review! I have attached a new version.
I noticed that the previous patches had the wrong names. Sorry about that.

> The connection check timer is re-scheduled repeatedly even while the backend is
> in idle state or is running a local transaction that doesn't access to any foreign
> servers. I'm not sure if it's really worth checking the connections even in those
> states. Even without the periodic connection checks, if the connections are closed
> in those states, subsequent GetConnection() will detect that closed connection
> and re-establish the connection when starting remote transaction. Thought?

Indeed. We can now control the timer in the FDW layer, so I added a
disable_timeout() call at the bottom of pgfdw_xact_callback().
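
As a rough illustration of that timer lifecycle, here is a minimal standalone model. It is not the patch's code: every mock_* name and the timer_armed flag are hypothetical stand-ins, and the real patch would go through PostgreSQL's timeout facility instead. It only shows the intent that the check timer is armed while a remote transaction is open and is switched off at the end of the transaction callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only; none of these names come from the patch. */
typedef enum MockXactEvent
{
    MOCK_XACT_COMMIT,
    MOCK_XACT_ABORT
} MockXactEvent;

bool timer_armed = false;       /* is the connection-check timer scheduled? */
bool in_remote_xact = false;    /* has this transaction touched a foreign server? */

/* Stand-in for starting a remote transaction: arm the check timer. */
void
mock_begin_remote_xact(void)
{
    in_remote_xact = true;
    timer_armed = true;
}

/* Stand-in for the transaction callback that runs at commit or abort. */
void
mock_xact_callback(MockXactEvent event)
{
    /* Per-connection commit/abort work would go here; the model skips it. */
    (void) event;

    in_remote_xact = false;

    /*
     * Last step: stop the timer so that it does not keep firing while the
     * backend is idle or runs purely local transactions.
     */
    timer_armed = false;
    assert(!in_remote_xact);
}
```

In this model a purely local transaction never calls mock_begin_remote_xact(), so the timer stays off for it.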

> When a closed connection is detected in idle-in-transaction state and SIGINT is
> raised, nothing happens because there is no query running to be canceled by
> SIGINT. Also in this case the connection check timer gets disabled. So we can still
> execute queries that don't access to foreign servers, in the same transaction, and
> then the transaction commit fails. Is this expected behavior?

This is not ideal, but I'm not sure what a good solution would be. I changed the
timer so that it is rescheduled when a lost connection has been detected. However,
if the queries in the transaction are quite short, catching SIGINT may still fail.

> When I shutdowned the foreign server while the local backend is in
> idle-in-transaction state, the connection check timer was triggered and detected
> the closed connection. Then when I executed COMMIT command, I got the
> following WARNING message. Is this a bug?
>
> WARNING: leaked hash_seq_search scan for hash table 0x7fd2ca878f20

Fixed. It was caused by hash_seq_term() not being called when the checker detects
a lost connection.
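
For readers unfamiliar with this rule, here is a minimal standalone model of it. A sequential hash scan that runs to the end is closed automatically, but a caller that leaves the loop early has to close it explicitly, otherwise the scan is reported as leaked at transaction end. The mock_* functions, MockScan, and open_scans below are hypothetical stand-ins for dynahash's bookkeeping, not PostgreSQL code and not the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for dynahash's list of active sequential scans. */
int open_scans = 0;

typedef struct MockScan
{
    size_t      count;          /* number of entries to visit */
    size_t      next;           /* index of the next entry to return */
    bool        registered;     /* still counted in open_scans? */
} MockScan;

/* Models hash_seq_init(): start a scan and register it. */
void
mock_seq_init(MockScan *scan, size_t count)
{
    scan->count = count;
    scan->next = 0;
    scan->registered = true;
    open_scans++;
}

/* Models hash_seq_term(): deregister a scan that is being abandoned. */
void
mock_seq_term(MockScan *scan)
{
    assert(scan->registered);
    scan->registered = false;
    open_scans--;
}

/* Models hash_seq_search(): returns the next index, or -1 at the end. */
long
mock_seq_search(MockScan *scan)
{
    if (scan->next >= scan->count)
    {
        mock_seq_term(scan);    /* reaching the end closes the scan itself */
        return -1;
    }
    return (long) scan->next++;
}

/*
 * Returns true if every connection is alive. On the first dead connection
 * the loop is left early; term_on_early_exit selects the fixed behaviour
 * (true) or the leaking one (false).
 */
bool
check_connections(const bool *alive, size_t count, bool term_on_early_exit)
{
    MockScan    scan;
    long        i;

    mock_seq_init(&scan, count);
    while ((i = mock_seq_search(&scan)) >= 0)
    {
        if (!alive[i])
        {
            if (term_on_early_exit)
                mock_seq_term(&scan);
            return false;
        }
    }
    return true;
}
```

With term_on_early_exit set to false, open_scans stays above zero after a dead connection is found, which is the situation the WARNING above reports.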

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment                              Content-Type                    Size
v13_0001_expose_cancel_message.patch    application/octet-stream        1.5 KB
v13_0002_add_health_check.patch         application/octet-stream        7.1 KB
v13_0003_add_doc.patch                  application/octet-stream        1.4 KB
v13_0004_add_test.zip                   application/x-zip-compressed    855 bytes
