RE: [Proposal] Add foreign-server health checks infrastructure

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Önder Kalacı' <onderkalaci(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com" <Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com>, "zyu(at)yugabyte(dot)com" <zyu(at)yugabyte(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: RE: [Proposal] Add foreign-server health checks infrastructure
Date: 2022-11-25 11:41:45
Message-ID: TYAPR01MB5866AAA4EB0632B4001AA2C5F50E9@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Önder,

Sorry for my late reply. I understood that we got agreement the basic design of first version. Thanks!
I attached new version patches.

> I think more experienced hackers could guide us here. I don't see a problem
> with that, but checking other functions like pqSocketPoll(), I see that
> Postgres uses the select system call. I wonder if we can have something
> similar with select? Seems no, but wanted to bring up in case you know more
> about that?

Here I have been still checking...

> I guess it kind of makes sense to have the flexibility to check all
> connections vs only in tx connections.
>
> Though, maybe we should follow a similar approach
> to postgres_fdw_disconnect(servername) / postgres_fdw_disconnect_all()
> postgres_fdw_verify_connection_states(servername) /
> postgres_fdw_verify_connection_states_all()
>
> That seems like a more natural API considering the other UDFs. You can also
> use in combination with postgres_fdw_get_connections()

I think you are right, fixed.

> Hmm, I don't know the reason. Is that maybe epoll is available on a smaller
> number of platforms and libpq can be used on more platforms as being a
> client library?

Here I have been still checking...

> So, should we also be using all these flags like: input_fd.events =
> POLLRDHUP | *errflags*; ?

Fixed.

> Is erroring out always necessary? Maybe we should just return true/false,
> and let the caller decide what to do? For example, you might want to
> rollback to savepoint and retry? If the caller wants to rollback the whole
> transaction, that is possible anyway.

I changed as you suggested once, but I was not sure we can really do such a thing.
Please see the following case.

```
BEGIN;

SAVEPOINT s;
SELECT 1 FROM ft1 LIMIT 1; -- connect to foreign server

SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
WHERE application_name = 'fdw_retry_check'; -- kill the remote connection

SELECT 1 FROM ft1 LIMIT 1; -- will fail
-> ERROR: FATAL: terminating connection due to administrator command

ROLLBACK TO SAVEPOINT s; -- OK, go back to savepoint...
COMMIT;
-> ERROR: connection to server "loopback" was lost
```

The test means that even if we go back to the savepoint,
we could not commit the transaction because postgres_fdw tries to connect to the remove
in pgfdw_xact_callback().

Do we really have a flexibility when the remote connection seems to be closed?
If we do not have, I will revert the change.

> Do we really need this new file or just an oversight in your patch?

Previously there were three file because sometimes the ordering of
output was changed, but it was no more needed.
Currently this patch contains two comparison files, one is for linux,
and another one is for unsupported OSes like Windows.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v19-0001-Add-PQConncheck-to-libpq.patch application/octet-stream 4.1 KB
v19-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch application/octet-stream 8.2 KB
v19-0003-add-test.patch application/octet-stream 566.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2022-11-25 11:59:01 Re: Avoid distributing new catalog snapshot again for the transaction being decoded.
Previous Message Amit Langote 2022-11-25 11:28:37 Re: ExecRTCheckPerms() and many prunable partitions