RE: [Proposal] Add foreign-server health checks infrastructure

From: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Shinya Kato' <Shinya11(dot)Kato(at)oss(dot)nttdata(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [Proposal] Add foreign-server health checks infrastructure
Date: 2021-11-18 12:43:01
Message-ID: TYAPR01MB58662CD4FD98AA475B3D10F9F59B9@TYAPR01MB5866.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Kato-san,

Thank you for your interest!

> > I also want you to review the postgres_fdw part,
> > but I think it should not be attached because cfbot cannot understand
> > such a dependency
> > and will throw build error. Do you know how to deal with them in this
> > case?
>
> I don't know how to deal with them, but I hope you will attach the PoC,
> as it may be easier to review.

OK, I have attached the PoC along with the patches it depends on; please see the zip file.
The add_helth_check_... patch was written by me, and the other two patches are
just copied from [1].
The new callback function scans ConnectionHash sequentially and calls
WaitEventSetWait() for the WL_SOCKET_CLOSED socket event.
That event type is introduced by the dependent patches.
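The closed-socket check can be illustrated outside PostgreSQL with a minimal, Linux-only sketch. It makes three simplifications. First, it assumes that waiting for WL_SOCKET_CLOSED amounts to polling for POLLRDHUP, a Linux extension. Second, it uses plain poll() instead of the WaitEventSet API. Third, it replaces ConnectionHash with an array of sockets. All function names are hypothetical.

```c
#define _GNU_SOURCE /* POLLRDHUP is a Linux extension exposed by glibc */
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Non-blocking probe: true if the peer of `sock` has closed or shut down
 * its side.  A poll() error is treated as inconclusive so a later check
 * can retry instead of reporting a false failure.
 */
static bool
socket_peer_closed(int sock)
{
    struct pollfd pfd = {.fd = sock, .events = POLLRDHUP, .revents = 0};

    if (poll(&pfd, 1, 0) <= 0)
        return false;           /* 0: no events; < 0: error, retry later */
    /* POLLHUP, POLLERR and POLLNVAL are reported even if not requested */
    return (pfd.revents & (POLLRDHUP | POLLHUP | POLLERR | POLLNVAL)) != 0;
}

/*
 * Scan the cached connection sockets in order (an array standing in for
 * ConnectionHash) and return the index of the first one whose server
 * appears to be gone, or -1 if none does.
 */
static int
find_closed_connection(const int *socks, size_t n)
{
    assert(socks != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (socket_peer_closed(socks[i]))
            return (int) i;
    }
    return -1;
}

/*
 * Demo: two connected socket pairs; the far end of the second one is
 * closed, much as the worker side disappears after "pg_ctl stop".
 * Returns the index found by find_closed_connection(), or -2 on setup error.
 */
static int
demo_detect_closed_peer(void)
{
    int healthy[2], dying[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, healthy) < 0)
        return -2;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, dying) < 0)
    {
        close(healthy[0]);
        close(healthy[1]);
        return -2;
    }

    close(dying[1]);            /* the "remote server" goes away */

    int socks[] = {healthy[0], dying[0]};
    int idx = find_closed_connection(socks, 2);

    close(healthy[0]);
    close(healthy[1]);
    close(dying[0]);
    return idx;
}
```

demo_detect_closed_peer() returns 1 because only the second pair has lost its peer. The zero poll() timeout means the probe never blocks while it walks the connections.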

===
How to use
===

Here is a brief explanation of how to use it.

1. Boot two postmaster processes: one acts as the coordinator and the other as the worker.
2. Set remote_servers_connection_check_interval to a non-zero value on the coordinator.
3. Create tables in the worker database cluster.
4. Create a foreign server, a user mapping, and a foreign table on the coordinator.
5. Connect to the coordinator via psql.
6. Open a transaction and access the foreign table.
7. Run "pg_ctl stop" against the worker database cluster.
8. Execute a command that does not access any foreign table.
9. Finally, the following error will be reported:

ERROR: Postgres foreign server XXX might be down.
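In step 8 the error appears even though the command never touches the foreign table. That suggests the check is driven by remote_servers_connection_check_interval rather than by foreign-table access. The standalone C sketch below shows one common way to build such a check. A repeating timer's signal handler only sets a flag, and a hook runs the callbacks registered by FDWs the next time the backend reaches a safe point. This is not the PoC's code: register_server_check(), the other function names, and the millisecond unit of the interval are hypothetical. Only the error text comes from the output above.

```c
#define _GNU_SOURCE /* setitimer() and sigaction() under strict modes */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

/* Callback an FDW registers; returns false and names the server if down. */
typedef bool (*server_check_cb) (const char **down_server);

#define MAX_SERVER_CHECKS 8

static server_check_cb server_checks[MAX_SERVER_CHECKS];
static int n_server_checks = 0;
static volatile sig_atomic_t server_check_pending = 0;

/* Hypothetical registration hook, e.g. called by postgres_fdw at load time. */
static bool
register_server_check(server_check_cb cb)
{
    if (n_server_checks >= MAX_SERVER_CHECKS)
        return false;
    server_checks[n_server_checks++] = cb;
    return true;
}

/* Timer handler: only sets a flag, which is async-signal-safe. */
static void
server_check_timer_handler(int signo)
{
    (void) signo;
    server_check_pending = 1;
}

/* Fire every interval_ms milliseconds; 0 disarms, like a zero GUC value. */
static int
arm_server_check_timer(int interval_ms)
{
    struct sigaction sa;
    struct itimerval it;

    assert(interval_ms >= 0);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = server_check_timer_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    memset(&it, 0, sizeof it);
    it.it_value.tv_sec = interval_ms / 1000;
    it.it_value.tv_usec = (interval_ms % 1000) * 1000;
    it.it_interval = it.it_value;
    return setitimer(ITIMER_REAL, &it, NULL);
}

/*
 * Called at a safe point, e.g. while executing the next command.  Runs the
 * callbacks only if the timer fired since the last call.  Returns 0 if all
 * servers look fine, or -1 with a message in errbuf.
 */
static int
process_pending_server_checks(char *errbuf, size_t len)
{
    assert(errbuf != NULL && len > 0);

    if (!server_check_pending)
        return 0;
    server_check_pending = 0;

    for (int i = 0; i < n_server_checks; i++)
    {
        const char *server = NULL;

        if (!server_checks[i] (&server))
        {
            snprintf(errbuf, len, "Postgres foreign server %s might be down.",
                     server ? server : "(unknown)");
            return -1;
        }
    }
    return 0;
}

/* Stub standing in for an FDW callback: "remote" is always down. */
static bool
stub_remote_is_down(const char **down_server)
{
    *down_server = "remote";
    return false;
}
```

Keeping the handler down to a flag write means the actual socket probing, such as the scan over cached connections, runs in normal backend context rather than inside a signal handler.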

===
Examples for some of the steps
===

3. At the worker

```
postgres=# \d
        List of relations
 Schema |  Name  | Type  | Owner
--------+--------+-------+--------
 public | remote | table | hayato
(1 row)
```

4. At the coordinator

```
postgres=# select * from pg_foreign_server ;
  oid  | srvname | srvowner | srvfdw | srvtype | srvversion | srvacl |         srvoptions
-------+---------+----------+--------+---------+------------+--------+-----------------------------
 16406 | remote  |       10 |  16402 |         |            |        | {port=5433,dbname=postgres}
(1 row)

postgres=# select * from pg_user_mapping ;
  oid  | umuser | umserver |   umoptions
-------+--------+----------+---------------
 16407 |     10 |    16406 | {user=hayato}
(1 row)

postgres=# \d
            List of relations
 Schema |  Name  |     Type      | Owner
--------+--------+---------------+--------
 public | local  | table         | hayato
 public | remote | foreign table | hayato
(2 rows)
```

6-9. At the coordinator

```
postgres=# begin;
BEGIN
postgres=*# select * from remote ;
 id
----
  1
(1 row)

postgres=*# select * from local ;
ERROR: Postgres foreign server remote might be down.
postgres=!#
```

Note that some keepalive settings are needed if you also want to detect
a broken network cable, not only a stopped server.
In my understanding, the following parameters need to be set as server options:

* keepalives_idle
* keepalives_count
* keepalives_interval
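The keepalive settings matter because a pulled cable, unlike "pg_ctl stop", sends nothing to the coordinator, so a closed-socket event never fires. Typically the kernel only gives up on the connection, and reports an error on the socket, once TCP keepalive probes go unanswered. As far as I know, on Linux libpq maps keepalives_idle, keepalives_interval and keepalives_count to the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options. The sketch below sets those options directly on a raw socket (Linux-specific; the helper names and the 30/10/3 values are illustrative only). It also computes the rough worst-case detection time, idle + interval * count.

```c
#define _GNU_SOURCE /* Linux TCP_KEEP* constants */
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Turn on TCP keepalives for `sock` with explicit timings, the raw-socket
 * counterpart of keepalives_idle, keepalives_interval and keepalives_count.
 * Returns 0 on success, -1 on failure.
 */
static int
enable_keepalives(int sock, int idle_s, int interval_s, int count)
{
    int on = 1;

    assert(idle_s > 0 && interval_s > 0 && count > 0);

    if (setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    /* idle seconds before the first probe is sent */
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
    /* seconds between probes that get no answer */
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) < 0)
        return -1;
    /* unanswered probes before the kernel drops the connection */
    if (setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
    return 0;
}

/* Create a TCP socket with keepalives configured; returns the fd or -1. */
static int
open_keepalive_socket(int idle_s, int interval_s, int count)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;
    if (enable_keepalives(sock, idle_s, interval_s, count) < 0)
    {
        close(sock);
        return -1;
    }
    return sock;
}

/* Rough upper bound (seconds) before an idle, silent peer is declared dead. */
static int
keepalive_detection_bound(int idle_s, int interval_s, int count)
{
    return idle_s + interval_s * count;
}
```

With 30/10/3, for example, a dead link would be noticed after roughly 60 seconds of silence instead of the Linux default of more than two hours.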

[1]: https://commitfest.postgresql.org/35/3098/

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachments:
* patches.zip (application/x-zip-compressed, 6.1 KB)
* v01_add_checking_infrastracture.patch (application/octet-stream, 11.9 KB)
