Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

From: Kasahara Tatsuhito <kasahara(dot)tatsuhito(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Subject: Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Date: 2020-07-10 07:55:41
Message-ID: CAP0=ZV+X_f-T6F+H_dO_QypJMkrn5NGfijHTScPG8CDDOC6tzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wed, Jul 8, 2020 at 9:40 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> One way, we could solve the above problem is that, upon firing the new
> foreign query from local backend using the cached connection,
> (assuming the remote backend that was cached in the local backed got
> killed for some reasons), instead of failing the query in the local
> backend, upon detecting error from the remote backend, we could just
> delete the cached old entry and try getting another connection to
> remote backend, cache it and proceed to submit the query. This has to
> happen only at the beginning of remote xact.
+1.

I think this is a very useful feature.
In an environment with connection pooling for local, if a remote
server has a failover or switchover,
this feature would prevent unexpected errors of local queries after
recovery of the remote server.

I haven't looked at the code in detail yet, some comments here.

1. To keep the previous behavior (and performance), how about allowing
the user to specify
whether or not to retry as a GUC parameter or in the FOREIGN SERVER OPTION?

2. How about logging a LOG message when retry was success to let us know
the retry feature worked or how often the retries worked ?

> I couldn't think of adding a test case to the existing postgres_fdw
> regression test suite with an automated scenario of the remote backend
> getting killed.

Couldn't you confirm this by adding a test case like the following?
===================================================
BEGIN;
-- Generate a connection to remote
SELECT * FROM ft1 LIMIT 1;

-- retrieve pid of postgres_fdw and kill it
-- could use the other unique identifier (not postgres_fdw but
fdw_retry_check, etc ) for application name
SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE
backend_type = 'client backend' AND application_name = 'postgres_fdw'

-- COMMIT, so next query will should success if connection-retry works
COMMIT;
SELECT * FROM ft1 LIMIT 1;
===================================================

Best regards,

--
Tatsuhito Kasahara
kasahara.tatsuhito _at_ gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2020-07-10 08:29:18 Re: Default setting for enable_hashagg_disk
Previous Message Laurenz Albe 2020-07-10 07:24:25 Re: Postgres is not able to handle more than 4k tables!?