Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
Subject: Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Date: 2020-07-13 04:43:19
Message-ID: CAExHW5vn_JpBOB8Xr1y=w=W1U9Um6UX-X_GPS2+Vd15vBwNrfA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jul 8, 2020 at 6:10 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> I couldn't think of a way to add a test case to the existing
> postgres_fdw regression test suite that automates the scenario of the
> remote backend getting killed.

You could get the remote backend's PID using PQbackendPID() and then
kill that backend by calling pg_terminate_backend() from another
session, automating the scenario of the remote backend being killed.
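
For instance, a standalone libpq program along these lines could drive
that scenario (a minimal sketch; the connection string, error handling
and output here are illustrative assumptions, not part of the patch):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* Two sessions: one plays the remote backend, the other kills it. */
        PGconn     *victim = PQconnectdb("dbname=postgres");
        PGconn     *killer = PQconnectdb("dbname=postgres");
        char        sql[64];
        PGresult   *res;

        if (PQstatus(victim) != CONNECTION_OK ||
            PQstatus(killer) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed\n");
            return 1;
        }

        /* Fetch the victim's backend PID and terminate it from the other session. */
        snprintf(sql, sizeof(sql), "SELECT pg_terminate_backend(%d)",
                 PQbackendPID(victim));
        res = PQexec(killer, sql);
        printf("pg_terminate_backend: %s\n",
               PQresStatus(PQresultStatus(res)));

        PQclear(res);
        PQfinish(victim);
        PQfinish(killer);
        return 0;
    }

In the SQL regression suite itself, the remote backend could instead be
located through pg_stat_activity, e.g. by giving the foreign server a
distinctive application_name, and terminated the same way.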

>
> I would like to thank Ashutosh Bapat (ashutosh(dot)bapat(dot)oss(at)gmail(dot)com)
> for the suggestion to fix this and the review of my initial patch
> attached in [2]. I tried to address the review comments provided on my
> initial patch [3].
>
> For one of Ashutosh's review comments from [3], "the fact that the
> same connection may be used by multiple plan nodes", I tried a few
> use cases where there exist joins on two foreign tables on the same
> remote server in a single query, so essentially the same connection
> was used for multiple plan nodes. In this case we avoid retrying for
> the second GetConnection() request for the second foreign table,
> thanks to the check entry->xact_depth <= 0: after the first
> GetConnection() and the first remote query, xact_depth becomes 1, so
> we don't hit the retry logic, and it seems we are safe here. Please
> point out if I'm missing something here.
>
> Request the community to consider the patch for further review if the
> overall idea seems beneficial.

I think this idea will be generally useful if your work on dropping
stale connections uses idle_connection_timeout or something like that
on the remote server.

About the patch: it seems we could just catch the error from
begin_remote_xact() in GetConnection() and retry the connection if the
error is "bad connection". Retrying using PQreset() might be better
than always calling PQconnect*.
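
Roughly something like this inside GetConnection() (a simplified
sketch of that idea, not the actual patch; memory context handling and
inspection of the error data are elided):

    PG_TRY();
    {
        begin_remote_xact(entry);
    }
    PG_CATCH();
    {
        /* Retry only if no remote transaction has been started yet. */
        if (entry->xact_depth <= 0 &&
            PQstatus(entry->conn) == CONNECTION_BAD)
        {
            FlushErrorState();
            PQreset(entry->conn);       /* reconnect with the saved parameters */
            begin_remote_xact(entry);   /* single retry; errors out on failure */
        }
        else
            PG_RE_THROW();
    }
    PG_END_TRY();

PQreset() re-establishes the channel using all the same parameters
previously used, so the keyword/value arrays that connect_pg_server()
builds would not need to be reconstructed for the retry.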

>
> [1] https://www.postgresql.org/message-id/CAExHW5t21B_XPQy_hownm1Qq%3DhMrgOhX%2B8gDj3YEKFvpk%3DVRgw%40mail.gmail.com
> [2] https://www.postgresql.org/message-id/CALj2ACXp6DQ3iLGx5g%2BLgVtGwC4F6K9WzKQJpyR4FfdydQzC_g%40mail.gmail.com
> [3] https://www.postgresql.org/message-id/CAExHW5u3Gyv6Q1BEr6zMg0t%2B59e8c4KMfKVrV3Z%3D4UKKjJ19nQ%40mail.gmail.com
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com

--
Best Wishes,
Ashutosh Bapat
