Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>
Subject: Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Date: 2020-07-14 12:43:09
Message-ID: CAExHW5uxHdLk9o9HUrQFPzd9qkUh36buAdAA=hTCpXdv9ZU6HA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Has this been added to CF, possibly next CF?

On Tue, Jul 14, 2020 at 7:27 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> >
> > You could get a backend's PID using PQbackendPID and then kill it by
> > calling pg_terminate_backend() to kill the remote backend to automate
> > scenario of remote backend being killed.
> >
>
> I already added the test case in v2 patch itself(added one more test
> case in v3 patch), using the similar approach.
>
> >
> > > For, one of the Ashutosh's review comments from [3] "the fact that the
> > > same connection may be used by multiple plan nodes", I tried to have
> > > few use cases where there exist joins on two foreign tables on the
> > > same remote server, in a single query, so essentially, the same
> > > connection was used for multiple plan nodes. In this case we avoid
> > > retrying for the second GetConnection() request for the second foreign
> > > table, with the check entry->xact_depth <= 0 , xact_depth after the
> > > first GetConnection() and the first remote query will become 1 and we
> > > don't hit the retry logic and seems like we are safe here. Please add
> > > If I'm missing something here.
> > >
> > > Request the community to consider the patch for further review if the
> > > overall idea seems beneficial.
> >
> > I think this idea will be generally useful if your work on dropping
> > stale connection uses idle_connection_timeout or something like that
> > on the remote server.
>
> Assuming we use idle_connection_timeout or some other means(as it is
> not yet finalized, I will try to respond in that mail chain) to drop
> stale/idle connections from the local backend, I think we have two
> options 1) deleting that cached entry from the connection cache
> entirely using disconnect_pg_server() and hash table remove. This
> frees up some space and we don't have to deal with the connection
> invalidations and don't have to bother on resetting cached entry's
> other parameters such as xact_depth, have_prep_stmt etc. 2) or we
> could just drop the connections using disconnect_pg_server(), retain
> the hash entry, reset other parameters, and deal with the
> invalidations. This is like, we maintain unnecessary info in the
> cached entry, where we actually don't have a connection at all and
> keep holding some space for cached entry.
>
> IMO, if we go with option 1, then it will be good.
>
> Anyways, this connection retry feature will not have any dependency on
> the idle_connection_timeout or dropping stale connection feature,
> because there can be many reasons where remote backends go away/get
> killed.
>
> If I'm not sidetracking - if we use something like
> idle_session_timeout [1] on the remote server, this retry feature will
> be very useful.
>
> >
> > About the patch. It seems we could just catch the error from
> > begin_remote_xact() in GetConnection() and retry connection if the
> > error is "bad connection". Retrying using PQreset() might be better
> > than calling PQConnect* always.
> >
>
> Thanks for the comment, it made life easier. Added the patch with the
> changes. Please take a look at the v3 patch and let me know if still
> something needs to be done.
>
> [1] - https://www.postgresql.org/message-id/763A0689-F189-459E-946F-F0EC4458980B%40hotmail.com
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-07-14 13:10:01 Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Previous Message Justin Pryzby 2020-07-14 12:40:35 Re: DROP relation IF EXISTS Docs and Tests - Bug Fix