Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Retry Cached Remote Connections for postgres_fdw in case remote backend gets killed/goes away
Date: 2020-07-14 01:57:10
Message-ID: CALj2ACUZ-nRMKoWHgv_r=1XkHHk5QSPzEfAcf-fs0vpdrfShrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> You could get a backend's PID using PQbackendPID and then kill it by
> calling pg_terminate_backend() to kill the remote backend to automate
> scenario of remote backend being killed.
>

I already added the test case in v2 patch itself(added one more test
case in v3 patch), using the similar approach.

>
> > For, one of the Ashutosh's review comments from [3] "the fact that the
> > same connection may be used by multiple plan nodes", I tried to have
> > few use cases where there exist joins on two foreign tables on the
> > same remote server, in a single query, so essentially, the same
> > connection was used for multiple plan nodes. In this case we avoid
> > retrying for the second GetConnection() request for the second foreign
> > table, with the check entry->xact_depth <= 0 , xact_depth after the
> > first GetConnection() and the first remote query will become 1 and we
> > don't hit the retry logic and seems like we are safe here. Please add
> > If I'm missing something here.
> >
> > Request the community to consider the patch for further review if the
> > overall idea seems beneficial.
>
> I think this idea will be generally useful if your work on dropping
> stale connection uses idle_connection_timeout or something like that
> on the remote server.

Assuming we use idle_connection_timeout or some other means(as it is
not yet finalized, I will try to respond in that mail chain) to drop
stale/idle connections from the local backend, I think we have two
options 1) deleting that cached entry from the connection cache
entirely using disconnect_pg_server() and hash table remove. This
frees up some space and we don't have to deal with the connection
invalidations and don't have to bother on resetting cached entry's
other parameters such as xact_depth, have_prep_stmt etc. 2) or we
could just drop the connections using disconnect_pg_server(), retain
the hash entry, reset other parameters, and deal with the
invalidations. This is like, we maintain unnecessary info in the
cached entry, where we actually don't have a connection at all and
keep holding some space for cached entry.

IMO, if we go with option 1, then it will be good.

Anyways, this connection retry feature will not have any dependency on
the idle_connection_timeout or dropping stale connection feature,
because there can be many reasons where remote backends go away/get
killed.

If I'm not sidetracking - if we use something like
idle_session_timeout [1] on the remote server, this retry feature will
be very useful.

>
> About the patch. It seems we could just catch the error from
> begin_remote_xact() in GetConnection() and retry connection if the
> error is "bad connection". Retrying using PQreset() might be better
> than calling PQConnect* always.
>

Thanks for the comment, it made life easier. Added the patch with the
changes. Please take a look at the v3 patch and let me know if still
something needs to be done.

[1] -
https://www.postgresql.org/message-id/763A0689-F189-459E-946F-F0EC4458980B%40hotmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v3-Retry-Cached-Remote-Connections-For-postgres_fdw.patch application/octet-stream 8.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message movead.li@highgo.ca 2020-07-14 02:02:10 Re: POC and rebased patch for CSN based snapshots
Previous Message Amit Langote 2020-07-14 01:56:15 Re: [PATCH] Performance Improvement For Copy From Binary Files