Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Date: 2020-07-02 10:59:33
Message-ID: CALj2ACXp6DQ3iLGx5g+LgVtGwC4F6K9WzKQJpyR4FfdydQzC_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > If we were to use idle_session_timeout (from patch [1]) for the remote
> > session to go off without
> > having to delete the corresponding entry from local connection cache and
> > after that if we submit foreign query from local session, then below
> > error would occur,
> > which may not be an expected behaviour. (I took the patch from [1] and
> > intentionally set the
> > idle_session_timeout to a low value on remote server, issued a
> > foreign_tbl query which
> > caused remote session to open and after idle_session_timeout , the
> > remote session
> > closes and now issue the foreign_tbl query from local session)
> >
> > postgres=# SELECT * FROM foreign_tbl;
> > ERROR: server closed the connection unexpectedly
> > This probably means the server terminated abnormally
> > before or while processing the request.
> > CONTEXT: remote SQL command: START TRANSACTION ISOLATION LEVEL
REPEATABLE READ
> > postgres=#
>
> This is actually strange. AFAIR the code, without looking at the
> current code, when a query picks a foreign connection it checks its
> state. It's possible that the connection has not been marked bad by
> the time you fire new query. If the problem exists probably we should
> fix it anyway since the backend at the other end of the connection has
> higher chances of being killed while the connection was sitting idle
> in the cache.
>

Thanks Ashutosh for the suggestion. One way, we could solve the above
problem is that, upon firing the new foreign query from local backend using
cached
connection, (assuming the remote backend/session that was cached in the
local backed got
killed by some means), instead of failing the query in the local
backend/session, upon
detecting error from remote backend, we could just delete the cached old
entry and try getting another
connection to remote backend/session, cache it and proceed to submit the
query. This has to happen only at
the beginning of remote xact.

This way, instead of failing(as mentioned above " server closed the
connection unexpectedly"),
the query succeeds if the local session is able to get a new remote backend
connection.

I worked on a POC patch to prove the above point. Attaching the patch.
Please note that, the patch doesn't contain comments and has some issues
like having some new
variable in PGconn structure and the things like.

If the approach makes some sense, then I can rework properly on the patch
and probably
can open another thread for the review and other stuff.

The way I tested the patch:

1. select * from foreign_tbl;
/*from local session - this results in a
remote connection being cached in
the connection cache and
a remote backend/session is opened.
*/
2. kill the remote backend/session
3. select * from foreign_tbl;
/*from local session - without patch
this throws error "ERROR: server closed the connection unexpectedly"
with path - try to use
the cached connection at the beginning of remote xact, upon receiving
error from remote postgres
server, instead of aborting the query, delete the cached entry, try to
get a new connection, if it
gets, cache it and use that for executing the query, query succeeds.
*/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

On Wed, Jul 1, 2020 at 7:13 PM Masahiko Sawada <
masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:

> On Wed, 1 Jul 2020 at 18:14, Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> >
> > >
> > > I've not looked at your patch deeply but if this problem is talking
> > > only about postgres_fdw I think we should improve postgres_fdw, not
> > > adding a GUC to the core. It’s not that all FDW plugins use connection
> > > cache and postgres_fdw’s connection cache is implemented within
> > > postgres_fdw, I think we should focus on improving postgres_fdw. I
> > > also think it’s not a good design that the core manages connections to
> > > remote servers connected via FDW. I wonder if we can add a
> > > postgres_fdw option for this purpose, say keep_connection [on|off].
> > > That way, we can set it per server so that remote connections to the
> > > particular server don’t remain idle.
> > >
> >
> > If I understand it correctly, your suggestion is to add
> > keep_connection option and use that while defining the server object.
> > IMO having keep_connection option at the server object level may not
> > serve the purpose being discussed here.
> > For instance, let's say I create a foreign server in session 1 with
> > keep_connection on, and I want to use that
> > server object in session 2 with keep_connection off and session 3 with
> > keep_connection on and so on.
> > One way we can change the server's keep_connection option is to alter
> > the server object, but that's not a good choice,
> > as we have to alter it at the system level.
>
> Is there use-case in practice where different backends need to have
> different connection cache setting even if all of them connect the
> same server? I thought since the problem that this feature is trying
> to resolve is not to eat up the remote server connections capacity by
> disabling connection cache, we’d like to disable connection cache to
> the particular server, for example, which sets low max_connections.
>
> Regards,
>
> --
> Masahiko Sawada http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

Attachment Content-Type Size
v1-0001-Retry-cached-remote-connections-in-case-if-remote.patch application/octet-stream 6.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2020-07-02 11:12:26 Re: Remove Deprecated Exclusive Backup Mode
Previous Message David Rowley 2020-07-02 10:57:44 Re: Hybrid Hash/Nested Loop joins and caching results from subplans