Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Date: 2020-12-16 12:50:27
Message-ID: CALj2ACUkmhS4X6+u6V2z9dhO5dDitF0+S2WP+G4JgYhB6e7yNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Dec 14, 2020 at 8:03 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> On 2020/12/14 14:36, Bharath Rupireddy wrote:
> > On Mon, Dec 14, 2020 at 9:38 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com <mailto:masao(dot)fujii(at)oss(dot)nttdata(dot)com>> wrote:
> > > On 2020/12/12 15:05, Bharath Rupireddy wrote:
> > > > On Sat, Dec 12, 2020 at 12:19 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com <mailto:masao(dot)fujii(at)oss(dot)nttdata(dot)com> <mailto:masao(dot)fujii(at)oss(dot)nttdata(dot)com <mailto:masao(dot)fujii(at)oss(dot)nttdata(dot)com>>> wrote:
> > > > > I was thinking that in the case of drop of user mapping or server, hash_search(ConnnectionHash) in GetConnection() cannot find the cached connection entry invalidated by that drop. Because "user->umid" used as hash key is changed. So I was thinking that that invalidated connection will not be closed nor reconnected.
> > > > >
> > > >
> > > > You are right in saying that the connection leaks.
> > > >
> > > > Use case 1:
> > > > 1) Run foreign query in session1 with server1, user mapping1
> > > > 2) Drop user mapping1 in another session2, invalidation message gets logged which will have to be processed by other sessions
> > > > 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via pgfdw_inval_callback(). Whatever may be the type of foreign query (select, update, explain, delete, insert, analyze etc.), upon next call to GetUserMapping() from postgres_fdw.c, the cache lookup fails(with ERROR: user mapping not found for "XXXX") since the user mapping1 has been dropped in session2 and the query will also fail before reaching GetConnection() where the connections associated with invalidated entries would have got disconnected.
> > > >
> > > > So, the connection associated with invalidated entry would remain until the local session exits which is a problem to solve.
> > > >
> > > > Use case 2:
> > > > 1) Run foreign query in session1 with server1, user mapping1
> > > > 2) Try to drop foreign server1, then we would not be allowed to do so because of dependency. If we use CASCADE, then the dependent user mapping1 and foreign tables get dropped too [1].
> > > > 3) Run foreign query again in session1, at the start of txn, the cached entry gets invalidated via pgfdw_inval_callback(), it fails because there is no foreign table and user mapping1.
> > > >
> > > > But, note that the connection remains open in session1, which is again a problem to solve.
> > > >
> > > > To solve the above connection leak problem, it looks like the right place to close all the invalid connections is pgfdw_xact_callback(), once registered, which gets called at the end of every txn in the current session(by then all the sub txns also would have been finished). Note that if there are too many invalidated entries, then one of the following txn has to bear running this extra code, but that's okay than having leaked connections. Thoughts? If okay, I can code a separate patch.
> > >
> > > Thanks for further analysis! Sounds good. Also +1 for making it as separate patch. Maybe only this patch needs to be back-patched.
> >
> > Thanks. Yeah once agreed on the fix, +1 to back patch. Shall I start a separate thread for connection leak issue and patch, so that others might have different thoughts??
>
> Yes, of course!

Thanks. I posted the patch in a separate thread[1] for fixing the
connection leak problem.

[1] - https://www.postgresql.org/message-id/flat/CALj2ACVNcGH_6qLY-4_tXz8JLvA%2B4yeBThRfxMz7Oxbk1aHcpQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Konstantin Knizhnik 2020-12-16 13:35:35 Re: On login trigger: take three
Previous Message Fujii Masao 2020-12-16 12:49:07 Deadlock between backend and recovery may not be detected