Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Date: 2021-01-29 05:38:23
Message-ID: CALj2ACW1jAJTi-SmBCStZ6+a1C3DeAc8-_Zuw1LYFzo8D33k2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
<masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> On 2021/01/29 14:12, Bharath Rupireddy wrote:
> > On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
> > <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> >> On 2021/01/29 11:09, Tom Lane wrote:
> >>> Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> writes:
> >>>> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
> >>>>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
> >>>>> telling us is that the patch's behavior is unstable in the face
> >>>>> of unexpected cache flushes.
> >>>
> >>>> Thanks a lot! It looks like the syscache invalidation messages are
> >>>> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
> >>>> which pgfdw_inval_callback gets called many times in which the cached
> >>>> entries are marked as invalid and closed if they are not used in the
> >>>> txn. The new function postgres_fdw_get_connections outputs the
> >>>> information of the cached connections such as name if the connection
> >>>> is still open and their validity. Hence the output of the
> >>>> postgres_fdw_get_connections became unstable in the buildfarm member.
> >>>> I will further analyze making tests stable, meanwhile any suggestions
> >>>> are welcome.
> >>>
> >>> I do not think you should regard this as "we need to hack the test
> >>> to make it stable". I think you should regard this as "this is a
> >>> bug". A cache flush should not cause user-visible state changes.
> >>> In particular, the above analysis implies that you think a cache
> >>> flush is equivalent to end-of-transaction, which it absolutely
> >>> is not.
> >>>
> >>> Also, now that I've looked at pgfdw_inval_callback, it scares
> >>> the heck out of me. Actually disconnecting a connection during
> >>> a cache inval callback seems quite unsafe --- what if that happens
> >>> while we're using the connection?
> >>
> >> If the connection is still used in the transaction, pgfdw_inval_callback()
> >> marks it as invalidated and doesn't close it. So I was not thinking that
> >> this is so unsafe.
> >>
> >> The disconnection code in pgfdw_inval_callback() was added in commit
> >> e3ebcca843 to fix connection leak issue, and it's back-patched. If this
> >> change is really unsafe, we need to revert it immediately at least from back
> >> branches because the next minor release is scheduled soon.
> >
> > I think we can remove disconnect_pg_server in pgfdw_inval_callback and
> > make entries only invalidated. Anyways, those connections can get
> > closed at the end of main txn in pgfdw_xact_callback. Thoughts?
>
> But this revives the connection leak issue. So isn't it better to
> to do that after we confirm that the current code is really unsafe?

IMO, connections will not leak, because the invalidated connections
eventually will get closed in pgfdw_xact_callback at the main txn end.

IIRC, when we were finding a way to close the invalidated connections
so that they don't leaked, we had two options:

1) let those connections (whether currently being used in the xact or
not) get marked invalidated in pgfdw_inval_callback and closed in
pgfdw_xact_callback at the main txn end as shown below

if (PQstatus(entry->conn) != CONNECTION_OK ||
PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
entry->changing_xact_state ||
entry->invalidated). ----> by adding this
{
elog(DEBUG3, "discarding connection %p", entry->conn);
disconnect_pg_server(entry);
}

2) close the unused connections right away in pgfdw_inval_callback
instead of marking them invalidated. Mark used connections as
invalidated in pgfdw_inval_callback and close them in
pgfdw_xact_callback at the main txn end.

We went with option (2) because we thought this would ease some burden
on pgfdw_xact_callback closing a lot of invalid connections at once.

Hope that's fine.

I will respond to postgres_fdw_get_connections issue separately.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2021-01-29 05:41:52 Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Previous Message Fujii Masao 2021-01-29 05:25:48 Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit