Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit
Date: 2021-01-22 13:13:04
Message-ID: 62431587-ed3a-f333-51a9-967ccf2a78db@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021/01/22 3:29, Fujii Masao wrote:
>
>
> On 2021/01/22 1:17, Bharath Rupireddy wrote:
>> On Thu, Jan 21, 2021 at 8:58 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>>> My opinion is to check "!all", but if others prefer using such boolean flag,
>>> I'd withdraw my opinion.
>>
>> I'm really sorry, actually if (!all) is enough there, my earlier
>> understanding was wrong.
>>
>>> +               if ((all || entry->server_hashvalue == hashvalue) &&
>>>
>>> What about making disconnect_cached_connections() accept serverid instead
>>> of hashvalue, and perform the above comparison based on serverid? That is,
>>> I'm thinking "if (all || entry->serverid == serverid)". If we do that, we can
>>> simplify postgres_fdw_disconnect() a bit more by getting rid of the calculation
>>> of hashvalue.
>>
>> That's a good idea. I missed this point. Thanks.
>>
>>> +               if ((all || entry->server_hashvalue == hashvalue) &&
>>> +                        entry->conn)
>>>
>>> I think that it's better to make the check of "entry->conn" independent
>>> like other functions in postgres_fdw/connection.c. What about adding
>>> the following check before the above?
>>>
>>>                  /* Ignore cache entry if no open connection right now */
>>>                  if (entry->conn == NULL)
>>>                          continue;
>>
>> Done.
>>
>>> +                                       /*
>>> +                                        * If the server has been dropped in the current explicit
>>> +                                        * transaction, then this entry would have been invalidated
>>> +                                        * in pgfdw_inval_callback at the end of drop sever
>>> +                                        * command. Note that this connection would not have been
>>> +                                        * closed in pgfdw_inval_callback because it is still being
>>> +                                        * used in the current explicit transaction. So, assert
>>> +                                        * that here.
>>> +                                        */
>>> +                                       Assert(entry->invalidated);
>>>
>>> As this comment explains, even when the connection is used in the transaction,
>>> its server can be dropped in the same transaction. The connection can remain
>>> until the end of transaction even though its server has been already dropped.
>>> I'm now wondering if this behavior itself is problematic and should be forbid.
>>> Of course, this is separate topic from this patch, though..
>>>
>>> BTW, my just idea for that is;
>>> 1. change postgres_fdw_get_connections() return also serverid and xact_depth.
>>> 2. make postgres_fdw define the event trigger on DROP SERVER command so that
>>>       an error is thrown if the connection to the server is still in use.
>>>       The event trigger function uses postgres_fdw_get_connections() to check
>>>       if the server connection is still in use or not.
>>>
>>> I'm not sure if this just idea is really feasible or not, though...
>>
>> I'm not quite sure if we can create such a dependency i.e. blocking
>> "drop foreign server" when at least one session has an in use cached
>> connection on it?
>
> Maybe my explanation was not clear... I was thinking to prevent the server whose connection is used *within the current transaction* from being dropped. IOW, I was thinking to forbid the drop of server if xact_depth of its connection is more than one. So one session can drop the server even when its connection is open in other session if it's not used within the transaction (i.e., xact_depth == 0).
>
> BTW, for now, if the connection is used within the transaction, other session cannot drop the corresponding server because the transaction holds the lock on the relations that depend on the server. Only the session running that transaction can drop the server. This can cause the issue in discussion.
>
> So, my just idea is to disallow even that session running the transaction to drop the server. This means that no session can drop the server while its connection is used within the transaction (xact_depth > 0).
>
>
>> What if a user wants to drop a server from one
>> session, all other sessions one after the other keep having in-use
>> connections related to that server, (though this use case sounds
>> impractical) will the drop server ever be successful? Since we can
>> have hundreds of sessions in real world postgres environment, I don't
>> know if it's a good idea to create such dependency.
>>
>> As you suggested, this point can be discussed in a separate thread and
>> if any of the approaches proposed by you above is finalized we can
>> extend postgres_fdw_get_connections anytime.
>>
>> Thoughts?
>
> I will consider more before starting separate discussion!
>
>
>>
>> Attaching v16 patch set, addressing above review comments and also
>> added a test case suggested upthread that postgres_fdw_disconnect()
>> with existing server name returns false that is when the cache doesn't
>> have active connection.
>>
>> Please review the v16 patch set further.
>
> Thanks! Will review that later.

+ /*
+ * For the given server, if we closed connection or it is still in
+ * use, then no need of scanning the cache further. We do this
+ * because the cache can not have multiple cache entries for a
+ * single foreign server.
+ */

On second thought, ISTM that single foreign server can have multiple cache
entries. For example,

CREATE ROLE foo1 SUPERUSER;
CREATE ROLE foo2 SUPERUSER;
CREATE EXTENSION postgres_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (port '5432');
CREATE USER MAPPING FOR foo1 SERVER loopback OPTIONS (user 'postgres');
CREATE USER MAPPING FOR foo2 SERVER loopback OPTIONS (user 'postgres');
CREATE TABLE t (i int);
CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 't');
SET SESSION AUTHORIZATION foo1;
SELECT * FROM ft;
SET SESSION AUTHORIZATION foo2;
SELECT * FROM ft;

Then you can see there are multiple open connections for the same server
as follows. So we need to scan all the entries even when the serverid is
specified.

SELECT * FROM postgres_fdw_get_connections();

server_name | valid
-------------+-------
loopback | t
loopback | t
(2 rows)

This means that user (even non-superuser) can disconnect the connection
established by another user (superuser), by using postgres_fdw_disconnect_all().
Is this really OK?

+ if (all || (OidIsValid(serverid) && entry->serverid == serverid))
+ {

I don't think that "OidIsValid(serverid)" condition is necessary here.
But you're just concerned about the case where the caller mistakenly
specifies invalid oid and all=false? One idea to avoid that inconsistent
combination of inputs is to change disconnect_cached_connections()
as follows.

-disconnect_cached_connections(Oid serverid, bool all)
+disconnect_cached_connections(Oid serverid)
{
HASH_SEQ_STATUS scan;
ConnCacheEntry *entry;
+ bool all = !OidIsValid(serverid);

+ * in pgfdw_inval_callback at the end of drop sever

Typo: "sever" should be "server".

+-- ===================================================================
+-- test postgres_fdw_disconnect function
+-- ===================================================================

This regression test is placed at the end of test file. But isn't it better
to place that just after the regression test "test connection invalidation
cases" because they are related?

+ <screen>
+postgres=# SELECT * FROM postgres_fdw_disconnect('loopback1');
+ postgres_fdw_disconnect

The tag <screen> should start from the beginning.

As I commented upthread, what about replacing the example query with
"SELECT postgres_fdw_disconnect('loopback1');" because it's more common?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2021-01-22 13:14:29 CTAS command tags
Previous Message Heikki Linnakangas 2021-01-22 13:12:13 Re: PoC Refactor AM analyse API