Re: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "m(dot)usama(at)gmail(dot)com" <m(dot)usama(at)gmail(dot)com>, "ikedamsh(at)oss(dot)nttdata(dot)com" <ikedamsh(at)oss(dot)nttdata(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "sulamul(at)gmail(dot)com" <sulamul(at)gmail(dot)com>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "ildar(at)adjust(dot)com" <ildar(at)adjust(dot)com>, "horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "chris(dot)travers(at)adjust(dot)com" <chris(dot)travers(at)adjust(dot)com>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "ishii(at)sraoss(dot)co(dot)jp" <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2021-02-05 05:45:40
Message-ID: CAD21AoAZeLnpUNnUabNPWe7edi_+wq0Q0O=mZ=OwgvOh8ug2dw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Feb 2, 2021 at 5:18 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
>
>
>
> On 2021/01/27 14:08, Masahiko Sawada wrote:
> > On Wed, Jan 27, 2021 at 10:29 AM Fujii Masao
> > <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote:
> >>
> >>
> >> You fixed some issues. But maybe you forgot to attach the latest patches?
> >
> > Yes, I've attached the updated patches.
>
> Thanks for updating the patch! I tried to review 0001 and 0002 as the self-contained change.
>
> + * An FDW that implements both commit and rollback APIs can request to register
> + * the foreign transaction by FdwXactRegisterXact() to participate it to a
> + * group of distributed tranasction. The registered foreign transactions are
> + * identified by OIDs of server and user.
>
> I'm afraid that the combination of OIDs of server and user is not unique. IOW, more than one foreign transactions can have the same combination of OIDs of server and user. For example, the following two SELECT queries start the different foreign transactions but their user OID is the same. OID of user mapping should be used instead of OID of user?
>
> CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw;
> CREATE USER MAPPING FOR postgres SERVER loopback OPTIONS (user 'postgres');
> CREATE USER MAPPING FOR public SERVER loopback OPTIONS (user 'postgres');
> CREATE TABLE t(i int);
> CREATE FOREIGN TABLE ft(i int) SERVER loopback OPTIONS (table_name 't');
> BEGIN;
> SELECT * FROM ft;
> DROP USER MAPPING FOR postgres SERVER loopback ;
> SELECT * FROM ft;
> COMMIT;

Good catch. I've considered using user mapping OID or a pair of user
mapping OID and server OID as a key of foreign transactions but I
think it also has a problem if an FDW caches the connection by pair of
server OID and user OID whereas the core identifies them by user
mapping OID. For instance, mysql_fdw manages connections by pair of
server OID and user OID.

For example, let's consider the following execution:

BEGIN;
SET ROLE user_A;
INSERT INTO ft1 VALUES (1);
SET ROLE user_B;
INSERT INTO ft1 VALUES (1);
COMMIT;

Suppose that an FDW identifies the connections by {server OID, user
OID} and the core GTM identifies the transactions by user mapping OID,
and user_A and user_B use the public user mapping to connect server_X.
In the FDW, there are two connections identified by {user_A, sever_X}
and {user_B, server_X} respectively, and therefore opens two
transactions on each connection, while GTM has only one FdwXact entry
because the two connections refer to the same user mapping OID. As a
result, at the end of the transaction, GTM ends only one foreign
transaction, leaving another one.

Using user mapping OID seems natural to me but I'm concerned that
changing role in the middle of transaction is likely to happen than
dropping the public user mapping but not sure. We would need to find
more better way.

>
> + /* Commit foreign transactions if any */
> + AtEOXact_FdwXact(true);
>
> Don't we need to pass XACT_EVENT_PARALLEL_PRE_COMMIT or XACT_EVENT_PRE_COMMIT flag? Probably we don't need to do this if postgres_fdw is only user of this new API. But if we make this new API generic one, such flags seem necessary so that some foreign data wrappers might have different behaviors for those flags.
>
> Because of the same reason as above, AtEOXact_FdwXact() should also be called after CallXactCallbacks(is_parallel_worker ? XACT_EVENT_PARALLEL_COMMIT : XACT_EVENT_COMMIT)?

Agreed.

In AtEOXact_FdwXact() we call either CommitForeignTransaction() or
RollbackForeignTransaction() with FDWXACT_FLAG_ONEPHASE flag for each
foreign transaction. So for example in commit case, we will call new
FDW APIs in the following order:

1. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_PRE_COMMIT
flag and FDWXACT_FLAG_ONEPHASE flag for each foreign transaction.
2. Commit locally.
3. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_COMMIT
flag and FDWXACT_FLAG_ONEPHASE flag for each foreign transaction.

In the future when we have a new FDW API to prepare foreign
transaction, the sequence will be:

1. Call PrepareForeignTransaction() for each foreign transaction.
2. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_PRE_COMMIT
flag for each foreign transaction.
3. Commit locally.
4. Call CommitForeignTransaction() with XACT_EVENT_PARALLEL_COMMIT
flag for each foreign transaction.

So we expect FDW that wants to support 2PC not to commit foreign
transaction if CommitForeignTransaction() is called with
XACT_EVENT_PARALLEL_PRE_COMMIT flag and no FDWXACT_FLAG_ONEPHASE flag.

>
> + /*
> + * Abort foreign transactions if any. This needs to be done before marking
> + * this transaction as not running since FDW's transaction callbacks might
> + * assume this transaction is still in progress.
> + */
> + AtEOXact_FdwXact(false);
>
> Same as above.
>
> +/*
> + * This function is called at PREPARE TRANSACTION. Since we don't support
> + * preparing foreign transactions yet, raise an error if the local transaction
> + * has any foreign transaction.
> + */
> +void
> +AtPrepare_FdwXact(void)
> +{
> + if (FdwXactParticipants != NIL)
> + ereport(ERROR,
> + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("cannot PREPARE a transaction that has operated on foreign tables")));
> +}
>
> This means that some foreign data wrappers suppporting the prepare transaction (though I'm not sure if such wappers actually exist or not) cannot use the new API? If we want to allow those wrappers to use new API, AtPrepare_FdwXact() should call the prepare callback and each wrapper should emit an error within the callback if necessary.

I think if we support the prepare callback and allow FDWs to prepare
foreign transactions, we have to call CommitForeignTransaction() on
COMMIT PREPARED for foreign transactions that are associated with the
local prepared transaction. But how can we know which foreign
transactions are? Even a client who didn’t do PREPARE TRANSACTION
could do COMMIT PREPARED. We would need to store the information of
which foreign transactions are associated with the local transaction
somewhere. The 0004 patch introduces WAL logging along with prepare
API and we store that information to a WAL record. I think it’s better
at this time to disallow PREPARE TRANSACTION when at least one foreign
transaction is registered via FDW API.

>
> + foreach(lc, FdwXactParticipants)
> + {
> + FdwXactParticipant *fdw_part = (FdwXactParticipant *) lfirst(lc);
> +
> + if (fdw_part->server->serverid == serverid &&
> + fdw_part->usermapping->userid == userid)
>
> Isn't this ineffecient when starting lots of foreign transactions because we need to scan all the entries in the list every time?

Agreed. I'll change it to a hash map.

>
> +static ConnCacheEntry *
> +GetConnectionCacheEntry(Oid umid)
> +{
> + bool found;
> + ConnCacheEntry *entry;
> + ConnCacheKey key;
> +
> + /* First time through, initialize connection cache hashtable */
> + if (ConnectionHash == NULL)
> + {
> + HASHCTL ctl;
> +
> + ctl.keysize = sizeof(ConnCacheKey);
> + ctl.entrysize = sizeof(ConnCacheEntry);
> + ConnectionHash = hash_create("postgres_fdw connections", 8,
> + &ctl,
> + HASH_ELEM | HASH_BLOBS);
>
> Currently ConnectionHash is created under TopMemoryContext. With the patch, since GetConnectionCacheEntry() can be called in other places, ConnectionHash may be created under the memory context other than TopMemoryContext? If so, that's safe?

hash_create() creates a hash map under TopMemoryContext unless
HASH_CONTEXT is specified. So I think ConnectionHash is still created
in the same memory context.

>
> - if (PQstatus(entry->conn) != CONNECTION_OK ||
> - PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
> - entry->changing_xact_state ||
> - entry->invalidated)
> ...
> + if (PQstatus(entry->conn) != CONNECTION_OK ||
> + PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
> + entry->changing_xact_state)
>
> Why did you get rid of the condition "entry->invalidated"?

My bad. I'll fix it.

> >
> > If we want to perform some operations at the end of the top
> > transaction per FDW, not per foreign transaction, we will either still
> > need to use XactCallback or need to rethink the FDW API design. But
> > given that we call commit and rollback FDW API for only foreign
> > servers that actually started a transaction, I’m not sure if there are
> > such operations in practice. IIUC there is not at least from the
> > normal (not-sub) transaction termination perspective.
>
> One feature in my mind that may not match with this new API is to perform transaction commits on multiple servers in parallel. That's something like the following. As far as I can recall, another proposed version of 2pc on postgres_fdw patch included that feature. If we want to implement this to increase the performance of transaction commit in the future, I'm afraid that new API will prevent that.
>
> foreach(foreign transactions)
> send commit command
>
> foreach(foreign transactions)
> wait for reply of commit

What I'm thinking is to pass a flag, say FDWXACT_ASYNC, to
Commit/RollbackForeignTransaction() and add a new API to wait for the
operation to complete, say CompleteForeignTransaction(). If
commit/rollback callback in an FDW is called with FDWXACT_ASYNC flag,
it should send the command and immediately return the handler (e.g.,
PQsocket() in postgres_fdw). The GTM gathers the handlers and poll
events on them. To complete the command, the GTM calls
CompleteForeignTransaction() to wait for the command to complete.
Please refer to XA specification for details (especially xa_complete()
and TMASYNC flag). A pseudo-code is something like the followings:

foreach (foreign transactions)
call CommitForeignTransaction(FDWXACT_ASYNC);
append the returned fd to the array.

while (true)
{
poll event on fds;
call CompleteForeignTransaction() for fd owner;
if (success)
remove fd from the array;

if (array is empty)
break;
}

>
> On second thought, new per-transaction commit/rollback callback is essential when users or the resolver process want to resolve the specifed foreign transaction, but not essential when backends commit/rollback foreign transactions. That is, even if we add per-transaction new API for users and resolver process, backends can still use CallXactCallbacks() when they commit/rollback foreign transactions. Is this understanding right?

I haven’t tried that but I think that's possible if we can know
commit/rollback callback (e.g., postgresCommitForeignTransaction() etc
in postgres_fdw) is called via SQL function (pg_resolve_foreign_xact()
SQL function) or called by the resolver process. That is, we register
foreign transaction via FdwXactRegisterXact(), don’t do nothing in
postgresCommit/RollbackForeignTransaction() if these are called by the
backend, and perform COMMIT/ROLLBACK in pgfdw_xact_callback() in
asynchronous manner. On the other hand, if
postgresCommit/RollbackForeignTransaction() is called via SQL function
or by the resolver these functions commit/rollback the transaction.

>
> > Regarding cursor_number, it essentially needs to be unique at least
> > within a transaction so we can manage it per transaction or per
> > connection. But the current postgres_fdw rather ensure uniqueness
> > across all connections. So it seems to me that this can be fixed by
> > making individual connection have cursor_number and resetting it in
> > pgfdw_cleanup_after_transaction(). I think this can be in a separate
> > patch.
>
> Maybe, so let's work on this later, at least after we confirm that
> this change is really necessary.

Okay.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-02-05 05:54:50 Re: Parallel INSERT (INTO ... SELECT ...)
Previous Message Dilip Kumar 2021-02-05 05:31:54 Re: [HACKERS] Custom compression methods