RE: Transactions involving multiple postgres foreign servers, take 2

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Masahiko Sawada' <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Muhammad Usama <m(dot)usama(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: RE: Transactions involving multiple postgres foreign servers, take 2
Date: 2020-07-20 07:57:49
Message-ID: TYAPR01MB2990FAD79C5414740C507A58FE7B0@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
> I also believe they do. But I'm concerned that some FDW needs to start
> a transaction differently when using 2PC. For instance, IIUC MySQL
> also supports 2PC but the transaction needs to be started with "XA
> START id” when the transaction needs to be prepared. The transaction
> started with XA START can be closed by XA END followed by XA PREPARE
> or XA COMMIT ONE PHASE. It means that when starts a new transaction
> the transaction needs to prepare the transaction identifier and to
> know that 2PC might be used. It’s quite different from PostgreSQL. In
> PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE
> TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is
> required when PREPARE TRANSACTION.

I guess Postgres is rather a minority in this regard. All I know is XA and its Java counterpart (Java Transaction API: JTA). In XA, the connection needs to be associated with an XID before its transaction work is performed.
If some transaction work is already done before associating with XID, xa_start() returns an error like this:

[XA specification]
--------------------------------------------------
[XAER_OUTSIDE]
The resource manager is doing work outside any global transaction on behalf of
the application.
--------------------------------------------------

[Java Transaction API (JTA)]
--------------------------------------------------
void start(Xid xid, int flags) throws XAException

This method starts work on behalf of a transaction branch.
...

3.4.7 Local and Global Transactions
The resource adapter is encouraged to support the usage of both local and global
transactions within the same transactional connection. Local transactions are
transactions that are started and coordinated by the resource manager internally. The
XAResource interface is not used for local transactions.

When using the same connection to perform both local and global transactions, the
following rules apply:

. The local transaction must be committed (or rolled back) before starting a
global transaction in the connection.
. The global transaction must be disassociated from the connection before any
local transaction is started.
--------------------------------------------------

(FWIW, jdbc_fdw would expect to use JTA for this FDW 2PC?)

> I haven’t tested the above yet and it’s just a desk plan. it's
> definitely a good idea to try integrating this 2PC feature to FDWs
> other than postgres_fdw to see if design and interfaces are
> implemented sophisticatedly.

Yes, if we address this 2PC feature as an FDW enhancement, we need to make sure that at least some well-known DBMSs should be able to implement the new interface. The following part may help devise the interface:

[References from XA specification]
--------------------------------------------------
The primary use of xa_start() is to register a new transaction branch with the RM.
This marks the start of the branch. Subsequently, the AP, using the same thread of
control, uses the RM’s native interface to do useful work. All requests for service
made by the same thread are part of the same branch until the thread dissociates
from the branch (see below).

3.3.1 Registration of Resource Managers
Normally, a TM involves all associated RMs in a transaction branch. (The TM’s set of
RM switches, described in Section 4.3 on page 21 tells the TM which RMs are
associated with it.) The TM calls all these RMs with xa_start(), xa_end(), and
xa_prepare (), although an RM that is not active in a branch need not participate further
(see Section 2.3.2 on page 8). A technique to reduce overhead for infrequently-used
RMs is discussed below.

Dynamic Registration

Certain RMs, especially those involved in relatively few global transactions, may ask
the TM to assume they are not involved in a transaction. These RMs must register with
the TM before they do application work, to see whether the work is part of a global
transaction. The TM never calls these RMs with any form of xa_start(). An RM
declares dynamic registration in its switch (see Section 4.3 on page 21). An RM can
make this declaration only on its own behalf, and doing so does not change the TM’s
behaviour with respect to other RMs.

When an AP requests work from such an RM, before doing any work, the RM contacts
the TM by calling ax_reg(). The RM must call ax_reg() from the same thread of control
that the AP would use if it called ax_reg() directly. The TM returns to the RM the
appropriate XID if the AP is in a global transaction.

The implications of dynamically registering are as follows: when a thread of control
begins working on behalf of a transaction branch, the transaction manager calls
xa_start() for all resource managers known to the thread except those having
TMREGISTER set in their xa_switch_t structure. Thus, those resource managers with
this flag set must explicitly join a branch with ax_reg(). Secondly, when a thread of
control is working on behalf of a branch, a transaction manager calls xa_end() for all
resource managers known to the thread that either do not have TMREGISTER set in
their xa_switch_t structure or have dynamically registered with ax_reg().

int
xa_start(XID *xid, int rmid, long flags)

DESCRIPTION
A transaction manager calls xa_start() to inform a resource manager that an application
may do work on behalf of a transaction branch.
...
A transaction manager calls xa_start() only for those resource managers that do not
have TMREGISTER set in the flags element of their xa_switch_t structure. Resource
managers with TMREGISTER set must use ax_reg() to join a transaction branch (see
ax_reg() for details).
--------------------------------------------------

> > To track the foreign transaction status, we can add GetTransactionStatus() to
> the FDW interface as an alternative, can't we?
>
> I haven't thought such an interface but it sounds like the transaction
> status is managed on both the core and FDWs. Could you elaborate on
> that?

I don't have such deep analysis. I just thought that the core could keep track of the local transaction status, and ask each participant FDW about its transaction status to determine an action.

> > If the WAL records of multiple concurrent transactions are written and
> synced separately, i.e. group commit doesn't take effect, then the OLTP
> transaction performance will be unacceptable.
>
> I agree that it'll be a large performance penalty. I'd like to have it
> but I’m not sure we should have it in the first version from the
> perspective of complexity.

I think at least we should have a rough image of how we can reach the goal. Otherwise, the current design/implementation may have to be overhauled with great efforts in the near future. Apart from that, I feel it's unnatural that the commit processing is serialized at the transaction resolver while the DML processing of multiple foreign transactions can be performed in parallel.

> Since the procedure of 2PC is originally
> high cost, in my opinion, the user should not use as much as possible
> in terms of performance. Especially in OLTP, its cost will directly
> affect the latency. I’d suggest designing database schema so
> transaction touches only one foreign server but do you have concrete
> OLTP usecase where normally requires 2PC, and how many servers
> involved within a distributed transaction?

I can't share the details, but some of our customers show interest in Postgres scale-out or FDW 2PC for the following use cases:

* Multitenant OLTP where the data specific to one tenant is stored on one database server. On the other hand, some data are shared among all tenants, and they are stored on a separate server. The shared data and the tenant-specific data is updated in the same transaction (I don't know the frequency of such transactions.)

* An IoT use case where each edge database server monitors and tracks the movement of objects in one area. Those edge database servers store the records of objects they manage. When an object gets out of one area and moves to another, the record for the object is moved between the two edge database servers using an atomic distributed transaction.

(I wonder if TPC-C or TPC-E needs distributed transaction...)

Regards
Takayuki Tsunakawa

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2020-07-20 08:28:21 Re: Transactions involving multiple postgres foreign servers, take 2
Previous Message Dilip Kumar 2020-07-20 06:31:36 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions