Re: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Muhammad Usama <m(dot)usama(at)gmail(dot)com>, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, amul sul <sulamul(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2020-09-29 06:03:12
Message-ID: CA+fd4k79eD4NV4Lrw10h1UN6WUWbhwC25fyMnSSxE+wQdGwwKw@mail.gmail.com
Lists: pgsql-hackers

On Tue, 29 Sep 2020 at 11:37, tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
>
> From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
> > No. Please imagine a case where a user executes PREPARE TRANSACTION on
> > the transaction that modified data on foreign servers. The backend
> > process prepares both the local transaction and foreign transactions.
> > But another client can execute COMMIT PREPARED on the prepared
> > transaction. In this case, another backend newly connects to the
> > foreign servers and commits the prepared foreign transactions.
> > Therefore, a new connection cache entry can be created during COMMIT
> > PREPARED, which could lead to an error, but since the local prepared
> > transaction is already committed the backend must not fail with an
> > error.
> >
> > In the latter case, I assumed that the backend continues to retry
> > foreign transaction resolution until the user requests cancellation.
> > Please imagine the case where server-A connects to a foreign server
> > (say, server-B) and server-B connects to another foreign server (say,
> > server-C). The transaction initiated on server-A modified the data on
> > both local and server-B which further modified the data on server-C
> > and executed COMMIT. The backend process on server-A (say, backend-A)
> > sends PREPARE TRANSACTION to server-B then the backend process on
> > server-B (say, backend-B) connected by backend-A prepares the local
> > transaction and further sends PREPARE TRANSACTION to server-C. Let’s
> > suppose a temporary connection failure happens between server-A and
> > server-B before backend-A sends COMMIT PREPARED (i.e., the 2nd phase
> > of 2PC). When backend-A attempts to send COMMIT PREPARED to
> > server-B it realizes that the connection to server-B was lost, but
> > since the user hasn’t requested cancellation yet, backend-A retries
> > connecting to server-B and succeeds. Now that backend-A has
> > established a new connection to server-B, there is another backend
> > process on server-B (say, backend-B’). Since backend-B’ doesn’t
> > have a connection to server-C yet, it creates a new connection cache
> > entry, which could lead to an error. IOW, on server-B different
> > processes performed PREPARE TRANSACTION and COMMIT PREPARED, and
> > the later process created a connection cache entry.
>
> Thank you, I understood the situation. I don't think it's a good design to neglect practical performance during normal operation for fear of a rare error case.
>
> The transaction manager (TM) or the FDW implementor can naturally do things like the following:
>
> * Use palloc_extended(MCXT_ALLOC_NO_OOM) and hash_search(HASH_ENTER_NULL) to return control to the caller.
>
> * Use PG_TRY(), as its overhead is negligible relative to connection establishment.

I suppose you mean that the FDW implementor uses PG_TRY() to catch an
error but does not do PG_RE_THROW(). I'm concerned about whether it's
safe to return control to the caller and continue trying to resolve
foreign transactions without either rethrowing the error or aborting
the transaction.
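
To make the pattern under discussion concrete, here is a minimal
sketch in plain C. It does not use the real PG_TRY()/PG_CATCH()
macros; it imitates the idea with setjmp/longjmp so that it compiles
on its own. All names (raise_error, try_resolve_once,
resolve_with_retry, failures_left) are hypothetical, and max_attempts
merely stands in for "until the user requests cancellation". The
sketch also does not model any cleanup that would normally happen at
transaction abort, which is exactly the part I'm unsure about.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>
#include <stddef.h>

/* Innermost error handler; a stand-in for PostgreSQL's exception stack. */
static jmp_buf *error_handler = NULL;

/* Simulated number of upcoming connection failures (hypothetical knob). */
static int failures_left = 0;

/* Hypothetical analogue of ereport(ERROR): jump to the innermost handler. */
static void
raise_error(void)
{
    assert(error_handler != NULL);
    longjmp(*error_handler, 1);
}

/* Hypothetical: reconnect and send COMMIT PREPARED; may raise an error. */
static void
commit_prepared_on_foreign_server(void)
{
    if (failures_left > 0)
    {
        failures_left--;
        raise_error();
    }
}

/* One attempt: catch the error and report it, without rethrowing. */
static bool
try_resolve_once(void)
{
    jmp_buf     local;
    jmp_buf    *volatile saved = error_handler;
    bool        ok;

    if (setjmp(local) == 0)
    {
        error_handler = &local;
        commit_prepared_on_foreign_server();
        ok = true;
    }
    else
        ok = false;             /* the error is swallowed here */

    error_handler = saved;      /* restore the outer handler on both paths */
    return ok;
}

/* Retry until success or max_attempts; returns attempts used, or -1. */
static int
resolve_with_retry(int max_attempts)
{
    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        if (try_resolve_once())
            return attempt;
    }
    return -1;
}
```

With failures_left set to 2, resolve_with_retry(5) gives up control to
the caller twice and succeeds on the third attempt. Nothing in the
sketch releases resources acquired by a failed attempt.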

IMHO, a design that is "high performance but doesn't work correctly in
a rare failure case" is rather a bad one, especially for a transaction
management feature.

>
> * If the commit fails, the TM asks the resolver to take care of committing the remote transaction, and returns success to the user.
>
>
> > Regarding parallel and asynchronous execution, I basically agree on
> > supporting asynchronous execution as the XA specification also does,
> > although I think it's better not to include it in the first version
> > for simplicity.
> >
> > Overall, my suggestion for the first version is to support synchronous
> > execution of prepare, commit, and rollback, have one resolver process
> > per database, and have resolver take 2nd phase of 2PC. As the next
> > step we can add APIs for asynchronous execution, have multiple
> > resolvers on one database and so on.
>
> We don't have to rush to commit a patch that is likely to exhibit impractical performance, as we still have much time left for PG 14. The design needs more thought toward the ideal goal, and refinement. By making efforts to sort through the ideal design, we may be able to avoid rework and API inconsistency. As for the API, we haven't validated yet that the FDW implementor can use XA, have we?

Yes, we still need to check whether FDW implementors other than
postgres_fdw are able to support these APIs. I agree that we need more
discussion on the design. My suggestion is to start with a small,
simple feature as the first step and not try to include everything in
the first version.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
