RE: Transactions involving multiple postgres foreign servers, take 2

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Masahiko Sawada' <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Muhammad Usama <m(dot)usama(at)gmail(dot)com>, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, amul sul <sulamul(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: RE: Transactions involving multiple postgres foreign servers, take 2
Date: 2020-09-16 05:52:49
Message-ID: TYAPR01MB299068D29020EBCC6EF4065AFE210@TYAPR01MB2990.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
> The resolver process has two functionalities: resolving foreign
> transactions automatically when the user issues COMMIT (the case you
> described in the second paragraph), and resolving foreign transactions
> when the corresponding backend no longer exists or when the server
> crashes in the middle of 2PC (described in the third
> paragraph).
>
> Considering the design without the resolver process, I think we can
> easily replace the latter with the manual resolution. OTOH, it's not
> easy for the former. I have no idea about better design for now,
> although, as you described, if we could ensure that the process
> doesn't raise an error during resolving foreign transactions after
> committing the local transaction we would not need the resolver
> process.

Yeah, the resolver background process (something independent of client sessions) is necessary, because a client session can disappear at any time. When the server that hosts the 2PC coordinator crashes, no client sessions remain. Our DBMS Symfoware also runs background threads that resolve transactions left in doubt by a server or network failure.

Then, how should the resolver get involved in 2PC so that the 2PC steps can run in parallel? Two ideas quickly come to mind:

(1) Each client backend issues prepare and commit to multiple remote nodes asynchronously.
If the communication fails during commit, the client backend leaves the commit notification task to the resolver.
That is, the resolver lends a hand during failure recovery, and doesn't interfere with the transaction processing during normal operation.

(2) The resolver takes some responsibility in 2PC processing during normal operation.
(It sends prepare and/or commit to the remote nodes and receives the results.)
To avoid serial execution per transaction, the resolver bundles multiple requests, sends them in bulk, and waits for multiple replies at once.
This allows the coordinator to do its own prepare processing in parallel with that of the participants.
However, in Postgres, this requires context switches between the client backend and the resolver.
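
To make idea (1) a bit more concrete, here is a rough simulation in Python. It is only a sketch, not Postgres code: RemoteNode, commit_distributed and resolver_queue are names I made up for illustration, the network is faked with asyncio, and handling of a failed prepare (rollback) is left out.

```python
import asyncio


class RemoteNode:
    """Hypothetical stand-in for a foreign server (not a Postgres API)."""

    def __init__(self, name, fail_on_commit=False):
        self.name = name
        self.fail_on_commit = fail_on_commit
        self.state = "active"

    async def prepare(self, gid):
        await asyncio.sleep(0)  # simulated network round trip
        self.state = "prepared"

    async def commit_prepared(self, gid):
        await asyncio.sleep(0)  # simulated network round trip
        if self.fail_on_commit:
            raise ConnectionError(f"lost connection to {self.name}")
        self.state = "committed"


async def commit_distributed(gid, nodes, resolver_queue):
    # Phase 1: send prepare to every participant concurrently.
    # (A failure here would raise; rollback is omitted in this sketch.)
    await asyncio.gather(*(n.prepare(gid) for n in nodes))

    # The local commit would happen here; the outcome is now decided.

    # Phase 2: send commit concurrently and collect failures
    # instead of raising them.
    results = await asyncio.gather(
        *(n.commit_prepared(gid) for n in nodes), return_exceptions=True
    )
    for node, result in zip(nodes, results):
        if isinstance(result, Exception):
            # Leave the commit notification to the resolver.
            resolver_queue.append((gid, node))
    return "committed"


def run_example():
    nodes = [RemoteNode("s1"), RemoteNode("s2", fail_on_commit=True)]
    resolver_queue = []
    outcome = asyncio.run(commit_distributed("gid-1", nodes, resolver_queue))
    states = [n.state for n in nodes]
    pending = [(gid, node.name) for gid, node in resolver_queue]
    return outcome, states, pending
```

In this sketch the backend reports the commit as done even though one participant is still only prepared; the entry left in resolver_queue is what the resolver would pick up later.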

Symfoware takes approach (2). However, it doesn't suffer from the context-switch overhead, because the server is multi-threaded and additionally implements or uses execution entities that are more lightweight than threads.
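
The bundling in (2) could look roughly like the following Python simulation. Again, this is a sketch under my own assumptions and reflects neither Symfoware nor any Postgres patch: send_commit, backend_commit and resolver are hypothetical names, asyncio tasks stand in for processes, and an in-memory queue stands in for whatever shared-memory request area a real implementation would use.

```python
import asyncio


async def send_commit(node, gid):
    """Hypothetical network call; it only simulates a round trip."""
    await asyncio.sleep(0)
    return (node, gid, "ok")


async def backend_commit(requests, node, gid):
    """A client backend hands its request to the resolver and waits."""
    reply = asyncio.get_running_loop().create_future()
    requests.put_nowait((node, gid, reply))
    return await reply  # this wait is where the context switch occurs


async def resolver(requests, batch_sizes):
    """Drain pending requests, send them in bulk, await replies together."""
    stop = False
    while not stop:
        batch = [await requests.get()]
        while not requests.empty():
            batch.append(requests.get_nowait())
        if any(item is None for item in batch):  # None = shutdown sentinel
            stop = True
            batch = [item for item in batch if item is not None]
        if not batch:
            continue
        # Send all bundled requests and wait for all replies at once.
        replies = await asyncio.gather(
            *(send_commit(node, gid) for node, gid, _ in batch)
        )
        for (_, _, reply_future), reply in zip(batch, replies):
            reply_future.set_result(reply)
        batch_sizes.append(len(batch))


async def demo():
    requests = asyncio.Queue()
    batch_sizes = []
    resolver_task = asyncio.create_task(resolver(requests, batch_sizes))
    replies = await asyncio.gather(
        backend_commit(requests, "s1", "gid-1"),
        backend_commit(requests, "s2", "gid-1"),
        backend_commit(requests, "s1", "gid-2"),
    )
    requests.put_nowait(None)  # ask the resolver to stop
    await resolver_task
    return replies, batch_sizes
```

Each backend blocks on its reply future until the resolver answers. That handoff is the cost mentioned above for Postgres: cheap between lightweight threads, more expensive between processes.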

> Or the second idea would be that the backend commits only the local
> transaction then returns the acknowledgment of COMMIT to the user
> without resolving foreign transactions. Then the user manually
> resolves the foreign transactions by, for example, using the SQL
> function pg_resolve_foreign_xact() within a separate transaction. That
> way, even if an error occurred during resolving foreign transactions
> (e.g., executing COMMIT PREPARED), it’s okay as the user is already
> aware of the local transaction having been committed and can retry to
> resolve the unresolved foreign transaction. So we won't need the
> resolver process while avoiding such inconsistency.
>
> But a drawback would be that the transaction commit doesn't ensure
> that all foreign transactions are completed. The subsequent
> transactions would need to check if the previous distributed
> transaction is completed to see its results. I’m not sure it’s a good
> design in terms of usability.

I share your concern; I don't think it's a good design either. I guess that's why Postgres-XL had to create a tool called pgxc_clean and ask the user to resolve transactions with it.

pgxc_clean
https://www.postgres-xl.org/documentation/pgxcclean.html

"pgxc_clean is a Postgres-XL utility to maintain transaction status after a crash. When a Postgres-XL node crashes and recovers or fails over, the commit status of the node may be inconsistent with other nodes. pgxc_clean checks transaction commit status and corrects them."

Regards
Takayuki Tsunakawa
