Re: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Muhammad Usama <m(dot)usama(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2020-07-17 11:04:44
Message-ID: CA+fd4k7VD7ueNQJ5UBcH4B3YqCm4GeLkJDat2gXb=NYrpKJVkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 17 Jul 2020 at 14:22, tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
>
> From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
> I have briefly checked the only oracle_fdw but in general I think that
> > if an existing FDW supports transaction begin, commit, and rollback,
> > these can be ported to new FDW transaction APIs easily.
>
> Does oracle_fdw support begin, commit and rollback?
>
> And most importantly, do other major DBMSs, including Oracle, provide the API for preparing a transaction? In other words, will the FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to join the 2PC processing? I think we need to confirm that there are concrete examples.

I also believe they do. But I'm concerned that some FDW needs to start
a transaction differently when using 2PC. For instance, IIUC MySQL
also supports 2PC but the transaction needs to be started with "XA
START id” when the transaction needs to be prepared. The transaction
started with XA START can be closed by XA END followed by XA PREPARE
or XA COMMIT ONE PHASE. It means that when starts a new transaction
the transaction needs to prepare the transaction identifier and to
know that 2PC might be used. It’s quite different from PostgreSQL. In
PostgreSQL, we can start a transaction by BEGIN and end it by PREPARE
TRANSACTION, COMMIT, or ROLLBACK. The transaction identifier is
required when PREPARE TRANSACTION.

With MySQL, I guess FDW needs a way to tell the (next) transaction
needs to be started with XA START so it can be prepared. It could be a
custom GUC or an SQL function. Then when starts a new transaction on
MySQL server, FDW can generate and store a transaction identifier into
somewhere alongside the connection. At the prepare phase, it passes
the transaction identifier via GetPrepareId() API to the core.

I haven’t tested the above yet and it’s just a desk plan. it's
definitely a good idea to try integrating this 2PC feature to FDWs
other than postgres_fdw to see if design and interfaces are
implemented sophisticatedly.

>
> What I'm worried is that if only postgres_fdw can implement the prepare function, it's a sign that FDW interface will be riddled with functions only for Postgres. That is, the FDW interface is getting away from its original purpose "access external data as a relation" and complex. Tomas Vondra showed this concern as follows:
>
> Horizontal scalability/sharding
> https://www.postgresql.org/message-id/flat/CANP8%2BjK%3D%2B3zVYDFY0oMAQKQVJ%2BqReDHr1UPdyFEELO82yVfb9A%40mail.gmail.com#2c45f0ee97855449f1f7fedcef1d5e11
>
>
> [Tomas Vondra's remarks]
> --------------------------------------------------
> > This strikes me as a bit of a conflict of interest with FDW which
> > seems to want to hide the fact that it's foreign; the FDW
> > implementation makes it's own optimization decisions which might
> > make sense for single table queries but breaks down in the face of
> > joins.
>
> +1 to these concerns
>
> In my mind, FDW is a wonderful tool to integrate PostgreSQL with
> external data sources, and it's nicely shaped for this purpose, which
> implies the abstractions and assumptions in the code.
>
> The truth however is that many current uses of the FDW API are actually
> using it for different purposes because there's no other way to do that,
> not because FDWs are the "right way". And this includes the attempts to
> build sharding on FDW, I think.
>
> Situations like this result in "improvements" of the API that seem to
> improve the API for the second group, but make the life harder for the
> original FDW API audience by making the API needlessly complex. And I
> say "seem to improve" because the second group eventually runs into the
> fundamental abstractions and assumptions the API is based on anyway.
>
> And based on the discussions at pgcon, I think this is the main reason
> why people cringe when they hear "FDW" and "sharding" in the same sentence.
>
> ...
> My other worry is that we'll eventually mess the FDW infrastructure,
> making it harder to use for the original purpose. Granted, most of the
> improvements proposed so far look sane and useful for FDWs in general,
> but sooner or later that ceases to be the case - there sill be changes
> needed merely for the sharding. Those will be tough decisions.
> --------------------------------------------------
>
>
> > Regarding the comparison between FDW transaction APIs and transaction
> > callbacks, I think one of the benefits of providing FDW transaction
> > APIs is that the core is able to manage the status of foreign
> > transactions. We need to track the status of individual foreign
> > transactions to support atomic commit. If we use transaction callbacks
> > (XactCallback) that many FDWs are using, I think we will end up
> > calling the transaction callback and leave the transaction work to
> > FDWs, leading that the core is not able to know the return values of
> > PREPARE TRANSACTION for example. We can add more arguments passed to
> > transaction callbacks to get the return value from FDWs but I don’t
> > think it’s a good idea as transaction callbacks are used not only by
> > FDW but also other external modules.
>
> To track the foreign transaction status, we can add GetTransactionStatus() to the FDW interface as an alternative, can't we?

I haven't thought such an interface but it sounds like the transaction
status is managed on both the core and FDWs. Could you elaborate on
that?

>
>
> > With the current version patch (v23), it supports only
> > INSERT/DELETE/UPDATE. But I'm going to change the patch so that it
> > supports other writes SQLs as Fujii-san also pointed out.
>
> OK. I've just read that Fujii san already pointed out a similar thing. But I wonder if we can know that the UDF executed on the foreign server has updated data. Maybe we can know or guess it by calling txid_current_if_any() or checking the transaction status in FE/BE protocol, but can we deal with other FDWs other than postgres_fdw?

Ah, my answer was not enough. It was only about tracking local writes.

Regarding tracking of writes on the foreign server, I think there are
restrictions. Currently, the executor registers a foreign sever as a
participant of 2PC before calling BeginForeignInsert(),
BeginForeignModify(), and BeginForeignScan() etc with a flag
indicating whether writes is going to happen on the foreign server. So
even if an UDF in a SELECT statement that could update data were to be
pushed down to the foreign server, the foreign server would be marked
as *not* modified. I’ve not tested yet but I guess that since FDW also
is allowed to register the foreign server along with that flag anytime
before commit, FDW is able to forcibly change that flag if it knows
the SELECT query is going to modify the data on the remote server.

>
>
> > No, in the current design, the backend who received a query from the
> > client does PREPARE, and then the transaction resolver process, a
> > background worker, does COMMIT PREPARED.
>
> This "No" means the current implementation cannot group commits from multiple transactions?

Yes.

> Does the transaction resolver send COMMIT PREPARED and waits for its response for each transaction one by one? For example,
>
> [local server]
> Transaction T1 and T2 performs 2PC at the same time.
> Transaction resolver sends COMMIT PREPARED for T1 and then waits for the response.
> T1 writes COMMIT PREPARED record locally and sync the WAL.
> Transaction resolver sends COMMIT PREPARED for T2 and then waits for the response.
> T2 writes COMMIT PREPARED record locally and sync the WAL.
>
> [foreign server]
> T1 writes COMMIT PREPARED record locally and sync the WAL.
> T2 writes COMMIT PREPARED record locally and sync the WAL.

Just to be clear, the transaction resolver writes FDWXACT_REMOVE
records instead of COMMIT PREPARED record to remove foreign
transaction entry. But, yes, the transaction resolver works like the
above you explained.

> If the WAL records of multiple concurrent transactions are written and synced separately, i.e. group commit doesn't take effect, then the OLTP transaction performance will be unacceptable.

I agree that it'll be a large performance penalty. I'd like to have it
but I’m not sure we should have it in the first version from the
perspective of complexity. Since the procedure of 2PC is originally
high cost, in my opinion, the user should not use as much as possible
in terms of performance. Especially in OLTP, its cost will directly
affect the latency. I’d suggest designing database schema so
transaction touches only one foreign server but do you have concrete
OLTP usecase where normally requires 2PC, and how many servers
involved within a distributed transaction?

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2020-07-17 11:24:07 Re: max_slot_wal_keep_size and wal_keep_segments
Previous Message Dagfinn Ilmari Mannsåker 2020-07-17 10:58:41 Re: renaming configure.in to configure.ac