RE: Transactions involving multiple postgres foreign servers, take 2

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Masahiko Sawada' <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Muhammad Usama <m(dot)usama(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: RE: Transactions involving multiple postgres foreign servers, take 2
Date: 2020-07-17 05:21:05
Message-ID: TYAPR01MB2990D4ECA2CEC44E7D43E5A8FE7C0@TYAPR01MB2990.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
> I have briefly checked the only oracle_fdw but in general I think that
> if an existing FDW supports transaction begin, commit, and rollback,
> these can be ported to new FDW transaction APIs easily.

Does oracle_fdw support begin, commit and rollback?

And most importantly, do other major DBMSs, including Oracle, provide an API for preparing a transaction? In other words, will FDWs other than postgres_fdw really be able to take advantage of the new FDW functions to participate in the 2PC processing? I think we need to confirm that there are concrete examples.

What I'm worried about is that if only postgres_fdw can implement the prepare function, it's a sign that the FDW interface will become riddled with Postgres-only functions. That is, the FDW interface drifts away from its original purpose, "access external data as a relation," and grows complex. Tomas Vondra expressed this concern as follows:

Horizontal scalability/sharding
https://www.postgresql.org/message-id/flat/CANP8%2BjK%3D%2B3zVYDFY0oMAQKQVJ%2BqReDHr1UPdyFEELO82yVfb9A%40mail.gmail.com#2c45f0ee97855449f1f7fedcef1d5e11

[Tomas Vondra's remarks]
--------------------------------------------------
> This strikes me as a bit of a conflict of interest with FDW which
> seems to want to hide the fact that it's foreign; the FDW
> implementation makes its own optimization decisions which might
> make sense for single table queries but breaks down in the face of
> joins.

+1 to these concerns

In my mind, FDW is a wonderful tool to integrate PostgreSQL with
external data sources, and it's nicely shaped for this purpose, which
implies the abstractions and assumptions in the code.

The truth however is that many current uses of the FDW API are actually
using it for different purposes because there's no other way to do that,
not because FDWs are the "right way". And this includes the attempts to
build sharding on FDW, I think.

Situations like this result in "improvements" of the API that seem to
improve the API for the second group, but make the life harder for the
original FDW API audience by making the API needlessly complex. And I
say "seem to improve" because the second group eventually runs into the
fundamental abstractions and assumptions the API is based on anyway.

And based on the discussions at pgcon, I think this is the main reason
why people cringe when they hear "FDW" and "sharding" in the same sentence.

...
My other worry is that we'll eventually mess the FDW infrastructure,
making it harder to use for the original purpose. Granted, most of the
improvements proposed so far look sane and useful for FDWs in general,
but sooner or later that ceases to be the case - there will be changes
needed merely for the sharding. Those will be tough decisions.
--------------------------------------------------

> Regarding the comparison between FDW transaction APIs and transaction
> callbacks, I think one of the benefits of providing FDW transaction
> APIs is that the core is able to manage the status of foreign
> transactions. We need to track the status of individual foreign
> transactions to support atomic commit. If we use transaction callbacks
> (XactCallback) that many FDWs are using, I think we will end up
> calling the transaction callback and leave the transaction work to
> FDWs, leading that the core is not able to know the return values of
> PREPARE TRANSACTION for example. We can add more arguments passed to
> transaction callbacks to get the return value from FDWs but I don’t
> think it’s a good idea as transaction callbacks are used not only by
> FDW but also other external modules.

To track the foreign transaction status, we can add GetTransactionStatus() to the FDW interface as an alternative, can't we?

> With the current version patch (v23), it supports only
> INSERT/DELETE/UPDATE. But I'm going to change the patch so that it
> supports other writes SQLs as Fujii-san also pointed out.

OK. I've just read that Fujii-san already pointed out a similar thing. But I wonder how we can know that a UDF executed on the foreign server has updated data. Maybe we can know or guess it by calling txid_current_if_any() or checking the transaction status in the FE/BE protocol, but can we deal with FDWs other than postgres_fdw?

> No, in the current design, the backend who received a query from the
> client does PREPARE, and then the transaction resolver process, a
> background worker, does COMMIT PREPARED.

Does this "No" mean the current implementation cannot group commits from multiple transactions?
Does the transaction resolver send COMMIT PREPARED and wait for its response for each transaction one by one? For example,

[local server]
Transactions T1 and T2 perform 2PC at the same time.
The transaction resolver sends COMMIT PREPARED for T1 and then waits for the response.
T1 writes its COMMIT PREPARED record locally and syncs the WAL.
The transaction resolver sends COMMIT PREPARED for T2 and then waits for the response.
T2 writes its COMMIT PREPARED record locally and syncs the WAL.

[foreign server]
T1 writes its COMMIT PREPARED record locally and syncs the WAL.
T2 writes its COMMIT PREPARED record locally and syncs the WAL.

If the WAL records of multiple concurrent transactions are written and synced separately, i.e. group commit doesn't take effect, then OLTP transaction performance will be unacceptable.

Regards
Takayuki Tsunakawa
